Today, Amazon SageMaker Clarify announces a new capability to support foundation model (FM) evaluations. With this capability, AWS customers can compare and select FMs based on metrics such as accuracy, robustness, bias, and toxicity in minutes.
Customers today have a wide range of options when choosing an FM to power their generative AI applications, and they want to compare these models quickly to find the best option for their use case. Comparing models typically takes days: customers must first identify relevant benchmarks, set up evaluation tools, and run assessments on each model, and they often receive results that are hard to decipher.
SageMaker Clarify now supports FM evaluations during model selection and throughout the model customization workflow. Customers can get started with FM evaluations using curated prompt datasets that are purpose-built for common tasks, including open-ended text generation, summarization, question answering, and classification. Customers can extend FM evaluations with their own custom prompt datasets, and human evaluations can be used for more subjective dimensions, such as creativity and style. After each evaluation, customers receive an evaluation report that summarizes the results in natural language and includes visualizations and examples. Customers can download all metrics and reports and integrate them into their SageMaker ML workflows.
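To illustrate the idea of scoring a model against a prompt dataset, here is a minimal sketch of an evaluation loop. Everything in it is an assumption for illustration: the toy QA dataset, the stub model, and the exact-match metric are hypothetical stand-ins, not the SageMaker Clarify API or its curated datasets.

```python
# Illustrative sketch only: the dataset, stub model, and metric below are
# hypothetical stand-ins, not the SageMaker Clarify API.

def exact_match_accuracy(model, dataset):
    """Score a model on a question-answering prompt dataset.

    Each record pairs a prompt with a reference answer; the metric is
    the fraction of model outputs that exactly match the reference
    (case-insensitive, whitespace-trimmed).
    """
    hits = 0
    for record in dataset:
        prediction = model(record["prompt"]).strip().lower()
        if prediction == record["reference"].strip().lower():
            hits += 1
    return hits / len(dataset)

# Toy dataset standing in for a curated question-answering prompt dataset.
qa_dataset = [
    {"prompt": "What is the capital of France?", "reference": "Paris"},
    {"prompt": "How many legs does a spider have?", "reference": "8"},
]

# Stub model: a lookup table standing in for a hosted FM endpoint.
answers = {
    "What is the capital of France?": "Paris",
    "How many legs does a spider have?": "6",
}

def stub_model(prompt):
    return answers[prompt]

# A report summarizing the run, in the spirit of an evaluation report.
report = {
    "metric": "exact_match_accuracy",
    "score": exact_match_accuracy(stub_model, qa_dataset),
}
print(report)  # the stub gets one of two answers right, so score is 0.5
```

A real evaluation would swap the stub for calls to a deployed model endpoint and add further metrics (robustness, bias, toxicity) over larger datasets; the loop structure stays the same.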
This capability is available in preview in the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Singapore), Europe (Frankfurt), and Europe (Ireland). For additional details, see our documentation and pricing page.