Changelog

October 4, 2024

Multi-select filters, status code tracking, latency and token usage benchmarks, and more

Another week at Athina: a new bundle of improvements to the platform.

Observe

Multi-select filters in the Dashboard, Analytics, and Compare pages
Support for tracking status codes and errors in LLM requests
Include chat history in custom prompt evals
Ability to split custom attributes into separate columns when creating a dataset

+++ many minor improvements
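
To illustrate what splitting custom attributes into separate columns means for a dataset, here is a minimal Python sketch. It is not Athina's implementation or API: the record shape, the `custom_attributes` key, and the `split_custom_attributes` helper are all assumptions made up for this example.

```python
from typing import Any


def split_custom_attributes(
    rows: list[dict[str, Any]], key: str = "custom_attributes"
) -> list[dict[str, Any]]:
    """Promote each entry of a nested attribute dict to its own top-level column."""
    # Collect every attribute name seen in any row so all rows share the same columns.
    names: list[str] = []
    for row in rows:
        for name in row.get(key) or {}:
            if name not in names:
                names.append(name)

    flattened = []
    for row in rows:
        attrs = row.get(key) or {}
        # Copy every field except the nested attribute dict itself.
        out = {k: v for k, v in row.items() if k != key}
        for name in names:
            out[name] = attrs.get(name)  # None when a row lacks the attribute
        flattened.append(out)
    return flattened


# Hypothetical logged requests, each carrying a nested dict of custom attributes.
logs = [
    {"query": "hi", "custom_attributes": {"plan": "pro", "region": "eu"}},
    {"query": "bye", "custom_attributes": {"plan": "free"}},
]
table = split_custom_attributes(logs)
```

In this sketch, rows that lack an attribute get `None` in that column, so every row ends up with the same set of columns. An attribute that shares a name with an existing field would overwrite it; a real pipeline would likely prefix or rename such columns.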

Develop

Support for structured outputs, JSON mode, and tool calling in Dynamic Columns, Experiments, and Prompt Playground
View response times and token usage when running a Dynamic Column or Experiment
OpenAI Assistant is now supported as a new dynamic column
You can now rename prompts when configuring an experiment

+++ many minor improvements

🐛 Note: Countless bugs were killed in the making of this week at Athina.

September 22, 2024

Athina is now SOC-2 Type II Compliant

🎉 Big milestone for the Athina team: we are now officially SOC-2 Type II compliant!

It reflects our commitment to security and privacy for our customers.

Over the past few months, we’ve been hard at work ensuring that Athina is ready to support clients with the strictest compliance requirements.

SOC-2 Type II compliance ✅
Deployable in customer VPCs (AWS, Azure, GCP) ✅
Support for custom models across AWS Bedrock, Azure, GCP, and more ✅
Role-based access controls for advanced team management ✅

More to come…

September 16, 2024

Annotation Mode: A powerful UI for human evaluation and labeling

Automated evaluation is necessary, but not sufficient, for a high-quality AI.
It lets teams move much faster, but it can't fully replace human judgment.

Human annotation is still essential to validate responses and ensure they work well in real-world scenarios.

What we built

Annotation mode is a flexible UI within Athina IDE that lets your team collaborate to annotate datasets rapidly.

Support for Multiple Annotators: Multiple annotators can add scores and labels to the same dataset. You can view aggregate scores, or the scores from an individual annotator.
Flexible Annotation Views: You can configure an annotation view with exactly the fields you need, in the format you need them.
Side-by-side viewing option: We added support for a side-by-side view mode, so annotators can easily compare two responses.
Flexible Scoring: Annotation mode supports both categorical and numeric scoring of LLM responses.
Freeform Comments: Annotators can leave detailed comments alongside numeric or categorical scores.
Response Editing: Annotators can edit LLM responses directly in annotation mode, making it easier to refine outputs in a dataset.
Multiple Viewing Modes: You can view the annotation scores and comments in two ways:
- Spreadsheet UI: A table-like view for easy navigation of annotations.
- Metrics View: A visual representation of the scores and labels.
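
As a rough illustration of the difference between aggregate scores and scores from an individual annotator, here is a minimal Python sketch. It does not use Athina's API: the annotation tuples, the annotator names, and both helper functions are hypothetical, and the mean is only one possible way to aggregate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotations: (row_id, annotator, numeric score).
annotations = [
    ("row-1", "alice", 4),
    ("row-1", "bob", 2),
    ("row-2", "alice", 5),
    ("row-2", "bob", 5),
]


def aggregate_scores(records):
    """Average the scores from all annotators for each dataset row."""
    by_row = defaultdict(list)
    for row_id, _annotator, score in records:
        by_row[row_id].append(score)
    return {row_id: mean(scores) for row_id, scores in by_row.items()}


def scores_by_annotator(records, annotator):
    """Return only the scores given by one annotator, keyed by row."""
    return {row_id: score for row_id, who, score in records if who == annotator}


overall = aggregate_scores(annotations)
bobs_view = scores_by_annotator(annotations, "bob")
```

Here `overall` holds one averaged score per row, while `bobs_view` holds only the scores entered by the hypothetical annotator `bob`. Categorical labels would need a different aggregate, such as the most common label.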

We're super excited about this because teams can now manage all their datasets, evaluations, and annotations in one place, which streamlines the workflow and improves collaboration.

September 13, 2024

🍱 Added support for custom models

Athina now supports:

OpenAI's latest models (o1-preview and o1-mini) 🍓
Custom models hosted on Google Vertex, AWS Bedrock, Azure, Together AI, Hugging Face, or even your own custom endpoints.

Users can now run prompts and evaluations using any model on any provider within their own environment.

This is a must-have for teams that need to keep their models and data within their own environments for privacy and security reasons.

Enterprise customers have been asking for more flexibility and control, and we’re shipping.

This is just the start — more coming soon! 🚀

Athina AI