AutoEvals is a tool to quickly and easily evaluate AI model outputs.

Installation

npm install autoevals
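
A minimal quick-start sketch, using Factuality (documented below) as a representative LLM-based scorer; it assumes an OPENAI_API_KEY is available in the environment:

import { Factuality } from "autoevals";

// Scorers are async functions that take the model's output, the
// expected value, and (for many scorers) the original input.
const result = await Factuality({
  input: "Which country has the highest population?",
  output: "People's Republic of China",
  expected: "China",
});

console.log(result.score); // a number between 0 and 1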

RAGAS Evaluators

AnswerCorrectness

Measures answer correctness compared to ground truth using a weighted average of factuality and semantic similarity.
Parameters: args (ScorerArgs)

AnswerRelevancy

Scores the relevancy of the generated answer to the given question. Answers with incomplete, redundant or unnecessary information are penalized.
Parameters: args (ScorerArgs)

AnswerSimilarity

Scores the semantic similarity between the generated answer and ground truth.
Parameters: args (ScorerArgs)

ContextEntityRecall

Estimates context entity recall by comparing the entities in the annotated (ground truth) answer against those found in the retrieved context, treating matches as true positives and misses as false negatives.
Parameters: args (ScorerArgs)

ContextPrecision

Measures how well the retrieved context ranks ground-truth-relevant chunks: relevant items appearing earlier in the context score higher.
Parameters: args (ScorerArgs)

ContextRecall

Measures the extent to which the retrieved context aligns with the ground truth answer, i.e. how many of the answer's statements can be attributed to the context.
Parameters: args (ScorerArgs)

ContextRelevancy

Measures how relevant the retrieved context is to the given question.
Parameters: args (ScorerArgs)

Faithfulness

Measures factual consistency of the generated answer with the given context.
Parameters: args (ScorerArgs)
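
A sketch of calling a RAGAS-style scorer such as Faithfulness; it assumes these scorers accept the retrieved passages via a context field alongside input and output:

import { Faithfulness } from "autoevals";

const result = await Faithfulness({
  input: "What is the capital of France?",
  output: "Paris is the capital of France.",
  context: ["France's capital city is Paris.", "France is in Western Europe."],
});

console.log(result.score);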

LLM Evaluators

Battle

Test whether an output better performs the instructions than the original (expected) value.
Parameters: args (ScorerArgs)

ClosedQA

Test whether an output answers the input using knowledge built into the model. You can specify criteria to further constrain the answer.
Parameters: args (ScorerArgs)

Factuality

Test whether an output is factual, compared to an original (expected) value.
Parameters: args (ScorerArgs)

Humor

Test whether an output is funny.
Parameters: args (ScorerArgs)

Possible

Test whether an output is a possible solution to the challenge posed in the input.
Parameters: args (ScorerArgs)

Security

Test whether an output is malicious.
Parameters: args (ScorerArgs)

Sql

Test whether a SQL query is semantically the same as a reference (expected) query.
Parameters: args (ScorerArgs)

Summary

Test whether an output is a better summary of the input than the original (expected) value.
Parameters: args (ScorerArgs)

Translation

Test whether an output is as good a translation of the input, in the specified language, as an expert (expected) value.
Parameters: args (ScorerArgs)
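
For example, ClosedQA can be called as in the sketch below; the criteria field, which constrains what counts as a correct answer, is assumed from the description above:

import { ClosedQA } from "autoevals";

const result = await ClosedQA({
  input: "What is 2 + 2?",
  output: "4",
  criteria: "The answer must be mathematically correct.",
});

console.log(result.score);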

String Evaluators

EmbeddingSimilarity

A scorer that uses cosine similarity to compare two strings.
Parameters: args (ScorerArgs)

ExactMatch

A simple scorer that tests whether two values are equal. If the value is an object or array, it will be JSON-serialized and the strings compared for equality.
Parameters: args (object)

Levenshtein

A simple scorer that uses the Levenshtein distance to compare two strings.
Parameters: args (object)

LevenshteinScorer

An alias for Levenshtein, kept for backwards compatibility.
Parameters: args (object)
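
A sketch contrasting the two approaches: Levenshtein scores surface-level edit distance, while EmbeddingSimilarity scores meaning (and therefore calls an embedding model, which requires an API key):

import { Levenshtein, EmbeddingSimilarity } from "autoevals";

// Pure string comparison; no network calls.
const lev = await Levenshtein({ output: "Hello world", expected: "Hello werld" });

// Semantic comparison via embeddings.
const sim = await EmbeddingSimilarity({
  output: "The cat sat on the mat.",
  expected: "A cat was sitting on a mat.",
});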

JSON Evaluators

JSONDiff

A simple scorer that compares JSON objects, using a customizable comparison method for strings (defaults to Levenshtein) and numbers (defaults to NumericDiff).
Parameters: args (ScorerArgs)

ValidJSON

A binary scorer that evaluates the validity of JSON output, optionally validating against a JSON Schema definition (see https://json-schema.org/learn/getting-started-step-by-step#create).
Parameters: args (ScorerArgs)
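
A sketch of both JSON scorers; the schema field on ValidJSON is assumed from the description above:

import { JSONDiff, ValidJSON } from "autoevals";

// Compares structured values field by field.
const diff = await JSONDiff({
  output: { name: "Alice", age: 30 },
  expected: { name: "Alice", age: 31 },
});

// Checks that the output parses as JSON and, optionally,
// conforms to a JSON Schema.
const valid = await ValidJSON({
  output: '{"name": "Alice"}',
  schema: { type: "object", properties: { name: { type: "string" } } },
});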

Custom Evaluators

LLMClassifierFromSpec

Builds an LLM-based classifier scorer from a classification spec (a prompt plus the scores assigned to each answer choice).
Parameters: name (string), spec (object)

LLMClassifierFromSpecFile

Builds an LLM-based classifier scorer from one of the library's built-in model-graded spec templates.
Parameters: name (string), templateName (one of the built-in template names)

LLMClassifierFromTemplate

Builds an LLM-based classifier scorer from a prompt template, mapping each answer choice to a score.
Parameters: a single options object

OpenAIClassifier

A low-level scorer that runs a classification prompt through the OpenAI chat API and converts the model's choice into a score.
Parameters: args (ScorerArgs)

buildClassificationTools

Builds the tool definitions used to extract a classification choice (optionally with chain-of-thought reasoning) from the model.
Parameters: useCoT (boolean), choiceStrings (array)
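
A sketch of building a custom classifier with LLMClassifierFromTemplate; the promptTemplate, choiceScores, and useCoT field names are assumptions about the options object described above:

import { LLMClassifierFromTemplate } from "autoevals";

// Ask an LLM to pick a letter choice, then map each choice to a score.
const politeness = LLMClassifierFromTemplate({
  name: "Politeness",
  promptTemplate:
    "Is the following response polite?\n\n{{output}}\n\n" +
    "a) polite\nb) neutral\nc) rude",
  choiceScores: { a: 1, b: 0.5, c: 0 },
  useCoT: true,
});

const result = await politeness({ output: "Thanks so much for your help!" });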

List Evaluators

ListContains

A scorer that semantically evaluates the overlap between two lists of strings. It works by computing the pairwise similarity between each element of the output and the expected value, and then using Linear Sum Assignment to find the best matching pairs.
Parameters: args (ScorerArgs)
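
A sketch of ListContains; because the matching is semantic rather than exact, near-duplicates such as "apple" and "apples" can still pair up:

import { ListContains } from "autoevals";

const result = await ListContains({
  output: ["apple", "banana"],
  expected: ["apples", "banana", "cherry"],
});

console.log(result.score);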

Moderation

Moderation

A scorer that uses OpenAI's moderation API to determine whether an AI response contains any flagged content.
Parameters: args (ScorerArgs)
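
A minimal sketch of Moderation; the score convention (1 for clean, 0 when any category is flagged) is an assumption based on the description above:

import { Moderation } from "autoevals";

// Screens a single response through OpenAI's moderation endpoint.
const result = await Moderation({ output: "Some model response to screen." });

console.log(result.score); // assumed: 1 if clean, 0 if flagged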

Numeric Evaluators

NumericDiff

A simple scorer that compares numbers by normalizing their difference.
Parameters: args (object)
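
A sketch of NumericDiff; the exact normalization (dividing the absolute difference by the larger magnitude) is an assumption consistent with the description above:

import { NumericDiff } from "autoevals";

// |104 - 100| / 104 ≈ 0.038, so the score is ≈ 0.96
// (assumed normalization; see the lead-in note).
const result = await NumericDiff({ output: 104, expected: 100 });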

Configuration

init

Initializes AutoEvals with global defaults, such as the OpenAI client used by LLM-based scorers.
Parameters: a single options object
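
A sketch of init, pointing the LLM-based scorers at a custom OpenAI-compatible endpoint; the client option name is an assumption about the options object described above:

import { init } from "autoevals";
import OpenAI from "openai";

// Route all LLM-based scorers through a custom client,
// e.g. a proxy endpoint (the URL below is hypothetical).
init({
  client: new OpenAI({
    baseURL: "https://proxy.example.com/v1",
    apiKey: process.env.OPENAI_API_KEY,
  }),
});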

Utilities

makePartial

Wraps a scorer so that some of its arguments can be pre-bound, optionally giving the resulting scorer a new name.
Parameters: fn (Scorer), name (string)

normalizeValue

Normalizes a value to a string for comparison; when maybeObject is true, objects and arrays are JSON-serialized.
Parameters: value (unknown), maybeObject (boolean)
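
The built-in scorers are wrapped with makePartial, which is what enables pre-binding arguments; a sketch of the resulting usage, assuming the wrapped scorers expose a partial() method:

import { Factuality } from "autoevals";

// Pre-bind the model so every call uses it (partial() is assumed
// to come from the makePartial wrapper).
const factualityGpt4 = Factuality.partial({ model: "gpt-4o" });

const result = await factualityGpt4({
  input: "Which country has the highest population?",
  output: "People's Republic of China",
  expected: "China",
});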

Source Code

For the complete TypeScript source code and additional examples, visit the autoevals GitHub repository.