Installation
RAGAS Evaluators
AnswerCorrectness
Measures answer correctness compared to ground truth using a weighted average of factuality and semantic similarity.
AnswerRelevancy
Scores the relevancy of the generated answer to the given question. Answers with incomplete, redundant or unnecessary information are penalized.
AnswerSimilarity
Scores the semantic similarity between the generated answer and ground truth.
ContextEntityRecall
Estimates context recall from estimated true positives (TP) and false negatives (FN), using the annotated answer and the retrieved context.
ContextPrecision
ContextPrecision evaluator function.
ContextRecall
ContextRecall evaluator function.
ContextRelevancy
ContextRelevancy evaluator function.
Faithfulness
Measures factual consistency of the generated answer with the given context.
LLM Evaluators
Battle
Test whether an output follows the instructions better than the original (expected) value.
ClosedQA
Test whether an output answers the input using knowledge built into the model. You can specify criteria to further constrain the answer.
Factuality
Test whether an output is factual, compared to an original (expected) value.
Humor
Test whether an output is funny.
Possible
Test whether an output is a possible solution to the challenge posed in the input.
Security
Test whether an output is malicious.
Sql
Test whether a SQL query is semantically the same as a reference (expected) query.
Summary
Test whether an output is a better summary of the input than the original (expected) value.
Translation
Test whether an output is as good a translation of the input into the specified language as an expert (expected) value.
String Evaluators
EmbeddingSimilarity
A scorer that compares two strings by the cosine similarity of their embeddings.
ExactMatch
A simple scorer that tests whether two values are equal. If the value is an object or array, it will be JSON-serialized and the strings compared for equality.
Levenshtein
A simple scorer that uses the Levenshtein distance to compare two strings.
LevenshteinScorer
LevenshteinScorer evaluator function.
JSON Evaluators
JSONDiff
Compare JSON objects for structural and content similarity. This scorer recursively compares JSON objects, handling:
- Nested dictionaries and arrays
- String similarity using Levenshtein distance (or custom scorer)
- Numeric value comparison (or custom scorer)
- Automatic parsing of JSON strings
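To make the recursive comparison concrete, here is a minimal TypeScript sketch of a JSONDiff-style score. It is not the library's implementation: the function names (jsonDiffScore, stringSimilarity, numberSimilarity) are hypothetical, and the combination rules are assumptions chosen for illustration. Objects are averaged over the union of their keys, arrays over the longer length, and missing entries score 0. Strings use normalized Levenshtein similarity, and numbers use 1 - |a - b| / (|a| + |b|).

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Normalized Levenshtein similarity: 1 - distance / max(length).
function stringSimilarity(a: string, b: string): number {
  if (a.length === 0 && b.length === 0) return 1;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return 1 - prev[b.length] / Math.max(a.length, b.length);
}

// Assumed numeric rule: 1 - |a - b| / (|a| + |b|), with 0 vs 0 scoring 1.
function numberSimilarity(a: number, b: number): number {
  if (a === 0 && b === 0) return 1;
  return 1 - Math.abs(a - b) / (Math.abs(a) + Math.abs(b));
}

function isObject(v: Json): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively score two JSON values in [0, 1].
function diffScore(a: Json, b: Json): number {
  if (typeof a === "string" && typeof b === "string") return stringSimilarity(a, b);
  if (typeof a === "number" && typeof b === "number") return numberSimilarity(a, b);
  if (Array.isArray(a) && Array.isArray(b)) {
    const n = Math.max(a.length, b.length);
    if (n === 0) return 1;
    let total = 0;
    for (let i = 0; i < Math.min(a.length, b.length); i++) total += diffScore(a[i], b[i]);
    return total / n; // elements present on only one side contribute 0
  }
  if (isObject(a) && isObject(b)) {
    const keys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)]));
    if (keys.length === 0) return 1;
    let total = 0;
    for (const k of keys) {
      const inA = Object.prototype.hasOwnProperty.call(a, k);
      const inB = Object.prototype.hasOwnProperty.call(b, k);
      if (inA && inB) total += diffScore(a[k], b[k]); // missing keys contribute 0
    }
    return total / keys.length;
  }
  return a === b ? 1 : 0; // booleans, null, or mismatched types
}

// Parse strings that contain JSON; leave everything else untouched.
function maybeParse(v: unknown): Json {
  if (typeof v === "string") {
    try {
      return JSON.parse(v) as Json;
    } catch {
      return v;
    }
  }
  return v as Json;
}

function jsonDiffScore(output: unknown, expected: unknown): number {
  return diffScore(maybeParse(output), maybeParse(expected));
}

// Example: "id" matches exactly, "name" differs by one character (score is roughly 0.83).
console.log(jsonDiffScore({ id: 7, name: "Ada" }, '{"id": 7, "name": "Ida"}'));
```

Averaging over the union of keys means a missing or extra key lowers the score rather than being ignored. A real scorer may weight keys differently or plug in the custom string and number scorers the list above mentions.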
ValidJSON
Validate whether a value is valid JSON and, optionally, whether it matches a JSON Schema. The scorer handles both string inputs and pre-parsed JSON objects, and checks that:
- The input can be parsed as valid JSON (if it’s a string)
- The parsed JSON matches the JSON Schema, if one is provided
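The check flow described above can be sketched as a 0/1 score in TypeScript. This is a hypothetical illustration, not the library's code: full JSON Schema validation needs a dedicated validator, so the check parameter below is a stand-in for one, and treating non-string primitives such as 42 as invalid is an assumption of this sketch.

```typescript
// Hypothetical stand-in for a JSON Schema validator.
type SchemaCheck = (value: unknown) => boolean;

// Returns 1 if the value is (or parses as) JSON and passes the optional check, else 0.
function validJsonScore(value: unknown, check?: SchemaCheck): number {
  let parsed: unknown;
  if (typeof value === "string") {
    try {
      parsed = JSON.parse(value); // string input: must parse as JSON
    } catch {
      return 0;
    }
  } else if (typeof value === "object" && value !== null) {
    parsed = value; // pre-parsed object or array is accepted as-is
  } else {
    return 0; // assumption: other primitives are not treated as JSON documents
  }
  if (check && !check(parsed)) return 0; // optional schema-style validation
  return 1;
}

// Usage: require an object with a "name" property.
const hasName: SchemaCheck = (v) => typeof v === "object" && v !== null && "name" in v;
console.log(validJsonScore('{"name": "Ada"}', hasName)); // 1
console.log(validJsonScore("{name: Ada}")); // 0
```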
Custom Evaluators
LLMClassifierFromSpec
LLMClassifierFromSpec evaluator function.
LLMClassifierFromSpecFile
LLMClassifierFromSpecFile evaluator function.
templateName
A union of nine string-literal types (the accepted template names).
LLMClassifierFromTemplate
LLMClassifierFromTemplate evaluator function.
OpenAIClassifier
OpenAIClassifier evaluator function.
buildClassificationTools
buildClassificationTools evaluator function.
List Evaluators
ListContains
A scorer that semantically evaluates the overlap between two lists of strings. It works by computing the pairwise similarity between each element of the output and the expected value, and then using linear sum assignment to find the best matching pairs.
Moderation
Moderation
A scorer that uses OpenAI’s moderation API to determine whether an AI response contains any flagged content.
Numeric Evaluators
NumericDiff
A simple scorer that compares numbers by normalizing their difference.
Other
computeThreadTemplateVars
Compute template variables from a thread for use in mustache templates. Uses lazy getters, so expensive computations only run when a variable is accessed. Note: thread (and other message variables) automatically render as human-readable text when used in templates such as {{thread}}, because of the smart escape function in renderMessages.
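The lazy-getter pattern mentioned above can be sketched with Object.defineProperty: each variable is computed on first access and cached afterwards. Everything here is hypothetical illustration. The function makeThreadVars, the message shape, the "role: content" text format and the message_count variable are assumptions; only the thread variable name comes from the description above.

```typescript
interface Message {
  role: string;
  content: string;
}

// Hypothetical helper: builds template variables whose values are computed
// only on first access and then cached. onCompute lets callers observe
// when a computation actually runs.
function makeThreadVars(
  messages: Message[],
  onCompute?: (name: string) => void,
): Record<string, unknown> {
  const vars: Record<string, unknown> = {};

  const defineLazy = (name: string, compute: () => unknown): void => {
    let computed = false;
    let cached: unknown;
    Object.defineProperty(vars, name, {
      enumerable: true,
      get() {
        if (!computed) {
          onCompute?.(name); // runs at most once per variable
          cached = compute();
          computed = true;
        }
        return cached;
      },
    });
  };

  // Only "thread" comes from the docs; "message_count" is a made-up second variable.
  defineLazy("thread", () => messages.map((m) => `${m.role}: ${m.content}`).join("\n"));
  defineLazy("message_count", () => messages.length);
  return vars;
}

// Usage: nothing is computed until a variable is read.
const calls: string[] = [];
const vars = makeThreadVars(
  [
    { role: "user", content: "Hi" },
    { role: "assistant", content: "Hello!" },
  ],
  (name) => calls.push(name),
);
console.log(calls.length); // 0
console.log(vars.thread); // first read triggers the computation
console.log(calls); // the thread getter has now run once
```

Caching inside the getter keeps repeated references to the same variable in one template from recomputing it.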
formatMessageArrayAsText
Format an array of LLM messages as human-readable text.
getDefaultModel
Get the configured default model, or “gpt-4o” if not set.
isLLMMessageArray
Check if a value is an array of LLM messages.
isRoleContentMessage
Check if an item looks like an LLM message (has role and content).
templateUsesThreadVariables
Check if a template string might use thread-related template variables. This is a heuristic: it looks for variable names after {{ or {% syntax.
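The message helpers and the template heuristic above are small type checks plus a text scan. The sketch below re-implements those ideas under hypothetical names (looksLikeMessage, isMessageArray, formatMessages, mayUseThreadVars) so they are not mistaken for the library's functions. The "role: content" output format and the default list of thread variable names are assumptions, not documented behavior.

```typescript
interface ChatMessage {
  role: string;
  content: unknown;
}

// Does the item look like a chat message (a non-null object with role and content)?
function looksLikeMessage(item: unknown): item is ChatMessage {
  return typeof item === "object" && item !== null && "role" in item && "content" in item;
}

// Is the value an array whose every element looks like a message?
function isMessageArray(value: unknown): value is ChatMessage[] {
  return Array.isArray(value) && value.every(looksLikeMessage);
}

// Render messages as "role: content" lines; non-string content is JSON-encoded.
function formatMessages(messages: ChatMessage[]): string {
  return messages
    .map((m) => {
      const text = typeof m.content === "string" ? m.content : JSON.stringify(m.content);
      return `${m.role}: ${text}`;
    })
    .join("\n");
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Heuristic: scan the bodies of {{ ... }} and {% ... %} tags for a whole-word
// match of any thread-related name. "thread" is the only name taken from the
// docs; callers can pass others.
function mayUseThreadVars(template: string, names: string[] = ["thread"]): boolean {
  const tagPattern = /\{\{([\s\S]*?)\}\}|\{%([\s\S]*?)%\}/g;
  const namePatterns = names.map((n) => new RegExp(`\\b${escapeRegExp(n)}\\b`));
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(template)) !== null) {
    const body = match[1] ?? match[2] ?? "";
    if (namePatterns.some((p) => p.test(body))) return true;
  }
  return false;
}

// Usage
console.log(mayUseThreadVars("Review this conversation:\n{{thread}}")); // true
console.log(mayUseThreadVars("Compare {{output}} with {{expected}}")); // false
```

Scanning tag bodies for whole-word matches is deliberately loose, so it can report false positives, for example a name that appears inside a quoted string within a tag.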