What is RAGAS?
RAGAS is a framework that provides metrics and LLM-generated test data for evaluating the performance of your Retrieval-Augmented Generation (RAG) pipeline.
A RAG pipeline consists of two components:
Retrieval Component - Retrieves additional context from an external database for the LLM to answer the query.
Generator Component - Generates an answer based on a prompt augmented with the retrieved information.
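To make the two components concrete, here is a purely illustrative sketch; vector_store, llm, and their methods are hypothetical placeholders for whatever retriever and generator you use, not RAGAS APIs.

```python
# Illustrative only: a hypothetical two-step RAG pipeline.
def rag_pipeline(question: str, vector_store, llm, k: int = 4) -> dict:
    # Retrieval component: fetch the k most relevant chunks for the query.
    contexts = vector_store.similarity_search(question, k=k)

    # Generator component: answer from a prompt augmented with the retrieved chunks.
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(contexts) +
        f"\n\nQuestion: {question}"
    )
    answer = llm.invoke(prompt)

    # Return everything RAGAS will later need for evaluation.
    return {"question": question, "answer": answer, "contexts": contexts}
```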
RAGAS provides metrics to evaluate both components of a RAG pipeline.
Data Required:
To evaluate the RAG pipeline, RAGAS expects the following information:
Question: The user query that is the input to the RAG pipeline (“the input”).
Answer: The answer generated by the RAG pipeline (“the output”).
Contexts: The contexts retrieved from the external knowledge source used to answer the question.
Ground_truths: The ground truth answer to the question. This is the only human-annotated information.
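A minimal sketch of assembling these four fields for RAGAS, assuming the 0.1.x-style column names and the Hugging Face datasets package (newer releases use EvaluationDataset with slightly different field names):

```python
from datasets import Dataset  # Hugging Face datasets

# One row per evaluated question; "contexts" is a list of retrieved chunks per row.
eval_rows = {
    "question": ["What does RAGAS evaluate?"],
    "answer": ["RAGAS evaluates both the retriever and the generator of a RAG pipeline."],
    "contexts": [[
        "RAGAS is a framework with metrics to evaluate Retrieval-Augmented Generation pipelines.",
    ]],
    # Depending on the RAGAS version, this column is "ground_truth" (a string)
    # or "ground_truths" (a list of strings).
    "ground_truth": ["RAGAS evaluates the retrieval and generation components of a RAG pipeline."],
}

dataset = Dataset.from_dict(eval_rows)
```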
RAGAS Metrics:
Context Precision: Context Precision evaluates whether the ground-truth-relevant items in the retrieved contexts are ranked at the top. Ideally, all relevant chunks should appear at the highest ranks; the metric scores the rank positions of the relevant chunks.
Context Recall: Context recall measures how well the retrieved context covers the annotated ground truth answer. It is calculated by checking whether each claim in the ground truth answer can be attributed to the retrieved context.
Answer Relevancy: Answer relevancy assesses how pertinent the generated answer is to the given prompt. Answers that are incomplete or contain redundant information receive lower scores, while higher scores indicate better relevancy.
Faithfulness: Faithfulness measures the factual consistency of the generated answer against the retrieved context. It is calculated from the answer and the retrieved context, and it flags hallucinations: claims in the answer that cannot be supported by the context.
Answer Similarity: This metric measures the semantic similarity between the generated answer and the ground truth answer.
There are 12 defined metrics in RAGAS.
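The five metrics described above ship as ready-made objects in ragas.metrics; the import names below follow the ragas 0.1.x API and may differ in newer releases:

```python
# Metric objects shipped with RAGAS (0.1.x-style import paths).
from ragas.metrics import (
    answer_relevancy,
    answer_similarity,
    context_precision,
    context_recall,
    faithfulness,
)

metrics = [context_precision, context_recall, answer_relevancy, faithfulness, answer_similarity]
```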
How to set it up?
Requirements:
1. Evaluator LLM
2. An Embedding model
3. Data
Source: ragas/docs/howtos/customizations/customize_models.md (explodinggradients/ragas on GitHub)
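A minimal sketch of wiring up the three requirements, assuming ragas 0.1.x with the LangChain wrappers described in customize_models.md; the OpenAI model names are placeholders for whichever evaluator LLM and embedding model you use:

```python
# pip install ragas langchain-openai datasets
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

# 1. Evaluator LLM: the judge model that scores answers and contexts.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

# 2. Embedding model: used by similarity-based metrics such as answer_similarity.
evaluator_embeddings = LangchainEmbeddingsWrapper(
    OpenAIEmbeddings(model="text-embedding-3-small")
)

# 3. Data: the question / answer / contexts / ground_truth dataset built earlier.
```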
Evaluation Code: Define the metrics according to your use case
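A sketch of running the evaluation, assuming the dataset, metric list, and wrapped models from the snippets above and the ragas 0.1.x evaluate() signature:

```python
from ragas import evaluate

result = evaluate(
    dataset=dataset,                  # question / answer / contexts / ground_truth rows
    metrics=metrics,                  # the metric objects chosen for your use case
    llm=evaluator_llm,                # evaluator LLM used by the LLM-judged metrics
    embeddings=evaluator_embeddings,  # embedding model used by similarity-based metrics
)

print(result)            # aggregate score per metric
df = result.to_pandas()  # per-sample scores for error analysis
```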
Use-case:
For our use case, we need to:
1. Check how our retriever is performing.
2. Check how similar the LLM answer is to the reference.
A metric selection covering both checks is sketched after the source note below.
Source: RAGAS Evaluation.xlsx
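A minimal metric selection for this use case (an illustrative sketch, not taken from the spreadsheet); run it with evaluate() as shown above:

```python
from ragas.metrics import answer_similarity, context_precision, context_recall

use_case_metrics = [
    context_precision,  # 1. Retriever: are relevant chunks ranked at the top?
    context_recall,     # 1. Retriever: does the retrieved context cover the ground truth?
    answer_similarity,  # 2. Generator: how similar is the LLM answer to the reference?
]
```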