# Fix typos and clean style #2042

Open · wants to merge 1 commit into base: main

16 changes: 8 additions & 8 deletions docs/getstarted/evals.md
@@ -11,7 +11,7 @@ In this guide, you will evaluate a **text summarization pipeline**. The goal is

### Evaluating using a Non-LLM Metric

-Here is a simple example that uses `BleuScore` score to score summary
+Here is a simple example that uses `BleuScore` to score a summary:

```python
from ragas import SingleTurnSample
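from ragas.metrics import BleuScore

# NOTE: the remainder of this example is collapsed in the diff view above.
# The lines below are an editor's sketch of a typical completion, not the
# original file's exact contents; the sample texts are illustrative.
test_data = {
    "user_input": "Summarise the quarterly report.",
    "response": "The company grew revenue by 8% in the second quarter.",
    "reference": "Revenue increased 8% in Q2.",
}

metric = BleuScore()
sample = SingleTurnSample(**test_data)
print(metric.single_turn_score(sample))
```
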
@@ -40,9 +40,9 @@ Here we used:

As you may observe, this approach has two key limitations:

-- **Time-Consuming Preparation:** Evaluating the application requires preparing the expected output (`reference`) for each input, which can be both time-consuming and challenging.
+- **Time-consuming preparation:** Evaluating the application requires preparing the expected output (`reference`) for each input, which can be both time-consuming and challenging.

-- **Inaccurate Scoring:** Even though the `response` and `reference` are similar, the output score was low. This is a known limitation of non-LLM metrics like `BleuScore`.
+- **Inaccurate scoring:** Even though the `response` and `reference` are similar, the output score was low. This is a known limitation of non-LLM metrics like `BleuScore`.


!!! info
@@ -51,7 +51,7 @@ As you may observe, this approach has two key limitations:
To address these issues, let's try an LLM-based metric.


-### Evaluating using a LLM based Metric
+### Evaluating using a LLM-based Metric


**Choose your LLM**
@@ -62,7 +62,7 @@ choose_evaluator_llm.md
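
The `choose_evaluator_llm.md` snippet is pulled in from another file and is not part of this diff. As a rough sketch only (the LangChain/OpenAI wiring and model name below are assumptions, not content from this PR), setting up an evaluator LLM for `ragas` might look like:

```python
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

# Wrap a LangChain chat model so ragas metrics can call it as the judge.
# Assumes OPENAI_API_KEY is set; the model choice is illustrative.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
```
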
**Evaluation**


-Here we will use [AspectCritic](../concepts/metrics/available_metrics/aspect_critic.md), which an LLM based metric that outputs pass/fail given the evaluation criteria.
+Here we will use [AspectCritic](../concepts/metrics/available_metrics/aspect_critic.md), which is an LLM-based metric that outputs pass/fail given the evaluation criteria.


```python
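# NOTE: the body of this example is collapsed in the diff view above. The lines
# below are an editor's sketch assuming the `evaluator_llm` wrapper from the
# previous step; the sample texts are illustrative, not from the original file.
from ragas import SingleTurnSample
from ragas.metrics import AspectCritic

sample = SingleTurnSample(
    user_input="Summarise the quarterly report.",
    response="The company grew revenue by 8% in the second quarter.",
)

metric = AspectCritic(
    name="summary_accuracy",
    llm=evaluator_llm,
    definition="Verify if the summary accurately reflects the input text.",
)
print(metric.single_turn_score(sample))
```
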
@@ -88,7 +88,7 @@ Output
Success! Here 1 means pass and 0 means fail

!!! info
-There are many other types of metrics that are available in ragas (with and without `reference`), and you may also create your own metrics if none of those fits your case. To explore this more checkout [more on metrics](../concepts/metrics/index.md).
+There are many other types of metrics that are available in `ragas` (with and without `reference`), and you may also create your own metrics if none of those fits your case. To explore this more checkout [more on metrics](../concepts/metrics/index.md).
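
As an aside (an illustrative criterion, not part of the original guide), the same `AspectCritic` class can express other pass/fail checks simply by changing the `definition`:

```python
from ragas.metrics import AspectCritic

# Hypothetical second criterion, reusing the same evaluator LLM as above.
conciseness = AspectCritic(
    name="summary_conciseness",
    llm=evaluator_llm,
    definition="Verify if the summary is concise and free of repetition.",
)
```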

### Evaluating on a Dataset
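
The dataset-construction and evaluation code for this section is collapsed in the diff. A minimal sketch of what this step usually looks like (the `test_samples` list and the reuse of `metric` are assumptions for illustration):

```python
from ragas import EvaluationDataset, evaluate

# Build an evaluation dataset from a list of dicts with "user_input" and
# "response" fields; `test_samples` is a hypothetical placeholder here.
eval_dataset = EvaluationDataset.from_list(test_samples)

# Score every sample with the AspectCritic metric defined earlier.
results = evaluate(eval_dataset, metrics=[metric])
print(results)
```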

@@ -148,7 +148,7 @@ Output
```
{'summary_accuracy': 0.84}
```

-This score shows that out of all the samples in our test data, only 84% of summaries passes the given evaluation criteria. Now, **It
+This score shows that out of all the samples in our test data, only 84% of summaries passes the given evaluation criteria. Now, **it
s important to see why is this the case**.

Export the sample level scores to pandas dataframe
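
The export snippet itself is collapsed in the diff; with the `results` object from the evaluation above this is typically a one-liner (a sketch, assuming that object and the metric name `summary_accuracy` are in scope):

```python
# Per-sample scores as a pandas DataFrame, useful for spotting failures.
df = results.to_pandas()
failed = df[df["summary_accuracy"] == 0]
print(failed.head())
```
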
@@ -187,4 +187,4 @@ If you want help with improving and scaling up your AI application using evals.

## Up Next

-- [Evaluate a simple RAG application](rag_eval.md)
+- [Evaluate a simple RAG application](rag_eval.md)