Support templates in our string evaluators #60

Open
wants to merge 2 commits into
base: main

Conversation

maraisr
Member

@maraisr maraisr commented Jun 7, 2025

Much like our web prompt tooling, the models CLI should support templated variables in string evaluators.

Before

Running test case 1/1...
  ✗ FAILED
    Model Response: Goodbye! Take care and see you next time! 🌍👋
    ✗ string evaluator (score: 0.00)
      Expected to contain: '{{expected}}' ⬅️⬅️⬅️⬅️
    ✓ similarity check (score: 0.25)
      LLM evaluation matched choice: '2'

After

Running test case 1/1...
  ✗ FAILED
    Model Response: Hello there! How can I assist you today? 😊
    ✓ string evaluator (score: 1.00)
      Expected to contain: 'hello' ⬅️⬅️⬅️⬅️
    ✗ similarity check (score: 0.00)
      LLM evaluation matched choice: '1'

@Copilot Copilot AI review requested due to automatic review settings June 7, 2025 06:32
@maraisr maraisr requested a review from a team as a code owner June 7, 2025 06:32
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Support templated variables in string evaluators for the CLI, allowing test cases to inject dynamic values into string checks.

  • Introduces string fields in example prompts and switches evaluator contains to use {{string}}
  • Updates runStringEvaluator signature to accept a testCase map and applies templateString in each comparison
  • Adjusts tests to pass an empty testCase into runStringEvaluator
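
The change summarized above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `renderTemplate` is a hypothetical stand-in for `templateString`, `evalString` for `runStringEvaluator`, and the `stringCriteria` shape (including `StartsWith`) is assumed. The case-insensitive comparison is also an assumption, inferred from the "After" output, where `'hello'` matches `Hello there!`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches {{name}}, allowing whitespace inside the braces.
var placeholder = regexp.MustCompile(`\{\{\s*(\w+)\s*\}\}`)

// renderTemplate is a hypothetical stand-in for the PR's templateString.
// It replaces each {{key}} with the test case value and returns an error
// if a key is missing (the error behavior is an assumption).
func renderTemplate(tmpl string, vars map[string]interface{}) (string, error) {
	var firstErr error
	out := placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		key := placeholder.FindStringSubmatch(m)[1]
		v, ok := vars[key]
		if !ok {
			if firstErr == nil {
				firstErr = fmt.Errorf("template variable %q not found", key)
			}
			return m
		}
		return fmt.Sprint(v)
	})
	if firstErr != nil {
		return "", firstErr
	}
	return out, nil
}

// stringCriteria is an assumed shape; only "contains" appears in the PR text.
type stringCriteria struct {
	Contains   string
	StartsWith string
}

// evalString templates every non-empty criterion before comparing it with
// the response. Looping over the criteria keeps the error check in one place.
func evalString(c stringCriteria, testCase map[string]interface{}, response string) (bool, error) {
	checks := []struct {
		name    string
		pattern string
		match   func(s, sub string) bool
	}{
		{"contains", c.Contains, strings.Contains},
		{"startsWith", c.StartsWith, strings.HasPrefix},
	}
	for _, chk := range checks {
		if chk.pattern == "" {
			continue
		}
		want, err := renderTemplate(chk.pattern, testCase)
		if err != nil {
			return false, fmt.Errorf("failed to template %s: %w", chk.name, err)
		}
		if !chk.match(strings.ToLower(response), strings.ToLower(want)) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	testCase := map[string]interface{}{"expected": "hello"}
	ok, err := evalString(stringCriteria{Contains: "{{expected}}"}, testCase,
		"Hello there! How can I assist you today?")
	fmt.Println(ok, err) // prints: true <nil>
}
```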

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| examples/sample_prompt.yml | Added string keys under testData and updated evaluator contains |
| cmd/eval/eval_test.go | Changed runStringEvaluator calls to include the new testCase param |
| cmd/eval/eval.go | Refactored runStringEvaluator to template all string criteria |
Comments suppressed due to low confidence (2)

examples/sample_prompt.yml:9

  • [nitpick] The key string in testData is very generic and may be confused with the evaluator’s string block. Consider renaming it to something more descriptive like templateVar or placeholder.
    string: hello

cmd/eval/eval_test.go:132

  • There are no tests covering the new templating functionality or error paths when a placeholder is missing. Consider adding tests that include {{string}} substitutions and missing-key scenarios to validate templateString behavior.
result, err := handler.runStringEvaluator("test", tt.evaluator, map[string]interface{}{}, tt.response)
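
The cases this suppressed comment asks for could look something like the table below. It is a sketch only: `renderTemplate` is a hypothetical stand-in for `templateString`, and treating a missing key as an error is an assumption about the intended behavior. It is written as a plain program so it runs on its own; in `eval_test.go` each row would more naturally become a `t.Run` subtest.

```go
package main

import (
	"fmt"
	"regexp"
)

var placeholder = regexp.MustCompile(`\{\{\s*(\w+)\s*\}\}`)

// renderTemplate is a hypothetical stand-in for templateString; it errors
// when a placeholder has no matching key (assumed behavior).
func renderTemplate(tmpl string, vars map[string]interface{}) (string, error) {
	var firstErr error
	out := placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		key := placeholder.FindStringSubmatch(m)[1]
		v, ok := vars[key]
		if !ok {
			if firstErr == nil {
				firstErr = fmt.Errorf("template variable %q not found", key)
			}
			return m
		}
		return fmt.Sprint(v)
	})
	if firstErr != nil {
		return "", firstErr
	}
	return out, nil
}

// templateCase describes one templating scenario and its expected outcome.
type templateCase struct {
	name    string
	tmpl    string
	vars    map[string]interface{}
	want    string
	wantErr bool
}

// defaultCases covers substitution, pass-through, and the missing-key path.
func defaultCases() []templateCase {
	return []templateCase{
		{"substitutes key", "{{string}}", map[string]interface{}{"string": "hello"}, "hello", false},
		{"tolerates inner spaces", "{{ string }}", map[string]interface{}{"string": "hello"}, "hello", false},
		{"no placeholder", "plain text", map[string]interface{}{}, "plain text", false},
		{"non-string value", "n={{count}}", map[string]interface{}{"count": 42}, "n=42", false},
		{"missing key", "{{string}}", map[string]interface{}{}, "", true},
	}
}

// runCases returns one message per failing case; an empty slice means all passed.
func runCases(cases []templateCase) []string {
	var failures []string
	for _, c := range cases {
		got, err := renderTemplate(c.tmpl, c.vars)
		if (err != nil) != c.wantErr {
			failures = append(failures, fmt.Sprintf("%s: unexpected error state: %v", c.name, err))
			continue
		}
		if !c.wantErr && got != c.want {
			failures = append(failures, fmt.Sprintf("%s: got %q, want %q", c.name, got, c.want))
		}
	}
	return failures
}

func main() {
	failures := runCases(defaultCases())
	for _, f := range failures {
		fmt.Println("FAIL:", f)
	}
	fmt.Printf("%d failing case(s)\n", len(failures))
}
```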

@maraisr maraisr requested a review from sgoedecke June 7, 2025 06:37
Comment on lines +376 to +378
if err != nil {
	return EvaluationResult{}, fmt.Errorf("failed to template message content: %w", err)
}
Member Author

Yeah I do agree with Copilot in that this looks a little gnarly. But with the way that Go wants to handle errors we kinda need this everywhere.

That is, unless we create a helper method here that, should there be an error, just falls back to the provided string. But open to suggestions/opinions.
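
For discussion, the fallback helper floated here might look roughly like this. All names are hypothetical: `renderTemplate` stands in for `templateString`, and `templateOrFallback` is the proposed wrapper that swallows the error.

```go
package main

import (
	"fmt"
	"regexp"
)

var placeholder = regexp.MustCompile(`\{\{\s*(\w+)\s*\}\}`)

// renderTemplate is a hypothetical stand-in for templateString; it errors
// when a placeholder has no matching key (assumed behavior).
func renderTemplate(tmpl string, vars map[string]interface{}) (string, error) {
	var firstErr error
	out := placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		key := placeholder.FindStringSubmatch(m)[1]
		v, ok := vars[key]
		if !ok {
			if firstErr == nil {
				firstErr = fmt.Errorf("template variable %q not found", key)
			}
			return m
		}
		return fmt.Sprint(v)
	})
	if firstErr != nil {
		return "", firstErr
	}
	return out, nil
}

// templateOrFallback returns the rendered string, or the input unchanged
// when templating fails, so call sites need no error branch.
func templateOrFallback(tmpl string, vars map[string]interface{}) string {
	out, err := renderTemplate(tmpl, vars)
	if err != nil {
		return tmpl
	}
	return out
}

func main() {
	vars := map[string]interface{}{"expected": "hello"}
	fmt.Println(templateOrFallback("{{expected}}", vars))                     // prints: hello
	fmt.Println(templateOrFallback("{{expected}}", map[string]interface{}{})) // prints: {{expected}}
}
```

The trade-off is that a missing or misspelled variable no longer surfaces as an error. The evaluator would compare against the literal `{{expected}}`, which is the same symptom shown in the "Before" output, so a typo in test data would show up as a failed check rather than a templating failure.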
