ToolCallAccuracy returns score higher than 1.0 #2079

Open
@mpanasiuk-suplari

  • I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
If multiple tool calls match the reference call, the returned score can be higher than 1.0.

This can happen in various scenarios, e.g.:

  • A tool call returned an error message, so the agent retried it with the same args. As a result, multiple tool call entries are recorded in the sample.
  • The number of calls to a given tool is left to the agent (one, two, or more), and ToolCallAccuracy is used only to ensure that at least one call was made (see the code example below).
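
To make the arithmetic of the first scenario concrete, here is a minimal sketch in plain Python. It assumes, without having been checked against the Ragas source, that a score is formed by adding one point for every recorded call that matches a reference call and dividing by the number of reference calls. `naive_score` and the tuple representation of a call are hypothetical and exist only for illustration.

```python
def naive_score(predicted, reference):
    """Hypothetical scoring rule: every matching recorded call adds a point."""
    matches = sum(1 for ref in reference for pred in predicted if pred == ref)
    return matches / len(reference)


# One reference call, represented as a (tool name, location) tuple.
reference = [("weather_check", "Shanghai")]

# The agent made the same call three times (for example, two retries after an error).
predicted = [("weather_check", "Shanghai")] * 3

score = naive_score(predicted, reference)
print(score)  # 3 matches / 1 reference call
```

Under that assumed rule, each retry inflates the numerator while the denominator stays fixed, so the result is no longer bounded by 1.0.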

Ragas version: 0.2.15
Python version: 3.13.2

Code to Reproduce

    import asyncio

    from ragas.dataset_schema import MultiTurnSample
    from ragas.messages import AIMessage, HumanMessage, ToolCall
    from ragas.metrics import ToolCallAccuracy

    # The agent calls the same tool three times, each with a different "time" arg.
    messages = [
        HumanMessage(content="What's the weather in Shanghai at different time of the day tomorrow?"),
        AIMessage(content="...", tool_calls=[
            ToolCall(name="weather_check", args={"location": "Shanghai", "time": "9:00"}),
            ToolCall(name="weather_check", args={"location": "Shanghai", "time": "13:00"}),
            ToolCall(name="weather_check", args={"location": "Shanghai", "time": "21:00"}),
        ]),
    ]

    # The reference only requires a weather_check call for Shanghai.
    sample = MultiTurnSample(
        user_input=messages,
        reference_tool_calls=[
            ToolCall(name="weather_check", args={"location": "Shanghai"})
        ]
    )

    scorer = ToolCallAccuracy()
    result = asyncio.run(scorer.multi_turn_ascore(sample))

    assert 0.0 <= result <= 1.0, "Score should be between 0.0 and 1.0"
    # Assertion fails because the result is 3.0

Expected behavior
ToolCallAccuracy should return 1.0 when one or more recorded tool calls match the reference tool call.

According to the documentation, the score must never be above 1.0:

"Score can range from 0 to 1 with 1 being the best."
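
The expected behavior can be sketched in plain Python as follows. This is not the Ragas implementation: `call_matches`, `capped_score`, and the dict representation of a tool call are hypothetical, and the matching rule (same tool name, and every reference arg present with the same value) is an assumption taken from the reproduction case, where the reference omits `time`.

```python
def call_matches(pred, ref):
    """Assumed rule: same tool name, and every reference arg appears unchanged."""
    return pred["name"] == ref["name"] and all(
        pred["args"].get(key) == value for key, value in ref["args"].items()
    )


def capped_score(predicted, reference):
    """Fraction of reference calls matched by at least one recorded call."""
    if not reference:
        return 0.0
    hits = sum(
        1 for ref in reference if any(call_matches(pred, ref) for pred in predicted)
    )
    return hits / len(reference)


reference = [{"name": "weather_check", "args": {"location": "Shanghai"}}]
predicted = [
    {"name": "weather_check", "args": {"location": "Shanghai", "time": t}}
    for t in ("9:00", "13:00", "21:00")
]

score = capped_score(predicted, reference)
print(score)
```

Because each reference call contributes at most one hit, the result stays within [0, 1] no matter how many times the agent repeats a call.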

Labels: bug (Something isn't working), module-metrics (this is part of metrics module)