
Feature: improve relationship builders for better async and reduced memory utilization #2077


Open: ahgraber wants to merge 4 commits into main


ahgraber
Contributor

  • `CosineSimilarityBuilder` now computes similarities in shards/chunks, which significantly reduces its memory requirements.
  • `CosineSimilarityBuilder` and `JaccardSimilarityBuilder` now use `generate_execution_plan` to support async iteration over comparison tasks, laying groundwork for future multithreading or improved concurrency.
  • Added unit tests.
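The PR description does not include the implementation, so the following is only a minimal pure-Python sketch of the general chunked (blockwise) similarity technique. It assumes embeddings are plain lists of floats and that the builder keeps pairs whose cosine similarity meets a threshold. `chunked_cosine_pairs`, `_unit`, and the `chunk_size` default are hypothetical names chosen for illustration, not Ragas API.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def _unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors stay all-zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else [0.0] * len(vec)


def chunked_cosine_pairs(
    embeddings: Sequence[Sequence[float]],
    threshold: float,
    chunk_size: int = 256,  # hypothetical default, not taken from the PR
) -> Iterator[Tuple[int, int, float]]:
    """Yield (i, j, similarity) for every i < j with similarity >= threshold.

    Scores are computed one chunk_size x chunk_size block at a time, so the
    scores held in memory at once are O(chunk_size**2) rather than O(N**2).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    unit = [_unit(v) for v in embeddings]
    n = len(unit)
    for r0 in range(0, n, chunk_size):
        rows = range(r0, min(r0 + chunk_size, n))
        # Column blocks start at the current row block, so blocks lying
        # entirely below the diagonal (j < i, symmetric duplicates) are skipped.
        for c0 in range(r0, n, chunk_size):
            cols = range(c0, min(c0 + chunk_size, n))
            # Materialize only this block; the dot product of unit vectors
            # is their cosine similarity.
            block = [
                [sum(a * b for a, b in zip(unit[i], unit[j])) for j in cols]
                for i in rows
            ]
            for bi, i in enumerate(rows):
                for bj, j in enumerate(cols):
                    if j > i and block[bi][bj] >= threshold:
                        yield i, j, block[bi][bj]
```

Because results are yielded block by block, the caller decides whether to collect them; the only full-size structure is the normalized copy of the embeddings, never an N×N score matrix.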

ahgraber added 4 commits June 12, 2025 15:17
…methods

- Refactored the JaccardSimilarityBuilder to use async methods for finding similar embedding pairs.
- Introduced a new method `generate_execution_plan` that generates coroutines for the pairwise comparisons, for better tracking and potential concurrency.
- Updated the `transform` method to utilize the new async functionality.
- Added comprehensive test coverage for the new features in the JaccardSimilarityBuilder.
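As a rough illustration of the execution-plan pattern these commits describe, here is a minimal sketch assuming the plan is a generator yielding one comparison coroutine per pair of token sets. The signatures of `generate_execution_plan`, `compare`, `jaccard`, and `find_similar_pairs` below are hypothetical and are not the ones in the PR.

```python
import asyncio
from typing import Any, Coroutine, Hashable, Iterator, List, Optional, Sequence, Set, Tuple

Pair = Tuple[int, int, float]


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as dissimilar (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


async def compare(i: int, j: int, a: Set[Hashable], b: Set[Hashable],
                  threshold: float) -> Optional[Pair]:
    """One comparison task: return the pair if it clears the threshold."""
    sim = jaccard(a, b)
    return (i, j, sim) if sim >= threshold else None


def generate_execution_plan(
    items: Sequence[Set[Hashable]], threshold: float
) -> Iterator[Coroutine[Any, Any, Optional[Pair]]]:
    """Lazily yield one coroutine per unordered pair (i < j)."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            yield compare(i, j, items[i], items[j], threshold)


async def find_similar_pairs(items: Sequence[Set[Hashable]], threshold: float) -> List[Pair]:
    """Consume the plan one coroutine at a time and keep the matches."""
    matches: List[Pair] = []
    for task in generate_execution_plan(items, threshold):
        result = await task
        if result is not None:
            matches.append(result)
    return matches
```

Keeping the plan a lazy generator separates *what* to compare from *how* it runs: the same coroutines could instead be passed to `asyncio.gather` or a bounded runner (or wrapped for a thread/process executor, since these comparisons are CPU-bound) without changing how the plan is produced.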
dosubot bot added the size:XXL label (this PR changes 1000+ lines, ignoring generated files) on Jun 13, 2025