Add Formatron framework #5

Open: adrianeboyd wants to merge 3 commits into main

adrianeboyd (Contributor) commented Sep 21, 2024

Summary by Sourcery

Add the FormatronFramework to the project, enabling new tasks like multilabel classification and synthetic data generation with specific model configurations. Update the configuration file to include settings for the new framework.

New Features:

  • Introduce the FormatronFramework to support tasks such as multilabel classification and synthetic data generation using the 'unsloth/llama-3-8b-Instruct-bnb-4bit' model.

Enhancements:

  • Add configuration for the FormatronFramework in the config.yaml file, specifying tasks, model details, and parameters.

sourcery-ai bot commented Sep 21, 2024

Reviewer's Guide by Sourcery

This pull request adds support for Formatron, a library for constraining LLM output to a specified format, as a new framework in the project. The changes add Formatron settings to config.yaml and implement the FormatronFramework class in a new file, frameworks/formatron_framework.py.

File-Level Changes

Change: Added configuration for the Formatron framework
File: config.yaml
Details:
  • Configured tasks for multilabel classification and synthetic data generation
  • Set up parameters such as n_runs, prompt, LLM model, and other initialization arguments
  • Commented out configuration for NER task

Change: Implemented FormatronFramework class
File: frameworks/formatron_framework.py
Details:
  • Created a new class that inherits from BaseFramework
  • Implemented initialization method with model loading and configuration
  • Added support for different tasks (multilabel classification and others)
  • Implemented run method for executing experiments
  • Integrated with Formatron library for formatting and processing
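
As a rough illustration of the class shape described above, here is a minimal sketch. It is not the code from this pull request: the BaseFramework stand-in, the FormatronFrameworkSketch name, the generate callable, and the returned keys are all hypothetical, and no Formatron or model-loading calls are made. It only shows how a run method might repeat a constrained generation n_runs times while recording latency and success.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class BaseFramework:
    """Hypothetical stand-in for the project's BaseFramework."""

    def __init__(self, task: str, prompt: str, llm_model: str) -> None:
        self.task = task
        self.prompt = prompt
        self.llm_model = llm_model


class FormatronFrameworkSketch(BaseFramework):
    """Illustrative only: wraps any callable that returns structured output."""

    def __init__(self, generate: Callable[[str], Optional[dict]], **kwargs: Any) -> None:
        super().__init__(**kwargs)
        # In a real implementation this callable would wrap the loaded model
        # and the format-constrained decoder; here it is injected (assumption).
        self.generate = generate

    def run(self, n_runs: int, inputs: Dict[str, str]) -> Dict[str, Any]:
        prompt = self.prompt.format(**inputs)
        responses: List[Optional[dict]] = []
        latencies: List[float] = []
        for _ in range(n_runs):
            start = time.perf_counter()
            try:
                response = self.generate(prompt)
            except Exception:
                # A failed generation counts as an unsuccessful run.
                response = None
            latencies.append(time.perf_counter() - start)
            responses.append(response)
        n_ok = sum(r is not None for r in responses)
        return {
            "responses": responses,
            "latencies": latencies,
            # Fraction of runs that produced a usable response.
            "percent_successful": n_ok / n_runs if n_runs else 0.0,
        }
```

Injecting the generation step as a callable keeps the sketch runnable without a GPU or model download; the actual class presumably loads the model in its initializer, as the summary above states.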

@sourcery-ai sourcery-ai bot left a comment

Hey @adrianeboyd - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider removing the commented-out code for the 'ner_required_fields' task if it's not being used. This will improve code cleanliness and readability.
  • The fallback to a regex-based approach for multilabel classification due to issues with Formatron might be worth investigating further. Consider looking into why Formatron isn't handling this case well and potentially contributing a fix upstream.
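
To make the regex-based approach mentioned above concrete, here is a minimal sketch of constraining a multilabel answer to a JSON list drawn from a fixed label set. The function names and the example labels are hypothetical and are not taken from this pull request; the pattern is built with the standard library only and could, in principle, be handed to any decoder that accepts a regular expression.

```python
import json
import re
from typing import List, Optional, Sequence


def build_label_list_pattern(labels: Sequence[str]) -> str:
    """Return a regex matching a non-empty JSON list of the allowed labels."""
    if not labels:
        raise ValueError("at least one label is required")
    # Each item is one allowed label, written as a JSON string literal.
    item = "(?:" + "|".join(re.escape(json.dumps(label)) for label in labels) + ")"
    # One item, followed by zero or more ", item" continuations.
    return rf"\[{item}(?:, {item})*\]"


def parse_labels(text: str, labels: Sequence[str]) -> Optional[List[str]]:
    """Parse model output, returning None when it does not match the pattern."""
    pattern = build_label_list_pattern(labels)
    if re.fullmatch(pattern, text.strip()) is None:
        return None
    return json.loads(text)
```

For example, with the labels "sports", "politics", and "tech", the string `["sports", "tech"]` parses to a two-item list, while `["cooking"]` is rejected. The pattern does not forbid repeated labels; deduplication would need a separate step.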

Here's what I looked at during the review:
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


adrianeboyd (Contributor, Author) commented

Some example results (1 run instead of 10, on an RTX A5000):

  • multilabel classification

    Framework   Reliability   Latency_p95 (s)
    Outlines           1.00             1.804
    Formatron          0.99            13.710

  • ner required fields

    Framework          Reliability   Latency_p95 (s)   micro_precision   micro_recall   micro_f1
    Outlines                  1.00            31.033          0.656250       0.546243   0.596215
    Formatron                 0.99            16.950          0.762590       0.614493   0.680578
    LMFormatEnforcer          0.98            45.598          0.648464       0.562130   0.602219

  • synthetic data generation

    Framework   Reliability   Latency_p95 (s)   Variety
    Formatron           1.0             4.761       0.6
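
For readers who want to sanity-check figures like these, the sketch below shows one common way to compute micro-averaged precision, recall, and F1 over sets of predicted items, plus a 95th-percentile latency. It is an assumption that the benchmark computes its metrics this way (in particular, the linear-interpolation percentile is a guess); the function names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> Tuple[float, float, float]:
    """Micro-average: pool true positives and counts across all examples."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1_from_pr(precision, recall)


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)
```

The reported NER numbers are consistent with the harmonic-mean definition: applying f1_from_pr to the Formatron precision and recall above (0.762590 and 0.614493) gives approximately 0.680578.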
