Skip to content

Add NER model variant with required fields #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

adrianeboyd
Copy link
Contributor

@adrianeboyd adrianeboyd commented Sep 21, 2024

In order to have an NER model that is simpler for internal regex/CFG representations, add an NER variant that requires all fields and does not include a default value.

In particular, this makes it possible to evaluate a version of NER for Outlines and provides an additional point of comparison for other libraries.

Summary by Sourcery

Add a new NER model variant 'ner_required_fields' to support simpler internal representations by requiring all fields without default values. Update configuration and experiment framework to accommodate this new task.

New Features:

  • Introduce a new NER model variant called 'ner_required_fields' that requires all fields and does not include default values, facilitating simpler internal regex/CFG representations.

Enhancements:

  • Update the configuration to include the 'ner_required_fields' task with specific parameters for model initialization and execution.
  • Modify the experiment framework to support the 'ner_required_fields' task, allowing it to be executed alongside existing tasks.

Copy link

sourcery-ai bot commented Sep 21, 2024

Reviewer's Guide by Sourcery

This pull request introduces a new NER (Named Entity Recognition) model variant with required fields. The changes primarily affect the configuration, base framework, data models, and main execution file. The new variant aims to simplify internal regex/CFG representations and provide an additional point of comparison for other libraries.

File-Level Changes

Change Details Files
Added a new NER task variant with required fields
  • Introduced 'ner_required_fields' task in the configuration
  • Added initialization parameters for the new task
  • Created a new data model function 'ner_required_fields_model'
config.yaml
frameworks/base.py
data_sources/data_models.py
Updated task lists and conditional logic to include the new NER variant
  • Added 'ner_required_fields' to the list of allowed tasks
  • Modified conditional statements to handle the new task type
frameworks/base.py
main.py
Adjusted metric calculation to support the new NER variant
  • Updated the condition for NER metric calculation to include 'ner_required_fields'
  • Modified the NER micro metrics calculation to support the new task
frameworks/base.py
main.py

Tips
  • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
  • Continue your discussion with Sourcery by replying directly to review comments.
  • You can change your review settings at any time by accessing your dashboard:
    • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
    • Change the review language;
  • You can always contact us if you have any questions or feedback.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @adrianeboyd - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

@adrianeboyd
Copy link
Contributor Author

I have no attachment to the name NERRequiredFields, you can treat it as a placeholder.

Some example results (just 1 run instead of 10, on an RTX A5000, I'm not sure what's going on with the LMFormatEnforcer latency):

                  Reliability
Outlines                 1.00
Formatron                0.99
LMFormatEnforcer         0.98

                  Latency_p95(s)
Formatron                 16.950
Outlines                  31.033
LMFormatEnforcer          45.598

          framework  micro_precision  micro_recall  micro_f1
0          Outlines         0.656250      0.546243  0.596215
1         Formatron         0.762590      0.614493  0.680578
2  LMFormatEnforcer         0.648464      0.562130  0.602219

The LMFormatEnforcer F1 is a lot higher than for the current NER (and my initial expectation was that required fields version would lower the performance).

In order to have an NER model that is simpler for internal regex/CFG
representations, add an NER variant that requires all fields and does
not include a default value.

In particular, this makes it possible to evaluate a version of NER for
Outlines and provides an additional point of comparison for other
libraries.
@adrianeboyd adrianeboyd force-pushed the add-ner-required-fields branch from d8ae434 to e5a0db6 Compare September 22, 2024 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant