ValueError: [E949] Unable to align tokens for the predicted and reference docs. #12932
-
Hi! Sorry to hear you've been having issues with this, let's look into it in more detail. You didn't include the full stack trace, and there are two code paths from which this error can be raised, so it's hard to say exactly where it originates. The tokenizer function that you created defines how the words/characters are split into tokens, but some sort of alignment still needs to happen when you're training. In spaCy terminology, an `Example` holds two `Doc` objects: the reference doc with the gold-standard annotations, and the predicted doc produced by the pipeline. The E949 error is raised when the tokens of those two docs can't be aligned, which only works when both texts are the same apart from whitespace. So what seems to happen with your WordLevel tokenizer is that the tokens it produces no longer add up to the original input text, and the alignment fails. So the main question is this: does your custom tokenizer actually change the underlying text (other than whitespace)?
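One quick way to answer that question yourself, as a minimal sketch: ignoring whitespace, the concatenation of the tokens must equal the original text, which is essentially the condition behind E949 (spaCy's real alignment is more forgiving, e.g. about capitalization, but this captures the idea). The vocab and tokens below are made-up examples, not your actual tokenizer's output:

```python
def preserves_text(text, tokens):
    """Return True if the token sequence can be aligned back to the
    original text: ignoring whitespace, the concatenated tokens must
    equal the input. This mirrors the condition behind E949."""
    return "".join(tokens).replace(" ", "") == text.replace(" ", "")

text = "Apple is looking at buying a startup"

# A plain whitespace split trivially preserves the text.
assert preserves_text(text, text.split())

# A WordLevel tokenizer with a fixed vocab replaces out-of-vocabulary
# words with an unknown token, which *does* change the underlying text.
vocab = {"Apple", "is", "looking", "at", "buying", "a"}
wordlevel_tokens = [w if w in vocab else "[UNK]" for w in text.split()]
assert not preserves_text(text, wordlevel_tokens)  # "startup" became "[UNK]"
```

This is also consistent with BPE working fine for you: BPE falls back to smaller subword pieces instead of an `[UNK]` token, so the original text is always recoverable from the tokens.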
-
Hi! I referred to spaCy's custom tokenization docs here: https://spacy.io/usage/linguistic-features#custom-tokenizer-training
and tried using a custom-trained tokenizer in my NER project.
Here is my functions.py file:
and in my config.cfg:
I trained different tokenizers. The BPE one worked without any hiccups, but training with the WordLevel tokenizer fails:
It seems that spaCy is not using my custom tokenizer for prediction. Or is it an issue with an additional alignment step I have to include in the config?
I used https://huggingface.co/docs/tokenizers/quicktour to train my custom tokenizers.
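For reference, the wiring pattern from the linked spaCy docs: the tokenizer factory registered in `functions.py` is referenced from `config.cfg` by its registry name, with its arguments filled in from the config. The names below (`my_wordlevel_tokenizer`, `tokenizer_file`) are hypothetical placeholders, not the actual names from this project:

```
[nlp.tokenizer]
# Registry name given to the factory in functions.py (hypothetical)
@tokenizers = "my_wordlevel_tokenizer"
# Argument passed to that factory, e.g. a saved tokenizers JSON file (hypothetical)
tokenizer_file = "tokenizer.json"
```

Note that this only controls which tokenizer produces the *predicted* doc at training and inference time; the reference doc still comes from your annotated data, which is why the two texts must match for alignment to succeed.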