Should Spacy move to UniversalDependencies (controversy)? #13738
While this is an important observation that boils down to different perspectives (regarding "what is grammar" and "whose grammar", questions still much debated in linguistics), I think what people are actually asking for, from a practical standpoint, is a decent way of mapping from one set of (English) dependency tags to another, less language-specific set. I say this because the UD annotations are very much oriented toward cross-linguistic applications. So while spaCy's current dependency annotations for English are great for applications intended for English text, if you want to do any kind of localization or cross-linguistic transfer, it would be nice to know what the English annotations convert to in terms of Universal Dependencies. This is particularly important for leveraging parallel text data and for translation tasks, as you note, but it could be useful for other applications as well. So I think a clear set of guidelines for converting ClearNLP-style dependencies to UD-style dependencies would be quite helpful, if only as a reference and not necessarily integrated into the spaCy code itself.
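As a rough illustration of what such conversion guidelines might contain, here is a minimal Python sketch. It assumes a handful of correspondences between spaCy's English (ClearNLP-style) labels and UD v2 labels (e.g. v2's `obj` in place of `dobj`); the table is partial and illustrative, not an official mapping. Labels such as `prep`/`pobj` or `attr`/`acomp` cannot be converted by renaming alone, because UD attaches the function word (preposition, copula) under the content word, so the sketch only flags them for review.

```python
from typing import Tuple

# Partial, illustrative table: spaCy English (ClearNLP-style) label -> UD v2 label.
# Assumption: only labels whose *name* differs are listed; arcs stay where they are.
RENAMES = {
    "dobj": "obj",
    "nsubjpass": "nsubj:pass",
    "csubjpass": "csubj:pass",
    "auxpass": "aux:pass",
    "poss": "nmod:poss",
    "relcl": "acl:relcl",
    "prt": "compound:prt",
    "neg": "advmod",  # UD v2 English labels "not" advmod (plus Polarity=Neg)
    "ROOT": "root",
}

# Labels that cannot be converted by renaming: UD re-attaches the function word
# under the content word (e.g. prep + pobj -> case + obl/nmod, copula -> cop),
# so the tree itself has to be rebuilt.
RESTRUCTURE = {"prep", "pobj", "pcomp", "attr", "acomp", "agent", "dative"}

# Labels assumed to be spelled the same in both schemes
# (attachment can still differ, e.g. around copulas).
SHARED = {
    "nsubj", "csubj", "aux", "det", "amod", "advmod", "compound", "conj",
    "cc", "punct", "mark", "appos", "nummod", "xcomp", "ccomp", "advcl",
    "acl", "expl", "parataxis", "case",
}


def to_ud_label(label: str) -> Tuple[str, bool]:
    """Return (ud_label, needs_review) for a ClearNLP-style label."""
    if label in RENAMES:
        return RENAMES[label], False
    if label in SHARED:
        return label, False
    # Structural labels (RESTRUCTURE) and anything unknown are passed through
    # unchanged and flagged for manual review or tree restructuring.
    return label, True
```

For example, `[to_ud_label(d) for d in ("nsubj", "dobj", "prep")]` gives `[("nsubj", False), ("obj", False), ("prep", True)]`: the first two are label-level changes, while the third signals that the arcs themselves need to move.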
The topic of UD has been raised multiple times, e.g. in #2485,
but mostly as "How soon will spaCy switch to Universal Dependencies (UD)?" My question is different, however.
Should spaCy transition to Universal Dependencies? 🤔
So I've been comparing spaCy graphs with CoreNLP graphs for a while... I initially found that it's trivial to get to a master verb (to check whether it's negated, its tense, etc.) from some matched token in spaCy, and not so much in CoreNLP. Then I got an uneasy feeling that the new approach would be harder and less performant to work with, at least for my tasks. And then I found this rabbit hole of UD vs DG (dependency grammar), a polarising topic amongst linguists.
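To make the "trivial in spaCy" point concrete, here is a minimal pure-Python sketch of that head walk on a hand-built, spaCy-style parse of "She did not like the movie". The `Tok` type, `master_verb`, `is_negated` and the tree are hypothetical stand-ins for spaCy's `token.head`, `token.dep_` and `token.pos_`; as in spaCy, the root token is its own head.

```python
from typing import List, NamedTuple, Optional


class Tok(NamedTuple):
    text: str
    pos: str   # coarse POS, like spaCy's token.pos_
    dep: str   # relation label, like spaCy's token.dep_
    head: int  # index of the head token; the root points to itself, as in spaCy


# Hand-built ClearNLP-style parse of "She did not like the movie" (not model output).
SENT = [
    Tok("She", "PRON", "nsubj", 3),
    Tok("did", "AUX", "aux", 3),
    Tok("not", "PART", "neg", 3),
    Tok("like", "VERB", "ROOT", 3),
    Tok("the", "DET", "det", 5),
    Tok("movie", "NOUN", "dobj", 3),
]


def master_verb(sent: List[Tok], i: int) -> Optional[int]:
    """Follow head links from token i to the first VERB/AUX (token i itself included)."""
    while sent[i].pos not in ("VERB", "AUX"):
        if sent[i].head == i:  # reached the root without meeting a verb
            return None
        i = sent[i].head
    return i


def is_negated(sent: List[Tok], verb: int) -> bool:
    """True if some other token attaches to `verb` with the 'neg' label."""
    return any(j != verb and t.head == verb and t.dep == "neg" for j, t in enumerate(sent))
```

From "movie" (index 5), one hop reaches "like", and its `neg` child answers the negation question directly. With real spaCy tokens the loop would follow `token.head` and stop when `token.head.i == token.i`.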
For non-specialists, to simplify: UD puts semantics over grammar and DG puts grammar over semantics. Imagine a Python parser favoring semantics over syntax... Sounds disturbing.
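One visible consequence of that difference is where function words attach. The sketch below hand-writes the arcs for "She sat on the mat" the way they would typically be drawn in spaCy's ClearNLP-style scheme (`prep`/`pobj`, the preposition heads the noun) and in UD v2 (`obl`/`case`, the noun heads the preposition). The trees and the `path_to_root` helper are illustrative assumptions, not parser output.

```python
from typing import Dict, List, Optional, Tuple

# word -> (relation, head word); the root's head is None. Hand-written arcs.
Arcs = Dict[str, Tuple[str, Optional[str]]]

# Function word as head: "on" governs "mat" (ClearNLP style).
CLEARNLP: Arcs = {
    "She": ("nsubj", "sat"),
    "sat": ("ROOT", None),
    "on": ("prep", "sat"),
    "the": ("det", "mat"),
    "mat": ("pobj", "on"),
}

# Content word as head: "mat" governs "on" (UD v2 style).
UD: Arcs = {
    "She": ("nsubj", "sat"),
    "sat": ("root", None),
    "on": ("case", "mat"),
    "the": ("det", "mat"),
    "mat": ("obl", "sat"),
}


def path_to_root(arcs: Arcs, word: str) -> List[str]:
    """List the words visited when following head links from `word` to the root."""
    path = [word]
    while arcs[path[-1]][1] is not None:
        path.append(arcs[path[-1]][1])
    return path
```

Walking up from "mat" gives `["mat", "on", "sat"]` in the first tree and `["mat", "sat"]` in the second, so any code that relies on a particular path shape behaves differently once the scheme changes.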
Here's an authoritative paper with solid counter-arguments against UD:
The status of function words in dependency grammar: A critique of Universal Dependencies
Most of all, I'm concerned that this topic is discussed casually in other threads, as if it's not a big deal, just a matter of some corpus refactoring 😨 I'm not a linguist, but my engineering experience is enough to see that UD is a huge breaking change. It also revisits foundations of linguistics that have held since, I dunno, 1980, for benefits mostly focused on language translation. Undoubtedly an important topic, but not all of linguistics and NLP boils down to that.
Migrating to spaCy v4, should it be UD-based, might be very hard for larger systems. I imagine a lot of graph-traversal algorithms would have to be revisited and replaced. One potential solution would be to support both approaches in parallel, but I'm not sure whether the amount of work that requires is tolerable.
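One way to picture "support both approaches in parallel" is an adapter that answers the same question for either scheme. The minimal sketch below finds the tense-bearing word for any token of "She is not happy." parsed both ways. The scheme names, the `Tok` type, the per-scheme tables and `tense_bearer` are hypothetical; the UD branch relies on the UD convention that an auxiliary or copula attaches as an `aux`/`cop` child of the predicate, and the whole sketch assumes a single-clause sentence.

```python
from typing import List, NamedTuple


class Tok(NamedTuple):
    text: str
    pos: str   # coarse POS tag
    dep: str   # relation label in the given scheme
    head: int  # index of the head token; the root points to itself


# "She is not happy ." parsed both ways (hand-built, not parser output).
CLEARNLP = [
    Tok("She", "PRON", "nsubj", 1), Tok("is", "AUX", "ROOT", 1),
    Tok("not", "PART", "neg", 1), Tok("happy", "ADJ", "acomp", 1),
    Tok(".", "PUNCT", "punct", 1),
]
UD = [
    Tok("She", "PRON", "nsubj", 3), Tok("is", "AUX", "cop", 3),
    Tok("not", "PART", "advmod", 3), Tok("happy", "ADJ", "root", 3),
    Tok(".", "PUNCT", "punct", 3),
]

# Per-scheme settings (hypothetical): which POS tags stop the upward climb,
# and which child relations carry tense.
CLAUSE_HEAD_POS = {"clearnlp": ("VERB", "AUX"), "ud": ("VERB",)}
HELPER_DEPS = {"clearnlp": ("aux", "auxpass"), "ud": ("aux", "aux:pass", "cop")}


def tense_bearer(sent: List[Tok], i: int, scheme: str) -> int:
    """Index of the word carrying tense in token i's clause ("clearnlp" or "ud")."""
    # 1. Climb to the clause head. ClearNLP lets a copula/auxiliary head the
    #    clause, so VERB or AUX stops the climb; in UD only a full VERB (or the
    #    root) does, because the content word is the head.
    while sent[i].pos not in CLAUSE_HEAD_POS[scheme] and sent[i].head != i:
        i = sent[i].head
    # 2. Prefer the leftmost auxiliary (or, in UD, copula) attached to that head.
    helpers = [
        j for j, t in enumerate(sent)
        if j != i and t.head == i and t.dep in HELPER_DEPS[scheme]
    ]
    return helpers[0] if helpers else i
```

Both schemes land on "is" from any starting token, but they get there differently: straight up the ClearNLP tree, versus up to "happy" and then down to its `cop` child in the UD tree. Sentences with nested clauses would need an extra stop condition at clausal relations.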
More resources:
Assessing Theoretical and Practical Issues of Universal Dependencies
UD are fundamentally flawed
As an outsider, I can't speak to trends. Maybe UD is clearly winning hearts and minds, and it's already been decided as of 2025.
But for me, at the moment, it doesn't look like that. It seems to be primarily pushed by Google and Stanford University.