Should Spacy move to UniversalDependencies (controversy)? #13738

ivan-kleshnin · 2025-01-30T07:10:22Z

The topic of UD has been raised multiple times, e.g. in #2485,
but mostly as "How soon will Spacy switch to Universal Dependencies (UD)?" My question is different, however.

Should Spacy transition to Universal Dependencies? 🤔

So I've been comparing Spacy graphs with CoreNLP graphs for a while... I initially found that it's trivial to get to the main verb (to check whether it's negated, its tense, etc.) from a matched token in Spacy, and not so much in CoreNLP. Then I got the general impression that the new approach would be harder and less performant to work with, at least for my tasks. And then I found this rabbit hole of UD vs. DG (dependency grammar), a polarising topic amongst linguists.
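To make the traversal difference concrete, here is a minimal sketch in plain Python, with no parser involved. The two trees for "He was in the house" are hand-annotated to mirror the two conventions (function-word heads in the ClearNLP/Spacy style, content-word heads in the UD style); they are not actual Spacy or CoreNLP output, and `Tok`, `governing_verb` and `verbal_children` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

VERBAL = {"VERB", "AUX"}


@dataclass
class Tok:
    text: str
    pos: str   # coarse part-of-speech tag
    head: int  # index of the head token; the root points to itself
    dep: str   # dependency label


def governing_verb(tokens, i):
    """Walk up the head chain from token i and return the first verbal ancestor."""
    while tokens[i].head != i:  # stop once the root is reached
        i = tokens[i].head
        if tokens[i].pos in VERBAL:
            return tokens[i].text
    return None


def verbal_children(tokens, i):
    """Return verbal tokens attached directly below token i."""
    return [t.text for j, t in enumerate(tokens)
            if t.head == i and j != i and t.pos in VERBAL]


# Hand-annotated, function-word-headed analysis (ClearNLP/Spacy style).
spacy_style = [
    Tok("He", "PRON", 1, "nsubj"),
    Tok("was", "AUX", 1, "ROOT"),
    Tok("in", "ADP", 1, "prep"),
    Tok("the", "DET", 4, "det"),
    Tok("house", "NOUN", 2, "pobj"),
]

# Hand-annotated, content-word-headed analysis (UD style).
ud_style = [
    Tok("He", "PRON", 4, "nsubj"),
    Tok("was", "AUX", 4, "cop"),
    Tok("in", "ADP", 4, "case"),
    Tok("the", "DET", 4, "det"),
    Tok("house", "NOUN", 4, "root"),
]

print(governing_verb(spacy_style, 4))  # walks house -> in -> was
print(governing_verb(ud_style, 4))     # "house" is the root, nothing above it
print(verbal_children(ud_style, 4))    # the copula hangs below the noun
```

In the first tree a single upward walk from "house" reaches "was"; in the second, "house" is itself the root, so the same code finds nothing and has to look at the children instead.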

For non-specialists, to simplify: UD puts semantics over grammar and DG puts grammar over semantics. Imagine a Python parser favoring semantics over syntax... Sounds disturbing.

Here's an authoritative paper with solid arguments against UD:

The status of function words in dependency grammar: A critique of Universal Dependencies

The desire to subordinate function words to content words imposes a binary classification on all words; a given word is classified either as a function word or a content word. This is problematic, since the distinction between function and content word is not black and white. The distinction is, rather, more accurately captured in terms of a continuum, whereby prototypical function words and content words appear at opposite ends of the continuum, non-prototypical cases appearing somewhere on the continuum in-between.

Most of all, I'm concerned that this topic is casually discussed in other threads, like it's not a big deal, just a matter of some corpus refactoring 😨 I'm not a linguist, but my engineering experience is enough to see that UD is a huge breaking change. It also revisits the foundations of linguistics since, I dunno, 1980, for benefits mostly focused on language translation. Undoubtedly an important topic, but not all of linguistics and NLP boils down to that.

A migration to Spacy V4, should it be UD-based, might be very hard for larger systems. I imagine a lot of graph-traversal algorithms would have to be revisited and replaced. One potential solution would be to support both approaches in parallel, but I'm not sure whether the amount of work for that is tolerable.

More resources:

Universal Dependencies are hard to parse – or are they?

Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing.

As an outsider, I can't speak for trends. Maybe UD is clearly winning people over, so the matter is already decided in 2025.
But to me, at the moment, it doesn't look like that. It seems to be primarily pushed by Google and Stanford University.

lingdoc · 2025-05-23T02:57:53Z

While this is an important observation that boils down to different perspectives (regarding "what is grammar" and "whose grammar", still questions under much debate in linguistics), I think what people are actually asking for, from a practical standpoint, is a decent way of mapping from one set of (English) dependency tags to another, less language-specific set.

I say this because the UD annotations are very much oriented toward cross-linguistic applications. So while the current SpaCy dependency annotations for English are great for applications intended for English text, if you want to do any kind of localization or cross-linguistic transfer, it would be nice to know what the English annotations convert to in terms of Universal Dependencies. This is particularly important for leveraging parallel text data, and translation tasks, as you note, but could be useful for other applications as well.

So I think a clear set of guidelines for conversion from ClearNLP-style dependencies to UD-style dependencies would be quite helpful, if only as a reference and not necessarily integrated into the SpaCy code itself.

1 reply

lingdoc May 23, 2025

As an example, here is a mapping that I made for my particular use-case between SpaCy and UD. The label set is taken from the English parser in SpaCy's en_core_web_sm 3.8.0: https://github.com/explosion/spacy-models/blob/master/meta/en_core_web_sm-3.8.0.json

I've added comments that include the references from ClearNLP for particular dependencies that didn't have clear mappings in UD: https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md

You will notice that in some cases there are specific choices you have to make about the mappings, based on the ClearNLP definitions, and some things (like prepositions/adpositions and negation markers) are simply not identified in the UD annotations (see https://universaldependencies.org/u/dep/index.html). So if you're modifying this for your own use, you might need to make other (language-specific) adjustments. I've noted the places where I modified the mapping using the [DIFF] tag.

repldict = {
            "ROOT": "root",
            "acl": "acl", # clausal modifier of noun
            "acomp": "amod", # adjectival complement [DIFF]
            "advcl": "advcl", # adverbial clause modifier
            "advmod": "advmod", # adverbial modifier
            "agent": "obl:agent", # agent in passive clause
            "amod": "amod", # adjectival modifier
            "appos": "appos", # appositional modifier
            "attr": "nmod", # attribute (noun phrase that is a non-VP (verb phrase) predicate usually following a copula verb) [DIFF]
            "aux": "aux", # auxiliary
            "auxpass": "aux:pass", # passive auxiliary [DIFF]
            "case": "case", # case marker
            "cc": "cc", # coordinating conjunction
            "ccomp": "ccomp", # clausal complement
            "compound": "compound", # compound modifier
            "conj": "conj", # conjunct
            "csubj": "csubj", # clausal subject
            "csubjpass": "csubj:pass", # passive clausal subject
            "dative": "obl", # dative (a nominal or prepositional object of dative-shifting verb); (oblique?) [DIFF]
            "dep": "dep", # unclassified dependent
            "det": "det", # determiner
            "dobj": "obj", # direct object [DIFF]
            "expl": "expl", # expletive
            "intj": "discourse", # interjection [DIFF]
            "mark": "mark", # marker
            "meta": "dislocated", # meta modifier (extraneous, random info or annotation?) [DIFF]
            "neg": "advmod", # negation modifier [DIFF]
            "nmod": "nmod", # nominal modifier (`nounmod`)
            "npadvmod": "nmod", # noun phrase as adverbial modifier (`npmod`) [DIFF]
            "nsubj": "nsubj", # nominal subject
            "nsubjpass": "nsubj:pass", # passive nominal subject [DIFF]
            "nummod": "nummod", # number modifier
            "oprd": "nmod", # object predicate (a non-VP predicate in a small clause that functions like the predicate of an object) [DIFF]
            "parataxis": "parataxis", # parataxis
            "pcomp": "nmod", # complement of preposition (a noun phrase that modifies the head of a prepositional phrase, which is usually a preposition but can be a verb in a participial form such as VBG) [DIFF]
            "pobj": "nmod", # object of preposition (any dependent that is not a pobj but modifies the head of a prepositional phrase) [DIFF]
            "poss": "nmod:poss", # possession modifier [DIFF]
            "preconj": "conj", # pre-correlative conjunction (the first part of a correlative conjunction that becomes a dependent of the first conjunct in coordination) [DIFF]
            "predet": "det", # pre-determiner [DIFF]
            "prep": "det", # prepositional modifier [DIFF]
            "prt": "dep", # particle (a preposition in a phrasal verb that forms a verb-particle construction) [DIFF]
            "punct": "punct", # punctuation
            "quantmod": "nummod", # modifier of quantifier [DIFF]
            "relcl": "nmod", # relative clause modifier (either relative clause or a reduced relative clause that modifies the head of an NML|NP|WHNP) [DIFF]
            "xcomp": "xcomp" # open clausal complement
            }
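
A minimal sketch of how a table like this could be applied, assuming the dependency labels have already been extracted as plain strings. The excerpt `CLEARNLP_TO_UD`, the helper `to_ud_labels` and the `fallback` parameter are hypothetical names for illustration; falling back to UD's generic `dep` label for unknown input is an assumption of this sketch, not part of the mapping above.

```python
# Small excerpt in the style of the mapping above, so the example runs on its own.
CLEARNLP_TO_UD = {
    "ROOT": "root",
    "nsubj": "nsubj",
    "dobj": "obj",
    "nsubjpass": "nsubj:pass",
    "auxpass": "aux:pass",
    "neg": "advmod",
}


def to_ud_labels(labels, mapping, fallback="dep"):
    """Relabel a sequence of dependency labels; unknown labels get `fallback`."""
    return [mapping.get(label, fallback) for label in labels]


# "foo" stands in for any label that is missing from the table.
labels = ["nsubjpass", "auxpass", "neg", "ROOT", "foo"]
print(to_ud_labels(labels, CLEARNLP_TO_UD))
# -> ['nsubj:pass', 'aux:pass', 'advmod', 'root', 'dep']
```

Note that this only renames labels. It does not re-attach tokens, so constructions where the two schemes pick different heads (for example a preposition heading its object versus the object heading a `case`-marked adposition) still keep the original tree shape.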
