Description
Our parsing of Zulip message content HTML is designed to be precise about what it expects, and explicit about anything it doesn't understand. This means that when we encounter content that does something we don't support, our code generally knows that, rather than silently plowing ahead with a wrong interpretation.
This is helpful because, among other things, it means that if we take a corpus of Zulip message content, we can run our parser on it to learn about constructs that exist in the wild that we haven't yet implemented. (This includes constructs that a current Zulip server will generate, and constructs that older servers generated and that consequently still exist in old messages.)
So we should do that, as an iterative process:
- Run on some set of messages; find things that are unimplemented; file issues; fix them, or at least those that are most common, until the number of unsupported messages is small.
- Then run on a larger set of messages, or one with more very old messages, to find more unimplemented things.
- Repeat until it's difficult to find a message with something we don't know about, and we're comfortable with whatever set of known unimplemented features remains.
Some specific likely steps in that process:
- Write a script that can fetch a bunch of messages in a loop, feed them through `parseContent`, and report any `UnimplementedNode` results (as well as any crashes, which should be fixed immediately).
- Run that script on some recent public messages (i.e., messages to public streams) on chat.zulip.org.
- Run it on all public messages on chat.zulip.org.
- Run it on all public messages in realms that publicly list themselves as open communities. For example, have the script sign up a test user in each realm and log in as that test user.
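The shape of such a script could be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: `fetch_page` and `find_unimplemented` are hypothetical callables. `fetch_page` is assumed to wrap the server's `GET /messages` endpoint (its `anchor` and `num_before` parameters and its `messages` and `found_oldest` response fields), and `find_unimplemented` is assumed to stand in for running `parseContent` on one message and returning a label for each `UnimplementedNode` found.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SurveyReport:
    total: int = 0
    # Label of unimplemented construct -> number of messages containing it.
    unimplemented: Counter = field(default_factory=Counter)
    # Label -> ID of the first message seen with it, for filing issues.
    examples: dict = field(default_factory=dict)
    # (message ID, repr of exception) for each message that crashed the parser.
    crashes: list = field(default_factory=list)


def iter_messages(fetch_page, batch_size=1000):
    """Yield messages newest first, paging backward through history.

    `fetch_page(anchor, num_before)` is a hypothetical callable assumed to
    return a dict with "messages" (dicts with "id" and "content") and
    "found_oldest" (bool), like the response of `GET /messages`.
    """
    anchor = "newest"
    oldest_seen = None
    while True:
        page = fetch_page(anchor, batch_size)
        # The anchor message may be returned again; keep only older ones.
        fresh = [m for m in page["messages"]
                 if oldest_seen is None or m["id"] < oldest_seen]
        yield from sorted(fresh, key=lambda m: m["id"], reverse=True)
        if page["found_oldest"] or not fresh:
            return
        oldest_seen = min(m["id"] for m in fresh)
        anchor = oldest_seen


def survey(messages, find_unimplemented):
    """Run the (hypothetical) parser wrapper over messages and tally results."""
    report = SurveyReport()
    for message in messages:
        report.total += 1
        try:
            labels = set(find_unimplemented(message["content"]))
        except Exception as exc:  # a parser crash; record it and keep going
            report.crashes.append((message["id"], repr(exc)))
            continue
        for label in labels:
            report.unimplemented[label] += 1
            report.examples.setdefault(label, message["id"])
    return report
```

Calling `survey(iter_messages(fetch_page), find_unimplemented)` and then reading `report.unimplemented.most_common()` would give a frequency-ordered list for deciding which issues to file first, with `report.crashes` listing the messages to investigate immediately.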