Skip to content

Add Source Code Typing Tool #15211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from
Draft

Conversation

Vici37
Copy link
Contributor

@Vici37 Vici37 commented Nov 21, 2024

This PR introduces a new crystal tool apply-types subcommand as described in this crystal forum post and initially implemented in this other repo as a stand alone tool. A tool that fully types a crystal file is obviously very tightly coupled with the language and compiler semantic, and so having this tool be part of the compiler itself feels like the correct direction.

That being said, I'm not 100% it's ready yet, but I do think it's ready enough at least to solicit feedback from anyone who is willing to provide it :)

OUTPUT
end

# it "parses, runs semantic, and types everything" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting out this spec for now - it runs prelude as well as try to type everything the test specific file (but that test file isn't part of this PR yet).

# logic is in `crystal/tools/formatter.cr`.

class Crystal::Command
private def typer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entrypoint for the apply-types tool

@@ -0,0 +1,526 @@
module Crystal
class SourceTyper
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the source-typer repo, a lot of these classes / structs are broken into their own files. Precedence in the compiler seems to be to put a lot of these smaller bits into a single file if they can all be more or less self contained.

@crysbot
Copy link
Collaborator

crysbot commented Nov 21, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/exploring-the-compiler/7343/21

@nobodywasishere
Copy link
Contributor

This is awesome!

@@ -39,6 +39,7 @@ class Crystal::Command
Usage: crystal tool [tool] [switches] [program file] [--] [arguments]

Tool:
apply-types add type restrictions to all untyped defs and def arguments
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name suggestions welcome! Another option I thought of is apply-restrictions as that feels more accurate, but also longer to type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bake-types may also work, apply-restrictions is also pretty good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about typify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

@crysbot
Copy link
Collaborator

crysbot commented Jan 23, 2025

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/rfc-variable-declaration-syntax-using/7623/35

Implement typer writing (inefficiently)

Implement formatter, support multiple filename inputs

Partial support for types, still need modules

Update to latest source typer changes

Add code comments

Comment out broken spec for now

Uncomment and fix final spec, add semantic / progress_tracker flags

Remove focus: true (oops)

Good ol' print debugging for windows CI failure

Reimplement def visitor def locator matching for windows

Back to print debugging

Fix and support windows drive letters for root folders
@@ -0,0 +1,580 @@
require "./spec_helper"

def run_source_typer_spec(input, expected_output,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specs are written with a before (INPUT) -> after (OUTPUT) pattern, which should hopefully showcase all of the use cases covered by the tool. All of the specs but the last one skip the prelude run for performance.

rets = {} of String => String

@warnings.each do |warning|
puts "WARNING: #{warning}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warnings are cases where type restrictions weren't added because it crossed a line with the language syntax somehow (i.e. empty splat calls)

# And now infer types of everything
semantic_node = program.semantic nodes, cleanup: true

# We might run semantic later in an attempt to resolve defaults, don't display those stats or progress
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per this comment, a method can be defined like:

def my_method(arg = "str")

In this case, semantic may run on the "str" (or whatever expression) in an attempt to figure out what it is. It's occasional that methods like these always get called with an argument that isn't the same type as the default, and then the compiler complains if the restriction ends up like:

def my_method(arg : Int32 = "str")

ret
end

private def def_overrides_parent_def(type) : Hash(String, String)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method scans all defs in type and figures out the locations of all methods that override / are overridden between type and its ancestor's definitions. These methods should not have their types restricted in any additional way. This method is required as there's no syntatic way to indicate one method overrides another.

end

# Generates a map of (parsed) Def#location => Signature for that Def
private def init_signatures(accepted_defs : Hash(String, Crystal::Def)) : Hash(String, Signature)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workhorse method that generates the hash of location strings => signature for the def at that location

end
end

def flatten_types(types : Array(Crystal::Type)) : Array(Crystal::Type)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used to flatten unions into a single array, so dupe types can be deduplicated

end

def type_name(type : Crystal::Type) : String
type.to_s.gsub(/:Module$/, ".class").gsub("+", "")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an interesting discovery: metatypes of classes can be created using the .class, but the syntax for creating a metatype of a module (also .class) isn't the same string representation that the compiler uses. The more you know

# if there's a signature for a given def and those type restrictions are missing.
#
# All methods present are copy / paste from the original Crystal::Formatter for the given `visit` methods
class SourceTyperFormatter < Crystal::Formatter
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done typing, onto formatting!

# A visitor for defs, oddly enough.
#
# Walk through the AST and capture all references to Defs that match a def_locator
class DefVisitor < Crystal::Visitor
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefVisitor, a crystal visitor for.... visiting defs. Used for capturing a list of all passed through defs that match a def_locator used when running the CLI command.

@Vici37 Vici37 marked this pull request as ready for review January 24, 2025 05:51
@straight-shoota
Copy link
Member

Thanks your for this submission. And sorry for the long wait. It's a bit of a challenge to be confronted with such a big PR.

I think this tool can be a great enhancement to the developer experience. However, before going into the details of integrating it into the compiler, I think we need to learn a bit more about it. And maybe discuss some higher level design of this feature.
Perhaps this could eventually manifest in an RFC.
For this reason we usually ask to start an issue first before submitting a PR.

Your experience with the pre-existing external tool is of course a great start, thouhg. And having this PR with a working integration as a compiler tool helps as well. So I think we're pretty good already.

Still, let's take a step back from the concrete code submission and start a discussion about this feature proposal.
The code here is still useful as a prototype, of course.

I have some thoughts/questions to spark the discussion (noting here so I don't forget).

I don't expect them to be answered here. I think we should let this PR sit as is and treat it as a draft for now.

@straight-shoota straight-shoota marked this pull request as draft February 5, 2025 19:09
@Vici37
Copy link
Contributor Author

Vici37 commented Feb 9, 2025

Thanks for the response @straight-shoota ! Probably the politest "No" I've received 😁 Totally fair, too, large PRs are always a big challenge, even when you are familiar with the changes themselves.

How's the code status between this PR and https://github.com/Vici37/cr-source-typer? I suppose the code here is newer than the separate repo? Should one use the repo or this branch to build this for a test drive?

This change is definitely ahead of the cr-source-typer. Due to the direct dependency of the crystal source code, I found it easier to iterate directly in compiler, especially after I found myself reproducing / mimicking things in the compiler directly (like trying to find the CRYSTAL_PATH value during compilation). I still stand by my post here in that all changes can be rolled back into the cr-source-typer code. Probably.

What are experiences using this tool productively on a real code base?

This tool is pretty infrequently used, but very valuable when it is (or when I do). I describe the experience here, where I'll start new projects being pretty lax with types, and then find I need to do the chore of filling those back in. Interestingly, a lot of the larger projects I ran this tool on yielded no changes, so I suspect that's a semi-frequent experience of others.

#14696 is a similar use case which we might want to take into consideration as well.

Yup, that was brought up here as well :) This PR probably has 90% of what would be needed for that. Supporting that JSON output could be a flag on this tool, and I almost did that, but needed to draw the line somewhere.

Are you aware of oprypin/crystal@argtohtml...annotator~#diff-bc28f7dec1 which was used for #10575 ?

Yup! I remember looking at that, at least. The meat is this part, which had some good insights into where the typed methods are and how to crawl the typing system (similar mechanism in this PR here).

Some practical issue: How can we integrate the formatter portion without duplicating implementations from Crystal::Formatter?

More granular format writing methods in the Crystal::Formatter would have allowed me to only overwrite the method that (say) printed out the def return restriction and (say) the restriction on an arg, but that would be a larger refactoring job for sure.

I don't expect them to be answered here. I think we should let this PR sit as is and treat it as a draft for now.

Happy to disappoint on that first point. Up to you / the crystal core team on what you want to do with this PR. If requests for changes are made, I'm happy to make them. I can also try breaking this single PR up into multiple smaller ones, if that's desired.

@straight-shoota
Copy link
Member

straight-shoota commented Feb 9, 2025

I'd see it more as a "not quite yet".
My main concern is that when we integrate this officially into the compiler we make some commitment to support and maintain it. Before that, we have more freedom to experiment and maybe adjust some behaviour or the command line interface.

Taking it for a test drive and sharing experiences would help solidify the tool. Maybe we can give it a go to add type annotations in stdlib (cf. #15334) and maybe some shards?

A particular question apart from the pure technical possibility is how to deal with individual methods for which full typing may not even desired, for various reasons. That's briefly mentioned in #15334.

For example, it might be impossible to cover all possible types in a single compilation run (e.g. mutual incompatibility between shards), so the compiler would only ever know about a subsection of possibilities.
Sometimes inferred types can be quite complex and it might be preferable to keep that complexity out of the source code for better readability. Especially when the details are not very relevant to a particular method.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Feb 9, 2025

We're also talking about extracting tools from the main executable. Maybe we can consider this tool as such (external) for its development phase, then maybe eventually distribute it with the compiler when it gets stabilized? Just like we distribute shards for example.

@Vici37
Copy link
Contributor Author

Vici37 commented Feb 10, 2025

My main concern is that when we integrate this officially into the compiler we make some commitment to support and maintain it.

100%, I want to be very cognizant of that for sure. You absolutely should not accept this PR until you're fully comfortable with what this PR is introducing (you're already aware of this, but stating it so you know I'm aware of it it too).

Maybe we can give it a go to add type annotations in stdlib (cf. #15334) and maybe some shards?

That is an excellent test! I had been taking it for a whirl through some other largish repos (lucky, procodile, my own set of shards) and found very little changes being suggested there, due to these larger repos already having typed everything. For crystal itself, I had been focusing primarily on the prelude portion of the code rather than the stdlib. This brings me to your second point...

A particular question apart from the pure technical possibility is how to deal with individual methods for which full typing may not even desired

Absolutely, I've since discovered many of these types of cases. For ongoing lack-of-typing, a linter is better suited for it (it can't tell you what it should be, only that it's missing) and could be configured to ignore the portions where that makes sense. This tool provides value so a dev can see exactly what the compiler thinks the types should be, and can choose to accept or deny those at their discretion (hence my own comment about using this tool infrequently).

We're also talking about extracting tools from the main executable. Maybe we can consider this tool as such (external) for its development phase, then maybe eventually distribute it with the compiler when it gets stabilized?

That would be amazing! Decoupling the compiler executable from the suite of tools it also enables / ships with should have a good side effect of decoupling portions of the compiler source code itself. At the very least, if going down this path, I'd nominate ameba as an additional tool distributed with the compiler before this one 🙂 Though maybe with a bit more relaxed default configuration for some of the rules 😅

As an aside, for the longevity of a standalone cr-source-typer tool itself, I do think two changes would need to be made to the compiler itself:

  • Decoupling llvm requirement from the semantic layer (the issue I mentioned here is still present, I got around it by installing LLVM 18 locally, but that wouldn't be guaranteed installed for all users of this tool)
  • Having a crystal native way to get the equivalent of crystal env properties in the program itself. I've had to resort to running crystal env and then parsing the output as a string here. Would be better of there were some constants / pseudo constants that exposed these directly instead. This was needed to discover where the prelude source is for the semantic pass. On the other hand, if these are already present, my research failed me and I'd greatly appreciate those being pointed out to me.

@crysbot
Copy link
Collaborator

crysbot commented Feb 10, 2025

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/exploring-the-compiler/7343/26

@ysbaddaden
Copy link
Contributor

Please open an issue about decoupling LLVM from the semantic passes. I have a pure crystal DataLayout / ABI that should help (but it's not limited to that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants