GH-46403: [C++] Add support for limiting element size when printing data #46536

Open
wants to merge 11 commits into
base: main
from

Conversation

david1437
Contributor

@david1437 david1437 commented May 21, 2025

Rationale for this change

#46403

What changes are included in this PR?

A new PrettyPrinter option is added to limit elements to 100 characters by default.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes. Printed elements are now truncated to 100 characters by default, so users who rely on the ToString output of an array with large elements may see a different result.
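
To make the user-facing change concrete, here is a minimal Python sketch of per-element truncation. The helper name `truncate_element` and the exact wording of the omission marker are assumptions for illustration only, not the actual Arrow implementation; the parameter name follows the `element_size_limit` option discussed later in this review.

```python
def truncate_element(text: str, element_size_limit: int = 100) -> str:
    """Hypothetical helper: keep at most `element_size_limit` characters.

    The omission marker format below is an assumption for illustration,
    not necessarily what Arrow prints.
    """
    if element_size_limit < 0:
        raise ValueError("element_size_limit must be non-negative")
    if len(text) <= element_size_limit:
        # Short elements are printed unchanged.
        return text
    omitted = len(text) - element_size_limit
    return f"{text[:element_size_limit]} (... {omitted} chars omitted)"


# A short element is untouched; a 250-character element is cut to 100.
print(truncate_element("abc"))
print(truncate_element("x" * 250))
```

Anyone comparing printed output against stored strings would see a difference only for elements longer than the limit.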

⚠️ GitHub issue #46403 has been automatically assigned in GitHub to PR creator.

Member

@pitrou pitrou left a comment

Thanks for this @david1437 . This looks good in general, some comments below.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels May 22, 2025
Member

@pitrou pitrou left a comment

There is a failure in the Python test suite (see CI results). Can you update the test for the new behavior?

@david1437
Contributor Author

@pitrou this one should be good as well now. The only thing is that the Python test seems to have a different omitted length than I expected. Also, it seems I can't pass the pretty-print options into the Python version? I'm not sure how the C++ bindings play into that.

@pitrou
Member

pitrou commented May 28, 2025

Also, it seems I can't pass the pretty-print options into the Python version? I'm not sure how the C++ bindings play into that.

If you feel at ease with Cython, you could try to add the options to the Python bindings. Otherwise someone else could do it in another PR.

The only thing is that the Python test seems to have a different omitted length than I expected.

Well, you can try to diagnose the issue?

@david1437
Contributor Author

david1437 commented May 28, 2025 via email

@david1437
Contributor Author

Added the Python bindings and figured out the test issue with some GDB debugging. Apologies for not looking deeper into it earlier; I was a bit intimidated by the Python bindings, but it wasn't too bad after all. @pitrou let me know if any other adjustments are needed, thanks.

@david1437 david1437 requested a review from pitrou May 29, 2025 07:37
@@ -1357,7 +1357,8 @@ cdef class Array(_PandasConvertible):
         return f'{type_format}\n{self}'

     def to_string(self, *, int indent=2, int top_level_indent=0, int window=10,
-                  int container_window=2, c_bool skip_new_lines=False):
+                  int container_window=2, c_bool skip_new_lines=False,
+                  int max_element_length=100):
Member

I'm curious, is there a reason not to name this element_size_limit like in C++?

@david1437
Contributor Author

david1437 commented Jun 2, 2025 via email

@pitrou
Member

pitrou commented Jun 2, 2025

No particular reason other than that it sounded a bit clearer to me what its meaning was.

Hmm, but then should we use the same name in C++?

@pitrou pitrou changed the title GH-46403: [C++] Add support for limiting element size when printing a… GH-46403: [C++] Add support for limiting element size when printing data Jun 2, 2025
@david1437
Contributor Author

david1437 commented Jun 2, 2025 via email

@david1437
Contributor Author

@pitrou thanks for all your help on this; it should be good now.

@david1437 david1437 requested a review from pitrou June 2, 2025 18:36
Member

@pitrou pitrou left a comment

Thanks @david1437 for the update. Looking fine in general, just a few more suggestions.

Write(data, options_.element_size_limit);
}

void PrettyPrinter::Write(std::string_view data, const uint64_t max_chars) {
Member

I hadn't noticed this, but can this take a regular signed int, and the casting to size_t or uint64_t be done in the method body?

Suggested change
- void PrettyPrinter::Write(std::string_view data, const uint64_t max_chars) {
+ void PrettyPrinter::Write(std::string_view data, const int max_chars) {

Comment on lines 1388 to 1389
Maximum length of a single element before it is truncated,
by default ``100``.
Member

"Length" is ambiguous, it might be misunderstood as the array length. We could be more precise and say "Maximum number of characters" for example.

@@ -1357,7 +1357,8 @@ cdef class Array(_PandasConvertible):
         return f'{type_format}\n{self}'

     def to_string(self, *, int indent=2, int top_level_indent=0, int window=10,
-                  int container_window=2, c_bool skip_new_lines=False):
+                  int container_window=2, c_bool skip_new_lines=False,
+                  int element_size_limit=100):
Member

Can you perhaps add a unit test for passing a specific element_size_limit?
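
A test along these lines might look like the sketch below. It assumes the `element_size_limit` keyword from the `to_string` signature in this diff is available in the installed pyarrow, and skips itself otherwise. The helper `supports_element_size_limit` and the test class name are hypothetical, and the assertions deliberately avoid depending on the exact truncation marker.

```python
import unittest

try:
    import pyarrow as pa
except ImportError:  # pyarrow is optional for this sketch
    pa = None


def supports_element_size_limit() -> bool:
    """Hypothetical probe: does Array.to_string accept element_size_limit?"""
    if pa is None:
        return False
    try:
        pa.array(["x"]).to_string(element_size_limit=5)
    except TypeError:
        # Older builds reject the unknown keyword argument.
        return False
    return True


@unittest.skipUnless(supports_element_size_limit(),
                     "needs a pyarrow build with element_size_limit")
class TestElementSizeLimit(unittest.TestCase):
    def test_specific_limit(self):
        value = "a" * 50
        arr = pa.array([value])
        # With a limit above the element size, the value is printed in full.
        self.assertIn(value, arr.to_string(element_size_limit=100))
        # With a smaller limit, the full value no longer appears.
        self.assertNotIn(value, arr.to_string(element_size_limit=10))
```

Such a test can be run with `python -m unittest` against a build that includes this change.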

@david1437 david1437 requested a review from pitrou June 5, 2025 21:12
2 participants