GH-46403: [C++] Add support for limiting element size when printing data #46536
Thanks for this @david1437 . This looks good in general, some comments below.
There is a failure in the Python test suite (see CI results). Can you update the test for the new behavior?
@pitrou this one should be good as well now. The only thing is that the Python test seems to omit a different length than I expected. Also, it seems I can't pass the pretty-print options into the Python version? I'm not sure how the C++ bindings play into that.
If you feel at ease with Cython, you could try to add the options to the Python bindings. Otherwise someone else could do it in another PR.
Well, you can try to diagnose the issue?
Not too sure about debugging Python bindings calling into C++, but I'll give it a try!
Added the Python bindings and figured out the test issue with some GDB debugging. Apologies for not looking deeper into it earlier; I was a bit intimidated by the Python bindings, but it wasn't too bad after all. @pitrou let me know if any other adjustments are needed, thanks.
python/pyarrow/array.pxi
Outdated
@@ -1357,7 +1357,8 @@ cdef class Array(_PandasConvertible):
         return f'{type_format}\n{self}'

     def to_string(self, *, int indent=2, int top_level_indent=0, int window=10,
-                  int container_window=2, c_bool skip_new_lines=False):
+                  int container_window=2, c_bool skip_new_lines=False,
+                  int max_element_length=100):
I'm curious, is there a reason not to name this element_size_limit like in C++?
No particular reason other than that it sounded a bit clearer to me what its meaning was.
Hmm, but then should we use the same name in C++?
I can change it later tonight after work
@pitrou thanks for all your help on this; it should be good now.
Thanks @david1437 for the update. Looking fine in general, just a few more suggestions.
cpp/src/arrow/pretty_print.cc
Outdated
      Write(data, options_.element_size_limit);
    }

    void PrettyPrinter::Write(std::string_view data, const uint64_t max_chars) {
I hadn't noticed this, but can this take a regular signed int, and the casting to size_t or uint64_t be done in the method body?
- void PrettyPrinter::Write(std::string_view data, const uint64_t max_chars) {
+ void PrettyPrinter::Write(std::string_view data, const int max_chars) {
python/pyarrow/array.pxi
Outdated
    Maximum length of a single element before it is truncated,
    by default ``100``.
"Length" is ambiguous, it might be misunderstood as the array length. We could be more precise and say "Maximum number of characters" for example.
@@ -1357,7 +1357,8 @@ cdef class Array(_PandasConvertible):
         return f'{type_format}\n{self}'

     def to_string(self, *, int indent=2, int top_level_indent=0, int window=10,
-                  int container_window=2, c_bool skip_new_lines=False):
+                  int container_window=2, c_bool skip_new_lines=False,
+                  int element_size_limit=100):
Can you perhaps add a unit test for passing a specific element_size_limit?
Rationale for this change
#46403
What changes are included in this PR?
A new PrettyPrinter option is added to limit elements to 100 characters by default.
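To illustrate what a per-element limit means in practice, here is a minimal pure-Python sketch of the idea. It is not the Arrow implementation: the helper names `truncate_element` and `format_elements` are hypothetical, and the wording of the "omitted" suffix is an assumption made for illustration only. It also counts Python characters, whereas the C++ code operates on a `std::string_view`.

```python
def truncate_element(text: str, limit: int = 100) -> str:
    """Return `text` unchanged if it fits within `limit` characters.

    Otherwise keep the first `limit` characters and append a note saying
    how many were dropped. The suffix format is an assumption, not
    necessarily what Arrow prints.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return f"{text[:limit]} (... {omitted} chars omitted)"


def format_elements(values, limit: int = 100):
    """Apply the limit to each element independently (hypothetical helper)."""
    return [truncate_element(str(value), limit) for value in values]


if __name__ == "__main__":
    for line in format_elements(["short", "x" * 120]):
        print(line)
```

The point of the sketch is that the limit applies to each printed element on its own, not to the array as a whole, which is also why the review asks for the docstring to say "number of characters" rather than "length".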
Are these changes tested?
Yes
Are there any user-facing changes?
Yes. Elements are now truncated to 100 characters by default when an array is stringified, so users who rely on the ToString output of arrays with large elements may see different results.