bpo-27413: add --no-ensure-ascii argument to json.tool #201
Conversation
Here is a related issue from the issue tracker: Add an option to json.tool to bypass non-ASCII characters. This is my first time contributing to Python, so let me know if there's anything I need to do related to the issue tracker.
Lib/json/tool.py
Outdated
@@ -25,24 +37,36 @@ def main():
                        help='a JSON file to be validated or pretty-printed')
    parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'),
                        help='write the output of infile to outfile')
    parser.add_argument('--no_escape', action='store_true', default=False,
                        help='Do not set ensure_ascii to escape non-ASCII characters')
This option name should be --ensure_ascii (and the default should be True).
@methane are you suggesting:

parser.add_argument('--ensure_ascii', action='store_false', default=True)
# And then when calling `json.dump`:
ensure_ascii=options.ensure_ascii

The problem here is that it's counterintuitive for --ensure_ascii to result in ensure_ascii=False. In https://bugs.python.org/issue27413, the discussion settled on --no-ensure-ascii, which I'm happy to switch to:

parser.add_argument('--no-ensure-ascii', action='store_true', default=False)
# And then when calling `json.dump`:
ensure_ascii=not options.no_ensure_ascii

Just let me know what's preferred.
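As a side note on the second spelling: argparse converts the hyphens in a long option name to underscores in the resulting attribute, so --no-ensure-ascii is read back as options.no_ensure_ascii. A minimal sketch of this wiring (the help text and variable names are illustrative, not the PR's code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--no-ensure-ascii', action='store_true', default=False,
                    help='do not escape non-ASCII characters')

# argparse maps the hyphenated flag to an underscored attribute name.
options = parser.parse_args(['--no-ensure-ascii'])
ensure_ascii = not options.no_ensure_ascii
print(ensure_ascii)  # False
```

Without the flag, options.no_ensure_ascii stays False and ensure_ascii is True, matching json.dump's default.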
Lib/json/tool.py
Outdated
if indent == 'None':
    return None
if indent == r'\t':
    return '\t'
I don't like such special casing. I think json.tool should be a minimal CLI wrapper around json.dumps.
@methane can you think of any other way to expose setting indent? The issue is that indent can take an int, a string, or None, and command-line arguments are not ideal for this type flexibility. Furthermore, the most common string is a tab, which is difficult to pass via the command line without such special casing.

Another option would be to pass options.indent through ast.literal_eval. That way users could specify any Python value, but the command-line usage would be more cumbersome.
I don't like this either. You can use external commands like unexpand for converting spaces to tabs.
Cross-referencing bpo-29636.

@serhiy-storchaka so do you think it's best to simplify so that --indent must be passed either an int or None? Anything that cannot be converted to an int, besides None, would throw an error?
Just pass an integer with --indent. Add --no-indent for None.
Sorry, please file an issue to bugs.python.org.

@methane no problem. I'll open an issue on bugs.python.org. Give me a couple of days... currently traveling.
Lib/test/test_json/test_tool.py
Outdated
rc, out, err = assert_python_ok('-m', 'json.tool', '--sort-keys', infile)
self.assertEqual(rc, 0)
self.assertEqual(out.splitlines(),
                 self.expect_without_sort_keys.encode().splitlines())
self.assertEqual(err, b'')

def test_no_ascii_flag(self):
test_no_ensure_ascii_flag
Done in a3e2b23.
Lib/test/test_json/test_tool.py
Outdated
infile = support.TESTFN
with open(infile, "w") as fp:
    self.addCleanup(os.remove, infile)
    fp.write(self.data)
    fp.write(text)
Perhaps it would be better just to add non-ASCII data to self.data.
Done in a3e2b23.
Lib/test/test_json/test_tool.py
Outdated
}
''')
infile = self._create_infile(data)
rc, out, err = assert_python_ok('-m', 'json.tool', '--no-ensure-ascii', infile)
I'm afraid this fails on a non-UTF-8 locale.
Does this still fail with Python 3.7? In https://docs.python.org/3.7/library/json.html it is no longer mentioned that --ensure-ascii can cause problems, while you can still find that note in e.g. https://docs.python.org/2.7/library/json.html
Haven't tested, but I assume so. The issue is that Python uses the system encoding, which cannot represent certain Unicode characters. I could imagine a few possible behaviors:
- json.tool fails when trying to output a character that cannot be represented by the output encoding
- json.tool always outputs UTF-8
- json.tool attempts to output using the system's encoding; in the case of failure, the exception is handled and the procedure is repeated with --no-ensure-ascii disabled
The main problem is outputting characters on a locale that doesn't support them.
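To make the locale dependency concrete: the failure disappears if the output stream's encoding is fixed explicitly rather than inherited from the locale. This sketch (an illustration of the problem, not what the PR does) writes non-ASCII JSON through a wrapper with an explicit UTF-8 encoding, so it succeeds regardless of LC_ALL:

```python
import io
import json

data = ["\u20ac"]  # euro sign, not representable in e.g. Latin-1

# Writing through a text wrapper with an explicit encoding sidesteps the
# locale; json.tool instead writes to sys.stdout, whose encoding is
# locale-dependent, which is where the UnicodeEncodeError comes from.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding='utf-8')
json.dump(data, out, ensure_ascii=False)
out.flush()
print(buf.getvalue() == '["\u20ac"]'.encode('utf-8'))  # True
```

Replacing encoding='utf-8' with encoding='latin-1' reproduces the UnicodeEncodeError shown later in this thread.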
Lib/test/test_json/test_tool.py
Outdated
@@ -11,7 +11,7 @@ class TestTool(unittest.TestCase):
     data = """
         [["blorpie"],[ "whoops" ] , [
-                                 ],\t"d-shtaeou",\r"d-nthiouh",
+                                 ],\t"d-shtaeou",\r"🐍 and δ",
It would be better to avoid using non-ASCII characters in sources; use escape sequences. Non-ASCII characters are not always displayed correctly (for example, I see just a rectangle instead of the first character) and it is not clear from which range they are. Use characters from three different ranges: U+0080-U+00FF, U+0100-U+FFFF and U+10000-U+10FFFF. They are encoded differently in Python and JSON.
Okay, in d699070 I use the following Unicode characters, which should cover three ranges:
- δ U+03B4 (\u03B4)
- 🐍 U+1F40D (\N{snake})
- 𝀷 U+1D037 (\U0001D037)
Something is needed in the range U+0080-U+00FF. In Python string literals these are encoded with \x, but JSON always uses \u.
Okay, I will add § U+00A7 (\xa7): http://www.fileformat.info/info/unicode/char/00a7/index.htm
For reference (in Python 3.6):
>>> print("\xA7 \N{snake} \u03B4 and \U0001D037")
§ 🐍 δ and 𝀷
>>> json.dumps("\xA7 \N{snake} \u03B4 and \U0001D037")
'"\\u00a7 \\ud83d\\udc0d \\u03b4 and \\ud834\\udc37"'
Lib/json/tool.py
Outdated
                    help='Do not set ensure_ascii to escape non-ASCII characters')
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
                   help='Indent level for pretty-printing.')
Document the default value.
Done in 66a173a.
Lib/json/tool.py
Outdated
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
                   help='Indent level for pretty-printing.')
group.add_argument('--no-indent', action='store_true', default=False,
Maybe use action='store_const', dest='indent', const=None?
Nice idea. Implemented in 0f38c18.

As an aside, argparse in Python 3.6 seems to have a bug where mutual exclusivity is not checked if you don't change the default value. So for example:

# error: argument --no-indent: not allowed with argument --indent
echo "[1,2,3]" | python Lib/json/tool.py --indent=5 --no-indent
# No error, compact mode
echo "[1,2,3]" | python Lib/json/tool.py --indent=4 --no-indent
# No error, indented
echo "[1,2,3]" | python Lib/json/tool.py --no-indent --indent=4

Something to keep in mind... assuming this will be fixed.
parser.add_argument('--no-ensure-ascii', action='store_true', default=False,
                    help='Do not set ensure_ascii to escape non-ASCII characters')
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
Tests are needed for --indent and --no-indent.
@serhiy-storchaka and @methane, thanks for the review thus far. It's been very helpful. Builds are now passing.
I'm not sure whether I've addressed this. Could you be a little more specific about the locale issues? Is this just something that affects the tests?
$ LC_ALL=en_US.iso88591 ./python -m test.regrtest -v -m test_ensure_ascii test_json
...
======================================================================
FAIL: test_ensure_ascii (test.test_json.test_tool.TestTool)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/serhiy/py/cpython/Lib/test/test_json/test_tool.py", line 137, in test_ensure_ascii
self.assertEqual(expect.splitlines(), json_stdout.splitlines())
AssertionError: Lists differ: [b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"'] != [b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\[39 chars]b7"']
First differing element 0:
b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"'
b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\[38 chars]0b7"'
- [b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"']
+ [b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\u00ce\\u00b4 \\u00f0\\u009d'
+ b'\\u0080\\u00b7"']
----------------------------------------------------------------------

Other tests pass by accident: input data is encoded with UTF-8 but decoded with the locale encoding. Try to pass escaped data not encodable with the locale encoding:

$ echo '["\u20ac"]' | LC_ALL=en_US.iso88591 ./python -m json.tool --no-ensure-ascii
Traceback (most recent call last):
File "/home/serhiy/py/cpython/Lib/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/serhiy/py/cpython/Lib/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/serhiy/py/cpython/Lib/json/tool.py", line 64, in <module>
main()
File "/home/serhiy/py/cpython/Lib/json/tool.py", line 58, in main
sort_keys=options.sort_keys,
File "/home/serhiy/py/cpython/Lib/json/__init__.py", line 180, in dump
fp.write(chunk)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)
By the way, --no-ensure-ascii and --indent/--no-indent are two separate enhancements so there should be two pull requests.
out, err = proc.communicate(self.data.encode())
self.assertEqual(out.splitlines(), self.expect.encode().splitlines())
self.assertEqual(err, None)
self.assertEqual(err, b'')
@berkerpeksag is it okay to bundle a change to an unrelated test within these PRs? Previously stderr wasn't being piped, and err would invariably be None even if the process were to fail.
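The stderr point can be demonstrated in isolation (this is a generic subprocess sketch, not the test file's code): when Popen is created without stderr=subprocess.PIPE, communicate() returns None for err no matter what the child wrote to stderr, so an assertEqual(err, None) always passes:

```python
import subprocess
import sys

# Child process that deliberately writes to stderr.
cmd = [sys.executable, '-c', 'import sys; sys.stderr.write("boom")']

# Without stderr piped, communicate() cannot capture it: err is None.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out, err = proc.communicate()
print(err)  # None

# With stderr piped, the child's error output is actually captured.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err2 = proc.communicate()
print(err2)  # b'boom'
```

Hence the change from assertEqual(err, None) to assertEqual(err, b''): only the piped form verifies that the process produced no error output.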
By the way, --no-ensure-ascii and --indent/--no-indent are two separate enhancements so there should be two pull requests.
@berkerpeksag I will keep this PR for --no-ensure-ascii and open a new one for --indent/--no-indent. I think it would make sense to merge the indent PR first, as it's closer to complete.
I'd say open a third pull request for this one (this way we can quickly merge the pipe change). No need to create a new issue on bugs.python.org.
@berkerpeksag opened at #346
@berkerpeksag @serhiy-storchaka @methane: I'm excited to continue work on this pull request and now have a better grasp of the locale-dependent encoding issues that I still have to deal with. However, I was hoping that we could get the following PRs merged (in the following order since they're dependent):
Of course, review thoroughly... it's just that review momentum seems to have stalled.
See python#201 (comment). (cherry picked from commit b4e9087)
From python#201, which is being split into two pull requests. See http://bugs.python.org/issue29636
To try and help move older pull requests forward, we are going through and backfilling 'awaiting' labels on pull requests that are lacking the label. Based on the current reviews, the best we can tell in an automated fashion is that a core developer requested changes to be made to this pull request. If/when the requested changes have been made, please leave a comment that says, "I have made the requested changes; please review again".
…ptor" objects, if soft-switching is enabled. The code has been in Stackless forever, but was disabled for some unknown reason.
…ptor" objects, if soft-switching is enabled. The code has been in Stackless forever, but was disabled for some unknown reason.
Any chance of this getting merged?

Closing in favor of #17472.
Work in progress, pending consensus on whether the feature should be added.

This pull request increases the utility of python -m json.tool by adding support for setting ensure_ascii and indent in the json.dump call. Happy to also address any other arguments that need updating, or other issues with json.tool.