bpo-27413: add --no-ensure-ascii argument to json.tool #201
Conversation
Here is a related issue from the issue tracker: Add an option to json.tool to bypass non-ASCII characters. This is my first time contributing to Python, so let me know if there's anything I need to do related to the issue tracker.
Lib/json/tool.py
Outdated
@@ -25,24 +37,36 @@ def main():
                        help='a JSON file to be validated or pretty-printed')
    parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'),
                        help='write the output of infile to outfile')
    parser.add_argument('--no_escape', action='store_true', default=False,
                        help='Do not set ensure_ascii to escape non-ASCII characters')
This option name should be --ensure_ascii (and the default should be True).
@methane are you suggesting:

parser.add_argument('--ensure_ascii', action='store_false', default=True)
# And then when calling `json.dump`:
ensure_ascii=options.ensure_ascii

The problem here is that it's counterintuitive for --ensure_ascii to result in ensure_ascii=False. In https://bugs.python.org/issue27413, the discussion settled on --no-ensure-ascii, which I'm happy to switch to:

parser.add_argument('--no-ensure-ascii', action='store_true', default=False)
# And then when calling `json.dump`:
ensure_ascii=not options.no_ensure_ascii

Just let me know what's preferred.
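As a side note on the second spelling: argparse converts the hyphens in a long option name to underscores in the resulting attribute, so --no-ensure-ascii is read back as options.no_ensure_ascii. A minimal sketch of this wiring (the help text and variable names are illustrative, not the PR's code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--no-ensure-ascii', action='store_true', default=False,
                    help='do not escape non-ASCII characters')

# argparse maps the hyphenated flag to an underscored attribute name.
options = parser.parse_args(['--no-ensure-ascii'])
ensure_ascii = not options.no_ensure_ascii
print(ensure_ascii)  # False
```

Without the flag, options.no_ensure_ascii stays False and ensure_ascii is True, matching json.dump's default.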
Lib/json/tool.py
Outdated
if indent == 'None':
    return None
if indent == r'\t':
    return '\t'
I don't like such special casing. I think json.tool should be a minimal CLI wrapper around json.dumps.
@methane can you think of any other way to expose setting indent? The issue is that indent can take an int, a string, or None, and command-line arguments are not ideal for this type flexibility. Furthermore, the most common string is a tab, which is difficult to pass via the command line without such special casing.

Another option would be to pass options.indent through ast.literal_eval. That way users could specify any Python value, but the command-line usage would be more cumbersome.
I don't like this either. You can use external commands like unexpand for converting spaces to tabs.
Cross-referencing bpo-29636.

@serhiy-storchaka so do you think it's best to simplify so that --indent must be passed either an int or None? Anything that cannot be converted to an int, besides None, would throw an error?
Just pass an integer with --indent. Add --no-indent for None.
Sorry, please file an issue to bugs.python.org.

@methane no problem. I'll open an issue on bugs.python.org. Give me a couple of days... currently traveling.
Lib/test/test_json/test_tool.py
Outdated
rc, out, err = assert_python_ok('-m', 'json.tool', '--sort-keys', infile)
self.assertEqual(rc, 0)
self.assertEqual(out.splitlines(),
                 self.expect_without_sort_keys.encode().splitlines())
self.assertEqual(err, b'')

def test_no_ascii_flag(self):
test_no_ensure_ascii_flag
Done in a3e2b23.
Lib/test/test_json/test_tool.py
Outdated
infile = support.TESTFN
with open(infile, "w") as fp:
    self.addCleanup(os.remove, infile)
    fp.write(self.data)
    fp.write(text)
Perhaps it would be better just to add non-ASCII data to self.data.
Done in a3e2b23.
Lib/test/test_json/test_tool.py
Outdated
}
''')
infile = self._create_infile(data)
rc, out, err = assert_python_ok('-m', 'json.tool', '--no-ensure-ascii', infile)
I'm afraid this fails on a non-UTF-8 locale.
Does this still fail with Python 3.7? In https://docs.python.org/3.7/library/json.html it is no longer mentioned that --ensure-ascii can cause problems, while you can still find that note in e.g. https://docs.python.org/2.7/library/json.html
Haven't tested, but I assume so. The issue is that Python uses the system encoding, which cannot represent certain Unicode characters. I could imagine a few possible behaviors:
- json.tool fails when trying to output a character that cannot be represented by the output encoding
- json.tool always outputs UTF-8
- json.tool attempts to output using the system's encoding; in the case of failure, the exception is handled and the procedure is repeated with --no-ensure-ascii disabled
The main problem is outputting characters on a locale that doesn't support them.
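To make the locale dependency concrete: the failure disappears if the output stream's encoding is fixed explicitly rather than inherited from the locale. This sketch (an illustration of the problem, not what the PR does) writes non-ASCII JSON through a wrapper with an explicit UTF-8 encoding, so it succeeds regardless of LC_ALL:

```python
import io
import json

data = ["\u20ac"]  # euro sign, not representable in e.g. Latin-1

# Writing through a text wrapper with an explicit encoding sidesteps the
# locale; json.tool instead writes to sys.stdout, whose encoding is
# locale-dependent, which is where the UnicodeEncodeError comes from.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding='utf-8')
json.dump(data, out, ensure_ascii=False)
out.flush()
print(buf.getvalue() == '["\u20ac"]'.encode('utf-8'))  # True
```

Replacing encoding='utf-8' with encoding='latin-1' reproduces the UnicodeEncodeError shown later in this thread.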
Lib/test/test_json/test_tool.py
Outdated
@@ -11,7 +11,7 @@ class TestTool(unittest.TestCase):
     data = """
         [["blorpie"],[ "whoops" ] , [
-                                 ],\t"d-shtaeou",\r"d-nthiouh",
+                                 ],\t"d-shtaeou",\r"🐍 and δ",
It would be better to avoid using non-ASCII characters in sources; use escape sequences. Non-ASCII characters are not always displayed correctly (for example, I see just a rectangle instead of the first character) and it is not clear from which range they are. Use characters from three different ranges: U+0080-U+00FF, U+0100-U+FFFF and U+10000-U+10FFFF. They are encoded differently in Python and JSON.
Okay, in d699070 I use the following Unicode characters, which should cover three ranges:
- δ U+03B4 (\u03B4)
- 🐍 U+1F40D (\N{snake})
- 𝀷 U+1D037 (\U0001D037)
Something is needed in the range U+0080-U+00FF. In Python string literals these are encoded with \x, but JSON always uses \u.
Okay, I will add § U+00A7 (\xa7): http://www.fileformat.info/info/unicode/char/00a7/index.htm
For reference (in Python 3.6):
>>> print("\xA7 \N{snake} \u03B4 and \U0001D037")
§ 🐍 δ and 𝀷
>>> json.dumps("\xA7 \N{snake} \u03B4 and \U0001D037")
'"\\u00a7 \\ud83d\\udc0d \\u03b4 and \\ud834\\udc37"'
Lib/json/tool.py
Outdated
                    help='Do not set ensure_ascii to escape non-ASCII characters')
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
                   help='Indent level for pretty-printing.')
Document the default value.
Done in 66a173a.
Lib/json/tool.py
Outdated
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
                   help='Indent level for pretty-printing.')
group.add_argument('--no-indent', action='store_true', default=False,
Maybe use action='store_const', dest='indent', const=None?
Nice idea. Implemented in 0f38c18.

As an aside, argparse in Python 3.6 seems to have a bug where mutual exclusivity is not checked if you don't change the default value. So for example:

# error: argument --no-indent: not allowed with argument --indent
echo "[1,2,3]" | python Lib/json/tool.py --indent=5 --no-indent
# No error, compact mode
echo "[1,2,3]" | python Lib/json/tool.py --indent=4 --no-indent
# No error, indented
echo "[1,2,3]" | python Lib/json/tool.py --no-indent --indent=4

Something to keep in mind... assuming this will be fixed.
parser.add_argument('--no-ensure-ascii', action='store_true', default=False,
                    help='Do not set ensure_ascii to escape non-ASCII characters')
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
Tests are needed for --indent and --no-indent.
@serhiy-storchaka and @methane, thanks for the review thus far. It's been very helpful. Builds are now passing.
I'm not sure whether I've addressed this. Could you be a little more specific about the locale issues? Is this just something that affects the tests?
$ LC_ALL=en_US.iso88591 ./python -m test.regrtest -v -m test_ensure_ascii test_json
...
======================================================================
FAIL: test_ensure_ascii (test.test_json.test_tool.TestTool)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/serhiy/py/cpython/Lib/test/test_json/test_tool.py", line 137, in test_ensure_ascii
self.assertEqual(expect.splitlines(), json_stdout.splitlines())
AssertionError: Lists differ: [b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"'] != [b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\[39 chars]b7"']
First differing element 0:
b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"'
b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\[38 chars]0b7"'
- [b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"']
+ [b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\u00ce\\u00b4 \\u00f0\\u009d'
+ b'\\u0080\\u00b7"']
----------------------------------------------------------------------

Other tests pass by accident: input data is encoded with UTF-8 but decoded with the locale encoding. Try to pass escaped data not encodable with the locale encoding:

$ echo '["\u20ac"]' | LC_ALL=en_US.iso88591 ./python -m json.tool --no-ensure-ascii
Traceback (most recent call last):
File "/home/serhiy/py/cpython/Lib/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/serhiy/py/cpython/Lib/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/serhiy/py/cpython/Lib/json/tool.py", line 64, in <module>
main()
File "/home/serhiy/py/cpython/Lib/json/tool.py", line 58, in main
sort_keys=options.sort_keys,
File "/home/serhiy/py/cpython/Lib/json/__init__.py", line 180, in dump
fp.write(chunk)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)
By the way, --no-ensure-ascii and --indent/--no-indent are two separate enhancements so there should be two pull requests.
out, err = proc.communicate(self.data.encode())
self.assertEqual(out.splitlines(), self.expect.encode().splitlines())
self.assertEqual(err, None)
self.assertEqual(err, b'')
@berkerpeksag is it okay to bundle a change to an unrelated test within these PRs? Previously stderr wasn't being piped, and err would invariably be None even if the process were to fail.
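The stderr point can be demonstrated in isolation (this is a generic subprocess sketch, not the test file's code): when Popen is created without stderr=subprocess.PIPE, communicate() returns None for err no matter what the child wrote to stderr, so an assertEqual(err, None) always passes:

```python
import subprocess
import sys

# Child process that deliberately writes to stderr.
cmd = [sys.executable, '-c', 'import sys; sys.stderr.write("boom")']

# Without stderr piped, communicate() cannot capture it: err is None.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out, err = proc.communicate()
print(err)  # None

# With stderr piped, the child's error output is actually captured.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err2 = proc.communicate()
print(err2)  # b'boom'
```

Hence the change from assertEqual(err, None) to assertEqual(err, b''): only the piped form verifies that the process produced no error output.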
By the way, --no-ensure-ascii and --indent/--no-indent are two separate enhancements so there should be two pull requests.
@berkerpeksag I will keep this PR for --no-ensure-ascii and open a new one for --indent/--no-indent. I think it would make sense to merge the indent PR first, as it's closer to complete.
I'd say open a third pull request for this one (this way we can quickly merge the pipe change). No need to create a new issue on bugs.python.org.
@berkerpeksag opened at #346
@berkerpeksag @serhiy-storchaka @methane: I'm excited to continue work on this pull request and now have a better grasp of the locale-dependent encoding issues that I still have to deal with. However, I was hoping that we could get the following PRs merged (in the following order since they're dependent):
Of course, review thoroughly... it's just that review momentum seems to have stalled.
See python#201 (comment). (cherry picked from commit b4e9087)
From python#201, which is being split into two pull requests. See http://bugs.python.org/issue29636
To try and help move older pull requests forward, we are going through and backfilling 'awaiting' labels on pull requests that are lacking the label. Based on the current reviews, the best we can tell in an automated fashion is that a core developer requested changes to be made to this pull request. If/when the requested changes have been made, please leave a comment that says, "I have made the requested changes; please review again".
…ptor" objects, if soft-switching is enabled. The code has been in Stackless forever, but was disabled for some unknown reason.
…ptor" objects, if soft-switching is enabled. The code has been in Stackless forever, but was disabled for some unknown reason.
Any chance of this getting merged?

Closing in favor of #17472.
Work in progress, pending consensus on whether the feature should be added.

This pull request increases the utility of python -m json.tool by adding support for setting ensure_ascii and indent in the json.dump call. Happy to also address any other arguments that need updating, or other issues with json.tool.