bpo-27413: add --no-ensure-ascii argument to json.tool #201

Closed
wants to merge 13 commits into from
32 changes: 28 additions & 4 deletions Lib/json/tool.py
@@ -16,6 +16,18 @@
 import sys


+def parse_indent(indent):
+    """Parse the argparse indent argument."""
+    if indent == 'None':
+        return None
+    if indent == r'\t':
+        return '\t'
Member

I don't like this kind of special-casing.
I think json.tool should stay minimal and simply expose json.dumps as a CLI.

Contributor Author

@methane can you think of any other way to expose setting indent? The issue is that indent can take an int, string, or None. Command line arguments are not ideal for type flexibility. Furthermore, the most common string is a tab, which is difficult to pass via the command line without such special casing.

Another option would be to pass options.indent through ast.literal_eval. Users could then specify any Python literal, but the command-line usage would be more cumbersome.
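
The ast.literal_eval idea could look roughly like the sketch below. The function name parse_indent_literal and its error messages are hypothetical and not part of this PR; only the type check for int, str, or None follows the discussion above.

```python
import argparse
import ast


def parse_indent_literal(text):
    """Hypothetical argparse type: read --indent as a Python literal."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(
            f'not a Python literal: {text!r}') from exc
    # json.dumps accepts an int, a str, or None for indent.
    # bool is a subclass of int, so it is rejected explicitly.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if value is None or isinstance(value, str) or is_int:
        return value
    raise argparse.ArgumentTypeError(
        f'expected int, str or None, got {text!r}')


parser = argparse.ArgumentParser()
parser.add_argument('--indent', default=4, type=parse_indent_literal)

print(parser.parse_args(['--indent', '2']).indent)              # 2
print(parser.parse_args(['--indent', 'None']).indent)           # None
print(repr(parser.parse_args(['--indent', r'"\t"']).indent))    # '\t'
```

The cost mentioned above is visible in the last call: a tab has to be written as a quoted Python string, which in a shell means quoting the quotes as well.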

Member

I don't like this either. You can use external commands like unexpand for converting spaces to tabs.

Contributor Author

Cross-referencing bpo-29636.

@serhiy-storchaka so do you think it's best to simplify this so that --indent accepts only an int or None, and anything else raises an error?

Member

Just pass an integer with --indent. Add --no-indent for None.
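
A minimal sketch of that suggestion, assuming both flags share one destination through a mutually exclusive group. The render helper is hypothetical and exists only to show the effect; it is not code from this PR.

```python
import argparse
import json

parser = argparse.ArgumentParser(prog='python -m json.tool')
group = parser.add_mutually_exclusive_group()
# --indent takes an integer; argparse rejects anything else.
group.add_argument('--indent', type=int, default=4)
# --no-indent stores None in the same destination.
group.add_argument('--no-indent', dest='indent',
                   action='store_const', const=None)


def render(argv, obj):
    """Hypothetical helper: parse argv and serialize obj accordingly."""
    options = parser.parse_args(argv)
    return json.dumps(obj, indent=options.indent)


print(render([], {'a': 1}))
print(render(['--indent', '2'], {'a': 1}))
print(render(['--no-indent'], {'a': 1}))   # {"a": 1}
```

With this layout no string parsing is needed, and passing both flags at once is reported as a usage error by argparse.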

+    try:
+        return int(indent)
+    except ValueError:
+        return indent
+
+
 def main():
     prog = 'python -m json.tool'
     description = ('A simple command line interface for json module '
@@ -25,24 +37,36 @@ def main():
                         help='a JSON file to be validated or pretty-printed')
     parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'),
                         help='write the output of infile to outfile')
+    parser.add_argument('--no-ensure-ascii', action='store_true', default=False,
+                        help='Do not set ensure_ascii to escape non-ASCII characters')
Member

This option name should be --ensure_ascii (and the default should be True).

Contributor Author

@methane are you suggesting:

parser.add_argument('--ensure_ascii', action='store_false', default=True)
# And then when calling `json.dump`:
ensure_ascii=options.ensure_ascii

The problem here is that it's counterintuitive for --ensure_ascii to result in ensure_ascii=False. In https://bugs.python.org/issue27413, the discussion settled on --no-ensure-ascii, which I'm happy to switch to:

parser.add_argument('--no-ensure-ascii', action='store_true', default=False)
# And then when calling `json.dump`:
ensure_ascii=not options.no_ensure_ascii

Just let me know what's preferred.
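
A possible middle ground, sketched here as an assumption rather than as what the PR does: keep the --no-ensure-ascii flag name but store the result under dest='ensure_ascii', so the option value can be passed to json.dump without negation. The dump helper is hypothetical.

```python
import argparse
import json

parser = argparse.ArgumentParser()
# store_false defaults to True, so ensure_ascii stays on unless the flag is given.
parser.add_argument('--no-ensure-ascii', dest='ensure_ascii',
                    action='store_false',
                    help='disable escaping of non-ASCII characters')


def dump(argv, obj):
    """Hypothetical helper: serialize obj using the parsed flag."""
    options = parser.parse_args(argv)
    return json.dumps(obj, ensure_ascii=options.ensure_ascii)


print(dump([], '\u03b4'))                     # "\u03b4"
print(dump(['--no-ensure-ascii'], '\u03b4'))  # "δ"
```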

+    parser.add_argument('--indent', default='4', type=parse_indent,
+                        help='Indent level or str for pretty-printing. '
+                             'Use None for the most compact representation. '
+                             r'Use "\t" for tab indentation.')
     parser.add_argument('--sort-keys', action='store_true', default=False,
                         help='sort the output of dictionaries alphabetically by key')
     options = parser.parse_args()

+    # Read input JSON
     infile = options.infile or sys.stdin
-    outfile = options.outfile or sys.stdout
-    sort_keys = options.sort_keys
     with infile:
         try:
-            if sort_keys:
+            if options.sort_keys:
                 obj = json.load(infile)
             else:
                 obj = json.load(infile,
                                 object_pairs_hook=collections.OrderedDict)
         except ValueError as e:
             raise SystemExit(e)
+
+    # Export JSON
+    outfile = options.outfile or sys.stdout
     with outfile:
-        json.dump(obj, outfile, sort_keys=sort_keys, indent=4)
+        json.dump(obj, outfile,
+                  indent=options.indent,
+                  ensure_ascii=not options.no_ensure_ascii,
+                  sort_keys=options.sort_keys,
+                  )
         outfile.write('\n')


14 changes: 11 additions & 3 deletions Lib/test/test_json/test_tool.py
@@ -11,7 +11,7 @@ class TestTool(unittest.TestCase):
     data = """

 [["blorpie"],[ "whoops" ] , [
-],\t"d-shtaeou",\r"d-nthiouh",
+],\t"d-shtaeou",\r"🐍 and δ",
Member

It would be better to avoid non-ASCII characters in source files. Use escape sequences instead. Non-ASCII characters are not always displayed correctly (for example, I see just a rectangle instead of the first character), and it is not clear which range they come from. Use characters from three different ranges: U+0080-U+00FF, U+0100-U+FFFF and U+10000-U+10FFFF. They are encoded differently in Python and JSON.

Contributor Author

Okay, in d699070 I use the following Unicode characters, which should cover three ranges:

  • δ U+03B4 \u03B4
  • 🐍 U+1F40D \N{snake}
  • 𝀷 U+1D037 \U0001D037

Member

Something in the range U+0080-U+00FF is still needed. In Python string literals these characters are escaped with \x, but JSON always uses \u.
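
The difference can be checked with one character per range. This is only an illustrative sketch: ascii() is used here as an assumption for showing Python's escaped form, because repr() in Python 3 leaves printable non-ASCII characters unescaped.

```python
import json

# One sample character from each range discussed in this thread.
samples = {
    'U+0080-U+00FF': '\xa7',
    'U+0100-U+FFFF': '\u03b4',
    'U+10000-U+10FFFF': '\U0001f40d',
}

for label, char in samples.items():
    # ascii() gives Python's escaped form; json.dumps() gives JSON's.
    print(label, ascii(char), json.dumps(char))

# U+0080-U+00FF '\xa7' "\u00a7"
# U+0100-U+FFFF '\u03b4' "\u03b4"
# U+10000-U+10FFFF '\U0001f40d' "\ud83d\udc0d"
```

Python uses three escape widths (\x, \u, \U), while JSON has only \u and therefore writes characters above U+FFFF as a surrogate pair.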

Contributor Author

For reference (in Python 3.6):

>>> print("\xA7 \N{snake} \u03B4 and \U0001D037")
§ 🐍 δ and 𝀷
>>> json.dumps("\xA7 \N{snake} \u03B4 and \U0001D037")
'"\\u00a7 \\ud83d\\udc0d \\u03b4 and \\ud834\\udc37"'

"i-vhbjkhnth", {"nifty":87}, {"morefield" :\tfalse,"field"
:"yes"} ]
"""
@@ -26,7 +26,7 @@ class TestTool(unittest.TestCase):
     ],
     [],
     "d-shtaeou",
-    "d-nthiouh",
+    "🐍 and δ",
     "i-vhbjkhnth",
     {
         "nifty": 87
@@ -48,7 +48,7 @@ class TestTool(unittest.TestCase):
     ],
     [],
     "d-shtaeou",
-    "d-nthiouh",
+    "🐍 and δ",
     "i-vhbjkhnth",
     {
         "nifty": 87
@@ -106,3 +106,11 @@ def test_sort_keys_flag(self):
         self.assertEqual(out.splitlines(),
                          self.expect_without_sort_keys.encode().splitlines())
         self.assertEqual(err, b'')
+
+    def test_no_ensure_ascii_flag(self):
+        infile = self._create_infile()
+        rc, out, err = assert_python_ok('-m', 'json.tool', '--no-ensure-ascii', infile)
+        self.assertEqual(rc, 0)
+        self.assertEqual(out.splitlines(),
+                         self.expect_without_sort_keys.encode().splitlines())
+        self.assertEqual(err, b'')