Commit 331624b

Issue #14674: Add a discussion of the json module's standard compliance.
Patch by Chris Rebert.
1 parent a61b09f commit 331624b

2 files changed, 114 additions, 6 deletions

Doc/library/json.rst

Lines changed: 111 additions & 6 deletions
@@ -6,8 +6,10 @@
 .. moduleauthor:: Bob Ippolito <[email protected]>
 .. sectionauthor:: Bob Ippolito <[email protected]>

-`JSON (JavaScript Object Notation) <http://json.org>`_ is a subset of JavaScript
-syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.
+`JSON (JavaScript Object Notation) <http://json.org>`_, specified by
+:rfc:`4627`, is a lightweight data interchange format based on a subset of
+`JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ syntax (`ECMA-262 3rd
+edition <http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf>`_).

 :mod:`json` exposes an API familiar to users of the standard library
 :mod:`marshal` and :mod:`pickle` modules.
@@ -105,8 +107,10 @@ Using json.tool from the shell to validate and pretty-print::

 .. note::

-   The JSON produced by this module's default settings is a subset of
-   YAML, so it may be used as a serializer for that as well.
+   JSON is a subset of `YAML <http://yaml.org/>`_ 1.2. The JSON produced by
+   this module's default settings (in particular, the default *separators*
+   value) is also a subset of YAML 1.0 and 1.1. This module can thus also be
+   used as a YAML serializer.


 Basic Usage
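The revised note ties YAML 1.0/1.1 compatibility to the default *separators* value. A minimal sketch (Python 3, standard library only) of the textual difference between the default and a compact *separators* tuple. It does not run a YAML parser; that YAML 1.0/1.1 parsers may reject the compact form is an inference from the note, not something this code checks.

```python
import json

data = {"values": [1, 2, 3]}

# Default separators are (', ', ': ') when indent is None,
# so a space follows every ',' and ':'.
default_text = json.dumps(data)

# A compact separators tuple drops those spaces. Per the note, only the
# default spacing is described as a YAML 1.0/1.1 subset (assumption, untested).
compact_text = json.dumps(data, separators=(",", ":"))

print(default_text)  # {"values": [1, 2, 3]}
print(compact_text)  # {"values":[1,2,3]}
```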
@@ -185,7 +189,8 @@ Basic Usage
    *object_hook* is an optional function that will be called with the result of
    any object literal decoded (a :class:`dict`). The return value of
    *object_hook* will be used instead of the :class:`dict`. This feature can be used
-   to implement custom decoders (e.g. JSON-RPC class hinting).
+   to implement custom decoders (e.g. `JSON-RPC <http://www.jsonrpc.org>`_
+   class hinting).

    *object_pairs_hook* is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs. The
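"Class hinting" here means an object literal carries a hint telling the decoder which type to construct. A minimal sketch of that pattern with *object_hook*; the "__complex__" hint key and the as_complex function are hypothetical names chosen for illustration, not part of JSON-RPC or of the json module.

```python
import json

def as_complex(obj):
    """object_hook sketch: rebuild a complex number from a hinted object literal.

    The "__complex__" hint key is a hypothetical convention for this example.
    """
    if obj.get("__complex__"):
        return complex(obj["real"], obj["imag"])
    # Object literals without the hint stay plain dicts.
    return obj

text = '{"point": {"__complex__": true, "real": 1, "imag": 2}, "label": "p"}'
decoded = json.loads(text, object_hook=as_complex)

print(decoded["point"])  # (1+2j)
print(decoded["label"])  # p
```

Because *object_hook* is applied to every decoded object literal, innermost first, the hinted inner object is already a complex number by the time the enclosing dict reaches the hook.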
@@ -230,7 +235,7 @@ Basic Usage
    *encoding* which is ignored and deprecated.


-Encoders and decoders
+Encoders and Decoders
 ---------------------

 .. class:: JSONDecoder(object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None)
@@ -415,3 +420,103 @@ Encoders and decoders

 for chunk in json.JSONEncoder().iterencode(bigobject):
     mysocket.write(chunk)
+
+
+Standard Compliance
+-------------------
+
+The JSON format is specified by :rfc:`4627`. This section details this
+module's level of compliance with the RFC. For simplicity,
+:class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and parameters other
+than those explicitly mentioned, are not considered.
+
+This module does not comply with the RFC in a strict fashion, implementing some
+extensions that are valid JavaScript but not valid JSON. In particular:
+
+- Top-level non-object, non-array values are accepted and output;
+- Infinite and NaN number values are accepted and output;
+- Repeated names within an object are accepted, and only the value of the last
+  name-value pair is used.
+
+Since the RFC permits RFC-compliant parsers to accept input texts that are not
+RFC-compliant, this module's deserializer is technically RFC-compliant under
+default settings.
+
+Character Encodings
+^^^^^^^^^^^^^^^^^^^
+
+The RFC recommends that JSON be represented using either UTF-8, UTF-16, or
+UTF-32, with UTF-8 being the default.
+
+As permitted, though not required, by the RFC, this module's serializer sets
+*ensure_ascii=True* by default, thus escaping the output so that the resulting
+strings only contain ASCII characters.
+
+Other than the *ensure_ascii* parameter, this module is defined strictly in
+terms of conversion between Python objects and
+:class:`Unicode strings <str>`, and thus does not otherwise address the issue
+of character encodings.
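A short sketch (Python 3, standard library only) of what *ensure_ascii* changes, and of where a byte encoding comes in: json.dumps returns a str either way, so producing UTF-8 bytes is a separate step the caller takes. The sample string is arbitrary.

```python
import json

text = "caf\u00e9"  # "café"

# Default (ensure_ascii=True): non-ASCII characters are escaped as \uXXXX.
escaped = json.dumps(text)
print(escaped)  # "caf\u00e9"

# ensure_ascii=False keeps the character itself in the returned str.
unescaped = json.dumps(text, ensure_ascii=False)

# json only deals in str; turning that into bytes (UTF-8 here, the RFC's
# default encoding) is left to the caller.
payload = unescaped.encode("utf-8")
print(payload)  # b'"caf\xc3\xa9"'

# Either form decodes back to the same Python string.
assert json.loads(escaped) == json.loads(unescaped) == text
```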
+
+
+Top-level Non-Object, Non-Array Values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The RFC specifies that the top-level value of a JSON text must be either a
+JSON object or array (Python :class:`dict` or :class:`list`). This module's
+deserializer also accepts input texts consisting solely of a
+JSON null, boolean, number, or string value::
+
+    >>> just_a_json_string = '"spam and eggs"' # Not by itself a valid JSON text
+    >>> json.loads(just_a_json_string)
+    'spam and eggs'
+
+This module itself does not include a way to request that such input texts be
+regarded as illegal. Likewise, this module's serializer also accepts single
+Python :data:`None`, :class:`bool`, numeric, and :class:`str`
+values as input and will generate output texts consisting solely of a top-level
+JSON null, boolean, number, or string value without raising an exception::
+
+    >>> neither_a_list_nor_a_dict = "spam and eggs"
+    >>> json.dumps(neither_a_list_nor_a_dict) # The result is not a valid JSON text
+    '"spam and eggs"'
+
+This module's serializer does not itself include a way to enforce the
+aforementioned constraint.
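Since the module offers no switch for the RFC 4627 top-level rule, an application that needs it can check the type itself around the standard calls. A minimal sketch; strict_loads and strict_dumps are hypothetical helper names, not functions of the json module.

```python
import json

def strict_loads(s, **kwargs):
    """Hypothetical helper: json.loads, but the top-level value must be an object or array."""
    value = json.loads(s, **kwargs)
    if not isinstance(value, (dict, list)):
        raise ValueError("top-level JSON value must be an object or array, got %s"
                         % type(value).__name__)
    return value

def strict_dumps(obj, **kwargs):
    """Hypothetical helper: json.dumps, but only for values that encode as an object or array."""
    # Checked on the Python side: dict encodes as an object, list/tuple as an array.
    # Other types are refused, even ones a custom default= hook could convert.
    if not isinstance(obj, (dict, list, tuple)):
        raise TypeError("top-level value must be a dict, list or tuple, got %s"
                        % type(obj).__name__)
    return json.dumps(obj, **kwargs)

print(strict_loads('["spam", "eggs"]'))  # ['spam', 'eggs']
try:
    strict_loads('"spam and eggs"')
except ValueError as exc:
    print("rejected:", exc)
```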
+
+
+Infinite and NaN Number Values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The RFC does not permit the representation of infinite or NaN number values.
+Despite that, by default, this module accepts and outputs ``Infinity``,
+``-Infinity``, and ``NaN`` as if they were valid JSON number literal values::
+
+    >>> # Neither of these calls raises an exception, but the results are not valid JSON
+    >>> json.dumps(float('-inf'))
+    '-Infinity'
+    >>> json.dumps(float('nan'))
+    'NaN'
+    >>> # Same when deserializing
+    >>> json.loads('-Infinity')
+    -inf
+    >>> json.loads('NaN')
+    nan
+
+In the serializer, the *allow_nan* parameter can be used to alter this
+behavior. In the deserializer, the *parse_constant* parameter can be used to
+alter this behavior.
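A sketch of both switches: with *allow_nan=False* the serializer raises ValueError for infinite or NaN floats, and a *parse_constant* callable is called with one of the strings '-Infinity', 'Infinity', or 'NaN', so it can raise instead of returning a float. The reject_constant function is a hypothetical name for this example.

```python
import json

# Serializer: allow_nan=False makes inf, -inf and nan raise ValueError.
try:
    json.dumps([1.0, float("inf")], allow_nan=False)
except ValueError as exc:
    print("dumps refused:", exc)

def reject_constant(name):
    """Hypothetical parse_constant hook; json passes it '-Infinity', 'Infinity' or 'NaN'."""
    raise ValueError("non-standard JSON constant: %s" % name)

# Deserializer: the hook is consulted only for those three literals,
# so ordinary numbers are unaffected.
print(json.loads("[1.5, 2]", parse_constant=reject_constant))  # [1.5, 2]
try:
    json.loads("[1.5, -Infinity]", parse_constant=reject_constant)
except ValueError as exc:
    print("loads refused:", exc)
```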
+
+
+Repeated Names Within an Object
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The RFC specifies that the names within a JSON object should be unique, but
+does not specify how repeated names in JSON objects should be handled. By
+default, this module does not raise an exception; instead, it ignores all but
+the last name-value pair for a given name::
+
+    >>> weird_json = '{"x": 1, "x": 2, "x": 3}'
+    >>> json.loads(weird_json)
+    {'x': 3}
+
+The *object_pairs_hook* parameter can be used to alter this behavior.
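For example, *object_pairs_hook* receives the ordered list of name-value pairs of each object literal before any repeats are collapsed, so a hook can refuse them. A minimal sketch; reject_duplicates is a hypothetical name, and raising ValueError is one possible policy among several.

```python
import json

def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook: build a dict, refusing repeated names.

    json calls the hook with the list of (name, value) pairs of each object
    literal, in document order.
    """
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError("repeated name in JSON object: %r" % (name,))
        result[name] = value
    return result

print(json.loads('{"x": 1, "y": 2}', object_pairs_hook=reject_duplicates))  # {'x': 1, 'y': 2}
try:
    json.loads('{"x": 1, "x": 2, "x": 3}', object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print("rejected:", exc)
```

Because the hook runs for every object literal, nested objects are checked as well.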

Misc/NEWS

Lines changed: 3 additions & 0 deletions
@@ -514,6 +514,9 @@ Build
 Documentation
 -------------

+- Issue #14674: Add a discussion of the json module's standard compliance.
+  Patch by Chris Rebert.
+
 - Issue #15630: Add an example for "continue" stmt in the tutorial. Patch by
   Daniel Ellis.
