A naive parser for JSON strings
Before diving into this project, let's think over the following questions
- Is "true" a valid JSON string? What about " \r\n\t true", "3.14", "\"\u0000\""? (and more)
- With node.js, JSON.parse("123456789123456789123456789") returns 1.2345678912345679e+26, so does the JSON format support arbitrary-length integers?
- With node.js, both JSON.parse('"\\/"') and JSON.parse('"/"') run successfully, so does the JSON format support both "\\/" and "/"?
For question 1, all of "true", " \r\n\t true", "3.14", and "\"\u0000\"" are parsed successfully by JSON.parse(...).
For questions 2 and 3, I tried the JSON.parse calls quoted in the questions with node.js.
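As a side illustration of question 2 (written in Java rather than node.js, and not part of this project's code), the sketch below shows why the node.js result loses digits. A JavaScript number and a Java double are both IEEE 754 binary64 values, so the same rounding happens in Java, while BigInteger keeps every digit. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class BigNumberDemo {
    /** Returns true when the integer written in 'text' survives a round trip through a double. */
    static boolean isExactAsDouble(String text) {
        // A double has a 53-bit significand, so most 27-digit integers cannot be stored exactly.
        double asDouble = Double.parseDouble(text);
        // new BigDecimal(double) exposes the exact value that the double really holds.
        BigInteger stored = new BigDecimal(asDouble).toBigInteger();
        // BigInteger keeps every digit of the original text.
        return stored.equals(new BigInteger(text));
    }

    public static void main(String[] args) {
        String text = "123456789123456789123456789";
        System.out.println("as double: " + Double.parseDouble(text));
        System.out.println("exact:     " + new BigInteger(text));
        System.out.println("lossless:  " + isExactAsDouble(text));
    }
}
```

So the digit loss comes from the number type chosen by the parser, not from the JSON text itself; a parser that keeps the digits as text or as a BigInteger would not lose them.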
To get a good understanding of the JSON format, I referred to Introducing JSON and implemented a simple JSON parser in this project.
It should be able to handle a JSON string of any length that fits in a Java String (the max length of a Java String is 2147483647, so please don't let the input string exceed this limit).
If you find a bug, you are welcome to create an issue. Thanks.
Step 1: Build a jar file with the following command
mvn clean package
Step 2: Use this jar file to parse and pretty-print a JSON string from standard input or from a specified input file.
Two examples are shown below.
echo ' [ {"Message": "Hello, world", "Some special numbers": [4.2E1, 23E0, 3.14159265358979], "Today is Saturday" : true, "Needs to work": false, "Test for null": null}]' | \
java -jar target/naive-json-parser-1.0-SNAPSHOT-jar-with-dependencies.jar
The output is like this
[
{
"Message": "Hello, world",
"Some special numbers": [
4.2E1,
23E0,
3.14159265358979
],
"Today is Saturday": true,
"Needs to work": false,
"Test for null": null
}
]
echo '[ "Hello", 3.14, true, {"key1": ["value1", "value2"]} ]' > input.json
java -jar target/naive-json-parser-1.0-SNAPSHOT-jar-with-dependencies.jar -f input.json
The output is like this
[
"Hello",
3.14,
true,
{
"key1": [
"value1",
"value2"
]
}
]
Let's take a look at the main method in the com.study.Main class
public static void main(String[] args) throws IOException {
    // Step 1: handle the command line arguments (-f/--file)
    SimpleOptionHandler optionHandler = new SimpleOptionHandler(args);
    // Step 2: read everything from the specified file, or from stdin if no file is given
    try (InputStream inputStream = optionHandler.isFileSpecified() ? new FileInputStream(optionHandler.getFileName()) : System.in) {
        byte[] bytes = inputStream.readAllBytes();
        String raw = new String(bytes, StandardCharsets.UTF_8);
        // Step 3: parse the code points of the input as a Json instance
        JsonParser jsonParser = new JsonParser();
        Json json = jsonParser.parse(new PeekingIterator<>(raw.codePoints().iterator()));
        // Step 4: present the Json instance as a pretty-printed string
        PresenterFacade presenterFacade = new PresenterFacade();
        String result = presenterFacade.convertToString(json);
        System.out.println(result);
    }
}
There are 4 steps.
- Command line arguments handling (the -f/--file option is supported)
- Read from stdin or the specified file
- Parse as a Json instance
- Present this Json instance
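The parse step in main wraps the input's code points in a PeekingIterator, presumably so that a parser can look at the next code point before deciding which rule applies. The project's own PeekingIterator is not shown here, so the following is only a minimal sketch of the idea; SketchPeekingIterator and PeekDemo are hypothetical names, and the real class may differ.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the PeekingIterator used in main; the real class may differ. */
class SketchPeekingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T peeked;          // element fetched by peek() but not yet consumed
    private boolean hasPeeked; // true while 'peeked' holds a pending element

    SketchPeekingIterator(Iterator<T> source) {
        this.source = source;
    }

    /** Returns the next element without consuming it. */
    T peek() {
        if (!hasPeeked) {
            if (!source.hasNext()) {
                throw new NoSuchElementException();
            }
            peeked = source.next();
            hasPeeked = true;
        }
        return peeked;
    }

    @Override
    public boolean hasNext() {
        return hasPeeked || source.hasNext();
    }

    @Override
    public T next() {
        if (hasPeeked) {
            // Hand out the pending element and clear the one-slot buffer.
            hasPeeked = false;
            T result = peeked;
            peeked = null;
            return result;
        }
        return source.next();
    }
}

class PeekDemo {
    public static void main(String[] args) {
        SketchPeekingIterator<Integer> it =
                new SketchPeekingIterator<>("[1]".codePoints().iterator());
        // Peeking twice sees the same code point; next() then consumes it.
        System.out.println((char) it.peek().intValue());
        System.out.println((char) it.peek().intValue());
        System.out.println((char) it.next().intValue());
        System.out.println((char) it.next().intValue());
    }
}
```

With one code point of lookahead, a parser can tell an array from an object or a string by peeking at '[', '{' or '"' without consuming it.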
A JSON value can be any of the items below
- object
- array
- string
- number
- "true" (i.e. the literal true)
- "false" (i.e. the literal false)
- "null" (i.e. the literal null)
Parsing these seven items falls into five cases. Let me go through the five cases, from the simplest to the most complex.
null, false, and true are special literal items.
Each of them is parsed by its own dedicated parser.
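A literal parser only has to check that the input spells out the expected word code point by code point. The sketch below illustrates this under simplifying assumptions: it takes the whole text as a String, ignores surrounding whitespace, and uses hypothetical names (LiteralSketch, expectLiteral, parseLiteral) that are not taken from the project.

```java
import java.util.PrimitiveIterator;

class LiteralSketch {
    /**
     * Consumes exactly the code points of 'expected' from 'input'.
     * Throws if the input ends early or a code point differs.
     */
    static void expectLiteral(PrimitiveIterator.OfInt input, String expected) {
        PrimitiveIterator.OfInt want = expected.codePoints().iterator();
        while (want.hasNext()) {
            int wanted = want.nextInt();
            if (!input.hasNext()) {
                throw new IllegalArgumentException("unexpected end of input, expected " + expected);
            }
            if (input.nextInt() != wanted) {
                throw new IllegalArgumentException("expected " + expected);
            }
        }
    }

    /** Parses one literal; the first character decides which word must follow. JSON null maps to Java null. */
    static Boolean parseLiteral(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        String expected;
        Boolean value;
        switch (text.charAt(0)) {
            case 'n': expected = "null";  value = null;          break;
            case 't': expected = "true";  value = Boolean.TRUE;  break;
            case 'f': expected = "false"; value = Boolean.FALSE; break;
            default: throw new IllegalArgumentException("not a literal: " + text);
        }
        PrimitiveIterator.OfInt input = text.codePoints().iterator();
        expectLiteral(input, expected);
        if (input.hasNext()) {
            throw new IllegalArgumentException("trailing characters after " + expected);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseLiteral("true"));
        System.out.println(parseLiteral("false"));
        System.out.println(parseLiteral("null"));
    }
}
```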
A number has the following three parts: integer, fraction, and exponent.
Number items are parsed by NumberParser.
As there are three parts within a number item (fraction and exponent can be empty), IntegerParser, FractionParser, and ExponentParser are used to parse these three parts respectively.
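The three-part structure can be sketched as three small functions that each consume their part and hand the position to the next one. This is only an illustration that follows the number grammar on json.org; it validates the text rather than building a value, and NumberSketch and its methods are hypothetical names, not the project's IntegerParser, FractionParser, or ExponentParser.

```java
class NumberSketch {
    /** Returns true when 'text' is exactly one JSON number: integer fraction exponent. */
    static boolean isJsonNumber(String text) {
        int pos = integer(text, 0);
        if (pos < 0) return false;
        pos = fraction(text, pos);
        if (pos < 0) return false;
        pos = exponent(text, pos);
        return pos == text.length();
    }

    /** integer: optional '-', then either a single '0' or a non-zero digit followed by digits. */
    private static int integer(String s, int pos) {
        if (pos < s.length() && s.charAt(pos) == '-') pos++;
        if (pos >= s.length() || !isDigit(s.charAt(pos))) return -1;
        if (s.charAt(pos) == '0') return pos + 1; // a leading zero must stand alone
        return digits(s, pos);
    }

    /** fraction: empty, or '.' followed by at least one digit. */
    private static int fraction(String s, int pos) {
        if (pos >= s.length() || s.charAt(pos) != '.') return pos; // empty fraction
        int end = digits(s, pos + 1);
        return end == pos + 1 ? -1 : end;
    }

    /** exponent: empty, or 'e'/'E', an optional sign, and at least one digit. */
    private static int exponent(String s, int pos) {
        if (pos >= s.length() || (s.charAt(pos) != 'e' && s.charAt(pos) != 'E')) return pos;
        pos++;
        if (pos < s.length() && (s.charAt(pos) == '+' || s.charAt(pos) == '-')) pos++;
        int end = digits(s, pos);
        return end == pos ? -1 : end;
    }

    /** Skips a run of digits and returns the index just after it. */
    private static int digits(String s, int pos) {
        while (pos < s.length() && isDigit(s.charAt(pos))) pos++;
        return pos;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        for (String s : new String[] {"4.2E1", "23E0", "3.14159265358979", "01", "1.", "1e+"}) {
            System.out.println(s + " -> " + isJsonNumber(s));
        }
    }
}
```

Note that nothing in these rules limits how many digits a number may have, which is relevant to question 2 at the top.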
A string is composed of the following three parts.
- A leading "
- characters
- A trailing "
String items are parsed by StringParser.
StringParser implements the logic for matching the leading/trailing " itself, while it delegates the logic of matching characters to a CharactersParser.
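To make the three parts concrete, here is a minimal sketch that checks the leading and trailing quotes and decodes the characters in between, following the string grammar on json.org. It is an assumption-laden illustration, not the project's StringParser or CharactersParser: it works on a String instead of a code point iterator, and StringSketch, parseJsonString and hex4 are hypothetical names.

```java
class StringSketch {
    /** Parses one JSON string (leading quote, characters, trailing quote) and returns the decoded content. */
    static String parseJsonString(String text) {
        if (text.isEmpty() || text.charAt(0) != '"') {
            throw new IllegalArgumentException("missing leading quote");
        }
        StringBuilder out = new StringBuilder();
        int pos = 1;
        while (pos < text.length()) {
            char c = text.charAt(pos++);
            if (c == '"') { // trailing quote: it must be the last character
                if (pos != text.length()) {
                    throw new IllegalArgumentException("characters after trailing quote");
                }
                return out.toString();
            }
            if (c < 0x20) {
                throw new IllegalArgumentException("raw control character in string");
            }
            if (c != '\\') { // ordinary character, copied as is
                out.append(c);
                continue;
            }
            if (pos >= text.length()) break; // dangling backslash
            char escape = text.charAt(pos++);
            switch (escape) {
                case '"': case '\\': case '/': out.append(escape); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    if (pos + 4 > text.length()) {
                        throw new IllegalArgumentException("truncated \\u escape");
                    }
                    out.append((char) hex4(text, pos));
                    pos += 4;
                    break;
                default:
                    throw new IllegalArgumentException("invalid escape \\" + escape);
            }
        }
        throw new IllegalArgumentException("missing trailing quote");
    }

    /** Reads four ASCII hex digits starting at 'pos' and returns their value. */
    private static int hex4(String s, int pos) {
        int value = 0;
        for (int i = pos; i < pos + 4; i++) {
            char c = s.charAt(i);
            int d = c >= '0' && c <= '9' ? c - '0'
                  : c >= 'a' && c <= 'f' ? c - 'a' + 10
                  : c >= 'A' && c <= 'F' ? c - 'A' + 10
                  : -1;
            if (d < 0) throw new IllegalArgumentException("invalid hex digit");
            value = value * 16 + d;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseJsonString("\"\\/\"")); // JSON text "\/"
        System.out.println(parseJsonString("\"/\""));   // JSON text "/"
    }
}
```

In this grammar '/' may appear either escaped or unescaped, so both inputs from question 3 decode to the same one-character string.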
An array item comes in one of the following two formats
'[' ws ']' (case one)
'[' elements ']' (case two)
Array items are parsed by ArrayParser.
To handle case one, WhitespaceParser is used.
To handle case two, ElementsParser is used.
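The two cases can be told apart by skipping whitespace after '[' and peeking at the next character. The sketch below illustrates this under a strong simplification: elements may only be nested arrays or the literals true, false, and null. ArraySketch and its helper methods are hypothetical names and do not mirror the project's ArrayParser, WhitespaceParser, or ElementsParser.

```java
import java.util.ArrayList;
import java.util.List;

class ArraySketch {
    private final String text;
    private int pos;

    private ArraySketch(String text) { this.text = text; }

    /** Parses an array whose elements are nested arrays or the literals true, false, null. */
    static List<Object> parse(String text) {
        ArraySketch p = new ArraySketch(text);
        List<Object> result = p.array();
        if (p.pos != text.length()) throw p.error("trailing characters");
        return result;
    }

    private List<Object> array() {
        expect('[');
        List<Object> items = new ArrayList<>();
        skipWhitespace();
        if (peek() == ']') {      // case one: '[' ws ']'
            pos++;
            return items;
        }
        do {                      // case two: '[' elements ']'
            items.add(element());
        } while (consumeIf(','));
        expect(']');
        return items;
    }

    /** element: ws value ws */
    private Object element() {
        skipWhitespace();
        Object value = value();
        skipWhitespace();
        return value;
    }

    private Object value() {
        if (peek() == '[') return array(); // recursion handles nesting
        if (consumeWord("true")) return Boolean.TRUE;
        if (consumeWord("false")) return Boolean.FALSE;
        if (consumeWord("null")) return null;
        throw error("unsupported value");
    }

    private char peek() { return pos < text.length() ? text.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    private boolean consumeIf(char c) {
        if (peek() != c) return false;
        pos++;
        return true;
    }

    private boolean consumeWord(String word) {
        if (!text.startsWith(word, pos)) return false;
        pos += word.length();
        return true;
    }

    private void skipWhitespace() {
        while (peek() == ' ' || peek() == '\n' || peek() == '\r' || peek() == '\t') pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at index " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("[ ]"));
        System.out.println(parse("[true, [ ], null]"));
    }
}
```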
An object item comes in one of the following two formats
'{' ws '}' (case one)
'{' members '}' (case two)
Object items are parsed by ObjectParser.
To handle case one, WhitespaceParser is used.
To handle case two, MembersParser is used.
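Object parsing has the same shape as array parsing, except that each member is a key, a ':' and a value. The sketch below assumes the member rule from json.org (ws string ws ':' element) and simplifies heavily: keys and values are strings without escape sequences, or nested objects. ObjectSketch and its helpers are hypothetical names, not the project's ObjectParser or MembersParser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ObjectSketch {
    private final String text;
    private int pos;

    private ObjectSketch(String text) { this.text = text; }

    /** Parses an object whose values are escape-free strings or nested objects. */
    static Map<String, Object> parse(String text) {
        ObjectSketch p = new ObjectSketch(text);
        Map<String, Object> result = p.object();
        if (p.pos != text.length()) throw p.error("trailing characters");
        return result;
    }

    private Map<String, Object> object() {
        expect('{');
        Map<String, Object> members = new LinkedHashMap<>(); // keeps insertion order
        skipWhitespace();
        if (peek() == '}') {      // case one: '{' ws '}'
            pos++;
            return members;
        }
        do {                      // case two: '{' members '}'
            member(members);
        } while (consumeIf(','));
        expect('}');
        return members;
    }

    /** member: ws string ws ':' element, where element is ws value ws */
    private void member(Map<String, Object> into) {
        skipWhitespace();
        String key = simpleString();
        skipWhitespace();
        expect(':');
        skipWhitespace();
        Object value = peek() == '{' ? object() : simpleString();
        skipWhitespace();
        into.put(key, value);
    }

    /** Reads a quoted string without escape sequences (a simplification of the real string rule). */
    private String simpleString() {
        expect('"');
        int end = text.indexOf('"', pos);
        if (end < 0) throw error("missing trailing quote");
        String s = text.substring(pos, end);
        if (s.indexOf('\\') >= 0) throw error("escapes are not supported in this sketch");
        pos = end + 1;
        return s;
    }

    private char peek() { return pos < text.length() ? text.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    private boolean consumeIf(char c) {
        if (peek() != c) return false;
        pos++;
        return true;
    }

    private void skipWhitespace() {
        while (peek() == ' ' || peek() == '\n' || peek() == '\r' || peek() == '\t') pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at index " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("{ }"));
        System.out.println(parse("{ \"a\" : \"x\", \"b\": { } }"));
    }
}
```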
- Use some library for command line arguments handling (e.g. arguments for output-file, indent-level-control)
- Use slf4j for logging