2019
DOI: 10.1007/s00778-019-00578-5

Parsing gigabytes of JSON per second

Abstract: JavaScript Object Notation or JSON is a ubiquitous data exchange format on the Web. Ingesting JSON documents can become a performance bottleneck due to the sheer volume of data. We are thus motivated to make JSON parsing as fast as possible. Despite the maturity of the problem of JSON parsing, we show that substantial speedups are possible. We present the first standard-compliant JSON parser to process gigabytes of data per second on a single core, using commodity processors. We can use a quarter or fewer instr…

Citations: cited by 46 publications (58 citation statements)
References: 26 publications
“…We expect that such new instruction sets should be applicable to base64 decoding and encoding. Future work could also integrate fast base64 decoders inside vectorized parsers such as simdjson [15].…”
Section: Results (mentioning)
confidence: 99%
“…After loading v1, we detect all invalid 2-byte sequences at once using vectorized classification, a concept we documented in earlier work [11]. If a bit in the range 0-6 is set in all three looked-up patterns for a byte, as checked with the AND instruction, that byte (and the UTF-8) is considered invalid.…”
Section: Invalid 2-byte Sequences (mentioning)
confidence: 99%
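
The classification rule quoted above (three table lookups combined with AND, then a test on bits 0-6) can be illustrated with a scalar sketch. This is not the cited authors' code: the function name `invalid_two_byte_positions`, the bit names, and the three tables are assumptions made for illustration, and the tables cover only three error classes (a leading byte with no continuation, a continuation with no leading byte, and overlong 2-byte encodings). A vectorized implementation would perform the same lookups on many bytes at once with SIMD shuffle instructions.

```python
# Hypothetical error bits; only a subset of the classes a full validator needs.
TOO_SHORT = 1 << 0   # leading byte followed by a non-continuation byte
TOO_LONG = 1 << 1    # ASCII byte followed by a continuation byte
OVERLONG_2 = 1 << 5  # 0xC0 or 0xC1 followed by a continuation byte
TWO_CONTS = 1 << 7   # two continuation bytes in a row (not an error by itself)
CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS

# Table 1: indexed by the high nibble of the previous byte.
BYTE1_HIGH = [TOO_LONG] * 8 + [TWO_CONTS] * 4 + [TOO_SHORT | OVERLONG_2] + [TOO_SHORT] * 3
# Table 2: indexed by the low nibble of the previous byte.
BYTE1_LOW = [CARRY | OVERLONG_2] * 2 + [CARRY] * 14
# Table 3: indexed by the high nibble of the current byte.
BYTE2_HIGH = [TOO_SHORT] * 8 + [TOO_LONG | OVERLONG_2 | TWO_CONTS] * 4 + [TOO_SHORT] * 4


def invalid_two_byte_positions(data: bytes) -> list:
    """Return indices of bytes flagged by the 2-byte classification."""
    flagged = []
    prev = 0  # assumption: the byte before the input is treated as 0x00
    for i, cur in enumerate(data):
        # A bit survives the AND only if all three lookups agree on it.
        pattern = BYTE1_HIGH[prev >> 4] & BYTE1_LOW[prev & 0x0F] & BYTE2_HIGH[cur >> 4]
        if pattern & 0x7F:  # bits 0-6 signal an error; bit 7 is ignored here
            flagged.append(i)
        prev = cur
    return flagged
```

With these illustrative tables, `invalid_two_byte_positions(b"\xc0\x80")` returns `[1]` (overlong encoding) and `invalid_two_byte_positions("é".encode())` returns `[]`. The sketch does not detect a truncated sequence at the end of the input, surrogates, or out-of-range code points, and it does not check the continuation counts of 3-byte and 4-byte sequences.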
“…There has been much work on the acceleration of text content using SIMD instructions (e.g., base64 [14,15], JSON [11], XML [16], HTML [17], CSV [18]). We are not aware of any published work directly related to Unicode validation using SIMD instructions other than our own [11]. Cameron [19] has worked on the related problem of UTF-8 to UTF-16 transcoding using SIMD instructions, but their approach is not applicable to high-speed validation.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, with the release of CityJSON 1.0, an effort was made to optimize the CityJSON parser in azul to the same level as the CityGML parser. For this, azul 0.9 has a new parser based on the highly optimized experimental simdjson library (https://github.com/lemire/simdjson) (Langdale & Lemire, 2019), which uses modern processors' SIMD instructions to speed up parsing. It is worth noting that despite spending less time developing azul's CityJSON parser than the CityGML parser, azul is now able to parse CityJSON files twice or three times faster than the same files in CityGML.…”
Section: Cityjson (mentioning)
confidence: 99%