Parsing a JSON data stream

While browsing Stack Overflow I stumbled upon this question.
The issue seemed simple but interesting, and since Google turned up no answers I thought it would be fun to give it a shot. The first iteration of my solution read in strings of indefinite length, but it quickly turned into an overly complex state machine, so I stepped back and rewrote it to take input one character at a time. You can find it on GitHub, under JSONCharInputReader.

JSONCharInputReader was designed to read in a JSON array, not a JSON object.
In other words, this will work:

[1, 3, 4, {"var": "val"}, ["array_item", "array_item"], ...

but the following will not:

{"key": ["array", 1, 2], "key2": "value", ...

To parse a JSON stream, create a JSONCharInputReader object and pass in your own implementation of the JSONChunkProcessor interface. JSONChunkProcessor defines one function:

interface JSONChunkProcessor {
    public function process($jsonChunk);
}

Clients implementing this can expect $jsonChunk to be valid JSON (as long as the JSON data being read in is itself valid). If the array above were read in, the $jsonChunks passed to the processor would decode as follows:


// Decoding 1
int(1)

// Decoding  3
int(3)

// Decoding  4
int(4)

// Decoding  {"var": "val"}
object(stdClass)#3 (1) {
  ["var"]=>
  string(3) "val"
}

// Decoding  ["array_item", "array_item"]
array(2) {
  [0]=>
  string(10) "array_item"
  [1]=>
  string(10) "array_item"
}
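
For illustration, a processor that produces decoded values like the ones above might look roughly like this. It is only a sketch: CollectingProcessor is a made-up name, and it assumes each $jsonChunk arrives as a JSON string (possibly with leading whitespace) that still needs decoding. The interface is repeated here only so the snippet runs on its own.

```php
<?php
// Interface as given in the post; drop this declaration if the library is already loaded.
interface JSONChunkProcessor {
    public function process($jsonChunk);
}

// Hypothetical processor (not part of the library): decodes each chunk and keeps the result.
class CollectingProcessor implements JSONChunkProcessor {
    public $decoded = array();

    public function process($jsonChunk) {
        // Assumption: the chunk is a JSON string, so decode it before use.
        $value = json_decode(trim($jsonChunk));
        // json_decode() also returns null for a literal "null", so check the error code too.
        if ($value === null && json_last_error() !== JSON_ERROR_NONE) {
            throw new InvalidArgumentException('Invalid JSON chunk: ' . $jsonChunk);
        }
        $this->decoded[] = $value;
    }
}

// Feed the processor by hand, the way a reader would as each element completes.
$processor = new CollectingProcessor();
$sampleChunks = array('1', ' 3', ' {"var": "val"}', ' ["array_item", "array_item"]');
foreach ($sampleChunks as $chunk) {
    $processor->process($chunk);
}
var_dump($processor->decoded);
```

In real use the processor would be handed to a JSONCharInputReader rather than called directly; see example.php for how the reader is actually wired up.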

See example.php for a sample implementation of the reader, which can read JSON from the terminal by executing it with:

cat | php example.php

One notable limitation is that the $jsonChunks passed to JSONChunkProcessor are always top-level elements of the incoming JSON array. In other words, a large nested object or array is only processed once the reader has received all of its data.
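
To see why the limitation arises, here is a rough sketch of how a one-character-at-a-time splitter could work. This is not the JSONCharInputReader source: TopLevelSplitter, feed() and the callback are all hypothetical names. It assumes the stream is a single JSON array, and it tracks nesting depth and string state so that a chunk is only emitted when a top-level comma or the closing bracket arrives.

```php
<?php
// Hypothetical sketch, not the actual JSONCharInputReader implementation.
class TopLevelSplitter {
    private $depth = 0;        // nesting depth; 1 means directly inside the outer array
    private $inString = false; // currently inside a JSON string literal?
    private $escaped = false;  // previous character was a backslash inside a string?
    private $buffer = '';      // characters of the element being collected
    private $onChunk;          // callback invoked with each complete top-level element

    public function __construct($onChunk) {
        $this->onChunk = $onChunk;
    }

    public function feed($char) {
        if ($this->inString) {
            // Inside a string, brackets and commas are plain data.
            $this->buffer .= $char;
            if ($this->escaped) {
                $this->escaped = false;
            } elseif ($char === '\\') {
                $this->escaped = true;
            } elseif ($char === '"') {
                $this->inString = false;
            }
            return;
        }
        if ($char === '"') {
            $this->inString = true;
            $this->buffer .= $char;
            return;
        }
        if ($char === '[' || $char === '{') {
            $this->depth++;
            if ($this->depth === 1) {
                return; // opening bracket of the outer array itself
            }
        } elseif ($char === ']' || $char === '}') {
            $this->depth--;
            if ($this->depth === 0) {
                $this->emit(); // outer array closed: flush the last element
                return;
            }
        } elseif ($char === ',' && $this->depth === 1) {
            $this->emit(); // top-level separator: the current element is complete
            return;
        }
        if ($this->depth >= 1) {
            $this->buffer .= $char;
        }
    }

    private function emit() {
        $chunk = trim($this->buffer);
        $this->buffer = '';
        if ($chunk !== '') {
            call_user_func($this->onChunk, $chunk);
        }
    }
}

$chunks = array();
$splitter = new TopLevelSplitter(function ($chunk) use (&$chunks) {
    $chunks[] = $chunk;
});
foreach (str_split('[1, 3, {"var": "val"}, ["a", "b"]]') as $char) {
    $splitter->feed($char);
}
print_r($chunks);
```

Because nothing is emitted while the depth is greater than 1, a nested object or array stays in the buffer until its closing bracket and the following delimiter have been read, which is the behaviour described above.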
