I am trying to process somewhat large (possibly up to 200M) JSON files. The structure of the file is basically an array of objects.
So something along the lines of:
Each object has arbitrary properties and does not necessarily share them with other objects in the array (as in, having the same properties).
I want to apply some processing to each object in the array, and since the file is potentially huge, I cannot slurp the whole file content into memory, decode the JSON, and iterate over the resulting PHP array.
So ideally I would like to read the file, fetch enough info for each object, and process it. A SAX-type approach would be OK if there was a similar library available for JSON.
Any suggestions on how best to deal with this problem?
The Mighty Rubber Duck
6 Answers
I've written a streaming JSON pull parser, pcrov/JsonReader, for PHP 7 with an API based on XMLReader.
It differs significantly from event-based parsers in that instead of setting up callbacks and letting the parser do its thing, you call methods on the parser to move along or retrieve data as desired. Found your desired bits and want to stop parsing? Then stop parsing (and call close() because it's the nice thing to do.)
(For a slightly longer overview of pull vs event-based parsers see XML reader models: SAX versus XML pull parser.)
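The difference between the two models can be sketched in plain PHP without any JSON library. This is only an illustration of control flow: parseWithCallbacks() and pullTokens() are hypothetical names, and the token list stands in for whatever a real parser would produce.

```php
<?php
declare(strict_types=1);

// Hypothetical token source standing in for a real parser's input.
$tokens = ['a', 'b', 'target', 'c', 'd'];

// Event-based (push) style: the parser owns the loop and calls you back
// for every token, whether or not you still care about the rest.
function parseWithCallbacks(array $tokens, callable $onToken): int
{
    $visited = 0;
    foreach ($tokens as $token) {
        $visited++;
        $onToken($token);
    }
    return $visited;
}

// Pull style: the consumer owns the loop and asks for the next token,
// so it can simply stop asking once it has what it needs.
function pullTokens(array $tokens): Generator
{
    foreach ($tokens as $token) {
        yield $token;
    }
}

$pushSeen = [];
$pushVisited = parseWithCallbacks($tokens, function (string $t) use (&$pushSeen): void {
    $pushSeen[] = $t;
});

$pullSeen = [];
foreach (pullTokens($tokens) as $token) {
    $pullSeen[] = $token;
    if ($token === 'target') {
        break; // found it: stop pulling, the remaining tokens are never produced
    }
}
```

In the push version the callback runs for all five tokens; in the pull version the loop ends as soon as the wanted token shows up.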
Example 1:
Read each object as a whole from your JSON.
Output:
Objects get returned as stringly-keyed arrays due (in part) to edge cases where valid JSON would produce property names that are not allowed in PHP objects. Working around these conflicts isn't worthwhile as an anemic stdClass object brings no value over a simple array anyway.
Example 2:
Read each named element individually.
Output:
Example 3:
Read each property of a given name. Bonus: read from a string instead of a URI, plus get data from properties with duplicate names in the same object (which is allowed in JSON, how fun.)
Output:
How exactly to best read through your JSON depends on its structure and what you want to do with it. These examples should give you a place to start.
user3942918
I decided to work on an event-based parser. It's not quite done yet; I will edit the question with a link to my work when I roll out a satisfactory version.
EDIT:
I finally worked out a version of the parser that I am satisfied with. It's available on GitHub:
There's probably room for some improvement, and I welcome feedback.
The Mighty Rubber Duck
There exists something like this, but only for C++ and Java. Unless you can access one of these libraries from PHP, there's no implementation for this in PHP but json_read() as far as I know. However, if the JSON is structured that simply, it's easy to just read the file until the next } and then process the JSON received via json_read(). But you had better do that buffered: read 10 KB and split by }; if no } is found, read another 10 KB, otherwise process the values found. Then read the next block, and so on.
joni
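A minimal sketch of that buffered approach, under a few assumptions: the file is a single top-level array whose elements are all objects, the function name streamJsonObjects() is made up for illustration, and the per-object decode uses PHP's built-in json_decode(). Rather than splitting on every }, which would break on nested objects or on braces inside strings, the sketch tracks brace depth and string state.

```php
<?php
declare(strict_types=1);

/**
 * Yields each top-level object of a JSON array, one at a time.
 *
 * @param resource $handle    readable stream positioned at the start of the JSON
 * @param int      $chunkSize bytes to read per fread() call
 */
function streamJsonObjects($handle, int $chunkSize = 10240): Generator
{
    $buffer = '';       // bytes of the object currently being collected
    $depth = 0;         // current nesting level of { } inside the array
    $inString = false;  // true while inside a JSON string literal
    $escaped = false;   // true if the previous byte was a backslash in a string

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $byte = $chunk[$i];
            if ($depth === 0 && $byte !== '{') {
                continue; // skip "[", ",", "]" and whitespace between objects
            }
            $buffer .= $byte;
            if ($inString) {
                if ($escaped) {
                    $escaped = false;
                } elseif ($byte === '\\') {
                    $escaped = true;
                } elseif ($byte === '"') {
                    $inString = false;
                }
                continue;
            }
            if ($byte === '"') {
                $inString = true;
            } elseif ($byte === '{') {
                $depth++;
            } elseif ($byte === '}') {
                $depth--;
                if ($depth === 0) {
                    // A complete object: decode only this small piece.
                    yield json_decode($buffer, true, 512, JSON_THROW_ON_ERROR);
                    $buffer = '';
                }
            }
        }
    }
}

// Usage with an in-memory stream and a deliberately tiny chunk size,
// so that objects are split across several reads.
$handle = fopen('php://memory', 'w+');
fwrite($handle, '[{"id":1,"tags":["a}","b"]}, {"id":2,"nested":{"x":"{"}}]');
rewind($handle);

$ids = [];
foreach (streamJsonObjects($handle, 8) as $object) {
    $ids[] = $object['id'];
}
fclose($handle);
```

Only one object is held in memory at a time. Appending byte by byte is slow in PHP, so treat this as a starting point rather than something tuned for 200M files.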
This is a simple, streaming parser for processing large JSON documents. Use it for parsing very large JSON documents to avoid loading the entire thing into memory, which is how just about every other JSON parser for PHP works.
Aaron Averill
Recently I made a library called JSON Machine, which efficiently parses unpredictably big JSON files. Usage is via a simple foreach. I use it myself for my project.
Example:
See https://github.com/halaxa/json-machine
Filip Halaxa
There is http://github.com/sfalvo/php-yajl/. I haven't used it myself.
Alex Jasmin
Here's a function that I have used to get a nice simple array of all incoming files from a page. It basically just flattens the $_FILES array. It works with any number of file inputs on the page, including inputs declared as <input type="file" name="files[]" multiple>. Note that this function loses the file input names (I usually process the files just by type).
<?php
// Flatten $_FILES into a simple list of successfully uploaded files.
function incoming_files() {
    $files = $_FILES;
    $files2 = [];
    foreach ($files as $input => $infoArr) {
        $filesByInput = [];
        foreach ($infoArr as $key => $valueArr) {
            if (is_array($valueArr)) { // file input 'multiple'
                foreach ($valueArr as $i => $value) {
                    $filesByInput[$i][$key] = $value;
                }
            }
            else { // -> string, normal file input
                $filesByInput[] = $infoArr;
                break;
            }
        }
        $files2 = array_merge($files2, $filesByInput);
    }
    $files3 = [];
    foreach ($files2 as $file) { // let's filter empty & errors
        if (!$file['error']) $files3[] = $file;
    }
    return $files3;
}
$tmpFiles = incoming_files();
?>
will transform this:
Array
(
    [files1] => Array
        (
            [name] => facepalm.jpg
            [type] => image/jpeg
            [tmp_name] => /tmp/php3zU3t5
            [error] => 0
            [size] => 31059
        )
    [files2] => Array
        (
            [name] => Array
                (
                    [0] => facepalm2.jpg
                    [1] => facepalm3.jpg
                )
            [type] => Array
                (
                    [0] => image/jpeg
                    [1] => image/jpeg
                )
            [tmp_name] => Array
                (
                    [0] => /tmp/phpJutmOS
                    [1] => /tmp/php9bNI8F
                )
            [error] => Array
                (
                    [0] => 0
                    [1] => 0
                )
            [size] => Array
                (
                    [0] => 78085
                    [1] => 61429
                )
        )
)
into this:
Array
(
    [0] => Array
        (
            [name] => facepalm.jpg
            [type] => image/jpeg
            [tmp_name] => /tmp/php3zU3t5
            [error] => 0
            [size] => 31059
        )
    [1] => Array
        (
            [name] => facepalm2.jpg
            [type] => image/jpeg
            [tmp_name] => /tmp/phpJutmOS
            [error] => 0
            [size] => 78085
        )
    [2] => Array
        (
            [name] => facepalm3.jpg
            [type] => image/jpeg
            [tmp_name] => /tmp/php9bNI8F
            [error] => 0
            [size] => 61429
        )
)
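Once the list is flat, processing "just by type" becomes a single loop. The following is only a sketch: group_files_by_type() is a hypothetical helper, not part of the function above, and the sample data is copied from the flattened output. Keep in mind that the type value is sent by the browser, so it should not be trusted for security decisions.

```php
<?php
declare(strict_types=1);

// Sample data in the flattened shape produced by incoming_files().
$tmpFiles = [
    ['name' => 'facepalm.jpg',  'type' => 'image/jpeg', 'tmp_name' => '/tmp/php3zU3t5', 'error' => 0, 'size' => 31059],
    ['name' => 'facepalm2.jpg', 'type' => 'image/jpeg', 'tmp_name' => '/tmp/phpJutmOS', 'error' => 0, 'size' => 78085],
    ['name' => 'facepalm3.jpg', 'type' => 'image/jpeg', 'tmp_name' => '/tmp/php9bNI8F', 'error' => 0, 'size' => 61429],
];

// Hypothetical helper: groups flattened upload entries by their reported MIME type.
function group_files_by_type(array $files): array
{
    $grouped = [];
    foreach ($files as $file) {
        $grouped[$file['type']][] = $file;
    }
    return $grouped;
}

$byType = group_files_by_type($tmpFiles);

// Each group can then be handled by type-specific code.
foreach ($byType as $type => $filesOfType) {
    echo $type, ': ', count($filesOfType), " file(s)\n";
}
```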