
In this post I will demonstrate some practices that optimize the performance and reduce the memory footprint of a PHP script. Following them is especially useful for a task PHP is traditionally not used for: transforming large amounts of data.
PHP can handle this as well. It has become considerably faster in recent years, and it has had a command line interface since version 4.3.0. There are further arguments for transforming data with PHP: you can reuse your application's existing PHP code. If that code performs well, there is probably no need to rewrite parts of it in another language just for the bulk operations.
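- In PHP, arrays are assigned by value. The engine uses copy-on-write, so the array is actually duplicated as soon as one of the copies is modified. If you want to work on a large array through another variable or property, assign it explicitly by reference so it is not duplicated in memory:

```php
class RowHolder
{
    public $rows;

    // Take the array by reference; with a by-value parameter the
    // reference below would only point to the constructor's local copy.
    function __construct(array &$rows)
    {
        $this->rows = &$rows;
    }
}

$manyRows = $Stmt->fetchAll();
$RowHolder = new RowHolder($manyRows);
```

The same goes for iteration: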
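```php
foreach ($RowHolder->rows as &$row) {
    // ... modify $row in place
}
unset($row); // break the reference that foreach leaves behind
```

- Use prepared statements. Besides being more efficient, they prevent SQL injection: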
```php
$Stmt = $PDOConnection->prepare('SELECT * FROM table WHERE id = ?');
foreach ($ids as $id) {
    // execute() returns a bool, so fetch from the statement itself
    $Stmt->execute(array($id));
    $row = $Stmt->fetch();
    // ...
    $Stmt->closeCursor();
}
```

- Check whether you really need a class to represent a simple data structure that is used in vast numbers. Depending on the PHP version, memory usage can be higher, and you get longer `serialize()` strings. Imagine a country border, represented as an array of geo points. An array of `Point` objects will need more bytes than a plain array of arrays like this:

```php
$Border = new Geometry(
    Geometry::LinearRing,
    array(
        array(53.0749616, 87.867913),
        array(53.0719262, 87.840664),
        array(53.0706606, 87.819640),
        // ...
        array(53.0749616, 87.867913)
    )
);
```

I came across this when I created a framework for OpenGIS Simple Features. The standard defines a class hierarchy of geometric features. Starting the conventional way, I simply transferred it into a big class tree. But then I switched to a simpler approach, because all classes share an array of elements that form a geometry. I only define the parent class, `Geometry`. It is a container for a multi-dimensional array instead of a complex object tree.

- Memory management is still PHP's big issue. Objects with circular references in particular can cause memory leaks. While rendering a website this is not a problem, because the number of objects is rather small and memory is cleaned up after the script has finished. But looping through millions of objects makes memory usage grow until the script stops (sometimes without even an error message). There is no clean way to prevent these leaks. Raising `memory_limit` will not help the script complete if it runs for a long time; it only postpones the point at which it runs out of memory. I don't suggest trying to fix memory leaks completely; you will not succeed with complex structures. Writing a job script in PHP is not elegant, but there is a way to do it.

I suggest the following solution: let the script run on the command line until it reaches a memory limit you have defined. If the job is not finished by then, the script saves its state to a temporary file. On the next execution it reloads the state and continues from there. A shell script calls the job script repeatedly. It looks like this:
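```sh
#!/bin/sh
# Restart the job script as long as it exits with status 2
# ("state saved, call me again"); any other status ends the loop.
e=2
while [ $e -eq 2 ]
do
    php jobScript.php "$1"
    e=$?
done
exit $e
```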
And this is a scheme I use for jobScript.php:
```php
<?php
ini_set('memory_limit', '500M');

/* This one is for checking and should be significantly lower
   than the 'memory_limit' ini setting */
$memory_limit = 100 * 1000 * 1000;

/* Create State with the names of the variables that make up the
   current execution state. Example: object identifier of the
   current data row */
$State = new State(array(
    'stateVar1',
    'stateVar2',
    ...
), $dataset);

// Try to load the state variables into the script
if (!$State->load()) {
    // On the initial execution: initialize the state variables
    $stateVar1 = ...;
    $stateVar2 = ...;
    // On repeated executions, $State->load() does the trick
}

// This is the loop where your data is transformed
do {
    /* The job:
       - fetch a data chunk from the db depending on the state vars
       ...
       - compute heavy objects (and create memory leaks)
       - put data into the db
       - change the state vars */

    /* Calling unset() or an explicit __destruct() here did not
       prevent the memory leaks */

    if (memory_get_usage(true) > $memory_limit) {
        /* If memory usage has risen too high,
           save the current script state and exit */
        $State->save();
        exit(2);
    }
} while ($data_exists);

/* This time the script finishes: delete the script's state
   and exit with success */
$State->delete();
exit(0);
```

Pretty, huh? For me it has worked many times. This is the `State` class for download. Don't do this at home if your job finishes quickly or your script does not create memory leaks ;)
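The reference behaviour from the first tip can be checked with a minimal, self-contained sketch. It reuses the `RowHolder` idea, with the constructor parameter taken by reference (`array &$rows`); with a by-value parameter the property would only reference the constructor's local copy. A small hard-coded array stands in for `$Stmt->fetchAll()`, which is an assumption for illustration. The sketch shows semantics only, not memory savings: because the property is a reference, rows modified through `$holder->rows` change `$manyRows` itself rather than a separate copy.

```php
<?php
// Holder class in the spirit of RowHolder: the constructor takes the
// array by reference, so the property and the caller's variable share
// one array instead of two copies.
class RowHolder
{
    public $rows;

    public function __construct(array &$rows)
    {
        $this->rows = &$rows;
    }
}

// Stand-in for $Stmt->fetchAll(): a few rows as associative arrays.
$manyRows = [
    ['id' => 1, 'name' => 'alpha'],
    ['id' => 2, 'name' => 'beta'],
];

$holder = new RowHolder($manyRows);

// Modify every row in place; iterating by reference avoids copying each row.
foreach ($holder->rows as &$row) {
    $row['name'] = strtoupper($row['name']);
}
// Break the reference left behind by foreach.
unset($row);

// Because the property is a reference, the caller's array sees the change.
echo $manyRows[1]['name'], PHP_EOL; // prints "BETA"
```

The `unset($row)` after the loop matters: without it, `$row` remains a reference to the last element, and a later assignment to `$row` silently overwrites that element.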
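The `serialize()` part of the third tip is easy to verify. The sketch below assumes a hypothetical `Point` class with two public properties and compares the serialized size of three border vertices stored as plain arrays and as objects. It deliberately measures only string length; in-memory sizes depend on the PHP version and are not measured here.

```php
<?php
// Hypothetical value class for a geo point, as one might write it when
// mapping each vertex of a border to an object.
class Point
{
    public $lat;
    public $lon;

    public function __construct(float $lat, float $lon)
    {
        $this->lat = $lat;
        $this->lon = $lon;
    }
}

// Three vertices from the country border example.
$coords = [
    [53.0749616, 87.867913],
    [53.0719262, 87.840664],
    [53.0706606, 87.819640],
];

// Variant 1: plain nested arrays.
$asArrays = $coords;

// Variant 2: one Point object per vertex (arrow functions need PHP 7.4+).
$asObjects = array_map(fn(array $c) => new Point($c[0], $c[1]), $coords);

$arrayBytes  = strlen(serialize($asArrays));
$objectBytes = strlen(serialize($asObjects));

// The object variant repeats the class name and both property names
// for every vertex, so its serialized form is longer.
printf("arrays: %d bytes, objects: %d bytes\n", $arrayBytes, $objectBytes);
```

The difference comes from the format itself: every object is written with a `O:5:"Point":2:{...}` header plus the property names `lat` and `lon`, while every inner array carries only integer keys.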
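The checkpoint-and-restart idea can also be sketched in a self-contained form. This is not the downloadable `State` class. `FileState`, `runJob()` and the `$process` callback are hypothetical names, the saved state is reduced to a single cursor into an in-memory item list, and `runJob()` returns the exit code instead of calling `exit()`, so that one process can simulate several runs of the shell loop.

```php
<?php
// Hypothetical stand-in for the State class: persists named state
// variables in a file between runs of the job script.
final class FileState
{
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    // Returns the saved variables, or null if there is no saved state yet.
    public function load(): ?array
    {
        if (!is_file($this->path)) {
            return null;
        }
        // The state only holds scalars and arrays, so refuse to create objects.
        $data = unserialize(file_get_contents($this->path), ['allowed_classes' => false]);
        return is_array($data) ? $data : null;
    }

    public function save(array $vars): void
    {
        // Write to a temporary file first and rename it, so an interrupted
        // write cannot leave a truncated state file behind.
        $tmp = $this->path . '.tmp';
        file_put_contents($tmp, serialize($vars));
        rename($tmp, $this->path);
    }

    public function delete(): void
    {
        if (is_file($this->path)) {
            unlink($this->path);
        }
    }
}

/**
 * One run of the job. Resumes at the saved cursor, processes items one by
 * one and returns 2 ("state saved, call me again") as soon as memory use
 * exceeds $memoryLimit, or 0 once every item is done.
 */
function runJob(FileState $state, array $items, callable $process, int $memoryLimit): int
{
    $saved  = $state->load();
    $cursor = $saved['cursor'] ?? 0; // first run: start at the first item
    $total  = count($items);

    while ($cursor < $total) {
        $process($items[$cursor]);
        $cursor++;

        // Only checkpoint if work is left; otherwise finish below.
        if ($cursor < $total && memory_get_usage(true) > $memoryLimit) {
            $state->save(['cursor' => $cursor]);
            return 2;
        }
    }

    $state->delete();
    return 0;
}
```

In a real `jobScript.php` you would end with something like `exit(runJob($state, $items, $process, 100 * 1000 * 1000));`, so the shell loop keeps restarting the script while it returns 2. Because the cursor is saved right after an item is finished, a restart never skips an item; if the script dies unexpectedly, it repeats the items processed since the last save.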
Photos are by photocase users complize and mathias the dread