Updates from May, 2010

  • Vitamin

    Andi 17:14 on May 14, 2010

    Those who have already toiled enough can afford to be social, too.
    Toiling as a noble pursuit for even nobler things that follow and are even social!
    Music as a toiling catalyst …
    Because: cultural contributions are tied to participation in social systems.

    Whoever finds this post strange is not one of the rare few who once toiled too little, or is simply excited by few things.

     
  • Getting out more from PHP

    Andi 16:35 on June 27, 2009

    In this post I demonstrate some practices to improve the performance and reduce the memory footprint of a PHP script. Following them is especially useful for a task PHP is not traditionally used for: transforming a large bulk of data.

    PHP can handle this, because it has become faster in recent years and has had a stable command line interface since 4.3.0. There is another argument for transforming data with PHP: you reuse your application’s PHP code. If that code performs well itself, it is probably not necessary to rewrite parts of it in another language for the bulk operations.

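    1. PHP arrays are copy-on-write: an assignment or a by-value parameter shares the array until one of the copies is modified, and then the whole array is duplicated. If you modify large arrays, pass and assign them explicitly by reference, so they will not be duplicated in memory: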
      class RowHolder {
        public $rows;
        // Take the array by reference and bind the property to the caller's array
        function __construct(array &$rows) {
          $this->rows = &$rows;
        }
      }
      
      $manyRows = $Stmt->fetchAll();
      $RowHolder = new RowHolder($manyRows);

      The same goes for iteration:

      foreach ($RowHolder->rows as &$row) {
        ...
      }
      unset($row); // remove the reference that still points to the last row
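    2. Use prepared statements. The query is parsed once and can then be executed many times, which is more efficient (if the driver prepares statements natively rather than emulating them), and the bound parameters prevent SQL injection: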
      $Stmt = $PDOConnection->prepare(
        'SELECT * FROM table WHERE id = ?'
      );
      
      foreach ($ids as $id) {
        // execute() returns a boolean, so fetch from the statement itself
        $Stmt->execute(array($id));
        $row = $Stmt->fetch();
        ...
        $Stmt->closeCursor();
      }
    3. Check whether you really need a class to represent a simple data structure that is used in vast numbers. Objects can use more memory than arrays (notably up to PHP 5.3, where every object keeps its properties in a hash table of its own), and they produce longer serialize() strings, because the class name and each property name are stored. Imagine a country border, represented as an array of geo points. An array of Point objects can need more bytes than a plain array of arrays like this:
      $Border = new Geometry(
        Geometry::LinearRing,
        array(
          array(53.0749616,87.867913),
          array(53.0719262,87.840664),
          array(53.0706606,87.819640),
          ...
          array(53.0749616,87.867913)
        )
      );

      I came across this when I created a framework for OpenGIS Simple Features. The standard defines a class hierarchy of geometric features. Starting the conventional way, I transferred it into a big class tree. But then I switched to a simpler approach, because all these classes share an array of elements that forms the geometry. I define only the parent class, Geometry. It is a container for a multi-dimensional array instead of a complex object tree.

    4. Memory management is still PHP’s big issue. Objects with circular references in particular create memory leaks, because reference counting alone cannot free them. While rendering a website this is not a problem: the number of objects is rather small, and memory is released when the script ends. But looping through millions of objects makes memory use grow until the script stops (sometimes without any error message). There is no clear way to prevent these leaks. Raising memory_limit will not help a long-running script to complete. I don’t suggest trying to fix memory leaks completely; with complex structures you will not succeed. Writing a job script in PHP is not elegant, but there is a way to do it.

      I suggest the following solution: let the script run on the command line until it reaches a memory limit you have defined. If the job is not finished by then, the script saves its state to a temporary file and exits with status 2. On the next run it reloads the state and continues from there. A shell script calls the job script repeatedly as long as it exits with status 2. It looks like this:

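      #!/bin/sh
      # Call the job script again as long as it exits with status 2
      # (state saved, job not finished); $1 is passed through
      e=2
      while [ "$e" -eq 2 ]; do
        php jobScript.php "$1"
        e=$?
      done
      exit "$e"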

      And this is a scheme I use for jobScript.php:

      <?php
      ini_set('memory_limit', '500M');
      
      /*
      This one is for checking and should be
      significantly lower than the 'memory_limit' ini setting
      */
      $memory_limit = 100 * 1000 * 1000;
      
      /*
      Create State with names of variables,
      which are important for current execution state.
      Example:
      object identifier of current data row
      */
      $State = new State(array(
        'stateVar1',
        'stateVar2',
        ...
      ), $dataset);
      
      // Try to load variables into script
      if (!$State->load()) {
      
        // On initial execution: init state variables
        $stateVar1 = ...;
        $stateVar2 = ...;
        // On repeated execution: $State->load() does the trick
      }
      
      // This is the loop where your data is transformed
      do {
      
        /*
        The job:
        - fetch a data chunk from the db depending on the state vars ...
        - compute heavy objects (and create memory leaks)
        - write the data to the db
        - update the state vars
        - set $data_exists to false when there is no data left
        */
      
        /*
        Calling unset() or __destruct() explicitly
        here did not prevent the memory leaks
        */
      
        if (memory_get_usage(true) > $memory_limit) {
      
          /*
          If memory usage has risen too high,
          save current script state and exit
          */
          $State->save();
          exit(2);
        }
      } while ($data_exists);
      
      /*
      This time the script finishes - now delete
      the script's state and exit with success
      */
      $State->delete();
      exit(0);

      Pretty, huh? For me it has worked many times. This is the State class for download.

      Don’t do this at home if your job finishes quickly or your script does not leak memory ;)
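    Practice 1 can be sketched as a small self-contained script. It is a minimal example, assuming a few hypothetical rows in place of $Stmt->fetchAll(): the by-reference constructor and the by-reference foreach loop work on the caller's array, so the changes show up in $manyRows, while a plain assignment ($copy = $original) only duplicates the array once one side is written to.

```php
<?php
// Sketch for practice 1, with hypothetical sample rows instead of $Stmt->fetchAll().
class RowHolder {
    public $rows;

    // Taking the array by reference binds the property to the caller's variable
    public function __construct(array &$rows) {
        $this->rows = &$rows;
    }
}

$manyRows = [
    ['id' => 1, 'done' => false],
    ['id' => 2, 'done' => false],
];
$holder = new RowHolder($manyRows);

// Iterating by reference modifies each row in place instead of a copy
foreach ($holder->rows as &$row) {
    $row['done'] = true;
}
unset($row); // drop the reference that still points at the last row

// The property references $manyRows, so the caller sees the changes
var_dump($manyRows[0]['done'], $manyRows[1]['done']);

// Copy-on-write: a plain assignment shares the array until one side is written
$original = [1, 2, 3];
$copy = $original;
$copy[] = 4; // this write duplicates the array; $original keeps 3 elements
var_dump(count($original), count($copy));
```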
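    For practice 2, here is a runnable sketch against an in-memory SQLite database. It assumes the pdo_sqlite extension is available; the table items and its rows are hypothetical. The SELECT statement is prepared once and executed for each id, and the injection attempt "2 OR 1=1" is bound as a plain value, so it simply matches no row.

```php
<?php
// Sketch for practice 2. Assumes pdo_sqlite; the "items" table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)');

// Fill the table, also with a prepared statement
$insert = $pdo->prepare('INSERT INTO items (id, name) VALUES (?, ?)');
foreach ([1 => 'alpha', 2 => 'beta', 3 => 'gamma'] as $id => $name) {
    $insert->execute([$id, $name]);
}

// Prepare once ...
$select = $pdo->prepare('SELECT name FROM items WHERE id = ?');

$names = [];
foreach ([3, 1, '2 OR 1=1'] as $id) {
    // ... execute many times; execute() returns a bool, so fetch from $select
    $select->execute([$id]);
    $row = $select->fetch(PDO::FETCH_ASSOC);
    // The bound value is never parsed as SQL, so '2 OR 1=1' finds nothing
    $names[] = $row === false ? null : $row['name'];
    $select->closeCursor();
}

print_r($names);
```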
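    For practice 3, the difference in serialize() output is easy to check. The sketch below uses a hypothetical Point class and three coordinates from the border example. It compares only the length of the serialized strings; actual memory use depends on the PHP version, so it is not measured here.

```php
<?php
// Sketch for practice 3. The Point class is hypothetical.
class Point {
    public $lat;
    public $lon;

    public function __construct($lat, $lon) {
        $this->lat = $lat;
        $this->lon = $lon;
    }
}

$coords = [
    [53.0749616, 87.867913],
    [53.0719262, 87.840664],
    [53.0706606, 87.819640],
];

// Same data twice: as plain arrays and as Point objects
$asArrays = $coords;
$asObjects = [];
foreach ($coords as $c) {
    $asObjects[] = new Point($c[0], $c[1]);
}

// Objects serialize with their class name and property names, arrays with integer keys
$arrayLength = strlen(serialize($asArrays));
$objectLength = strlen(serialize($asObjects));

printf("arrays: %d bytes, objects: %d bytes\n", $arrayLength, $objectLength);
```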
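    The single-container approach from practice 3 might look roughly like this. This is a hypothetical sketch, not the framework described in the post: the Geometry class keeps a type constant and a nested array, and numPoints() and isClosed() are illustrative helpers (in OpenGIS Simple Features a LinearRing is closed, so its first and last points coincide).

```php
<?php
// Hypothetical Geometry container: one class, a type constant and nested arrays.
class Geometry {
    const Point = 'Point';
    const LineString = 'LineString';
    const LinearRing = 'LinearRing';

    public $type;
    public $elements;

    public function __construct($type, array $elements) {
        $this->type = $type;
        $this->elements = $elements;
    }

    // Number of coordinate pairs in this geometry
    public function numPoints() {
        return count($this->elements);
    }

    // A ring is closed when its first and last points are identical
    public function isClosed() {
        $n = $this->numPoints();
        return $n > 0 && $this->elements[0] === $this->elements[$n - 1];
    }
}

$Border = new Geometry(
    Geometry::LinearRing,
    [
        [53.0749616, 87.867913],
        [53.0719262, 87.840664],
        [53.0706606, 87.819640],
        [53.0749616, 87.867913],
    ]
);

var_dump($Border->numPoints(), $Border->isClosed());
```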
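    The State class for practice 4 was offered as a download and is not included here. The following is a hypothetical sketch that supports the calls made in jobScript.php (the constructor with variable names and $dataset, load(), save() and delete()). It assumes the job runs at the top level of the script, so the state variables are globals reachable through $GLOBALS, and it stores them with serialize() in a file in the system temp directory; the file name scheme is made up.

```php
<?php
// Hypothetical State class: only the calls used in jobScript.php
// (constructor, load(), save(), delete()) are taken from the post.
// Assumption: state variables are globals, reachable through $GLOBALS.
class State {
    private $names;
    private $file;

    public function __construct(array $names, $dataset) {
        $this->names = $names;
        // One state file per dataset in the system temp directory
        $this->file = sys_get_temp_dir() . '/jobstate_' . md5(serialize($dataset));
    }

    // Restore the saved variables; returns false if there is no usable state
    public function load() {
        if (!is_file($this->file)) {
            return false;
        }
        $values = unserialize(file_get_contents($this->file));
        if (!is_array($values)) {
            return false;
        }
        foreach ($values as $name => $value) {
            $GLOBALS[$name] = $value;
        }
        return true;
    }

    // Write the current values of the named variables to the state file
    public function save() {
        $values = [];
        foreach ($this->names as $name) {
            $values[$name] = isset($GLOBALS[$name]) ? $GLOBALS[$name] : null;
        }
        return file_put_contents($this->file, serialize($values), LOCK_EX) !== false;
    }

    // Remove the state file once the job has finished
    public function delete() {
        if (is_file($this->file)) {
            unlink($this->file);
        }
    }
}

// Demo: one "run" saves its progress, the next one resumes from it
$State = new State(['offset'], 'demo-dataset');
$State->delete();              // start clean for this demo
$firstLoad = $State->load();   // false: nothing saved yet
$offset = 1000;                // progress made before the memory limit was hit
$State->save();

unset($offset);                // simulate a fresh process
$secondLoad = $State->load();  // true: $offset is restored
echo "resuming at offset $offset\n";
$State->delete();
```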
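    Related to practice 4: since PHP 5.3 the engine includes a cycle collector, and gc_collect_cycles() forces a collection run. The sketch below creates pairs of objects that reference each other (the Node class is hypothetical) and shows that an explicit run frees them. Whether this is enough for a complex job is not guaranteed, so the restart approach from practice 4 remains a fallback.

```php
<?php
// Sketch: the cycle collector (PHP 5.3+) frees circular references
// that reference counting alone never releases.
class Node {
    public $partner;
}

function makeCycles($count) {
    for ($i = 0; $i < $count; $i++) {
        $a = new Node();
        $b = new Node();
        $a->partner = $b; // a points to b ...
        $b->partner = $a; // ... and b back to a, so neither refcount reaches zero
    }
}

gc_enable();                      // on by default (zend.enable_gc); made explicit here
makeCycles(1000);                 // leaves unreachable object pairs behind
$collected = gc_collect_cycles(); // force a collection run; returns how much was freed
printf("collected %d, memory in use: %d bytes\n", $collected, memory_get_usage());
```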

    Photos are by photocase users complize and mathias the dread

     