Thursday, October 21, 2010

PHP function choices for speed.

Sometimes a user-defined function in PHP takes a long time to run and you are not sure why.  The best way to find out what is taking so long is to use a profiler such as XDebug for PHP, which produces cachegrind files.  I am not going to go into detail on how to set all of that up, but I have provided some examples of things to look out for. Here are some of the biggest ones we found in our application:

strstr() vs strpos()
   When looking for the existence of a substring inside a string, I have noticed that most of our developers tend to use strstr(), although it is the wrong tool for that job: strstr() returns a copy of part of the string, while strpos() only returns a position.  The PHP manual notes that strpos() is faster and less memory intensive for this purpose (http://us.php.net/manual/en/function.strstr.php).  We have tried to move all of our string checks to strpos().
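
As a minimal sketch of that switch (the sample strings and variable names are made up for illustration), an existence check with strpos() could look like the following. Note the strict comparison against false: strpos() returns 0 when the match is at the very start of the string, and 0 is falsy in PHP.

```php
<?php
// Hypothetical sample data, for illustration only.
$haystack = "needle in a haystack";
$needle   = "needle";

// strstr() returns the matching tail of the string, which is then thrown away.
$foundWithStrstr = (strstr($haystack, $needle) !== false);

// strpos() only returns the offset of the match (or false if there is none).
// Compare with !== false: the match here is at offset 0, which is falsy.
$foundWithStrpos = (strpos($haystack, $needle) !== false);

// A loose check wrongly reports "not found" for a match at offset 0.
$looseCheck = (bool) strpos($haystack, $needle);

var_dump($foundWithStrstr, $foundWithStrpos, $looseCheck);
```

The last variable is only there to show the pitfall; it is the form to avoid.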

strpos() vs strpbrk()
    You may be saying, "Wait, you just said strpos() is the way to go," and in most cases it is.  The exception is when you need to check for the existence of any one of several single characters. Here is an example:

// strpos()
if(strpos($mystring, "<") !== false || strpos($mystring, ">") !== false || strpos($mystring, "&") !== false) {
       //my string contains at least one of those characters
}

// This way is much faster:
if(strpbrk($mystring, "<>&") !== false) {
       //my string contains at least one of those characters
}
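
If you want to check a claim like this against your own data, a rough timing loop is often enough. The sketch below is only an illustration: the test string and the iteration count are assumptions, and the numbers it prints will vary with PHP version, hardware, and input.

```php
<?php
// Hypothetical test data and iteration count; adjust to match your workload.
$mystring   = str_repeat("plain text without markup ", 50) . "&";
$iterations = 100000;

// Variant 1: up to three separate strpos() calls per check.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaStrpos = strpos($mystring, "<") !== false
        || strpos($mystring, ">") !== false
        || strpos($mystring, "&") !== false;
}
$strposSeconds = microtime(true) - $start;

// Variant 2: a single strpbrk() call that looks for any of the characters.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaStrpbrk = strpbrk($mystring, "<>&") !== false;
}
$strpbrkSeconds = microtime(true) - $start;

printf("strpos x3: %.4f s\n", $strposSeconds);
printf("strpbrk:   %.4f s\n", $strpbrkSeconds);
```

For anything beyond a quick sanity check, a profiler run against the real application is a better guide than a loop like this.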

Case-sensitive vs Case-insensitive for string comparison
     You may think that doing everything case-insensitively makes things easier for your users. While this may be true, case-insensitive functions are generally slower because they have to do extra work to ignore case before they can decide whether the string matches. I suggest you use case-sensitive functions where you can and use the case-insensitive functions only if you have to.
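
A small sketch of the choice, using strpos() and its case-insensitive counterpart stripos(). The header string is a made-up example, and the "normalize once" variant at the end is just one possible approach, not something measured in our application.

```php
<?php
// Hypothetical input, for illustration only.
$header = "Content-Type: text/xml";

// Case-sensitive: the cheaper option when you control the casing.
$exact = strpos($header, "Content-Type") !== false;

// The same case-sensitive check misses a differently cased needle.
$missed = strpos($header, "content-type") !== false;

// Case-insensitive: use it only when the casing really can vary.
$anyCase = stripos($header, "content-type") !== false;

// One possible alternative when the same string is checked several times:
// lowercase it once, then use the case-sensitive function for each check.
$lower   = strtolower($header);
$hasType = strpos($lower, "content-type") !== false;
$hasXml  = strpos($lower, "xml") !== false;

var_dump($exact, $missed, $anyCase, $hasType, $hasXml);
```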


Redundant checks
     Sometimes a redundant check can make things go much faster. Let's say we have a function that tries to detect if a string is XML.  There are a few cheap checks we can do that will weed out the non-XML strings before we run some complex regex matches against it. Let's look at the following code:


function isXML($string){
    if(is_string($string) && strlen($string) > 3){

        if( strpos($string,"<"."?xml")!==false ){
            $string=preg_replace('/<\?xml(.*)?\?>/', "", $string, 1 );
        }

        if(strpos($string, "<") === false && strpos($string, "&lt;") === false) {
            return false;
        }

        if( strpos($string,"\n")!==false ) {
            $string=str_replace("\n", "", $string);
        }

        $m=array(
        // we check it without namespaces first because it is faster
        '/^\<(\w+).*\1\>$/',
        '/^\<(\w+).*\/\>$/',
        // same checks for entity-encoded markup (&lt; ... &gt;)
        '/^&lt;(\w+?).*\1&gt;$/',
        '/^&lt;(\w+?).*\/&gt;$/',

        // then the variants that allow a namespace prefix (e.g. ns:tag)
        '/^\<(\w*[:]*\w+).*\1\>$/',
        '/^\<(\w*[:]*\w+).*\/\>$/',
        '/^&lt;(\w*[:]*\w+?).*\1&gt;$/',
        '/^&lt;(\w*[:]*\w+?).*\/&gt;$/'
        );
        foreach($m as $i){
            if( preg_match($i, $string) ) return true;
        }
    }
    return false;
}

So here you see the first thing I check is whether it is a string and whether the length of that string is greater than 3.  I do this because the shortest XML you could have is something like "<a/>", which is 4 characters. Next I remove the XML declaration, because some of our stuff has that in there (like stored PHP code) and that would produce a false positive.  Next, does it have '<' or '&lt;'? If not, there is no way it can be an XML string. It goes on, but the important thing to see here is the order in which I do it. I am trying to strike a balance by running the most likely checks and the fastest checks first.

Condition order
    Also, one thing many people overlook when trying to speed up their application is condition order.  When PHP is evaluating multiple conditions in a single statement, it will stop at the first one that decides the result. So if you use && between all of your conditions, it will stop on the first one that returns false and will not evaluate the other conditions.  In the same way, if you use || between all of your conditions, it will stop on the first one that is true and not evaluate the other conditions. So, you will want to order your conditions by the likelihood that each one will match and by the speed of the operation. Let's look at an example:


// a bad way to order it
if( is_bool($value) || is_numeric($value) || !empty($value) ) {
    // Do something
}


// a much better way to order it
if( !empty($value) || is_numeric($value) || is_bool($value) ){
    // Do something
}

In the first version, the first thing it checks is is_bool(), then is_numeric(), then whether the value is not empty.  The empty() function returns true for both false and 0, so !empty() rejects them and we check for those separately.  Now you may need to order these differently for your use, but for our use, most of the time $value is going to be a non-empty string.  Other times it may be 0 or false.  Since nearly everything we pass in except 0 and false satisfies !empty(), I do that check first, which means the other two checks are usually skipped.
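
To see the short-circuit behavior for yourself, here is a small sketch. The tracked() helper is hypothetical and exists only for this demonstration: it records that it was called and then returns the value it was given.

```php
<?php
// Hypothetical helper: records that it was called, then returns $result.
function tracked(string $name, bool $result, array &$calls): bool
{
    $calls[] = $name;
    return $result;
}

// || stops at the first true operand, so "slow" is never evaluated.
$calls    = [];
$orResult = tracked("cheap", true, $calls) || tracked("slow", true, $calls);
$orCalls  = $calls;

// && stops at the first false operand, so "slow" is skipped here too.
$calls     = [];
$andResult = tracked("cheap", false, $calls) && tracked("slow", true, $calls);
$andCalls  = $calls;

print_r($orCalls);
print_r($andCalls);
```

In both cases only the first call is recorded, which is why putting the cheap, most decisive condition first saves work.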

So keep the little stuff in mind when designing your code, because when your function is called 12,000 times in a single page query, that 10 millisecond difference can really add up.

There are many more things to look for, but I will save those for a different post.
