Thursday, October 21, 2010

PHP function choices for speed.

Sometimes a user-defined function in PHP takes a long time to run and you are not sure why.  The best way to find out what is taking so long is to use a profiler such as XDebug for PHP, which produces cachegrind files.  I am not going to go into detail on how to set all of that up, but I have provided some examples of things to look out for. Here are some of the biggest ones we found in our application:

strstr() vs strpos()
   When checking whether a substring exists inside a string, I have noticed that most of our developers tend to use strstr(), which is the wrong tool for the job.  strstr() returns the part of the string starting at the match, while strpos() only returns the position, which makes strpos() faster and less memory intensive (http://us.php.net/manual/en/function.strstr.php).  We have tried to move all of our string checks over to strpos().
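
Here is a minimal sketch of that kind of existence check. The helper name containsSubstring() is made up for illustration and is not from our application.

```php
<?php
// Hypothetical helper: returns true if $needle occurs anywhere in $haystack.
function containsSubstring($haystack, $needle)
{
    // strpos() returns the integer offset of the first match, or false.
    // Compare with !== false, because a match at offset 0 is falsy.
    return strpos($haystack, $needle) !== false;
}

// For comparison, strstr() returns a copy of the string from the match onward:
//   strstr("user@example.com", "@")  gives "@example.com"
//   strpos("user@example.com", "@")  gives 4

var_dump(containsSubstring("user@example.com", "@")); // bool(true)
var_dump(containsSubstring("<b>bold</b>", "<b>"));    // bool(true), match at offset 0
var_dump(containsSubstring("plain text", "<"));       // bool(false)
```

The strict !== false comparison matters: a plain if(strpos(...)) would treat a match at the very start of the string as "not found".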

strpos() vs strpbrk()
    You may be saying, "Wait, you just said strpos() is the way to go," and in most cases it is.  The exception is when you need to check for the existence of any one of several single characters. Here is an example:

// strpos()
if(strpos($mystring, "<") !== false || strpos($mystring, ">") !== false || strpos($mystring, "&") !== false) {
       //my string contains at least one of those characters
}

// This way is much faster:
if(strpbrk($mystring, "<>&") !== false) {
       //my string contains at least one of those characters
}
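
If you want to check this on your own setup, here is a rough timing sketch. The timeIt() helper, the sample string and the iteration count are all assumptions made for illustration; the numbers will vary with PHP version and input, so treat it as a way to measure rather than as proof.

```php
<?php
// Hypothetical timing helper: calls $fn $iterations times, returns elapsed seconds.
function timeIt($fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Sample input with none of the characters, so every check scans the whole string.
$mystring = str_repeat("plain text without markup ", 40);

// Three separate scans, one per character.
$withStrpos = function () use ($mystring) {
    return strpos($mystring, "<") !== false
        || strpos($mystring, ">") !== false
        || strpos($mystring, "&") !== false;
};

// One scan that looks for any of the three characters.
$withStrpbrk = function () use ($mystring) {
    return strpbrk($mystring, "<>&") !== false;
};

printf("strpos  x3: %.4f s\n", timeIt($withStrpos, 100000));
printf("strpbrk x1: %.4f s\n", timeIt($withStrpbrk, 100000));
```

Both closures return the same answer for the same input, so only the timing differs.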

Case-sensitive vs Case-insensitive for string comparison
     You may think that doing everything case-insensitively makes things easier for your users. While this may be true, case-insensitive comparison is slower because it has to do extra work to ignore the case of every character it compares. I suggest you use the case-sensitive functions where you can and use the case-insensitive functions only if you have to.
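
Here is a small sketch of the two kinds of search side by side. The containsAnyLowercased() helper is hypothetical and shows just one possible approach when case really has to be ignored: lowercase the string once and then run the case-sensitive strpos() against it.

```php
<?php
// Case-sensitive search: the needle must match exactly.
var_dump(strpos("Content-Type: text/xml", "text/xml") !== false);  // bool(true)
var_dump(strpos("Content-Type: TEXT/XML", "text/xml") !== false);  // bool(false)

// Case-insensitive search: use it only where the input case is unpredictable.
var_dump(stripos("Content-Type: TEXT/XML", "text/xml") !== false); // bool(true)

// Hypothetical helper: checks one string against several needles.
// Assumes every needle is already lowercase.
function containsAnyLowercased($haystack, array $needles)
{
    $lower = strtolower($haystack); // convert once, not once per needle
    foreach ($needles as $needle) {
        if (strpos($lower, $needle) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(containsAnyLowercased("Content-Type: TEXT/XML", array("text/html", "text/xml"))); // bool(true)
```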


Redundant checks
     Sometimes a redundant check can make things go much faster. Let's say we have a function that tries to detect whether a string is XML.  There are a few cheap checks we can do that will weed out the non-XML strings before we run some complex regex queries against them. Let's look at the following code:


function isXML($string){
    if(is_string($string) && strlen($string) > 3){

        if( strpos($string,"<"."?xml")!==false ){
            $string=preg_replace('/<\?xml(.*)?\?>/', "", $string, 1 );
        }

        if(strpos($string, "<") === false && strpos($string, "&lt;") === false) {
            return false;
        }

        if( strpos($string,"\n")!==false ) {
            $string=str_replace("\n", "", $string);
        }

        $m=array(
        // we check it without namespaces first because it is faster
        '/^\<(\w+).*\1\>$/',
        '/^\<(\w+).*\/\>$/',
        '/^&lt;(\w+?).*\1&gt;$/',
        '/^&lt;(\w+?).*\/&gt;$/',

        '/^\<(\w*[:]*\w+).*\1\>$/',
        '/^\<(\w*[:]*\w+).*\/\>$/',
        '/^&lt;(\w*[:]*\w+?).*\1&gt;$/',
        '/^&lt;(\w*[:]*\w+?).*\/&gt;$/'
        );
        foreach($m as $i){
            if( preg_match($i, $string) ) return true;
        }
    }
    return false;
}

So here you see the first thing I check is whether it is a string and whether the length of that string is greater than 3.  I do this because the shortest XML you could have is something like "<a/>", which is 4 characters. Next I remove the XML declaration, because some of our stuff has that in there (like stored PHP code) and it would produce a false positive.  Next, does it have '<' or '&lt;'? If not, there is no way it can be an XML string. It goes on, but the important thing to see here is the order in which I do it. I am trying to strike a balance between doing the fastest checks first and doing the checks most likely to decide the answer first.
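
To make the pay-off visible, here is a stripped-down sketch of the same idea. looksLikeTag() is a hypothetical, much simpler stand-in for isXML() (it is not the function above), and the $regexRuns counter exists only to show how often the expensive step actually runs.

```php
<?php
// Hypothetical, simplified stand-in for isXML(): cheap checks first, regex last.
// $regexRuns is incremented each time the regex is actually executed.
function looksLikeTag($string, &$regexRuns)
{
    // Cheapest checks first: type and minimum length ("<a/>" is 4 characters).
    if (!is_string($string) || strlen($string) <= 3) {
        return false;
    }
    // One scan for "<" rules out most plain text.
    if (strpos($string, "<") === false) {
        return false;
    }
    // Only strings that survive the guards pay for the regex.
    $regexRuns++;
    return preg_match('/^<(\w+).*(\1|\/)>$/', $string) === 1;
}

$regexRuns = 0;
$inputs = array("hello world", 42, "<a/>", "<b>bold</b>", "a < b", "");
foreach ($inputs as $input) {
    var_dump(looksLikeTag($input, $regexRuns));
}
echo "regex ran $regexRuns times for " . count($inputs) . " inputs\n";
// prints: regex ran 3 times for 6 inputs
```

Half of the sample inputs never reach preg_match() at all, and in a function that is called thousands of times per page that is where the savings come from.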

Condition order
    Also, one thing many people overlook when trying to speed up their application is condition order.  When PHP evaluates multiple conditions in a single statement, it stops at the first one that determines the result. So if you use && between all of your conditions, it will stop on the first one that returns false and will not evaluate the others.  In the same way, if you use || between all of your conditions, it will stop on the first one that is true and not evaluate the others. So you will want to order your conditions by both the likelihood that each one decides the result and the speed of the operation. Let's look at an example:


// a bad way to order it
if( is_bool($value) || is_numeric($value) || !empty($value) ) {
    // Do something
}


// a much better way to order it
if( !empty($value) || is_numeric($value) || is_bool($value) ){
    // Do something
}

In the first example, the first thing it checks is is_bool(), then is_numeric(), and only then whether the value is not empty.  The empty() function returns true for both false and 0, so !empty() fails for them and we check for those separately.  You may need to order these differently for your use, but for our use, most of the time $value is going to be a string.  Other times it may be 0 or false.  Since every value except empty ones such as 0 and false passes the !empty() check, I do it first, which means most calls stop after a single check.
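
Here is a small sketch that shows the short-circuiting directly. The check() wrapper and the checksRun() function are hypothetical and exist only to record which conditions PHP actually evaluated for a given value.

```php
<?php
// Hypothetical wrapper: records the name of each check that runs, then passes the result through.
function check($name, $result, array &$log)
{
    $log[] = $name;
    return $result;
}

// Runs the "better" ordering and returns the names of the checks that were evaluated.
function checksRun($value)
{
    $log = array();
    if (check("!empty", !empty($value), $log)
        || check("is_numeric", is_numeric($value), $log)
        || check("is_bool", is_bool($value), $log)) {
        // Do something
    }
    return $log;
}

echo implode(", ", checksRun("some string")), "\n"; // !empty
echo implode(", ", checksRun(0)), "\n";             // !empty, is_numeric
echo implode(", ", checksRun(false)), "\n";         // !empty, is_numeric, is_bool
```

For the common case, a non-empty string, only one of the three checks runs. The rarer values 0 and false are the only ones that pay for the extra checks.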

So keep in mind the little stuff when designing your code, because when your function is called 12,000 times in a single page query that 10 millisecond difference can really add up.

There are many more things to look for, but I will save those for a different post.

Friday, October 15, 2010

New Servers

I talked with one of the salespeople at a new hosting company about some new servers.  What they have to offer sounds really good, but kind of expensive.  It is so going to be worth it to have them fully maintain the server.  We are going to go with the EMS services from NeoSpire at http://www.neospire.net .  It may still take a few weeks to get everything set up and moved over, so wish us luck!

Thursday, October 14, 2010

Servers were down

Well, our Amazon EC2 app server locked up last night and I went to re-launch it ... it failed to start ... for 7 hours!!!  At 2:00am I finally got it back up and running.  This was caused by a combination of things:
     a) our server startup script, which installs our software and sets every setting our software needs, had some incorrect stuff in it, and
     b) Scalr managed to lose my VHost config for our server, so I did not have SSL and mod_rewrite was broken.
I am pretty upset right now.  Our app server has locked up several times in the past couple of months on Amazon and we have no way of telling why.  The only thing we can do is forcefully terminate the instance and re-launch it, which destroys all of the logs :(.
Sorry to all of our Australian customers.

We will probably be moving away from Amazon EC2 and going back to dedicated servers.  Amazon EC2 sounds like the perfect solution for us on paper, but in practice it just isn't going to work.

Wednesday, October 13, 2010

Billing our customers

We just rolled out a new version of our software and we have not been able to bill our customers.  Pretty scary huh?  Now today the owners are breathing down our necks to get the billing up and going as soon as possible so we are trying to throw some stuff together to get it done.  We will see how it goes.

Monday, October 11, 2010

About Me

Hello, my name is Anthony and I am a lead developer at a software company that sells web applications as a service.  It primarily targets gymnastics and cheerleading facilities and has clients worldwide. I am very good with PHP, JavaScript and HTML.  I plan to post my experiences here about my day-to-day dealings with programming, gymnastics and customers.  I do wish to keep my full name and my company name secret to protect both me and them.