Micro Optimizations That Don’t Matter

Out Of Date Warning

Languages change. Perspectives are different. Ideas move on. This article was published on October 26, 2009 which is more than two years ago. It may be out of date. You should verify that technical information in this article is still current before relying upon it for your own purposes.

Last week I wrote about some optimizations you can apply to your code that will improve the performance of your site significantly. I also mentioned that articles regularly pop up promising ways to shave time off your scripts, and that most of these articles are bunk. Like this one.

The article I linked above is a run-of-the-mill micro optimization list. The difference here is that the author actually makes use of some benchmarks to make their point. So, let’s go step by step and discover together why this article takes longer to read than the amount of CPU time it saves.

Loops
The author asserts that it is best to calculate the maximum value for a for loop outside of the declaration of the loop. Inadvertently, the author stumbles upon a tried-and-true programming technique: don’t repeat yourself.

The code sample:

# Worse than foreach and while loop
for ($i = 0; $i < count($array); $i++) {
    echo 'This is bad, my friend';
}

# Better than foreach and while loop
$total = (int) count($array);
for ($i = 0; $i < $total; $i++) {
    echo 'This is great, my friend';
}

This is a true “duh!” moment. Of course a loop is faster when you don’t run a function each time you iterate through it! But bear in mind that if you think a “for” loop is the fastest loop available, you’d be just as surprised as Sara Golemon, one of the internals developers, to find out that it is not.

Loops are a necessary part of programming, even nested loops sometimes (which he argues against). Don’t forgo loops just because you’re worried about performance. If you use them right, performance won’t even be a factor.
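To make the point concrete, here is a minimal sketch (the sample $array is hypothetical, not taken from the original article). Both the hoisted for loop and a plain foreach visit each element exactly once, and foreach does not need a counter at all:

```php
<?php
// Hypothetical sample data for illustration.
$array = array('a', 'b', 'c');

// Hoisted bound: count() runs once instead of on every iteration.
$total = count($array);
$viaFor = array();
for ($i = 0; $i < $total; $i++) {
    $viaFor[] = $array[$i];
}

// foreach needs neither a counter nor a bound.
$viaForeach = array();
foreach ($array as $value) {
    $viaForeach[] = $value;
}
```

Which one you pick should mostly come down to readability; both produce the same result here.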

Single Vs. Double Quotes
This is one of the two big PHP optimization nightmares that just won’t die. The argument goes like this: because PHP has to parse a double-quoted string twice looking for variables, it is inherently slower than a single quoted string.

The argument isn’t necessarily true.

I did my own benchmark this morning on PHP 5.2.10, and discovered something interesting: when PHP evaluates a simple string that is double quoted and a simple string that is single quoted (that is, there is not a variable in either string), the single quoted string actually runs slower than the double-quoted string. Yep. I’m serious.

Double Quotes Test: 11.838791131973
Single Quotes Test: 11.952278137207

That is my average for five runs of 10,000,000 iterations over a single quoted string and a double-quoted string.

When I added a variable (concatenated on the single-quoted string), the two performed as expected: the double-quoted string performed slower.

What does this mean? It means that PHP appears to be smart enough not to parse the double-quoted string twice, if it doesn’t have to. PHP seems to optimize for you, meaning you don’t have to optimize yourself. It’s also worth noting that in 10,000,000 iterations, the average difference between the two was 3/10ths of a second. If you’re trying to save 3/10ths of a second, you may have other areas worth refactoring.
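If you want to run this kind of comparison yourself, a rough harness might look like the sketch below. This is not the script I used: the benchmark() helper is a hypothetical name, the iteration count is much smaller, and the closures require PHP 5.3 or later. The closure call overhead will likely dominate the measurement, so treat the numbers as a sanity check rather than proof.

```php
<?php
// Hypothetical helper: time $iterations calls of $fn and return seconds elapsed.
function benchmark($fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Far fewer iterations than the 10,000,000 used in the article.
$iterations = 100000;

$single = benchmark(function () { return 'a simple string'; }, $iterations);
$double = benchmark(function () { return "a simple string"; }, $iterations);

printf("Single quotes: %.6f s\nDouble quotes: %.6f s\n", $single, $double);
```

Run it several times and average the results; individual runs will vary with whatever else the machine is doing.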

Pre-Increment Versus Post-Increment
I benchmarked this micro-optimization tip and found that pre-increment is in fact faster than post-increment, by about 5%. That’s a major performance boost, right?

Wrong.

The amount of time I saved on average was 5/100ths of a second. That’s not even enough time for me to have typed the last sentence, or for you to have read it. And I was doing 10,000,000 operations.

The benchmark looks great (a 5% increase in performance) until you look at the absolute numbers. A percentage says nothing about how much time is actually saved, and that is the trap most micro-benchmarks fall into.
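The arithmetic is worth spelling out. Using only the two figures quoted above (5/100ths of a second saved across 10,000,000 operations), the saving per increment works out like this:

```php
<?php
$savedSeconds = 0.05;     // 5/100ths of a second, from the benchmark above
$operations   = 10000000; // 10,000,000 increments

// Time saved by a single increment, in nanoseconds.
$perOperationNs = $savedSeconds / $operations * 1e9;

// How many increments one request would need before the saving reaches 1 ms.
$incrementsPerMillisecond = (int) round(0.001 / ($savedSeconds / $operations));

printf("Saved per increment: %.1f ns\n", $perOperationNs);
printf("Increments needed to save 1 ms: %d\n", $incrementsPerMillisecond);
```

That is about 5 nanoseconds per increment, so a single request would need roughly 200,000 increments before the difference added up to one millisecond.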

Absolute Path vs. Relative Path
As much as I hate to admit it, this does have some merit – if you’re including tons and tons of files. For me, including a file via relative path was 11x slower than including a file with an absolute path. However, using absolute paths does create a problem.

Absolute paths make code completely and 100% non-portable. Sure, you can set a constant that contains the absolute path – and I recommend you do this anyway – but you’re still going to lose performance as the parser converts the constant into a string to use in the path definition. Also, an absolute path for everything can be a headache.

PHP includes options for autoloading and using an include path. Relative paths (or the include path) may be some 11x slower, but we’re talking 0.005 seconds for an absolute path versus 0.11 seconds for a relative path; neither figure gets anywhere near a full second.
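The constant approach mentioned above can be sketched as follows. APP_ROOT and the lib/helpers.php layout are hypothetical names chosen for illustration; the idea is simply that the absolute path is computed from the current file's location, so the code still works when the project moves.

```php
<?php
// APP_ROOT is a hypothetical constant name. dirname(__FILE__) resolves to the
// directory containing this file, wherever the project happens to be deployed.
define('APP_ROOT', dirname(__FILE__));

// Hypothetical layout: lib/helpers.php lives next to this script.
$path = APP_ROOT . '/lib/helpers.php';

// require $path; // uncomment once the file actually exists
echo $path, "\n";
```

You get an absolute path at include time without hard-coding a server-specific directory anywhere in the code.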

Echo vs. Print
Is print() slower than echo()? Yes.

Does it matter? No.

Why doesn’t it matter? It doesn’t matter because after 10,000,000 iterations, there was less than 1-second difference between the two. Less. Than. A. Second. Think about that. The time you save by using one over the other isn’t even going to register; it’ll look more like a rounding error when you consider and compute your stats.

The benchmark that is cited here basically says as much. The slowest measure of using print() was 456 microseconds. This seems like a lot until you consider that 456 microseconds is 0.000456 seconds. How fast is that? You spent far longer than 456 microseconds reading the last sentence. Sure, the fact that print() is up to 5x slower might scare you, but when you consider it in terms of actual time spent, that should put it in perspective.

It’s worth noting that there’s something that every PHP developer should know: generally speaking, a language construct (like echo and isset()) will run faster than a function (like array_key_exists()). However, in most cases you have to pick the right tool for the job rather than going for performance.
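Here is a small sketch of why "the right tool" matters more than speed in that pairing. The $config array is hypothetical; the point is that isset() and array_key_exists() answer different questions when a value is null:

```php
<?php
// Hypothetical config array with one key whose value is null.
$config = array('cache' => null, 'debug' => false);

$issetCache     = isset($config['cache']);            // false: the value is null
$keyExistsCache = array_key_exists('cache', $config); // true: the key is present
$issetDebug     = isset($config['debug']);            // true: false is not null
```

If you need to know whether a key exists at all, the faster construct gives you the wrong answer.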

Dot Vs Commas Concatenation
This is another one of those “does it matter?” And the answer is no.

The author uses the same benchmark I linked above to prove that commas are faster. But remember, even the benchmark’s slowest execution time per call was 456 microseconds.
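For reference, the two forms being compared look like this. The sketch below (with a hypothetical $name value) captures the output of each with output buffering to show that they print exactly the same thing:

```php
<?php
$name = 'world'; // hypothetical value

// Dot: builds one string first, then echoes it.
ob_start();
echo 'Hello, ' . $name . '!';
$viaDot = ob_get_clean();

// Comma: passes three arguments to echo, which outputs each in turn.
ob_start();
echo 'Hello, ', $name, '!';
$viaComma = ob_get_clean();

var_dump($viaDot === $viaComma); // bool(true)
```

Since the result is identical, use whichever reads better in context.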

str_replace() vs preg_replace() vs ereg_replace()
Inadvertently, the author stumbles upon another tried-and-true programming technique: use the right tool for the job.

Fact: if you’re using str_replace() five times, you probably need a regular expression.

Fact: if you’re using ereg_replace(), you’re using a deprecated function and you shouldn’t do that.

str_replace(), str_ireplace(), and preg_replace() all have their place in PHP and programming in general. You should always pick the right tool for the job, ignoring performance for the most part. Consider a construction project: you might move more earth with a backhoe, but if you’re trying to dig a 3″ wide ditch a trowel would probably be the best tool, even if it’s not the most performant.
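A quick sketch of the difference, using a hypothetical input string: fixed substrings are a job for str_replace() (which also accepts arrays, so several replacements fit in one call), while anything that has to be described as a pattern needs preg_replace().

```php
<?php
$text = 'The quick  brown   fox'; // hypothetical input with uneven spacing

// Fixed substrings: str_replace() accepts arrays of searches and replacements.
$swapped = str_replace(array('quick', 'brown'), array('slow', 'red'), $text);

// A pattern ("one or more whitespace characters") needs a regular expression.
$collapsed = preg_replace('/\s+/', ' ', $text);
```

Neither call is the "fast one" in any way that matters; each simply fits a different kind of problem.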

Getting a Timestamp
You may be able to grab the timestamp from the superglobal $_SERVER array just fine. But there are some problems with this that should be considered.

First, it makes your code very hard to read. All of a sudden you’ve grabbed a timestamp and are making use of it without accessing the time() function.

Second, this means you’re not making use of the PHP Date/Time object that’s built in. The following code gives you a Date/Time object with the current time that you can manipulate:

<?php
$date = new DateTime();

Why is using this object better? It fits naturally into object-oriented code, and it has a bunch of built-in methods that you won’t have to rewrite or do on the fly. PHP 5.3 introduces a lot of cool additions (like calculating the difference between two times).
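As a small illustration of the PHP 5.3 difference calculation, here is a sketch with two fixed, hypothetical dates (pinned to UTC so the result is reproducible):

```php
<?php
// Fixed, hypothetical dates in UTC so the result does not depend on "now".
$utc      = new DateTimeZone('UTC');
$start    = new DateTime('2009-10-26 00:00:00', $utc);
$deadline = new DateTime('2009-11-02 12:00:00', $utc);

// DateTime::diff() returns a DateInterval (available since PHP 5.3).
$interval = $start->diff($deadline);

echo $interval->days, " days, ", $interval->h, " hours\n"; // 7 days, 12 hours
```

Doing the same with raw timestamps means dividing by 86400 and handling the remainder yourself.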

Third, relying on the $_SERVER superglobal breaks code that doesn’t have a $_SERVER superglobal or contains less data in it. Don’t believe me? Good luck debugging code that doesn’t have a $_SERVER['REQUEST_TIME'] value. Or when you use a framework that eliminates the $_SERVER values.

Fourth, and finally, there’s no mention of validation. It’s arguable that the REQUEST_TIME key comes directly from the server, and it very well may. But treating it as trusted creates a double standard for the superglobal arrays. With the exception of $_SESSION, ANY data that you get out of a superglobal *must* be validated. Every. Single. Time. You don’t want to make assumptions about where that data came from. By validating every piece of data that comes from a superglobal array, you ensure that each piece is clean. And any validation you write is going to kill any performance bonus you get from using the superglobal REQUEST_TIME value.
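To show what that validation costs in practice, here is a minimal sketch. The current_timestamp() helper is a hypothetical name, and the rules it applies (a positive integer, falling back to time()) are my assumption about what "valid" should mean here:

```php
<?php
// Hypothetical helper: return a validated Unix timestamp for "now".
function current_timestamp(array $server)
{
    if (isset($server['REQUEST_TIME'])) {
        // FILTER_VALIDATE_INT returns the integer, or false if it is not one.
        $value = filter_var($server['REQUEST_TIME'], FILTER_VALIDATE_INT);
        if ($value !== false && $value > 0) {
            return $value;
        }
    }
    return time(); // fallback when the key is missing or invalid
}

$now = current_timestamp($_SERVER);
```

By the time you have written this, simply calling time() is shorter, clearer, and almost certainly no slower.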

explode() versus preg_split()
This relates directly to our previous discussion of picking the right tool. These functions are fundamentally different: explode() takes a fixed delimiter string and splits the input into an array. preg_split() is related, but takes a regular expression to split the string. Pick the right tool for the job, and don’t care as much about performance.
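A short sketch of the distinction, using hypothetical input strings: a single fixed delimiter is a job for explode(), while mixed or repeated delimiters call for preg_split().

```php
<?php
$csv   = 'red,green,blue';        // one fixed delimiter
$messy = 'red, green;blue  teal'; // hypothetical mixed delimiters

// Fixed delimiter: explode() is all you need.
$fixed = explode(',', $csv);

// Any run of whitespace, commas, or semicolons: this needs a pattern.
$pattern = preg_split('/[\s,;]+/', $messy);
```

Trying to handle the second string with explode() would mean several passes and cleanup code, which is exactly the situation regular expressions exist for.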

Summary
If you use these micro optimization tips you’ll probably save some CPU time, but your boss won’t be pleased, and neither will your client. Your code will be harder to read, and you’ll spend hours to get back microseconds. I’ve talked with developers, some of whom work for companies like Yahoo and Facebook, who laugh when I bring up print() versus echo(). They don’t care, and they run the world’s biggest PHP websites. So why should you?

The biggest problem with “micro optimization” is that it is a red herring that distracts from the need to refactor more effectively. It makes the assumption that your performance problems can be solved by some simple small changes in your code, rather than by significant refactoring and rearchitecture, and that’s simply not true.

If you want a big performance boost, please feel free to employ some of the tips from last week’s article, as they have been proven to shave seconds off requests and reduce server load. But you should reject tips like changing from single quotes to double quotes, as they’re simply not going to save you time.

This blog entry implements The Beginner Pattern.

Clay (@claylua) wrote at 10/26/2009 2:24 am:

Well, since you linked to my article, i might as well leave a comment on your blog.

Well, on the loop case. I did not say that loop or nested loop isn’t part of programming. I’m just stating the efficiency cost of having too much nested loop. Definitely if you use it right (which is not like 4-10 nested loops) it really doesn’t bother anyone when looking at how powerful computers are nowadays. And in case you know everything. Not EVERYONE does. Furthermore, the link you provided is a bit static, try looking for one that provides a more dynamic result to test on different OS? Nonetheless, it’s a good reference to check out :)

Single vs Double quote. Well done on spending your time on benchmarking (when there are already real time sites doing it). Anyway, if you really want to do a benchmark on the sentence i wrote, try doing it with a double quote with a variable string embed within it. But this is micro optimization, i believe you won’t be interested with micro optimization.

pre vs post increment. 5% can also mean 0.0000001ms compare to 0.00000001ms. Like you said, it really doesn’t matter.

echo vs print. it really DOESN’T MATTER.

etc….

anyway, I’m just interested with the benchmarking of MICRO OPTIMIZATION and sort of good to know such differences as a programmer although you don’t since it doesn’t provide any benefit to you. But i do agree with you that micro optimization doesn’t help optimize performance big time. Maybe you ought to read the comment i left on that post before spending the time writing this?

Jonas Lejon (@jonasl) wrote at 10/26/2009 3:43 am:

Don’t forget about rand() vs mt_rand(). Some people put the slow rand() inside a loop etc

Samuel Folkes (@SamuelFolkes) wrote at 10/26/2009 4:09 am:

I read the article you’re rebutting a while back and chuckled to myself, particularly after reading the part about obtaining a timestamp. The concept of micro-optimization is easy to sell, particularly because people are generally more concerned with getting a quick fix than learning how to write code properly in the first place. I agree with almost all you say here. As far as echo() and print() are concerned, I agree that the speed difference gained from using echo() is negligible, however the reason print() is slower is not because it’s a function. They’re both language constructs but print() is slower because unlike echo() it returns a value; ‘1’ if the argument is successfully printed and ‘0’ if it isn’t.

till (@klimpong) wrote at 10/26/2009 4:55 am:

Hey,

so on the absolute path issue, two things:

a) Did you toy with the realpath cache (and ttl) at all? It helps you somewhat if you jack up the values, etc.. Generally this works well in a production environment when the application only changes with a new deployment.

b) For the “makes code unportable” issue. This is something you can actually take care of in deployment. For example, PEAR has those nifty @foo@ identifiers which can be replaced with a replacement task. So for example, @php_bin@ becomes /whereever/php-cli/is/located. This depends on PEAR’s configuration (which should be taken care of when you set it up/install it).

Nice writing!

Giorgio Sironi (@giorgiosironi) wrote at 10/26/2009 12:25 pm:

Agree on all the points, Brandon.

Greg Beaver wrote at 10/26/2009 1:53 pm:

First off, I agree 100% that micro-optimizations may not matter at all to your application.

However, I’d like to point out that you have oversimplified the question of how to load external files and how it affects performance. In Rasmus Lerdorf’s keynote for Drupalcon last year (http://szeged2008.drupalcon.org/program/sessions/rasmus-lerdorf-keynote-simple-hard) he took a simple application and through a series of optimizations, some micro, some more substantial, transformed it from an application that can handle 13 req/sec to one that does 400 req/sec as measured by Apache benchmark.

One of the more significant improvements came simply from changing include_path from “.:/path/to/stuff” to “/path/to/stuff:.” This is certainly a micro-optimization by any definition. Also, by replacing require_once with include, he increased the requests per second (note: not the percentage of execution time, the actual number of requests that can be served per second) by a large percentage.

So, this is all to say, micro-optimizations are sometimes useless, but when you do them in conjunction with a real application, and measure the difference in requests per second, you can get an accurate picture of how much they help, and they can result in several thousands of percentage increase in performance for the application as a whole.

The key is don’t optimize blindly, and consider the tradeoffs that are inherent.

Clay (@claylua) wrote at 10/26/2009 9:55 pm:

The key is don’t optimize blindly, and consider the tradeoffs that are inherent.

@Greg Beaver : Absolutely agree with that.

Pierre Spring (@shvi) wrote at 10/27/2009 6:14 pm:

There is one advantage about these micro optimizations: Coding Standards. You have arguments to enforce some coding standards within communities and enterprises. I, for example, don’t like it, when I see double and single quotes mixed within the same line of concatenation …

Richard Lynch (@LynchRichard) wrote at 10/28/2009 1:05 am:

I thought rand() was gutted and replaced with mt_rand() under the hood anyway…

Also, I’m not sure why anybody thinks "" is parsed “more” than ''…

Both have escape characters, so PHP has to iterate through the string for each, looking for \\ and \' in one, and a whole host of things in the other, including, but not limited to: {} $ \" and a bunch of other backslash stuff that I won’t duplicate here when you can rtfm.

Maybe PHP has to go through “twice” for "", but I sure don’t see why…

It’s still silly to try to optimize where your bottleneck isn’t.

Build it simple/straightforward first, then find bottlenecks and figure out why it’s slow, and ONLY then can you begin to optimize.