Profiling WordPress 1.5
I’ve been doing some Xdebug profiling on LightPress to ensure that speed is properly balanced with memory usage, thanks to some tips from Sencer. While I was at it, I thought it might be fun do a few profiles on my WordPress development site.
Xdebug offers numerous profiling modes, but the one I’ve used here is the “Total Execution Time” (4) profile which sums the excution time for each function call and lists functions in descending order. E.g. if the function apply_filters is called 10 times, the total will show the sum of the 10 separate calls (note that further calls made within a function count do count towards its total). This is a useful profile to determine which functions make up the bulk of WordPress processing time.
Output from a typical index page (5 posts displayed, Markdown on, PreFormatted off):
| Function Summary Profile (sorted by total execution time) | |||
|---|---|---|---|
| Total Time Taken | Avg. Time Taken | Number of Calls | Function Name |
| 0.2940158844 | 0.2940158844 | 1 | {include} |
| 0.2807240486 | 0.2807240486 | 1 | {require} |
| 0.2529430389 | 0.2529430389 | 1 | {require_once} |
| 0.2503268719 | 0.2503268719 | 1 | {require_once} |
| 0.2076935768 | 0.0005016753 | 414 | *apply_filters |
| 0.1687283516 | 0.0017394675 | 97 | call_user_func_array |
| 0.1416168213 | 0.0283233643 | 5 | *the_content |
| 0.0881128311 | 0.0176225662 | 5 | *markdown |
| 0.0825111866 | 0.0165022373 | 5 | *_runblockgamut |
| 0.0723328590 | 0.0055640661 | 13 | *_runspangamut |
| 0.0719242096 | 0.0143848419 | 5 | *_formparagraphs |
| 0.0633089542 | 0.0048699196 | 13 | *_escapespecialchars |
| 0.0624432564 | 0.0003122163 | 200 | *get_settings |
| 0.0606269836 | 0.0606269836 | 1 | {include} |
| 0.0514016151 | 0.0009345748 | 55 | *wptexturize |
| 0.0476915836 | 0.0043355985 | 11 | *list_cats |
| 0.0424897671 | 0.0035408139 | 12 | *wpdb->query |
| 0.0413334370 | 0.0037575852 | 11 | *wpdb->get_results |
| 0.0407681465 | 0.0407681465 | 1 | *wp_list_cats |
| 0.0307693481 | 0.0025641123 | 12 | mysql_query |
(output snipped for the sake of brevity, there are many more pages of this …)
| Opcode Compiling | 8.3145830631 |
| Function Execution | 0.5755004883 |
| Ambient Code Execution | 0.0055794716 |
| Total Execution | 0.5810799599 |
| Total Processing | 8.8956630230 |
The results are interesting:
- once you get past the includes,
apply_filtersis the biggest culprit, consuming between 20-50% of page rendering time for my test site. At 414 calls (most of which do nothing since there are only 85 calls toadd_filterand 7 calls toadd_action) this is no surprise. Not to mention that this function makes terrible use of resources since data is passed by value between filters and every call toapply_filtersresults in another call tomerge_filters. - the cumulative effect of calls to helper functions like
get_settingshave a significant impact on processing time. - 11 calls to
list_cats?!? I can understand one call (for the sidebar listing) but can anyone explain where the other ten come from? - on the plus side, query execution figures much lower on the list than one would expect.
- other fun stats omitted from the listing excerpt: 2870 calls to
preg_replace, 1508 calls tostr_replace, 212 calls tosmiliescmp, 1024 calls tostrlen.
There are certainly several low-hanging fruit here, but the WordPress team really needs to take a hard look at the filter system.
I’ve also tested the same page with PreFormatted activated, resulting in a two-fold decrease in apply_filters execution time (reduced to 402 calls) and a twenty-fold decrease in the_content execution time (markdown obviously disappears from the list. Your mileage may vary.
Just one more note: I have the suspicion, that xdebug gets "irritated" by small loops that get calles very often. There was one piece of code in Textpattern (for the language files), that loaded each line one of the language-file [640 lines], did some trimming, exploding etc. and assigned the value to an array. Judging from xdebug this was about 13% of execution time. So I rewrote the code to cache the array-assignments in a file that could be included on following runs. The Cache-File was in .php and only assigned values. Judging from xdebug the whole language thing now took a fraction of a percent. The total execution time was considerably down, too. - However testing with ab [repeatedly] revealed that there was no measurable difference between the two alternatives.
My conclusion is, that only relying on xdebug is not enough. This also showed when comparing the difference between txp, wp and lp as measured by ab and as measured by xdebug: Xdebug seemed to inflate the differences when compared to ab. So I always test my hypotheses now about what slows things down, and wether the assumed improvement is real.
Having said that, the filters on the front-end of WP also were the first thing to catch my eye, but I am not sure wether the effect is as severe as xdebug makes it seem.
Sencer — 26-Apr-2005 14:46Sencer, I totally agree that Xdebug itself has an effect on the results -- it's a classic case of the Heisenberg Uncertainty Principle. But the total times reported by Xdebug do match closely with what I've measured using simple microtime functions so I'm inclined to believe the results.
Regardless, the number of calls made to
apply_filtersis significant in and of itself. Each PHP function call carries a performance penalty and this function is no lightweight.Testing with apachebench does bring up a whole host of other non-PHP optimization issues that need to be addressed. Often a PHP-based performance gain (as seen by Xdebug or through timing) will have a less significant impact at the apachebench testing level. That's why I'm amazed how big a difference you uncovered between WordPress and TextPattern (and later LightPress).
EDIT: I should clarify "PHP-based performance gain" as low-level code changes to squeeze out more speed (eg. passing by reference, reducing function calls, using built-in functions, replacing regex operations with simple string functions). There are certainly other changes one could make to PHP code that are more structural/architecture-oriented which might have a bigger impact when testing with apachebench (e.g. output buffering).
jerome — 26-Apr-2005 15:08I forgot to mention: Other improvements (even below 10%) did show later through ab testing.
> so I’m inclined to believe the results.
I won't try to convince you, at least not for now when I don't have definitive proof. ;) I think what I will do is, extract the relevant code from my case and run & time it from the command line. Both versions, with and without xdebug.
> Testing with apachebench does bring up a whole host of other > non-PHP optimization issues that need to be addressed.
In general that's true, but it is not very likely that those have the impact of counteracting other performance-gains or even reversing actual order of performance measured for the given alternatives. I guess what I am trying to say is, that non-php issues may change the "scale" of the improvements, they likely won't reverse or nullify otherwise existing performance differences.
RE you clarification: I guess we value it differently. My POV is, that if the php-performance gains are not measurable with ab, then they will not have a great enough impact that it's worth to bother with them (only refering to execution time here). I would rather spend time where the cost/gain ratio is highest. One example: You mention passing by reference vs. passing by value. A quick grep shows 576 function definition in the WP source. One could go and try to rewrite everyone of them. However PHP uses copy-on-write, that means if an array is copied, then at first only a reference is made to the original data - it's only when you start changing the array, that the actual copy will be made. So in those cases where the parameters are not changed, no difference in speed will be made. And on the remaining cases it is safe to say that one or two of the ones that do make a difference will make > 95% of the total difference, because they are passing a huge variable which contains the whole page. So in this hypothetical scenario one can gain 90% of the performance gain with only changing two functions, rather than trying to change another 574 definitions for a full 100%.
Sencer — 27-Apr-2005 00:51Sencer, I offered those only as examples of possible performance improvements -- none of them should be implemented blindly without testing the results. Thorough testing is a necessary part of any performance optimization effort and the more tools one uses, the better the results.
The reason I'm focusing on Xdebug here is that (unlike apachebench) it can give hints about the efficiency of certain pieces of code. For absolute timing measures, apachebench is certainly a more appropriate tool and I'm happy that you've published such detailed results. Without them, we'd still be trying to figure which is faster and by how much! :)
You bring up a good point about copy-on-write that many PHP developers probably don't know (if they even understand references). I'm aware of this behaviour and I certainly don't subscribe to changing all functions to pass by reference. However, in the case of
apply_filterswe have a function whose main purpose is to modify the data passed to it: at 414 calls there is likely to be a modest performance gain by optimizing data assignment and streamlining this function. In factapply_filtersand its sidekickmerge_filtersare the most-called functions in my tests -- if I were looking for a quick fix, this would be the first place I'd look. (well, after applying page caching that is)But my goal is improving LightPress performance. :) The point of this post was to highlight some low-hanging fruit for improving WordPress performance (which I have no intention of tackling). I simply thought the results were interesting.
jerome — 27-Apr-2005 09:03