Hitting the Database Less: Quick and Dirty Strategies for Database Efficiency


Below is a list of my top five quick-and-dirty strategies for improving database performance in web applications. These suggestions are culled from recent experience, mixed with ideas I’ve implemented in my own code. They’re not high-level, but they’re the kind of thing we need consistent reminders about. Here they are…

1. Try caching. If you have something that doesn’t change a whole lot, or data that can be stale for a period of time, consider caching it. This can be as simple as a function that writes the data to a file. For example:

<?php

function cacheData($filename, $time = 900)
{
	// Serve the cached copy if it exists and is still fresh
	if(file_exists($filename) && (time() - filemtime($filename) < $time))
	{
		$string = file_get_contents($filename);
		return $string;
	}
	else
	{
		// Cache is missing or stale: fetch fresh data and store it
		$string = some_query_function(); // your function that hits the database
		file_put_contents($filename, $string);
		return $string;
	}
}

?>

This function checks the file’s last modification time, and if the cached data is stale, replaces it with fresh data. Since a file read is almost always faster than a database hit, this is a great solution.

2. Reduce the number of queries that run. For every query that you run, your database takes a hit. By reducing those queries, you improve the speed. Use JOIN statements to combine related queries into one. (Jay Pipes has a great presentation on this; the link is in the comments below.)
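For instance, instead of querying a posts table and then running one extra query per post to fetch its author (the classic N+1 pattern), a single JOIN returns everything in one round trip. The table and column names below are hypothetical:

```sql
-- One query instead of 1 + N: fetch the latest posts with their authors
SELECT p.id, p.title, u.name AS author
FROM posts AS p
JOIN users AS u ON u.id = p.author_id
ORDER BY p.created_at DESC
LIMIT 10;
```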

3. Use indexes. Learn how to use the EXPLAIN command, and then implement indexes to optimize your database. Do NOT index everything; index only where it is appropriate.
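A minimal sketch of that workflow, with hypothetical table and column names: run EXPLAIN on a slow query, and if MySQL reports a full table scan (type: ALL), add an index on the column used in the WHERE clause:

```sql
-- See how MySQL executes the query; "type: ALL" means a full table scan
EXPLAIN SELECT * FROM members WHERE email = 'user@example.com';

-- Add an index on the column the WHERE clause filters on...
CREATE INDEX idx_members_email ON members (email);

-- ...then EXPLAIN again; the query should now use idx_members_email
EXPLAIN SELECT * FROM members WHERE email = 'user@example.com';
```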

4. Optimize data usage. Store each piece of data only once, and keep frequently used data separate from infrequently used or large data (read: TEXT, LONGTEXT, BLOB columns). For example, you might keep a blog’s metadata separate from the entry bodies and link them through a foreign key; you could even store the entry text as a file that is read from disk (up to you).
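A sketch of that split, with hypothetical table names: the small, frequently scanned entries table holds the metadata, while the large entry bodies live in a separate table linked by a foreign key:

```sql
-- Small rows: scanned constantly for listings, archives, and feeds
CREATE TABLE entries (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title      VARCHAR(255) NOT NULL,
    author_id  INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL
);

-- Large TEXT column: only read when displaying a single entry
CREATE TABLE entry_bodies (
    entry_id INT UNSIGNED NOT NULL PRIMARY KEY,
    body     LONGTEXT NOT NULL,
    FOREIGN KEY (entry_id) REFERENCES entries (id)
);
```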

5. Avoid functions in WHERE statements. For example, this will cause problems:

SELECT * FROM MEMBER WHERE TO_DAYS(expiration) 
- TO_DAYS(CURDATE()) < 30

The optimizer cannot use an index on the 'expiration' column because it has to apply the TO_DAYS() function to every row first; this negates the benefit of the index.

A better SQL query would be this:

SELECT * FROM MEMBER WHERE expiration < 
DATE_ADD(CURDATE(), INTERVAL 30 DAY)

In this case, the optimizer can use the index on the 'expiration' column to help find our data faster. (Credit to Paul DuBois for these queries.)

These tips are easy but important. The more often they’re written down, the more people will use them. Use them, and your life will be easier.

Brandon Savage is the author of Mastering Object Oriented PHP and Practical Design Patterns in PHP

Posted on 11/14/2008 at 5:03 pm
Categories: Best Practices, Databases, System Architecture

omerida (@omerida) wrote at 11/17/2008 2:22 pm:

Excellent tips – one I would add from my experience is to do any intense data processing outside of the web request, for example via cron scripts. If you need to count, summarize, or average a large data set to generate reports, consider making those calculations ahead of time and saving them to a summary data table. Your web script then simply formats the summary data for display.
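That suggestion can be sketched as follows (all table and column names are hypothetical): a cron job periodically refreshes a summary table, and the web request reads a single precomputed row instead of aggregating the whole data set:

```sql
-- Run from cron, e.g. hourly: recompute today's sales summary
REPLACE INTO daily_sales_summary (day, orders, revenue)
SELECT DATE(created_at), COUNT(*), SUM(total)
FROM orders
WHERE created_at >= CURDATE()
GROUP BY DATE(created_at);

-- The web request just reads one precomputed row
SELECT orders, revenue FROM daily_sales_summary WHERE day = CURDATE();
```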

Jaik wrote at 11/18/2008 4:42 am:

Could you explain point 4? I didn’t think having large data fields affected performance unless they were being selected or used elsewhere in the query.

Brandon Savage (@brandonsavage) wrote at 11/18/2008 5:28 am:

Jaik, here’s what I do know:

Jay Pipes who works for MySQL recommends in his Join-Fu talk (http://jpipes.com/index.php?/archives/260-Slides-from-Drunken-Query-Master-and-Join-fu-Talks-at-ZendCon.html) that you be careful when picking your data types.

In his slides, he indicates that you should use TEXT sparingly, and BLOB VERY sparingly. He also notes that the smaller your records, the more you can fit into a single page on the disk, and the faster your scans will be. Just because you’re not selecting a particular column doesn’t mean that including that data doesn’t hinder performance.

He also recommends splitting up large tables into smaller ones, presumably for the same reason, and partitioning long tables into shorter ones.

I’m not an expert in MySQL (Jay Pipes works for MySQL), so I recommend if you want to learn more you visit his website at http://jpipes.com/ and take a look at what he has to say.

itchy wrote at 9/6/2009 7:10 pm:

what’s the difference between a large table and a long table?

Brandon Savage (@brandonsavage) wrote at 9/7/2009 2:36 pm:

To me, a long table has lots of rows. A large table can have lots of rows, or each row could be exceptionally large.

For example, if you’re storing files in the database as blobs, you could have a large table with just a few rows. Likewise, if you have a table of users that contains 35 million rows, that’s too much.


Copyright © 2022 by Brandon Savage. All rights reserved.