Adding Indexes to Your Database to Improve Performance

Posted by Sheldon Finlay on April 24, 2009

Working with a relational database is easy, just define your tables and fields and you’re good right? Well, not quite. It’s important to visual how your tables are going to relate and how your queries are going to interact with your database structure. This is where having the proper indexes on your table will speed up your queries dramatically and make your application perform like a champ (at least on the database side).

Indexes? In My Database?

Let’s jump right into it and look at how MySQL accesses records in a single database table. Say you have a simple users table which is structured like this:

CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
)

Now, let’s say we have a few thousand users in this table and we do a simple query like this:

SELECT * FROM users WHERE name = "John Smith"

MySQL has absolutely no idea where this record is found in the table. It doesn’t know if there is one record for John Smith or a dozen. So it has to do a full table scan reading though every single row in the database looking for a match in the name column. Now this isn’t too bad if your table has only a small number of records. But once your table rows grows to a certain point, say a couple thousand records, this non-indexed table can become quite slow. So what can we do?

An Index in Time Saves Nine

The first general guideline of indexing is to add an index to any field you are querying on. In this case we are querying on the name field. So if we add an index to name field we’ll see a huge performance gain and our table can scale larger without increasing the query time, thus making our application a lot more responsive.

Let’s see this in action: I have created a database table using the structure above and populated it with about 7 million random records. (yes, this is a bit data set, but not uncommon). If I run the query above without an index on the name field it looks like this:

2298 rows in set (3.59 sec)

Ugh! It took 3.59 seconds to run that query and return the results. Now you might not think that 3+ seconds is a lot, but trust me, it’s horrible. When you’re dealing with a production application which is serving hundreds of queries to the database a second this lag is going to compound even greater and your application is going to grind to a halt. This is so full of fail it ain’t funny.

Now, lets run the same query on the same 7 million records but with an index added to the name field:

2298 rows in set (0.02 sec)

Now that’s what I am talking about. That’s a huge performance increase: from nearly 4 seconds down to a couple tenths of a second. This, my friend, is a good thing. You want your application to be lean and it’s queries to the database to be lightning fast. Indexes will help your achieve the latter.

So what happened here? How did MySQL’s performance improve just by adding an index to a queried field? Well, it helps to understand a little bit about what an index is. Think of your database as a large book. An index is sort of like the index of a book. Without an index, if you wanted to find out on what pages a particular topic appeared you would have to leaf through every single page scanning for that information. Fortunately, a book index has a listing of where those topic appear so you can save yourself a lot of work by just referring to the index. For a MySQL database, when you add in index, the database goes through and builds a listing of the records for that field for lightning fast access so it doesn’t have to scan the whole database. When records are added or removed from the database the index is also updated.

Now an index does add some bloat to a table. For the table I’ve been using above the non-indexed version weighed in at 350MB and the indexed version was 25MB larger at 375MB. Not a huge increase, but you definitely don’t to go around adding indexes to all your fields. Only add indexes to fields that will benefit from them. Here are some general guidelines:

  • Add an index to any field you are querying regularly (as illustrated above).
  • Add an index to any field you are joining on. The performance benefit here can be enormous as complex joins across multiple tables can be very slow.
  • Add an index to any field you are going to be adding  a GROUP BY or ORDER BY clause on.

Now, keep in mind if your table has a primary key, then you get an index for free on that primary key. No need to add an index to a primary key. However, if you are doing a join and have two tables, for example and addresses with a foreign key of user_id like so:

CREATE TABLE `addresses` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id' int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
)

You will want to add an index to user_id in order to speed up a simple join query like the one below:

SELECT * FROM users, addresses WHERE users.id = addresses.user_id AND users.id=842414;

So that’s the basics of database indexes for MySQL. This is a big topic and I have only touched upon the basics. Keep in mind that every situation is different and there are no hard and fast rules to adding indexes apart from not adding them with reckless abandon. Observe how your application is using the database and what queries are being generated and try to determine which fields are being queried on or joined on and add indexes to those and see if our application’s database performance improves.

Trackbacks

Use this link to trackback from your own site.

Comments

Leave a response

Comments