Backing up MySQL Databases

Posted by Sheldon Finlay on April 30, 2009

Most people backup their web site files. But with so many sites being database-driven it is equally important to make regular MySQL database backups. If you are running a blog, a forum or any database-driven web site and your database gets nuked, well there goes all your content. How long will it take to restore it and get your site back online? If you have a backup, the data loss should be minimal and restoration of your backup should take no time at all.  If you don’t have a backup you are probably screwed and all your work has disappeared into the ether.

What options do you have for creating a MySQL database backup? There are plenty and many of them can be automated using shell scripting or cronjobs so you can set them and forget them (until you need them!). Let’s look at a few manual backup methods. In a later post I will talk a little bit about automating backups.

PhpMyAdmin

Most of you are probably familiar with PhpMyAdmin, the PHP web-based administration backend for MySQL. PhpMyAdmin has a tab called export where by you can backup your entire database or just selected tables. The default export settings are usually perfect, although you can tweak them according to your needs, although you should always export as SQL, as that is the format your database will expect if you ever need to restore it. Lastly, you want to save the file to your local computer and to save space and transfer time, compress it in the format of your choice.

mysqldump

mysqldump is a command line utility which pretty much does the same thing as PhpMyAdmin except it’s shell based and not web based. So you will need a shell account on the server where your database resides. The basic syntax for using mysqldump is:

mysqldump -u [user name] -p[password] [database name] > [dump file]

So you will need to provide your database username, database password, database name and finally a filename which will be the schema file containing the entire database structure and data.

If your database is large you might want to compress it to save space and transfer time. You can backup the database and compress the database in one swoop like so:

mysqldump –-user [user name] –-password=[password] [database name] > [dump file] | gzip > [dump file]

Now, to restore a database with mysqldump you would issue do something like this:

mysql -u [username ] -p[password] [database name] < [dump_file]

This same command could also be used for a schema file which you created with PhpMyAdmin. You would of course need to upload the schema file to your account. I find working with schema files through the shell a lot faster than working with PhpMyAdmin. PhpMyAdmin can also choke on your schema, particularly if you are saving binary info to your database.

mysqlhotcopy

mysqlhotcopy is another command line utility for creating MySQL database backups. It has many advantages over mysqldump:

  • It’s very fast. So it’s suitable for large databases which would be both time consuming to backup as well as restore.
  • It makes exact copies of the actual files which make up the database tables.
  • It ensure data integrity by locking and flushing the tables while they are being copied. Other database backup methods can export a backup while a record is being written which can lead to a corruption in the backup.

Now, despite it’s overwhelming positives, mysqlhotcopy comes with one large downfall which makes it not practical for everyone. To restore the files, you need to have access to the actual files which make up your database. If you are on a shared web host or don’t have root access on your database server you won’t be able to restore the files yourself. You might be able to ask the web host to copy the files and flush the database for you. But you should check before you use this method, lest you get stuck with a backup which you can’t restore.

So how do you use mysqlhotcopy? It’s a shell command similar to mysqldump:

mysqlhotcopy -u [username] -p[password] db_name [/path/to/new_directory]

What this will do is lock and flush the tables in your database and make an exact copy of those files to the directory your specified. To restore your database, you simply copy these files into your database directory and flush or restart MySQL. The location on my server would be /var/lib/mysql/table_name/, but yours might be different.

So this covers some manual backup methods. Manual backups are better than nothing, but they still leave something to be desired. It can be time consuming to run backups manually and would be better to have them run automatically at regular intervals. It would also be ideal to have some sort of backup rotation where you could choose between restoring yesterday’s backup,  last week’s backup or anything in between. This is a topic I will try to cover in a future post. But for now, just start backing up your databases.

Learn Git!

Posted by Sheldon Finlay on April 30, 2009

If you are doing any sort of coding, be it programming or just CSS/XHTML, you stand to benefit by using some sort of version control. I use Git for programming projects as well as for design projects. The benefits are enormous. Now, I am not going to talk about why you should be using Git. Perhaps, I’ll cover that in a later post. But I wanted to direct the beginner as well as the moderately advanced user to GitCasts.com. Git is an extremely powerful tool and you can get by with just learning a few command. But there is a lot more to Git and the real fun begins when you really learn to leverage it.

Adding Indexes to Your Database to Improve Performance

Posted by Sheldon Finlay on April 24, 2009

Working with a relational database is easy, just define your tables and fields and you’re good right? Well, not quite. It’s important to visual how your tables are going to relate and how your queries are going to interact with your database structure. This is where having the proper indexes on your table will speed up your queries dramatically and make your application perform like a champ (at least on the database side).

Indexes? In My Database?

Let’s jump right into it and look at how MySQL accesses records in a single database table. Say you have a simple users table which is structured like this:

CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
)

Now, let’s say we have a few thousand users in this table and we do a simple query like this:

SELECT * FROM users WHERE name = "John Smith"

MySQL has absolutely no idea where this record is found in the table. It doesn’t know if there is one record for John Smith or a dozen. So it has to do a full table scan reading though every single row in the database looking for a match in the name column. Now this isn’t too bad if your table has only a small number of records. But once your table rows grows to a certain point, say a couple thousand records, this non-indexed table can become quite slow. So what can we do?

An Index in Time Saves Nine

The first general guideline of indexing is to add an index to any field you are querying on. In this case we are querying on the name field. So if we add an index to name field we’ll see a huge performance gain and our table can scale larger without increasing the query time, thus making our application a lot more responsive.

Let’s see this in action: I have created a database table using the structure above and populated it with about 7 million random records. (yes, this is a bit data set, but not uncommon). If I run the query above without an index on the name field it looks like this:

2298 rows in set (3.59 sec)

Ugh! It took 3.59 seconds to run that query and return the results. Now you might not think that 3+ seconds is a lot, but trust me, it’s horrible. When you’re dealing with a production application which is serving hundreds of queries to the database a second this lag is going to compound even greater and your application is going to grind to a halt. This is so full of fail it ain’t funny.

Now, lets run the same query on the same 7 million records but with an index added to the name field:

2298 rows in set (0.02 sec)

Now that’s what I am talking about. That’s a huge performance increase: from nearly 4 seconds down to a couple tenths of a second. This, my friend, is a good thing. You want your application to be lean and it’s queries to the database to be lightning fast. Indexes will help your achieve the latter.

So what happened here? How did MySQL’s performance improve just by adding an index to a queried field? Well, it helps to understand a little bit about what an index is. Think of your database as a large book. An index is sort of like the index of a book. Without an index, if you wanted to find out on what pages a particular topic appeared you would have to leaf through every single page scanning for that information. Fortunately, a book index has a listing of where those topic appear so you can save yourself a lot of work by just referring to the index. For a MySQL database, when you add in index, the database goes through and builds a listing of the records for that field for lightning fast access so it doesn’t have to scan the whole database. When records are added or removed from the database the index is also updated.

Now an index does add some bloat to a table. For the table I’ve been using above the non-indexed version weighed in at 350MB and the indexed version was 25MB larger at 375MB. Not a huge increase, but you definitely don’t to go around adding indexes to all your fields. Only add indexes to fields that will benefit from them. Here are some general guidelines:

  • Add an index to any field you are querying regularly (as illustrated above).
  • Add an index to any field you are joining on. The performance benefit here can be enormous as complex joins across multiple tables can be very slow.
  • Add an index to any field you are going to be adding  a GROUP BY or ORDER BY clause on.

Now, keep in mind if your table has a primary key, then you get an index for free on that primary key. No need to add an index to a primary key. However, if you are doing a join and have two tables, for example and addresses with a foreign key of user_id like so:

CREATE TABLE `addresses` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id' int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
)

You will want to add an index to user_id in order to speed up a simple join query like the one below:

SELECT * FROM users, addresses WHERE users.id = addresses.user_id AND users.id=842414;

So that’s the basics of database indexes for MySQL. This is a big topic and I have only touched upon the basics. Keep in mind that every situation is different and there are no hard and fast rules to adding indexes apart from not adding them with reckless abandon. Observe how your application is using the database and what queries are being generated and try to determine which fields are being queried on or joined on and add indexes to those and see if our application’s database performance improves.

Syncing dotfiles using your MobileMe iDisk

Posted by Sheldon Finlay on April 11, 2009

If you are developing on a Mac, you no doubt use dotfiles. If you are like me, you are constantly tweaking your dotfiles, adding in new aliases and functions or whatever to help streamline your workflow. I have two computers I develop on, a Macbook and a Mac Pro desktop. Switching between them is often frustrating because my dotfile customizations I use on one are not found on the other until I set them up. I knew there had to be a better way to keep my ever-changing dotfiles synced.

I have a MobileMe account with Apple and already use it to synchronize my life between my two Macs and my iPhone. It rocks and is well worth the subscription for the convenience it affords. This past week I started thinking about whether it would be possible to sync other files between my Macs. I decided to start with my dotfiles as a simple and easy proof of concept. Here’s how I did it:

The computers you’ll be syncing must use the same MobileMe account since you’ll be using the iDisk as the vehicle of synchronization. The first thing you want to do is enabled iDisk sync on the computers.  iDisk sync can be enabled in MobleMe settings in System Preferences. Choose the iDisk option and set iDisk Sync to On and choose Automatically as the update choice.  This will enable your iDisk files to be synced and downloaded to your computer regularly so you can access them offline. Otherwise, all your iDisk files are stored in the cloud and you wouldn’t have access to your dotfiles if you were offline.

iDisk sync should be on and set to automatic.

Open your iDisk on one of the computers and create a permanent location for your dotfiles. For this tutorial I’ll be using what I have set up which is have a folder called Misc and inside there I have a folder called Dotfiles. You can set it up however you like, as long it it’s permanent. If you move or rename this location you will need to redo the symlinks.

Next choose which computer’s dotfiles will be the ones you want to use and move them to your chosen folder on your iDisk. By default, dotfiles are invisible in the Finder, so you’ll need to use the Terminal. The iDisk should be found under /Volumes/iDisk:

mv .profile /Volumes/iDisk/Misc/Dotfiles/.
mv .bashrc /Volumes/iDisk/Misc/Dotfiles/.
mv .bash_profile /Volumes/iDisk/Misc/Dotfiles/.
mv .bash_login /Volumes/iDisk/Misc/Dotfiles/.
mv .gitconfig /Volumes/iDisk/Misc/Dotfiles/.
mv .gitk /Volumes/iDisk/Misc/Dotfiles/.
mv .gitignore /Volumes/iDisk/Misc/Dotfiles/.
mv .irbc /Volumes/iDisk/Misc/Dotfiles/.
mv .rails-plugin-sources /Volumes/iDisk/Misc/Dotfiles/.

These are just some of the dotfiles I like to keep synced (my list goes on and also includes some non-dotfiles in my root directory). Yours will no doubt be different depending upon which shells and tools you use. The world is your oyster here, so don’t hold back.

Now that your dotfiles have been moved over it’s time to link to them using a symbolic link. A symbolic link is basically an alias or pointer to an actual file located somewhere else:

ln -s /Volumes/iDisk/Misc/Dotfiles/.profile .
ln -s /Volumes/iDisk/Misc/Dotfiles/.bashrc .
ln -s /Volumes/iDisk/Misc/Dotfiles/.bash_profile .
ln -s /Volumes/iDisk/Misc/Dotfiles/.bash_login .
ln -s /Volumes/iDisk/Misc/Dotfiles/.gitconfig .
ln -s /Volumes/iDisk/Misc/Dotfiles/.gitk .
ln -s /Volumes/iDisk/Misc/Dotfiles/.gitignore .
ln -s /Volumes/iDisk/Misc/Dotfiles/.irbc .
ln -s /Volumes/iDisk/Misc/Dotfiles/.rails-plugin-sources .

When you do an ‘ls -al’ in Terminal you should see your newly created dotfile symlinks. Log out of Terminal and log back in and check to see if your environment is working. Test your aliases, etc. If you set everything up correctly, things should work as they always have. If not, check your work.

Now that the first computer is set up and dotfiles are on your iDisk you can go ahead and set up any other computers you want to synchronize. This time you will just be deleting the dotfiles from the computer:

rm -f .profile .bashrc .bash_profile .bash_login .gitconfig .gitk .gitignore .irbc .rails-plugin-sources

Once they are deleted your can repeat the symlink step you did on the first computer. One thing to watch for: make sure your iDisk on both computers have synced. They will sync automatically every 10 minutes or so, but if you’re real quick, the dotfiles might not show up yet on the other computers. Either go get a cup of coffee or try doing a manual sync on all your computers, starting with the first one you set up. Once they are synced, do your symlinks and test to make sure everything works.

That’s it. You’re done! Now, if you make a dotfile change on any of the computers, it will be automatically synced to the others. It’s a very simple thing, but a huge convenience.