MySQL Logo

Joe was hired into a company to maintain a PHP application with a MySQL database. They would have random reports of duplicate data and data with missing information. When Joe looked into the problem 9 times out of 10 it was due to an issue where a user deleted one row but the same delete operation didn’t delete rows related to the original row. This caused that orphaned data to show up in some reports and cause odd join results in others.

SQL provides an excellent way to make sure that our databases don’t have orphaned rows or invalid relationships. Using them is a must for all table relationships.

Why Should We Use Foreign Key Relationships?

Let’s look at an example. In our application, we have a users table and a user_logins table to track our user accounts and their login history respectively.

mysql> create table users (
    id int(11) not null primary key auto_increment,
    email varchar(255) not null
);
Query OK, 0 rows affected (0.01 sec)

mysql> create table user_logins (
    id int(11) not null primary key auto_increment,
    login_time datetime not null,
    user_id int(11) not null
);
Query OK, 0 rows affected (0.01 sec)

We’re going to create a set of users for our experiments.

mysql> insert into users (email) values
    ('test1@me.com'), 
    ('test2@me.com'), 
    ('test3@me.com'), 
    ('test4@me.com');
Query OK, 1 row affected (0.00 sec)

mysql> select * from users;
+----+--------------+
| id | email        |
+----+--------------+
|  1 | test1@me.com |
|  2 | test2@me.com |
|  3 | test3@me.com |
|  4 | test4@me.com |
+----+--------------+
4 rows in set (0.00 sec)

Now what we would want to do is insert a new row into user_logins using an SQL statement like the one below.

mysql> insert into user_logins (user_id, login_time) values (1, utc_timestamp());

But let’s say we have an error and instead of using the user’s id it uses the user’s id plus 10. In that case, we’ll have a query like the following.

mysql> insert into user_logins (user_id, login_time) values (11, utc_timestamp());

Now what we would hope for is that instead of the row being inserted we would get an error so we could see the error and fix it. However, when we run the query it’s successful.

mysql> insert into user_logins (user_id, login_time) values (11, utc_timestamp());
Query OK, 1 row affected (0.00 sec)

Now in this case this isn’t going to cause problems but when we eventually get our 11th user it will create some odd results where a user logged in before they were ever created.

Foreign Keys to the Rescue

Most variants of SQL have a feature called foreign key constraints. This allows us to tell our database engine that a column references another column. Setting up this relationship will cause the database engine to fail when it creates, updates, or deletes rows that create invalid data like the example above.

In this case, we want to tell the database engine that the user_id column in the user_logins table is always going to be a value in the id column of the users table.

As a quick aside, this relationship isn’t limited to being separate tables. We could have a supervisor_id column in the users table that describes a supervisor/direct relationship and link them using foreign keys.

How to Create Our Foreign Key Relationship

To create our relationship we’re going to use the alter table command.

alter table table1 add foreign key (column_name_in_table1) references table2 (column_name_in_table2);

For our example, this is going to look like the following.

alter table user_logins add foreign key (user_id) references users (id);

Now because we have “bad” data in our user_logins table we’re going to get the following error.

mysql> alter table user_logins add foreign key (user_id) references users (id);
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails 
(`examples`.`#sql-4c1_17`, CONSTRAINT `#sql-4c1_17_ibfk_1` FOREIGN KEY (`user_id`)
REFERENCES `users` (`id`))

There are a couple of ways we can fix this. We can go through and delete the rows that have problems, we can delete all of the data, or we can temporarily disable the checks for the foreign keys during this operation. We do not recommend disabling the checks as this will cause us to still have rows with bad data which will cause problems in the future. This is a common solution on the web that it is a big no-no.

If this were production database we would go through and remove the individual rows that are problems using a left join but because this is just a demo application we’re just gonna delete all of the rows within the table.

mysql> delete from user_logins;
Query OK, 1 row affected (0.00 sec)

Now we can successfully add the foreign key constraint.

mysql> alter table user_logins add foreign key (user_id) references users (id);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now that our foreign key constraint has been added let’s look at what happens when we try to insert the bad user_logins data.

mysql> insert into user_logins (user_id, login_time) values (11, utc_timestamp());
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
(`examples`.`user_logins`, CONSTRAINT `user_logins_ibfk_1` FOREIGN KEY (`user_id`)
REFERENCES `users` (`id`))

As we can see the foreign key constraint has prevented us from adding this bad data into the database. We now no longer need to worry about the fact that a new user will have bad data associate with their account. This should also show us that there is an error long before it causes problems.

What Happens When We Try To Delete

Now that we’ve shown what happens when we try to add a new row that does not have a matching relationship, let’s look at what happens when we try to delete a row while it still has a reference to it.

To test this will insert a new row.

mysql> insert into user_logins (user_id, login_time) values (1, utc_timestamp());
Query OK, 1 row affected (0.00 sec)

Now when we try to delete the user row with an id of 1 we get an error message.

mysql> delete from users where id = 1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
(`examples`.`user_logins`, CONSTRAINT `user_logins_ibfk_1` FOREIGN KEY (`user_id`)
REFERENCES `users` (`id`))

This is how the foreign key constraint prevents us from orphaning records.

Cascading Deletes

MySQL can automatically delete these rows using a cascading delete. Without the cascading delete, we need to manually delete rows in the referencing tables which takes more time but causes less accidental deletions.

We’re going to reset our database to set this up.

mysql> drop table user_logins;
Query OK, 0 rows affected (0.00 sec)

mysql> create table user_logins (
    id int(11) not null primary key auto_increment,
    login_time datetime not null,
    user_id int(11) not null
);
Query OK, 0 rows affected (0.02 sec)

Now we’ll run the command that sets up the foreign key so it automatically deletes when the referenced row is deleted. Note that it’s essentially the same command as above but it has an on delete cascade at the end.

mysql> alter table user_logins add foreign key (user_id) references users (id) on delete cascade;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Let’s look at what happens when we insert a new user_logins row and then delete the users row that is associated with it.

mysql> insert into user_logins (user_id, login_time) values (1, utc_timestamp());
Query OK, 1 row affected (0.00 sec)

mysql> delete from users where id = 1;
Query OK, 1 row affected (0.00 sec)

Now we can double-check to make sure that the rows were deleted.

mysql> select * from user_logins;
Empty set (0.00 sec)

Downsides

There is a slight performance penalty when using foreign key constraints because the database must verify that all the columns are valid before it performs the operation. This tends to not be noticeable with more commands and the data integrity improvements easily outweigh the performance hit.

Conclusion

Foreign key constraints provide an excellent way to prevent orphaned and invalid data in our database. It is a must for all columns that reference another table and should be a requirement.