Joe was hired into a company to maintain a PHP application with a MySQL database. They would have random reports of duplicate data and data with missing information. When Joe looked into the problem 9 times out of 10 it was due to an issue where a user deleted one row but the same delete operation didn’t delete rows related to the original row. This caused that orphaned data to show up in some reports and cause odd join results in others.
SQL provides an excellent way to make sure that our databases don’t have orphaned rows or invalid relationships. Using them is a must for all table relationships.
Why Should We Use Foreign Key Relationships?
Let’s look at an example. In our application, we have a
users table and a
user_logins table to track our user accounts and their login history respectively.
We’re going to create a set of users for our experiments.
Now what we would want to do is insert a new row into
user_logins using an SQL statement like the one below.
But let’s say we have an error and instead of using the user’s id it uses the user’s id plus 10. In that case, we’ll have a query like the following.
Now what we would hope for is that instead of the row being inserted we would get an error so we could see the error and fix it. However, when we run the query it’s successful.
Now in this case this isn’t going to cause problems but when we eventually get our 11th user it will create some odd results where a user logged in before they were ever created.
Foreign Keys to the Rescue
Most variants of SQL have a feature called foreign key constraints. This allows us to tell our database engine that a column references another column. Setting up this relationship will cause the database engine to fail when it creates, updates, or deletes rows that create invalid data like the example above.
In this case, we want to tell the database engine that the
user_id column in the
user_logins table is always going to be a value in the
id column of the
As a quick aside, this relationship isn’t limited to being separate tables. We could have a
supervisor_id column in the
users table that describes a supervisor/direct relationship and link them using foreign keys.
How to Create Our Foreign Key Relationship
To create our relationship we’re going to use the
alter table command.
For our example, this is going to look like the following.
Now because we have “bad” data in our
user_logins table we’re going to get the following error.
There are a couple of ways we can fix this. We can go through and delete the rows that have problems, we can delete all of the data, or we can temporarily disable the checks for the foreign keys during this operation. We do not recommend disabling the checks as this will cause us to still have rows with bad data which will cause problems in the future. This is a common solution on the web that it is a big no-no.
If this were production database we would go through and remove the individual rows that are problems using a left join but because this is just a demo application we’re just gonna delete all of the rows within the table.
Now we can successfully add the foreign key constraint.
Now that our foreign key constraint has been added let’s look at what happens when we try to insert the bad
As we can see the foreign key constraint has prevented us from adding this bad data into the database. We now no longer need to worry about the fact that a new user will have bad data associate with their account. This should also show us that there is an error long before it causes problems.
What Happens When We Try To Delete
Now that we’ve shown what happens when we try to add a new row that does not have a matching relationship, let’s look at what happens when we try to delete a row while it still has a reference to it.
To test this will insert a new row.
Now when we try to delete the
user row with an id of 1 we get an error message.
This is how the foreign key constraint prevents us from orphaning records.
MySQL can automatically delete these rows using a cascading delete. Without the cascading delete, we need to manually delete rows in the referencing tables which takes more time but causes less accidental deletions.
We’re going to reset our database to set this up.
Now we’ll run the command that sets up the foreign key so it automatically deletes when the referenced row is deleted. Note that it’s essentially the same command as above but it has an
on delete cascade at the end.
Let’s look at what happens when we insert a new
user_logins row and then delete the
users row that is associated with it.
Now we can double-check to make sure that the rows were deleted.
There is a slight performance penalty when using foreign key constraints because the database must verify that all the columns are valid before it performs the operation. This tends to not be noticeable with more commands and the data integrity improvements easily outweigh the performance hit.
Foreign key constraints provide an excellent way to prevent orphaned and invalid data in our database. It is a must for all columns that reference another table and should be a requirement.
Scott is the Director of Technology at WeCare Connect where he strives to provide solutions for his customers needs. He's the father of two and can be found most weekends working on projects around the house with his loving partner.