It's 4:45 on a Friday afternoon and you're browsing Reddit to kill the last few minutes of the day when someone comes running into your office:
Pam accidentally deleted a customer and we need to run bills on Monday morning.
Disaster has struck and you're the only one who can fix it. Time to pull the backups from the morning and get their data back. So much for getting out by 5 and maybe you're whole evening but what if there's a better way...
Traditional SQL Delete and Restore
In most applications that save data into an SQL database when a user deletes a record it happens like this:
delete from criticaldata where id = 12;
After that command is run the record is gone from the system forever. If your really lucky the code even cleans up after itself by issuing commands that delete rows that depend on the original row (or automatically if foreign keys are setup to do this):
delete from crititaldata2 where criticaldataId = 12; delete from crititaldata3 where criticaldataId = 12;
This behavior might now require you to undelete hundreds of records. So how are you going to get the data back?
Hopefully, your backups are good so the data can be restored from 8 AM backup. Then it's just a simple process of restoring the backup to a different database, extracting all the rows that were deleted, and restoring the data back to the original database. If you're using mysql you can even use
mysqldump --where to get a backup of just those records you need.
But what about all those records that have been created between the last backup and when the record was deleted? It might only represents a few hours of someones time. If you're unlucky it represents thousands of man hours, data created by your clients, or data that can't be recreated. Not a great situation to be in.
What if there's a better way?
Lazy Delete and Restore
What we really want is something like the Trash/Recycle Bin on your computer. A temporary space where deleted items are sent before they're purged from the system so you or your users can undo their actions.
How do we do this?
Stop using the delete command. Instead, add a column named 'deleted' to flag that record as deleted. Have the column defaults to null so if the deleted column isn't null then it's been deleted. When the user goes to delete an object you set this column to the ID of the user that deleted it. You can also have a separate table that contains delete events and use this ID so you can quickly find all the rows that were deleted due to a user deleting an item.
Do the same thing with files. Instead of reading the files directly off the disk keep track of them in the database and use the database as a kind of file system. DO NOT include the file contents in the database.
On most projects we have a Trash page that allows the user to restore the data they've deleted. That way they don't even need to get us involved to restore their data.
Then we have a cron job that runs every night to delete the flagged data that was deleted more than 30 days ago.
This process isn't without some downsides.
The biggest is that it causes extra work. For every query, you'll have to specify that it only return records where
deleted is null. There will be cases where when you miss this and deleted data leaks through into the UI (it sucks to hear someone say 'I deleted this but it's still showing up'). This can be alleviated by using views that only return undeleted data but it's another thing that you'll have to manage.
You'll also need to be diligent about making sure that when you delete an item all the rows that depend on that row are also deleted. You can't rely on cascading deletes and you might end up with "orphaned" records.
The other downside is that it uses extra space in your database. Because the records still exist, your database is still indexing the rows, they're still being backed up, and they're still sitting in RAM. With the cost of drives and RAM I really don't think this is that much of a problem unless your users delete a lot of data. In our systems we have less then 1% of the records flagged as deleted.
Happy Customers and Happy Programmers
Ultimately, by being able to restore your user's data faster you'll make them happier and you'll waste less time restoring data that was accidentally deleted and more time creating new features. It may be extra work for you up front but the long term benefits are worth it.
Scott is the Director of Technology at WeCare Connect where he strives to provide solutions for his customers needs. He's the father of two and can be found most weekends working on projects around the house with his loving partner.