Last week I made a small update error on my application and I ended up with duplicates values in a table. Of course, this would not have happened if I had a unique key, but as I check before inserting, I thought I was safe.
Unfortunately, as I couldn’t delete everything and just start updating data again, I had to figure out how to delete duplicates rows.
As a first step, I run a simple query to find out how much I was in trouble.
SELECT Place_ID, Event_ID, StartDate, COUNT(*) FROM Showings GROUP BY Place_ID, Event_ID, StartDate HAVING COUNT(*) > 1
Good news first: there are no triplets :)
Less good news: I have more than a thousand rows to delete. So no way to do this by running one request after the other…
Good thing: since my table has a primary key, I can identify duplicate data:
SELECT Place_ID, Event_ID, StartDate, MAX(Showing_ID) AS ID FROM Showings GROUP BY Place_ID, Event_ID, StartDate HAVING COUNT(*) > 1
This way, I find the IDs of all the rows added when there was already a record with the same data (Place_ID, Event_ID and StartDate). I only have to delete these useless values (since the others were there first) :
DELETE FROM Showings WHERE Showing_ID IN ( SELECT MAX(Showing_ID) FROM Showings GROUP BY Place_ID, Event_ID, StartDate HAVING COUNT(*) > 1 )
Sometimes, IT is not that complicated.
Version en français : Supprimer des doublons en SQL.