Google+ Followers

zondag 9 maart 2014

Everyday PostgreSQL: Counting votes.

Problem: a movie fansite allows users to vote, and rather than try to block users from voting multiple times, they just log every vote and make their minds up later.


DROP TABLE IF EXISTS movies;
DROP TABLE IF EXISTS votes;

CREATE TABLE movies (name VARCHAR(100));
CREATE TABLE votes (userid INT, moviename VARCHAR(100));

INSERT INTO movies (name) VALUES ('Austin Powers'), ('Tank girl'), ('The fifth element');
INSERT INTO votes (userid, moviename) VALUES (1,'Tank girl'),(1,'Tank girl'),(1,'Tank girl'),(1,'The fifth element'),(2,'Austin Powers'),(2,'Tank girl'),(3,'Tank girl'),(3, 'The fifth element');


So, how do you count unique votes per user, per movie?

One way is to do exactly what the problem describes; count the number of unique votes per movie.
First; create a list of unique votes per movie, using a plain old DISTINCT:

SELECT DISTINCT userid, moviename FROM votes;

Then use that as a sourc of vote information  in a nice CTE:

WITH distinctvotes AS (SELECT DISTINCT userid, moviename FROM votes)
SELECT COUNT(*),moviename FROM distinctvotes INNER JOIN movies ON movies.name=distinctvotes.moviename GROUP BY moviename
ORDER BY 1 DESC;

This approach also enables you to define more rules for which votes count and which don't. The CTE query could for example exclude al users who voted more than ten times in the same minute. Putting that query in a CTE leaves the actual count query clean and easy to manage.