TV Sort: Engineering the ultimate TV episode ranking system

Cover Image for TV Sort: Engineering the ultimate TV episode ranking system

TL;DR

I built TV Sort, an open-source episode ranking game that ditches the usual 1-10 ratings for a unique, human-driven sorting algorithm that pits episodes against each other. The game draws from TMDB, IMDb, and Wikipedia, with a bit of GPT thrown in for good measure.


My mother and I are huge Frasier fans. We've both watched and rewatched that show more times than we can count. And this year, while spending the Christmas week in Ireland with my family, we got to talking about which Frasier episode is our absolute favorite. And I started thinking: There are many solid episodes; how would they all rank against each other?

The easy way out is to use a 1-10 ranking system. But that's not good enough. There are lots of episodes that I rank 10. And how do I know the difference between a 7 and an 8? What -is- the difference?

Could I come up with a better way to rank episodes?

The algorithm

If I ask you to rank all episodes of a show, you'll find it quite tricky beyond saying, "Oh, I especially like X and Y, but I really don't like Z". But if I ask you to tell me which of two episodes you prefer, that's much easier! It's easy because you can say, "I really like A", and move on.

The first thing that pops into mind when thinking about this is ranking systems like Elo. Elo would be perfect for this, except that there is no stopping point at which you're -sure- that the rankings are properly defined - you can keep "battling" episodes forever. It also doesn't minimize the number of comparisons a person would have to make, which means they could be sitting there for days comparing 200+ episodes.

I had to come up with something better. And then it hit me: That… sounds a lot like a sorting algorithm, but one that waits for human input rather than being fully automated. If someone could provide the input for all comparisons made by the sorting algorithm, creating a ranked list of episodes would be possible. And so that's what I decided to go for.

I had previously seen Leonid Shevtsov's MonkeySort, which implements precisely what I've just described: A human-driven Quicksort algorithm. I decided to read up on it to understand how it does its job and re-implement it to drive this.

Note: Quicksort isn't the best algorithm for this; in the worst case, it's O(n²). A merge sort would be ideal here; it's stable and always O(n log n). Its only downside is that it uses more memory, but we're not sorting huge lists here, so that's not a concern. I've raised an issue (for myself) to look into this later on.

The data

Once I had the algorithm figured out, I had to think of what it would take to make it something a person could use:

  • A web app that shows episodes side by side, letting you decide on the outcome.
  • Episodes should include stills and descriptions to help jog your memory.
  • People should be able to use this web app for more than just Frasier. 😅

The first part was straightforward. I already had the algorithm. But grabbing the data for different shows and episodes was something that I had to figure out. Obviously, IMDB has a ton of data, but their API has a cost associated with it. There's a third-party IMDB API, but they charge for it, too. I wanted to avoid paying monthly fees for a web app like this, where I wouldn't need the API after fetching the relevant show details.

Thankfully, I knew of another place where I could get the data: The Movie Database

My home media server (Plex) uses it, as does Filebot, which I use for organizing media files. I decided to check it out, and it turns out it has an entirely free API with an incredibly generous rate limit.

With that in mind, I built the web app using Next.js, with Knex driving the database. People could search for a show, and it would fetch shows from TMDB. Once someone clicked on the show, the "show sorter" would start. In the background, the backend would fetch and store all the necessary data for all episodes so that everything could be fed down to the browser in one go, enabling the person to go through the entire sorting process locally instead of keeping them waiting for the backend between comparisons.

It was all working well, except for one big problem: Assessing an episode by reading the descriptions put a lot of cognitive load on the person. Doubly so if the descriptions were wordy. My goal was to make it as easy as possible to quickly grasp what an episode was about so you could get through the comparisons more easily.

I needed to clean up these descriptions for all episodes of all shows.

The LLM

Turning to an LLM was the obvious answer. I decided to feed the descriptions through GPT to get it to spit out 3 concise sentences describing the plot. I wanted to use GPT-4, but it wouldn't have been feasible cost-wise to do it for what is essentially an unlimited number of episodes and TV shows. Using GPT-3.5 meant I had to do a bit more to get it to work well. After tinkering with the prompt a bit, I came up with an excellent way to do it (first, spit out -all- plot points, then find 3 significant points to showcase, and then, if the generated sentences are too long, shorten them a bit).

That worked quite well, but the descriptions being fed to the LLM were not great: TMDB's own guidance says, "When writing an overview, try to keep it short, concise, and free of spoilers."

But we -need- spoilers. We -need- to know the significant things that happen during the episode so that it's very obvious to someone comparing these episodes.

So, I set out to get better data. TMDB includes external IDs with each episode, which means linking an episode to the relevant page on IMDB is easy. IDMB has a "plot summaries" and "synopsis" section, and I was able to grab them with basic HTML scraping. Easy.

That worked well enough. But some episodes didn't have enough detail in IMDB (like for The Office). I needed better data.

That's when I turned to Wikipedia. I had already thought about it before, but Wikipedia is difficult: There is no direct link to episodes from TMDB, and many shows don't even have dedicated episode pages; they have season pages listing every episode. I had put it on the back burner until I realized how important it was to get data from there.

Once I started looking into it, things turned out to be simpler than I thought. It turns out that TMDB does have a Wikidata ID for every episode (if the episode has a specific page) and for every show. That meant that I could use SPARQL and the Wikidata Query Service to find the Wikipedia URL for the episode (and if an episode didn't have its own page, find the URL for that episode's season, where the episode is listed and described).

The query to get the URL for a show's season:

SELECT ?wppage WHERE {
    wd:${wikidataId} wdt:P527 ?season . # P527 (has part)
    ?season wdt:P31 wd:Q3464665; # P31 (instance of) Q3464665 (television series season)
    p:P179 [pq:P1545 "${seasonNumber}"] . # P179 (part of the series) P1545 (series ordinal)
    ?wppage schema:about ?season .
FILTER(contains(str(?wppage),'//en.wikipedia'))
}
I'm not going to lie; I have -no- idea what I was doing with SPARQL.
I'm not going to lie; I have -no- idea what I was doing with SPARQL.

I spent only an hour trying to fiddle with it (and reading the Wikidata SPARQL tutorial). Even though I was able to learn enough through trial and error to get these queries working, I was definitely humbled. It was -nothing- like SQL.

I understand the query well enough to know that it's looking for the Wikipedia URL for the season of a show with a specific season number. But I don't understand the syntax well enough to know why it works. I'm sure there's a better way to do it, but I got it working, and that's all that matters.

Once I had the right URL, scraping the content (the Plot section of an episode page, or the relevant table cell in a season page) was easy.

And with that done, I was finally able to turn this (from Stress Relief (The Office)):

[
    "Dwight's fire safety seminar goes wrong.",
    "Michael organizes a roast for Stanley.",
    "Andy believes Pam and Jim are film gurus."
]

Into:

[
    "Dwight's realistic fire alarm causes Stanley's heart attack.",
    "Pam's father seeks a separation from her mother.",
    "Michael organizes a comedic roast for himself in the warehouse."
]

It might seem small, but it's the difference between someone not being sure what episode it is and immediately knowing and being able to compare it to other episodes.

Performance

One of the biggest problems with using an LLM, of course, is that loading a 200-episode show for the first time was no longer instant. Whoever was loading the show for the first time had to wait for the LLM to generate these plot points, which could take several minutes. I didn't want to complicate my infrastructure for this, so I decided to use pg-boss, which is a -brilliant- job scheduling system for Node that uses only Postgres (the database that I was already using for storing show/episode information).

I was able to shunt the generation of plot points to the background so that when someone tries to load a show for the first time, it's available for ranking instantly, using the original TMDB episode descriptions until the LLM-generated plot points become available. The browser will keep pinging the backend in the background until it has all the plot points, updating the UI as they become available.

Moving the work to the background also makes it easy to avoid doubling up work when a show is accessed multiple times while processing.

Finishing Touches

Developing this, I had to ensure the interface worked well on mobile. I iterated several times to find something that would fit well, even on tiny mobile phone screens. I also put the comparison buttons at the bottom so that someone on their phone could quickly get through a bunch of episode comparisons without having to move their fingers.

I also wanted to ensure there were show-specific landing pages to which people could be linked. Those pages don't have much content at the moment, but I'd like to add more information to them in the future - like a list of the top 10 episodes or the best and worst episodes of each season.

I think it looks pretty spiffy, especially on mobile.
I think it looks pretty spiffy, especially on mobile.
I'm sure the UX can improve further.
I'm sure the UX can improve further.

While preparing to deploy this, I obviously had to work on the little details, like containerizing it, adding error tracking, sitemap.xml, robots.txt, and even OpenGraph data to make it easy for people to share.

Open Graph (OG) images are the images that show up when you share a link on iMessage, Slack, Twitter, or any other place.

Thinking about how to make this fun for sharing, I started thinking about how Next.js lets you generate OG images dynamically. I decided to make the image include the show's poster.

I think this does its job perfectly when it comes to drawing people in.
I think this does its job perfectly when it comes to drawing people in.

It's live, and open source.

The journey from a casual holiday debate to a fully functional game was challenging and rewarding. TV Sort is now live, and I encourage you to try it out and contribute your rankings and insights.

One of the byproducts of this adventure has been a deeper appreciation for the wealth of data available through TMDB. I encourage everyone to embrace the spirit of community-driven data refinement and contribute back to TMDB, enhancing the data quality for everyone. Especially stills!

The project is open-source under the AGPL license. There is still plenty to improve: I have a sizeable public list of planned features and improvements on GitHub, including the ability to see all your rankings (in progress or completed) on the home page, the ability to rank specific seasons rather than the entire show (and have it automatically count towards a ranking for the whole show), and the ability to undo mistakes. Also, one day, I'd like to use all these rankings to form a global consensus on where precisely each episode of every show stands, beyond simplistic 1-10 rankings. If you're a TypeScript developer and interested in this, I'd welcome any help.

And, whether you're a TV aficionado eager to curate your ultimate episode list or someone who loves a good sorting algorithm, head to TV Sort and start ranking.

And, of course, check out my favorite Frasier episodes. Making this list is the whole reason this started!

If you have any thoughts to share, ideas for improvements, or issues you've encountered, reach out to me by email, Twitter, or wherever else you might find me. I'd love to get your perspective on this (if only to know that I'm not alone in caring about this!).