Who are you?

My name is Levi McCormick.

Why did you make this?

I often struggle to go back to a particular video and find the quote or comment that I remember. Since YouTube content isn’t explicitly searchable, I made this site to make the content discoverable again.

How does it work?

It’s fairly simple, but getting it all working was not. Here’s the process:

  1. A script discovers the video URL.
  2. Another script downloads the subtitle file.
  3. Yet another script takes the subtitle file and parses it into HTML. Using the timecode data from the subtitle file, it links each passage to that time in the video for easy playback. I’ve tried to make it break up the text into chunks based on length, speakers, music, etc. All this content is written to a Markdown file. This part took a lot of time. 😁
  4. The Markdown is rendered into a static HTML site with Hugo.
  5. The resulting static site is hosted with AWS S3. For more info on this, check out Static Hosting on my website.
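To give a feel for step 3, here is a minimal sketch in Python. It is not the script that runs this site. It assumes the subtitle file is in SRT format, chunks by length only (no speaker or music detection), and uses hypothetical names (`parse_srt`, `to_markdown`, `max_chars`). The only YouTube-specific detail is the `t=` URL parameter, which starts playback at the given second.

```python
import re

# Matches the start of an SRT timing line such as "00:01:05,500 --> 00:01:08,000".
TIMING = re.compile(r"(\d+):(\d{2}):(\d{2})[,.]\d+\s*-->")


def parse_srt(text):
    """Return a list of (start_seconds, caption_text) cues from SRT text."""
    cues = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                hours, minutes, seconds = (int(g) for g in match.groups())
                # Everything after the timing line is caption text.
                caption = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                if caption:
                    cues.append((hours * 3600 + minutes * 60 + seconds, caption))
                break
    return cues


def to_markdown(video_id, cues, max_chars=200):
    """Group cues into paragraphs of roughly max_chars, each linked to its start time."""
    paragraphs = []
    start, parts, length = None, [], 0
    for seconds, caption in cues:
        # Close the current paragraph once adding this cue would exceed the limit.
        if parts and length + len(caption) > max_chars:
            paragraphs.append((start, " ".join(parts)))
            start, parts, length = None, [], 0
        if start is None:
            start = seconds
        parts.append(caption)
        length += len(caption) + 1
    if parts:
        paragraphs.append((start, " ".join(parts)))

    out = []
    for start, text in paragraphs:
        stamp = f"{start // 60}:{start % 60:02d}"
        link = f"https://www.youtube.com/watch?v={video_id}&t={start}s"
        out.append(f"[{stamp}]({link}) {text}")
    return "\n\n".join(out) + "\n"
```

Calling `to_markdown("VIDEO_ID", parse_srt(srt_text))` returns Markdown paragraphs that each begin with a timestamp link. That output could be saved as a content file for Hugo to render in step 4.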

Why static hosting?

Cheap and scalable. One thing I’ve learned over the years is that when you build things like this, you need the most robust infrastructure you can get, especially with a potential audience as large as Gary’s. A small website will often crumble under the sudden attention of a huge audience, a phenomenon known as the Slashdot Effect. Here, the combination of S3 and CloudFront will be able to handle everything the internet can throw at it.

How does the search work?

I’m relying on Google Site Search to index all the content and make it searchable. I could implement my own index and search infrastructure, but the costs would be very high, in both time and money. As a fan project, I can’t afford that.

Why can’t I find a particular video?

Either the video wasn’t captured by the crawler, or Google hasn’t indexed it yet. Not all videos have subtitles uploaded with them; the ones that don’t rely on YouTube’s automatic transcription, which is still not very good. I’ll include those in the future. If a video simply hasn’t been indexed, we just have to wait for the Google bot to show up and crawl the site again.

What’s next?

I need to finish the automation that takes push notifications from YouTube uploads and automatically publishes the new video transcripts to the site. I have all the pieces in place; it’s just a matter of plumbing them together.