Back to RSS

January 26, 2018

I've started using RSS again. For a long while there I was mostly remembering to check a handful of sites and relying on Twitter for everything else. Both of these are bad solutions: haphazard, unorganized, habit-forming. So, RSS: it's been around a long time, it still works, and almost every site I read uses it.

Some sites don't, though. I started writing a scraper that would generate an RSS feed, but... too much work, and surely a solved problem. I tried Apify, which crawls websites for you, and lets you scrape out the content you want into an RSS feed (and lots of other types of feeds, too). I set up a few scrapers, added the feeds to my RSS reader, and let them be.

When I revisited them a month or so later, they seemed not to have been running at all. I might have configured something incorrectly, but I'm pretty sure they were set up to run once a day; there was no record of them ever running, even though I'd tested each one before adding it to my reader.

So... I wrote the custom scraper after all. It ended up being pretty easy, mostly because almost everything I needed had already been written: axios for fetching the sites, cheerio for parsing out the content I wanted, jstoxml for converting that data into an RSS feed, and express for serving it. scrape-to-feed is the end result. I deployed it with now; here's an example feed that scrapes story headlines from the New York Times. The whole thing took maybe three hours, start to finish, and it's much more convenient than clicking through the UI of a web app.
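The core of that pipeline can be sketched in a few lines. This is a minimal, dependency-free illustration, not the code from scrape-to-feed itself: in the real project axios fetches the page, cheerio extracts the items, and jstoxml builds the XML, but here the scraped items are hard-coded examples and a plain string template stands in for jstoxml.

```javascript
// Convert an array of { title, link } items into an RSS 2.0 feed string.
// A tiny stand-in for what jstoxml does in the real project.
function toRss(channelTitle, items) {
  // Escape characters that are significant in XML.
  const escape = (s) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

  const entries = items
    .map(
      (item) =>
        `    <item>\n` +
        `      <title>${escape(item.title)}</title>\n` +
        `      <link>${escape(item.link)}</link>\n` +
        `    </item>`
    )
    .join("\n");

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    `    <title>${escape(channelTitle)}</title>`,
    entries,
    "  </channel>",
    "</rss>",
  ].join("\n");
}

// Hypothetical items standing in for headlines scraped with cheerio.
const feed = toRss("Example Headlines", [
  { title: "First headline", link: "https://example.com/1" },
  { title: "Second headline", link: "https://example.com/2" },
]);

console.log(feed);
```

In the full version, an express route would run the fetch-and-parse step on request and respond with this XML under a `Content-Type: application/rss+xml` header, which is all an RSS reader needs.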

Give it a spin if you like.

©2018 Zach Green