January 26, 2018
I've started using RSS again. For a long while I mostly relied on remembering to check a handful of sites, and on Twitter for everything else. Both of these are bad solutions: haphazard, unorganized, habit-forming. So, RSS: it's been around a long time, it still works, and almost every site I read uses it.
Some sites don't, though. I started writing a scraper that would generate an RSS feed, but... too much work, and surely a solved problem. I tried Apify, which crawls websites for you, and lets you scrape out the content you want into an RSS feed (and lots of other types of feeds, too). I set up a few scrapers, added the feeds to my RSS reader, and let them be.
When I revisited them a month or so later, they didn't seem to have run at all. I might have configured something incorrectly, but I'm pretty sure they were all set up to run once a day. There was no record of a single run, even though I'd tested each one before adding it to my reader.
So... I wrote the custom scraper. It ended up being pretty easy, mostly because almost everything I needed had already been written:
axios for fetching the sites,
cheerio for parsing what I wanted from the scraped data,
jstoxml for converting data to an RSS feed, and
express for serving the feed.
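For a rough idea of how those four pieces fit together, here's a minimal sketch. It is not the actual scrape-to-feed code: the feed definition, the selector, the `/feed` route, and all the function names are made up for illustration, and it assumes CommonJS-compatible versions of each package.

```javascript
// Hypothetical feed definition: the URL and selector are placeholders.
const FEED = {
  title: 'Example feed',
  url: 'https://example.com/news',
  selector: 'h2 a',
};

// Step 1: fetch the raw HTML with axios.
async function fetchHtml(url) {
  const axios = require('axios');
  const response = await axios.get(url);
  return response.data;
}

// Step 2: pull { title, link } pairs out of the HTML with cheerio.
function extractItems(html, selector, baseUrl) {
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  return $(selector)
    .map((i, el) => ({
      title: $(el).text().trim(),
      // Resolve relative hrefs against the page URL.
      link: new URL($(el).attr('href') || '', baseUrl).href,
    }))
    .get();
}

// Step 3a: describe the RSS document as a plain object in the
// _name / _attrs / _content shape that jstoxml understands.
function buildRssObject(feed, items) {
  return {
    _name: 'rss',
    _attrs: { version: '2.0' },
    _content: {
      channel: [
        { title: feed.title },
        { link: feed.url },
        { description: `Scraped from ${feed.url}` },
        ...items.map((item) => ({
          item: { title: item.title, link: item.link, guid: item.link },
        })),
      ],
    },
  };
}

// Step 3b: turn that object into an XML string with jstoxml.
// Entity escaping depends on the jstoxml version; check its README.
function renderRss(feed, items) {
  const { toXML } = require('jstoxml');
  return toXML(buildRssObject(feed, items), { header: true, indent: '  ' });
}

// Step 4: serve the feed with express, scraping on each request.
function startServer(port) {
  const express = require('express');
  const app = express();
  app.get('/feed', async (req, res) => {
    try {
      const html = await fetchHtml(FEED.url);
      const items = extractItems(html, FEED.selector, FEED.url);
      res.type('application/rss+xml').send(renderRss(FEED, items));
    } catch (err) {
      res.status(502).send(`Scrape failed: ${err.message}`);
    }
  });
  return app.listen(port);
}
```

Calling `startServer(3000)` would then expose the feed at `/feed` on port 3000. The packages are required inside each function only so the steps can be tried one at a time; a real project would more likely require them once at the top of the file.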
scrape-to-feed is the end result. I deployed it with now; here's an example feed, which scrapes story headlines from the New York Times: https://feeds-thnykrwoda.now.sh/nyt-example-feed. The whole thing took maybe three hours or so, start to finish. And it's actually much more convenient than having to click through the UI of a web app.
Give it a spin if you like.