Effortless Web Scraping With Cloudflare Workers
Some time ago I needed to grab information from a website to display some visitor numbers on a widget powered by Scriptable. Unfortunately, that website didn't have an API for its numbers; it only displayed them in a table on the page. This is where a web scraper comes in.
There are endless ways to build a web scraper and endless articles about them, so I just want to share a little tool I used for that job. I like it because it's easy to use, fast, and free. All you need is an account at Cloudflare.
The tool I want to highlight is web.scraper.workers.dev by Adam Schwartz. Give it a URL and a CSS selector and you're done! If the website ever goes down, you can grab the code on GitHub and host it yourself.
For example, if you used the URL example.com and the CSS selector h1, you'd get the result:
{ "result": { "h1": ["Example Domain"] }}
You can also generate a permalink to it:
https://web.scraper.workers.dev/?url=example.com&selector=h1&scrape=text&pretty=true
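If you want to build such permalinks in code instead of by hand, a small helper like the one below can assemble them. This is a minimal sketch: `buildScraperUrl` is a hypothetical name, and it only sets the four query parameters visible in the permalink above (`url`, `selector`, `scrape`, `pretty`); any other options the tool may support are not covered.

```typescript
// Hypothetical helper: builds a web.scraper.workers.dev permalink.
// Only the parameters seen in the example permalink are used.
function buildScraperUrl(target: string, selector: string, pretty = true): string {
  const endpoint = new URL("https://web.scraper.workers.dev/")
  endpoint.searchParams.set("url", target)
  endpoint.searchParams.set("selector", selector)
  // Ask for the text content of the matching elements
  endpoint.searchParams.set("scrape", "text")
  if (pretty) endpoint.searchParams.set("pretty", "true")
  // searchParams takes care of encoding, e.g. spaces in "table td"
  return endpoint.toString()
}

console.log(buildScraperUrl("example.com", "h1"))
// → https://web.scraper.workers.dev/?url=example.com&selector=h1&scrape=text&pretty=true
```

Letting `URLSearchParams` do the encoding matters once the selector contains spaces or characters like `#`: `buildScraperUrl("example.com", "table td")` produces `selector=table+td` rather than a broken query string.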
I'm now using such a permalink as my API for those visitor numbers. I find Cloudflare Workers quite approachable for web scraping, and Adam's tool makes it even easier.
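Consuming the permalink as an API boils down to fetching it and reading the texts for your selector. The sketch below assumes the response always has the shape of the example result above, with `result` mapping each selector to an array of strings; `fetchTexts` and `extractTexts` are hypothetical names, and error responses from the tool are not modelled.

```typescript
// Response shape assumed from the example result: selector -> list of texts
type ScraperResponse = { result: Record<string, string[]> }

// Pull the texts for one selector out of a parsed response (empty if absent)
function extractTexts(response: ScraperResponse, selector: string): string[] {
  return response.result?.[selector] ?? []
}

// Fetch a permalink and return the texts for `selector`.
// Assumes a runtime with the standard fetch API (browsers, Node 18+, Workers).
async function fetchTexts(permalink: string, selector: string): Promise<string[]> {
  const res = await fetch(permalink)
  if (!res.ok) {
    throw new Error(`Scraper request failed with status ${res.status}`)
  }
  const json = (await res.json()) as ScraperResponse
  return extractTexts(json, selector)
}
```

Calling `fetchTexts` with the example.com permalink and `"h1"` should resolve to the single heading text. Scriptable provides its own `Request` class rather than `fetch`, so there the request itself would go through `new Request(permalink)` and `loadJSON()`, while the extraction step stays the same.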
Oh, and in case you're wondering how I use the table data, here's the small snippet that parses it:
```ts
// Raw text from the scraper: the table cells joined by spaces
const incoming = "Name Max Current PlaceA 144 51 PlaceB 50 25 PlaceC 200 130"

type ParseStringResponse = {
  columns: Array<{
    heading: string
    property: string
  }>
  data: Array<{
    name: string
    max: string
    current: string
  }>
}

// Split a flat array into consecutive chunks of `size` elements (one per table row)
function chunk(arr: Array<string>, size = 3): Array<Array<string>> {
  const bulks: Array<Array<string>> = []
  for (let i = 0; i < Math.ceil(arr.length / size); i++) {
    bulks.push(arr.slice(i * size, (i + 1) * size))
  }
  return bulks
}

// Turn the space-separated table text into column metadata and row objects
function parseString(input: string, size = 3): ParseStringResponse {
  // Split on any run of whitespace so double spaces don't produce empty cells
  const arr = input.trim().split(/\s+/)
  // The first `size` words are the table headings
  const columns = arr.slice(0, size).map((heading) => ({
    heading,
    property: heading.toLowerCase(),
  }))
  // Everything after the headings is cell data, read row by row
  const rawData = arr.slice(size)
  const chunkedData = chunk(rawData, size)
  const data = chunkedData.map((item) => ({
    name: item[0],
    max: item[1],
    current: item[2],
  }))
  return { columns, data }
}

export const output = parseString(incoming)
```
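`parseString` keeps every cell as a string, so before displaying visitor numbers in a widget you will probably want actual numbers. Here is a minimal sketch of that step; `toNumbers` and the `ratio` field (share of capacity in use) are hypothetical additions, not part of the original snippet.

```typescript
// Row shape produced by parseString (all cells are strings)
type Row = { name: string; max: string; current: string }

// Hypothetical helper: convert the string cells to numbers and add a ratio
function toNumbers(rows: Row[]) {
  return rows.map((row) => {
    const max = Number(row.max)
    const current = Number(row.current)
    return {
      name: row.name,
      max,
      current,
      // Share of capacity in use; falls back to 0 when max is 0 or not a number
      ratio: max > 0 ? current / max : 0,
    }
  })
}

const rows = toNumbers([
  { name: "PlaceA", max: "144", current: "51" },
  { name: "PlaceB", max: "50", current: "25" },
])
console.log(rows[1])
// → { name: 'PlaceB', max: 50, current: 25, ratio: 0.5 }
```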