offbyone.tech

Parsing URLs from text using native web tech

Here's one that's a lot easier than it used to be.

Say you're rendering a bunch of text, and that text may or may not contain one or more URLs. You'd like those URLs to be actual <a> tags when you render them, so that folks can click on them.

It used to be that you'd have to do a little footwork, parsing the text for the protocol, pathname, and so on, which is an error-prone process. Or you were smart about it and relied on an npm module that handled all the edge cases and gave you a decent API.

But now you don't need to write your own parsing logic, nor do you need to add a dependency to handle it for you. You can use the browser's URL API.

If you're not familiar with it, URL is a really handy API that takes in a URL string and spits out a bunch of helpful properties:

const url = new URL(
"https://offbyone.tech/parsing-urls-from-text-using-native-web-tech"
);
console.log(url);
// URL {
// hash: "",
// host: "offbyone.tech",
// hostname: "offbyone.tech",
// href: "https://offbyone.tech/parsing-urls-from-text-using-native-web-tech",
// origin: "https://offbyone.tech",
// password: "",
// pathname: "/parsing-urls-from-text-using-native-web-tech",
// port: "",
// protocol: "https:",
// search: "",
// searchParams: URLSearchParams { },
// username: ""
// }
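Most of those fields are empty for a plain URL like the one above. As a rough sketch, here's a made-up URL (the host, credentials, and query values are purely illustrative) that fills in the rest, so you can see what each property holds:

```javascript
// Hypothetical URL, chosen only to populate the fields that were empty above.
const url = new URL(
  "https://user:secret@example.com:8080/docs/page?q=url&lang=en#intro"
);

console.log(url.username); // "user"
console.log(url.password); // "secret"
console.log(url.host); // "example.com:8080" (hostname plus port)
console.log(url.hostname); // "example.com"
console.log(url.port); // "8080"
console.log(url.pathname); // "/docs/page"
console.log(url.search); // "?q=url&lang=en"
console.log(url.hash); // "#intro"

// searchParams gives you the query string already split into key/value pairs
console.log(url.searchParams.get("q")); // "url"
console.log(url.searchParams.get("lang")); // "en"
```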

This replaces a bunch of stuff that we used to need custom code for, but it's also super handy for picking URLs out of text:

const text = `
Aut culpa sapiente autem quia enim. https://example.com Sequi consequatur nisi sapiente atque ut pariatur. Quibusdam veniam et repellendus ducimus eaque explicabo quia. Cumque quidem quaerat quia sit commodi voluptatem. Architecto libero et illum. Blanditiis https://duckduckgo.com modi quidem qui libero veniam in et. Amet autem fugiat nobis ex dolorem animi quidem. Non pariatur autem numquam et occaecati sapiente sit. Dolores hic fugit dolorem fuga voluptas. Possimus et cupiditate officiis voluptas. Voluptas qui minima sit sunt excepturi blanditiis dolore expedita.
`;
const html = text
  .split(" ")
  .map((str, idx, arr) => {
    // Re-append the space that split() removed, except after the last word
    const word = idx + 1 !== arr.length ? `${str} ` : str;
    try {
      const url = new URL(str); // If this doesn't throw an error, it's a URL!
      return `<a href="${url.href}">${word}</a>`;
    } catch (err) {
      // Error! It's not a URL, so we'll just proceed with it as a regular string
      return word;
    }
  })
  .join("");

Here's that code snippet in action:

Aut culpa sapiente autem quia enim. https://example.com Sequi consequatur nisi sapiente atque ut pariatur. Quibusdam veniam et repellendus ducimus eaque explicabo quia. Cumque quidem quaerat quia sit commodi voluptatem. Architecto libero et illum. Blanditiis https://duckduckgo.com modi quidem qui libero veniam in et. Amet autem fugiat nobis ex dolorem animi quidem. Non pariatur autem numquam et occaecati sapiente sit. Dolores hic fugit dolorem fuga voluptas. Possimus et cupiditate officiis voluptas. Voluptas qui minima sit sunt excepturi blanditiis dolore expedita.

This works because new URL() throws a TypeError whenever it's given a string that it can't parse as an absolute URL.
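One thing to watch: "can be parsed as an absolute URL" is broader than "is a web link". Anything with a valid scheme parses, so mailto: and javascript: strings get through too. Here's a minimal sketch of a stricter check, assuming you only want http and https links; isHttpUrl is a name made up for this example, not part of the URL API:

```javascript
// Hypothetical helper: true only for strings that parse as absolute http(s) URLs.
function isHttpUrl(str) {
  try {
    const { protocol } = new URL(str);
    // new URL() succeeded, so the string is *some* kind of URL.
    // Only accept the schemes we actually want to turn into links.
    return protocol === "http:" || protocol === "https:";
  } catch (err) {
    // new URL() threw, so the string is not an absolute URL at all
    return false;
  }
}

console.log(isHttpUrl("https://example.com")); // true
console.log(isHttpUrl("sapiente")); // false: no scheme, so new URL() throws
console.log(isHttpUrl("example.com")); // false: still no scheme, still throws
console.log(isHttpUrl("mailto:hi@example.com")); // false: parses, but isn't http(s)
console.log(isHttpUrl("javascript:alert(1)")); // false: parses, but isn't http(s)
```

Swapping a check like this in for the bare try/catch keeps odd schemes in user-supplied text from ending up in an href.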

It's worth noting that the snippet above builds an HTML string, so it won't quite work in React unless you use dangerouslySetInnerHTML. Here's a React-safe approach:

export default () => {
const text = `
Aut culpa sapiente autem quia enim. https://example.com Sequi consequatur nisi sapiente atque ut pariatur. Quibusdam veniam et repellendus ducimus eaque explicabo quia. Cumque quidem quaerat quia sit commodi voluptatem. Architecto libero et illum. Blanditiis https://duckduckgo.com modi quidem qui libero veniam in et. Amet autem fugiat nobis ex dolorem animi quidem. Non pariatur autem numquam et occaecati sapiente sit. Dolores hic fugit dolorem fuga voluptas. Possimus et cupiditate officiis voluptas. Voluptas qui minima sit sunt excepturi blanditiis dolore expedita.
`;
  return (
    <blockquote>
      {text.split(" ").reduce((acc, str, idx, arr) => {
        // Re-append the space that split() removed, except after the last word
        const word = idx + 1 !== arr.length ? `${str} ` : str;
        try {
          const url = new URL(str); // If this doesn't throw an error, it's a URL!
          acc.push(<a key={idx} href={url.href}>{word}</a>);
        } catch (err) {
          // Error! It's not a URL, so we'll just proceed with it as a regular string
          if (typeof acc[acc.length - 1] === "string") {
            // Merge consecutive plain words into a single string
            acc[acc.length - 1] = acc[acc.length - 1] + word;
          } else {
            acc.push(word);
          }
        }
        return acc;
      }, [])}
    </blockquote>
  );
};

And the result:

Aut culpa sapiente autem quia enim. https://example.com Sequi consequatur nisi sapiente atque ut pariatur. Quibusdam veniam et repellendus ducimus eaque explicabo quia. Cumque quidem quaerat quia sit commodi voluptatem. Architecto libero et illum. Blanditiis https://duckduckgo.com modi quidem qui libero veniam in et. Amet autem fugiat nobis ex dolorem animi quidem. Non pariatur autem numquam et occaecati sapiente sit. Dolores hic fugit dolorem fuga voluptas. Possimus et cupiditate officiis voluptas. Voluptas qui minima sit sunt excepturi blanditiis dolore expedita.
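Both snippets split on a single space, so a URL that sits right after a newline or a tab would stay glued to the previous word and get missed. As a rough, framework-agnostic sketch, you could split on runs of whitespace instead and return a list of parts that either renderer can walk. The tokenize name and the { type, value, href } shape are made up for this example:

```javascript
// Hypothetical helper: splits text into plain-text parts and link parts.
function tokenize(text) {
  // The capturing group keeps the whitespace runs in the result,
  // so newlines and tabs survive and nothing has to be re-spaced by hand.
  return text.split(/(\s+)/).reduce((parts, chunk) => {
    if (chunk === "") return parts; // split() can yield empty strings at the edges

    let href = null;
    try {
      href = new URL(chunk).href; // If this doesn't throw an error, it's a URL!
    } catch (err) {
      // Not a URL; it's handled as plain text below
    }

    const last = parts[parts.length - 1];
    if (href !== null) {
      parts.push({ type: "link", value: chunk, href });
    } else if (last && last.type === "text") {
      last.value += chunk; // merge adjacent plain text into one part
    } else {
      parts.push({ type: "text", value: chunk });
    }
    return parts;
  }, []);
}

const parts = tokenize("Read\nhttps://example.com today");
console.log(parts);
// [
//   { type: "text", value: "Read\n" },
//   { type: "link", value: "https://example.com", href: "https://example.com/" },
//   { type: "text", value: " today" }
// ]
```

From there, the string version can map each part to escaped text or an <a> tag, and the React version can map link parts to <a> elements and leave text parts as plain strings.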

That's it! That's all you have to do. Browser support is very good, but doesn't include IE. If you need to support older browsers, polyfill.io can cover this one:

<script
crossorigin="anonymous"
src="https://polyfill.io/v3/polyfill.min.js?flags=gated&features=URL"
></script>
©2018-2019 Zach Green