What is the best regex to match a URL in text?

There is no single perfect pattern, but John Gruber's liberal web-URL regex is the most battle-tested for free text. It handles http, https, www-prefixed, and bare domain-slash URLs, and it gets trailing punctuation and balanced parentheses right. For validating a single string rather than scanning prose, skip regex and use the URL() constructor instead.

Why does my URL regex include the period at the end of a sentence?

Because a period is a legal path and query character, so a greedy pattern keeps consuming it. The fix is to refuse a short list of trailing punctuation as the final character of the match, or to strip trailing punctuation after matching. Gruber's pattern bakes the punctuation exclusion into its final character class.

How do I match URLs that contain parentheses?

Wikipedia-style URLs like /Foo_(disambiguation) break naive patterns because a closing paren can also end a parenthetical aside. Match one or two levels of balanced parentheses explicitly. Recursive regex can match any depth, but recursion exists only in some engines, such as PCRE and Perl, and not in JavaScript, so cap it at two levels for portability.

How to Match a URL with Regex — and the 3 Edge Cases Most Patterns Miss

Should I use regex or new URL() to validate a URL?

Use new URL() to validate one string: it implements the WHATWG URL Standard, throws on invalid input, and parses out the protocol, host, and pathname for free. Use regex when you need to find URLs embedded in arbitrary text, which the URL constructor cannot do. They solve different problems.

You need to pull every URL out of a chat log, a Markdown file, or a block of user comments, and the obvious move is to write a regex. The trouble is that the naive pattern you reach for first works on https://example.com and then quietly mangles real input. To match a URL with regex reliably you have to plan for three specific failures: trailing punctuation, balanced parentheses, and links with no scheme. Here is each one, why it breaks, and the pattern that survives it.

TL;DR

A naive URL regex grabs the sentence's final period and comma — exclude trailing punctuation.
Wikipedia URLs with parentheses need explicit balanced-paren matching, up to two levels.
Bare links like bit.ly/foo have no https:// scheme — match domain-dot-domain-slash too.
For validating one string, skip regex and use the URL() constructor; it implements the WHATWG URL Standard.
Test every pattern in a live regex tester against messy real input first.

How to match a URL with regex in JavaScript

The job splits into two completely different problems, and conflating them is the root cause of most broken patterns. Finding URLs inside free text is a search problem. Validating one isolated string is a parse problem. Regex is the right tool for the first and the wrong tool for the second.

The naive pattern and why it almost works

Most people start here, and on a clean input it looks fine:

/https?:\/\/\S+/g

It reads as "http or https, then ://, then one or more non-whitespace characters." Paste it into a tester with Visit https://example.com today and it matches https://example.com cleanly. The problem only shows up with realistic text, because \S+ is greedy and does not know where a URL ends and a sentence resumes.
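To make the failure concrete, here is a small sketch that runs the naive pattern over one clean and one messy sentence. The sample strings and variable names are made up for illustration.

```javascript
// The naive pattern: scheme, "://", then any run of non-whitespace.
const naive = /https?:\/\/\S+/g;

// Hypothetical sample inputs, chosen for illustration only.
const clean = "Visit https://example.com today";
const messy = "Read https://example.com/page. Then (see https://example.com/docs), ok?";

const cleanMatches = clean.match(naive);
const messyMatches = messy.match(naive);

console.log(cleanMatches); // [ 'https://example.com' ]
console.log(messyMatches); // [ 'https://example.com/page.', 'https://example.com/docs),' ]
```

The clean sentence looks fine; the messy one picks up a sentence period and a closing paren plus comma, because \S+ only stops at whitespace.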

Anchoring: scanning text vs validating one string

If your input is a single field that should contain exactly one URL and nothing else, anchor the pattern with ^ and $ so it must match start to finish. If your input is a paragraph, do not anchor — you want a global, unanchored search that can find several URLs in one pass. Picking the wrong mode here is why a pattern that "works on regex101" fails in production: the test string was one clean URL, the real data was a messy paragraph.
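As a rough illustration of the two modes, the sketch below uses the same naive body in an unanchored global form and in an anchored form. The names scan and whole and the sample strings are assumptions for this example, and the anchored form only checks the shape of the string rather than its validity.

```javascript
// Unanchored and global: finds every URL-shaped token inside a paragraph.
const scan = /https?:\/\/\S+/gi;

// Anchored: the whole string must be a single URL-shaped token.
const whole = /^https?:\/\/\S+$/i;

// Hypothetical inputs for illustration.
const paragraph = "Docs at https://example.com/a and https://example.com/b";
const field = "https://example.com/a";

const found = paragraph.match(scan);       // both URLs
const fieldOk = whole.test(field);         // true: the field is one token
const paragraphOk = whole.test(paragraph); // false: extra text around the URLs

console.log(found, fieldOk, paragraphOk);
```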

Set the flags before you blame the pattern

Two flags matter most for URL work. The g (global) flag is what lets match() and matchAll() return every URL instead of just the first; matchAll() throws a TypeError without it. The i (case-insensitive) flag matters because schemes and hosts are case-insensitive, so HTTP://Example.COM is valid. RFC 3986 defines the scheme and host components as case-insensitive, so your pattern should be too.
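The effect of each flag is easy to check directly. This sketch assumes the naive pattern from earlier and a made-up sample string. It also shows that matchAll() rejects a regex without the g flag.

```javascript
// Hypothetical sample text with an upper-case scheme and host.
const text = "See HTTP://Example.COM/a and https://example.org/b";

// g + i: every match is returned, and "http" in the pattern also matches "HTTP".
const all = [...text.matchAll(/https?:\/\/\S+/gi)].map((m) => m[0]);

// g without i: the upper-case scheme is missed.
const caseSensitive = [...text.matchAll(/https?:\/\/\S+/g)].map((m) => m[0]);

// matchAll() with a non-global regex throws a TypeError.
let threw = false;
try {
  text.matchAll(/https?:\/\/\S+/i);
} catch (err) {
  threw = err instanceof TypeError;
}

console.log(all);           // [ 'HTTP://Example.COM/a', 'https://example.org/b' ]
console.log(caseSensitive); // [ 'https://example.org/b' ]
console.log(threw);         // true
```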

Why your URL regex grabs the period at the end of a sentence

This is the first edge case, and it bites everyone. Write Read https://example.com/page. and the naive \S+ happily eats the final period, returning https://example.com/page. — a 404 waiting to happen.

A period is a legal URL character

You cannot just ban periods, because they appear inside hostnames (example.com) and paths (/v1.2/page.html). The dot is structural. The issue is purely positional: a trailing period at the very end of the match is almost always sentence punctuation, while a period in the middle is part of the URL.

Refuse a closing punctuation character

The portable fix is to forbid a small set of characters as the last character of the match. The widely-used approach from John Gruber's URL-matching regex ends the pattern with a negated class that excludes terminal punctuation:

[^\s`!()\[\]{};:'".,<>?«»“”‘’]

That class says "the URL may not end on whitespace or any of these punctuation marks," so the trailing ., ,, ;, or " is left behind for the sentence. The same logic handles a URL wrapped in quotes or followed by a comma in a list.
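One way to try the idea in isolation is to bolt that final character class onto a scheme-required pattern. This is a simplified sketch, not Gruber's full pattern: the name urlNoTrailingPunct and the sample sentence are assumptions, and it does nothing about balanced parentheses.

```javascript
// Greedy run of non-whitespace, then one last character that must not be
// whitespace or terminal punctuation. Backtracking hands the trailing
// punctuation back to the sentence.
const urlNoTrailingPunct = /https?:\/\/\S*[^\s`!()\[\]{};:'".,<>?«»“”‘’]/gi;

// Hypothetical sample sentence for illustration.
const sample = 'Read https://example.com/page. He said "https://example.com/a", then left.';

const urls = sample.match(urlNoTrailingPunct);

console.log(urls); // [ 'https://example.com/page', 'https://example.com/a' ]
```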

Or strip trailing punctuation after matching

If you would rather keep the pattern simple, match greedily and clean up afterward in code:

const cleaned = raw.replace(
  /[.,;:!?)\]'"]+$/,
  ""
);

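This trims any run of trailing punctuation from the captured string. It is easier to read than baking the exclusion into the regex, at the cost of a second step. One caveat: the class includes ), so it also strips the legitimate closing parenthesis from a Wikipedia-style URL that ends in one; drop ) from the class if you handle parentheses separately. Both approaches are legitimate; pick the one your team will understand in six months.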
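Wrapped into a helper, the two-step version might look like the sketch below. The function name extractUrls and the sample inputs are hypothetical. The last call shows the cost of keeping ) in the cleanup class.

```javascript
// Step 1: match greedily. Step 2: trim trailing punctuation from each hit.
function extractUrls(text) {
  const raw = text.match(/https?:\/\/\S+/gi) ?? [];
  return raw.map((u) => u.replace(/[.,;:!?)\]'"]+$/, ""));
}

const fromProse = extractUrls(
  "Read https://example.com/page. Or (see https://example.com/docs), ok?"
);
console.log(fromProse); // [ 'https://example.com/page', 'https://example.com/docs' ]

// The cleanup class contains ")", so a URL that really ends in one loses it.
const fromWiki = extractUrls("https://en.wikipedia.org/wiki/Cure_(album)");
console.log(fromWiki); // [ 'https://en.wikipedia.org/wiki/Cure_(album' ]
```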

How do I match URLs with parentheses in them

The second edge case is the nastiest, and it comes straight from the real world: Wikipedia. A URL like https://en.wikipedia.org/wiki/Cure_(album) contains a closing parenthesis that is part of the link. But authors also write parenthetical asides like (see https://example.com), where the closing paren is not part of the link.

The balanced-parens problem

Your pattern has to make opposite decisions about the same character depending on context. In (see https://example.com) the ) ends the sentence aside. In .../Cure_(album) the ) ends the URL. A simple "stop at )" rule breaks Wikipedia; a simple "include )" rule breaks parenthetical asides.

Match one or two levels explicitly

Gruber's pattern solves this by treating a balanced (...) group as a valid run inside the URL, while a lone unbalanced ) is treated as the boundary:

\(([^\s()<>]+|(\([^\s()<>]+\)))*\)

This matches a parenthesized chunk that may itself contain one nested pair — two levels deep. In practice that is enough: Gruber reported that across real-world reports, no URL ever needed more than two levels of nesting. The result correctly keeps (album) while dropping the aside's closing paren.
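A minimal JavaScript sketch of the two-level approach follows, under a few assumptions. It requires an explicit http(s) scheme, and the names plain, group, lastChar, and findUrls are made up for illustration. The plain runs are matched one character at a time, a small deviation from the pattern quoted above, to avoid nesting one quantifier inside another.

```javascript
// One ordinary URL character: no whitespace, parens, or angle brackets.
const plain = /[^\s()<>]/;

// A balanced (...) chunk that may contain one nested (...) pair.
const group = /\((?:[^\s()<>]|\([^\s()<>]+\))*\)/;

// The final character may not be whitespace or terminal punctuation.
const lastChar = /[^\s`!()\[\]{};:'".,<>?«»“”‘’]/;

// Body: plain characters or balanced groups. End: a balanced group or an
// allowed final character. A lone ")" therefore acts as a boundary.
const urlWithParens = new RegExp(
  "https?://" +
    `(?:${plain.source}|${group.source})+` +
    `(?:${group.source}|${lastChar.source})`,
  "gi"
);

const findUrls = (text) => text.match(urlWithParens) ?? [];

console.log(findUrls("See https://en.wikipedia.org/wiki/Cure_(album)."));
// [ 'https://en.wikipedia.org/wiki/Cure_(album)' ]
console.log(findUrls("(see https://example.com)"));
// [ 'https://example.com' ]
```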

Why you cannot match arbitrary depth in JavaScript

To match parentheses nested to any depth you need a recursive pattern, and recursion in regex is engine-specific. PCRE and Perl support recursive subpatterns; JavaScript's RegExp does not. This is the same portability trap covered in why a regex passes in Python but fails in JS — features like recursion, possessive quantifiers, and some lookbehind forms are not universal. Capping at two levels keeps the pattern working everywhere.

How to match URLs without http:// (scheme-less links)

The third edge case is the one people forget until a user complains. Humans write bit.ly/abc, www.example.com, and example.org/page without ever typing https://. A pattern anchored on https?:// silently skips all of them.

Recognize the domain-dot-slash shape

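The trick is to also accept the "something-dot-something-slash-something" shape that signals a bare web address. Gruber's web-URL pattern allows three entry points, shown here in free-spacing form with comments (JavaScript has no free-spacing flag, so there the whitespace and comments must be removed, and the slashes escaped in a regex literal):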

(?:
  https?://               # explicit scheme
  | www\d{0,3}[.]         # www. / www1. / www2.
  | [a-z0-9.\-]+[.][a-z]{2,4}/   # domain then slash
)

The third branch is what catches bit.ly/foo and is.gd/x/. It is deliberately loose: a bare example.com with no path is ambiguous (is "example.com" a URL or just a sentence noun?), so the pattern only treats it as a URL once a slash appears.
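Collapsed onto one line for JavaScript, the three entry points can be tried with a deliberately simplified tail. This is a sketch rather than Gruber's full pattern: the tail here stops at parentheses and angle brackets instead of balancing them, and the name webUrl and the sample text are assumptions.

```javascript
// Entry points: explicit scheme, www-prefixed host, or bare domain plus slash.
// Tail: URL characters, ending on something that is not terminal punctuation.
const webUrl =
  /\b(?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)[^\s()<>]*[^\s`!()\[\]{};:'".,<>?«»“”‘’]/gi;

// Hypothetical sample text for illustration.
const text =
  "Try bit.ly/foo, www.example.com, or https://example.org/page. Plain example.com is skipped.";

const links = text.match(webUrl);

console.log(links);
// [ 'bit.ly/foo', 'www.example.com', 'https://example.org/page' ]
```

The bare example.com near the end is left alone because no slash follows it, which is the ambiguity rule described above.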

The tradeoff: liberal matching means false positives

A liberal pattern that catches bare domains will occasionally match things that are not links, such as a file path like config.yaml/backup. That is the cost of catching real user-typed URLs. The [a-z]{2,4} limit also cuts the other way: it misses bare domains on TLDs longer than four letters. Decide which error is cheaper for your app: missing a real link, or occasionally linkifying a non-link. Here is how the three matching strategies compare:

| Strategy | Catches bare links | False positives |
| --- | --- | --- |
| `https?://` only | No | Very few |
| Gruber liberal | Yes | Some |
| `URL()` constructor | N/A (single string) | None |

For most "make links clickable" features, the liberal pattern wins because users expect www.foo.com to become a link. For security filtering, the stricter scheme-required pattern is safer.

Should I use regex or new URL() to validate a URL

If your actual goal is validating one string — not finding links in prose — stop writing regex. JavaScript ships a parser that already follows the spec.

Let the URL constructor do the parsing

The URL() constructor throws a TypeError on invalid input and hands you the parsed components for free:

function isValidUrl(s) {
  try {
    const u = new URL(s);
    return u.protocol === "https:"
        || u.protocol === "http:";
  } catch {
    return false;
  }
}

This is more reliable than any hand-rolled regex because it implements the WHATWG URL standard, parses internationalized domain names and ports correctly, and never suffers catastrophic backtracking. The protocol check is there because new URL("mailto:x") and new URL("javascript:alert(1)") both parse successfully — validity is not the same as "is an http link."
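To see the "parsed components for free" part, a small sketch can return a few fields from the parsed object. The helper name describeUrl and the sample inputs are hypothetical; the properties themselves are standard fields of the URL object.

```javascript
// Parse a string and return a few components, or null if it does not parse.
function describeUrl(s) {
  try {
    const u = new URL(s);
    return {
      protocol: u.protocol,
      hostname: u.hostname,
      port: u.port,
      pathname: u.pathname,
      search: u.search,
    };
  } catch {
    return null;
  }
}

// Scheme and host are lower-cased during parsing; the path is left as written.
console.log(describeUrl("HTTPS://Example.COM:8080/v1.2/page.html?q=1"));
// { protocol: 'https:', hostname: 'example.com', port: '8080',
//   pathname: '/v1.2/page.html', search: '?q=1' }

console.log(describeUrl("mailto:x").protocol); // 'mailto:' (parses, but is not a web link)
console.log(describeUrl("not a url"));         // null
```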

When you genuinely need both

A common pipeline uses each tool for what it is good at: a liberal regex to find candidate URLs in text, then new URL() to validate and normalize each candidate. The regex casts a wide net; the constructor rejects the junk. If you also need to safely embed a found URL into a query string or attribute, encode it first — see when to reach for a URL encoder rather than escaping characters by hand.
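The pipeline can be sketched in a few lines. Several things here are assumptions rather than part of any standard recipe: the candidate pattern is a simplified liberal regex, the function name extractValidUrls is made up, and scheme-less candidates get https:// prepended before parsing because new URL() cannot parse them on its own.

```javascript
// Wide net: explicit scheme, www-prefixed host, or bare domain plus slash.
const candidate =
  /\b(?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)[^\s()<>]*[^\s`!()\[\]{};:'".,<>?«»“”‘’]/gi;

function extractValidUrls(text) {
  const results = [];
  for (const [raw] of text.matchAll(candidate)) {
    // Assumption: treat scheme-less candidates as https links.
    const withScheme = /^https?:\/\//i.test(raw) ? raw : `https://${raw}`;
    try {
      const u = new URL(withScheme);
      if (u.protocol === "http:" || u.protocol === "https:") {
        results.push(u.href); // normalized form
      }
    } catch {
      // The regex found something URL-shaped that the parser rejects; skip it.
    }
  }
  return results;
}

// The first candidate has an out-of-range port, so the constructor rejects it.
console.log(extractValidUrls("See http://example.com:99999/x and www.example.org/b."));
// [ 'https://www.example.org/b' ]
```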

The modern option: URLPattern

For routing-style matching — "does this URL match /users/:id?" — the newer URLPattern API is purpose-built and far more readable than a regex. Browser support is still uneven in 2026, so check before relying on it in production, but it is the right long-term tool for path matching.
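A guarded sketch of the routing example is below. The function name matchUserId and the sample URLs are hypothetical. The typeof check is there because the API may be missing in a given runtime, in which case the function returns undefined instead of throwing.

```javascript
// Returns the :id segment, null when the path does not match,
// or undefined when URLPattern is not available in this runtime.
function matchUserId(url) {
  if (typeof URLPattern === "undefined") return undefined;
  const pattern = new URLPattern({ pathname: "/users/:id" });
  const result = pattern.exec(url);
  return result ? result.pathname.groups.id : null;
}

console.log(matchUserId("https://example.com/users/42"));
console.log(matchUserId("https://example.com/posts/42"));
```

Where the API is available, the first call yields the string "42" and the second yields null.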

Putting it together: a checklist

Before you ship a URL-matching regex, walk through this list:

Decide: are you finding URLs in text, or validating one string? Pick regex or URL() accordingly.
Add the g and i flags for text scanning; anchor with ^...$ only for single-string validation.
Exclude trailing punctuation so you do not eat the sentence's period or comma.
Handle balanced parentheses up to two levels for Wikipedia-style links.
Accept scheme-less www. and domain/path shapes if users type bare URLs.
Test the pattern against messy real input, not just one clean URL.

That last point matters most. Paste your pattern and a pile of realistic strings — quoted URLs, URLs in parentheses, bare domains, and a trailing-comma list — into a regex tester and confirm each one highlights correctly before you commit. Comparing two candidate patterns side by side is exactly the kind of thing a diff checker makes obvious.

References

RFC 3986 — Uniform Resource Identifier (URI): Generic Syntax — authoritative URI grammar; cited for scheme/host case-insensitivity and component structure.
An Improved Liberal, Accurate Regex Pattern for Matching URLs — John Gruber's patterns; source for trailing-punctuation exclusion and two-level balanced parentheses.
MDN: URL() constructor — the spec-compliant validation alternative used in the code samples.
MDN: URL Pattern API — modern path-matching API referenced as the long-term alternative to routing regex.

Related on iKit

Test any URL pattern live before you ship it — the fastest loop for checking your URL regex against messy sample text without writing a script.
The 25 regex patterns you'll actually reuse every week — a quick-reference cheatsheet that includes the character classes and quantifiers used in the URL pattern above.
Email regex patterns compared: strict vs loose — the same liberal-vs-strict tradeoff that governs URL matching, applied to email validation.
