Why does encodeURIComponent leave hyphen, tilde, and underscore alone?

Because ECMA-262's unescaped set for `encodeURIComponent` contains every character in RFC 3986's unreserved set. Hyphen, period, underscore, and tilde are all unreserved, so a conforming implementation passes them through unchanged. The same spec also exempts `!`, `*`, `'`, `(`, and `)` for historical reasons inherited from RFC 2396, which is why the JS output sometimes looks looser than a pedantic RFC 3986 encoder would produce.

What changed between RFC 2396 and RFC 3986 for unreserved characters?

The biggest change is that the old 'mark' group was split. RFC 2396 (1998) listed tilde (`~`) as a mark character, which made it unreserved on paper, but encoders that still followed the older RFC 1738 (where tilde was 'unsafe') kept escaping it to `%7E`. RFC 3986 (2005) names tilde directly in the unreserved set, so newer encoders pass it through. RFC 3986 also dropped `!`, `*`, `'`, `(`, and `)` from the safe set: they're sub-delims now, technically reserved. That's why two encoders can produce different output for the same input and both can claim spec compliance.

How do I quickly test if a character is unreserved?

Run a regex test in your browser console: `/^[A-Za-z0-9\-._~]$/.test(c)`. If it returns `true`, the character is unreserved and never needs encoding. If `false`, you have to decide whether the character is reserved with structural meaning (then leave it alone in the right context) or genuinely needs `%XX` encoding. For a paste-and-check UI without writing code, the iKit URL Encoder runs the same test on every character of your input in the browser.

RFC 3986 in 2026: What Counts as an Unreserved Character

Q: What is an unreserved character in RFC 3986?

An unreserved character is one of the 66 ASCII characters that RFC 3986 guarantees never need percent-encoding in any part of a URL. The set is the 26 uppercase letters, 26 lowercase letters, 10 digits, plus four punctuation marks: hyphen (-), period (.), underscore (_), and tilde (~). Anything outside that set either is a reserved character with structural meaning, or must be percent-encoded as `%XX` byte sequences.

You're staring at a URL where a backend wrote %2D for a hyphen, and your CDN normalised it back to - on the way out. The hash signature breaks. Somewhere in the stack, an encoder decided a hyphen was unsafe — and it was wrong by twenty years of spec. RFC 3986 has a precise answer to which characters never need encoding, and almost every library subtly disagrees with it. This is the field guide.

TL;DR

RFC 3986 lists exactly 66 unreserved ASCII characters: A–Z, a–z, 0–9, and -, ., _, ~.
An unreserved character should never be percent-encoded by a producer, and a normaliser that sees %2D treats it as identical to -.
Reserved characters (: / ? # [ ] @ ! $ & ' ( ) * + , ; =) have structural meaning and only sometimes need encoding.
encodeURIComponent keeps the RFC 3986 unreserved set, plus ! * ' ( ) from the older RFC 2396 — a known mismatch.
Test any character in the browser with /^[A-Za-z0-9\-._~]$/, or paste the whole URL into the iKit URL Encoder.

What is an unreserved character in RFC 3986

The unreserved set is the smallest, simplest concept in RFC 3986 — and it's the one that catches the most teams out, because the spec is shorter than the folklore around it.

The four character categories in RFC 3986

RFC 3986 sorts every byte of a URL into one of four buckets. Knowing which bucket a character falls into tells you whether it needs encoding, and where.

Category	What it does	Example characters
Unreserved	Never need encoding	`A–Z`, `a–z`, `0–9`, `-`, `.`, `_`, `~`
Gen-delims	Separate parts of the URL	`:`, `/`, `?`, `#`, `[`, `]`, `@`
Sub-delims	Punctuation inside components	`!`, `$`, `&`, `'`, `(`, `)`, `*`, `+`, `,`, `;`, `=`
Other	Must be percent-encoded	space, `<`, `>`, `"`, `{`, `}`, non-ASCII bytes

Gen-delims and sub-delims together make up the reserved set. A character being reserved does not mean it must always be encoded — it means it carries a special meaning somewhere in a URL, and you have to decide based on context.
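As a rough sketch of the table above, the four buckets can be written as a lookup. The helper name classifyChar is made up for illustration, and "other" here simply means "not in any of the three named sets" (it also catches the percent sign, which introduces an escape).

```javascript
// Character sets copied from RFC 3986 sections 2.2 and 2.3.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const GEN_DELIMS = ":/?#[]@";
const SUB_DELIMS = "!$&'()*+,;=";

// classifyChar is a hypothetical helper, not part of any standard API.
function classifyChar(c) {
  // Accept exactly one code point so "" or "ab" cannot slip through includes().
  if ([...c].length !== 1) throw new RangeError("expected a single character");
  if (UNRESERVED.test(c)) return "unreserved";
  if (GEN_DELIMS.includes(c)) return "gen-delim";
  if (SUB_DELIMS.includes(c)) return "sub-delim";
  return "other"; // space, <, >, ", {, }, %, non-ASCII, ...
}

console.log(classifyChar("~")); // unreserved
console.log(classifyChar("/")); // gen-delim
console.log(classifyChar("+")); // sub-delim
console.log(classifyChar(" ")); // other
```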

The exact 66 unreserved characters

The unreserved set, per Section 2.3, is:

ALPHA / DIGIT / "-" / "." / "_" / "~"

Count it out: 26 uppercase letters, 26 lowercase letters, 10 digits, and four punctuation marks. That's 66 total. There are no other unreserved characters: not the dollar sign, not the asterisk, not the parenthesis, not the exclamation mark. If you've seen those passed through unencoded by browsers and curl, that's not because they are unreserved; it's because reserved characters are allowed to appear literally in specific URL components where they carry no structural meaning.
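If you'd rather let the machine do the counting, a minimal check looks like this. The variable names are illustrative only.

```javascript
// Build the set straight from the ABNF: ALPHA / DIGIT / "-" / "." / "_" / "~"
const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const unreservedChars = new Set(
  UPPER + UPPER.toLowerCase() + "0123456789" + "-._~"
);

console.log(unreservedChars.size); // 66
```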

Why "unreserved" is the only never-escape set

The contract for an unreserved character is the strongest in the spec: a producer should never percent-encode it (the RFC's wording is "should not"), and a normaliser that receives %2D (the encoding for hyphen) decodes it to a literal -, because the two forms are equivalent. This is part of URI normalisation, and it's exactly why CDNs and caches will rewrite %2D back to -: they're behaving correctly, your encoder was not.

For reserved characters, normalisation is the opposite: encoded and unencoded forms are not equivalent, because the difference signals whether the character is being used structurally (? as the query separator) or as data (%3F as a literal question mark inside a value).
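Here is a small sketch of that asymmetry, assuming the input is already a syntactically valid URI string. The function name normalizePercentEncoding is hypothetical; it decodes escapes of unreserved characters and leaves every other escape in place, only uppercasing its hex digits.

```javascript
// normalizePercentEncoding is a hypothetical helper, not a library function.
function normalizePercentEncoding(uri) {
  return uri.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Unreserved: escaped and literal forms are equivalent, so decode.
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) return ch;
    // Reserved or other bytes: keep the escape, uppercase the hex digits.
    return "%" + hex.toUpperCase();
  });
}

console.log(normalizePercentEncoding("/a%2Db%3fc%7e")); // /a-b%3Fc~
```

The %3f stays encoded because a literal ? would change how the URI parses; the %2D and %7e do not.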

Why encodeURIComponent does not encode hyphen, tilde, or underscore

This is the single most-Googled "bug" in URL encoding, and it's not a bug. The behaviour is mandated by the ECMAScript specification for a reason that goes back to the late 1990s.

What the ECMAScript spec actually requires

encodeURIComponent is required to leave the following characters unescaped:

A-Z a-z 0-9 - _ . ! ~ * ' ( )

The letters, the digits, and the four marks - _ . ~ are the RFC 3986 unreserved set. The remaining five (!, *, ', (, )) are sub-delims that used to be in the unreserved set under RFC 2396, the RFC that ECMAScript followed when encodeURIComponent was standardised in 1999. When the URL spec was rewritten in 2005, those five moved into sub-delims, but encodeURIComponent couldn't follow without breaking every existing site on the web. The MDN reference for encodeURIComponent calls this out explicitly.
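You can recover that list empirically instead of taking the spec's word for it. This sketch walks printable ASCII, keeps what encodeURIComponent leaves alone, and subtracts the RFC 3986 unreserved set; the variable names are illustrative.

```javascript
const isUnreservedChar = c => /^[A-Za-z0-9\-._~]$/.test(c);

// Printable ASCII: 0x20 (space) through 0x7E (tilde).
const printable = Array.from({ length: 0x7f - 0x20 }, (_, i) =>
  String.fromCharCode(0x20 + i)
);

// Characters that encodeURIComponent passes through unchanged.
const untouched = printable.filter(c => encodeURIComponent(c) === c);

// Whatever is left after removing the unreserved set is the legacy extra.
const extras = untouched.filter(c => !isUnreservedChar(c)).join("");

console.log(untouched.length); // 71
console.log(extras);           // !'()*
```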

The asterisk and parenthesis edge case

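If you need bit-identical output between Java's URLEncoder.encode, Python's urllib.parse.quote, and JavaScript's encodeURIComponent, you have to patch around !, *, ', (, ) on the JavaScript side. (The other two have quirks of their own: URLEncoder.encode produces form encoding, so a space becomes +, and quote leaves / unescaped unless you pass safe=''.) A common helper: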

function rfc3986Encode(str) {
  return encodeURIComponent(str).replace(
    /[!*'()]/g,
    c => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

Run it on it's-a-test!. Plain encodeURIComponent returns it's-a-test!. The strict version returns it%27s-a-test%21. Both decode back to the same string, but the wire bytes differ, which matters when the encoded string is fed into the HMAC of an OAuth 1.0a or AWS SigV4 signature.

How encodeURI differs in its safe set

encodeURI is meant for whole URLs and leaves most of the reserved set unescaped on top of unreserved. It won't touch : / ? # @ ! $ & ' ( ) * + , ; = and so a query string like ?q=a&b=c survives intact. The two exceptions are [ and ], which encodeURI does escape because its safe list predates RFC 3986. Use encodeURIComponent when you're inserting a single value into a URL; use encodeURI only on a complete, trusted URL that just happens to contain accented characters. The trap is using encodeURI on user input that contains &: you'll silently inject extra query parameters.
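A short demonstration of that trap, using a made-up search endpoint on example.com and a made-up user value:

```javascript
const userInput = "cats&admin=true"; // hypothetical user-supplied value

// encodeURI treats & and = as structure and leaves them alone.
const unsafeUrl = encodeURI("https://example.com/search?q=" + userInput);

// encodeURIComponent treats the whole value as data.
const safeUrl =
  "https://example.com/search?q=" + encodeURIComponent(userInput);

console.log(new URL(unsafeUrl).searchParams.get("admin")); // true
console.log(new URL(safeUrl).searchParams.get("admin"));   // null
console.log(new URL(safeUrl).searchParams.get("q"));       // cats&admin=true
```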

What changed between RFC 2396 and RFC 3986

The current URL spec is twenty years old, but anything older than 2005 still operates by the previous one. Long-lived libraries, signing algorithms, and protocol specs sometimes freeze the RFC 2396 rules in place — knowing the diff is how you debug them.

Tilde moves into unreserved

In RFC 2396, tilde (~) was a "mark" character: technically unreserved, but borderline enough that some encoders preferred to play safe and emit %7E. RFC 3986 promotes it to a first-class unreserved character. Modern encoders pass it through; legacy ones still escape it. If you ever see %7E in a URL from a long-lived service, a likely cause is signing code that freezes the older behaviour for compatibility.

Mark characters move into sub-delims

RFC 2396 had a group called "mark" containing - _ . ! ~ * ' ( ). RFC 3986 split that group: - _ . ~ stayed unreserved, and ! * ' ( ) became sub-delims. The practical effect is that !, *, ', (, ) can still appear unencoded inside a URL component depending on context, but a strict encoder is allowed to percent-encode them. Both are valid output.

Square brackets join gen-delims for IPv6

RFC 3986 added [ and ] as gen-delims so IPv6 literal hosts like http://[2001:db8::1]/path could be parsed. Encoders written against RFC 2396 will percent-encode the brackets, which usually breaks the URL, because the URL parser can no longer see them as the IPv6 host delimiters.
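JavaScript's own encodeURI is one such encoder, so the failure is easy to reproduce. The sketch below assumes a runtime with the WHATWG URL class (browsers and Node); the parses helper is made up for the example.

```javascript
const ipv6Url = "http://[2001:db8::1]/path";

// parses is a hypothetical helper: true if the URL constructor accepts the string.
function parses(candidate) {
  try {
    new URL(candidate);
    return true;
  } catch {
    return false;
  }
}

const encoded = encodeURI(ipv6Url);

console.log(new URL(ipv6Url).hostname); // [2001:db8::1]
console.log(encoded);                   // http://%5B2001:db8::1%5D/path
console.log(parses(encoded));           // false
```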

How to test if a character needs URL encoding

You don't need a library for this. The check is a single character class.

A 5-line JavaScript test

function isUnreserved(c) {
  return /^[A-Za-z0-9\-._~]$/.test(c);
}

isUnreserved('-');  // true
isUnreserved('+');  // false (it's sub-delim)
isUnreserved(' ');  // false (must be %20)

Pair it with encodeURIComponent when you want the wire byte:

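function encodeIfNeeded(c) {
  if (isUnreserved(c)) return c;
  // encodeURIComponent skips ! * ' ( ), so escape those by hand
  return encodeURIComponent(c).replace(/[!*'()]/g,
    ch => '%' + ch.charCodeAt(0).toString(16).toUpperCase());
}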
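If you want the same result without leaning on encodeURIComponent at all, one possible sketch works directly on UTF-8 bytes. The name percentEncodeStrict is hypothetical, and the code assumes a runtime that provides TextEncoder (browsers and Node).

```javascript
// percentEncodeStrict is a hypothetical helper: every byte outside the
// RFC 3986 unreserved set becomes %XX with uppercase hex digits.
function percentEncodeStrict(str) {
  const bytes = new TextEncoder().encode(str); // string -> UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    out += /^[A-Za-z0-9\-._~]$/.test(ch)
      ? ch
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(percentEncodeStrict("it's-a-test!")); // it%27s-a-test%21
console.log(percentEncodeStrict("café ~"));       // caf%C3%A9%20~
```

One design difference worth knowing: encodeURIComponent throws a URIError on a lone surrogate, whereas TextEncoder substitutes U+FFFD, so this version never throws.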

If you're building a regex against unreserved characters more than once a year, paste it into the iKit Regex Tester — every character class trips someone up at least once.

The copy-paste unreserved table

Memorising 66 characters is silly. Memorise the four punctuation marks and you're done.

Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
Digits: 0 1 2 3 4 5 6 7 8 9
Punctuation: - (hyphen-minus), . (period), _ (underscore), ~ (tilde)

That's it. Anything else either has structural meaning or has to be percent-encoded.

Why your regex test is probably wrong

Common bugs in homegrown URL-safety regexes:

Using \w — that's [A-Za-z0-9_], so you'll accept underscore but reject hyphen, period, and tilde.
Forgetting to escape the hyphen — [-A-Za-z0-9._~] works because the hyphen is at the start, but [A-Za-z0-9-._~] is a typo away from a range like 0-9- that does the wrong thing in some engines.
Including + or * "because they look harmless" — they're sub-delims and a strict consumer will treat encoded and unencoded forms as different bytes.

The unreserved test is exactly /^[A-Za-z0-9\-._~]$/. Anything more permissive is letting through structural characters; anything less is over-encoding.
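To see the first and third bugs side by side, this sketch runs the strict pattern against a too-narrow and a too-wide variant. The variable names are illustrative.

```javascript
const strict = /^[A-Za-z0-9\-._~]$/;
const wordOnly = /^\w$/;                  // too narrow: misses - . ~
const tooLoose = /^[A-Za-z0-9\-._~+*]$/;  // too wide: admits sub-delims

const report = ["-", ".", "~", "_", "+", "*"].map(c => ({
  char: c,
  strict: strict.test(c),
  wordOnly: wordOnly.test(c),
  tooLoose: tooLoose.test(c),
}));

console.table(report);
// strict accepts - . ~ _ and rejects + *
// wordOnly accepts only _
// tooLoose accepts all six
```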

Reserved-character bugs that ship to production

The unreserved set tells you what's always safe. The reserved set tells you what's sometimes safe — and that "sometimes" is where the bugs live.

Path segment vs query value

A literal ? is allowed inside a query or a fragment but not inside a path, where it has to be %3F. A literal # is never allowed as data inside a path or a query; in both it has to be %23. The WHATWG URL Living Standard defines a per-component encode set for browsers, which is why pasting https://example.com/page?q=#tag into Chrome's address bar gives you a different escape pattern than encoding the same string through encodeURIComponent. The browser knows you're building a URL; encodeURIComponent is encoding a string.
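A quick way to watch the difference is to build the same URL twice, once with the data encoded per component and once pasted in raw. The path segment and query value below are made up, and the sketch assumes a runtime with the WHATWG URL class.

```javascript
const segment = "a?b#c"; // hypothetical path segment containing ? and #
const value = "x#y&z";   // hypothetical query value containing # and &

// Encoded per component: the delimiters inside the data are escaped.
const built =
  "https://example.com/" + encodeURIComponent(segment) +
  "?q=" + encodeURIComponent(value);
const parsed = new URL(built);

console.log(built);                        // https://example.com/a%3Fb%23c?q=x%23y%26z
console.log(parsed.pathname);              // /a%3Fb%23c
console.log(parsed.searchParams.get("q")); // x#y&z

// Pasted raw: the first ? and # are read as structure, not data.
const naive = new URL("https://example.com/" + segment + "?q=" + value);
console.log(naive.pathname); // /a
console.log(naive.search);   // ?b
```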

Plus sign in form encoding

+ is unreserved in zero specs: it's a sub-delim, and in application/x-www-form-urlencoded it explicitly means space. Drop a plus-addressed email (the alice+… kind) into a hand-built form-encoded body as-is and it will arrive at the server with a space where the + was, which breaks the email lookup. The fix is encodeURIComponent on the value before submission, which turns + into %2B. Base64-encoded payloads run into the same trap: every + in standard Base64 has to be %2B on the wire, which is why URL-safe Base64 exists (it uses - and _ in place of + and /). The iKit Base64 tool has both alphabets one click apart.
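The form-encoding rule is easy to reproduce with URLSearchParams, which parses and serialises application/x-www-form-urlencoded. The address below is invented for the example.

```javascript
const email = "alice+news@example.com"; // made-up address for illustration

// Hand-built body with the value pasted in raw.
const rawBody = "email=" + email;
// Same body with the value encoded first.
const encodedBody = "email=" + encodeURIComponent(email);

console.log(new URLSearchParams(rawBody).get("email"));     // alice news@example.com
console.log(new URLSearchParams(encodedBody).get("email")); // alice+news@example.com

// Letting URLSearchParams build the body avoids the problem entirely.
console.log(new URLSearchParams({ email }).toString());     // email=alice%2Bnews%40example.com
```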

Application-specific delimiters

Some APIs treat , or ; as a delimiter inside a path segment (matrix URIs do this). RFC 3986 lets you encode those as %2C and %3B to escape the delimiter — but a normalising proxy will not decode them back, because they're reserved. That's the inverse of the unreserved guarantee: encoded sub-delims stay encoded through the pipeline.

Related on iKit

How URL Encoding Works in 2026 — Component, URI, and Form — the broader walkthrough of percent-encoding rules this post drills into.
encodeURIComponent vs encodeURI: When to Use Which (2026) — sibling piece on the two JavaScript encoders and their differing safe sets.
Why Your URL Has Plus Signs: Form Encoding Explained (2026) — what happens when sub-delim + collides with form encoding's space rule.
Double-Encoded URLs: How to Spot and Fix Them (2026) — the failure mode where an unreserved character gets encoded twice through proxy chains.
How to Debug a 400 Bad Request With URL Decoding (2026) — practical decoding workflow when the server rejects a URL you thought was fine.
URL Encode Online: Stop Pasting Sensitive URLs (2026) — why the encoding step belongs in your browser, not someone else's server.
How to Encode and Decode Base64 — With Real Examples — sibling encoding format with the same "looks like text, actually byte data" issue.
Decode Base64 to a Downloadable File in Browser (2026) — the privacy-first companion when your Base64 payload happens to be a file.
