RFC 3986 in 2026: What Counts as an Unreserved Character
RFC 3986 names just 66 characters as unreserved — and most encoders mishandle the rest of printable ASCII. Here's what each one actually is, and why it matters.
You're staring at a URL where a backend wrote %2D for a hyphen, and your CDN normalised it back to - on the way out. The hash signature breaks. Somewhere in the stack, an encoder decided a hyphen was unsafe — and it was wrong by twenty years of spec. RFC 3986 has a precise answer to which characters never need encoding, and almost every library subtly disagrees with it. This is the field guide.
TL;DR
- RFC 3986 lists exactly 66 unreserved ASCII characters: A–Z, a–z, 0–9, and -, ., _, ~.
- An unreserved character must never be percent-encoded, and a decoder that sees %2D must treat it as identical to -.
- Reserved characters (: / ? # [ ] @ ! $ & ' ( ) * + , ; =) have structural meaning and only sometimes need encoding.
- encodeURIComponent keeps the RFC 3986 unreserved set, plus ! * ' ( ) from the older RFC 2396, a known mismatch.
- Test any character in the browser with /^[A-Za-z0-9\-._~]$/, or paste the whole URL into the iKit URL Encoder.
What is an unreserved character in RFC 3986
The unreserved set is the smallest, simplest concept in RFC 3986 — and it's the one that catches the most teams out, because the spec is shorter than the folklore around it.
The four character categories in RFC 3986
RFC 3986 sorts every byte of a URL into one of four buckets. Knowing which bucket a character falls into tells you whether it needs encoding, and where.
| Category | What it does | Example characters |
|---|---|---|
| Unreserved | Never need encoding | A–Z, a–z, 0–9, -, ., _, ~ |
| Gen-delims | Separate parts of the URL | :, /, ?, #, [, ], @ |
| Sub-delims | Punctuation inside components | !, $, &, ', (, ), *, +, ,, ;, = |
| Other | Must be percent-encoded | space, <, >, ", {, }, non-ASCII bytes |
Gen-delims and sub-delims together make up the reserved set. A character being reserved does not mean it must always be encoded — it means it carries a special meaning somewhere in a URL, and you have to decide based on context.
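The four buckets can be sketched as a small lookup. This is a minimal illustration, not a library API: classifyChar and the two constant names are made up for this example, and the function assumes it is handed exactly one character.

```javascript
// Character sets copied from RFC 3986 sections 2.2 and 2.3.
const GEN_DELIMS = ":/?#[]@";
const SUB_DELIMS = "!$&'()*+,;=";

// classifyChar is a hypothetical helper; it expects a single character.
function classifyChar(c) {
  if (/^[A-Za-z0-9\-._~]$/.test(c)) return "unreserved";
  if (GEN_DELIMS.includes(c)) return "gen-delim";
  if (SUB_DELIMS.includes(c)) return "sub-delim";
  return "other"; // space, <, >, ", {, }, non-ASCII, and so on
}

console.log(classifyChar("~")); // unreserved
console.log(classifyChar("/")); // gen-delim
console.log(classifyChar("+")); // sub-delim
console.log(classifyChar(" ")); // other
```

Only the first bucket gives a context-free answer; for the two reserved buckets you still have to look at which URL component the character sits in.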
The exact 66 unreserved characters
The unreserved set, per Section 2.3, is:
ALPHA / DIGIT / "-" / "." / "_" / "~"
Count it out: 26 uppercase letters, 26 lowercase letters, 10 digits, and four punctuation marks. That's 66 total. There are no other unreserved characters — not the dollar sign, not the asterisk, not the parenthesis, not the exclamation mark. If you've seen those passed through unencoded by browsers and curl, that's not because the spec allows it; it's because reserved characters often don't need encoding in specific URL components.
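If you'd rather not take the arithmetic on trust, a quick scan of ASCII confirms it. A minimal sketch; unreservedChars is a name invented for this example.

```javascript
// Scan 7-bit ASCII and keep whatever matches the Section 2.3 character class.
function unreservedChars() {
  const chars = [];
  for (let code = 0; code < 128; code++) {
    const c = String.fromCharCode(code);
    if (/^[A-Za-z0-9\-._~]$/.test(c)) chars.push(c);
  }
  return chars;
}

console.log(unreservedChars().length); // 66
```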
Why "unreserved" is the only never-escape set
The contract for an unreserved character is the strongest in the spec: a producer should never percent-encode it (Section 2.3 says such encodings "should not be created by URI producers"), and a consumer that receives %2D (the encoding for hyphen) can treat it as equivalent to a literal -. Decoding it back is part of URI normalisation (Section 6.2.2.2), and it's exactly why CDNs and caches will rewrite %2D back to -. They're behaving correctly; your encoder was not.
For reserved characters, normalisation is the opposite: encoded and unencoded forms are not equivalent, because the difference signals whether the character is being used structurally (? as the query separator) or as data (%3F as a literal question mark inside a value).
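The asymmetry is easier to see in code. Here is a rough sketch of the percent-encoding part of normalisation, assuming the input is a plain string; normalizePercentEncoding is a hypothetical name, and a real normaliser does more (case-folding the scheme and host, removing dot segments).

```javascript
// Hypothetical helper: normalise only the percent-encoded triplets in a URI.
function normalizePercentEncoding(uri) {
  return uri.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const c = String.fromCharCode(parseInt(hex, 16));
    // Unreserved: encoded and literal forms are equivalent, so decode.
    if (/^[A-Za-z0-9\-._~]$/.test(c)) return c;
    // Anything else stays encoded; only the hex digits are uppercased.
    return "%" + hex.toUpperCase();
  });
}

console.log(normalizePercentEncoding("/a%2Db%7ec?x=%3f"));
// /a-b~c?x=%3F
```

The hyphen and tilde come back as literals, while %3f is only uppercased: decoding it would turn data into a second query separator.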
Why encodeURIComponent does not encode hyphen, tilde, or underscore
This is the single most-Googled "bug" in URL encoding, and it's not a bug. The behaviour is mandated by the ECMAScript specification for a reason that goes back twenty years.
What the ECMAScript spec actually requires
encodeURIComponent is required to leave the following characters unescaped:
A-Z a-z 0-9 - _ . ! ~ * ' ( )
Of these, A-Z, a-z, 0-9, and the four marks - _ . ~ are the RFC 3986 unreserved set. The remaining five (!, *, ', (, )) are sub-delims that were part of the unreserved set under RFC 2396, the spec encodeURIComponent was originally written against. When the URI spec was rewritten in 2005, those five moved into sub-delims, but encodeURIComponent couldn't follow without breaking every existing site on the web. The MDN reference for encodeURIComponent calls this out explicitly.
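You can recover the mismatch empirically rather than reading it off the spec. A small sketch, with extraSafeChars as an invented name: it walks printable ASCII and reports every character that encodeURIComponent passes through even though RFC 3986 doesn't class it as unreserved.

```javascript
// Hypothetical helper: characters encodeURIComponent leaves alone
// that are NOT in the RFC 3986 unreserved set.
function extraSafeChars() {
  const extras = [];
  for (let code = 0x20; code < 0x7f; code++) {
    const c = String.fromCharCode(code);
    const passedThrough = encodeURIComponent(c) === c;
    const unreserved = /^[A-Za-z0-9\-._~]$/.test(c);
    if (passedThrough && !unreserved) extras.push(c);
  }
  return extras;
}

console.log(extraSafeChars().join(" ")); // ! ' ( ) *
```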
The asterisk and parenthesis edge case
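If you need byte-identical output between Java's URLEncoder.encode, Python's urllib.parse.quote, and JavaScript's encodeURIComponent, you have to patch around !, *, ', (, ) on the JavaScript side. The other two have quirks of their own: URLEncoder.encode does form encoding, so it emits + for a space, leaves * alone and escapes ~, while urllib.parse.quote leaves / unescaped unless you pass safe=''. A common one-liner for the JavaScript half: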
```javascript
function rfc3986Encode(str) {
  return encodeURIComponent(str).replace(
    /[!*'()]/g,
    c => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}
```
Run it on it's-a-test!. Plain encodeURIComponent returns it's-a-test! unchanged. The strict version returns it%27s-a-test%21. Both decode back to the same string, but the wire bytes differ, which matters when the URL feeds a signature: OAuth 1.0a and AWS SigV4 both sign a canonically encoded form of the request, so a single differing byte invalidates the result.
How encodeURI differs in its safe set
encodeURI is meant for whole URLs and leaves most of the reserved set unescaped on top of the unreserved characters. It won't touch : / ? # @ ! $ & ' ( ) * + , ; = so a query string like ?q=a&b=c survives intact. The exceptions are [ and ], which encodeURI does escape (to %5B and %5D) because its safe list predates RFC 3986. Use encodeURIComponent when you're inserting a single value into a URL; use encodeURI only on a complete, trusted URL that just happens to contain accented characters. The trap is using encodeURI on user input that contains &: you'll silently inject extra query parameters.
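The injection trap is easy to reproduce. In this sketch, buildSearchUrl, the example.com host, and the search term are all made up for illustration; the only real APIs are encodeURI, encodeURIComponent, and the standard URL class.

```javascript
// Hypothetical helper; example.com is a placeholder host.
function buildSearchUrl(encode, term) {
  return "https://example.com/search?q=" + encode(term);
}

const term = "tea&admin=true"; // made-up hostile search term

const leaky = new URL(buildSearchUrl(encodeURI, term));
console.log(leaky.searchParams.get("q"));     // tea
console.log(leaky.searchParams.get("admin")); // true

const safe = new URL(buildSearchUrl(encodeURIComponent, term));
console.log(safe.searchParams.get("q"));      // tea&admin=true
console.log(safe.searchParams.get("admin"));  // null
```

With encodeURI the & survives and the server sees a second parameter; with encodeURIComponent it travels as %26 and stays inside the value.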
What changed between RFC 2396 and RFC 3986
The current URL spec is twenty years old, but anything older than 2005 still operates by the previous one. Long-lived libraries, signing algorithms, and protocol specs sometimes freeze the RFC 2396 rules in place — knowing the diff is how you debug them.
Tilde moves into unreserved
In RFC 2396, tilde (~) was a "mark" character: technically unreserved, but borderline enough (the older RFC 1738 had listed it as unsafe) that some encoders preferred to play safe and emit %7E. RFC 3986 names it explicitly in the unreserved set. Modern encoders pass it through; legacy ones still escape it. If you see %7E in a URL from a long-lived service, such as a Google Maps share link, the likely cause is signing or encoding code that froze the older behaviour for compatibility.
Mark characters move into sub-delims
RFC 2396 had a group called "mark" containing - _ . ! ~ * ' ( ). RFC 3986 split that group: - _ . ~ stayed unreserved, and ! * ' ( ) became sub-delims. The practical effect is that !, *, ', (, ) can still appear unencoded inside a URL component depending on context, but a strict encoder is allowed to percent-encode them. Both are valid output.
Square brackets join gen-delims for IPv6
RFC 3986 lists [ and ] as gen-delims so IPv6 literal hosts like http://[2001:db8::1]/path can be parsed (the bracket syntax itself dates from RFC 2732). Encoders written against RFC 2396 will percent-encode the brackets, which usually breaks the URL, because the URL parser no longer sees them as the delimiters of an IPv6 host.
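A quick way to see the breakage is to hand both forms to the standard URL class. This is a sketch only: hostOf is a hypothetical wrapper that returns null when the parser rejects the input, and the exact failure mode for the encoded form may vary between runtimes.

```javascript
// Hypothetical helper: the parsed hostname, or null if parsing fails.
function hostOf(input) {
  try {
    return new URL(input).hostname;
  } catch (err) {
    return null;
  }
}

console.log(hostOf("http://[2001:db8::1]/path"));     // [2001:db8::1]
console.log(hostOf("http://%5B2001:db8::1%5D/path")); // not the IPv6 host
```

The literal brackets are what tell the parser to read the colons as part of an address rather than as a port separator; once they are escaped, that signal is gone.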
How to test if a character needs URL encoding
You don't need a library for this. The check is a single character class.
A 5-line JavaScript test
```javascript
function isUnreserved(c) {
  return /^[A-Za-z0-9\-._~]$/.test(c);
}
isUnreserved('-'); // true
isUnreserved('+'); // false (it's sub-delim)
isUnreserved(' '); // false (must be %20)
```
Pair it with encodeURIComponent when you want the wire byte:
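```javascript
function encodeIfNeeded(c) {
  if (isUnreserved(c)) return c;
  // encodeURIComponent leaves ! * ' ( ) alone, so escape those by hand.
  return encodeURIComponent(c).replace(/[!*'()]/g,
    ch => '%' + ch.charCodeAt(0).toString(16).toUpperCase());
}
```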
If you're building a regex against unreserved characters more than once a year, paste it into the iKit Regex Tester — every character class trips someone up at least once.
The copy-paste unreserved table
Memorising 66 characters is silly. Memorise the four punctuation marks and you're done.
- Uppercase letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
- Lowercase letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
- Digits: 0 1 2 3 4 5 6 7 8 9
- Punctuation: - (hyphen-minus), . (period), _ (underscore), ~ (tilde)
That's it. Anything else either has structural meaning or has to be percent-encoded.
Why your regex test is probably wrong
Common bugs in homegrown URL-safety regexes:
- Using \w: that's [A-Za-z0-9_], so you'll accept underscore but reject hyphen, period, and tilde.
- Forgetting to escape the hyphen: [-A-Za-z0-9._~] works because the hyphen is at the start, but [A-Za-z0-9-._~] is a typo away from a range like 0-9- that does the wrong thing in some engines.
- Including + or * "because they look harmless": they're sub-delims and a strict consumer will treat encoded and unencoded forms as different bytes.
The unreserved test is exactly /^[A-Za-z0-9\-._~]$/. Anything more permissive is letting through structural characters; anything less is over-encoding.
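One way to catch these bugs before they ship is to run a candidate regex against every ASCII character and list where it disagrees with the spec. A minimal sketch: EXPECTED and disagreements are names invented here, and the regex passed in must not carry the g flag (a global regex keeps state between test calls).

```javascript
// What a correct unreserved check must accept, per RFC 3986 section 2.3.
const EXPECTED =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

// Hypothetical helper: ASCII characters on which `regex` gets the wrong answer.
function disagreements(regex) {
  const wrong = [];
  for (let code = 0; code < 128; code++) {
    const c = String.fromCharCode(code);
    if (regex.test(c) !== EXPECTED.includes(c)) wrong.push(c);
  }
  return wrong;
}

console.log(disagreements(/^[A-Za-z0-9\-._~]$/)); // []
console.log(disagreements(/^\w$/));               // [ '-', '.', '~' ]
console.log(disagreements(/^[\w.~+*-]$/));        // [ '*', '+' ]
```

An empty array means the regex matches the unreserved set exactly; anything listed is either over-encoded (rejected but unreserved) or let through by mistake.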
Reserved-character bugs that ship to production
The unreserved set tells you what's always safe. The reserved set tells you what's sometimes safe — and that "sometimes" is where the bugs live.
Path segment vs query value
A literal ? is allowed inside a query or a fragment but not inside a path, where it would end the path early and has to be %3F. A literal # is never allowed inside a path or query; in either place it has to be %23. The WHATWG URL Living Standard defines a per-component percent-encode set for browsers, which is why pasting https://example.com/page?q=#tag into Chrome's address bar gives you a different escape pattern than encoding the same string through encodeURIComponent. The browser knows you're building a URL; encodeURIComponent is encoding a string.
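Here is a sketch of component-aware assembly using the standard URL class. buildUrl, the example.com host, and the q parameter name are assumptions for illustration; the point is that each component gets its own treatment instead of one encoder being run over the whole string.

```javascript
// Hypothetical helper: assemble a URL one component at a time.
function buildUrl(segment, queryValue, fragment) {
  const url = new URL("https://example.com/");
  url.pathname = "/" + encodeURIComponent(segment);    // data inside a path segment
  url.search = "?q=" + encodeURIComponent(queryValue); // data inside a query value
  url.hash = fragment; // the URL API applies the fragment encode set itself
  return url.href;
}

console.log(buildUrl("what?", "a#b", "see?here"));
// https://example.com/what%3F?q=a%23b#see?here
```

The ? is escaped in the path, the # is escaped in the query, and the ? in the fragment passes through untouched.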
Plus sign in form encoding
+ is unreserved in zero specs: it's a sub-delim, and in application/x-www-form-urlencoded it explicitly means space. Take a plus-addressed email (alice+ followed by a tag and the rest of the address): written as-is into a form-encoded POST body, it arrives at the server with a space where the + was, and the email lookup breaks. The fix is encodeURIComponent on the value before submission, which turns + into %2B. Base64-encoded payloads run into the same trap: every + in standard Base64 has to be %2B on the wire, which is why URL-safe Base64 exists (it uses - and _ instead of + and /). The iKit Base64 tool has both alphabets one click apart.
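The round trip can be reproduced with URLSearchParams, which follows the form-encoding rules when it parses. The address below is a hypothetical one chosen for this sketch, as is the email field name.

```javascript
const address = "alice+test@example.com"; // hypothetical address for illustration

// Hand-built body: a form decoder reads the raw + back as a space.
const rawBody = "email=" + address;
console.log(new URLSearchParams(rawBody).get("email"));
// alice test@example.com

// Encoded body: + travels as %2B and survives the round trip.
const safeBody = "email=" + encodeURIComponent(address);
console.log(safeBody);
// email=alice%2Btest%40example.com
console.log(new URLSearchParams(safeBody).get("email"));
// alice+test@example.com
```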
Application-specific delimiters
Some APIs treat , or ; as a delimiter inside a path segment (matrix URIs do this). RFC 3986 lets you encode those as %2C and %3B to escape the delimiter — but a normalising proxy will not decode them back, because they're reserved. That's the inverse of the unreserved guarantee: encoded sub-delims stay encoded through the pipeline.
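On the consuming side, that guarantee only helps if you split on the literal delimiter first and decode second. A minimal sketch of a matrix-parameter parser; parseMatrixParams and the sample segment are invented for this example and not taken from any framework.

```javascript
// Hypothetical helper for a segment such as "tags=a%3Bb;size=xl".
// Split on the literal ";" first, then percent-decode each piece.
function parseMatrixParams(segment) {
  const params = {};
  for (const pair of segment.split(";")) {
    if (pair === "") continue; // tolerate leading or doubled semicolons
    const eq = pair.indexOf("=");
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    params[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return params;
}

console.log(parseMatrixParams("tags=a%3Bb;size=xl"));
// { tags: 'a;b', size: 'xl' }
```

Decoding the whole segment before splitting would turn %3B into a third delimiter and cut the tags value in half.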
Related on iKit
- How URL Encoding Works in 2026: Component, URI, and Form - the broader walkthrough of percent-encoding rules this post drills into.
- encodeURIComponent vs encodeURI: When to Use Which (2026) - sibling piece on the two JavaScript encoders and their differing safe sets.
- Why Your URL Has Plus Signs: Form Encoding Explained (2026) - what happens when sub-delim + collides with form encoding's space rule.
- Double-Encoded URLs: How to Spot and Fix Them (2026) - the failure mode where an unreserved character gets encoded twice through proxy chains.
- How to Debug a 400 Bad Request With URL Decoding (2026) - practical decoding workflow when the server rejects a URL you thought was fine.
- URL Encode Online: Stop Pasting Sensitive URLs (2026) - why the encoding step belongs in your browser, not someone else's server.
- How to Encode and Decode Base64, With Real Examples - sibling encoding format with the same "looks like text, actually byte data" issue.
- Decode Base64 to a Downloadable File in Browser (2026) - the privacy-first companion when your Base64 payload happens to be a file.