iKit
Technical · 10 min read

How URL Encoding Works in 2026 (Component, URI, Form)

URL encoding looks simple until your API drops a plus sign. Here's the real difference between component, URI, and form encoding — with code, fixes, and 2026 rules.


Stripe sends a webhook. Your handler logs ?email=alice%2Bdev and silently drops the +dev portion. Six hours and one customer-support thread later, you realise the form decoder upstream stripped a perfectly legal byte. URL encoding looks like a 30-year-old solved problem until a single character breaks the request — and almost every team gets caught the same way at least once. This guide is the fix.

TL;DR

  • URL encoding maps unsafe characters to %XX byte sequences so URLs survive HTTP, DNS, and middleware written in different decades.
  • encodeURI() skips reserved chars like /, ?, #; encodeURIComponent() escapes them. Use the second for query values.
  • Form encoding (application/x-www-form-urlencoded) writes spaces as +; URI encoding writes %20. Mixing the two is the #1 silent bug.
  • RFC 3986 still defines what's "unreserved." The set is smaller than most code assumes — only 66 characters total.
  • Decode in your browser with the iKit URL Encoder — every byte stays in the tab.

What URL Encoding Actually Does

A URL is a string of ASCII characters that has to traverse routers, proxies, DNS resolvers, and web servers written across thirty years of the internet. To survive that journey intact, every byte that isn't a "safe" character has to be escaped. URL encoding — officially called percent-encoding — replaces unsafe bytes with a % followed by two uppercase hex digits.
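The rule fits in a few lines of JavaScript. This is a hand-rolled sketch for illustration only (the function name is made up; real code should use `encodeURIComponent`) — it percent-encodes every UTF-8 byte, safe or not, to make the mechanics visible:

```javascript
// Hypothetical helper: percent-encode EVERY byte of a string's UTF-8
// representation, regardless of whether it is "safe".
function percentEncodeAll(str) {
  const bytes = new TextEncoder().encode(str); // string -> UTF-8 bytes
  return Array.from(bytes)
    .map((b) => '%' + b.toString(16).toUpperCase().padStart(2, '0'))
    .join('');
}

console.log(percentEncodeAll(' '));  // %20
console.log(percentEncodeAll('中')); // %E4%B8%AD
```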

The 1990s problem URL encoding was built to solve

When Tim Berners-Lee published the first URL spec, the only consensus across networks was 7-bit ASCII. Mail gateways truncated 8-bit characters. Newsgroup relays mangled spaces. Modems with parity bits flipped the high bit on anything outside 0x20–0x7E. Percent-encoding was the lowest-common-denominator answer: any byte you can't trust on the wire becomes three safe bytes — %, hex, hex.

Today HTTP/2 frames are binary and TLS removes most of the wire-level worry. But the encoding rules stuck. Your ?q=café still travels as ?q=caf%C3%A9 because the URL spec was written in 1994 and the entire web depends on it staying compatible.

Reserved vs unreserved characters (RFC 3986)

RFC 3986 splits printable ASCII into three buckets:

  • Unreserved — never need encoding: A–Z, a–z, 0–9, plus -, _, ., and ~. That is the entire safe set. Twenty-six plus twenty-six plus ten plus four — 66 characters total, full stop.
  • Reserved — have a special meaning somewhere in a URL: : / ? # [ ] @ ! $ & ' ( ) * + , ; =. Whether they need escaping depends on which part of the URL you're sitting in.
  • Everything else — must be percent-encoded.
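The three buckets translate directly into code. A minimal classifier, with the character sets copied from RFC 3986 (the function name is an assumption for illustration):

```javascript
// RFC 3986 unreserved: letters, digits, hyphen, underscore, period, tilde
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
// RFC 3986 reserved: gen-delims + sub-delims
const RESERVED = new Set([...":/?#[]@!$&'()*+,;="]);

// Hypothetical helper: classify a single character per RFC 3986.
function classify(ch) {
  if (UNRESERVED.test(ch)) return 'unreserved';
  if (RESERVED.has(ch)) return 'reserved';
  return 'must-encode';
}

console.log(classify('~')); // unreserved
console.log(classify('&')); // reserved
console.log(classify('`')); // must-encode
```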

The shocking part for most developers: only four punctuation marks — hyphen-minus, underscore, period, and tilde — made the unreserved cut. Every other one either carries structural meaning somewhere in a URL or must be percent-encoded outright, even the humble backtick. If you've ever wondered why some URLs come back with ~ and others with %7E, that's the history showing; older encoders treated tilde as unsafe, newer ones don't.

Why you see %20 and %E4%B8%AD everywhere

%20 is space (0x20). %E4%B8%AD is the Chinese character 中, written as three encoded bytes because URL-encoded UTF-8 represents non-ASCII characters as their byte sequence. A single emoji balloons to twelve percent-encoded characters (🚀 is %F0%9F%9A%80 — four bytes at three characters each), and a trailing variation selector adds nine more. This matters when you calculate URL length limits — Cloudflare cuts off around 8 KB, and your "short" share link can be longer than you think once a few emoji slip in.
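You can verify those byte counts in any JavaScript console:

```javascript
// One emoji (four UTF-8 bytes) becomes twelve encoded characters
const encoded = encodeURIComponent('🚀');
console.log(encoded);        // %F0%9F%9A%80
console.log(encoded.length); // 12

// A space (one byte) plus 中 (three bytes): 3 + 9 = 12 characters
console.log(encodeURIComponent(' 中').length); // 12
```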

Three Flavors of URL Encoding (and When Each One Applies)

The trap is that there isn't one URL encoding — there are three, with different rules about which characters get escaped and which stay literal. Mixing them up is the source of every URL bug worth knowing.

Component encoding (encodeURIComponent) — the one you usually want

If you're inserting a single value into a URL — a query parameter, a path segment, a hash fragment — you want component encoding. It escapes everything that has structural meaning in a URL: &, =, ?, /, #, and the rest of the reserved set. That way your value can't accidentally split into a second parameter or jump into the path.

const search = 'is "URL encoding" hard?';
const url = `/api/q?term=${encodeURIComponent(search)}`;
// /api/q?term=is%20%22URL%20encoding%22%20hard%3F

Notice the ? becomes %3F and the space becomes %20. Without encodeURIComponent, the literal ? would start a second query string and the server would see term=is "URL encoding" hard plus an empty parameter.

As documented on MDN, encodeURIComponent leaves only A–Z a–z 0–9 - _ . ! ~ * ' ( ) unescaped — RFC 3986's unreserved set plus a small group of "mark" characters that the older RFC 2396 considered safe. In practice that means component encoding produces nearly-paranoid output, and that's exactly the property you want.
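If a server insists on strict RFC 3986 output, the usual fix is a thin wrapper that also escapes those leftover marks — this follows the `fixedEncodeURIComponent` pattern MDN documents (the function name here is our own):

```javascript
// Escape the marks encodeURIComponent leaves alone: ! ' ( ) *
function encodeRFC3986(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's(*)!")); // it's(*)!
console.log(encodeRFC3986("it's(*)!"));      // it%27s%28%2A%29%21
```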

URI encoding (encodeURI) — whole-URL safe but lazy

encodeURI() is built for whole URLs and assumes the structural characters are already in the right places. It leaves : / ? # & = + , ; and friends alone, which makes it dangerous if you're concatenating user input — a & in the middle of a name will pass through unchanged and add a phantom parameter:

const url = encodeURI(
  'https://api.x.com/users?q=salt&pepper'
);
// https://api.x.com/users?q=salt&pepper
// (untouched — every reserved char survived!)

Reach for encodeURI only when you have a complete, trusted URL with maybe some accented characters in the path, and you just need to make it ASCII-clean. For everything else, decompose the URL into pieces and encode each piece with encodeURIComponent.
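In modern runtimes the URL and URLSearchParams classes do that decomposition for you, so you rarely need encodeURI at all. A sketch (the endpoint is made up):

```javascript
// Build the URL piece by piece; each value is escaped automatically
const url = new URL('https://api.example.com/users');
url.searchParams.set('q', 'salt&pepper'); // & is encoded for you

console.log(url.toString());
// https://api.example.com/users?q=salt%26pepper
```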

Form encoding (application/x-www-form-urlencoded) — why you see + instead of %20

When an HTML form submits over GET, or when a fetch call sends Content-Type: application/x-www-form-urlencoded, spaces become +, not %20. This is form encoding, originally defined by HTML 2.0 and now standardised by the WHATWG URL Living Standard. It coexists awkwardly with RFC 3986 — they agree on most things, disagree on space.

new URLSearchParams({ q: 'hello world' }).toString();
// q=hello+world

encodeURIComponent('hello world');
// hello%20world

Both round-trip back to hello world, but a literal + in user input is a different story. alice+dev@example.com form-encoded correctly is alice%2Bdev%40example.com. If the encoder mistakenly thinks + is already-safe, you get alice+dev%40example.com, and decoding that turns the + back into a space — alice dev@example.com. Plus signs in email addresses are the textbook example, and they break sign-up flows on a depressing percentage of B2C apps.
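The asymmetry is easy to demonstrate: decodeURIComponent follows URI rules and leaves a literal + alone, while a form-encoding decoder like URLSearchParams turns it into a space:

```javascript
const raw = 'alice+dev%40example.com';

// URI rules: + is just a plus sign
console.log(decodeURIComponent(raw));
// alice+dev@example.com

// Form rules: + means space
console.log(new URLSearchParams('email=' + raw).get('email'));
// alice dev@example.com
```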

A Side-by-Side Comparison You Can Trust

Character   encodeURI    encodeURIComponent   Form encoding
space       %20          %20                  +
&           &            %26                  %26
+           +            %2B                  %2B
/           /            %2F                  %2F
?           ?            %3F                  %3F
#           #            %23                  %23
中          %E4%B8%AD    %E4%B8%AD            %E4%B8%AD

Two takeaways. First, form encoding matches component encoding almost everywhere, with + for space as the big exception (it also escapes a few marks like ! and ' that encodeURIComponent leaves alone). Second, encodeURI is the loose one — it leaves the structural punctuation alone, which is exactly why you should rarely use it on values.

Real Bugs URL Encoding Causes (and Fixes)

These three patterns account for the majority of URL-encoding incidents I have ever debugged. They all look obvious in hindsight and quietly disastrous in production.

Search strings with &, =, and ? that truncate your query

A user pastes red & yellow into a search box. Your code does url = `?q=${value}` instead of using encodeURIComponent. The server receives q=red and yellow= as a second parameter. Search returns nothing, and a junior dev spends the afternoon reading the server logs trying to figure out why. Fix: encode every value, every time, even the ones you "know" are safe. There is no business reason to ship a URL builder that does string concatenation.
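A minimal sketch of the fix — route every value through URLSearchParams (or encodeURIComponent) instead of concatenating:

```javascript
const value = 'red & yellow';

// Buggy: the literal & splits the query into two parameters
const broken = `?q=${value}`; // ?q=red & yellow

// Fixed: the value is escaped before it touches the URL
const fixed = '?' + new URLSearchParams({ q: value }).toString();
console.log(fixed); // ?q=red+%26+yellow
```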

Plus-signs in email aliases that vanish on the server

PHP's $_GET and frameworks built on top of it (Laravel, Symfony, WordPress) decode + as a space in query strings. If your front-end produced alice%2Bdev%40example.com, the first decode correctly yields alice+dev@example.com — but any second decode pass (a logging layer, a redirect, a stray urldecode() call) turns that + into a space. The cleanest fix is to send JSON instead of form-encoded data, since JSON gives + no special meaning:

// Front-end: send JSON, never form-encoded
fetch('/api/subscribe', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ email })
});

If you must keep form encoding (legacy server, third-party integration), encode + twice on the way out — once as %2B, then wrap the whole string in another encodeURIComponent if it's going inside another encoded field.
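Nesting is the one place where encoding twice is correct by design: the already-encoded value rides inside another encoded field, and the server unwraps it one layer at a time. A sketch of both passes:

```javascript
const email = 'alice+dev@example.com';

// First pass: encode the value itself
const once = encodeURIComponent(email);
// alice%2Bdev%40example.com

// Second pass: the encoded pair becomes a component of an outer field
const twice = encodeURIComponent('email=' + once);
console.log(twice);
// email%3Dalice%252Bdev%2540example.com

// Unwrapping takes exactly two decode passes
const inner = decodeURIComponent(twice); // email=alice%2Bdev%40example.com
console.log(decodeURIComponent(inner.split('=')[1]));
// alice+dev@example.com
```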

Double-encoded URLs and how to spot them

A URL that's been encoded twice looks like this: https%253A%252F%252Fapi.x.com%252Fusers. The %25 is a literal % — your URL was encoded once, then the encoded string was encoded again by some middleware that didn't realise it was already encoded. The give-away is the pattern %25 followed by two hex digits; if you see it, decode once and check.

A quick browser check: paste your URL into the iKit URL Encoder, hit decode once, and see if the result still contains %. If it does, decode again — but stop chasing it past two passes. More than two means a real bug upstream that needs a code fix, not another decode pass.
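The "decode at most twice" rule fits in a few lines. A hypothetical helper (the two-pass cap is a policy choice, not a standard):

```javascript
// Decode repeatedly, but stop after two passes or when output stabilises
function decodeAtMostTwice(value) {
  let out = value;
  for (let pass = 0; pass < 2; pass++) {
    const next = decodeURIComponent(out);
    if (next === out) break; // nothing left to decode
    out = next;
  }
  return out;
}

console.log(decodeAtMostTwice('https%253A%252F%252Fapi.x.com%252Fusers'));
// https://api.x.com/users
```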

How To Encode (and Decode) URLs Without a Server

URL encoding is a pure, deterministic string transformation — there is no reason to send a URL containing customer data, an auth token, or a webhook secret to a third-party server just to translate %2F back to /. Privacy-first encoding tools run entirely in your browser tab.

In the browser with iKit

The iKit URL Encoder handles both encode and decode, lets you pick component vs URI vs form modes, and never sends a single byte to a server. It's the same approach we take with our Base64 Encoder/Decoder and our JSON Decoder — JavaScript in the page, full stop. If you need to generate a fresh random token to put in a URL, the iKit Password Generator outputs URL-safe alphabets too.

From the JavaScript console

Every modern browser has the encoders built in. Open DevTools and try:

encodeURIComponent('日本語');
// '%E6%97%A5%E6%9C%AC%E8%AA%9E'

decodeURIComponent('%2Fpath%3Fq%3Dhello');
// '/path?q=hello'

new URLSearchParams({
  name: 'Ada Lovelace',
  q: 'a+b'
}).toString();
// 'name=Ada+Lovelace&q=a%2Bb'

If decodeURIComponent throws a URIError (Firefox says "malformed URI sequence", Chrome says "URI malformed"), you have a stray % followed by something that isn't two hex digits. That is almost always a sign of a partially-encoded string — usually a value that was truncated or had an extra % injected somewhere upstream.
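A defensive wrapper that catches the URIError and falls back to the raw input is a common pattern (a sketch; the helper name is our own):

```javascript
// Hypothetical helper: decode if possible, otherwise return the input
function tryDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) return value; // partially encoded input
    throw err; // anything else is a real bug
  }
}

console.log(tryDecode('%2Fok'));   // /ok
console.log(tryDecode('100%off')); // 100%off (malformed, returned as-is)
```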

From the terminal with Python or jq

Sometimes you're staring at a log line and want a one-liner. Python ships everywhere and is usually the fastest path:

python3 -c "import urllib.parse, sys; \
print(urllib.parse.unquote(sys.argv[1]))" \
"hello%20world%21"
# hello world!

Or jq if you have it (brew install jq):

echo '"hello%20world"' | jq -r '@uri'
# hello%2520world
# (note: @uri ENCODES, doesn't decode)

jq's @uri filter is component encoding, not decoding — a fact that catches every developer at least once. To decode, lean on python3 or paste the value into the iKit URL Encoder.
