URL Encoder & Decoder Guide

1. What Is URL Encoding?

URL encoding, formally known as percent encoding, is the process of converting characters into a format that can be safely transmitted within a Uniform Resource Locator (URL). URLs can only contain a limited set of characters from the US-ASCII character set, and certain characters within that set have special reserved meanings. Any character that falls outside the permitted set -- or a reserved character used outside its special purpose -- must be encoded before it can appear in a URL.

The encoding mechanism is simple: each unsafe character is replaced by a percent sign (%) followed by exactly two hexadecimal digits that represent the character's byte value. For example, a space character (ASCII value 32, hexadecimal 20) is encoded as %20. An ampersand (&, ASCII value 38, hexadecimal 26) is encoded as %26.

URL encoding exists because of a fundamental design constraint. When Tim Berners-Lee designed the URL syntax in the early 1990s, URLs needed to be compact, universally transmittable, and parseable by software. Characters like spaces, angle brackets, curly braces, and non-ASCII characters could cause ambiguity or break parsers. Reserved characters like ?, &, =, and # serve as delimiters within the URL structure -- using them as literal data without encoding would confuse parsers about where one component ends and another begins.

Consider a search query like price < $50 & color = blue. If you placed this directly in a URL query string, the & would be misinterpreted as a parameter separator and the = as a key-value delimiter; the < is not permitted in URLs at all and is a common ingredient of injection attacks; the $ is a reserved character; and the spaces would truncate the URL in many contexts. URL encoding transforms this into price%20%3C%20%2450%20%26%20color%20%3D%20blue, making every character unambiguous.
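You can reproduce this transformation with Python's standard `urllib.parse.quote`. A minimal sketch; passing `safe=""` tells `quote` to leave only unreserved characters unencoded, which is the setting you want for a single value:

```python
from urllib.parse import quote, unquote

query = "price < $50 & color = blue"

# safe="" leaves only unreserved characters (A-Z a-z 0-9 - . _ ~) as-is
encoded = quote(query, safe="")
print(encoded)  # price%20%3C%20%2450%20%26%20color%20%3D%20blue

# Decoding restores the original text exactly
assert unquote(encoded) == query
```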

Today, URL encoding is a fundamental building block of the web. Every browser, web server, HTTP client library, and API framework implements URL encoding. Every time you submit a form, click a link with query parameters, or call a REST API, URL encoding is at work behind the scenes ensuring that your data arrives intact and unambiguous.

2. How Percent Encoding Works

Percent encoding is conceptually straightforward, but the details matter. Understanding the algorithm helps you debug encoding issues and choose the right encoding function for your use case.

The Basic Algorithm

For a single-byte (ASCII) character, the encoding process works as follows:

Determine whether the character needs encoding (is it a reserved character used as data, or an unsafe character such as a space?).
If it does, take its byte value and express it as two uppercase hexadecimal digits.
Prepend a percent sign to form the encoded triplet: %HH.

Here are some common examples:

Character   ASCII Value   Hex    Encoded
Space       32            20     %20
!           33            21     %21
#           35            23     %23
$           36            24     %24
&           38            26     %26
+           43            2B     %2B
/           47            2F     %2F
:           58            3A     %3A
=           61            3D     %3D
?           63            3F     %3F
@           64            40     %40
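The three steps translate almost line for line into code. This Python sketch assumes that only the RFC 3986 unreserved characters are left alone (appropriate when encoding a single component value); `encode_ascii_char` is a hypothetical helper, not a library function:

```python
# Assumption: only the RFC 3986 unreserved characters are left unencoded,
# which is the behavior you want for a single component value.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def encode_ascii_char(ch: str) -> str:
    """Percent-encode one ASCII character unless it is unreserved."""
    code = ord(ch)
    if code > 127:
        raise ValueError("non-ASCII characters must be UTF-8 encoded first")
    if ch in UNRESERVED:
        return ch
    # Steps 2 and 3: two uppercase hex digits, zero-padded, prefixed with %
    return f"%{code:02X}"

# Reproduce the rows of the table above
for ch in " !#$&+/:=?@":
    print(repr(ch), ord(ch), f"{ord(ch):02X}", encode_ascii_char(ch))
```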

Encoding Multi-Byte Characters

For characters outside the ASCII range (code points above 127), the character is first encoded into its UTF-8 byte sequence, and then each byte is individually percent-encoded. This is the approach mandated by modern standards and is sometimes called IRI-to-URI conversion.

For example, the euro sign (€, U+20AC) has the UTF-8 byte sequence E2 82 AC, so it is encoded as %E2%82%AC.

Character        Code Point   UTF-8 Bytes   Encoded
ü (u-umlaut)     U+00FC       C3 BC         %C3%BC
€ (euro)         U+20AC       E2 82 AC      %E2%82%AC
ß (sharp s)      U+00DF       C3 9F         %C3%9F
你 (Chinese)     U+4F60       E4 BD A0      %E4%BD%A0
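A minimal Python sketch of the two-step process. `percent_encode_utf8` is a hypothetical helper that deliberately encodes every byte, ASCII included, so it only illustrates the non-ASCII rows of the table; for real URLs use `urllib.parse.quote`, which leaves unreserved characters alone:

```python
from urllib.parse import quote

def percent_encode_utf8(text: str) -> str:
    """Encode text as UTF-8, then percent-encode every resulting byte."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

# Reproduce the rows of the table above (bytes.hex with a separator needs Python 3.8+)
for ch in "ü€ß你":
    utf8 = ch.encode("utf-8")
    print(ch, f"U+{ord(ch):04X}", utf8.hex(" ").upper(), percent_encode_utf8(ch))

# For non-ASCII characters the standard library agrees
assert percent_encode_utf8("€") == quote("€") == "%E2%82%AC"
```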

Decoding Process

URL decoding (also called percent decoding) is the reverse process. A decoder scans the string for percent signs. When it finds one, it reads the next two hexadecimal characters, converts them to a byte value, and replaces the three-character sequence (%HH) with the corresponding byte. After all percent sequences are decoded, the resulting byte sequence is interpreted as UTF-8 text.

Decoders must also handle the + sign as a space character when processing application/x-www-form-urlencoded data, though this is specific to form data and not part of the general percent-encoding specification.
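The decoding steps, including the optional form-data rule for +, fit in a short Python sketch. `percent_decode` is a hypothetical helper; in practice you would call `urllib.parse.unquote` or `unquote_plus`. Raising an error on malformed escapes such as %zz is a design choice made here; Python's own `unquote`, for example, leaves them as literal text:

```python
def percent_decode(text: str, plus_as_space: bool = False) -> str:
    """Decode %HH triplets into bytes, then interpret the bytes as UTF-8.

    plus_as_space=True applies the application/x-www-form-urlencoded rule.
    Malformed triplets and invalid UTF-8 both raise ValueError.
    """
    hex_chars = "0123456789abcdefABCDEF"
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":
            digits = text[i + 1:i + 3]
            if len(digits) != 2 or not all(c in hex_chars for c in digits):
                raise ValueError(f"malformed percent-escape at index {i}")
            out.append(int(digits, 16))      # %HH becomes one raw byte
            i += 3
        elif ch == "+" and plus_as_space:
            out.append(0x20)                 # form data only: + means space
            i += 1
        else:
            out.extend(ch.encode("utf-8"))   # literal characters pass through
            i += 1
    # UnicodeDecodeError (a ValueError subclass) signals invalid UTF-8
    return out.decode("utf-8")

print(percent_decode("caf%C3%A9"))                     # café
print(percent_decode("1+1"))                           # 1+1
print(percent_decode("hello+world", plus_as_space=True))  # hello world
```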

3. RFC 3986: The URI Standard

RFC 3986, published in January 2005, is the current definitive standard for Uniform Resource Identifier (URI) syntax. It supersedes RFC 2396 and is the specification that governs how URLs are constructed, parsed, and resolved. Understanding RFC 3986 is essential for anyone who works with URLs programmatically.

URI Structure

RFC 3986 defines the generic URI syntax with the following components:

  scheme://authority/path?query#fragment

  https://user:pass@example.com:8080/search?q=hello&lang=en#results
  \___/   \________________________/\_____/ \_____________/ \_____/
    |              |                    |          |            |
  scheme       authority              path      query       fragment

Each component has its own rules about which characters are allowed literally and which must be percent-encoded. A character that is valid in one component may need encoding in another. For instance, @ is a delimiter in the authority component but can appear unencoded in a query string.

The ABNF Grammar

RFC 3986 defines the allowed characters using an ABNF (Augmented Backus-Naur Form) grammar. The key productions are:

URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part     = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty
authority     = [ userinfo "@" ] host [ ":" port ]
query         = *( pchar / "/" / "?" )
fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded   = "%" HEXDIG HEXDIG
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved      = gen-delims / sub-delims
gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

This grammar is precise: it tells you exactly which characters are allowed in each position without encoding.
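Because the grammar is precise, it can be checked mechanically. A minimal Python sketch that tests a string against the `query` production, using a regular expression assembled from `unreserved`, `sub-delims`, and `pct-encoded`; `is_valid_query` is a hypothetical name, and it checks syntax only, not meaning:

```python
import re

# Character classes copied from the ABNF above:
#   pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
#   query = *( pchar / "/" / "?" )
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
QUERY_RE = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}:@/?]|%[0-9A-Fa-f]{{2}})*")

def is_valid_query(query: str) -> bool:
    """True if the text (without the leading "?") matches RFC 3986's query rule."""
    return QUERY_RE.fullmatch(query) is not None

print(is_valid_query("q=hello&lang=en"))  # True
print(is_valid_query("q=hello world"))    # False: a space is not allowed
print(is_valid_query("q=100%"))           # False: % must start a %HH triplet
```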

Normalization

RFC 3986 also defines URI normalization -- the process of transforming a URI into a canonical form for comparison. Key normalization rules include:

Case normalization: The scheme and host are case-insensitive and should be lowercased. Percent-encoded triplets should use uppercase hex digits (%2F, not %2f).
Percent-encoding normalization: Unreserved characters that are percent-encoded should be decoded. For example, %41 (letter A) should be normalized to A.
Path segment normalization: Dot segments (. and ..) should be resolved. For example, /a/b/../c becomes /a/c.
Default port removal: If the port matches the scheme default (80 for HTTP, 443 for HTTPS), it should be omitted.
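These rules can be combined into a small normalizer. This Python sketch is illustrative rather than a complete RFC 3986 implementation: it assumes only `http`/`https` default ports, does not handle IPv6 literal hosts, and its helper names are hypothetical. `remove_dot_segments` follows the algorithm in RFC 3986 section 5.2.4:

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: only http/https default ports are known,
# and IPv6 literal hosts such as [::1] are not handled.
DEFAULT_PORTS = {"http": "80", "https": "443"}
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

def normalize_percent(text: str) -> str:
    """Decode triplets that encode unreserved characters; uppercase the rest."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)

def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments (RFC 3986, section 5.2.4)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)

def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, colon, port = hostport.partition(":")
    # Drop an empty port or one that equals the scheme's default
    if colon and (not port or port == DEFAULT_PORTS.get(scheme)):
        colon, port = "", ""
    netloc = normalize_percent(userinfo) + at + host.lower() + colon + port
    path = remove_dot_segments(normalize_percent(parts.path))
    if netloc and not path:
        path = "/"  # for http(s), an empty path is equivalent to "/"
    return urlunsplit((scheme, netloc, path,
                       normalize_percent(parts.query),
                       normalize_percent(parts.fragment)))

print(normalize_url("HTTP://Example.COM:80/a/b/../c/%7euser?q=%2f"))
# http://example.com/a/c/~user?q=%2F
```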

4. Reserved vs Unreserved Characters

RFC 3986 divides characters into three categories: unreserved, reserved, and all other characters. Understanding these categories is critical for knowing when to encode and when not to.

Unreserved Characters (Never Encode)

Unreserved characters can appear in any part of a URI without being percent-encoded. RFC 3986 states that URI producers should not percent-encode unreserved characters, and that normalizers should decode them back to the plain character if they find them encoded.

Unreserved = A-Z a-z 0-9 - . _ ~

Letters:    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
            a b c d e f g h i j k l m n o p q r s t u v w x y z
Digits:     0 1 2 3 4 5 6 7 8 9
Others:     - (hyphen)  . (period)  _ (underscore)  ~ (tilde)

These 66 characters are safe everywhere in a URL. You never need to worry about encoding them.

Reserved Characters (Context-Dependent)

Reserved characters have special meaning within the URI syntax. They act as delimiters that separate the scheme, authority, path, query, and fragment components. Whether a reserved character needs encoding depends on the context.

General delimiters (gen-delims):    : / ? # [ ] @
Sub-delimiters (sub-delims):        ! $ & ' ( ) * + , ; =

When serving their delimiter purpose, reserved characters must NOT be encoded. For example, the ? that separates the path from the query string must remain as a literal ?.

When used as data within a component, reserved characters MUST be encoded. For example, if a query parameter value contains an &, it must be encoded as %26 to avoid being interpreted as a parameter separator.

// Correct: & as delimiter between parameters
https://example.com/search?q=hello&lang=en

// Correct: & as literal data within a parameter value
https://example.com/search?company=AT%26T

All Other Characters (Always Encode)

Any character that is neither unreserved nor reserved must always be percent-encoded when it appears in a URI. This includes:

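Space (%20)
The percent sign itself (%25), whenever it is literal data rather than the start of an encoded triplet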
Control characters (ASCII 0-31 and 127)
Non-ASCII characters (accented letters, CJK ideographs, emoji, etc.)
Characters like { } | \ ^ ` < > "
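The three categories map directly onto a small lookup. In this Python sketch (`classify` is a hypothetical helper), the percent sign lands in the "other" group because it is neither reserved nor unreserved, so a literal % must always be written as %25:

```python
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")

def classify(ch: str) -> str:
    """Return the RFC 3986 category of a single character."""
    if ch in UNRESERVED:
        return "unreserved"  # never needs encoding
    if ch in GEN_DELIMS or ch in SUB_DELIMS:
        return "reserved"    # encode only when used as data
    return "other"           # always encode: space, %, controls, non-ASCII, ...

for ch in ["A", "~", "&", "@", " ", "%", "{", "é"]:
    print(repr(ch), classify(ch))
```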

5. Encoding Query Parameters

Query parameters are the most common context where developers need to perform URL encoding. Understanding how query strings work -- and the difference between the URI specification and form encoding -- is essential for building correct URLs.

Query String Structure

A query string begins with a ? and consists of key-value pairs separated by &. Each key is separated from its value by =:

https://example.com/search?q=url+encoding&page=2&sort=date

Key-Value Pairs:
  q    = url encoding
  page = 2
  sort = date

Form Encoding vs URI Encoding

There are two encoding standards for query parameters, and they differ in how they handle spaces:

RFC 3986 (URI encoding): Spaces are encoded as %20. This is the general-purpose URI encoding defined by the URI specification. It applies to all parts of the URI.

application/x-www-form-urlencoded (form encoding): Spaces are encoded as +. This format is defined by the HTML specification (originally from the CGI specification) and is used specifically when browsers submit HTML forms with method="GET" or method="POST" with the default content type.

Original:          hello world & goodbye
RFC 3986:          hello%20world%20%26%20goodbye
Form-encoded:      hello+world+%26+goodbye

Most web servers and frameworks accept both %20 and + as spaces when parsing query strings. However, outside of query strings (such as in the path component), + is a literal plus sign, not a space.
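Python's standard library exposes both conventions, which makes the difference easy to see. A short sketch using `urllib.parse`; the sample strings are illustrative:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, parse_qs

text = "hello world & goodbye"
print(quote(text, safe=""))  # hello%20world%20%26%20goodbye  (RFC 3986 style)
print(quote_plus(text))      # hello+world+%26+goodbye        (form style)

# A form-aware query parser treats both %20 and + as a space
print(parse_qs("a=x%20y&b=x+y"))  # {'a': ['x y'], 'b': ['x y']}

# A plain percent-decoder (what you would use on a path) keeps + literal
print(unquote("x+y"))       # x+y
print(unquote_plus("x+y"))  # x y
```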

Encoding Individual Values, Not Entire URLs

A critical mistake is to apply URL encoding to an entire URL rather than to individual parameter values. If you encode an entire URL, you will encode the structural characters (://, /, ?, =, &) that are needed for the URL to function:

// WRONG: Encoding the entire URL
https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%26page%3D1
// This is now a broken, unusable URL

// CORRECT: Encode only the parameter values
https://example.com/search?q=hello%20world&page=1

Always build your URL by encoding individual values and then assembling them with the proper delimiters.
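Following that rule by hand looks like this in Python: each key and value is encoded on its own with `quote(..., safe="")`, so spaces become %20, and only then joined with = and &. `build_url` is a hypothetical helper, and the `list[tuple[str, str]]` annotation assumes Python 3.9 or later:

```python
from urllib.parse import quote

def build_url(base: str, params: list[tuple[str, str]]) -> str:
    """Encode each key and value separately, then add the delimiters."""
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in params)
    return f"{base}?{query}"

print(build_url("https://example.com/search", [("q", "hello world"), ("page", "1")]))
# https://example.com/search?q=hello%20world&page=1
```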

Nested URLs as Parameter Values

A common scenario is passing a URL as a query parameter value -- for example, a redirect URL or a callback URL. The entire nested URL must be encoded:

// Original callback URL:
https://myapp.com/callback?status=ok

// Nested as a query parameter value:
https://auth.example.com/login?redirect=https%3A%2F%2Fmyapp.com%2Fcallback%3Fstatus%3Dok

The nested URL's ://, /, ?, and = must all be encoded so they are not interpreted as part of the outer URL's structure.
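A minimal Python sketch of the redirect example above, showing that one encoding step on the sending side and one decoding step on the receiving side give back the original callback URL:

```python
from urllib.parse import quote, urlencode, urlsplit, parse_qs

callback = "https://myapp.com/callback?status=ok"

# Encode the whole nested URL as one value; safe="" makes "/" get encoded too
outer = "https://auth.example.com/login?redirect=" + quote(callback, safe="")
print(outer)
# https://auth.example.com/login?redirect=https%3A%2F%2Fmyapp.com%2Fcallback%3Fstatus%3Dok

# urlencode yields the same query here (its default safe set is also empty)
assert urlencode({"redirect": callback}) == urlsplit(outer).query

# The receiving side recovers the original callback after a single decode
assert parse_qs(urlsplit(outer).query)["redirect"][0] == callback
```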

6. International Characters and UTF-8

The web is global, and URLs frequently contain non-ASCII characters -- accented letters in European languages, ideographs in Chinese, Japanese, and Korean, Arabic and Cyrillic scripts, and even emoji. URL encoding handles all of these through UTF-8.

The IRI Standard

RFC 3987 defines Internationalized Resource Identifiers (IRIs), which extend URIs to allow Unicode characters directly. An IRI is converted to a URI by percent-encoding all non-ASCII characters using their UTF-8 byte sequences. Modern browsers display IRIs in the address bar for readability but transmit the percent-encoded URI form over HTTP.

UTF-8 Encoding in URLs

When a non-ASCII character needs to appear in a URL, the standard process is:

Convert the character to its Unicode code point.
Encode the code point using UTF-8, producing 1 to 4 bytes.
Percent-encode each byte individually.

Character: cafe with accent (café)
  é = U+00E9
  UTF-8: C3 A9
  URL: caf%C3%A9

Character: Tokyo in Japanese (東京)
  東 = U+6771 -> UTF-8: E6 9D B1 -> %E6%9D%B1
  京 = U+4EAC -> UTF-8: E4 BA AC -> %E4%BA%AC
  URL: %E6%9D%B1%E4%BA%AC

Character: Smiley emoji (😀)
  😀 = U+1F600 -> UTF-8: F0 9F 98 80 -> %F0%9F%98%80
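To make step 2 concrete, here is a hand-written UTF-8 encoder in Python. Real code should simply call `str.encode("utf-8")` or `urllib.parse.quote`; `utf8_encode` and `encode_text` are hypothetical names used only to expose the bit layout behind the 1-to-4-byte sequences:

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point into 1-4 UTF-8 bytes by hand."""
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:      # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

def encode_text(text: str) -> str:
    """Code point -> UTF-8 bytes -> %HH per byte, keeping unreserved characters."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in utf8_encode(ord(ch)))
    return "".join(out)

print(encode_text("café"))  # caf%C3%A9
print(encode_text("東京"))  # %E6%9D%B1%E4%BA%AC
print(encode_text("😀"))    # %F0%9F%98%80
```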

Internationalized Domain Names (IDN)

Domain names use a different encoding system called Punycode (defined in RFC 3492) rather than percent encoding. An internationalized domain name is converted to an ASCII-compatible encoding (ACE) with an xn-- prefix:

Unicode domain:    münchen.de
Punycode (ACE):    xn--mnchen-3ya.de

Unicode domain:    例え.jp
Punycode (ACE):    xn--r8jz45g.jp

Modern browsers display the Unicode form in the address bar but resolve the Punycode form through DNS. This distinction is important: domain names use Punycode, while path and query components use percent-encoded UTF-8.
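A minimal Python sketch of the split: the host goes through Punycode, the path through percent-encoded UTF-8. `to_ascii_url` is a hypothetical helper. Note the assumption that Python's built-in `"idna"` codec is good enough: it implements the older IDNA 2003 rules, while the third-party `idna` package implements IDNA 2008, and the two disagree for some characters (IDNA 2003 maps ß to "ss", for example):

```python
from urllib.parse import quote

def to_ascii_url(host: str, path: str) -> str:
    """Punycode the host (built-in IDNA 2003 codec), percent-encode the path."""
    ascii_host = host.encode("idna").decode("ascii")
    return f"https://{ascii_host}{quote(path)}"

print(to_ascii_url("münchen.de", "/straße"))
# https://xn--mnchen-3ya.de/stra%C3%9Fe
```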

Legacy Encodings

Before UTF-8 became the standard, web pages used various character encodings (ISO-8859-1, Windows-1252, Shift_JIS, EUC-KR, etc.), and form data was encoded using the page's character encoding. This led to ambiguity: the same percent-encoded sequence could represent different characters depending on the assumed encoding. Today, UTF-8 is the de facto standard for URL encoding: RFC 3986 recommends it for new URI schemes, RFC 3987 uses it to convert IRIs to URIs, and modern browsers use it for URL paths. If you encounter legacy systems that use other encodings, convert to UTF-8 at the boundary.
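The ambiguity is easy to reproduce with Python's `urllib.parse.unquote`, whose `encoding` parameter (default `"utf-8"`) controls how the decoded bytes are interpreted. The sample strings are illustrative:

```python
from urllib.parse import quote, unquote

legacy = "caf%E9"     # é as the single Latin-1 byte 0xE9
modern = "caf%C3%A9"  # é as the UTF-8 bytes C3 A9

print(unquote(modern))                      # café
print(unquote(modern, encoding="latin-1"))  # cafÃ©  (mojibake)
print(repr(unquote(legacy)))                # 'caf\ufffd': 0xE9 alone is not valid UTF-8

# Converting at the boundary: decode with the legacy encoding, re-encode as UTF-8
print(quote(unquote(legacy, encoding="latin-1")))  # caf%C3%A9
```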

7. URL Encoding in Different Languages

Every major programming language provides built-in functions for URL encoding and decoding. However, the functions differ in subtle but important ways. Choosing the right function is critical for correct behavior.

JavaScript

// encodeURIComponent -- use for encoding parameter values
encodeURIComponent("hello world & goodbye")
// "hello%20world%20%26%20goodbye"

// decodeURIComponent -- decode a single component
decodeURIComponent("hello%20world%20%26%20goodbye")
// "hello world & goodbye"

// encodeURI -- use for encoding a full URI (preserves : / ? # & =)
encodeURI("https://example.com/path?q=hello world")
// "https://example.com/path?q=hello%20world"

// decodeURI -- decode a full URI
decodeURI("https://example.com/path?q=hello%20world")
// "https://example.com/path?q=hello world"

// URLSearchParams -- handles form encoding automatically
const params = new URLSearchParams();
params.set("q", "hello world & goodbye");
params.toString();
// "q=hello+world+%26+goodbye"  (uses + for spaces)

Key distinction: Use encodeURIComponent() for individual query parameter keys and values. Use encodeURI() only when you have a full URL that just needs non-ASCII characters or spaces encoded. Never use escape() -- it is deprecated and does not handle UTF-8 correctly.

Python

from urllib.parse import quote, unquote, urlencode, quote_plus

# quote -- RFC 3986 encoding (spaces become %20)
quote("hello world & goodbye")
# "hello%20world%20%26%20goodbye"

# quote leaves "/" unencoded by default (safe="/"); pass safe="" to encode it too
quote("hello/world")
# "hello/world"
quote("hello/world", safe="")
# "hello%2Fworld"

# quote_plus -- form encoding (spaces become +)
quote_plus("hello world & goodbye")
# "hello+world+%26+goodbye"

# unquote -- decode percent-encoded strings
unquote("hello%20world%20%26%20goodbye")
# "hello world & goodbye"

# urlencode -- encode a dictionary of parameters
urlencode({"q": "hello world", "page": "1"})
# "q=hello+world&page=1"

Go

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// url.QueryEscape -- form encoding for query parameters
	fmt.Println(url.QueryEscape("hello world & goodbye"))
	// "hello+world+%26+goodbye"

	// url.PathEscape -- RFC 3986 encoding for a single path segment
	// (& is legal inside a segment, so it is left as-is)
	fmt.Println(url.PathEscape("hello world & goodbye"))
	// "hello%20world%20&%20goodbye"

	// url.QueryUnescape -- decode query-encoded strings; it also returns an error
	s, err := url.QueryUnescape("hello+world+%26+goodbye")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// "hello world & goodbye"

	// url.Values -- build query strings safely (Encode sorts by key)
	v := url.Values{}
	v.Set("q", "hello world")
	v.Set("page", "1")
	fmt.Println(v.Encode())
	// "page=1&q=hello+world"
}

Java

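import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodingExamples {
    public static void main(String[] args) throws URISyntaxException {
        // URLEncoder.encode -- form encoding (spaces become +); the Charset overload needs Java 10+
        System.out.println(URLEncoder.encode("hello world & goodbye", StandardCharsets.UTF_8));
        // "hello+world+%26+goodbye"

        // URLDecoder.decode -- decode form-encoded strings
        System.out.println(URLDecoder.decode("hello+world+%26+goodbye", StandardCharsets.UTF_8));
        // "hello world & goodbye"

        // For RFC 3986 encoding, use java.net.URI. The multi-argument constructors quote
        // illegal characters such as spaces, but leave legal delimiters like & and = alone.
        URI uri = new URI("https", "example.com", "/path", "q=hello world", null);
        System.out.println(uri.toASCIIString());
        // "https://example.com/path?q=hello%20world"
    }
}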

PHP

// urlencode -- form encoding (spaces become +)
urlencode("hello world & goodbye");
// "hello+world+%26+goodbye"

// rawurlencode -- RFC 3986 encoding (spaces become %20)
rawurlencode("hello world & goodbye");
// "hello%20world%20%26%20goodbye"

// urldecode / rawurldecode -- corresponding decoders
urldecode("hello+world+%26+goodbye");
// "hello world & goodbye"

// http_build_query -- build query strings from arrays
http_build_query(["q" => "hello world", "page" => "1"]);
// "q=hello+world&page=1"

8. Common Pitfalls and Mistakes

URL encoding seems simple, but subtle mistakes can cause bugs that are difficult to diagnose. Here are the most common pitfalls developers encounter.

Double Encoding

Double encoding occurs when data that has already been percent-encoded is encoded again. The percent sign (%) itself gets encoded as %25, turning the already-encoded sequences into garbled output:

Original:           hello world
First encoding:     hello%20world       (correct)
Double encoding:    hello%2520world     (broken! %25 = %, so this decodes to hello%20world)

This typically happens when:

Multiple layers of code each apply encoding without checking if the input is already encoded
A framework automatically encodes URL parameters, but you manually encoded them first
Data passes through multiple systems that each apply their own encoding

Prevention: Encode data exactly once, at the point where you construct the URL. Never encode data "just in case" -- know whether your framework or library handles encoding automatically.
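The failure is easy to reproduce in Python, where `quote` has no way of knowing that its input was already encoded:

```python
from urllib.parse import quote, unquote

once = quote("hello world")  # hello%20world
twice = quote(once)          # hello%2520world  (the % itself became %25)

print(unquote(twice))           # hello%20world  -- one decode is not enough
print(unquote(unquote(twice)))  # hello world
```

Decoding repeatedly "until nothing changes" is not a safe repair: a value whose text legitimately contains %41 would be turned into A.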

Encoding Entire URLs Instead of Components

As discussed in the query parameters section, encoding an entire URL destroys its structure. Always build URLs by encoding individual components and assembling them:

// WRONG
const url = encodeURIComponent(`https://example.com/search?q=${query}`);

// CORRECT
const url = `https://example.com/search?q=${encodeURIComponent(query)}`;

Confusing encodeURI and encodeURIComponent

In JavaScript, using encodeURI() to encode a query parameter value will fail to encode critical characters like &, =, and +, because encodeURI() assumes these are structural delimiters:

const value = "a=1&b=2";

// WRONG: encodeURI doesn't encode & and =
encodeURI(value);   // "a=1&b=2"  (unchanged! breaks the query string)

// CORRECT: encodeURIComponent encodes everything
encodeURIComponent(value);   // "a%3D1%26b%3D2"  (safe as a parameter value)

Plus Sign Confusion

The + character means a space in application/x-www-form-urlencoded context (query strings from form submissions), but it is a literal + in other URL components. This causes problems when:

You use form-encoded data in a path segment (the + will be interpreted literally, not as a space)
You need a literal + in a query string (you must encode it as %2B)
You mix encoding standards (some APIs expect %20 for spaces, not +)
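A short Python sketch of the second case, using `urllib.parse.parse_qs` as the form-aware parser; the parameter name is illustrative:

```python
from urllib.parse import parse_qs, quote_plus

# A literal "C++" pasted into a query string is read back as "C" plus two spaces
print(parse_qs("lang=C++"))  # {'lang': ['C  ']}

# Encoding the value first turns each + into %2B, which survives parsing
query = "lang=" + quote_plus("C++")  # lang=C%2B%2B
print(parse_qs(query))               # {'lang': ['C++']}
```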

Forgetting to Encode Path Segments

It is easy to remember to encode query parameter values but forget that path segments also need encoding. If a path segment contains a /, it will be misinterpreted as a path delimiter:

// File path as a URL segment: "reports/2026/Q1"
// WRONG: this creates three path segments
/files/reports/2026/Q1

// CORRECT: encode the value as a single segment
/files/reports%2F2026%2FQ1
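In Python, the detail to watch is that `quote` treats / as safe by default, so a path segment needs `safe=""`:

```python
from urllib.parse import quote

segment = "reports/2026/Q1"
print("/files/" + quote(segment))           # /files/reports/2026/Q1  (default safe="/")
print("/files/" + quote(segment, safe=""))  # /files/reports%2F2026%2FQ1
```

Check your server stack before relying on %2F in paths: some servers and proxies decode or reject it by default (Apache httpd's AllowEncodedSlashes directive, for example, defaults to Off and answers such requests with 404).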

Not Handling Unicode Correctly

Older encoding functions in some languages use Latin-1 or the system's default encoding instead of UTF-8. Always specify UTF-8 explicitly when available, and verify that your encoding functions produce UTF-8 percent-encoded output. A telltale sign of encoding mismatch is that accented characters or ideographs decode as garbage characters (mojibake).

9. Best Practices

Always Use UTF-8

UTF-8 is the de facto standard for URL encoding. Current specifications recommend it, and modern browsers and servers expect it. Never use legacy encodings like Latin-1 or Shift_JIS for URL encoding. If you interface with legacy systems, convert to UTF-8 at the boundary.

Encode at Construction Time

Encode values at the moment you build the URL, not before and not after. This avoids double encoding and ensures that every value is encoded exactly once. Use URL builder APIs (like JavaScript's URL and URLSearchParams, Python's urllib.parse.urlencode, or Go's url.Values) that handle encoding automatically.

Use URL Builder APIs

Instead of manually concatenating URL strings, use your language's URL builder:

// JavaScript -- URL and URLSearchParams
const url = new URL("https://example.com/search");
url.searchParams.set("q", "hello world & goodbye");
url.searchParams.set("page", "1");
url.toString();
// "https://example.com/search?q=hello+world+%26+goodbye&page=1"

# Python -- urllib.parse
from urllib.parse import urlencode
base = "https://example.com/search"
params = urlencode({"q": "hello world & goodbye", "page": "1"})
full_url = f"{base}?{params}"
# "https://example.com/search?q=hello+world+%26+goodbye&page=1"

These APIs handle the details correctly: they encode parameter values, insert the proper delimiters, and avoid double encoding.

Decode Before Processing, Encode Before Transmitting

When you receive URL-encoded data (from query parameters, form submissions, or API responses), decode it immediately into its natural form for processing. When you need to include data in a URL, encode it at the last moment before constructing the URL. This "decode early, encode late" principle keeps your application logic clean and prevents encoding errors from propagating.

Validate After Decoding

Always validate user input after URL decoding, not before. A malicious input like %3Cscript%3E will pass validation if you check the encoded form (it looks harmless), but after decoding it becomes <script>. Security validation (XSS prevention, SQL injection prevention, path traversal prevention) must always operate on the decoded data.
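A minimal Python sketch of where validation belongs. `is_safe_filename` is a hypothetical deny-list check used only to show the ordering (decode first, then validate); real code is usually better served by an allow-list of permitted characters:

```python
from urllib.parse import unquote

def is_safe_filename(raw: str) -> bool:
    """Hypothetical check: reject traversal and markup in a filename parameter.

    Validation runs on the decoded value, never on the raw encoded text.
    """
    name = unquote(raw)
    return ".." not in name and "/" not in name and "\\" not in name and "<" not in name

raw = "%2E%2E%2Fetc%2Fpasswd"
print(".." in raw)            # False -- the encoded form looks harmless
print(is_safe_filename(raw))  # False -- decoded, it is ../etc/passwd
```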

Use Uppercase Hex Digits

RFC 3986 recommends using uppercase hexadecimal digits in percent-encoded triplets (%2F rather than %2f). While most decoders accept either, using uppercase is the normalized form and ensures maximum interoperability.

Do Not Encode Unreserved Characters

Encoding unreserved characters (like letters, digits, hyphens, and underscores) is technically valid but unnecessary. It makes URLs harder to read, and RFC 3986 says URI producers should not create such encodings; normalizers decode them back. For example, %41 should be A, and %2D should be -.

Test with Edge Cases

When building URL handling code, test with these edge cases:

Strings containing every reserved character: :/?#[]@!$&'()*+,;=
Unicode characters from multiple scripts (Latin, CJK, Arabic, emoji)
Already-encoded strings (to catch double encoding)
Empty strings and strings with only spaces
Very long values (some servers have URL length limits, typically 2,000 to 8,000 characters)
Null bytes and control characters
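The edge cases above can be turned into a small round-trip check. This Python sketch exercises only the standard library's `quote`/`unquote` and `quote_plus`/`unquote_plus` pairs; `EDGE_CASES` and `check_round_trips` are hypothetical names, and you would swap in your own encoding and decoding functions to test them the same way:

```python
from urllib.parse import quote, unquote, quote_plus, unquote_plus

EDGE_CASES = [
    ":/?#[]@!$&'()*+,;=",                # every reserved character
    "café 東京 😀",                       # Latin, CJK, emoji
    "\u0645\u0631\u062d\u0628\u0627",    # Arabic, written as escapes for readability
    "hello%20world",                     # already-encoded input
    "",                                  # empty string
    "   ",                               # spaces only
    "x" * 5000,                          # very long value
    "a\x00b\x1fc\x7f",                   # null byte and control characters
]

def check_round_trips(cases) -> bool:
    """Every value must survive encode -> decode unchanged in both styles."""
    for value in cases:
        assert unquote(quote(value, safe="")) == value, value
        assert unquote_plus(quote_plus(value, safe="")) == value, value
        # With safe="", only unreserved ASCII characters and %HH may appear
        encoded = quote(value, safe="")
        assert all(c.isascii() and (c.isalnum() or c in "-._~%") for c in encoded), value
    return True

print(check_round_trips(EDGE_CASES))  # True
```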

10. Using Our Free URL Encoder/Decoder Tool

Our free URL Encoder/Decoder tool makes it easy to encode and decode URLs and URL components directly in your browser. No data is sent to any server -- all processing happens locally on your machine.

Encode Mode

Paste any text and instantly get the URL-encoded output. The tool supports both RFC 3986 percent encoding (spaces as %20) and form encoding (spaces as +), so you can choose the format that matches your use case.

Decode Mode

Paste a URL-encoded string and see the decoded output immediately. The tool automatically handles both %20 and + as spaces. Multi-byte UTF-8 sequences are decoded correctly, and invalid sequences are flagged with clear error messages.

Key Features

100% client-side: Your data never leaves your browser
RFC 3986 and form encoding modes: Choose the right encoding for your context
Full Unicode support: Encode and decode characters from any language or script
One-click copy: Copy encoded or decoded output to your clipboard instantly
Error detection: Clear feedback for invalid percent-encoded sequences

Frequently Asked Questions

What is URL encoding (percent encoding)?

URL encoding, also called percent encoding, is a mechanism for converting characters that are not allowed in a URL into a safe representation. Each unsafe character is replaced with a percent sign (%) followed by two hexadecimal digits representing the character's byte value. For example, a space becomes %20 and an ampersand becomes %26.

What is the difference between encodeURI and encodeURIComponent in JavaScript?

encodeURI() encodes a full URI and preserves characters that have special meaning in URLs, such as :, /, ?, #, &, and =. encodeURIComponent() encodes a single URI component (like a query parameter value) and does encode those special characters. Use encodeURIComponent() for encoding individual values and encodeURI() for encoding a complete URL.

Why do spaces sometimes appear as %20 and sometimes as +?

The %20 encoding comes from RFC 3986 (generic URI syntax) and is universally valid in any part of a URL. The + encoding for spaces comes from the application/x-www-form-urlencoded format defined in the HTML specification, which is used specifically for form submissions and query strings. Both are correct in their respective contexts, but %20 is the safer choice when you are unsure.

Which characters need to be URL encoded?

Characters that must be URL encoded include: spaces, non-ASCII characters (accented letters, CJK characters, emoji), and any reserved characters when used outside their special purpose. Reserved characters include : / ? # [ ] @ ! $ & ' ( ) * + , ; =. Unreserved characters that never need encoding are A-Z, a-z, 0-9, hyphen (-), underscore (_), period (.), and tilde (~).

How are international characters (like Chinese or Arabic) URL encoded?

International characters are first converted to their UTF-8 byte representation, and then each byte is percent-encoded. For example, the German u-umlaut (U+00FC) is encoded as %C3%BC because its UTF-8 representation is the two bytes C3 and BC. A Chinese character may require three percent-encoded bytes. Modern browsers display the decoded characters in the address bar for readability but send the encoded form in HTTP requests.

What is RFC 3986 and why does it matter for URL encoding?

RFC 3986 is the current standard for Uniform Resource Identifier (URI) syntax, published in 2005. It defines which characters are allowed in each part of a URI, which characters are reserved, and how percent-encoding must be performed. It matters because it is the authoritative specification that browsers, servers, and libraries follow when constructing and parsing URLs. Following RFC 3986 ensures your URLs are interoperable across all platforms.
