A hash function is a mathematical algorithm that takes an input (often called a "message") of any arbitrary size and produces a fixed-size output, commonly referred to as a "hash," "digest," or "checksum." The output size depends on the algorithm: MD5 always produces 128 bits (32 hexadecimal characters), SHA-256 always produces 256 bits (64 hexadecimal characters), and so on, regardless of whether the input is a single character or a multi-gigabyte file.
Hash functions are one of the foundational building blocks of modern computer science and information security. They appear everywhere in software engineering: verifying the integrity of downloaded files, storing passwords securely, creating digital signatures, building data structures like hash tables, deduplicating data in storage systems, and powering blockchain technologies like Bitcoin and Ethereum.
A cryptographic hash function is a special class of hash function that satisfies additional security properties, making it suitable for use in contexts where adversarial attacks are a concern. These properties distinguish cryptographic hash functions from simpler, non-cryptographic hash functions (like CRC32 or FNV) that are used purely for performance-oriented tasks like hash table lookups.
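A well-designed cryptographic hash function exhibits the following properties: determinism (the same input always produces the same output), the avalanche effect (a small change to the input produces a vastly different output), one-way computation (a hash cannot be reversed to recover the original input), and collision resistance (it is extremely difficult to find two different inputs that produce the same hash).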
To illustrate, here are the SHA-256 hashes of two nearly identical inputs:
Input: "hello"
SHA-256: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
Input: "Hello"
SHA-256: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
The only difference between the two inputs is the capitalization of the first letter, yet the resulting hashes are completely different. There is no discernible pattern or relationship between them. This is the avalanche effect in action, and it is what makes hash functions so useful for integrity verification: any modification to the data, no matter how small, is immediately detectable.
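To make the avalanche effect measurable, the following minimal Python sketch recomputes the two digests above and counts how many of the 256 output bits differ. The helper name bit_difference is an assumption introduced for illustration; for a well-behaved hash the count should typically land near half of the bits.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# The two inputs differ only in the case of the first letter.
d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"Hello").digest()

print(d1.hex())
print(d2.hex())
print(f"{bit_difference(d1, d2)} of {len(d1) * 8} bits differ")
```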
While the internal mathematics of cryptographic hash functions are complex, the general process follows a consistent pattern across most algorithms. Understanding this high-level structure helps you appreciate why these functions behave the way they do.
Most widely used hash functions -- including MD5, SHA-1, and the SHA-2 family (SHA-256, SHA-512) -- are built on a design principle called the Merkle-Damgard construction. This approach works as follows:
This iterative design means that hash functions can process data in a streaming fashion. You do not need to load an entire file into memory to hash it -- you can feed the data block by block, which is essential for hashing large files.
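As a small illustration of the streaming property, this Python sketch hashes the same bytes once in a single call and once in chunks, then checks that the results agree. The 1 MB buffer and the 64 KiB chunk size are arbitrary choices for the example, standing in for a large file.

```python
import hashlib

data = b"x" * 1_000_000  # stand-in for the contents of a large file

# One-shot: hash the whole buffer at once.
one_shot = hashlib.sha256(data).hexdigest()

# Streaming: feed the same bytes to the hash object in 64 KiB chunks.
h = hashlib.sha256()
for start in range(0, len(data), 65536):
    h.update(data[start:start + 65536])
streamed = h.hexdigest()

# Both approaches process the same byte sequence, so the digests match.
print(one_shot == streamed)
```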
The compression function is the core of any hash algorithm. It performs a series of bitwise operations, modular arithmetic, and logical functions that thoroughly mix the input bits. In SHA-256, for example, each block goes through 64 rounds of processing, each involving:
These operations are carefully designed so that every bit of the output depends on every bit of the input. This creates the avalanche effect and makes the function resistant to cryptanalytic attacks.
The one-way nature of hash functions comes from the deliberate destruction of information during the compression process. Each round of the compression function combines data in ways that cannot be unambiguously reversed. Multiple different inputs can produce the same intermediate state, so working backward from the output to the input requires guessing, not computing. For a 256-bit hash, brute-forcing the preimage requires on average 2^255 attempts -- a number so large that it exceeds the computational capacity of all the world's computers running for the lifetime of the universe.
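The scale of that number is easier to grasp with some back-of-the-envelope arithmetic. The sketch below assumes a hypothetical attacker who can test 10^18 hashes per second (a figure chosen only for illustration) and uses an approximate age of the universe of 13.8 billion years.

```python
attempts = 2 ** 255                    # expected attempts quoted above
rate = 10 ** 18                        # assumed: hashes per second (hypothetical)
seconds_per_year = 365.25 * 24 * 3600
universe_age_years = 1.38e10           # approximate age of the universe

years = attempts / rate / seconds_per_year
print(f"{years:.2e} years")
print(f"{years / universe_age_years:.2e} universe lifetimes")
```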
Not all hash algorithms are created equal. Over the decades, advances in cryptanalysis have revealed weaknesses in older algorithms, while newer ones have been designed to withstand modern attack techniques. Here is a detailed comparison of the four most commonly encountered hash algorithms.
MD5 was once the dominant hash algorithm, used extensively for file integrity checks, digital signatures, and password hashing. However, in 2004, researchers demonstrated practical collision attacks against MD5, and in 2008, researchers used MD5 collisions to forge a rogue CA certificate, showing that the attack worked against real-world infrastructure. Today, MD5 should never be used for any security purpose.
Despite its cryptographic weakness, MD5 remains popular for non-security applications: quick checksums to detect accidental file corruption, cache key generation, and content-addressable storage where adversarial attacks are not a concern. Its speed and ubiquity keep it in widespread use for these benign purposes.
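Example (output format only; the digest shown is an illustrative placeholder, not the actual MD5 of this input):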
Input: "QuickUtil"
MD5: a3f7b2c1d4e5f6a7b8c9d0e1f2a3b4c5 (32 characters)
SHA-1 was designed by the National Security Agency (NSA) and published by NIST as a federal standard. For many years, it was the standard hash function for TLS/SSL certificates, Git commits, and code signing. However, theoretical attacks emerged in 2005, and in 2017, Google and CWI Amsterdam demonstrated the first practical SHA-1 collision (the "SHAttered" attack), producing two different PDF files with the same SHA-1 hash.
Major browsers and certificate authorities have since stopped accepting SHA-1 certificates. Git still uses SHA-1 for commit identification, though the project has been transitioning to SHA-256. SHA-1 should not be used for new security applications, but remains in legacy systems.
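Example (output format only; the digest shown is an illustrative placeholder, not the actual SHA-1 of this input):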
Input: "QuickUtil"
SHA-1: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b (40 characters)
SHA-256 is a member of the SHA-2 family, designed by the NSA and published by NIST. It is currently the most widely used secure hash function and is considered the gold standard for general-purpose cryptographic hashing. SHA-256 is used in TLS certificates, Bitcoin's proof-of-work system, software package verification, code signing, and countless security protocols.
No practical attacks against SHA-256 are known. The best known attack is a brute-force search with a complexity of 2^128 (due to the birthday paradox for collision finding), which is far beyond the capability of any foreseeable technology. SHA-256 is the recommended choice for most applications that require a cryptographic hash function.
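The birthday bound can be observed on a deliberately weakened hash. This Python sketch truncates SHA-256 to 24 bits (an assumption made purely so the search finishes quickly) and looks for two inputs with the same truncated digest. For an n-bit hash a collision is expected after roughly 2^(n/2) attempts, so here a few thousand tries usually suffice, whereas the full 256-bit output would need about 2^128.

```python
import hashlib
from itertools import count

def truncated_digest(data: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of SHA-256: a deliberately weakened 24-bit hash."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Hash successive counters until two share a truncated digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_digest(msg, n_bytes)
        if tag in seen:
            # Return both colliding messages and the number of hashes computed.
            return seen[tag], msg, i + 1
        seen[tag] = msg

m1, m2, tries = find_collision()
print(m1, m2, tries)
```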
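Example (output format only; the digest shown is an illustrative placeholder, not the actual SHA-256 of this input):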
Input: "QuickUtil"
SHA-256: 7d8f9e0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e (64 characters)
SHA-512 is the larger sibling of SHA-256 in the SHA-2 family. It operates on 64-bit words (compared to SHA-256's 32-bit words) and processes data in 1024-bit blocks through 80 rounds (compared to SHA-256's 64 rounds). The larger output provides an even higher security margin.
An interesting performance characteristic: on 64-bit processors, SHA-512 is often faster than SHA-256 because its internal operations are natively 64-bit. This makes SHA-512 an excellent choice when you want both maximum security and high performance on modern hardware.
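Example (output format only; the digest shown is an illustrative placeholder, not the actual SHA-512 of this input):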
Input: "QuickUtil"
SHA-512: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b
3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d (128 characters)
Algorithm  Output Size  Speed       Security Status      Recommended Use
---------  -----------  ----------  -------------------  ---------------------------
MD5        128 bits     Fastest     Broken (collisions)  Non-security checksums only
SHA-1      160 bits     Fast        Deprecated           Legacy systems only
SHA-256    256 bits     Moderate    Secure               General-purpose security
SHA-512    512 bits     Fast (64b)  Secure               High-security applications
One of the most practical and widespread uses of hash functions is verifying file integrity. When you download software, an operating system ISO, or a firmware update, you need to be confident that the file you received is identical to the file the publisher released -- that it was not corrupted during transfer or tampered with by an attacker.
The process is straightforward: the publisher computes the hash of the original file and publishes the hash value alongside the download link. After downloading the file, you compute the hash of your local copy and compare it to the published value. If the hashes match, the file is identical to the original. If they differ, the file has been modified -- either by corruption or tampering.
This works because of two hash function properties: determinism (the same file always produces the same hash) and collision resistance, or more precisely second-preimage resistance (it is practically impossible to craft a modified file that produces the same hash as the original).
Every major operating system includes built-in tools for computing file hashes:
# Linux / macOS
md5sum myfile.iso
sha1sum myfile.iso
sha256sum myfile.iso
sha512sum myfile.iso
# macOS alternative
shasum -a 256 myfile.iso
# Windows (PowerShell)
Get-FileHash myfile.iso -Algorithm MD5
Get-FileHash myfile.iso -Algorithm SHA1
Get-FileHash myfile.iso -Algorithm SHA256
Get-FileHash myfile.iso -Algorithm SHA512
Many major software projects publish SHA-256 checksums for their releases:
For example, the docker pull command verifies the integrity of every layer by comparing its hash to the manifest.
Here is a practical example of verifying a downloaded file:
# Download a file
curl -LO https://example.com/release-v2.0.tar.gz
# Compute its SHA-256 hash
sha256sum release-v2.0.tar.gz
# Output: 7d8f9e0a1b2c3d4e5f6a... release-v2.0.tar.gz
# Compare with the published hash
# If they match, the file is authentic and intact
echo "7d8f9e0a1b2c3d4e5f6a...  release-v2.0.tar.gz" | sha256sum --check
# Output: release-v2.0.tar.gz: OK
Always use SHA-256 or SHA-512 for security-critical integrity checks. MD5 checksums are still commonly published, but they cannot protect against a determined attacker who could craft a malicious file with the same MD5 hash.
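The same check can be scripted. The following Python sketch is one possible way to do it, not a replacement for the tools above: the function names sha256_of_file and verify_file are assumptions introduced for illustration, and the expected value is whatever hex digest the publisher lists.

```python
import hashlib
import hmac

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Stream the file through SHA-256 so memory use stays constant."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_file(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 matches the published hex digest."""
    actual = sha256_of_file(path)
    # Normalize whitespace and case before comparing.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A caller would pass the downloaded file's path and the published digest, and treat a False result as a reason to discard the file.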
Password storage is one of the most important -- and most frequently mishandled -- applications of hashing in software development. Understanding how password hashing works, and how it differs from general-purpose hashing, is essential for every developer.
Storing passwords in plain text is a catastrophic security practice. If an attacker gains access to your database (through SQL injection, a backup leak, or an insider threat), they immediately have every user's password. Since many people reuse passwords across services, a single breach can compromise accounts across dozens of other platforms.
Hashing passwords before storing them means the database contains only hash values, not the actual passwords. When a user logs in, you hash their submitted password and compare it to the stored hash. If the hashes match, the password is correct. If the database is breached, the attacker gets hashes, not passwords -- and since hash functions are one-way, they cannot directly compute the original passwords from the hashes.
While SHA-256 is a strong cryptographic hash function, using it directly for password hashing has critical weaknesses:
A "salt" is a random value that is generated uniquely for each password and prepended (or appended) to the password before hashing. The salt is stored alongside the hash in the database. Salting defeats rainbow tables because every password, even identical ones, produces a different hash due to the unique salt.
# Without salt (vulnerable to rainbow tables):
SHA-256("password123") = ef92b778... (same for every user with this password)
# With salt (each user gets a unique hash):
salt1 = "x7k9m2p4"
SHA-256("x7k9m2p4" + "password123") = 3a8f7d2e...
salt2 = "q3r8t5w1"
SHA-256("q3r8t5w1" + "password123") = 9c1b4e6f... (different!)
Modern password hashing uses specialized algorithms that incorporate salting, key stretching (deliberately making the hash computation slow), and memory-hard functions to resist GPU and ASIC attacks:
Key takeaway: Never use MD5, SHA-1, SHA-256, or SHA-512 directly for password hashing. Always use bcrypt, scrypt, or Argon2id with a unique random salt per password. General-purpose hash functions like SHA-256 are excellent for data integrity and checksums, but they are not suitable for protecting passwords.
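As a rough sketch of what this looks like in practice, the Python standard library exposes scrypt through hashlib.scrypt (available when Python is built against a recent OpenSSL). The cost parameters below are assumptions chosen for illustration, and the function names hash_password and verify_password are hypothetical; a production system would more likely rely on a maintained password-hashing library.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration; tune them for your hardware.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)

def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) using a fresh random salt per password."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected)
```

Because each call to hash_password draws a new salt, two users with the same password end up with different stored values.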
Beyond file integrity and password storage, hash functions play a central role in broader data verification and digital signature schemes.
HMAC combines a hash function with a secret key to produce an authentication code. Unlike a plain hash, an HMAC can only be computed by someone who possesses the secret key. This makes HMAC suitable for verifying both the integrity and authenticity of a message.
HMAC-SHA256(key, message) = hash
# Only someone with the key can compute the correct HMAC
# If the message is modified, the HMAC will not match
# If someone without the key tries to forge an HMAC, they cannot
HMACs are used extensively in API authentication (webhook signatures), JWT token verification, session tokens, and secure cookie signing.
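A minimal Python sketch of computing and checking an HMAC-SHA256 tag, using the standard hmac module. The key and message are made-up values for illustration, and the helper names sign and verify are assumptions rather than part of any particular API.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Compute the HMAC-SHA256 tag of a message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"demo-secret-key"            # hypothetical shared secret
message = b'{"event": "paid"}'      # hypothetical webhook payload
tag = sign(key, message)

print(verify(key, message, tag))
```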
Digital signatures use asymmetric cryptography (public/private key pairs) combined with hash functions. The signing process works as follows:
Hash functions are essential here because signing the full document with asymmetric cryptography would be computationally expensive (RSA operations are slow). By hashing the document first, you reduce it to a fixed-size digest that can be signed efficiently.
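The hash-then-sign idea can be illustrated with textbook RSA in a few lines of Python. This is a toy under loud assumptions: the key is built from two small Mersenne primes, there is no padding scheme, and the digest is simply reduced modulo the key, so it must never be used for real signatures. It only shows that the expensive private-key operation is applied to a fixed-size digest rather than to the whole document.

```python
import hashlib

# Toy key built from two Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """SHA-256 digest reduced modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Apply the private-key operation to the digest, not the document."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Undo the signature with the public key and compare against a fresh digest."""
    return pow(signature, E, N) == digest_as_int(message)
```

Real systems use vetted libraries with keys of 2048 bits or more and a standardized padding scheme.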
Blockchain technology relies heavily on hash functions. In Bitcoin, for example:
Many modern systems use hash-based addressing, where data is stored and retrieved using its hash value as the key:
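The idea of using a hash as the storage key can be sketched with a small in-memory store. The class name ContentStore and its methods are hypothetical, invented for this example; real systems add persistence, but the core behavior (identical content maps to the same key, and reads can be re-verified) is the same.

```python
import hashlib

class ContentStore:
    """In-memory store keyed by the SHA-256 of each blob (illustrative only)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data        # identical content maps to the same key
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read so silent corruption is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self):
        return len(self._blobs)
```

Storing the same bytes twice leaves a single entry, which is how hash-based addressing gives deduplication for free.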
Every major programming language provides built-in or standard library support for computing cryptographic hashes. Here are examples in several popular languages.
// Browser: using the Web Crypto API
async function sha256(message) {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
// Usage
const hash = await sha256('Hello, World!');
console.log(hash);

// Node.js: using the crypto module
const crypto = require('crypto');
function sha256(message) {
  return crypto.createHash('sha256').update(message).digest('hex');
}
function md5(message) {
  return crypto.createHash('md5').update(message).digest('hex');
}
// Hash a file
const fs = require('fs');
function hashFile(filepath, algorithm = 'sha256') {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(algorithm);
    const stream = fs.createReadStream(filepath);
    stream.on('data', data => hash.update(data));
    stream.on('end', () => resolve(hash.digest('hex')));
    stream.on('error', reject);
  });
}
import hashlib
# String hashing
message = "Hello, World!"
md5_hash = hashlib.md5(message.encode()).hexdigest()
sha1_hash = hashlib.sha1(message.encode()).hexdigest()
sha256_hash = hashlib.sha256(message.encode()).hexdigest()
sha512_hash = hashlib.sha512(message.encode()).hexdigest()
print(f"MD5:     {md5_hash}")
print(f"SHA-1:   {sha1_hash}")
print(f"SHA-256: {sha256_hash}")
print(f"SHA-512: {sha512_hash}")
# File hashing (memory-efficient, streaming)
def hash_file(filepath, algorithm='sha256'):
    h = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        while chunk := f.read(8192):
            h.update(chunk)
    return h.hexdigest()
print(hash_file('myfile.iso'))
package main
import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
	"io"
	"os"
)
func main() {
	message := []byte("Hello, World!")
	// String hashing
	fmt.Printf("MD5:     %x\n", md5.Sum(message))
	fmt.Printf("SHA-1:   %x\n", sha1.Sum(message))
	fmt.Printf("SHA-256: %x\n", sha256.Sum256(message))
	fmt.Printf("SHA-512: %x\n", sha512.Sum512(message))
}
// File hashing
func hashFile(filepath string) (string, error) {
	f, err := os.Open(filepath)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}
import java.security.MessageDigest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
public class HashExample {
    public static String hash(String input, String algorithm) throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
    public static void main(String[] args) throws Exception {
        String message = "Hello, World!";
        System.out.println("MD5:     " + hash(message, "MD5"));
        System.out.println("SHA-1:   " + hash(message, "SHA-1"));
        System.out.println("SHA-256: " + hash(message, "SHA-256"));
        System.out.println("SHA-512: " + hash(message, "SHA-512"));
    }
}
# Hash a string
echo -n "Hello, World!" | md5sum
echo -n "Hello, World!" | sha1sum
echo -n "Hello, World!" | sha256sum
echo -n "Hello, World!" | sha512sum
# Important: use -n to avoid hashing the trailing newline
# Without -n, echo adds a newline, giving a different hash
# Hash a file
sha256sum /path/to/file.tar.gz
# Verify a file against a known hash
echo "expected_hash  filename" | sha256sum --check
Hash functions are powerful tools, but they can be misused in ways that create serious security vulnerabilities. Here are the most important security considerations and common mistakes to avoid.
This is the single most important rule. MD5 has been broken since 2004 and SHA-1 since 2017. Despite this, both continue to appear in new code. Modern collision attacks against MD5 can be performed in seconds on a laptop. SHA-1 collisions require more effort but are well within reach of well-funded attackers. Use SHA-256 or SHA-512 for all security-critical applications.
As discussed in the password hashing section, general-purpose hash functions are too fast for password hashing. Use bcrypt, scrypt, or Argon2id. These algorithms are deliberately slow, include built-in salting, and are designed to resist GPU and ASIC attacks.
Hash functions based on the Merkle-Damgard construction (MD5, SHA-1, SHA-256, SHA-512) are vulnerable to length extension attacks. If you know SHA-256(message) and the length of the message, but not the message itself, you can compute SHA-256(message || padding || extension) without knowing the original message. This is why you should use HMAC for keyed hashing, not SHA-256(key + message).
# WRONG: vulnerable to length extension attacks
mac = SHA256(secret_key + message)
# RIGHT: use HMAC, which handles the key securely
mac = HMAC-SHA256(secret_key, message)
When comparing hash values (especially in authentication contexts), use a constant-time comparison function. Standard string comparison (==) returns as soon as a mismatch is found, leaking information about how many characters matched. An attacker can exploit this timing difference to guess the correct hash one character at a time.
# Python: use hmac.compare_digest for constant-time comparison
import hmac
is_valid = hmac.compare_digest(computed_hash, expected_hash)
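// Node.js: use crypto.timingSafeEqual
const crypto = require('crypto');
const a = Buffer.from(computedHash, 'hex');
const b = Buffer.from(expectedHash, 'hex');
// timingSafeEqual throws if the buffers differ in length, so check that first
const isValid = a.length === b.length && crypto.timingSafeEqual(a, b);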
Publishing a file's SHA-256 hash alongside the download does not help if both the file and the hash are hosted on the same compromised server. An attacker who can replace the file can also replace the published hash. For maximum security, hash values should be signed with a GPG key or published on a separate, trusted channel.
Hash tables use non-cryptographic hash functions internally, and an attacker who can control the keys can craft inputs that all hash to the same bucket, degrading the hash table from O(1) to O(n) performance. This is the basis of hash collision denial-of-service (HashDoS) attacks. Many languages now use randomized hash functions for hash tables to mitigate this.
Here is a summary of best practices for using hash functions effectively and securely.
Hash functions operate on bytes, not characters. Before hashing a string, you must encode it to bytes using a specific encoding (typically UTF-8). Different encodings of the same string produce different byte sequences and therefore different hashes. Always document and enforce the encoding used in your system.
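A short Python sketch of the encoding pitfall: the same four-character string is hashed after encoding it as UTF-8 and as UTF-16 (little-endian). The sample string is arbitrary. Because the byte sequences differ, so do the digests.

```python
import hashlib

text = "café"

utf8_bytes = text.encode("utf-8")        # 5 bytes: the accented letter takes two
utf16_bytes = text.encode("utf-16-le")   # 8 bytes: two per character

utf8_hash = hashlib.sha256(utf8_bytes).hexdigest()
utf16_hash = hashlib.sha256(utf16_bytes).hexdigest()

# Same text, different bytes, different hashes.
print(utf8_hash == utf16_hash)
```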
Do not load entire files into memory to hash them. Use the streaming API provided by your language's hash library. Feed data to the hash function in chunks (typically 4 KB or 8 KB at a time). This allows you to hash files of any size with constant memory usage.
While hexadecimal is case-insensitive, the convention in most tools and systems is lowercase hex. Storing hashes in a consistent format prevents comparison bugs. Some systems use Base64 encoding for hashes (especially in security headers and JWTs), but hexadecimal is more common for file checksums and database storage.
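The following Python sketch shows the same SHA-256 digest rendered as hexadecimal and as Base64, plus one possible way to compare hex digests regardless of letter case. The helper name same_hash is an assumption for illustration; for secret values, a constant-time comparison is the safer choice.

```python
import base64
import hashlib

digest = hashlib.sha256(b"hello").digest()            # raw 32 bytes
hex_form = digest.hex()                               # 64 lowercase hex characters
b64_form = base64.b64encode(digest).decode("ascii")   # 44 Base64 characters

def same_hash(a_hex: str, b_hex: str) -> bool:
    """Compare two hex digests without regard to letter case."""
    return bytes.fromhex(a_hex) == bytes.fromhex(b_hex)

print(hex_form)
print(b64_form)
print(same_hash(hex_form, hex_form.upper()))
```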
In systems that verify file or data integrity, hash mismatches should be treated as security events. Log them, alert on them, and investigate the cause. A hash mismatch could indicate accidental corruption, a software bug, or an active attack.
Hash algorithms have a finite lifespan. MD5 lasted about 13 years before being broken; SHA-1 lasted about 22 years. Design your systems to support algorithm agility -- the ability to switch to a new hash algorithm without a complete rewrite. Store the algorithm identifier alongside the hash value (e.g., sha256:7d8f9e0a1b...), so old hashes can be identified and migrated.
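One possible shape for algorithm agility, sketched in Python: hashes are stored with an algorithm prefix, and verification dispatches on that prefix. The ALLOWED set, the function names, and the policy of rejecting unknown algorithms are all assumptions made for this example.

```python
import hashlib
import hmac

ALLOWED = {"sha256", "sha512"}   # assumed policy: algorithms accepted for checks

def tag_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Return the digest prefixed with its algorithm, e.g. 'sha256:...'."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"

def verify_tagged(data: bytes, tagged: str) -> bool:
    """Verify data against a tagged hash, using the algorithm named in the tag."""
    algorithm, _, expected = tagged.partition(":")
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported or retired algorithm: {algorithm!r}")
    actual = hashlib.new(algorithm, data).hexdigest()
    return hmac.compare_digest(actual, expected)
```

Retiring an algorithm then means removing it from the allowed set and re-hashing stored values on their next verification, rather than rewriting every caller.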
Our free Hash Generator tool makes it easy to compute MD5, SHA-1, SHA-256, and SHA-512 hashes directly in your browser. No data is sent to any server -- all processing happens locally on your machine using the Web Crypto API.
Type or paste any text, and instantly see the hash output for all supported algorithms simultaneously. Compare hashes side-by-side, and copy any result to your clipboard with a single click.
Drag and drop a file (or click to browse) to compute its hash. The tool processes files entirely in your browser using streaming, so even large files can be hashed without uploading them anywhere. Use this to verify downloaded files against published checksums.
A cryptographic hash function is a mathematical algorithm that takes an input of any size and produces a fixed-size output (the hash or digest). Key properties include determinism (same input always produces the same output), avalanche effect (small input changes produce vastly different outputs), one-way computation (you cannot reverse a hash to find the original input), and collision resistance (it is extremely difficult to find two different inputs that produce the same hash).
MD5 produces a 128-bit (32 hex character) hash and is considered cryptographically broken -- it should only be used for non-security checksums. SHA-1 produces a 160-bit (40 hex character) hash and is also deprecated for security purposes. SHA-256 produces a 256-bit (64 hex character) hash and is widely used and considered secure. SHA-512 produces a 512-bit (128 hex character) hash and offers the highest security margin among these algorithms.
MD5 is cryptographically broken and should never be used for security purposes such as digital signatures, certificate verification, or password hashing. However, MD5 is still acceptable for non-security use cases like checksums for data integrity checks (detecting accidental corruption), cache key generation, and deduplication of non-adversarial data.
No. Cryptographic hash functions are designed to be one-way functions. You cannot mathematically reverse a hash to recover the original input. However, attackers can use precomputed lookup tables (rainbow tables), dictionary attacks, or brute-force attempts to find inputs that produce a given hash. This is why passwords should be hashed with specialized algorithms like bcrypt or Argon2 that include salting and key stretching.
Hashing is a one-way process that produces a fixed-size digest from any input. You cannot recover the original data from a hash. Encryption is a two-way process that transforms data using a key, and the original data can be recovered using the correct decryption key. Hashing is used for integrity verification and password storage, while encryption is used for confidentiality.
SHA-256 is recommended because MD5 and SHA-1 have known collision vulnerabilities. Researchers have demonstrated practical collision attacks against both MD5 and SHA-1, meaning attackers can craft two different inputs that produce the same hash. SHA-256 has no known practical attacks and is the standard recommended by NIST, used in TLS certificates, Bitcoin, and many security protocols.