Hash functions are ubiquitous in information security: they are used for integrity checks, content-addressed storage, digital signatures, password verification, and more. This document treats three families of constructions that have received the most attention in practice and in the literature: MD5 (an historical design of the MD family), SHA-2 (notably SHA-256, standardized in FIPS 180), and SHA-3 (the Keccak-based family standardized in FIPS 202). Where necessary we contrast the design principles (MerkleDamgrd vs. sponge) and explain the consequences for security. Key primary sources are cited: the MD5 specification (RFC 1321), the SHA standards (FIPS 180-4), the SHA-3 standard (FIPS 202), and major cryptanalytic work on MD5 and related constructions. citeturn0search0turn0search7turn3search2
Let (H:{0,1}^* o{0,1}^n) be a deterministic function mapping binary strings of arbitrary length to fixed-length (n)-bit outputs. The following definitions are standard (see comprehensive references in applied cryptography). citeturn0search3
Definition (Preimage resistance). (H) is preimage resistant if, given a randomly chosen (y{0,1}^n), it is computationally infeasible to find (x) such that (H(x)=y).
Definition (Second-preimage resistance). (H) is second-preimage resistant if, given a fixed (x), it is computationally infeasible to find (x' eq x) with (H(x')=H(x)).
Definition (Collision resistance). (H) is collision resistant if it is computationally infeasible to find any pair ((x,x')), (x eq x'), with (H(x)=H(x')).
These properties are not independent in a quantitative sense: generic bounds for attack complexity follow from combinatorial arguments and are summarized below. The ``birthday bound'' gives a generic (O(2)) work factor for collision search in an ideal (n)-bit hash, while preimage attacks are generic (O(2^n)). A rigorous derivation and discussion can be found in standard texts. citeturn0search3
Many classical hash functions (MD4/MD5, SHA-1, SHA-2) follow the MerkleDamgrd paradigm: compress a chaining value iteratively with fixed-size message blocks and a compression function (f). The high-level reduction theorem is:
Informal theorem (MerkleDamgrd collision lifting). If the underlying fixed-length compression function (f) is collision-prone (i.e., there exist two distinct inputs to (f) that yield the same output), then one can construct collisions for the iterated MerkleDamgrd hash by appending appropriate blocks. The conversesecurity of (f) implies security of the iterated constructionholds under standard MD-compliant padding assumptions. citeturn2search10turn2search4
Consequence: structural weaknesses in a compression function can be amplified to the full hash. This is the mechanism that allowed cryptanalysts to move from local differential paths in the MD5 compression to full-message collisions. For historical exposition and formal proofs see the original Damgrd paper and subsequent analyses. citeturn2search10
MD5 is defined in RFC 1321 (1992). The algorithm maps arbitrary-length messages to a 128-bit digest and is specified by the padding rule, an initial 128-bit chaining value expressed as four 32-bit words, and a 64-step compression built from logical boolean functions, modular addition and left-rotation operations. Implementers should consult the RFC for exact constants, endianness conventions and a reference implementation. citeturn0search0
Key invariants:
- Output length (n=128) bits.
- Internal state width (v=128) bits (A,B,C,D each 32 bits).
- The additive constants (T_j) (one per step) are derived from the fractional parts of (|(j+1)|) scaled by (2) (integer part taken). citeturn0search0
Given message (M), MD5 forms [ M' = M ,|, 1 ,|, 0^k ,|, (len), ] where (len=|M|) in bits, (()) denotes the 64-bit little-endian encoding of (len), and (k) is chosen so (|M'| 0). The padded message is parsed into (t) blocks (M_1,,M_t) each of 512 bits, and each block is viewed as 16 little-endian 32-bit words (M_i[015]). citeturn0search0
Define four boolean functions (operating on 32-bit words):
[
F(X,Y,Z) = (X AND Y) OR ( NOT X AND Z)
G(X,Y,Z) = (X AND Z) OR (Y AND NOT Z)
H(X,Y,Z) = X XOR Y XOR Z
I(X,Y,Z) = Y XOR (X OR NOT Z)
]
The per-step update (conceptual) for step index (j) is:
[ ext{tmp} = A + f_j(B,C,D) + M[g_j] + T_j } ]
[ A,B,C,D D,; B + ROL( ext{tmp}, s_j),; B,; C ]
Here (f_j) is one of (F,G,H,I) depending on the round, (g_j) selects which message word is used, (s_j) is the rotation amount, and (T_j) is the additive constant. After all 64 steps a block-specific result is added back to the chaining variables modulo (2). The RFC lists tables for (g_j), (s_j) and (T_j); implementers must use those exact tables to interoperate. citeturn0search0
+-------------------------+
| 512-bit message block |
+-------------------------+
| Parse -> 16 x 32-bit W |
+-------------------------+
| Initialize A,B,C,D |
| for j in 0..63: |
| tmp = A + f_j(B,C,D) |
| + W[g_j] + T_j |
| A = D |
| D = C |
| C = B |
| B = B + ROL(tmp,s_j) |
+-------------------------+
| Add to chaining state |
+-------------------------+
- Determinism: MD5 is a deterministic function; identical messages produce identical digests by construction.
- Avalanche: The composition of non-linear boolean functions with rotations and modular addition is intended to produce diffusion where single-bit changes propagate across many output bits after a few rounds. This property is heuristic and measured empirically; it does not constitute formal resistance to collision-finding, which depends on the algebraic and differential structure of the compression. citeturn0search0
- One-wayness: Formally one-wayness is a complexity-theoretic property; given the size of the output space (128 bits) generic preimage search has complexity (O(2)), but targeted cryptanalysis (e.g., exploiting internal structure) may reduce this work factor. In MD5, practical collision attacks have significantly weakened the assumed security margin. citeturn0search1
Beginning with theoretical differential techniques and culminating in practical demonstrations, researchers led by Xiaoyun Wang showed that MD5's compression function admits differential paths that lead to collisions much faster than the generic birthday bound would suggest for an ideal 128-bit function. Wang et al.'s work demonstrated concrete collision-finding methods and performance estimates that were orders of magnitude faster than brute-force. citeturn0search1
Marc Stevens and coauthors developed chosen-prefix collision techniques that dramatically increased the practical threat model: attackers can construct collisions where two chosen prefixes (arbitrary data controlled by the attacker) are extended with computed suffixes to produce colliding digests. These chosen-prefix collisions enabled concrete forgeries (notably rogue CA certificate constructions) and are a central reason why MD5 is considered unsafe for any collision-sensitive application. Stevens and collaborators provided complexity estimates and demonstrations; modern chosen-prefix methods reduce the computational cost substantially relative to naive estimates. citeturn0search2turn0search10
- Generic collision complexity for an ideal 128-bit digest is (O(2)) work (birthday bound). citeturn0search3
- Wang's differential methods and subsequent refinements produced collisions in timescales that are feasible for attackers with modest resources (original demonstrations reported wall-clock times from minutes to hours on workstation clusters of the time). citeturn0search1
- Chosen-prefix collisions constructed by Stevens and colleagues reported practical costs in the range of (2)(2) compression-function evaluations for specific constructions with optimizations; these orders of magnitude are well within reach of modern cluster or cloud resources when amortized. Practically, this implies MD5-based signatures and certificate usage are insecure. citeturn0search2turn0search10
Implication. Collision resistancerequired by a range of cryptographic uses such as digest-based digital signaturesis demonstrably broken for MD5. Any system depending on MD5's collision resistance (e.g., signing an MD5 digest) must be considered compromised. citeturn0search1turn0search10
SHA-256 and the broader SHA-2 family are specified in the NIST Secure Hash Standard (FIPS 180-4). SHA-256 produces a 256-bit digest and uses an iterated compression function over 512-bit blocks with an internal state of eight 32-bit words (total 256-bit chaining state). The specification defines the message schedule, constants (K_t), the boolean functions and rotation/shift operators used in each round. citeturn3search1
Let the message schedule be (W_0,,W) derived from the 16 message words and extended by:
[ W_t = sigma1(W) + W + sigma0(W) + W}, t=1663 ]
where (sigma0,sigma1) are small-word rotations/xors defined in the standard. The per-round update uses functions:
[
ext{Ch}(x,y,z) = (x AND y) XOR ( NOT x AND z)\
ext{Maj}(x,y,z) = (x AND y) XOR (x AND z) XOR (y AND z)\
Sigma0(x) = ext{ROTR}^2(x) XOR ext{ROTR}(x) XOR ext{ROTR}(x)
Sigma1(x) = ext{ROTR}^6(x) XOR ext{ROTR}(x) XOR ext{ROTR}(x)
]
The compression updates eight working variables (a..h) per round and then adds them back to the chain variables. The complete spec and constants are tabulated in FIPS 180-4. citeturn3search1
[W_t] [K_t]
\ /
+---------+
| temp1 | = h + 1(e) + Ch(e,f,g) + K_t + W_t
| temp2 | = 0(a) + Maj(a,b,c)
+---------+
h = g
g = f
f = e
e = d + temp1
d = c
c = b
b = a
a = temp1 + temp2
(Repeat for t=0..63, then add a..h to the chaining state.) citeturn3search1
As of the present literature and standards, SHA-256 remains free of published practical collision attacks comparable to MD5's differential attacks; the best known attacks against SHA-256 are significantly more expensive than the generic birthday bound and still far beyond practical reach. The SHA-2 family benefits from a larger internal state and a more conservative design against differential characteristics than MD5. NIST continues to recommend SHA-2 for cryptographic hashing and signatures where mandated. citeturn3search1
SHA-3 (FIPS 202) departs from the MerkleDamgrd paradigm and adopts the sponge construction with a permutation-based internal primitive (Keccak-f). The sponge construction provides natural resistance to length-extension attacks inherent to classic MerkleDamgrd designs and is parameterized by a rate (r) and capacity (c) such that the claimed security level is at most (c/2). SHA-3 complements SHA-2 as an algorithmically distinct family and is standardized for contexts requiring diversification of primitives. citeturn3search2
ASCII schematic of the sponge (absorb/squeeze):
Input blocks -> XOR into first r bits of state -> Permutation f -> (repeat)
After absorption: Squeeze outputs from first r bits, applying f between output blocks.
Capacity c = state width - r (controls security level)
Let (n) be the output length in bits and (N=2^n). Sampling (k) independent random digests, the probability that no pair collides equals ((1 - i/N)). Using the approximation ((1-x)�pprox -x) for small (x) and replacing the product by the exponential of a sum yields
[ [ ext{no collision}] �pprox (-rac{k(k-1)}{2N} ight). ]
Setting ([ ext{collision}]=1/2) and solving for (k) gives (k�pprox �pprox 1.177 2). Thus generic collision resistance for an (n)-bit hash degrades to (O(2)) work. For MD5 (n=128) this is (O(2)); for SHA-256 (n=256) this is (O(2)). The distinction highlights why modern hash families move to larger outputs: to maintain infeasibility in the face of the birthday bound. This derivation is standard; see primary cryptography texts for rigorous development. citeturn0search3
Storing H(password)
(even if H=SHA-256 or H=MD5) in a database exposes the system to offline dictionary and brute-force attacks: an adversary obtaining the database may systematically guess candidate passwords and compute their hashes to find matches. If H
is fast (as MD5 and SHA-256 are by design), then attackers can test millions or billions of guesses per second on commodity hardware or GPUs. Therefore naive use of general-purpose hash functions for passwords is discouraged by modern guidance. citeturn1search3turn1search1
A salt (s) is a per-entry unique, non-secret random string. The stored verifier is typically (H(p|s)) (concatenation or other canonical combination). Salting defends against:
- Precomputed rainbow tables (tables must be recomputed per salt), and
- Mass compromise inference (same password yields different stored values for distinct salts).
Mathematically, salting increases the effective entropy of the input to the hash as seen by the attacker, and prevents amortization of computation across many accounts. OWASP and NIST both recommend unique salts and modern key-derivation functions. citeturn1search3turn1search1
Modern password storage requires a deliberately expensive KDF. Key recommendations include:
- Argon2 (winner of the Password Hashing Competition), with an IETF RFC and implementation guidance (Argon2id is recommended for balanced resistance to side-channel and GPU attacks). Argon2 lets you tune memory, parallelism, and time cost parameters. citeturn1search5turn1search11
- bcrypt and scrypt remain widely used; scrypt adds memory-hardness but has different trade-offs. PBKDF2 is standardized (e.g., RFC2898) but offers only CPU-cost tuning and is considered less resistant than memory-hard functions for modern adversaries with massive parallel hardware. OWASP and NIST guidance recommend Argon2 or a similar memory-hard function. citeturn1search3turn1search11
For each user record:
- Generate salt (s) uniformly at random (e.g., 16+ bytes).
- Compute verifier (v = (password, s; ext{params})).
- Store ( (s, v, ext{params}) ) in the authentication database.
This design prevents simple dictionary attacks using precomputation and increases the cost per guess due to Argon2's memory and time parameters. citeturn1search5turn1search3
- Deprecate MD5 for security-critical tasks (signatures, certificate hashing, TLS, etc.). MD5 collisions are practical. Replace with SHA-256 or SHA-3 family where non-password integrity is needed. citeturn0search1turn0search10
- Do not use general-purpose fast hashes for passwords. Replace stored MD5/SHA-1/SHA-256 password hashes with a memory-hard KDF (Argon2), using salt and a migration strategy that supports on-login re-hashing. citeturn1search5turn1search3
- Use digital-signature schemes over the canonical digest functions mandated by standards (RSA/ECDSA over SHA-2/SHA-3 as appropriate). Avoid signing simple MD5 digests. citeturn3search1turn3search2
- Document and monitor: document cryptographic choices, parameter choices (Argon2 cost), and maintain an upgrade plan. Security recommendations evolvemonitor NIST and relevant standards bodies. citeturn1search1
Let the outputs be uniformly random in a space of size (N=2^n). For (k) samples, the probability all are distinct is [ (1-rac{i}{N} ight). ] Taking logarithms and approximating ((1-x)�pprox -x) yields (([ ext{distinct}]) �pprox -rac{k(k-1)}{2N}). Solving for (k) when ([ ext{distinct}]=1/2) gives (k�pprox). This standard analytic approximation is the origin of the (2) heuristic for collision complexity. citeturn0search3
Assume compression function (f) admits two inputs (x eq x') such that (f(x)=f(x')). Then select a common prefix and use (x) and (x') as block inputs at the same compression position; because their outputs coincide, the remaining stream of compression steps produces identical chaining values and therefore identical final digests. Formalizing this argument requires careful attention to padding (MD-compliance) and the handling of message lengths; details appear in Damgrd's original treatment. citeturn2search10
- R. Rivest, The MD5 Message-Digest Algorithm, RFC 1321 (1992). citeturn0search0
- X. Wang, Y. Yin, H. Yu, Finding Collisions in the MD5 Compression Function (Eurocrypt/Advances in Cryptology) and related reports on breaking MD5. citeturn0search1turn0search21
- M. Stevens et al., Chosen-prefix collisions for MD5 and applications (papers and technical reports on chosen-prefix collisions and certificate forgeries). citeturn0search2turn0search10
- NIST, FIPS 180-4: Secure Hash Standard (SHS) (SHA-1, SHA-2 family). citeturn3search1
- NIST, FIPS 202: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. citeturn3search2
- NIST, SP 800-63B: Digital Identity Guidelines (Authentication and Lifecycle Management) (guidance on password handling). citeturn1search1
- OWASP, Password Storage Cheat Sheet (practical guidance). citeturn1search3
- Argon2 specification and RFC (Argon2id, memory-hard KDF recommendations). citeturn1search5turn1search11
Contemporary cryptography relies on a careful matching of algorithms to use-cases. MD5 remains a historically and pedagogically important design, but it should be avoided for any collision-sensitive security tasks because of practical collision attacks. For integrity and signature applications, SHA-2 and SHA-3 are the modern, standards-backed choices. For password storage, use a memory-hard KDF (Argon2) with unique salts and a well-considered parameter choice. The mathematical theory (birthday bound, MerkleDamgrd reductions, differential cryptanalysis) explains both the limits of generic bounds and why structural attacks can invalidate nave assumptions. Practitioners should follow NIST and OWASP guidance and maintain a migration roadmap away from obsolete constructions. citeturn0search1turn0search10turn3search1
echo "Hello World" > file.txt
md5sum file.txt
# Output (128-bit digest): b10a8db164e0754105b7a99be72e3fe5
MD5 is vulnerable to collisions use SHA-256 instead.
sha256sum file.txt
# Output (256-bit digest): a591a6d40bf420404a011733cfb7b190
SHA-256 has no known practical collisions as of 2025.
from argon2 import PasswordHasher
ph = PasswordHasher()
hash = ph.hash("mypassword123")
print(hash) # Argon2id hash with unique salt and memory-hard parameters
Do not use raw MD5 or SHA-256 for password storage.
+---------------------------+
| Input split in r-bit rate |
+---------------------------+
|
v
+-------------------+
| XOR into state |
| Apply permutation |
+-------------------+
|
v
[Absorb -> Squeeze]
|
+----------------------------+
| Output hash (digest) |
+----------------------------+
- MD5: insecure, useful only for pedagogy or non-critical checksums.
- SHA-256: robust, de-facto standard for digital signatures and integrity.
- SHA-3: modern alternative, resistant to length-extension, based on Sponge construction.
- Password storage: use Argon2id, bcrypt, or scrypt with proper salting and parameters.
This document provides a unified reference on three major families of hash functions: MD5, SHA-2 (with emphasis on SHA-256), and SHA-3 (Keccak). It integrates two complementary perspectives:
- Implementation-oriented: full constant tables, extended pseudocode, and step-by-step algorithmic details.
- Mathematical and security-oriented: formal definitions, theorems, cryptanalysis, collision and preimage resistance, and modern best practices.
The combined document includes ASCII diagrams, mermaid charts, and practical use cases. It serves as both a technical manual for implementation and a theoretical guide for cryptographic study.
- MD5 detailed pseudocode with constant tables
- SHA-256 detailed pseudocode with constant table
K[t]
- diagrams (as blocks) and ASCII diagrams
- Explanatory sections: collisions, salting, KDFs and best practices
- References
- Output: 128 bits (16 bytes).
- Block size: 512 bits.
- Internal state: A,B,C,D (32-bit words).
- Padding: append
1
bit, k zero bits until length 448 (mod 512), append 64-bit little-endian length.
The rotation amounts s[j]
and the per-step additive constants T[j]
are fixed by the specification (RFC 1321). The message index function g(j)
depends on the round.
s = [
7,12,17,22, 7,12,17,22, 7,12,17,22, 7,12,17,22,
5,9,14,20, 5,9,14,20, 5,9,14,20, 5,9,14,20,
4,11,16,23, 4,11,16,23, 4,11,16,23, 4,11,16,23,
6,10,15,21, 6,10,15,21, 6,10,15,21, 6,10,15,21
]
T = [
0xd76aa478,0xe8c7b756,0x242070db,0xc1bdceee,
0xf57c0faf,0x4787c62a,0xa8304613,0xfd469501,
0x698098d8,0x8b44f7af,0xffff5bb1,0x895cd7be,
0x6b901122,0xfd987193,0xa679438e,0x49b40821,
0xf61e2562,0xc040b340,0x265e5a51,0xe9b6c7aa,
0xd62f105d,0x02441453,0xd8a1e681,0xe7d3fbc8,
0x21e1cde6,0xc33707d6,0xf4d50d87,0x455a14ed,
0xa9e3e905,0xfcefa3f8,0x676f02d9,0x8d2a4c8a,
0xfffa3942,0x8771f681,0x6d9d6122,0xfde5380c,
0xa4beea44,0x4bdecfa9,0xf6bb4b60,0xbebfbc70,
0x289b7ec6,0xeaa127fa,0xd4ef3085,0x04881d05,
0xd9d4d039,0xe6db99e5,0x1fa27cf8,0xc4ac5665,
0xf4292244,0x432aff97,0xab9423a7,0xfc93a039,
0x655b59c3,0x8f0ccc92,0xffeff47d,0x85845dd1,
0x6fa87e4f,0xfe2ce6e0,0xa3014314,0x4e0811a1,
0xf7537e82,0xbd3af235,0x2ad7d2bb,0xeb86d391
]
- For j in 0..15: g = j
- For j in 16..31: g = (5*j + 1) mod 16
- For j in 32..47: g = (3*j + 5) mod 16
- For j in 48..63: g = (7*j) mod 16
// Helper: LEFTROTATE(x, n) rotates x left by n bits on 32-bit word
function MD5(M):
// 1. Preprocessing
original_bit_len = len(M) * 8
append bit '1' to M
append k bits '0' where k is smallest >=0 such that (len(M)+1+k) 448 (mod 512)
append 64-bit little-endian integer original_bit_len
parse padded M' into N 512-bit blocks M[0..N-1]
// 2. Initialize registers
A0 = 0x67452301
B0 = 0xEFCDAB89
C0 = 0x98BADCFE
D0 = 0x10325476
// 3. Tables: s[0..63], T[0..63] as defined above
for i = 0 to N-1:
// break block into 16 32-bit little-endian words M_i[0..15]
A = A0; B = B0; C = C0; D = D0
for j = 0 to 63:
if 0 j 15:
f = (B C) | ((~B) D)
g = j
else if 16 j 31:
f = (D B) | ((~D) C) // equivalent form of G
g = (5*j + 1) mod 16
else if 32 j 47:
f = B ^ C ^ D
g = (3*j + 5) mod 16
else:
f = C ^ (B | (~D))
g = (7*j) mod 16
tmp = (A + f + M_i[g] + T[j]) mod 2^32
tmp = LEFTROTATE(tmp, s[j])
// rotate registers
A = D
D = C
C = B
B = (B + tmp) mod 2^32
// add this block's result to the current hash value
A0 = (A0 + A) mod 2^32
B0 = (B0 + B) mod 2^32
C0 = (C0 + C) mod 2^32
D0 = (D0 + D) mod 2^32
// output (little-endian)
digest = LE32(A0) || LE32(B0) || LE32(C0) || LE32(D0)
return digest
- Output: 256 bits.
- Block size: 512 bits.
- Internal state: H0..H7 (eight 32-bit words).
- Padding: append
1
bit, k0
bits until length 448 (mod 512), append 64-bit big-endian length.
2.2 K[0..63] constants (first 32 bits of the fractional parts of the cube roots of the first 64 primes)
K = [
0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
]
function SHA256(M):
// Preprocessing
original_bit_len = len(M) * 8
append bit '1' to M
append k bits '0' until len(M)+1+k 448 (mod 512)
append 64-bit big-endian integer original_bit_len
parse padded M' into N 512-bit blocks M[0..N-1]
// Initialize hash values (big-endian)
H0 = 0x6a09e667
H1 = 0xbb67ae85
H2 = 0x3c6ef372
H3 = 0xa54ff53a
H4 = 0x510e527f
H5 = 0x9b05688c
H6 = 0x1f83d9ab
H7 = 0x5be0cd19
for i = 0 to N-1:
// prepare the message schedule W[0..63]
for t = 0 to 15:
W[t] = BE32(M_i[t]) // parse 32-bit big-endian words
for t = 16 to 63:
s0 = (ROTR^7(W[t-15])) xor (ROTR^18(W[t-15])) xor (W[t-15] >> 3)
s1 = (ROTR^17(W[t-2])) xor (ROTR^19(W[t-2])) xor (W[t-2] >> 10)
W[t] = (W[t-16] + s0 + W[t-7] + s1) mod 2^32
// Initialize working variables
a = H0; b = H1; c = H2; d = H3; e = H4; f = H5; g = H6; h = H7
// Main loop
for t = 0 to 63:
S1 = (ROTR^6(e)) xor (ROTR^11(e)) xor (ROTR^25(e))
ch = (e f) xor ((~e) g)
temp1 = (h + S1 + ch + K[t] + W[t]) mod 2^32
S0 = (ROTR^2(a)) xor (ROTR^13(a)) xor (ROTR^22(a))
maj = (a b) xor (a c) xor (b c)
temp2 = (S0 + maj) mod 2^32
h = g
g = f
f = e
e = (d + temp1) mod 2^32
d = c
c = b
b = a
a = (temp1 + temp2) mod 2^32
// Add the compressed chunk to the current hash value
H0 = (H0 + a) mod 2^32
H1 = (H1 + b) mod 2^32
H2 = (H2 + c) mod 2^32
H3 = (H3 + d) mod 2^32
H4 = (H4 + e) mod 2^32
H5 = (H5 + f) mod 2^32
H6 = (H6 + g) mod 2^32
H7 = (H7 + h) mod 2^32
// Produce the final digest (big-endian)
digest = BE32(H0) || BE32(H1) || BE32(H2) || BE32(H3) || BE32(H4) || BE32(H5) || BE32(H6) || BE32(H7)
return digest
flowchart TD
A[Input message M] --> B[Padding + 64-bit LE length]
B --> C[Split into 512-bit blocks (M_i)]
C --> D[Parse block -> W[0..15] (32-bit LE words)]
D --> E[Initialize A,B,C,D (IVs)]
E --> F[Compression: 4 rounds 16 steps]
F --> G[Per step: tmp = A + f(B,C,D) + W[g] + T; B = B + ROL(tmp,s)]
G --> H[Update chaining values]
H --> I[After all blocks: Digest = LE(A)||LE(B)||LE(C)||LE(D)]
flowchart TD
A[Input message M] --> B[Padding + 64-bit BE length]
B --> C[Split into 512-bit blocks]
C --> D[Message schedule W[0..63]]
D --> E[Initialize a..h (H0..H7)]
E --> F[64 rounds: temp1,temp2,0,1,Ch,Maj]
F --> G[Update working vars a..h]
G --> H[Add to Hash State H0..H7]
H --> I[Digest = H0||...||H7 (256 bits)]
ASCII compact diagram: MD5 single-block
+---------------------------+
| 512-bit block M_i |
| W0 W1 ... W15 |
+-----------+---------------+
|
+------v-------+
| Init A,B,C,D |
+------v-------+
| 64 step loop |
| (4 rounds) |
+------v-------+
| Add to state |
+--------------+
- MD5: Practical collision and chosen-prefix collision methods exist. Avoid for security-critical contexts.
- SHA-1: Broken for collision resistance (e.g., SHAttered demonstration). Avoid.
- SHA-256: No practical collisions as of 2025; recommended for signatures and integrity.
- SHA-3: Sponge construction; conservative alternative to SHA-2 family.
- Password storage: Use Argon2id or bcrypt/scrypt with unique salts. Do not store raw MD5/SHA-256 hashes for passwords.
- Rivest, R. "The MD5 Message-Digest Algorithm", RFC 1321, 1992.
- NIST FIPS 180-4, Secure Hash Standard (SHA-2), 2015.
- NIST FIPS 202, SHA-3 Standard (Keccak), 2015.
- Wang, X., Yin, Y., Yu, H.: "Finding Collisions in the MD5 Compression Function."
- Stevens, M. et al.: "Chosen-prefix collisions for MD5 and applications."