Hash

General Updated Jan 2026

What Is a Hash?

A hash is the output of a cryptographic hash function — a mathematical algorithm that takes an input of any size and produces a fixed-length string of characters. No matter whether you hash a single word, an entire book, or a terabyte video file, the output will always be the same length. In blockchain, hashes are the glue that holds everything together: they link blocks, verify transaction integrity, secure proof-of-work, and enable efficient data structures like Merkle trees.

Think of a hash as a digital fingerprint. Just as a human fingerprint identifies a person without revealing what they look like, a hash identifies a piece of data (uniquely, for all practical purposes) without revealing what the data contains. You cannot reconstruct the original data from its hash, and even a tiny change to the input produces a completely different hash output. This property is what makes hashes indispensable in cryptography and blockchain technology.

Core Properties of Cryptographic Hash Functions

Not all hash functions are suitable for blockchain use. The algorithms used in major blockchains (SHA-256 for Bitcoin, Keccak-256 for Ethereum) satisfy several critical properties:

Deterministic: The same input always produces the same output. If you hash the word “hello” a million times using SHA-256, you will get 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 every single time. This is essential for verification — anyone can re-run the hash function and confirm the result matches.
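As a minimal sketch of determinism and fixed output length, the following uses Python's standard hashlib module. The helper name sha256_hex is illustrative and not part of any library.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Hashing the same input twice yields the same digest.
first = sha256_hex(b"hello")
second = sha256_hex(b"hello")
print(first == second)  # True
print(first)  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

# The output length is fixed regardless of input size.
empty_len = len(sha256_hex(b""))
large_len = len(sha256_hex(b"x" * 1_000_000))
print(empty_len, large_len)  # 64 64
```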

Pre-image resistance (one-way): Given a hash output, it is computationally infeasible to find the original input. There is no known mathematical “inverse” operation. The only known way to find an input that produces a given hash is brute-force guessing, which for SHA-256 would require on the order of 2^256 attempts. That is roughly 10^77, a number that approaches common estimates of the number of atoms in the observable universe (around 10^80).

Second pre-image resistance: Given an input, it is infeasible to find a different input that produces the same hash. This prevents an attacker from substituting malicious data that hashes to the same value as legitimate data.

Collision resistance: It is infeasible to find any two different inputs that produce the same hash. Collisions are mathematically guaranteed to exist (there are infinitely many possible inputs but only 2^256 possible outputs for SHA-256), but finding one should require astronomical computational effort: about 2^128 hash evaluations for SHA-256, by the birthday bound.
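To make the existence of collisions concrete, here is a small sketch that deliberately weakens SHA-256 by keeping only its first 3 bytes (24 bits). With such a short output, a collision typically appears after a few thousand attempts, in line with the birthday bound of roughly 2^(n/2) tries for an n-bit hash. The function names and the counter-based inputs are assumptions made for illustration.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of SHA-256: a deliberately weakened toy hash."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Hash successive counters until two distinct inputs share a truncated digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_sha256(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg

a, b, attempts = find_collision()
print(f"{a!r} and {b!r} collide on 24 bits after {attempts} hashes")
```

The same search against the full 256-bit output would need about 2^128 hashes, which is why it is considered infeasible.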

Avalanche effect: A tiny change in the input produces a dramatically different output. Change “hello” to “hellp” and the SHA-256 hash becomes completely different: on average, about half of the 256 output bits flip. This makes it infeasible to predict how a change to the input affects the output.
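The avalanche effect can be observed directly by counting how many output bits differ between two nearly identical inputs. This is a minimal sketch using hashlib; bit_difference is a helper written for this example.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"hellp").digest()
changed = bit_difference(d1, d2)
print(f"{changed} of 256 output bits differ")
```

For a well-behaved hash function the count lands close to 128, even though the inputs differ in only a single character.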

Fast computation: Hashing should be fast for legitimate uses. A modern CPU can compute millions of SHA-256 hashes per second. This matters for block verification — every node on the network must be able to hash blocks quickly to validate the chain.

SHA-256

SHA-256 (Secure Hash Algorithm 256-bit) is a member of the SHA-2 family designed by the NSA and published by NIST. It produces a 256-bit (32-byte) output, typically represented as a 64-character hexadecimal string. Bitcoin uses SHA-256 extensively:

  • Block hashing: The block header is hashed twice (SHA-256d) to produce the block hash. Miners must find a header whose hash, read as a number, does not exceed the current target, which shows up as a run of leading zeros in the displayed hash; this is the proof-of-work.
  • Transaction hashing: Individual transactions are hashed to create a transaction ID (txid).
  • Merkle tree construction: Transaction hashes are paired and re-hashed up the tree to produce the Merkle root, which is included in the block header.
  • Address generation: Public keys are hashed (with SHA-256 and RIPEMD-160) to produce Bitcoin addresses.
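The double hashing and Merkle tree steps in the list above can be sketched as follows. This is a simplified illustration, not Bitcoin's implementation: the "transactions" are placeholder byte strings rather than real serialized transactions, and byte-order conventions for displaying hashes are ignored. The rule of duplicating the last hash on a level with an odd number of entries does match Bitcoin's behavior.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block and transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(hashes: list[bytes]) -> bytes:
    """Pair and re-hash a list of 32-byte hashes until one root remains."""
    if not hashes:
        raise ValueError("at least one transaction hash is required")
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Placeholder "transactions" (hypothetical data, for illustration only).
txids = [sha256d(f"tx{i}".encode()) for i in range(3)]
root = merkle_root(txids)
print(root.hex())
```

Changing any single transaction changes its hash, which changes every parent hash up to the root, so the root in the block header commits to the entire transaction set.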

SHA-256 has been extensively analyzed since its publication in 2001 and no practical attacks have been found. It remains one of the most trusted hash functions in the world, used not just in Bitcoin but in TLS certificates, code signing, password-based key derivation (inside constructions such as PBKDF2 rather than as a bare hash), and countless other security applications.

Keccak-256

Keccak-256 is the hash function used by Ethereum. It was designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche and won the NIST SHA-3 competition in 2012. However, Ethereum uses the original Keccak-256 submission rather than the final NIST-standardized SHA-3 (there are minor differences in the padding scheme).

Keccak-256 also produces a 256-bit output. It is used throughout the Ethereum ecosystem:

  • Transaction hashing: Every transaction has a hash computed using Keccak-256.
  • State root computation: The entire state of the Ethereum network (all account balances, contract storage, contract code) is organized into a Merkle Patricia Trie, and its root hash is stored in each block header.
  • Solidity keccak256() function: Smart contracts can compute Keccak-256 hashes natively. This is used for signature verification, commitment schemes, and pseudo-random number generation (though not securely — on-chain randomness is a known challenge).
  • Address derivation: Ethereum addresses are the last 20 bytes of the Keccak-256 hash of the public key.
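The padding difference between original Keccak-256 and NIST SHA3-256 can be illustrated with a small pure-Python sponge. Python's hashlib provides sha3_256 (the NIST variant) but no original Keccak-256, so this sketch implements the permutation itself and takes the padding suffix byte as a parameter: 0x01 for original Keccak, 0x06 for SHA-3. It is slow and meant only as an illustration; the function names are this example's own, and the SHA-3 mode is cross-checked against hashlib.

```python
import hashlib

MASK64 = (1 << 64) - 1
RATE_BYTES = 136  # 1088-bit rate shared by Keccak-256 and SHA3-256

def rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit integer left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64

def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to 5x5 64-bit lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column's parity into neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the non-linear step, applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR in a round constant generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def permute(state: bytearray) -> bytearray:
    """Run Keccak-f[1600] on a 200-byte state (little-endian lanes)."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lanes = keccak_f1600(lanes)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y): 8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out

def sponge256(data: bytes, suffix: int) -> bytes:
    """Absorb padded data and squeeze 32 bytes; suffix selects the padding variant."""
    padded = bytearray(data) + bytes(RATE_BYTES - len(data) % RATE_BYTES)
    padded[len(data)] ^= suffix  # first padding byte: the only Keccak/SHA-3 difference
    padded[-1] ^= 0x80           # final bit of the pad10*1 rule
    state = bytearray(200)
    for offset in range(0, len(padded), RATE_BYTES):
        for i in range(RATE_BYTES):
            state[i] ^= padded[offset + i]
        state = permute(state)
    return bytes(state[:32])

def keccak256(data: bytes) -> bytes:
    return sponge256(data, 0x01)  # original Keccak padding, as used by Ethereum

def sha3_256(data: bytes) -> bytes:
    return sponge256(data, 0x06)  # NIST SHA-3 padding

print(keccak256(b"").hex())
# c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470
print(sha3_256(b"hello") == hashlib.sha3_256(b"hello").digest())  # True
print(keccak256(b"hello") == sha3_256(b"hello"))  # False
```

A single changed padding byte yields unrelated digests, which is why passing Ethereum data to a SHA3-256 function gives results that do not match on-chain values.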

Hash Functions in Practice

Beyond their role in blockchain construction, hash functions enable several important use cases:

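Digital signatures: When you sign a transaction, you are not signing the transaction data directly; you sign the hash of the transaction data. This is more efficient, and it is equally secure as long as the hash function is collision resistant, because a signature over a hash is also valid for any other message with that same hash.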

Data integrity verification: Hashes allow you to verify that data has not been tampered with. If you download a file and its hash matches the hash published by the author through a trusted channel, you can be confident the file is the one they released. This is how software distributions verify downloads.
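A download check along these lines might look like the sketch below. The function names are illustrative, and the demo writes a small temporary file in place of a real download. The file is read in chunks so that large files do not need to fit in memory.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the file's digest with a published hex digest."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())

# Demo: a temporary file stands in for a downloaded artifact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

published = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
ok = matches_published(demo_path, published)
tampered = matches_published(demo_path, "00" * 32)
os.remove(demo_path)
print(ok, tampered)  # True False
```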

Proof-of-work: Bitcoin mining is essentially a hash guessing game. Miners repeatedly hash the block header with different nonces until they find a hash that does not exceed the network target (a higher difficulty means a lower target). The only known way to find a valid hash is brute force, which consumes real computational energy, and that cost is what underpins the security guarantee.
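The guessing game can be sketched with a toy miner. This is not a real Bitcoin miner: the header is a placeholder byte string rather than an 80-byte block header, and the target is set so that roughly one hash in 65,536 qualifies. Two details are borrowed from Bitcoin: the double SHA-256, and reading the digest as a little-endian integer when comparing it to the target.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try successive nonces until the double hash does not exceed the target."""
    for nonce in range(max_nonce):
        candidate = header_prefix + nonce.to_bytes(4, "little")
        digest = sha256d(candidate)
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest
    return None  # nonce space exhausted without success

toy_header = b"toy block header"  # hypothetical placeholder, not a real header
target = 1 << (256 - 16)          # toy target: about 1 in 65,536 hashes qualifies
result = mine(toy_header, target)
if result is not None:
    nonce, digest = result
    # Bitcoin displays hashes byte-reversed, so the zeros appear at the front.
    print(nonce, digest[::-1].hex())
```

Verifying the result takes a single double hash, while finding it took tens of thousands on average. That asymmetry is the core of proof-of-work.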

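Commit-reveal schemes: You can commit to a value by publishing its hash, then reveal the value later. Because hashes are one-way, nobody can determine your committed value from the hash, but they can verify it when you reveal it. For values that are easy to guess, such as a yes/no vote, hash the value together with a random salt and reveal the salt as well; otherwise anyone can hash each candidate value and compare. This is used in voting systems and fair randomness generation.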
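A salted commit-reveal scheme can be sketched in a few lines. The function names and the 32-byte salt length are choices made for this example. Using a fixed-length salt keeps the concatenation of salt and value unambiguous.

```python
import hashlib
import hmac
import secrets

SALT_BYTES = 32  # fixed length so salt + value splits unambiguously

def commit(value: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, salt). Publish the commitment; keep the salt secret."""
    salt = secrets.token_bytes(SALT_BYTES)
    commitment = hashlib.sha256(salt + value).digest()
    return commitment, salt

def verify(commitment: bytes, salt: bytes, value: bytes) -> bool:
    """Check a revealed (salt, value) pair against the earlier commitment."""
    expected = hashlib.sha256(salt + value).digest()
    return hmac.compare_digest(commitment, expected)

commitment, salt = commit(b"yes")
print(verify(commitment, salt, b"yes"))  # True
print(verify(commitment, salt, b"no"))   # False
```

Without the salt, an observer could hash "yes" and "no" and immediately learn which one was committed.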

Content addressing: IPFS (InterPlanetary File System) uses hashes as addresses. Instead of asking for a file by name, you ask for it by its hash. If even one byte of the file changes, its hash is different, so you always get exactly the content you requested.
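The idea of content addressing can be illustrated with an in-memory store keyed by SHA-256 digests. This is a simplified sketch, not how IPFS is implemented: real IPFS addresses are content identifiers (CIDs) that wrap the hash with extra metadata, and large files are split into blocks. The ContentStore class is hypothetical.

```python
import hashlib

class ContentStore:
    """A toy content-addressed store: the key is the hash of the value."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address (the hex SHA-256 digest)."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Fetch data and re-verify that it still matches its address."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

store = ContentStore()
addr_v1 = store.put(b"hello world")
addr_v2 = store.put(b"hello world!")  # one extra byte gives a different address
print(addr_v1 != addr_v2)  # True
print(store.get(addr_v1))  # b'hello world'
```

Because the address is derived from the content, a reader can verify what they received without trusting whoever served it.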

Hash Length Extension Attacks and Security Considerations

One subtle security issue with Merkle–Damgård hash functions (MD5, SHA-1, and also SHA-256 when used as a bare hash) is the length extension attack: given H(m) and the length of m, an attacker can compute H(m || padding || extension) without knowing m. This breaks naive message authentication codes of the form H(secret || message). HMAC (Hash-based Message Authentication Code) is designed to prevent this by using the hash function in a nested construction.
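The nested construction behind HMAC can be written out by hand and compared with Python's standard hmac module. This sketch is for illustration; in practice, use hmac.new directly. The key and message values are arbitrary examples.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 spelled out: H((K xor opad) || H((K xor ipad) || message))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")    # then padded to the block size
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

key = b"example secret key"
message = b"amount=100&to=alice"
manual = hmac_sha256(key, message)
library = hmac.new(key, message, hashlib.sha256).digest()
print(manual == library)  # True
```

The outer hash hides the inner digest's internal state from the attacker, so the inner hash cannot be extended.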

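The major blockchains are not exposed to this issue in their core hashing: Ethereum's Keccak-256 is a sponge construction that is not susceptible to length extension, and Bitcoin's double SHA-256 blocks the attack by hashing the first digest again. Many protocols additionally use domain separation: different uses of the hash function include different prefixes or context tags, ensuring that a hash computed for one purpose cannot be reused for another.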
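One way to implement domain separation is a tagged hash, the pattern specified in Bitcoin's BIP-340: the SHA-256 digest of a context tag is prepended twice to the message before hashing. The tag strings below are hypothetical examples, not tags defined by any protocol.

```python
import hashlib

def tagged_hash(tag: str, message: bytes) -> bytes:
    """SHA256(SHA256(tag) || SHA256(tag) || message), following the BIP-340 pattern."""
    tag_digest = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_digest + tag_digest + message).digest()

payload = b"same bytes, different purpose"
as_commitment = tagged_hash("example/commitment", payload)  # hypothetical tag
as_signature = tagged_hash("example/signature", payload)    # hypothetical tag
print(as_commitment != as_signature)  # True
```

Identical bytes hashed under different tags give unrelated digests, so a value produced in one context cannot be replayed in another.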

Comparison of Major Hash Functions

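| Function   | Output Size | Used By                    | Status                      |
|------------|-------------|----------------------------|-----------------------------|
| SHA-256    | 256 bits    | Bitcoin, TLS, certificates | Secure                      |
| Keccak-256 | 256 bits    | Ethereum, Solidity         | Secure                      |
| SHA-1      | 160 bits    | Legacy systems             | Broken (2017)               |
| MD5        | 128 bits    | Legacy checksums           | Broken (2004)               |
| BLAKE2b    | Variable    | Filecoin, Zcash            | Secure, faster than SHA-256 |
| BLAKE3     | Variable    | Modern applications        | Secure, extremely fast      |
| Poseidon   | 256 bits    | zk-SNARK systems           | Secure, ZK-optimized        |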

Common Pitfalls

  • Confusing hashing with encryption: Hashing is one-way; encryption is two-way. You cannot “decrypt” a hash to recover the original data.
  • Using broken hash functions: MD5 and SHA-1 should never be used for security-critical applications. They have known collision attacks.
  • Assuming hashes are random: While hash outputs look random, they are deterministic. For cryptographic randomness, you need a proper randomness source (like VRFs or oracle-based solutions).
  • Double hashing misconception: Bitcoin's SHA-256d (SHA-256 applied twice) does not double the security level. The commonly cited rationale is protection against length extension attacks, and the construction dates back to Satoshi's original design.