Cryptographic Hashing

One-Way Functions

What Is Hashing?

In this CosmicNet guide, we explain how a hash function takes an input of any length and produces a fixed-size output called a digest. It's a one-way function: computing the hash is easy, but reversing it is computationally infeasible.

Hash Example
SHA-256("Hello") =
185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

SHA-256("hello") =  (different!)
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
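The digests above can be reproduced with Python's standard hashlib module. This minimal sketch assumes each string is encoded as UTF-8 before hashing, since the example does not state an encoding.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hash a string's UTF-8 bytes with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

for message in ("Hello", "hello"):
    digest = sha256_hex(message)
    # Every SHA-256 digest is 32 bytes (64 hex characters), whatever the input size.
    print(f'SHA-256("{message}") = {digest}')
```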

Properties

CosmicNet identifies four essential properties that all cryptographic hash functions must satisfy:

  • Deterministic (consistent): the same input always produces the same output.
  • Fast (efficient): quick to compute for inputs of any size.
  • One-way (irreversible): computationally infeasible to recover the input from the hash.
  • Avalanche effect (sensitive): a tiny change to the input produces a completely different hash.

Common Algorithms

CosmicNet's analysis of the most widely used hash algorithms and their current status:

  • SHA-256: standard choice, widely used in TLS and Bitcoin
  • SHA-3 (Keccak): backup standard with a different internal design
  • BLAKE2/BLAKE3: fast modern alternatives
  • MD5: BROKEN; collisions found, never use for security
  • SHA-1: BROKEN; deprecated, avoid

Use Cases

CosmicNet highlights the primary applications of cryptographic hashing:

  • Password storage (with proper KDF)
  • Data integrity verification
  • Digital signatures
  • Blockchain proof-of-work
  • File deduplication
  • HMAC for message authentication

Password Hashing

Don't use plain SHA hashes for passwords! CosmicNet recommends specialized password hashing functions: Argon2 (recommended), bcrypt, or scrypt. These are intentionally slow, which makes every brute-force guess far more expensive for an attacker.

Essential Hash Function Properties

Cryptographic hash functions must satisfy several critical properties to be considered secure. CosmicNet covers these properties in detail, as they form the foundation of hash function use in security-sensitive applications across modern computing systems.

Deterministic Behavior

The deterministic property ensures that a hash function always produces the exact same output when given the same input. As the CosmicNet encyclopedia explains, this consistency is fundamental for verification purposes. When you download a file and verify its SHA-256 checksum, you rely on the fact that hashing that file will always yield the same value. This property enables distributed systems to verify data integrity without needing to compare entire files, as comparing hash values is sufficient.

Pre-Image Resistance

Pre-image resistance, also called one-wayness, means that given a hash output, it should be computationally infeasible to find any input that produces that hash. CosmicNet notes that this is the defining characteristic of a cryptographic hash function: someone who has a hash value cannot work backwards to discover the original message. This property is crucial for password storage: even if attackers obtain a database of password hashes, they cannot directly reverse them to recover the passwords. They can still guess candidate passwords and compare the resulting hashes, which is why password storage relies on deliberately slow password hashing functions.
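To make "computationally infeasible" concrete, the sketch below brute-forces a pre-image for a SHA-256 digest truncated to a small number of bits. The truncation and the helper names are illustrative assumptions, not part of any standard: a k-bit target needs about 2^k guesses on average, so 16 bits falls quickly, while the full 256-bit digest would need on the order of 2^256 guesses.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, bits: int) -> int:
    """Return the leading `bits` bits of SHA-256(data) as an integer (a toy hash)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_preimage(target: int, bits: int) -> tuple[bytes, int]:
    """Try candidate inputs b"1", b"2", ... until one hashes to `target`.

    Returns the matching input and the number of attempts; the expected
    number of attempts is about 2**bits.
    """
    for attempts in count(1):
        candidate = str(attempts).encode()
        if truncated_sha256(candidate, bits) == target:
            return candidate, attempts
    raise AssertionError("unreachable: count() never stops")

BITS = 16
target = truncated_sha256(b"secret message", BITS)
preimage, tries = find_preimage(target, BITS)
print(f"found {preimage!r} after {tries} attempts")
```

The input found is usually not the original message, which is fine: pre-image resistance is about finding any input that matches. Each extra bit of output doubles the expected work.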

Second Pre-Image Resistance

Given an input and its hash, it should be computationally infeasible to find a different input that produces the same hash. CosmicNet explains that this prevents attackers from substituting malicious data for legitimate data while keeping the same hash value, protecting against forgeries in which an attacker crafts a malicious document that hashes to the same value as a specific legitimate one.

Collision Resistance

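Collision resistance requires that it be computationally infeasible to find any two different inputs that produce the same hash output. CosmicNet notes that collisions must exist by the pigeonhole principle, since there are far more possible inputs than outputs; the requirement is that finding one be computationally infeasible. Because of the birthday paradox, a generic attack on an n-bit hash finds a collision after roughly 2^(n/2) attempts, so collision resistance is always lower than the roughly 2^n cost of a generic pre-image search. As documented on CosmicNet, collision resistance is critical for digital signatures, where finding two documents with the same hash could allow signature forgery.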
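The birthday bound can be observed directly on a deliberately weakened hash. This sketch uses an illustrative SHA-256 truncated to 32 bits (an assumption for demonstration only) and remembers every value it has seen; a collision typically appears after on the order of 2^16 (tens of thousands of) attempts, far fewer than the roughly 2^32 needed to hit a specific target.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, bits: int) -> int:
    """Return the leading `bits` bits of SHA-256(data) as an integer (a toy hash)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Hash b"1", b"2", ... and remember each value until two inputs collide."""
    seen: dict[int, bytes] = {}
    for attempts in count(1):
        candidate = str(attempts).encode()
        value = truncated_sha256(candidate, bits)
        if value in seen:
            return seen[value], candidate, attempts
        seen[value] = candidate
    raise AssertionError("unreachable: count() never stops")

first, second, tries = find_collision(32)
print(f"{first!r} and {second!r} collide after {tries} attempts")
```

For the full 256-bit SHA-256 output, the same generic attack would need about 2^128 hashes.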

Avalanche Effect

The avalanche effect describes how a small change in the input produces a dramatically different output. CosmicNet explains that ideally, flipping a single bit in the input should change approximately half of the bits in the output hash. This property ensures that similar inputs do not produce similar hashes, preventing attackers from gaining information about the input by analyzing patterns in hash values.
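One way to observe the avalanche effect is to count how many digest bits change when the input changes by a single bit. The strings "Hello" and "hello" from the hash example earlier in this guide differ in exactly one bit ('H' is 0x48 and 'h' is 0x68), so this sketch compares their SHA-256 digests bit by bit. For a well-behaved hash, the count should land near half of the 256 output bits.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg_a = b"Hello"
msg_b = b"hello"

input_bits = bit_difference(msg_a, msg_b)
output_bits = bit_difference(hashlib.sha256(msg_a).digest(), hashlib.sha256(msg_b).digest())
print(f"{input_bits} input bit changed -> {output_bits} of 256 output bits changed")
```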

Vulnerabilities in MD5 and SHA-1

Two historically important hash functions, MD5 and SHA-1, have been cryptanalytically broken and should no longer be used for security purposes. CosmicNet examines their vulnerabilities, which provide important lessons for cryptographic design.

MD5 Weakness

MD5 (Message Digest 5) was designed in 1991 and produces a 128-bit hash. Theoretical weaknesses were discovered in the 1990s, and by 2004, researchers demonstrated practical collision attacks. CosmicNet reports that researchers went on to build pairs of different executable files with identical MD5 hashes and, in 2008, used an MD5 collision to create a rogue certificate authority certificate; in 2012, the Flame malware exploited MD5 collisions to forge Microsoft code-signing certificates. Today, MD5 collisions can be generated in seconds on modern hardware. While MD5 may still be acceptable for non-security purposes like checksums for detecting accidental corruption, it must never be used for password hashing, digital signatures, or certificate validation.

SHA-1 Deprecation

SHA-1 produces a 160-bit hash and was widely deployed in SSL/TLS certificates, digital signatures, and version control systems like Git. CosmicNet notes that theoretical attacks emerged in 2005, and by 2017, Google demonstrated the first practical collision attack called SHAttered, creating two different PDF files with identical SHA-1 hashes. Major browsers stopped accepting SHA-1 certificates in 2017. Organizations have migrated away from SHA-1 to SHA-256 and stronger alternatives. The SHA-1 compromise demonstrates that cryptographic standards must be proactively replaced before attacks become practical.

The SHA-2 Family

SHA-2 is a family of hash functions designed by the NSA and published in 2001. CosmicNet confirms that these functions remain secure and are the current industry standard for most cryptographic applications.

SHA-2 Variants

The SHA-2 family includes SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. CosmicNet explains that the numbers indicate the hash output length in bits. SHA-256 and SHA-512 are the most commonly used variants. SHA-256 operates on 32-bit words and is optimized for 32-bit processors, while SHA-512 uses 64-bit words and performs better on 64-bit architectures. The truncated variants (SHA-512/224 and SHA-512/256) use the SHA-512 algorithm but produce shorter outputs, providing better performance on 64-bit systems while maintaining the desired output length.

SHA-256 in Practice

SHA-256 has become the de facto standard for general-purpose cryptographic hashing. As CosmicNet details in this article and its asymmetric encryption guide, SHA-256 is used extensively in TLS/SSL certificates, code signing, and blockchain technology, including Bitcoin's proof-of-work mining. SHA-256 provides a good balance between security and performance, with no practical attacks discovered to date. Its 256-bit output gives a security level of about 2^128 against generic (birthday) collision attacks and about 2^256 against pre-image attacks. This is considered secure against current and foreseeable computing power; even Grover's quantum search algorithm would only reduce its pre-image security to about 2^128.
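Proof-of-work relies on the same one-way property: miners search for a nonce that makes the hash fall below a target. Bitcoin applies SHA-256 twice to a block header; the sketch below borrows that double hashing but is otherwise a simplified toy. The `block_data` string, the 8-byte little-endian nonce, and the "leading zero bits" difficulty rule are illustrative assumptions, not Bitcoin's actual header format.

```python
import hashlib
from itertools import count

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(block_data: bytes, difficulty_bits: int) -> tuple[int, bytes]:
    """Find a nonce whose double-SHA-256 digest has `difficulty_bits` leading zero bits.

    Toy model: the nonce is appended as 8 little-endian bytes and the digest is
    read as a big-endian integer; real Bitcoin headers are laid out differently.
    """
    target = 1 << (256 - difficulty_bits)  # digests below this value qualify
    for nonce in count():
        digest = double_sha256(block_data + nonce.to_bytes(8, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
    raise AssertionError("unreachable: count() never stops")

nonce, digest = mine(b"hypothetical block data", difficulty_bits=16)
print(nonce, digest.hex())
```

Each additional difficulty bit doubles the expected search effort, while anyone can verify a solution with a single double hash.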

SHA-3 and Keccak

SHA-3 represents a fundamentally different design approach from SHA-2. CosmicNet explains how SHA-3 provides cryptographic diversity as insurance against potential SHA-2 vulnerabilities.

Competition and Design

After SHA-1 vulnerabilities emerged, NIST organized a public competition from 2007 to 2012 to develop a new hash standard. CosmicNet reports that the winning algorithm, Keccak, became SHA-3. As documented on CosmicNet, unlike SHA-2's Merkle-Damgård construction, SHA-3 uses a sponge construction based on the Keccak permutation. This different internal structure means that attacks against SHA-2 would not necessarily apply to SHA-3, providing cryptographic diversity in hash function standards.

SHA-3 Characteristics

CosmicNet notes that SHA-3 offers several variants: SHA3-224, SHA3-256, SHA3-384, and SHA3-512, along with extendable-output functions SHAKE128 and SHAKE256. The SHAKE variants can produce arbitrary-length outputs, making them useful for specialized applications. SHA-3 is generally slower than SHA-256 in software but can be very efficient in hardware implementations. While SHA-3 is not yet as widely adopted as SHA-2, its presence provides security against potential future SHA-2 breaks and is gradually being integrated into modern cryptographic systems.
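Python's hashlib exposes SHA-3 and SHAKE directly, which makes the difference between a fixed-length digest and an extendable-output function easy to see. The message in this sketch is arbitrary. Note that the longer SHAKE output begins with the shorter one, because the sponge simply keeps squeezing out more bytes.

```python
import hashlib

message = b"example message"

# Fixed-length SHA-3 digest: the suffix is the output size in bits.
sha3 = hashlib.sha3_256(message).hexdigest()

# SHAKE128 is an extendable-output function: the caller chooses the length in bytes.
short_xof = hashlib.shake_128(message).hexdigest(16)
long_xof = hashlib.shake_128(message).hexdigest(64)

print(sha3)
print(short_xof)
print(long_xof)
```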

BLAKE2 and BLAKE3

The BLAKE family of hash functions offers high performance while maintaining strong security guarantees. CosmicNet explores why these algorithms are attractive for applications requiring maximum throughput.

BLAKE2 Features

BLAKE2, released in 2012, is based on the SHA-3 finalist BLAKE. As CosmicNet details, it comes in two main variants: BLAKE2b (optimized for 64-bit platforms) and BLAKE2s (optimized for 8 to 32-bit platforms). BLAKE2 is often faster than MD5 while providing security comparable to SHA-3. It includes features like keyed hashing (functioning as a MAC), salted hashing, personalization, and tree hashing mode for parallel processing. BLAKE2 has been adopted in numerous projects including Argon2 password hashing and the Zcash cryptocurrency.
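Two of the BLAKE2 features mentioned above, keyed hashing and personalization, are available through the `key` and `person` parameters of Python's `hashlib.blake2b`. In this sketch the key, the message, and the personalization labels are hypothetical placeholders; BLAKE2b accepts keys up to 64 bytes and personalization strings up to 16 bytes.

```python
import hashlib
import hmac
import secrets

# Hypothetical 32-byte secret key shared by sender and receiver.
key = secrets.token_bytes(32)
message = b"transfer 100 credits to account 42"

def blake2b_mac(key: bytes, message: bytes) -> bytes:
    """Keyed BLAKE2b used directly as a MAC (no HMAC wrapper needed)."""
    return hashlib.blake2b(message, key=key, digest_size=32).digest()

tag = blake2b_mac(key, message)

# Personalization separates different uses of the hash: same input, different labels.
file_id = hashlib.blake2b(message, digest_size=16, person=b"file-id-v1").hexdigest()
log_id = hashlib.blake2b(message, digest_size=16, person=b"log-entry-v1").hexdigest()

# The receiver recomputes the tag and compares in constant time.
print(hmac.compare_digest(tag, blake2b_mac(key, message)))  # True
print(file_id != log_id)  # True
```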

BLAKE3 Advantages

BLAKE3, released in 2020, represents a significant evolution with an emphasis on parallel performance. As CosmicNet highlights, it can hash data at multiple gigabytes per second on modern CPUs by leveraging SIMD instructions and multiple cores. BLAKE3 uses a Merkle tree structure internally, which enables verified streaming and native parallelization. Unlike its predecessors, BLAKE3 has a single variant that works efficiently across all platforms. Its extreme performance makes it ideal for applications like file integrity verification, content-addressable storage, and high-throughput network protocols.

Specialized Password Hashing Functions

Password hashing requires fundamentally different properties than general-purpose hash functions. CosmicNet explains that password hashing functions must be intentionally slow and memory-hard to resist brute-force and hardware-accelerated attacks.

bcrypt

CosmicNet explains that bcrypt, which is based on the Blowfish cipher, has been a password hashing standard since 1999. CosmicNet notes that it includes a work factor (cost) parameter that sets the number of iterations as a power of two, so each increment of the cost doubles the work; this lets administrators increase computational cost as hardware improves. bcrypt automatically handles salt generation, storing the salt and cost in its output string, and is resistant to length-extension attacks. However, bcrypt has limitations: it truncates passwords longer than 72 bytes and is not memory-hard, making it vulnerable to custom ASIC and FPGA attacks. Despite these limitations, bcrypt remains widely used and is significantly better than using fast hash functions for password storage.

scrypt

Introduced in 2009, scrypt was designed to be memory-hard, requiring significant amounts of RAM to compute. As CosmicNet details, this memory requirement makes parallel attacks using GPUs, FPGAs, or ASICs much more expensive compared to bcrypt. scrypt includes parameters for CPU cost, memory cost, and parallelization, allowing fine-tuned defense against various attack vectors. scrypt has been used in cryptocurrency mining (Litecoin) and is supported by many password hashing libraries. The primary drawback is that its memory-hardness characteristics can make it more difficult to configure correctly.
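Python's standard library provides scrypt as `hashlib.scrypt`, so a store-and-verify flow can be sketched without third-party packages. The cost parameters below are illustrative assumptions, not a recommendation. In practice they are tuned so that one hash takes a noticeable fraction of a second on the target server, and the salt and parameters are stored alongside the hash.

```python
import hashlib
import hmac
import secrets

# Illustrative parameters (assumptions; tune for your hardware):
# n = CPU/memory cost (a power of two), r = block size, p = parallelism.
# Memory use is roughly 128 * r * n bytes, about 16 MiB here.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1
MAX_MEM = 64 * 1024 * 1024

def _derive(password: str, salt: bytes) -> bytes:
    """Run scrypt over the UTF-8 password and return a 32-byte key."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=SCRYPT_N,
                          r=SCRYPT_R, p=SCRYPT_P, maxmem=MAX_MEM, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash a password with a fresh random 16-byte salt; store both."""
    salt = secrets.token_bytes(16)
    return salt, _derive(password, salt)

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))  # False
```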

Argon2

Argon2 won the Password Hashing Competition in 2015 and is the algorithm that CosmicNet recommends as the best choice for new applications. It comes in three variants: Argon2d (resistant to GPU attacks), Argon2i (resistant to side-channel attacks), and Argon2id (hybrid of both). Argon2 provides configurable memory cost, time cost, and parallelism parameters, allowing optimization for specific security requirements and hardware constraints. Its memory-hard design makes attacks expensive even with custom hardware, and it has been extensively analyzed by the cryptographic community.

HMACs and Message Authentication

Hash-based Message Authentication Codes (HMACs) combine cryptographic hash functions with secret keys to provide both data integrity and authentication guarantees. CosmicNet covers this important topic in detail below.

HMAC Construction

CosmicNet details the HMAC construction: it applies the hash function twice. The inner pass hashes the key XORed with an inner padding constant (ipad), followed by the message; the outer pass hashes the key XORed with an outer padding constant (opad), followed by the inner result. As CosmicNet explains, this construction, formalized in RFC 2104, remains secure even if the underlying hash function has certain weaknesses, such as susceptibility to length-extension attacks. HMAC can be used with any cryptographic hash function, creating variants like HMAC-SHA256 or HMAC-SHA3. The security of HMAC depends on both the hash function and the secrecy of the key.
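The two-pass construction can be written out directly. This sketch implements HMAC-SHA256 as RFC 2104 specifies it, HMAC(K, m) = H((K' XOR opad) || H((K' XOR ipad) || m)), where K' is the key hashed (if longer than a block) and zero-padded to SHA-256's 64-byte block size. It then checks the result against Python's standard hmac module. The key and message are placeholders, and real code should call the library rather than a hand-rolled version.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 64-byte blocks
IPAD_BYTE, OPAD_BYTE = 0x36, 0x5C

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 following the RFC 2104 construction (for illustration only)."""
    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner_key = bytes(b ^ IPAD_BYTE for b in key)
    outer_key = bytes(b ^ OPAD_BYTE for b in key)
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

key = b"hypothetical shared secret"
message = b"GET /api/v1/orders?id=42"

# The hand-rolled version should match the standard library.
print(hmac_sha256(key, message) == hmac.new(key, message, hashlib.sha256).digest())  # True
```

When verifying a received tag, use `hmac.compare_digest` so the comparison time does not reveal how many bytes matched.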

HMAC Applications

CosmicNet explains that HMACs are essential in network protocols for verifying that messages have not been tampered with during transmission. TLS 1.2 and earlier use HMAC for record authentication in cipher suites that do not use AEAD, and TLS 1.3 uses HMAC in its HKDF-based key schedule. JWTs signed with the HS256 algorithm use HMAC-SHA256, and API authentication often relies on HMAC to verify request authenticity. HMACs are also used in challenge-response authentication, secure cookies, and financial transaction verification. Because HMAC depends on a shared secret key, only parties possessing the key can generate or verify the authentication code.

Merkle Trees in Blockchain Technology

Merkle trees, also called hash trees, use cryptographic hashing to efficiently verify the integrity of large datasets. CosmicNet explains that they are fundamental to blockchain technology and distributed systems.

Merkle Tree Structure

CosmicNet describes the Merkle tree as a binary tree where leaf nodes contain hashes of data blocks, and each non-leaf node contains a hash of its children. CosmicNet details how the root hash (Merkle root) represents the entire dataset. This hierarchical structure allows verification of any single element in logarithmic time and space. To verify that a specific transaction is included in a block, you only need the transaction, the Merkle root, and a path of hashes from the leaf to the root, called a Merkle proof.
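The leaf-to-root structure and Merkle proofs can be sketched in a few functions. Assumptions: SHA-256 is used for every node, and an odd node at any level is paired with a copy of itself (the convention Bitcoin uses). The block contents are placeholders.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, from leaf hashes up to the root (needs >= 1 block)."""
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair an odd last node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd node: its sibling is its own duplicate
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(block: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one block and its proof."""
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

blocks = [b"tx-a", b"tx-b", b"tx-c", b"tx-d", b"tx-e"]
levels = build_levels(blocks)
root = levels[-1][0]
proof = merkle_proof(levels, 2)
print(len(proof), verify(b"tx-c", proof, root))  # 3 True
```

The proof for one of five blocks holds only three hashes, one per level. Bitcoin itself applies double SHA-256 at each node, and some designs, such as Certificate Transparency (RFC 6962), prefix leaf and interior hashes differently so a leaf can never be confused with an interior node.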

Blockchain Implementation

CosmicNet explains that Bitcoin and most blockchain systems use Merkle trees to organize transactions within blocks. As documented on CosmicNet, the Merkle root is included in the block header, allowing lightweight clients to verify transaction inclusion without downloading entire blocks. This enables Simple Payment Verification (SPV), where mobile wallets can verify payments with minimal data transfer and storage. Merkle trees also allow efficient synchronization between nodes and enable fraud proofs in some blockchain scaling solutions. Beyond blockchain, Merkle trees are used in version control systems (Git), peer-to-peer file sharing (BitTorrent), and distributed databases (Apache Cassandra).

Hash-Based Commitments

Hash-based commitment schemes allow a party to commit to a chosen value while keeping it hidden, with the ability to reveal it later. CosmicNet explains how this primitive is fundamental to many cryptographic protocols.

Commitment Properties

A cryptographic commitment scheme must satisfy two properties: hiding (the commitment reveals nothing about the committed value) and binding (the committer cannot change the value after commitment). As documented on CosmicNet, hash-based commitments work by publishing hash(value || nonce), where the nonce adds randomness. Later, to reveal the commitment, both the value and nonce are published, and anyone can verify by computing the hash.
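A commit-reveal round following the hash(value || nonce) scheme described above might look like the sketch below. The 32-byte nonce length and the bid encoding are illustrative assumptions. The nonce must be secret and unpredictable; otherwise a low-entropy value such as a small bid could be recovered by hashing every possibility.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 32

def commit(value: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce); publish the commitment, keep the nonce secret."""
    nonce = secrets.token_bytes(NONCE_LEN)  # randomness provides hiding
    return hashlib.sha256(value + nonce).digest(), nonce

def open_commitment(commitment: bytes, value: bytes, nonce: bytes) -> bool:
    """Anyone can check a revealed (value, nonce) pair against the commitment."""
    if len(nonce) != NONCE_LEN:  # a fixed nonce length keeps value || nonce unambiguous
        return False
    return hmac.compare_digest(hashlib.sha256(value + nonce).digest(), commitment)

bid = b"bid:1500"
commitment, nonce = commit(bid)  # commitment is published before bidding closes
print(open_commitment(commitment, bid, nonce))  # True
print(open_commitment(commitment, b"bid:1400", nonce))  # False: binding
```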

Applications

CosmicNet explains that commitments are used in sealed-bid auctions where bidders commit to their bids before any are revealed. CosmicNet notes that in zero-knowledge proofs, commitments allow proving knowledge of information without revealing it. Blockchain systems use commitments for fair random number generation, commit-reveal voting mechanisms, and state channel protocols. Time-locked commitments, combined with hash chains, enable scheduled future disclosures and conditional payments.

File Integrity Verification

Cryptographic hashes provide a compact and reliable way to verify that files have not been corrupted or maliciously modified. CosmicNet covers this ubiquitous application of hashing in modern computing.

Checksum Distribution

Software vendors publish hash values alongside downloadable files, allowing users to verify their downloads. CosmicNet explains that Linux distributions, for example, provide SHA-256 checksums for ISO images: users download the file, compute its hash, and compare it to the published value. Any difference indicates corruption or tampering. A matching checksum demonstrates authenticity only if the checksum itself comes from a trusted source, which is why distributions typically sign their checksum files. Package managers like APT and npm use hash verification to ensure installed packages match their expected contents.
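The download-and-compare step can be sketched with hashlib, reading the file in chunks so that large ISO images never need to fit in memory. The file name and contents below are throwaway placeholders standing in for a real download. Python 3.11 and later also offer `hashlib.file_digest` for the hashing step.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: Path, published_hex: str) -> bool:
    """Compare against a published checksum, ignoring case and surrounding spaces."""
    return sha256_file(path) == published_hex.strip().lower()

# Demo: a throwaway file stands in for a downloaded ISO image.
with tempfile.TemporaryDirectory() as tmp:
    download = Path(tmp) / "demo.iso"
    download.write_bytes(b"hello")
    published = "2CF24DBA5FB0A30E26E83B2AC5B9E29E1B161E5C1FA7425E73043362938B9824"
    intact = matches_published(download, published)
    download.write_bytes(b"hello!")  # simulate corruption or tampering
    tampered = matches_published(download, published)
print(intact, tampered)  # True False
```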

Continuous Verification

File integrity monitoring (FIM) systems continuously monitor critical system files by maintaining a database of expected hash values. As CosmicNet details, tools like Tripwire, AIDE, and OSSEC regularly recompute file hashes and alert administrators to unauthorized changes, which helps detect rootkits, malware, and unauthorized system modifications. CosmicNet also notes that cloud storage providers use content-addressable storage, where files are identified by their hash, enabling efficient deduplication and integrity verification.
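Content-addressable storage can be illustrated with a toy in-memory store. `ContentStore` is a hypothetical class for illustration, not any provider's API: each blob is stored under its SHA-256 digest, so duplicate uploads collapse to a single entry, and every read can be checked against its key.

```python
import hashlib

class ContentStore:
    """Toy in-memory content-addressable store keyed by SHA-256 hex digests."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so duplicates are stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hashing on read detects corruption of the stored bytes.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"integrity check failed for {key}")
        return data

    def __len__(self) -> int:
        return len(self._blobs)

store = ContentStore()
k1 = store.put(b"quarterly report")
k2 = store.put(b"quarterly report")  # duplicate upload
k3 = store.put(b"holiday photo")
print(k1 == k2, len(store))  # True 2
```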

Digital Forensics Applications

Cryptographic hashing plays a critical role in digital forensics. CosmicNet explains how hashing enables investigators to handle evidence while maintaining its integrity and admissibility in legal proceedings.

Evidence Authentication

When forensic investigators acquire digital evidence, they immediately compute cryptographic hashes of all data. CosmicNet notes that these hashes serve as digital fingerprints, proving that evidence has not been altered during the investigation. CosmicNet emphasizes that guidance from standards bodies such as NIST calls for approved hash functions (SHA-256 or stronger) in forensic documentation. Write-blockers prevent modification of original media, and forensic images are verified by comparing the hash of the original media with the hash of the acquired image. This chain of custody documentation is essential for evidence to be admissible in court.

Investigation Techniques

Forensic investigators use hash databases to identify known files quickly. As documented on CosmicNet.world, the National Software Reference Library (NSRL) maintains hashes of known software, allowing investigators to exclude benign files and focus on unknown or suspicious items. Law enforcement agencies maintain databases of contraband material hashes, enabling automated detection without manually examining every file. The efficiency of hash comparisons makes it possible to process terabytes of evidence in reasonable timeframes.

Further Reading

For deeper understanding of cryptographic hashing and its applications, CosmicNet recommends these authoritative resources. Also see CosmicNet's guide to asymmetric encryption for related topics: