Archive Node
An archive node stores the complete historical state of a blockchain at every block height, enabling full historical queries and analytics.
Key Takeaways
- An archive node retains every historical state of a blockchain from the genesis block forward, unlike a full node that only keeps the current state after validating all blocks.
- Archive nodes are essential infrastructure for block explorers, analytics platforms, and DeFi protocols that need to query balances, contract storage, or account state at arbitrary past block heights.
- Storage requirements vary dramatically by chain and client: Ethereum archive nodes range from roughly 2 TB (Erigon) to over 12 TB (legacy Geth), while Bitcoin's UTXO model makes the concept less relevant since full nodes already store the complete transaction history by default.
What Is an Archive Node?
An archive node is a blockchain node that stores the complete historical state of the network at every block height. While a standard full node validates every block and transaction from genesis, it only retains the current state: the latest account balances, contract storage values, and unspent transaction outputs. An archive node goes further by preserving snapshots of that state at every single block, enabling queries like "what was this account's balance at block 15,000,000?"
The distinction matters most on account-based blockchains like Ethereum, where the global state (a massive key-value store of all account balances and smart contract data) changes with every block. A full node applies each state transition but prunes old versions to save disk space. An archive node never prunes, keeping every intermediate state accessible for queries.
Think of a full node as a bank that knows every customer's current balance. An archive node is a bank that can also tell you every customer's balance on any specific date going back to the day the bank opened.
How It Works
Archive nodes are easiest to understand by looking at how blockchain state is stored and managed across different node types.
Node Types on the Spectrum
Blockchain nodes exist on a spectrum from lightweight to comprehensive:
- Pruned node: validates every block from genesis but discards old block data after validation, keeping only the current state. On Bitcoin, a pruned node can operate with as little as 2 to 10 GB of storage.
- Full node: validates and stores all blocks and the current state. On Bitcoin, this requires roughly 700 GB. On Ethereum, around 1 to 1.3 TB for the execution client.
- Archive node: stores everything a full node does, plus every historical state at every block height. On Ethereum, this ranges from roughly 2 TB to over 12 TB depending on the client.
All three node types perform full validation. The difference is solely about what data they retain after validation. A pruned node is just as secure as an archive node for verifying new transactions: it simply cannot answer questions about the past.
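The retention differences above can be summarized in a small Python sketch. The `NodeProfile` class, the three constants, and the query names are hypothetical labels chosen for illustration; they do not correspond to any client's configuration options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    name: str
    keeps_all_blocks: bool        # retains every block since genesis
    keeps_historical_state: bool  # retains the state at every past height


# Illustrative profiles matching the three node types described above.
PRUNED = NodeProfile("pruned", keeps_all_blocks=False, keeps_historical_state=False)
FULL = NodeProfile("full", keeps_all_blocks=True, keeps_historical_state=False)
ARCHIVE = NodeProfile("archive", keeps_all_blocks=True, keeps_historical_state=True)


def can_answer(node: NodeProfile, query: str) -> bool:
    """Return True if the node retains the data this (hypothetical) query kind needs."""
    if query in ("validate_new_block", "current_balance"):
        return True  # every node type validates fully and holds the current state
    if query == "serve_old_block":
        return node.keeps_all_blocks
    if query == "balance_at_past_height":
        return node.keeps_historical_state
    raise ValueError(f"unknown query kind: {query}")


for node in (PRUNED, FULL, ARCHIVE):
    print(node.name, can_answer(node, "balance_at_past_height"))
```

Only the archive profile answers the historical-balance query, while all three answer the validation query, which mirrors the point that the node types differ in retention rather than in security.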
State Storage on Ethereum
Ethereum stores global state in a Merkle Patricia Trie: a tree-like data structure where each leaf represents an account's balance, nonce, code hash, and storage root. When a transaction modifies an account, new trie nodes are created for the changed path while unchanged branches are shared with the previous state.
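The sharing of unchanged branches can be shown with a deliberately simplified trie in Python. This is a sketch under strong assumptions: keys are short hex strings rather than hashed addresses, values are plain integers, and nodes are not hashed or RLP-encoded as they are in Ethereum's actual Merkle Patricia Trie. The `Node`, `insert`, and `lookup` names are made up for this example.

```python
from typing import Dict, Optional


class Node:
    """Simplified trie node keyed by hex nibbles; never mutated after creation."""

    def __init__(self, children: Optional[Dict[str, "Node"]] = None,
                 value: Optional[int] = None) -> None:
        self.children: Dict[str, "Node"] = dict(children or {})
        self.value = value


def insert(node: Optional[Node], key: str, value: int) -> Node:
    """Return a new root with key set to value, copying only nodes on the key's path."""
    if node is None:
        node = Node()
    if not key:
        return Node(node.children, value)
    head, rest = key[0], key[1:]
    children = dict(node.children)  # shallow copy: untouched subtrees stay shared
    children[head] = insert(node.children.get(head), rest, value)
    return Node(children, node.value)


def lookup(node: Optional[Node], key: str) -> Optional[int]:
    """Walk the trie from the given root and return the stored value, if any."""
    for nibble in key:
        if node is None:
            return None
        node = node.children.get(nibble)
    return node.value if node is not None else None


# State after block N: two accounts.
root_n = insert(insert(None, "a1", 100), "b2", 50)
# Block N+1 changes only account "a1"; a new root is produced.
root_n1 = insert(root_n, "a1", 70)

print(lookup(root_n, "a1"), lookup(root_n1, "a1"))
print(root_n.children["b"] is root_n1.children["b"])
```

Both roots remain usable: the old root still resolves the old balance, and the subtree for the untouched account is the same object under both roots. Retaining or discarding old roots like `root_n` is what separates an archive node from a full node in this simplified picture.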
A full node only keeps the latest version of this trie. When it processes block N+1, it can garbage-collect trie nodes that are no longer reachable from the new state root. An archive node retains every state root and all associated trie nodes, making any historical state retrievable via RPC calls like eth_getBalance with a specific block number parameter.
// Query current balance (works on any full node)
eth_getBalance("0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045", "latest")
// Query balance at a specific block (requires archive node)
eth_getBalance("0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045", "0xE4E1C0")
// Trace a historical transaction (requires archive node)
debug_traceTransaction("0xabc123...")
State Storage on Bitcoin
Bitcoin's UTXO model handles historical state differently from Ethereum's account model. Rather than maintaining a running balance for each address, Bitcoin tracks discrete unspent transaction outputs. The UTXO set is the current state: the collection of all coins that haven't been spent yet.
Because Bitcoin transactions explicitly reference which UTXOs they consume, the full transaction history is already embedded in the block data. A standard Bitcoin Core full node stores every block ever produced and can reconstruct any past state by replaying transactions. This means the distinction between "full node" and "archive node" is less pronounced on Bitcoin than on Ethereum.
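The replay idea can be sketched in a few lines of Python. The block and transaction layout below (dictionaries with `txid`, `inputs`, and `outputs`) is a hypothetical toy format, not Bitcoin's wire format, and the sketch skips scripts, signatures, and every other validation rule.

```python
from typing import Dict, List, Tuple

OutPoint = Tuple[str, int]  # (txid, output index)


def utxo_set_at(blocks: List[List[dict]], height: int) -> Dict[OutPoint, int]:
    """Replay blocks 0..height and return the UTXO set (outpoint -> amount) at that height."""
    utxos: Dict[OutPoint, int] = {}
    for block in blocks[: height + 1]:
        for tx in block:
            for spent in tx["inputs"]:
                del utxos[spent]  # each input consumes an existing unspent output
            for index, amount in enumerate(tx["outputs"]):
                utxos[(tx["txid"], index)] = amount
    return utxos


# Toy chain: block 0 mints 50; block 1 mints 50 and spends block 0's coin.
blocks = [
    [{"txid": "cb0", "inputs": [], "outputs": [50]}],
    [
        {"txid": "cb1", "inputs": [], "outputs": [50]},
        {"txid": "t1", "inputs": [("cb0", 0)], "outputs": [30, 20]},
    ],
]

print(utxo_set_at(blocks, 0))
print(utxo_set_at(blocks, 1))
```

No per-height snapshot is stored anywhere in this sketch. Any past UTXO set is recomputed from the blocks alone, at the cost of replaying everything up to the requested height.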
In Bitcoin terminology, "archival node" typically refers to a node that stores the complete block history (as opposed to a pruned node that discards old blocks). This is the default configuration for Bitcoin Core. For a deeper comparison of these two state models, see the UTXO vs. account model research article.
Storage Requirements
The practical barrier to running an archive node is storage. Different client implementations use different storage strategies, producing dramatically different disk footprints:
Ethereum Archive Node Sizes (2026)
| Client | Archive Size | Notes |
|---|---|---|
| Geth (hash-based) | 12+ TB | Legacy mode with full historical Merkle proofs |
| Geth (path-based) | ~2 TB | Newer mode; does not support historical eth_getProof |
| Erigon | 1.8 to 2.2 TB | Optimized flat storage layout; syncs in under 3 days |
| Reth | ~2.8 TB | Rust-based client with growing adoption |
Bitcoin Node Sizes (2026)
| Configuration | Storage | Capabilities |
|---|---|---|
| Full archival node | ~700 GB | Stores all blocks, can serve historical data to peers |
| Pruned node | 2 to 10 GB | Full validation, cannot serve old blocks |
The contrast is striking: a Bitcoin full archival node at ~700 GB is smaller than an Ethereum full node at 1 to 1.3 TB, let alone an Ethereum archive node. The gap reflects the different state models. Ethereum's account model accumulates state over time (every new contract adds permanent storage), while Bitcoin's UTXO model tracks only current unspent outputs (roughly 7 GB for the UTXO set itself).
Use Cases
Most blockchain participants never need an archive node. Wallets, payment processors, and even Lightning Network nodes only need the current state. Archive nodes serve specialized infrastructure roles:
Block Explorers and Analytics
Services like Etherscan, Blockchair, and Dune Analytics require archive nodes to display historical balances, trace transaction execution, and compute aggregate statistics. When you look up an address on a block explorer and see its balance history charted over time, that data comes from querying an archive node at successive block heights.
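As a rough illustration of querying at successive block heights, the following Python sketch builds a JSON-RPC batch of `eth_getBalance` requests. The `balance_history_batch` helper and the sampling parameters are assumptions made for this example. The sketch only constructs the request payloads; sending them to an archive node endpoint is left out because no endpoint is specified here.

```python
import json
from typing import List


def balance_history_batch(address: str, start: int, stop: int, step: int) -> List[dict]:
    """Build one eth_getBalance request per sampled block height (inclusive range)."""
    return [
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "eth_getBalance",
            # Block numbers are passed as hex-encoded quantities.
            "params": [address, hex(height)],
        }
        for request_id, height in enumerate(range(start, stop + 1, step))
    ]


batch = balance_history_batch(
    "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
    start=15_000_000,
    stop=15_000_200,
    step=100,
)
print(json.dumps(batch[0], indent=2))
```

Each response would supply one point on a balance-over-time chart. A node that has pruned state at those heights would be expected to return an error for such requests rather than a balance.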
DeFi Protocol Infrastructure
DeFi protocols rely on archive nodes for several critical functions: computing historical price feeds for oracle verification, replaying past liquidation events for risk modeling, and auditing smart contract state changes. Indexing services like The Graph process events by scanning historical blocks with archive access.
Debugging and Development
Developers use archive nodes to trace failed transactions, replay execution step by step, and test contract interactions at specific historical states. The debug_traceTransaction RPC call, essential for diagnosing reverts and gas issues, requires archive state.
Regulatory and Compliance
Chain analysis firms use archive nodes to reconstruct transaction flows, identify address clusters, and trace funds across time. Tax reporting tools also need historical state to compute cost basis and capital gains at specific points in time.
EIP-4444 and History Expiry
Ethereum's EIP-4444 (Bound Historical Data in Execution Clients) introduced a significant shift in how nodes handle historical data. As of July 2025, all major execution clients support Partial History Expiry (PHE), which allows full nodes to drop pre-Merge block bodies and receipts. This frees 300 to 500 GB of storage, letting full nodes fit comfortably on 2 TB drives.
The implication for archive nodes: as full nodes shed historical data by default, archive nodes become the primary source for pre-Merge block data. Infrastructure providers like Alchemy, Infura, and QuickNode offer archive access as a premium service, and the demand for self-hosted archive nodes grows among teams that cannot rely on third-party APIs for historical queries.
Why It Matters
Archive nodes represent a tradeoff at the heart of blockchain design: accessibility versus resource requirements. The more historical data a network makes easy to query, the more useful it becomes for analytics, compliance, and development, but the harder it becomes for individuals to run nodes.
This tradeoff is why Bitcoin's UTXO model is often praised for keeping full validation accessible. A Bitcoin full node at ~700 GB stores everything needed to validate the chain and serve historical blocks to peers. Ethereum's richer state model enables smart contracts and complex DeFi, but at the cost of much larger archive requirements.
For Layer 2 networks like Spark, the design of state storage directly impacts node accessibility and decentralization. Keeping historical state manageable ensures that more participants can independently verify the network, reinforcing the censorship resistance that makes blockchains valuable.
Risks and Considerations
Centralization of Historical Access
Because archive nodes are expensive to run, historical blockchain data tends to concentrate among a few large infrastructure providers. If most developers query Alchemy or Infura instead of running their own archive nodes, a service outage or policy change at those providers could disrupt large portions of the ecosystem.
Hardware Costs
Running an Ethereum archive node requires high-performance NVMe SSDs (not spinning disks) due to the random read patterns involved in state queries. A 4 TB NVMe drive suitable for archive use can cost several hundred dollars, and the storage requirement grows with every block. This is a recurring cost that scales with chain activity.
Sync Time
Initial sync for an archive node takes significantly longer than a full node sync. An Ethereum archive node can take days to weeks to sync from genesis, depending on hardware and client. During this period, the node is not useful for queries and consumes significant bandwidth and CPU resources.
Data Integrity
Current-state queries can be verified against the latest block header. A historical query instead asks the archive node for state it preserved long ago, so the client is relying on the node to have kept that data intact. In practice, integrity rests on the same consensus mechanism that validates blocks: each block header commits to its state root via a cryptographic hash, so historical state that has been tampered with would produce a root hash that no longer matches the header for that block.
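The root-hash commitment can be illustrated with a simplified Merkle proof check in Python. This is a stand-in, not Ethereum's scheme: it uses a binary tree with SHA-256 instead of a Keccak-256 Merkle Patricia Trie, and the leaf encoding (`b"acct-a:100"`) and helper names are invented for the example.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    """SHA-256 digest, used here in place of Ethereum's Keccak-256."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Compute the root of a binary Merkle tree over the given leaves."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify_proof(leaf: bytes, proof: List[Tuple[bytes, str]], root: bytes) -> bool:
    """Recompute the root from a leaf and its (sibling_hash, side) path, side 'L' or 'R'."""
    digest = h(leaf)
    for sibling, side in proof:
        digest = h(sibling + digest) if side == "L" else h(digest + sibling)
    return digest == root


# Toy "state" at some past block; the root plays the role of the header's state root.
leaves = [b"acct-a:100", b"acct-b:50", b"acct-c:7", b"acct-d:0"]
root = merkle_root(leaves)

# Proof for the first leaf: its sibling, then the hash of the other subtree.
proof_for_a = [(h(leaves[1]), "R"), (h(h(leaves[2]) + h(leaves[3])), "R")]

print(verify_proof(b"acct-a:100", proof_for_a, root))
print(verify_proof(b"acct-a:999", proof_for_a, root))
```

The genuine leaf recomputes to the committed root, while an altered balance does not. A client holding only the trusted root can therefore detect a tampered historical value without storing the full state itself.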
This glossary entry is for informational purposes only and does not constitute financial or investment advice. Always do your own research before using any protocol or technology.