Solving State Bloat

At their core, blockchains are replicated deterministic state machines, where “state” refers to data that a node must hold to be able to process new incoming blocks and transactions. For the uninitiated, a state machine is a computer science concept that refers to an abstract machine that manages transitions between valid conditions and can be in exactly one of a finite number of states at any given time. In blockchains, there is “a state,” which describes the current state of the ledger, and transactions, which trigger state transitions.

One of the most important concepts to clear up from the get-go is the difference between state and history. State refers only to the data currently in use, that is, the final result after a node has processed all the blocks and transactions from the first (genesis) block to the latest one. Blockchains are ‘immutable’ only in that their history, meaning the data stored in blocks that have already been mined, cannot be changed. The state can therefore be contrasted with history, which represents the information about past events that nodes can save for later rebroadcasting or archiving purposes, but don’t necessarily need in order to validate blocks and transactions or continue processing the chain.

Therefore, a blockchain’s state isn’t immutable, but constantly changing. Whenever a new block is added to the blockchain, every node in the network updates its copy of the ledger to reflect this change, and the consensus protocol ensures that the current state seen by each node is consistently the same. To achieve this, blockchains must ensure that every node sees the same history and processes transactions according to the same rules. Whenever a node deviates from the rules or attempts to cheat the system by proposing fraudulent or invalid transactions, the honest nodes in the system will reject those transactions, and the dishonest node will be punished by getting slashed (in Proof-of-Stake systems) or not receiving block rewards and being left with a hefty electricity bill (in Proof-of-Work systems).
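The relationship between history, state, and deterministic rules can be sketched in a few lines of Python. This is a toy model, not any real chain's logic: the Transfer type, the apply_transaction and replay functions, and the balances are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A toy transaction: move `amount` units from sender to recipient."""
    sender: str
    recipient: str
    amount: int


def apply_transaction(state: dict[str, int], tx: Transfer) -> dict[str, int]:
    """Deterministic transition rule: return the next state or reject the tx."""
    if tx.amount <= 0 or state.get(tx.sender, 0) < tx.amount:
        raise ValueError("invalid transaction")
    new_state = dict(state)  # the old state is left untouched
    new_state[tx.sender] -= tx.amount
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.amount
    return new_state


def replay(genesis: dict[str, int], history: list[Transfer]) -> dict[str, int]:
    """Derive the current state by applying the full history to genesis."""
    state = genesis
    for tx in history:
        state = apply_transaction(state, tx)
    return state


genesis = {"alice": 10, "bob": 0}
history = [Transfer("alice", "bob", 4), Transfer("bob", "alice", 1)]

# Two independent nodes replaying the same history reach the same state.
node_a = replay(genesis, history)
node_b = replay(genesis, history)
```

The history list only ever grows, while the state is overwritten at each step; a node needs only the latest state (plus the rules) to judge whether the next transaction is valid.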

State Management in Bitcoin

To that point, different blockchains manage state and history in different ways. For example, in Bitcoin, state refers to all unspent transaction outputs (UTXOs), where each UTXO represents a specified amount of bitcoin assigned to a specific owner or public address. History, on the other hand, consists of a series of transactions with input(s) and output(s). When a transaction is confirmed in a block, the UTXOs referenced by the transaction inputs are marked as spent and removed from the UTXO collection, and new UTXOs (from the transaction outputs) are added to the collection.
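A minimal sketch of that update rule, assuming a simplified model with no signatures, scripts, or coinbase transactions. The OutPoint, TxOut, and Tx types and the apply_tx function are hypothetical names for illustration, not Bitcoin's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutPoint:
    """Reference to one output of an earlier transaction."""
    txid: str
    index: int


@dataclass(frozen=True)
class TxOut:
    owner: str
    amount: int


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple[OutPoint, ...]
    outputs: tuple[TxOut, ...]


def apply_tx(utxos: dict[OutPoint, TxOut], tx: Tx) -> None:
    """Update the UTXO set in place: remove spent outputs, add new ones."""
    if len(set(tx.inputs)) != len(tx.inputs):
        raise ValueError("duplicate input")
    if any(ref not in utxos for ref in tx.inputs):
        raise ValueError("input is missing or already spent")
    spent = sum(utxos[ref].amount for ref in tx.inputs)
    created = sum(out.amount for out in tx.outputs)
    if created > spent:
        raise ValueError("outputs exceed inputs")
    for ref in tx.inputs:
        del utxos[ref]  # the spent outputs leave the state
    for index, out in enumerate(tx.outputs):
        utxos[OutPoint(tx.txid, index)] = out  # the new outputs join it


# State before: a single unspent output of 50 units owned by alice.
utxos = {OutPoint("tx0", 0): TxOut("alice", 50)}

# Alice pays bob 30 and returns 19 to herself; the remaining 1 is the fee.
tx1 = Tx("tx1", (OutPoint("tx0", 0),), (TxOut("bob", 30), TxOut("alice", 19)))
apply_tx(utxos, tx1)
```

After the call, the output of tx0 is gone from the set and the two outputs of tx1 have replaced it, while tx0 and tx1 themselves both remain part of the history.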

One can imagine the Bitcoin network as a book (transactions) marked with sticky notes (UTXOs). Every time someone spends or receives bitcoin, they write an entry in the book. Each entry is like a page in the book, and once a page or entry is written, it becomes a permanent part of the book. On the other hand, the sticky notes are special notes attached to the pages that indicate only bitcoins that haven’t been spent yet. As bitcoins are spent, some sticky notes are removed while new ones are added to the subsequent pages. The sticky notes collectively represent the state that is constantly changing, while the book represents the history that is immutable.

The key point to understand here is that the text in the book’s pages (TXOs) represents all transactions that have ever happened, whether the coins involved have been spent again or not, while the sticky notes (UTXOs) represent only the coins that haven't been spent yet. In other words, Bitcoin’s history is a detailed record of every transaction, while the current state is just a snapshot of bitcoins that haven’t been spent yet, meaning the latter is a subset of the former. And precisely due to this subset relationship (history and state hold the same type of data), the size of the state is always significantly smaller than that of the historical data.

State Management in Ethereum

In Ethereum, the largest account-based blockchain, the state can be imagined as a giant ledger or spreadsheet, where each row represents an account. Every account has details like its balance (how many tokens it holds) and nonce and, if it's a smart contract, some additional data related to that contract, including contract code and storage. Whenever a transaction happens in Ethereum, nodes update the balances or data corresponding to the respective accounts in the spreadsheet. In this case, Ethereum’s state represents a snapshot of the current information in the spreadsheet, while its history is made up of transactions, which modify the state by editing the targeted accounts.
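The spreadsheet picture can be sketched as a dictionary of account records. This is a simplified illustration that ignores gas, signatures, and contract execution; the Account fields mirror the ones named above, while apply_transfer and the addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """One row of the spreadsheet."""
    balance: int = 0
    nonce: int = 0
    code: bytes = b""  # empty for ordinary (non-contract) accounts
    storage: dict[str, int] = field(default_factory=dict)


def apply_transfer(state: dict[str, Account], sender: str, recipient: str,
                   value: int, nonce: int) -> None:
    """Edit the sender and recipient rows in place."""
    src = state.get(sender)
    if src is None:
        raise ValueError("unknown sender")
    if nonce != src.nonce:
        raise ValueError("bad nonce")  # blocks replays and out-of-order txs
    if value < 0 or value > src.balance:
        raise ValueError("insufficient balance")
    dst = state.setdefault(recipient, Account())
    src.balance -= value
    src.nonce += 1
    dst.balance += value


state = {"0xalice": Account(balance=100)}
history = []

# Five transfers between the same two accounts.
for n in range(5):
    apply_transfer(state, "0xalice", "0xbob", 10, nonce=n)
    history.append(("0xalice", "0xbob", 10, n))
```

In this sketch the history holds five entries while the state still holds two rows, which illustrates why transaction count and state size can grow independently in an account model.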

Without going too deep into the weeds, as can be seen from the above example, Ethereum manages data very differently from Bitcoin. Ethereum’s state is constructed from accounts (not UTXOs like in Bitcoin), and a transaction comprises information that triggers account modification. The state and transactions each record completely different kinds of data, so there is no subset relationship between them. This means history and state refer to data from different dimensions, and transaction history size and state size have no causal relationship.

When a transaction modifies the state, new state is created while the old state is stored as a historical state. Thus, Ethereum’s block headers include separate Merkle roots, one committing to transactions and one committing to state (a Merkle root is a cryptographic commitment that allows the contents of a large data structure to be verified efficiently). As a result, the disk space requirements for Ethereum nodes are higher than in Bitcoin.
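To give a feel for what a Merkle root is, here is a minimal sketch of a plain binary Merkle tree built with SHA-256. It is an illustration only: Ethereum itself uses a different construction (Merkle Patricia tries with a different hash function), and the merkle_root function, the odd-level duplication rule, and the sample leaves are assumptions made for this example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash the leaves pairwise, level by level, until one hash remains."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


transactions = [b"tx-a", b"tx-b", b"tx-c"]
root = merkle_root(transactions)

# Changing any single leaf produces a different root.
tampered_root = merkle_root([b"tx-a", b"tx-X", b"tx-c"])
```

A 32-byte root in a block header is enough to commit to an arbitrarily large set of transactions or accounts, which is why a header can reference both without containing either.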

When discussing node requirements, a common topic is the distinction between Ethereum full nodes and archive nodes. Both can process the chain (validate transactions and secure the chain), but only the latter stores Ethereum’s full transaction and state history. To that point, the disk space required to run an Ethereum full node, which prunes the historical state data and stores only the historical transaction data and current state, sits at around 1.2 terabytes, while an archive node requires about 15 terabytes.

Discarding the historical states does not cause any problems for full Ethereum nodes in terms of processing the chain, because all historical states can be recomputed (albeit at high computational cost) as long as the genesis block and transaction history are known. The meaningful figures to compare are therefore the amounts of data that full nodes must store: around 500 gigabytes for Bitcoin and about one terabyte for Ethereum. Considering that Ethereum is six years younger than Bitcoin, it is clear that the growth rate of its history and state is significantly higher.

State Explosion or Blockchain Bloat: The Problem at Hand

Understanding the state explosion problem in blockchains becomes rather intuitive once data management is contrasted with that of traditional (centralized) cloud computing and storage services like AWS' S3 service.

Namely, to use the AWS S3 service, users need to pay for the storage on an ongoing basis (typically fixed monthly installments per gigabyte occupied), as the data stored keeps occupying scarce disk space on Amazon's servers, and a small fee for every time they want to read from or write to the server, as this necessitates some computation. This data is managed by a single company (in this case, Amazon) and typically stored in giant data centers filled with powerful computers. This allows Amazon to leverage economies of scale and offer extremely competitive pricing for storage and computation.

On the other hand, blockchains achieve decentralization by replication, not distribution. This means that hundreds or thousands of nodes scattered worldwide store and compute blockchain data. This data isn't distributed across many nodes, with each node processing different pieces of it, but is instead replicated across them, meaning each node stores and processes the same data. This architecture, typical of blockchains, ensures robustness and features like permissionlessness and censorship resistance, but it's also a grossly inefficient way of managing data.

This means that blockchains don't scale with new nodes joining the network, unlike centralized data centers that scale linearly with every new computer employed. Instead, the storage and computing capacity of the entire network is always equal to that of a single node. This means that the only way to scale a blockchain on the base layer is to impose higher resource requirements on the full nodes processing the chain, raising operating costs. Doing this, however, means that fewer people can afford to run full nodes, negatively impacting the decentralization of the blockchain. This is obviously unacceptable, as decentralization is the whole point of blockchains.
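The scaling difference can be made concrete with a back-of-the-envelope sketch. The function names and all figures below (node count, disk size, state size) are hypothetical and chosen only to show the shape of the comparison.

```python
def distributed_capacity_gb(node_count: int, disk_per_node_gb: int) -> int:
    """Each machine holds different data, so capacity adds up."""
    return node_count * disk_per_node_gb


def replicated_capacity_gb(node_count: int, disk_per_node_gb: int) -> int:
    """Every node holds the same data, so capacity equals one node's disk."""
    return disk_per_node_gb if node_count > 0 else 0


def replicated_disk_used_gb(node_count: int, state_gb: int) -> int:
    """Total disk consumed network-wide to store one copy per node."""
    return node_count * state_gb


# Hypothetical figures: 1,000 machines with 2,000 GB of disk each,
# and a chain that requires 1,000 GB per full node.
nodes, disk_gb, chain_gb = 1_000, 2_000, 1_000

data_center = distributed_capacity_gb(nodes, disk_gb)
blockchain = replicated_capacity_gb(nodes, disk_gb)
network_disk_used = replicated_disk_used_gb(nodes, chain_gb)
```

Under these assumed numbers the data center offers 2,000,000 GB of usable capacity and the blockchain 2,000 GB, while the network as a whole spends 1,000,000 GB of disk to hold a 1,000 GB chain.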

This is where state explosion starts becoming an evident problem. As the number of blockchain transactions, accounts, and smart contracts grows, so does the blockchain’s state, imposing an ever-increasing burden on full nodes. The bigger the blockchain, the bigger the computing, storage, and cost requirements for full nodes to process the chain, meaning slower processing times, fewer nodes, and weaker decentralization. And while the current full node storage requirements of 500 gigabytes and one terabyte for Bitcoin and Ethereum, respectively, don’t seem absurd at the moment, this is only because these chains haven’t solved the scalability problem and reached true mass adoption.

The Tragedy of The Commons

Considering the above, it quickly becomes evident that there’s a significant imbalance in the design of most major blockchain systems.

Namely, every single full-node operator has to pay for this state size increase by obtaining the proper hardware and the resources necessary to maintain and keep it running. This wouldn't be an issue if the users duly compensated the miners or validators for storing their data–just like they pay centralized data centers–but this isn't the case. Instead, blockchain users pay only a one-time transaction fee and, in return, get permanent usage rights to a robust storage system distributed across countless nodes, which collectively bears the costs for storing this data in perpetuity.

This design imbalance shows up as an economic mismatch between what users pay and what they receive. Blockchain nodes use three types of resources to process transactions: CPU, network bandwidth, and disk space. CPU and bandwidth are intermittent resources, meaning they’re expended every time a node has to compute or verify a block and broadcast it to the network. For this kind of replenishable resource, it makes sense to compensate the nodes with a one-time transaction fee, whose value is based on the cost of the computation and bandwidth resources it consumes.

However, disk space is a long-term occupied resource, meaning it’s permanently occupied by one user’s transaction or smart contract and cannot be used by another user later unless released by the previous owner. This means that users’ data perpetually occupies nodes’ disk space while only compensating them with a one-time transaction fee. This leads to a phenomenon called the “tragedy of the commons,” which refers to a situation where people with unrestricted access to a shared resource will tend to over-use and deplete it to the detriment of everyone, including themselves.

CKB: Fixing the Underlying Incentives Through State Rent

From the above, it becomes clear that blockchains aiming to be decentralized, sustainable, and future-proof need to be designed with state explosion in mind. Many theoretical solutions to the blockchain bloat problem have been proposed, but only a few have been implemented in practice.

One of these is Nervos’ Layer 1, Common Knowledge Base (CKB), which solves the blockchain bloat issue by bounding the size of the blockchain state and imposing targeted state rent.

CKB was designed for modularity from the beginning, meaning it was never a goal to scale on Layer 1 but instead to leverage a multi-layered architecture and offload transaction execution off-chain. This ensures that the Layer 1 can be optimized for value preservation rather than transaction execution, which significantly reduces the amount of data that must be stored by CKB full nodes locally.

CKB puts a hard cap on state growth by tying data storage to its native token, CKByte (CKB). CKBytes represent a right to expand the global state, where one CKByte equals one byte of space on the blockchain. This means the blockchain’s state is bounded by the token supply, making it a scarce resource.

As of October 2023, the total CKByte supply is around 43.2 billion, meaning the chain’s storage capacity is capped at around 43.2 gigabytes. In the four years after the first halving (due in November 2023), issuance will be around 3.444 billion CKB per year, meaning the blockchain’s state will only be able to expand by about 3.444 gigabytes annually. Future halvings (every four years) will further reduce the growth rate, which will eventually settle at a tail emission of 1.344 billion CKB per year.
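The arithmetic behind these caps is a direct unit conversion, sketched below using the article's figures and the one-CKByte-per-byte rule. The constant and function names are illustrative, and a gigabyte is assumed to mean 10^9 bytes.

```python
BYTES_PER_CKBYTE = 1          # one CKByte entitles its holder to one byte
BYTES_PER_GIGABYTE = 10**9    # decimal gigabyte (an assumption)

TOTAL_SUPPLY_CKB = 43_200_000_000        # as of October 2023
ANNUAL_ISSUANCE_CKB = 3_444_000_000      # per year after the first halving
TAIL_EMISSION_CKB = 1_344_000_000        # long-run issuance per year


def state_capacity_gb(ckbytes: int) -> float:
    """Maximum state, in gigabytes, that the given CKBytes can occupy."""
    return ckbytes * BYTES_PER_CKBYTE / BYTES_PER_GIGABYTE


current_cap_gb = state_capacity_gb(TOTAL_SUPPLY_CKB)
annual_growth_gb = state_capacity_gb(ANNUAL_ISSUANCE_CKB)
tail_growth_gb = state_capacity_gb(TAIL_EMISSION_CKB)
```

These are upper bounds: the cap is reached only if every CKByte in existence is locked to store state, so the state actually occupied at any time will be smaller.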

State on CKB is directly owned and controlled by users, so it is effectively privatized. State is stored in cells, which are first-class citizens similar to UTXOs in Bitcoin. Storage space on CKB effectively becomes like land: its total size is limited, and users must own and lock CKB to occupy it.

Understanding how state rent is imposed on CKB requires an understanding of CKB issuance.

While CKB has a “primary issuance” mechanism, which undergoes halvings and is directed solely toward miners, there is also a “secondary issuance” of 1.344 billion CKB annually (in perpetuity), which is split between miners, depositors in a protocol-level smart contract (the NervosDAO), and a treasury fund (currently being burned).

When a user elects to utilize their CKB to store data on-chain, a small (proportional) portion of issuance is directed to miners. Instead of mandating periodic rent payments from state occupiers to compensate miners, CKB imposes rent through targeted inflation. This slight dilution represents the “inflation tax” or the ongoing rent that state occupiers pay miners for their ongoing services.

Long-term CKB holders (not storing data on-chain) can lock their CKB in the NervosDAO to receive this portion of secondary issuance (nullifying dilution). When the “state tenants” no longer need to utilize the scarce state space, they can consume the cells that occupy it (state pruning), release the locked CKB tokens, and deposit them into the NervosDAO to stop paying the state rent.
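The rent mechanism can be sketched as a pro-rata split of secondary issuance. The split rule used here (each of the three groups receives a share proportional to the CKB it accounts for) is an assumption based on the "proportional" wording above, and the occupied, deposited, and liquid amounts are hypothetical; only the 1.344 billion annual figure comes from the article.

```python
SECONDARY_ISSUANCE_PER_YEAR = 1_344_000_000  # CKB, from the article


def split_secondary_issuance(occupied: int, deposited: int, liquid: int,
                             issuance: int = SECONDARY_ISSUANCE_PER_YEAR
                             ) -> dict[str, float]:
    """Split one year of secondary issuance pro rata (assumed rule).

    occupied  -- CKB locked in cells to store state (share goes to miners)
    deposited -- CKB deposited in the NervosDAO (share goes to depositors)
    liquid    -- all remaining CKB (share goes to the treasury)
    """
    total = occupied + deposited + liquid
    if total <= 0:
        raise ValueError("total supply must be positive")
    return {
        "miners": issuance * occupied / total,
        "dao_depositors": issuance * deposited / total,
        "treasury": issuance * liquid / total,
    }


# Hypothetical breakdown of a 43.2 billion CKB supply.
occupied, deposited, liquid = 5_000_000_000, 10_000_000_000, 28_200_000_000
shares = split_secondary_issuance(occupied, deposited, liquid)

# Annual dilution rate faced by anyone who receives none of this issuance.
dilution_rate = SECONDARY_ISSUANCE_PER_YEAR / (occupied + deposited + liquid)
```

Under this assumed rule, NervosDAO depositors receive new CKB at exactly the dilution rate and so break even, while state occupiers are diluted at that rate with their share flowing to miners, which is the rent.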

In summary, CKB solves the state bloat issue by (i) adopting a multi-layered architecture, reducing the data stored on-chain by moving transaction execution off-chain; (ii) limiting state growth by tying it to the native token; (iii) making state a first-class citizen on the blockchain, effectively privatizing it and enabling state pruning and an economy around state; and (iv) implementing state rent through targeted inflation, which solves the common economic misalignment of users paying a one-time fee to occupy state on-chain forever.

To learn more about CKB, visit nervos.org.

This post is commissioned by Nervos and does not serve as a testimonial or endorsement by The Block. This post is for informational purposes only and should not be relied upon as a basis for investment, tax, legal or other advice. You should conduct your own research and consult independent counsel and advisors on the matters discussed within this post. Past performance of any asset is not indicative of future results.
