Bitzuma

How Bitcoin Works: A Technical Overview for Beginners

By Rich Apodaca | Updated

If you plan to spend significant time or money on Bitcoin, few investments will pay greater dividends than learning how the entire system works at a high level. This extended guide offers a unique, visual approach to thinking about how and why Bitcoin works. No experience with cryptography or programming is necessary.

This article is based on the first chapter of the e-book Owning Bitcoin.

Simplifying Bitcoin

It’s easy to get lost in Bitcoin’s apparent technical complexity. To cut through the confusion, it’s helpful to recognize that everything that follows can be traced to one of four technologies, all of which were created years before Bitcoin’s introduction:

  • public key cryptography, a system for secure data exchange first publicly described in the late 1970s;
  • secure hash-based identifiers, the first generation of which was also implemented in the late 1970s;
  • peer-to-peer networks, a way of connecting computers that gained widespread public attention in the late 1990s through the Napster file sharing service;
  • proof-of-work, a spam deterrent first described in 1997.

Satoshi Nakamoto’s 2008 white paper introducing Bitcoin is a masterpiece of clarity. In just nine pages, it describes a new form of algorithmically-governed money that needed no trusted custodians.

But the white paper was written for an audience of experts. As such, it leaves out a great deal of vital background information. This article fills in the gaps. After reading it, you’ll understand Bitcoin’s technical underpinnings and will have a solid foundation for learning more advanced topics.

Electronic Cash

People have used cash systems for centuries, trading physical tokens for goods and services. Until Bitcoin, most of us only ever used cash tokens made of paper and metal. Breakthroughs in computers and computer networks make a new kind of token possible: electronic cash.

Electronic cash is a system of money based on tokens created not from paper and metal, but from digital data. Like the physical tokens after which they’re modeled, digital tokens can be exchanged for goods and services. With no physical presence, however, digital tokens offer many practical advantages.

SVG Image
Cash Transaction. Alice gives Bob one coin in exchange for one apple. This exchange of value requires no middleman.

Imagine that Alice wants to buy an apple from Bob’s fruit stand using physical cash. After agreeing on a price, Alice gives Bob a coin as payment. In return, Bob gives Alice the apple. After this exchange, Alice has a new apple, and Bob has a new coin. Notice that Alice and Bob traded directly with each other. They needed neither a broker nor bank.

Now imagine that Alice once again wants to buy an apple from Bob’s fruit stand, this time paying with electronic cash. After agreeing to a price, Alice needs to give Bob an electronic token as payment. One reason to trade this way might be convenience. Neither party would need to worry about transporting or storing physical cash. Another reason might be to trade remotely and at scale. For example, Alice could directly pay Bob for one apple or a truckload of apples from a distance.

For this idea to work, Alice and Bob need to answer some tough questions, including:

  • Who mints digital tokens and how?
  • How are digital tokens distributed after minting?
  • How does the owner of a digital token give it to another person, and what prevents the original owner from taking it back?
  • What happens when the value of a digital token exceeds the value of a good or service being bought?
  • What prevents the total value of all tokens from growing uncontrollably?
  • What protects digital tokens against theft, forgery and duplication?
  • Who keeps this system running and what keeps these people honest?

Bitcoin is often mistaken as just another electronic payment system like Visa or PayPal, but there are two critical differences. First, these other systems don’t use electronic tokens as a medium of exchange. Instead, a value representing local currency is deducted from a customer account and credited to that of a merchant. Second, these other systems require multiple levels of trusted parties between buyer and seller to prevent fraud.

Electronic cash isn’t a new idea. The early days of the Internet saw a few research efforts dedicated to developing electronic cash systems. All of them failed. The idea that a scarce digital token could be securely transferred, person-to-person, in exchange for goods and services began to seem like a mirage.

Milton Friedman on Electronic Cash (1999). Many foresaw the possibility of electronic cash, but it would take years to develop the technology.

By 2008, the pessimists had won. Electronic cash seemed so unworkable that Satoshi Nakamoto’s white paper on the subject was mostly ignored by the experts at the time.

Digital Coins

Electronic cash is a system of money based on tokens made of digital data. Although these tokens lack physical form, they nevertheless posses well-defined properties. These properties are embodied within the idea of a digital coin.

A digital coin, referred to throughout this article as simply a “coin,” is a packet of data assigned a fixed face value and equipped with security features. These properties allow digital coins to be given to others, accepted as payment, stored for future use, and even lost or stolen.

Most metal coins are marked with a fixed number representing face value. Face value is a number expressed in terms of a currency unit. For example, a quarter dollar is a metal token with a face value of 25 cents (¢). “Cent” is the currency unit in which the quarter dollar is denominated.

Like a metal coin, a digital coin carries a fixed face value. However, this face value is denominated in the unit “bitcoin” (spelled with a lower case “b”). The abbreviation for this unit used here will be “฿” but others are available. In this article, the unit “bitcoin” is used as a mass or “uncountable” noun, meaning that only its singular form is used. For example, one says that Alice bought 1.3 bitcoin, not that she bought “1.3 bitcoins.” Another unit, the satoshi, is one hundred million times smaller than bitcoin (฿0.00000001). This unit is considered a countable noun.

Some sources incorrectly use the words “coin” and “bitcoin” interchangeably. A coin is a digital token, but bitcoin is the unit of face value in which the token is denominated. Just as “quarter” is the correct way to refer to a metal token valued at 25 cents, so is “coin” the correct way to refer to a digital token with any face value.

SVG Image
Point of Confusion. A “coin” is a digital token, but “bitcoin” is the unit of face value associated with it.

Digital coins are created in a wide assortment of denominations. For example, one coin might be marked with a face value of ฿1.344455. Another coin might be marked with a face value of ฿1.01. A third coin’s face value might be ฿0.00009431, and so on. At the other end of the scale, a single coin can carry a face value equal to the entire money supply, although such a coin is unlikely to ever occur in practice. Manufacturing and storage costs limit the denominations in which metal coins are minted, but digital coins avoid this limitation.

Multidenominational. Due to manufacturing and handling costs, US banknotes and coins are printed in a limited range of denominations. Electronic coins have no such constraints on face value.

A coin becomes useful, and therefore valuable, through a combination of scarcity and transferability. Assigning a face value to a coin is an important first step, but still not enough to create electronic cash. To understand how and why digital coins can be used as a medium of exchange, it’s important to understand how they’re secured.

Chain of Ownership

Physical cash systems rely on obscure materials and manufacturing processes to prevent counterfeiting. Although electronic cash systems can’t use these techniques, they can take advantage of the unique properties of digital information. To establish the authenticity of a coin, electronic cash systems use a chain of ownership.

A chain of ownership is a sequential list of every previous owner of a given coin. On creation, a coin’s chain of ownership lists a single owner – the issuer. Transferring ownership causes a new owner to be added to the list. Each transfer extends the chain of ownership by one entry. The last entry names the coin’s current owner.

SVG Image
Chain of Ownership. Control of a digital coin passes from one owner to the next through a chain. Bob (right), the coin’s current owner, confirms the authenticity of a coin Alice (center-right) gives him by tracing its previous owners back to a recognized issuer (left). Each transfer extends the chain by one entry.

When a user receives a digital coin, its authenticity can be verified by consulting its chain of ownership. The coin is only considered authentic if its chain of ownership can be traced to a recognized issuer. A broken, damaged, or forged chain of ownership invalidates the coin.

A similar idea comes from the world of art. The authenticity of a work can be established in part through its provenance, or chronology of owners. A potential buyer of a work can review previous ownership records back to the original artist, cross-checking this information with other historical sources. An intact provenance supports claims regarding the origin of a work, often increasing its value. Similarly, a fragmented or missing provenance raises questions about a work’s authenticity.

Provenance. Sotheby’s considers provenance as one of ten criteria when valuing art.

Clearly, a user who can forge a chain of coin ownership can create money out of thin air. For this reason, electronic cash systems need ways to allow users to detect fake chains of ownership. As we’ll see later, a significant portion of the Bitcoin network’s activities revolve around addressing this problem.

Chain of ownership as described here only works if three fundamental problems can be solved:

  • Each user need an unambiguous identity that can be added to a coin’s chain of ownership.
  • A payer needs to securely update the chain of ownership when giving a coin to a payee.
  • The issuer of a coin must be chosen in a transparent, fair, and secure way.

The next several sections describe solutions to the first two problems. The solution to the last problem is described in the final section.

Public Key Cryptography

A coin’s authenticity can be verified with a chain of ownership. However, doing so requires that users identify themselves and securely update the chain of ownership. One approach is to recast these requirements as a messaging problem. In other words, a user must create a message expressing intent to transfer ownership of a coin to another user. The message must be unforgeable, non-repudiable, and verifiable. Such messages can be created with a set of tools collectively known as public key cryptography.

Public key cryptography is a message authentication system that allows users to detect tampering and forgery. The sender of a message generates two mathematically linked keys: a private key to be kept secret; and a public key given to other users. The author of a message signs it with his or her private key. Recipients authenticate the message signature using the sender’s public key. Authentication fails if either the message or signature are modified. Chapter 2 of Owning Bitcoin describes Bitcoin’s public key cryptography system in detail.

SVG Image
Public Key Cryptography. Alice (top) digitally signs a message for Bob (bottom) using a signature algorithm (middle). First, Alice generates her public key from her private key and gives her public key to Bob. Using her private key, Alice then generates a signature. Finally, Bob uses Alice’s public key to verify her message’s signature.

Alice and Bob can use public key cryptography to authenticate messages passed between them. Imagine that Alice wants to send a message to Bob telling him she’ll stop by his fruit stand at 10:00 AM to pick up a crate of apples. However, Bob suspects Chuck, his unscrupulous competitor, of sending him bogus customer messages to waste his time. After discussing the problem, Alice and Bob agree to use public key cryptography. The goal isn’t secrecy, but rather to ensure that Chuck can’t impersonate Alice.

Alice begins by giving Bob her public key during a visit to the fruit stand. Later that week, Alice composes a text message proposing a meeting, signing it with her private key. She sends the message and signature to Bob. Bob then uses Alice’s public key to verify the signature. If verification fails, Bob knows the message is fake.

SVG Image
Message Tampering. Any change to Alice’s message invalidates the signature she gave Bob.

Even if Chuck intercepts Alice’s message, he can’t change it without invalidating the signature. Altering just one character in the message creates a mismatch. The same thing happens with the slightest alteration to the signature. Chuck could decide to throw away Alice’s signature, edit the message, and sign with his own private key. However, the deception would easily be detected by Bob, who would notice that the signature doesn’t match Alice’s public key.

Public key cryptography can secure many kinds of messages, including financial transactions. For example, Alice and Bob can use a signed message to transfer ownership of a coin as payment for a delivery. Alice begins by asking Bob for his public key. She then drafts a message naming Bob as the new owner of the coin, and signs with her private key. Alice’s signed message demonstrates to all observers that Bob now owns the coin.

SVG Image
Giving a Coin with a Signed Message. Alice composes and signs a message giving her coin to Bob, who she identifies by his public key.

Eventually, Bob might decide to use the coin to pay Carol, an employee. He begins by asking Carol for her public key. Bob drafts and signs a message of his own that names Carol as the new coin owner. He then gives Carol this message together with Alice’s old message as payment. Carol verifies the signatures of both messages using the public keys of Alice and Bob, respectively.

SVG Image
Chain of Ownership with Signed Messages. Bob uses a signed message to pay Carol (right) with a coin he previously received through Alice’s signed message (left). The two messages constitute a primitive chain of ownership.

Using both messages together makes it possible to deduce that Alice gave Bob a coin that he then gave to Carol. In other words, the two messages define a weak chain of ownership. Strengthening this system requires that Bob’s message refer to Alice’s message, among other improvements.

Transactions

Public key cryptography allows authorship of a given message to be proven. However, this technology can’t by itself create a secure chain of ownership. This problem can be solved by replacing freeform messages with transactions.

A transaction is a message encoded in a standard format that transfers ownership of a coin from one user to the next. Each coin explicitly references a previous coin, creating a chain. A user authorizes ownership transfer by signing with a private key. Software can quickly and efficiently process transactions because each one uses the same well-defined format.

SVG Image
Simplified Transaction Chain. Bob pays Alice with a coin valued at ฿2 (right). To do so, he spends the coin previously given to him by Alice (left), which in turn references a previous coin. The two transactions define a chain of ownership. A payment explicitly references the coin being spent, as denoted by the direction of arrowheads. This convention is used throughout this article, but Satoshi’s white paper reverses it.

This method of ownership transfer requires a method for uniquely identifying each coin. Only then can the next transaction refer to the coin being spent. Bitcoin solves this problem by giving each coin a unique identifier called an outpoint. The way in which outpoints are assined will becom clear shortly. For now, the most important things to remember are that changing the transaction (by changing the value of a coin, for example), changes the outpoint, and that outpoints can be computed independently of any central authority.

Imagine that Alice wants to pay Bob for a crate of apples using a transaction instead of a freeform message. She begins by obtaining the outpoint of the coin coin she wants to give to Bob. Alice then requests a public key from Bob. Combining these two pieces of information, Alice drafts and signs a transaction. She then gives Bob the full chain of ownership for her coin as payment. Bob reviews the chain of ownership for completeness, then verifies all of the signatures. Finding no errors, Bob accepts Alice’s payment.

Explicitly including the outpoint of each coin being spent in each transaction makes the chain of ownership tamper-evident. For example, modifying any intermediate transaction changes the coin’s outpoint, breaking the reference from its child. In a similar manner, the insertion or deletion of owners within the chain can also be detected.

SVG Image
Tamper-Evident Chain of Ownership. Modifying the coin Bob is trying to spend by chaning the recipient (left) changes its outpoint, breaking the link to Bob’s transaction (right).

With an understanding of the close relationship between chain of ownership and outpoints (coin identifiers), it’s now time to take up the question of how to generate them. The next three sections describe some basic math to get us there. It turns out that this math is used throughout Bitcoin’s design as well.

Numerical System

Transactions can be organized into a chain of ownership only if each coin possesses a unique, content-dependent identifier. Such identifiers are useful not just for chains of ownership, but for many other elements of electronic cash systems. The first step to generating these identifiers is to choose a numerical system.

A numerical system represents a value using a sequence of digits. Place-value notation is a familiar numerical system in which the relative position of a digit determines the maximum value it can contribute. This contrasts with other numerical systems, such as Roman numerals and tally marks, which lack the idea of place-value. Numerical systems play a critical role in the creation and manipulation of unique identifiers.

SVG Image
Decimal Representation. Each digit to the left encodes the next higher power of ten.

The most familiar place-value numerical system, based on increasing powers of 10, is known as decimal representation. A number such 1,234 is represented by four digits whose contribution to the total increases from right to left. The digit 4 at the right adds four times ten to the zero power to the total (4×100). Recall that any number raised to the zero power is one. The digit 3 one place to the left adds thirty to the total (3×101). The digit one place more to the left, 2, contributes 200 to the total (2×102). Finally, the leftmost digit, 1, contributes 1,000 to the total(1×103). Each digit contributes more to the total value than the one to its right.

Decimal representation is convenient for humans who have ten fingers, but less so for digital computers. Instead of ten fingers, a computer has only two fundamental properties to count with: “off” and “on.” A numerical system based on the number two isn’t just possible, but can be very efficient as well. This system is known as binary representation.

SVG Image
Binary Representation. Each digit to the left encodes the next higher power of two.

In binary representation, the value of digits increases in powers of two from right to left. The rightmost digit adds ones (20). The digit immediately to the left adds twos (21), and the next digits add fours (22), eights (23), sixteens (24), and so on. A binary digit (either 0 or 1) is also known as a bit.

Consider the four bit number represented as 0b1101. To avoid confusion with decimal representation, binary representation uses the prefix 0b. Examples include: 0b01; 0b1011; and 0b101010. The rightmost bit of the binary number 0b1101 adds one to the total (1×20). The digit 0 one place to the left adds zero (0×21). One more place to the left, the digit 1 adds four to the total value (1×22). The leftmost digit, 1 adds eight to the total (1×23). Summing the value contributed by each bit gives a total of 13. In other words, the value of the binary number 0b1101 is equal to the value of the decimal number 13.

Binary representation solves the counting problem for computers, but can produce long sequences of digits that are cumbersome to work with. In the example above, the value represented by two decimal digits (13) required four binary digits (0b1101). Unique identifiers require very large values, so a more compact numerical system would be useful. One system that combines the brevity of decimal notation with binary compatibility is known as hexadecimal representation.

SVG Image
Hexadecimal Digits. Hexadecimal representation is based on the number sixteen, a power of two. This correspondence yields a close relationship between binary and hexadecimal representations.

Hexadecimal representation is a place-value numerical system based on the number 16. Hexadecimal digits include all of the valid decimal digits 0 through 9. Additionally, hexadecimal digits include the six letters a through f. The digit a represents the decimal value 10, the digit b represents the decimal value 11, and so on up to the digit f, representing the decimal value 15. Capitalization of these letter digits is optional and has no significance.

To avoid confusion with values expressed in other numerical systems, hexadecimal representation uses the prefix 0x. Usually, odd digit sequences are padded with a leading 0. Examples of hexadecimal representation include: 0x02; 0x22; and 0x20f0.

SVG Image
Hexadecimal Representation. Each digit to the left encodes a higher power of sixteen.

Consider the conversion of the four-digit hexadecimal representation 0xabcd into decimal representation. The rightmost digit, d, contributes 13 (13×160) to the number’s value. The next digit to the left, c, contributes the decimal value 192 (12×161) to the total. One more place to the left, b, contributes a decimal value of 2,816 (11×162). The leftmost digit, a, contributes a decimal value of 40,960 (10×163). Adding the individual contributions of each digit gives a total decimal value of 43,981.

The close relationship between hexadecimal and binary becomes apparent by recalling that the decimal number 16 is itself a power of two (24). In other words, four binary digits (bits) can represent all 16 hexadecimal digits. A number in binary notation can be converted to hexadecimal notation by dividing its bit sequence into groups of four, starting from the right. For example, the bit sequence 0b1101 (decimal 13) can be represented as the single hexadecimal digit d.

SVG Image
Byte. A block of eight sequential bits forms a byte, which can also be represented with two hexadecimal digits.

A group of eight bits, or two hexadecimal digits, has special significance and is most commonly known as a byte. For example, the one-byte binary number 0b11100010 can be represented in hexadecimal notation as 0xe2. As a measure of data storage and network bandwidth, bytes have worked their way into everyday language through the units megabyte (one million bytes) and gigabyte (one billion bytes).

Endianness. A multi-byte numerical value (0x0a0b0c0d) may be stored using either a big-endian (left) or little-endian (right) convention.
Value Big Endian Index Little Endian Index
0x0a 0 3
0x0b 1 2
0x0c 2 1
0x0d 3 0

A computer storing a number as a sequence of bytes needs to decide which endianness to use. Endianness refers to the order in which bytes appear in an indexed storage system such as computer memory. Big-endian systems place the most significant byte at the lowest index. The most significant byte is the one that contributes the most to the overall value. For example, the most significant byte of the hexadecimal number 0x01ff is 0x01. In a big-endian system, this byte would be placed at index i and the next most significant byte, 0xff would be placed at index i + 1. Little-endian notation reverses this convention, with the most significant byte stored at index i and the next most significant byte stored at index i - 1. In both big-endian and little-endian systems bit order follows byte order. Both big-endian and little-endian storage may present in an electronic cash system. A good analysis of bit and byte ordering has been published.

Numerical systems and text are linked through a character encoding. A character encoding is a two-way correspondence between a set of characters and a set of numerical values. Character encodings have a long history and can trace their origins to such systems as Braille and Morse code.

SVG Image
ASCII. English letters, common symbols, and control characters are mapped to seven-bit numbers. For this reason, the highest hexadecimal value for an ASCII symbol is 0x7f (127 decimal).

One of the most widely-used numerical character encodings is the American Standard Code for Information Interchange (ASCII). ASCII represents the letters and common symbols of the English alphabet as a seven-bit numerical code. This code conveniently lies within a byte’s 8-bit capacity with one bit to spare. ASCII is a subset of the more universal standard Unicode. Character encodings like ASCII and Unicode allow both numbers and text to be used interchangeably within numerical procedures.

Hash Functions

The security of an electronic cash system increases by linking messages such as transactions together through unique IDs. Centralized registries, such as the kind operated by various governments, offer one way to issue unique IDs. However, the potential for corruption and service outage — not to mention bureaucratic inefficiency — makes centralized registries unsuitable for electronic cash. A better solution is available through hash functions.

A hash function accepts an arbitrary message as input and produces a numerical hash value as output. The message can be composed of any binary data of any length, including text, images, audio, and video. The hash value is deterministically selected from a large range and appears to be random to all observers. As such, hash functions can generate a unique identifier for any piece of digital data. Hash values are most commonly expressed in hexadecimal notation.

SVG Image
Hash Function. A message of any length (left) is converted to a numerical hash value (right) via an imaginary hash function (center).

Hash functions can be understood more concretely by thinking in terms of a random oracle. A random oracle consists of a box containing a gremlin, a book, a pencil, a stack of index cards, and a metal coin. Into one side of the box is cut an input slot through which a user can slide a message written on a piece of paper. Opposite this slot is cut another slot through which the gremlin slides a result written on one of the index cards. Each page of the book is divided with a vertical line; to the left is the heading “Message” and to the right is the heading “Result.”

SVG Image
Random Oracle. A box that generates random output for new messages and consistent output for previously-seen messages. Inside the box is a gremlin that uses coin tosses to generate random output and a book of previous messages to ensure consistency.

The oracle’s purpose is to produce what appears to be random output for new messages, but the same output whenever a message is resubmitted. To this end, the gremlin uses the following procedure. When a new message is received through the input slot, the gremlin consults the book. For each page in the book, the Message column is scanned for a match to the message that was just received. If a match is found, the gremlin transcribes the corresponding entry in the “Result” column to an index card, and pushes the card through the output slot. If the message is not found, the gremlin opens the book and creates a new entry after the last. Under the “Message” column, the gremlin transcribes the message. Then the gremlin takes a new index card and begins a series of 16 coin tosses. For each heads toss, the gremlin appends a “1” to the card. For each tails toss, the gremlin appends a “0”. After completing its coin tosses, the gremlin transcribes the result from the index card into the book under the “Result” column, and pushes the card through the output slot.

Imagine that Alice wants to test the oracle with a series of challenges. First, she writes the message, “To the moon!” and inserts it through the input slot. The oracle responds with an index card on which is written a sequence of 16 ones and zeroes. Using hexadecimal notation, Alice converts the result to the 16-bit integer 0xf1c3. Every time Alice submits the “To the moon!” message, she receives the same result from the oracle: 0xf1c3. Alice then writes a second message, “We set sail on this new sea.” and inserts it through the input slot. The oracle responds with an index card on which is written a binary sequence equivalent to the 16-bit integer 0x2a06. Each new message results in a different result, but Alice can decipher no pattern to the output.

Like the random oracle, hash functions offer two properties that make them attractive for issuing unique IDs:

  • Consistency. The same combination of hash function and message will always produce the same output. This ensures that a given piece of digital data always produces the same hash value output.
  • Fixed-width output. A hash function will always produce output within a constant numerical range. As such, the storage requirements for a hash value are always known.

Taken together, these two properties allow a hash function to assign a permanent, fixed-width identifier to any piece of digital data.

Although the random oracle appears in many interesting and useful thought experiments, it exists as fantasy only. What’s needed is a way to mathematically replicate the behavior of the random oracle. Fortunately, the last few decades of cryptographic research have produced several hash functions to choose from. Even better, the principles underlying hash functions can be ignored for most intents and purposes. Those interested in the details might consult this excellent reference.

SVG Image
Hash Collision. Two different messages yield the same hash value.

When used to issue unique IDs, a hash function must avoid generating collisions. A collision occurs when two different messages yield the same hash value. Collisions break the fundamental requirement of ID uniqueness, weakening the security of the electronic cash system.

SVG Image
Range. Increased range (right) gives more possible hash values, and in principle, a lower collision frequency.

One way to minimize hash value collisions is to increase the output range. Range refers to the largest value that a hash function can produce, typically measured in bits. For example, a hash function capable of 16-bit output can produce at most 65,536 (216) hash values. Although widening the output range can decrease the collision rate, adding bits increases storage and transmission costs. The best hash function offers a good tradeoff between storage costs and collision rate.

SVG Image
Uniformity. A uniform hash function (right) spreads hash values evenly over the available range.

Another way to reduce a hash function’s collision rate is to maximize uniformity. Uniformity refers to how evenly distributed hash value are. For example, a hash function capable of 32-bit output that consistently produced a single value would have very poor uniformity despite a large range. Any two messages would produce a collision 100% of the time, even though a much lower collision rate is feasible. To take full advantage of its output range, a good hash function ensures the widest possible distribution of values.

Regardless of the quality of a hash function’s output, all are subject to limitations imposed by the birthday paradox. Specifically, an ideal hash function such as the random oracle will generate a collision between two random messages in roughly one out of every 2n/2 attempts, where n is the number of bits in the output value. In the case of a hash function returning 16-bit output, a collision can be expected once every 216/2 (256) attempts. At best, adding one bit to the output of a hash function decreases collision frequency by a factor of √2.

In addition to collision resistance, a good hash function resists preimage attacks. Here, an attacker tries to find a message whose hash value matches that of a known message. An attacker for whom a preimage attack becomes practical can rewrite messages linked together via their hash values. Whereas collision attacks benefit from the birthday paradox, preimage attacks do not. In other words, an ideal hash function producing 16-bit output will require around 65,536 attempts before succeeding.

SVG Image
Preimage Attack. A message chain can be forged by finding a new message (B’, red) with the same hash value ID as one of the chain’s members (B, white). An outside observer will consider chains A-B-C and A-B’-C to be equivalent.

Examples of hash functions generally regarded as secure include SHA-256 and RIPEMD-160. SHA-256, developed by the US National Security Agency (NSA), yields hash values 256 bits (32 bytes) wide. RIPEMD-160, developed by an academic group from Belgium, yields hash values with a width of 160 bits (20 bytes). For this reason, both SHA-256 and RIPEMD-160 are widely-used to issue unique IDs in electronic cash systems.

A bit width of only 256 may at first appear woefully too narrow to support unique identifiers. It helps to consider the magnitude of this number in relation to a familiar reference point. The value 2256 is equal to approximately 1077 when expressed in decimal format, or the number “1” followed by 77 zeroes. This number is so vast that just counting that high with an extremely efficient computer would consume the combined energy output of the sun for many centuries. In other words, it’s impossible to even enumerate the values between one and 1077, much less execute a hash function that many times.

A good way to understand how hash functions work is to experiment with them interactively. One resource for doing so is the SHA-256 Online calculator. This tool supports a wide range of hash functions, including SHA-256 and RIPEMD-160.

For simplicity, the rest of this article will use short decimal numbers when discussing hash-based identifiers. Keep in mind, however, that these identifiers actually represent the output of a secure hash function.

Merkle Trees

Hash functions solve the important problem of uniquely identifying a given digital message. Sometimes, however, a collection of messages needs to be identified and referenced as an ordered list. This problem can be solved through Merkle trees.

A Merkle tree is a tamper-evident data structure that allows a collection of messages to be identified and queried as a single unit. Adding, removing, editing, or reordering the collection’s members produces detectable changes. Merkle trees are used extensively in electronic cash systems as a way to ensure the integrity of a list of messages.

SVG Image
Merkle Tree. Three items (A, B, and C) are composed into a Merkle tree. Items at each level are paired and concatenated (“|”). The concatenation is then hashed. Unpaired items are paired with themselves (e.g., C).

Construction of a Merkle tree follows a simple procedure. First, the list of messages is sorted in some way. For example, they may be sorted in order of receipt. Next, the hash value of each message is computed. This yields the bottom level of the tree. In order of appearance, each hash value is paired with its successor. If no successor exists due to odd level membership, then the last hash value is paired with itself. Appending the second member of a pairing to the first and hashing the result generates the next level.

Pairing and hashing continues until only one hash value remains — the Merkle root. The Merkle root is the top level hash value of a Merkle tree. Editing the collection of messages on which a Merkle tree is based changes the corresponding Merkle root. Even reordering messages changes the Merkle root. Given a Merkle root, a list of messages can be decompose, individually transmitted, and reassembled. The receiver can use the Merkle root to detect any changes during transmission.

The utility of Merkle trees may not be readily apparent. After all, a unique identifier for a collection can be generated naively by concatenating all members together and hashing. The resulting value will change if the underlying collection changes in any way.

This approach works best when lists are small and the entire list is required. However, these two conditions rarely hold in electronic cash systems. Often, we’d like to prove that a small number of messages belong to a much larger list. The reasons will become clear in later installments of this series.

For now, imagine trying to prove that a message belongs to a very large list using the naive approach. The only way to do so would be to request every message and confirm that the one to test is among them. Hashing the collection and comparing the result with its identifier would demonstrate that the message was a member. The main problem is that the expense of performing this test increases linearly with collection size. Although small collections could be handled with ease, larger collections may become impractical to query.

SVG Image
Merkle Path. The bottom level contains hashes of seven items in a list. To prove the membership of item A, only its hash value and three others are needed (B, C, D). Computed values (orange) complete the Merkle path.

Merkle trees offer far more efficient membership checking when used together with a Merkle proof. A Merkle proof consists of the hash values needed to construct the branch (or path) of a Merkle tree containing a message of interest. Instead of requiring all items of a collection to test for membership, a Merkle proof can be completed with only some of them.

Merkle proofs for a given message are constructed using a modification of the procedure for constructing the full tree. First, the messages of a collection are sorted and paired as before. The hash value for the sibling of a message of interest is then added to the proof. The tree is then walked upward. At each level, the missing hash value is added to the proof until the Merkle root itself is added. These hashes, together with the index of the item of interest, are conveyed as the proof.

Given a Merkle proof and message, membership can be tested as follows. First, the message is hashed. The resulting hash value is then concatenated with the bottommost sibling hash value of the proof. The item’s index reveals whether it’s the right or left item of the pairing. The concatenation is hashed, yielding the parent hash value. Concatenation and hashing continues up the chain. Obtaining the Merkle root in the last step proves the message’s membership in the list.

Hash functions and Merkle trees enable messages to be grouped efficiently and securely. Many sophisticated behaviors that electronic cash systems exhibit can be traced to this simple principle.

Inputs and Outputs

Physical cash payments rarely involve just a single coin or banknote. Instead, value is combined by merging multiple tokens and split with change. Informed by this everyday observation, Bitcoin transactions can be made more useful by supporting inputs and outputs.

A transaction output represents a specific quantity of monetary value combined with a chain of ownership. In this sense, an unspent output behaves like an electronic coin. A transaction may contain one or more outputs, allowing value to be split as needed.

SVG Image
Transaction Inputs and Outputs. Alice spends three coins (inputs) with a total value of ฿6, producing two new coins (outputs) valued at ฿5 and ฿1, respectively. Each input bears Alice’s signature and the outpoint of a single coin to be spent. The transaction ID is computed as the hash value of the transaction.

A transaction input spends the value stored at a previously unspent transaction output. More than one input may be added to a transaction, allowing the value of coins to be combined if necessary.

Inputs fund a transaction, whereas outputs spend the funds. The combined value of all transaction inputs must be greater than or equal to the combined value of all outputs. Otherwise, value could be created arbitrarily.

An outpoint was previously described as a unique coin identifier. It’s now possible to reveal the composition of this identifier. An outpoint is composed of two values:

  1. The 256-byte hash value of the hosting transaction. Because these values require 32 hexadecimal digits each, they’ll be abbreviated here - for example: 0xfa2...291. This value is also known as the transaction identifier.
  2. The zero-based index of the output within its parent transaction (e.g., the first output has an index of 0, the second has an index of 1, and so on).

In other words, each coin can be referenced a unique identifier of the form (i, j), where i is the hash value of the hosting transaction and j is the zero-based index of the output within the transaction.

Previously, Alice used a signed transaction to pay Bob for a crate of apples. This example can now be updated to account for the presence of transaction inputs and outputs.

Alice begins by finding the outpoint of a coin she wants to spend. Imagine this turns out to be a ฿2 payment from her friend, Carlos. Alice adds an input to her transaction that references this coin’s outpoint. Under our abbreviated system, the full outpoint designation might be something like (0xa01...2af, 0).

SVG Image
Receiving Change. Alice redeems a ฿4 coin by creating a transaction that splits it into a ฿3 coin for Bob and a ฿1 coin for herself as change (right). The coin’s previous owner, Carlos, used a similar procedure (left).

Having defined her transaction’s inputs, Alice turns her attention to its outputs. Imagine that the crate of apples Alice is buying costs ฿3. Using Bob’s public key, Alice adds to her transaction a ฿3 output paying Bob. However, this leaves ฿1 in unclaimed value. To recover it, Alice adds a second output, payable to her own public key, with a value of ฿1. Alice signs her transaction’s input to make it valid.

SVG Image
Combining Coin Value. Alice wants to pay Bob ฿5 but does not own a single coin large enough. Her transaction (TXID-369, right) combines the value of two ฿4 coins (819-A and 562-A) and produces two coins. One is a payment to Bob (369-A) and the other is change that Alice collects (369-B).

Coin value can be split and combined in the same transaction. For example, Alice might want to pay Bob ฿5, but may only own two coins valued at ฿4 apiece. Alice can combine these coins into an ฿8 payment using two separate inputs. She can then pay Bob and receive change by adding an output payable to herself.

Inputs and outputs allow coin value to be combined and split as needed — just like physical cash. When used with chain of ownership and digital signatures, outright forgery becomes very difficult. There’s just one problem: nothing prevents the owner of a coin from spending it repeatedly.

Double Spending

Transactions, digital signatures, and chains of ownership all protect users from counterfeiting. However, these features alone can’t protect users from receiving a coin that has already been spent. This form of fraud is known as double spending.

Double spending occurs when a payer gives the same coin to two or more recipients. At the transaction level, a double spend happens when two or more inputs reference the same output. If left unchecked, double spending enables the uncontrolled creation of new money.

SVG Image
Double Spending. Chuck’s payment to Alice (top right) cheats her by paying with a previously spent coin (0x551...7af, 0). Even with a valid chain of ownership, Alice is unaware of Chuck’s previous payment to Bob with the same coin (bottom right). Double spending occurs whenever two transaction inputs point to the same output.

Simply trusting a payee does not solve the double spending problem. For example, a coin may have been double spent early in its chain of ownership. Each new recipient passes the bogus coin to a new owner, unaware of its shady history. The fraud would only be detected by a recipient who happens to run across an intersecting chain of ownership.

SVG Image
Double Spending by Association. Alice and Bob separately spend a double spent coin received from Chuck (left). Each recipient remains unaware of the other’s payment, passing the double-spent coin to the next user.

A related situation arises in physical cash systems when the owner of a metal coin or bank note copies it, then uses the copies to make payments. However, the double spending problem faced by electronic cash systems is unique in that physical tokens can never be exactly replicated. A reference to a transaction output, in contrast, can be copied with 100% fidelity. Double spending allows digital tokens to be spent many times without ever calling the chain of ownership into question.

If left unchecked, double spending erodes buying power for the entire user community. As double spenders conjure value into existence without restriction, the money supply soars. Unconstrained money growth reduces the value of all coins, even those that aren’t double spent. Eventually users flee for more secure stores of value. Building an electronic cash system that can be used in the real world requires a practical solution to the double spending problem.

Auditor

Double spending is easy when the users of an electronic cash system can only see their own chains of ownership. Lack of visibility allows the same coin to be spent by multiple transactions without detection. This problem can be solved by enlisting an auditor.

An auditor sees every transaction, using this unique vantage point to protect users from double spending. On receiving a new transaction for review, an auditor applies a two-part validation test. First, any transaction that attempts to extend an unknown chain of ownership is rejected. Second, any transaction that attempts to re-spend an already spent coin is rejected. These two steps ensure that only unspent coins with valid chains of ownership will be accepted as payment.

SVG Image
Auditor. Alice’s payment to Bob (left) is audited. Finding no double spending, Victor the auditor updates his ledger to account for the new transaction (right). The output that Alice spends (0xffa...1fa, 0) is marked as spent and can never be re-spent.

As transactions are received, the auditor compiles them into a ledger. For the purposes of this discussion, a ledger consists of a two-column table where each row gives information for a single coin. The first column records a coin’s outpoint. The second column records whether or not the coin has been spent. In the interest of transparency, the auditor publishes the ledger, allowing all users to verify its contents at any time.

The term “miner” is widely-used to describe the auditor role in Bitcoin. This unfortunate choice of terminology obscures the purpose of the work being performed, which is to prevent double spending. To emphasize this purpose, the term “auditor” will be used throughout this article.

The availability of an auditor encourages payees to seek its services for every transaction. The reason is clear: a double spent coin may be rejected without warning in the future. To avoid this possibility, payees would demand that all incoming payments be checked by the auditor. Rather quickly, the auditor would come to play a central, indispensable role.

Establishing an auditor solves the double spending problem, but it leaves another problem in its wake. Should the auditor ever stop responding, the entire system would grind to a halt. Service outages could arise from fraud, coercion, natural disaster, incompetence, regulator action, technical failure, and a host of other situations. Without double spending protection, panic would result as users scrambled to find partners willing to accept unaudited transactions. Hard-earned savings would be wiped out as the value of stored coins plummeted. The mere rumor of a service interruption could lead to system-wide crisis.

Despite some early promise, the auditor has become a single point of failure whose presence jeopardizes the entire system.

Redundancy

Auditing transactions solves the double spending problem, but leads to a potentially devastating side-effect. A single accident or attack on a single auditor can lead to financial collapse. The risk can, however, be reduced through redundancy.

Redundancy duplicates a system’s critical components or functions for the purpose of increasing reliability. In the context of an auditor-mediated electronic cash system, redundancy means adding a second auditor, and possibly more. The presence of one or more redundant auditors would allow transactions to be processed even if one auditor failed.

Although hundreds or thousands of auditors might collaborate, the simplest system would consist of just two. Ideally, these two auditors would operate as independently as possible. For example, each auditor should be controlled by a different organization, subject to a different legal jurisdiction, and be physically located in a different geopolitical area.

To effectively work together, auditors would need to apply a consistent set of rules. Synchronized ledgers would provide part of the solution. Two auditors can synchronize their respective ledgers by relaying audit requests to each other.

SVG Image
Auditor Redundancy. Victor, an auditor, receives a transaction audit request and processes it as usual (top right). He relays the request to a second auditor, Vanna, who processes it and updates her ledger (bottom right).

Imagine a system of two auditors, Victor and Vanna. Victor accepts an audit request containing a transaction. Detecting no double spending attempt, Victor updates his ledger to reflect the new transaction. He then relays the audit request to Vanna. Detecting no double spending attempt, Vanna updates her ledger. Relaying audit requests to each other allows Vanna and Victor to maintain synchronized ledgers. Both will reach the same decision when presented with the same audit request.

But this system is fragile. At some point, Vanna and Victor will separately receive and relay independent audit requests at almost exactly the same time. This causes the auditors’ respective ledgers to differ in a subtle, but important way. The ledgers list the same transactions, but in different orders.

SVG Image
Out of Order. Immediate relay of audit requests cause the auditors’ ledgers to differ in the ordering of transactions.

The order in which each auditor records incoming transactions plays a crucial role in audit decisions. Imagine that Chuck wants to double spend a coin. He begins by creating two sibling transactions that each spend the same output. Chuck then simultaneously submits the first transaction to Vanna and the second to Victor. The auditors process these transactions independently, updating their respective ledgers. Then each auditor relays its respective audit request to the other, again simultaneously.

SVG Image
Splitting a Double Spend. Chuck creates two sibling transactions spending the same coin and sends each one to a different auditor. Victor rejects the request relayed from Vanna as a double spend, and Vanna does the same. From this moment on, Victor will assert that the coin belongs to Alice, but Vanna will assert the owner is Bob.

This is where the problems start. Victor rejects the transaction relayed by Vanna as a double spend of Chuck’s coin. For the same reason, Vanna rejects the transaction Victor relayed. From this moment on, the auditors disagree about the legitimate owner of Chuck’s coin. Victor believes Alice to the new owner, but Vanna believes the new owner to be Bob. Chuck has succeeded in his double spending attempt, despite the best efforts of two auditors working together.

For redundancy to work, auditors must find a way to agree on a single global transaction sequence. But before they can do that, they need a better language for talking about transaction order.

Timestamps

Any disagreement between auditors about the relative order of transactions can be exploited by double-spenders. To fix this vulnerability, auditors need to agree on a consistent transaction sequence. The first part of the solution comes in the form of timestamps.

A timestamp is a piece of information added to each item of a list for the purpose of enabling chronological sorting. For example, the postmark on a letter is a kind of timestamp that happens to include a time and date. Any collection of timestamped letters can be ordered by postmark, even if the letters were received by different post offices. Page numbers on a book can also serve as a kind of primitive time stamp. Even if each page is removed and reshuffled, the book can be reconstructed by ordering pages by ascending page number.

A more general mechanism uses a hash function to establish an ordering of items in a list. This kind of timestamp includes two pieces of information: the hash value of the message being timestamped; and the hash value of the previous timestamp. The presence of the previous timestamp makes it possible to link hash-based timestamps together into a timestamp chain.

SVG Image
Timestamp Chain. Message sequence is established by timestamps. A timestamp references its message and its parent through their respective hash values. The chain can be extended by appending a new timestamp to the right.

A timestamp chain is similar a chain of ownership in the way it resists tampering. For example, editing a timestamped message causes its hash value to change. This change breaks the link between timestamp and message, and between a parent timestamp and its child. Similarly, inserting a timestamp into the chain invalidates all timestamps after the insertion point. The only way to edit the chain without re-writing every timestamp is to append new timestamps.

SVG Image
Timestamp Chain Resists Tampering. Changing a timestamped message (Message 2) invalidates the link to the timestamp (Timestamp 2).

Any message can be timestamped, including transactions. Imagine that an auditor wants to record the order in which transactions were received using a timestamp chain. To do so, a new timestamp would include the new transaction’s hash value (unique ID) and the hash value of the parent timestamp. Rather than maintaining a loose collection of transactions, the auditor would organize transactions into a timestamped ledger.

SVG Image
Timestamped Ledger. Timestamping individual transactions produces a timestamped ledger.

Timestamp chains are modular in that they can be readily disassembled and reassembled. The procedure is simple. To re-link a message with its timestamp, first compute its hash value. Then, find the timestamp that includes this message hash. Using a similar procedure, a parent timestamp can be relinked to its child.

Modularity is useful because it allows a timestamped ledger to be broken down, transmitted over a distance, and reconstructed by many different auditors. After updating its local ledger, an auditor would relay a transaction together with its timestamp to the other auditors. A receiving auditor would then validate the transaction. If no double spending were detected, the relayed timestamp and its transaction would be appended to the local ledger. These simple rules allow each auditor to assemble an identical local copy of the ledger over time.

SVG Image
Sharing Timestamps. On receiving Transaction B, Vanna verifies and timestamps it (top left). She then relays the transaction together with its timestamp to Victor (bottom center). Victor adds Vanna’s timestamp and transaction to his own chain after validation (bottom right).

Timestamps allow auditors to detect transaction ordering conflicts. Imagine that Vanna and Victor receive different audit requests simultaneously. Each auditor timestamps the request and relays the result to the other. The timestamps were relayed at almost the same time, so they cross paths en route. However, extending a timestamp chain requires a match between the unique ID carried by the new timestamp and the unique ID of the last timestamp in the chain. Lacking a match, both auditors reject the relayed timestamps and their associated transaction.

SVG Image
Timestamp Conflict. After verifying and timestamping transactions simultaneously (left), each auditor relays its copy to the other (center). The relayed timestamps can’t be appended because each extends a parent with one child (right).

Error detection is useful, but by itself can’t solve the problem caused by audit requests crossing paths. What’s needed is not just a way to detect timestamp conflicts, but to resolve them.

Consensus

Timestamps enable an auditor to record and communicate a specific transaction sequence. However, a given auditor will occasionally observe a different transaction sequence than its peers. To record a uniform transaction order, the auditors much reach consensus.

Consensus is a state of agreement among the members of a group. Auditors need to reach consensus about the order of transactions to fulfill their mission of protecting users against double spending. This requires a mechanism to resolve differences in timestamp ordering.

A strictly linear timestamp chain interferes with consensus by limiting auditors to recording only one version of history. Adhering to a single, inflexible sequence precludes the negotiation needed to reach consensus. If instead auditors recorded multiple transaction sequences, they could compare them to resolve differences. This feature can easily be added by allowing multiple timestamps to claim the same parent. In effect, the linear timestamp chain would be replaced with a branching tree.

Imagine that two auditors, Vanna and Victor, simultaneously processed different audit requests (using timestamps C and B, respectively). Vanna receives Victor’s relayed timestamp B, and Victor receives Vanna’s relayed timestamp C. Although neither timestamp can be appended to the end of the timestamp chain, they can be inserted as siblings of timestamp A. In doing so, each auditor creates the same tree.

SVG Image
Branching the Chain. Vanna and Victor relay timestamps at the same time (left). Each auditor receives a relayed timestamp that can’t be appended to the end of the chain (center). However, the relayed timestamps can be inserted as siblings claiming the same parent A (right).

The branch point at timestamp A demarcates two versions of the audit history. This divergence must be resolved eventually to eliminate the risk of double spending. In other words, Vanna and Victor need a way to decide which of the two branches, A-B or A-C, represents the universally valid transaction history. Back and forth communication adds time and complexity to the process, and so should be avoided. Ideally, resolution would occur spontaneously.

Fortunately, the issue can be decided without sending or receiving additional messages. The auditors need only follow two simple rules:

  1. Append newly-received, valid timestamps to the longest branch. If two or more branches of equal length exist, build on the one that was seen first.
  2. If appending a timestamp makes one branch longer than the other, re-process the transactions in the shorter branch and add any missing timestamps to the longer branch.

To continue with the previous example, imagine that Vanna verifies a new transaction (D). Her next job is to pick the branch in her ledger that will be extended. Branch A-B and branch A-C are equally long. Recalling that she saw branch A-C first (because she created it), Vanna appends timestamp D to timestamp C (Rule 1). Vanna relays timestamp D to Victor, then re-processes transaction B (Rule 2).

SVG Image
Growing a Branch. Vanna adds Transaction D to her branched chain, moving Transaction B to her queue for reprocessing (left). She then relays the new Timestamp D to Victor (right).

Victor receives Vanna’s timestamp D, and appends it to branch A-C. Having grown a longer branch, Victor re-processes transaction B (Rule 2). At this point, both Vanna and Victor have equivalent ledgers and one transaction that needs to be re-processed (B). They are in complete agreement about the ordering of transactions.

SVG Image
Updating a Branch. Victor adds Timestamp D to his branched chain, moving transaction B to his queue for re-processing (bottom left). Both auditors now have identical branched chains and queues (bottom right).

The situation becomes more complicated if auditors relay timestamps at the same time. Imagine that Vanna and Victor start with equivalent, but branched timestamp trees. Each auditor receives a new, unique audit request at exactly the same time. The requests are processed and timestamped, then relayed simultaneously. Neither auditor is aware of the timestamp in transit from the other auditor.

SVG Image
Slow Timestamp Relay. Auditors with branched chains receive a new audit request simultaneously (left). Each auditor relays a different timestamp to the other at the same time (right).

Vanna and Victor receive their respective inbound timestamps, and update their ledgers accordingly. However, this merely results in extending both branches by one timestamp. Rather than solving the problem, Vanna and Victor made it worse. The tie can only be resolved if one auditor relays timestamps at a faster rate than the other. In other words, the auditors are stuck in a race condition.

SVG Image
Race Condition. Vanna and Victor simultaneously accept relayed timestamps (left). However, doing so leaves both timestamp trees in a branched state (right).

A race condition prevents consensus even when only two auditors are involved. Adding more auditors just compounds the problem by increasing the number of in-flight timestamps. Furthermore, increasing the rate of audit requests by users results in ever longer branches, again undermining the auditor’s core mission of protecting users from double spending fraud.

Auditors can sometimes arrive a common transaction sequence using only a timestamp tree and two simple rules. However, consensus breaks down as audit request volume and the number of auditors increases.

Proof-of-Work

As long as auditors disagree about transaction ordering, users will be exposed to double spending risk. Branches arise when two timestamps cross paths as they’re being relayed. The resulting race conditions can therefore be avoided by throttling the rate at which timestamps may be relayed. Such a rate limit could be imposed through proof-of-work.

Proof-of-work is a system that restricts access to a valuable resource by forcing users to perform computational work as a condition of use. The idea was first proposed in 1997 as a way to discourage email spam. Known as “Hashcash,” the system would have allowed email users to screen messages based on whether or not a valid proof-of-work were attached. Ordinary users sending individual emails wouldn’t notice the cost, but spammers sending millions of emails would.

For auditors, the valuable resource is the communication channel through which timestamps are relayed. As this channel becomes clogged with timestamps flowing between auditors, branches take root and flourish, ultimately producing a race condition. Requiring auditors to submit a sufficiently expensive proof-of-work before relaying a timestamp would keep the channel clear.

Putting proof-of-work into practice requires a proof-of-work function. An essential quality of such a function is asymmetry. This means that verifying a proof-of-work should be fast, but generating it should be slow. With a little creativity, a hash function can serve double-duty as a proof-of-work function.

Recall that a hash function accepts a message as input, reproducibly returning a hash value as output. A hash function can be transformed into a proof-of-work function through the use of a nonce. A nonce, or number used once, is a value embedded into a message that changes the output of a hash function. For example, a simple proof-of-work function might append a nonce to a message, then return the hash value obtained from the result. The output of a hash-based proof-of-work function is unpredictable, but the same nonce and message will always yield the same hash value. In this way, a proof-of-work can be both easy to verify and difficult to produce.

SVG Image
Proof-of-Work-Puzzle. A message and a nonce are passed to a proof-of-work function, yielding a hash value candidate. Changing the nonce changes the hash value. Providing the nonce needed to generate a hash value within a target range constitutes a solution to the puzzle.

A proof-of-work function can serve as the basis for a proof-of-work puzzle. Such a puzzle asks for a nonce that when combined with a message gives a hash value less than or equal to a threshold value. The difficulty of the puzzle can be adjusted (or “targeted”) by changing the value of the threshold. Recall that secure hash functions resist preimage attacks. This leaves trial-and-error as the only winning strategy to find a valid proof-of-work for a puzzle based on a secure hash function. Raising the target value widens the range of acceptable hash values, and therefore reduces the number of guesses and time needed to find a valid solution. Lowering the target value narrows the range of acceptable hash values, decreasing the speed with which a winning nonce can be found.

By revealing a suitable nonce, a user proves that sufficient computational work has been performed to gain access to a mutually-valuable resource. Others can easily pass the original message and nonce into a hash function and verify that the output falls below the required threshold. In other words, a message, nonce, and target threshold prove that enough computational work was expended to unlock access to a resource.

Timestamp flow can be throttled by forcing auditors to submit a proof-of-work along with each relayed timestamp. Higher rates of flow will result from higher thresholds, and lower rates of flow will result from lower thresholds. However, using such a system requires some changes to the rules.

Blocks and Block Chains

A proof-of-work system can throttle the rate at which timestamps are relayed, reducing the frequency of branches and discouraging double spending attempts. However, using proof-of-work requires solutions to two key problems. First, auditors need a way to publish proof-of-work nonces. Second, auditors need to process audit request quickly, despite a limit on the rate of timestamp publication. Both problems can be addressed by replacing raw timestamps with blocks.

A block is a timestamp that includes features needed to implement a proof-of-work system. Like a timestamp, a block references a parent through its unique, hash-based ID. Unlike a timestamp, a block includes a proof-of-work nonce needed to generate a hash value within the target range. A block also differs from a timestamp in that it can reference many transactions. This feature decouples the rate of transaction processing from the rate of block relay. Recall that an ordered list of transactions can be referenced by its Merkle root.

SVG Image
Block Chain. A block encapsulates four properties: (1) a unique block ID (BID); (2) the ID of the previous block; (3) the Merkle root for a list of transactions; and (4) a proof-of-work nonce. The block ID is computed on-demand, not stored with the block.

Like timestamps, blocks can be chained together to create a block chain. In practice, this block chain will branch occasionally due to two or more auditors simultaneously finding a proof-of-work. Although we speak of a block chain, it’s better to picture a block tree.

SVG Image
Block Tree. Two or more blocks can claim the same parent, resulting in a tree. To construct a chain, an auditor picks the path through the tree with the most cumulative proof-of-work (the “active chain”, orange).

Keeping block siblings and their children in a block tree requires an auditor to determine the active chain. The active chain is the single path through the block tree that will be used to order transactions and thereby detect double spending. To find the active chain, an auditor sums the proof-of-work carried by each block. The one with the most cumulative proof-of-work (the “strongest” chain) is used. In the event of a tie, the chain whose tip was received most recently is used.

Generating a new block requires three pieces of information: a Merkle root for the included transactions; the hash value of the parent block; and the proof-of-work nonce. The Merkle root is obtained by bundling all pending, verified transactions into a Merkle tree. The parent block is the last one on the strongest chain.

Finding a proof-of-work nonce is a trial-and error process. A nonce is guessed and the block’s hash value is computed. If the hash value falls above the target threshold, another nonce is chosen and the candidate block is again hashed. This iterative process continues until the block’s hash value falls below the target threshold.

Block generation can be interrupted by the receipt of a valid block relayed by another auditor. When this happens, the received block’s proof-of-work nonce is verified by comparing the block ID with the currently-accepted threshold. An ID below the target threshold proves that the necessary amount of work was done. The block’s individual transactions are also validated. If all are valid, the containing block is added to the auditor’s local copy of the block chain. A new block candidate is then created from pending transactions, and block generation begins anew.

As auditors’ collective computational power rises and falls, a steady block frequency can be maintained by adjusting proof-of-work difficulty. Each auditor independently measures the rate at which blocks are being relayed. If the rate is too high, difficulty is increased. If the rate is too low, difficulty is decreased. This system allows auditors to join and leave as they please without affecting the overall rate of block generation.

By keeping the rate of block relay sufficiently slow, auditors can prevent race conditions entirely. However, producing proof-of-work consumes valuable resources such as time, energy, and hardware. Economic viability of this system requires that auditors be paid.

Transaction Fees

A proof-of-work system by itself can’t motivate auditors to participate. Although some auditors would donate valuable resources to a volunteer effort, most would not. One type of compensation comes in the form of transaction fees.

A transaction fee is a payment made by the author of a transaction to the first auditor who bundles it into in a block. To add a fee to a transaction, a user ensures that the sum of all input values exceeds the combined value of all output values. Although the receiver of a payment benefits most directly from double spending protection offered by auditors, the sender pays the transaction fee.

SVG Image
Transaction Fee. Carol gives Alice a coin valued less than the one she’s spending. The difference (฿0.1) can be claimed by an auditor as a transaction fee.

An auditor claims fees for the transactions in a block it generates through a coinbase transaction. A coinbase transaction is unusual in that its single input doesn’t reference the output of a previous transaction. Instead, funds are implicitly drawn from the block’s combined transaction fees. Including the coinbase transaction within a block allows an auditor to spend its combined transaction fees.

SVG Image
Coinbase Transaction. Victor collects a ฿0.1 fee through a coinbase transaction. This transaction spends no previous output, and is funded by the cumulative transaction fees in the block for which Victor provides proof-of-work.

Transaction fees respond to market conditions. Users omitting fees or paying too little will find fewer auditors willing to process their transactions. These transactions will in turn take longer to validate. In contrast, users wanting faster audits can prioritize a transaction by attaching a higher fee. The aggregate value of transaction fees determines the degree to which auditors are willing dedicate resources to computing proof-of-work.

Block Subsidy

Chain of ownership, digital signatures, and redundant auditors all help to secure digital coins that already exist. However, an electronic cash system needs a way to mint and distribute new money. This problem can be solved with a block subsidy.

A block subsidy pays the first auditor who successfully generates and relays a new block. This payment is added to the value of any transaction fees, independently of the number of transactions. The block subsidy is sometimes referred to as a “mining reward” or “block reward.”

SVG Image
Block Subsidy. Every coin in circulation can trace its origin to a payment made through a coinbase transaction (left). This transaction is funded through both a block subsidy and the hosting block’s combined transaction fees.

Block subsidy payments conjure new money into existence. This places a coinbase transaction at the origin of all chains of ownership. In other words, a coin whose chain of ownership doesn’t begin with a valid coinbase transaction is counterfeit.

The power to create money from thin air makes the Block subsidy an obvious vulnerability. To counter fake subsidy payments, auditors check that the amount being claimed is valid. Any discrepancy from the expected value causes an auditor who received the containing block to reject it.

Linking the generation of new blocks with a block subsidy establishes a stable and predictable schedule for growing the money supply. As more auditors join the network to claim the block subsidy, the proof-of-work threshold decreases and difficulty increases. Each auditor expends more resources to generate a new block as a result. When the cost of producing proof-of-work becomes too high, auditors resign, reducing difficulty and increasing the profitability of those who remain. Both changes keep the overall rate of block generation constant.

A negative feedback loop establishes a steady rate of block subsidy payments, regardless of the numbers of auditors or their computing power. However, this arrangement won’t guarantee steady growth of the money supply without first setting the value of the block subsidy.

Money Creation

Without a limit on the value of the block subsidy, the money supply would grow uncontrollably. To ensure the long-term value of saved coins, the value of the block subsidy must be set in advance. In other words, an electronic cash system needs a way to regulate money creation.

Money creation is the process whereby the total face value of coins increases. Debates about money creation have raged for decades. Proponents of “hard money” policies argue that slow money supply growth promotes price stability and sustainable economic development. “Soft money” proponents seek to eliminate economic swings with a flexible rate of growth that adapts to prevailing conditions.

These debates will no doubt continue for as long as money is used. In the meantime, a practical way to set the value of the block subsidy is needed. Accounting for external economic indicators such as exchange rate and user adoption might make for an interesting research project. However, such dependencies would themselves likely lead to a multitude of other problems.

A simple and secure approach would be to cut the value of the block subsidy at regular intervals. The subsidy would start at an initial value, where it would remain for a fixed period of time. Then the value of the subsidy would decrease to a fraction of its original value, where it would remain until lowered again. Repeated cuts would lead to the eventual elimination of the block subsidy. With each cut, transaction fees would become a larger component of auditor compensation, weaning auditors off the block subsidy. At the same time, the rate of money creation would approach zero in a predictable, transparent manner.

This system serves the interests of two important user groups. The first group, early adopter auditors, would receive subsidy payments with relatively large face value. The low buying power of these early payments would prompt many recipients to spend or give coins away quickly, distributing new money into wider circulation. The second group, investors, would view dwindling injections of new money as conducive to wealth preservation. Their long-term commitment to saving would lay the groundwork for buying power stability and ever wider adoption.

SVG Image
Tapered Block Subsidy. The subsidy remains constant for a predetermined number of blocks and then decreases by half. Eventually, the subsidy disappears altogether.

Consensus secures, not just user transactions, but the money creation process itself. An auditor can’t force the illegal creation of new money any more than she can force the generation of invalid blocks or transactions.

The Bitcoin Network

Starting from the idea of electronic cash, this guide has explained the Bitcoin network in a technically correct but simplified way. A coin is defined in terms of a mathematically secure chain of ownership. Transactions allow coin value to be combined, split, and transferred to others. Payments are authenticated using public key cryptography and digital signatures. A network of self-interested, consensus-seeking auditors protects users from double spending fraud. Auditors are in turn paid for their services through fees and an ever decreasing block subsidy. The subsidy rewards early adopters, and its gradual abolition encourages savings. New money enters the system on a predetermined schedule, without a central authority.

Next Steps

Understanding Bitcoin from this vantage point offers many opportunities for further exploration. However, this simplified view only lays the groundwork. Becoming your own bank requires a deeper understanding of security and privacy, as well as techniques for practical use.

My e-book, Owning Bitcoin, picks up where this guide leaves off.