Courseraで、Bitcoin and Cryptocurrency Technologiesっていう講義を受けた。みんなめっちゃビットコインって口にするから何なのか知りたかったものの、「ビットコインとは」とか検索しても妙にテンション高い記事しか出てこなくてイラっとしたので、淡々と教えてくれるCourseraで学んだ。
このコース受けるときの注意
私みたいな完全初心者は、week 3ぐらいまで進んでからweek 1の宿題解いた方がいい。講義に一言も「UTXO」とか出てきてないのに「UTXOクラス使ってね」って言われてもよくわからんし。Instructionと実際のコードから何となくわからなくはないけど、それならweek 3まで先にやっちゃう方が効率的。
宿題するにはJava書けないといけない。特にAssignment 2はいろいろ試して試行錯誤する系なので、思いついたことを素早く書けないとめんどくさくなる。
ただ、graderがエラーとかちゃんと見せてくれるので、自分とこで実行環境まで作らなくても大丈夫だった。
Certificateは出ない。
以下は講義ノート。1.1はノート取らなかったので、1.2から。
1.2. Hash Pointers and Data Structures
H(obj): hash of obj
n = the number of leaf nodes (not the number of all the nodes in the tree)
Going down the entire tree takes O(log n) because the height of the tree is log n.
このページがわかりやすい。
1.3. Digital Signatures
2 primitives of cryptocurrency technologies:
Hash function
Digital signature
Rules of digital signatures
Only you can sign, but anyone can verify
A signature is tied to a particular document (cannot be cut&pasted)
API for digital signatures
(sk, pk) = generateKeys(keySize)
sk: secret signing key (used to make a signature)
pk: public verification key (used by anyone to verify)
sig = sign(sk, message)
sig (a string of bits): signature
isValid = verify(pk, message, sig)
Requirements for signature
valid signatures should verify
verify(pk, message, sign(sk, message)) == true
signatures cannot be forged
Consider a game between a challenger and an attacker, where the attacker tries to forge a signature for a particular message. The probability that the attacker wins should be negligible.
Algorithms are randomized -> We need good source of randomness
Limit on message size -> use Hash(message) instead of the raw message
By signing a hash pointer, we can protect all the nodes that the pointer points at.
Bitcoin uses ECDSA standard
ECDSA = Elliptic Curve Digital Signature Algorithm, a digital signature scheme
When using this algorithm, we need to use good randomness; otherwise, we'll leak our private keys
1.4. Public Keys as Identities
public key == an identity
That is, if verify(pk, msg, sig) == true, it's like "pk says [msg]" (= the pk is the actor or the party that states the message).
To make the statement that the pk states, we need the matching sk.
How to make a new identity
create a new, random key pair (sk, pk) by executing generateKeys
pk: the name of that identity. Since the raw pk is big, it's usually better to use Hash(pk)
sk: lets you (the creator of the keys) "speak for" the identity
Decentralized identity management: No central point of control
Rather than making users register in order to use the system, make anyone create identities at anytime.
-> This is the way bitcoin works
In bitcoin jargon, "address" is the name of the identity--either pk or Hash(pk).
Privacy
Addresses are not directly connected to the real-world identity.
But observers can link multiple activities done by the same address over time and make inferences.
1.5. A Simple Cryptocurrency
Simplest possible cryptocurrency: GoofyCoin
Goofy can create new coins whenever he wants
These new coins belong to Goofy
a coin: the result of CreateCoin operation with a uniqueCoinID, signed by pk (= Goofy)
Anyone can verify that the sign is valid and that the sign is put on this statement.
A coin's owner can spend it
spend: to pass the coin to someone else
Goofy makes a new statement saying "Pay this to pk (Alice): H(the original coin object)", which is signed by pk (Goofy)
The recipient can pass on the coin again to another person
Alice: "Pay this to pk (Bob): H(the second coin object)" and sign it herself
Anyone can verify the validity of the coin by following the chain and checking all the signatures along the way.
A security problem
double-spending attack: Alice can make two parallel statement saying that she passes the coin to two different people
ScroogeCoin: designed to solve the double-spending problem
Scrooge publishes a history of all transactions (a block chain, signed by Scrooge)
Anyone can follow the entire structure, which is endorsed by Scrooge, and can notice the attempted double-spending by viewing the transaction history of the coin.
Sometimes, for the sake of optimization, multiple transactions are put into the same block.
Scrooge coin allows two types of transactions:
- CreateCoins: can create multiple coins at a time
num: serial number
value: the worth
recipient: pk of the person who becomes the owner of the created coin
- PayCoins: can consume multiple coins owned by multiple people at a time
This transaction is valid if:
consumed coins are valid (legitimately created)
consumed coins have not been already consumed
signed by every owners of all consumed coins (The same person need to sign multiple times if he has multiple coins in the transaction (judging from the quiz at the end of the video))
Immutable coins
Coins can't be transferred, subdivided, or combined.
But you can get the same effect by consuming your coins and create new coins that have, in total, the same amount of value.
The problem: What if Scrooge starts to misbehave?
-> We need to decentralize (descroogify) this currency
Programming Assignment: Scrooge Coin
week 1で見たときは意味不明だったのに、week 3から戻ってきたらわりと単純に見える!
疑問点
"unordered"ってことは、時系列にすらなってないってこと?"a transaction can reference another in the same block"って書いてあるけど、arrayの中で自分より後のtransactionを参照したりする…?
→そういうの考慮しなくてもpassした。けど、一回目のループでacceptされんかったやつを後でもっかい試してみるとかやった方が安全だったかも。
addInput()とかできるって説明あったけど、そんなことしたらTransaction全体のhashが変わっちゃうから、これのoutputをspendしようとするやつの参照がわからなくなっちゃうのでは?てか自分でaddInput()とかする意味ある?
→Extra creditしない限り別に使いどころなかった。
maximumとmaximalって何が違うん…
→この答えが最高にわかりやすい。
2.1. Centralization vs. decentralization
The problem with ScroogeCoin: it has a central authority
Centralization & decentralization is not all-or-nothing.
e.g. E-mail has decentralized protocol, but is dominated by centralized webmail services.
Aspects of decentralization in Bitcoin:
Who maintains the ledger?
Who has authority over which transactions are valid?
Who creates new bitcoins?
Who determines how the rules of the system change?
How do bitcoins acquire exchange value?
Peer-to-peer network (almost purely decentralized): open to anyone, low barrier to entry
Mining: open to anyone, but inevitable concentration of power is often seen as undesirable
Updates to software (how and when the rules change): core developers trusted by community have great power
2.2. Distributed consensus
Key technical challenge of decentralized e-cash: distributed consensus
Key problem: inconsistent database (where an event is recorded in one database but not in other databases)
Distributed key-value store enables various applications like DNS (domain name server), public key directory (which pairs users' email or something with their public keys) etc.
The requirements of distributed consensus
The protocol terminates and all correct nodes decide on the same value
correct nodes: excludes faulty or malicious nodes
This value (that they agree upon) must have been proposed as input by some correct node
What does it mean in the context of Bitcoin?
When Alice wants to pay Bob, she broadcasts the transaction (which has Alice's signature and transaction content, and the hash to the coin she wants to pay) to all Bitcoin nodes.
Bob's computer doesn't need to receive this transaction. As long as all the nodes received that transaction, the coin is Bob's now.
How consensus could work in Bitcoin:
All nodes have a sequence of blocks of transactions they've reached consensus on We do consensus on block-by-block basis for optimization
Each node has a set of outstanding transactions it's heard about
8:00くらいのとこよくわからんかった。inputとは何なのか。outstanding transactions it's heard aboutとは。
Problems:
Nodes may crash
Nodes may be malicious
Network is imperfect
Not all pairs of nodes are connected
Faults in network
Latency -> we cannot use timestamp
There are some impossibility results.
Some well-known protocols
e.g. Paxos (Never produces inconsistent result. In exchange, it can get stuck and fail to make any progress)
Bitcoin works well because it uses different assumptions than the models that produced impossibility results.
Things that Bitcoin does differently:
Introduces incentives for users to work together (Works only because it's a currency)
Embraces randomness
Consensus do not have specific ending time. Rather, the probability that the transaction has made it into the consensus blockchain rises as time goes, and the probability that you're wrong in making an assumption about the transaction goes down at the same time.
2.3. Consensus without identity: the block chain
Nodes do not have any persistent identity.
Why is it difficult to make consensus without identity?
Pragmatic reason: some protocols need node IDs
Security reason: if nodes have identities, we can assume something like "less than 50% of nodes are malicious"
Why don't Bitcoin nodes have identities?
Identity is hard in a P2P (peer-to-peer) system (We need a central authority to manage identities)
-> cannot detect Sybil attack
Bitcoin aims at making pseudonymity
Users can participate without registering their personal information
Weak assumption: There is a way to pick a random node
How to: Give nodes tokens or something, and later, choose randomly from the published tokens and call the corresponding node.
Adversary that attempts Sybil attack only gets one token, thus not being able to increase his power.
What becomes available by this assumption: implicit consensus
In each round, random node is picked. This node proposes the next block in the chain. Other nodes implicitly accept/reject this block by either extending it or ignoring it and extending chain from the last block in the block chain (= skip the proposed block).
Every block contains hash of the block it extends -> Then the nodes can signal which nodes they are extending
Simplified consensus algorithm
New transaction are broadcast to all nodes
When Alice wants to pay to Bob, she broadcast that transaction to all nodes.
Each node collects new transactions into a block
Nodes are constantly listening to the transactions that yet have not made it to the consensus block chain.
In each round a random node gets to broadcast its block
The chosen node can propose a block that contains all the outstanding transactions it has heard about.
Other nodes accept the block only if all transactions in the block are valid (not double-spending, have valid signatures)
Therefore, blocks proposed by malicious nodes can be rejected.
Nodes express their acceptance of the block by including its hash in the next block they create
じゃあnodeごとに違うblock chainを持つことになってしまわないのか?
→なる。でもそんな感じで動いている。
What can a malicious node do?
Can it simply steal coins that belong to other people?
No. It cannot forge the owner's signature, and transactions that contain invalid signature are rejected by other nodes.
Suppose it hates Bob. Then it can decide not to include transactions originated in Bob's address in the block that proposes. Can it prevent Bob's transactions?
No. Other honest node can propose blocks that contain Bob's transactions in other rounds.
Can it double-spend?
Which of the two blocks end up in the long-term consensus? From the technological point of view, these two transactions are identical.
Policy for honest nodes: Extend the longest valid branch.
Due to network latency, there is not guarantee that the "legitimate" transaction is the first transaction heard by nodes and thus it is proposed first.
If the double-spending block is accepted, then the "legitimate" block gets completely ignored by the network, becoming an orphan block.
How do the merchant (the person who receives money from the double-spending attacker) decide whether the money is legitimately payed?
The more confirmation the block gets, the more likely the block get into the long-term consensus block chain.
Most common heuristic: 6 confirmations is enough to believe that the block ends up in the consensus chain.
2.4. Incentives and proof of work
Are the majority of nodes really honest?
-> We need to give nodes incentives for behaving honestly.
Can we reward nodes that created the blocks that end up in the long-term consensus chain? (The nodes contained in the first branch in the picture above)
-> Use Bitcoin itself!
Incentive 1: Block reward
Creators of each block gets to
include special coin-creation transaction in the block (The value is fixed: currently 25 BTC, halves every 4 years -> total supply ends at 21 million in 2140)
choose recipient address of this transaction (typically itself)
Block creator can "collect" the reward only if the block ends up on long-term consensus branch!
This is the only way BTC is created.
Creatorとはproposerなのか? →いえす。
Incentive 2: Transaction fees
Creator of transaction can choose to make output value less than input value
Remainder is a transaction fee and goes to block creator
Purely voluntary, like a tip
誰がそんなことするの?
→priorityにもよるけど、十分なtransaction feeがないとblockになかなか入れてもらえない。それに、transactionの目安も制定されてる。
Remaining problems
How to pick a random node?
How to avoid a free-for-all (fight, dispute without no rules) due to rewards?
How to prevent Sybil attacks (Adversary creates a lot of Sybil nodes to manipulate the consensus)?
Solution: Proof of work
Approximate selecting a random node by selecting nodes in proportion to a resource that no one can monopolize (hopefully)
Two kinds of resources:
In proportion to computing power: proof-of-work (used by Bitcoin)
In proportion to ownership of currency: proof-of-stake (used by some of alternative currency)
Equivalent views of proof of work
Select nodes in proportion to computing power
Let nodes compete for right to create block
Make it moderately hard to create new identities
Hash puzzles
To create a block, a node need to find a nonce (a number) such that H(nonce || prev_hash || tx || ... || tx)
is smaller than certain threshold. (|| stands for concatenation)
Given that the hash function is secure, only way to succeed is to try many nonces until you get lucky.
output space: all the possible hashed value
target space: hashed value that is smaller than the threshold
target space / output space is very small in order to make the hash puzzle difficult
Nodes have to constantly compete to solve the puzzle.
PoW (Proof of Work) properties:
The problem has to be quite difficult to compute
Only some nodes bother to compete. These nodes are called "miners".
Parameterizable cost
Nodes automatically re-calculate the target value (the size of the target space in the output space) every two weeks
Goal: average time between blocks = 10 mins (good interval for optimization)
Because the overall power of all the nodes increases, the amount of work a single node need to do also increases.
P(Alice wins next block) = fraction of global hash power she controls
For an individual miner, the mean time it takes to find a nonce: 10 mins / fraction of hash power (derived from the graph on the slide)
Key security assumption
Attacks are infeasible if majority of miners weighted by hash power follow the protocol
Trivial (?) to verify
Nonce must be published as part of block
Then other miners can simply verify that the hash is smaller than the target value
2.5. Putting it all together
It costs to invest in hardware to become a successful miner. But the income one gets from creating a block is also high.
mining reward (block reward + tx fees) - hardware and electricity cost = profit (can be negative)
They cannot be simply compared
fixed (hardware) vs. variable costs (electricity)
reward depends on global hash rate
exchange rate of BTC and real money is variable
miners can deploy different strategies
Bitcoin has three types of consensus
Value: how much BTC one has depends on consensus
State
Rules
Bitcoin is bootstrapped
Security of block chain -> value of currency -> health of mining ecosystem -> security ...
What can a "51% attacker" do?
Steal coins from existing address? -> No. They can create a longest branch, but merchants who runs honest nodes do not accept that invalid branch.
Suppress some transactions?
From the block chain -> Yes. They can just refuse the block containing those transactions.
From the P2P network -> No. The network still receives those transactions.
Change the block reward? -> No. The attacker needs to change the rule, but doesn't control the copies of the Bitcoin software.
Destroy confidence in Bitcoin? -> Yes. The main practical threat.
3.1. Bitcoin Transactions
Bitcoin consensus gives us:
Append-only ledger
Decentralized consensus
Miners to validate transactions
assuming a currency exists to motivate miners!
How to make that currency?
The account-based ledger system is simple, but requires us to calculate the account balance by going backward to evaluate the validity each time a transaction happens.
Instead, Bitcoin uses a transaction-based ledger, which explicitly specifies the number of inputs and outputs.
In this case, it's easy to prove that the transaction 4 is not valid because the amount of input and output don't match.
Here, each transaction has its own identifier.
The first transaction's input is ∅ because the coins are newly created.
"1[0]" means "output index 0 of transaction 1".
Why does Alice send money back to herself (Change address) in transaction 2?
-> Everyone has to consume all the output from the previous transaction.
We implement this with hash pointers, so we have direct access to the output which the input of the transaction refers to.
What becomes easy in this transaction-based ledger system?
Merging values: You can set two outputs directed to you as the two inputs of the transaction, and direct the total value to one person.
Joint payments: You and someone else can jointly pay to one person in a single transaction. (The transaction requires two separate signatures)
A Bitcoin transaction data structure
{ "hash": "abcdefg...", "ver": 1, "vin_sz": 2, "vout_sz": 1, "lock_time": 0, "size": 404, "in": [ { "prev_out": { "hash": "qwerty...", "n": 0 }, "scriptSig": "asdfgh..." }, { one more input } ], "out": [ { "value": 10.00, "scriptPubKey": "OP_DUP OP_HASH160 ..." } ] }
The first 6 properties are metadata, followed by the series of inputs and outputs.
hash: the hash of this entire transaction
housekeeping information
ver (version number, which is currently 1)
vin_sz (number of inputs)
vout_sz (number of outputs)
housekeeping: size
We'll come back to "lock_time" property later.
In the input data:
n: the index of the output that is consumed
scriptSig: signature of the owner of that output
In the output data:
- scriptPubKey: the recipient address?? script?? We'll come back to this in the next lecture!
3.2. Bitcoin Scripts
The output scriptPubKey is actually a script, not just an address. The input address is also a script. To verify the transaction, the concatenated script must execute completely with no errors.
scriptSig
30440220... 0467d2c9...
scriptPubKey
OP_DUP OP_HASH160 69e02e18... OP_EQUALVERIFY OP_CHECKSIG
Bitcoin's scripting language
Built specifically for Bitcoin (inspired by Forth)
Simple, compact
Has support for cryptography (like calculating hash, verifying signature)
Stack-based (Last In First Out like a stack of books)
Limits on time/memory
No looping (everything is linearly executed) -> No possibility of infinite loop
Reverse-polishとは何か、そしてその特徴がどのようにスクリプトの書き方に影響してるかについては、The Best Step-by-Step Bitcoin Script Guide Part 1のHow Does the Script Work?の項がわかりやすい。OP_EQUALとOP_EQUALVERIFYの違いも載ってる。
The sender specifies the public keys of the recipients, and recipients need to specify the signature to redeem the coins.
Redeemっていうのは、そのコインを実際使うってことでいいのか?コインを受け取るとき(recipientになるとき)にはsignature要らんよね?
What is written in the script?
The first two lines are data instructions which specifies the signature and the public key that is used to generate the signature of the recipient of the input component.
Recipient of the input componentって言うけど、inputsは分かれて別々のrecipientに行くかもしれないんだから、一対一で対応してなくない?それとも、inputは実はprevious outputだから、そのrecipient(今回のsender)ってことなのか?
→それだったらhash of <pubKey>と<pubKeyHash?>が一緒であるべきっていうのが意味わからなくなる。<pubKeyHash?>っていうのは受取人のものだから、明らかに今回のsenderのものじゃないし。
→後で判明するからちょっと待って。
In stack-based language, if you see a data, you push it onto the stack. (No need to declare variables)
<sig>: Signature of the input
<pubKey>: Recipient address
OP_DUP: Duplicate operation, which adds a clone of the top element of the stack
OP_HASH160: Compute the hash of the top element on the stack. Replaces the element with the computed result.
<pubKeyHash?>: Specified by the sender
OP_EQUALVERIFY: Compares <pubKeyHash> to the hash derived from OP_HASH160. If they aren't the same, the script stops executing.
OP_CHECKSIG: Check if <sig> is the signature by <pubKey>. The message is the entire transaction.
<pubKeyHash>: "Hash of the public key that was actually used by the recipient, when trying to claim the coins". でもclaimするのってrecipientがそれを使うとき、つまりもっと後のことじゃないの?受け取るときに必要なの?Senderがこのtransactionを作ってるから、recipientはその作成に介入できないと思ってたんだけど、何でrecipientが情報載せることができてるの?二人が共同で作ってるの?
→上で挙げた疑問も合わせて、この辺の混乱は、Discussion forumのこの質問が述べてくれている。つまり下図みたいなことだよね?やっとわかった。
この辺の捉え方については、The Best Step-by-Step Bitcoin Script Guide Part 1のThe Game of Locking and Unlockingの項がわかりやすかった。
なんでsenderはpublic keyじゃなくてpublic keyのhashを載せてるの?
→Since the raw pk is big, it's usually better to use Hash(pk) (week1より)だからかな?
Bitcoin script instructions
256 opcodes total
Arithmetic
If / then
Logic / data handling
Crypto
Hash function
Signature verification
Multi-signature verification
OP_CHECKMULTISIG
Built-in support for join signatures
Specify n public keys
Specify t (threshold. There must be at least t out of n signatures that are valid for this transaction to be valid)
Verification requires t signatures
There's a bug: Extra data value popped from the stack and ignored
Avoid this bug by putting extra dummy values on the stack
Bitcoin scripts in practice (as of 2014)
Most nodes whitelist known scripts (Refuse to execute scripts that aren't considered to be standard)
99.9% are simple signature checks
~0.01% are MULTISIG
~0.01% are Pay-to-Script-Hash
Remainder are errors, proof-of-burn
Proof-of-burn
It proves that the coins have been destroyed and cannot be spent in any ways.
OP_RETURN <arbitrary data>
This script throws an error no matter what is written above.
When do people use it?
When they want to put arbitrary data on blockchain
When forcing people to destroy Bitcoin in order to get coins in new system
Pay to Script Hash (P2SH)
Senders can specify the hash to the script they want to use.
Then, the sender's output becomes simple, while the recipient needs to specify the entire redemption script and let the sender know the hash of the script.
Input's scriptSig by the recipient
<signature> <<pubKey> OP_CHECKSIG>
Output's scriptPubKey by the sender
OP_HASH160 <hash of redemption script> OP_EQUAL
The first validation: Is the <hash of redemption script> really the hash of the redemption script specified by the recipient?
The second validation: Does the signature correspond to the pubKey?
Redemption scriptはいつ実行されるの?Stackに入ったら自動的に実行されたりするのか?OP_EQUALがhashになったscriptを消しちゃうと思うから、それまでに実行されないといけない? そもそも、<<pubKey> OP_CHECKSIG>がredemption scriptだと思ってるんだけど、合ってるよね?
3.3. Applications of Bitcoin Scripts
Example 1: Escrow transactions
Alice is a buyer, and Bob is a seller. Alice doesn't want to pay until Bob sends the goods to her, and Bob doesn't want to send the goods until Alice pays him.
In this case, they will involve a trusted third-party, Judy.
(上図は私の理解なんだけど、合ってるかどうかわからん。2つtransaction起こるよね?あと、「Multisig address」って書いたところが「escrow address」って言ってる人もいた。1つ目のtransactionの送り先は、2つ目のtransactionのinputで書かれるredemption scriptのhash値??)
If Alice pays and Bob sends the goods: Alice and Bob sign to send the coin to Bob
If Alice pays but Bob doesn't send the goods: Alice and Judy sign to send the coin back to Alice
If Bob sends the goods but Alice refuses to sign: Bob and Judy sign to send the coin to Bob
Example 2: Green addresses
Alice is a buyer, and Bob is a seller. Bob is either offline or too busy to check the blockchain.
In this case, Alice asks a bank to pay Bob, and bank pays to Bob with its sign. If Bob believes the bank doesn't do double spending, then he can trust the payment and give the goods to Alice without waiting for 6 confirmations.
This trust comes from the real-world trust, not from the Bitcoin system.
This isn't used so much in practice because there were some companies that implemented green addresses and ended up collapsing.
Bobがofflineだったらどうやってtransactionが発行されたかどうか確認するの?
Example 3: Efficient micro-payments
Alice wants to pay Bob a small amount of Bitcoin every minute for the service Bob provides. Alice doesn't want to incur a transaction fee every minute.
How can Alice combine these small payments into one big payment at the end?
What if Bob never signs?
Alice demands a timed refund transaction between process (1) and (2) in the illustration. "Pay 100 to Alice, LOCK until time (signed by both Alice and Bob)"
The lock time is specified as the transaction property "lock_time".
More advanced scripts
Multiplayer lotteries
Hash pre-image challenges
Coin-swapping protocols (-> It becomes difficult to trace who owns which coin)
These contracts are called "Smart contracts".
The contracts that can enforce some features which were traditionally enforced by laws or authority.
3.4. Bitcoin Blocks
Why bundle transactions together?
Single unit of work for miners
Keep the length of hash-chain of blocks shorter
Faster to verify history
Bitcoin block structure is a combination of two different hash-based structures.
Hash chain of blocks
Hash tree (Merkle tree) of transactions in each block
The structure of a block
{ "hash": "0000...", "ver": 2, "prev_block": "0001...", "time": 1391, "bits": 4195, "nonce": 4594, "mrkl_root": "89776...", "n_tx": 354, "size": 181520, "tx": [ ... ], "mrkl_tree": [ "6bd5eb25...", ... "89776cdb..." ] }
The first 7 lines are block header, and the rest is transaction data.
1:52あたりのスライドでは"mrkl_root"はheaderに含まれてなかったけど、他のソースによると含まれてるっぽい?
Among those header properties, "hash", "bits", and "nonce" are mining puzzle related information.
Header is the only thing that's hashed during mining.
"mrkl_root"はheaderに含まれているのでhash化される。すると、treeの中の値を一つでも変えるとblock全体のhashが変わるので、改竄できなくなる。
n_txはnumber of transactionsだと思うけど、じゃあsizeって何?
→実際の例を見たら1.496KBとか書いてあったから、データとしてのサイズかな。
Merkle treeのrootへの参照だけ持っていればいいんだと思ってたけど、"tx"とか"mrkl_tree"は何なの?
Coinbase transaction
A special transaction in the Merkle tree, which creates a new coin.
"in": [ { "prev_out": { "hash": "0000...0000", "n": 42429696 }, "coinbase": "..." } ], "out": [ { "value": "25.033...", "scriptPubKey": "OP_DUP OP_HASH 160" } ]
動画でのスライドの括弧の対応とかインデントとかなんかおかしかったから適当に直したけど、もしかしたら間違ってるかも。
"value" = block reward + sum of transaction fees
"prev_out"["hash"]: Null hash pointer, hash of all zeros
"prev_out"["n"]: arbitrary
"coinbase": completely arbitrary
You can see the real block and transaction data at blockchain.info.
3.5. The Bitcoin Network
Ad-hoc protocol (runs on TCP port 8333)
Ad-hoc network with random topology
All nodes are equal
New nodes can join at any time
Forget non-responding nodes after 3 hours
How to join the Bitcoin P2P network?
Send message "getaddr()" to one node that you know (the seed node).
Then the seed node let you know the address of the nodes that the seed node is connected to.
Send the same message to the nodes that you got to know in process 1.
Repeat this procedure as many times as you like and randomly select the nodes to connect to.
How to publish a transaction?
-> Use flooding algorithm.
A node tells about the transactions to its neighbors.
The neighbors store the transaction to their pending transaction pool, and then tells about the transaction to their neighbors.
When a node has already heard about the transaction and gets notified about the same transaction, it simply doesn't propagate the information any more.
Conditions under which you should propagate the proposed transaction
Valid with current block chain
Script is whitelisted (by default)
Avoid unusual scripts
Haven't seen before
Avoid infinite loops
Doesn't conflict with other transactions you've relayed
Avoid double-spends
They are just sanity checks; some nodes may ignore these conditions.
Nodes may differ on transaction pool because of the network delay.
That's fine because they've not yet published as a block.
Race condition
Transactions or blocks may conflict.
Default behavior: accept what you hear first
But if the conflicting transaction made it to the block, the node just forget about the first transaction it heard about.
Network position matters
Miners may implement other logic
Block propagation is nearly identical
Relay a new block when you hear about it if:
Block meets the hash target (the nonce meets the standard)
Block has all valid transactions
Run all scripts even if you wouldn't relay
Block builds on current longest chain
Avoid forks
They are also just sanity checks, too.
This protocol is not so efficient, but Bitcoin prioritizes the simplicity and decentralization.
How big is the network?
Impossible to measure exactly
Estimates: up to 1M IP addresses/month
Only about 5-10k "full nodes", the permanently connected and fully-validating nodes
This number may be dropping
Fully-validating node
Permanently connected
Stores entire block chain
Hear and forward every transaction
Tracks the full UTXO set
Thin/SPV clients (not fully-validating)
Store block headers only -> cannot check for the transactions' validity
Request transactions as needed to verify incoming payment
Trust fully-validating nodes
3.6. Limitations & Improvements
Hard-coded limits in Bitcoin
10 minutes average to create a block
1 MB in a block
20,000 signature operations per block
100 M satoshis per BTC
21M total BTC maximum
50, 25, 12.5... Bitcoin mining reward
The maximum amount of the coin and the reward structure cannot be changed because they affect the economic balance too much.
Throughput limits in Bitcoin
1 MB / block (10 min)
more than 250 bytes / transaction
7 transactions / sec
Inefficient compared to VISA or PayPal
Cryptographic limits in Bitcoin
Only 1 signature algorithm (ECDSA/P256)
Hard-coded hash functions
Crypto primitives might break by 2040
"Hard-forking" changes to Bitcoin
It's impossible to assume that every node will upgrade.
-> Nodes that still have old version will reject valid transactions or blocks sent from nodes with the new version of the software.
PROBLEM: Old nodes will never catch up
Soft forking
Observation: We can add new features which only limit the set of valid transactions. We can just make the validation rules stricter.
Need majority of nodes to enforce new rules
Old nodes will approve the blocks proposed by new nodes.
RISK: Old nodes might mine now-invalid blocks
Example of soft fork: Pay to script hash
Old nodes will just approve the hash without running the embedded script.
Soft fork possibilities
New signature schemes
Extra per-block metadata
- Restrict the content of coinbase parameter to, for example, the root of the UTXO tree
4.1. How to Store and Use Bitcoins
To spend a Bitcoin, you need to know:
some information from the public blockchain: don't need to worry about how to store it because you can always get it
the owner's (your) secret signing key
So, it's all about key management.
Goals
availability: you can spend your coins
security: nobody else can spend your coins
convenience: relatively easy to use
Simplest approach: store key in a file on your computer or phone
as available as your device
as secure as your device
very convenient
Wallet software
Keeps track of your keys and coins, provides nice UI.
Nice trick: use a separate address/key for each coin
benefits privacy (looks like separate owners)
wallet can do the bookkeeping, so users don't need to know
Encoding addresses
Encode as text string: base 58 notation or use QR code
4.2. Hot and Cold Storage
Hot storage (online): convenient but risky
Cold storage (offline): not as convenient but safer
Store separate keys in each of these storage.
To transform money to each other, each storage needs to know the address of the other.
Even though the cold storage is offline, hot storage can send money to it.
Problem: want to use a new address (and key) for each coin sent to cold storage. How can hot wallet learn new addresses if cold wallet is offline?
Solution: Hierarchical key generation
Important properties:
ith address and ith key correspond to each other.
address gen info doesn't leak keys -> address gen info can be public
Some digital signature schemes cannot support hierarchical key generation, but the one Bitcoin uses does.
At the beginning, cold side does generateKeysHier, and pass the address gen info to the hot side.
Hot sideがどんなiを使ったかってどうやったらcold sideが知れるの?
How to store cold info
Info stored in device, device locked in a safe
"brain wallet": encrypt info under passphrase that user remembers
paper wallet: print info on paper, lock up the paper
in "tamperproof" device: device will sign things for you, but won't divulge keys
4.3. Splitting and Sharing Keys
Secret sharing
Split secret into N pieces, such that:
given any K pieces, one can reconstruct the secret key
given fewer than K pieces, one cannot learn anything
Example: N = 2, K = 2
P = a large prime number
S = secret in [0, P - 1]
R = random in [0, P - 1]
X1 = (S + R) mod P
X2 = (S + 2R) mod P
reconstruction: S = (2X1 - X2) mod P (because S < P)
As k becomes 3, 4, ... you can just increase the number of random parameters R1, R2... and make the function quadratic, cubic, ... and so on.
Good:
Store share,es separately, adversary must compromise several shares to get the key.
You can lose (N - K) shares.
Bad: To sign, one needs to bring shares together and reconstruct the key -> vulnerable
Multi-sig
Lets you keep shares apart, approve transaction without reconstructing key at any point.
If there are 4 people, each of them generates a key-pair and puts the secret key in a secure and offline place, and if they stipulate that three of the four keys are needed to release their coins, then the coins will be safer.
じゃあ結局S+Rとかのやつは使わんってこと?めっちゃ賢いのに何かもったいないね。
4.4. Online Wallets and Exchanges
Online wallet
like a local wallet but "in the cloud"
runs in your browser, sends code to you and stores keys. You log in to access the wallet
Good: convenient. Nothing to install, works on multiple devices
Bad: vulnerable if the site is malicious or compromised
Bank-like services
You give the bank money (a "deposit")
The bank promises to pay you back later, on demand
The bank doesn't actually keep your money - they invest the money
Bitcoin exchanges
accept deposits of Bitcoins and fiat currency, promise to pay back on demand
lets customers:
make and receive Bitcoin payments
buy/sell Bitcoins for fiat currency
typically, match up BTC buyer with BTC seller
What happens when you buy BTC
No BTC transaction appears on the blockchain
The effect is only that Exchange is now making a different promise (the kind of currency your account holds has changed)
Good:
connects BTC economy to fiat currency and economy
easy to transfer value back and forth
Bad: the same kind of risks as banks, cyber attack
For traditional banks, government typically
imposes minimum reserve requirements
regulates behavior and investments
insures depositors against losses
acts as lender of last resort
But for Bitcoin banks and exchange services, these regulations do not apply.
Proof of Reserve
Bitcoin exchange can prove it has fractional (or 100%) reserve.
Prove how much reserve you're holding
Publish valid payment-to-self of that amount. Sign a challenge string (generated by some impartial party) with the same private key -> Proves that someone who knows the private key is participating in this proof of reserve
Prove how much demand deposit you hold
In Merkle tree, each hashpointer includes total value in its subtree.
Construct the tree and sign the root pointer.
If a customer demands to prove that he is included in the tree, just show log n items from the root to the customer's account. He can only check if the sum is correctly added.
Reserve fraction >= how much reserve you're holding / how much demand deposit you hold
4.5. Payment Services
Scenario: merchant accepts BTC
customer wants: to pay with BItcoin
merchant wants: to receive dollars, simple deployment, and low risk
この辺の話には興味がなかったのであんまりノート取ってない。
customer, merchant: do what they wanted to do
payment service: gets Bitcoins, pays dollars, absorbs risks related to security and exchange rate
4.6. Transaction Fees
transaction fee = value of inputs - value of outputs
fee goes to miner who makes the block that contains this transaction
How are transaction fees set today?
Transaction fee exists to compensate for (some of) the cost for
peers to relay the transaction
miner to record the transaction in a block
Current consensus fees:
No fee if all these conditions apply:
tx is less than 1000 bytes
all outputs are 0.01 BTC or larger
priority is large enough
priority = (sum of (inputAge * inputValue)) / txSize
Otherwise: 0.0001 BTC per 1000 bytes
Most miners enforce the consensus fee structure.
Transaction without consensus fee will take longer to be recorded.
Miners prioritize transactions based on fees and the priority formula.
4.7. Currency Exchange Markets
Supply of Bitcoins
supply = coins in circulation (+ demand deposits?)
coins in circulation: fixed number
When to include demand deposits?
-> When they can actually be sold in the market
Supplyとdemandのあたりあまりよくわからなかった。経済がわかってなさすぎる。興味が出たらちゃんと考える。
Programming Assignment: Consensus from Trust
今週の講義内容とあんまり関係なくない?笑
疑問点
Do malicious nodes make up fake transactions?
→"Your CompliantNode code can assume that all Transactions it sees are valid"らしい!
How should I use the four arguments passed to CompliantNode constructor? Individual nodes are not supposed to know the structure of the entire network...???
→According to a mentor's post in the discussion forum, these parameters can be used to construct our own malicious nodes for testing.
ここ見て、ああああなるほど!!!ってなって"handshaking" transaction実装しようとしたけど、Simulation.java見たらvalidTxIdsに入ってるidを持つTransactionしか伝達されてないので詰んでいた。でも、どっちにしてもmaliciousだってそのfolloweeから送られてきたやつをそのまま伝達する(ことがある)んだから、maliciousからhandshaking transactionも送られて来るんじゃない?
結局、何も送ってこないfolloweeをガン無視するって方法で90%まで行ったので、そこで諦めることにした。
これの逆バージョンで、malicious nodeをデザインしてどれだけネットワークを攪乱させられるかみたいなのあったら楽しそう。
5.1. The Task of Bitcoin Miners
Bitcoin miners
store and broadcast the blockchain
validate new transactions
vote (by hash power) on consensus
Join the network, listen for transactions
validate all proposed transactions
Listen for new blocks, maintain blockchain
when a new block is proposed, validate it
Assemble a new valid block
Find the nonce to make your block valid
Hope everybody accepts your new block
Get profit
How to look for nonce?
Try every single possible value for both the block's nonce and the coinbase parameter
When the coinbase parameter changes, the change in hash propagate all the way up to the entire block's hash.
Mining difficulty (the difficulty to find nonce) is reset every two weeks.
next_difficulty = previous_difficulty * (2 weeks) / (time to mine last 2016 blocks)
2016 is the ideal number of blocks mined in 2 weeks because a block should be mined every 10 minutes. (14 * 24 * 60 / 10 = 2016)
5.2. Mining Hardware
SHA-256
general purpose hash function
part of SHA-2 family
Remains unbroken cryptographically (weaknesses known)
SHA-3 (replacement) under standardization
256 bits -> split into 32 bits * 8
Then applied some tweaks and rotated.
SHA-256のdiagramはノートに取ってない。
Actually, SHA-256 is applied twice to get the hash of the block.
Nonceを見つけるためにどれくらいのマシンパワーが必要かとかいう話。マイニングに興味はないのでノート取ってない。Bitcoin miningがどれくらい熾烈な争いになってるかにびっくりした。
"Goodput" = throughput * success rate
5.3. Energy Consumption & Ecology
この人Bitcoinの話してたかと思ったらハードウェアに移って、更に熱力学みたいな話まで始めたけど…専門の幅がすごすぎる。あんまり話聞いてなかった。
5.4. Mining Pools
Mutual insurance of miners
How to prove how much work you did as a miner?
Show "near-valid blocks" (The hash is relatively small, but not enough to be valid) that you mined to the pool manager
Pros
Make mining more predictable
Allow small miners to participate
More miners use updated validation software
Cons
Lead to centralization
Discourage miners from running fully-validating nodes
5.5. Mining Incentives and Strategies
Game-theoretic analysis of mining
Which transactions to include in a block
Default: any above minimum transaction fee
Which block to mine on top of
Default: longest valid chain
How to choose between colliding blocks with the same length of chain
Default: first block heard
When to announce new blocks
Default: immediately after finding them
Can you profit from a non-default strategy?
assuming that you control 0 < α < 1 of mining power
-> For some α, yes. (Analysis is still ongoing)
Forking attacks
Certainly possible if α > 0.5 or via bribery
Attack is detectable
Might be reversed
Might crash exchange rate because people no longer believes in Bitcoin
Checkpointing
Avoid deep forkings
Block-withholding attacks or "selfish mining"
Don't announce blocks right away. Try to get ahead!
Find, for example, two blocks ahead of the currently longest blockchain. Keep them secret.
The rest of the network finds one block.
Release your secret blocks. The block found in 2 immediately becomes an orphan block.
Even if you've found only one block, if you win the race, the block goes in to the accepted blockchain.
How to win the race?
Try to peer with as many nodes as possible
Bribery
Punitive forking
Blacklist transactions from address X = freeze X's money forever
Announce that you will refuse to mine on any chain with a transaction from X
With α < 0.5, you'll soon fall behind
Feather-forking
To blacklist transactions from X, announce that you will refuse to mine directly on any block with a transaction from X. But you'll concede after n confirming blocks
-> You can cancel the block containing transactions from X with probability α2
-> For other miners, including a transaction from X induces an α2 chance of losing a block -> Might be safer to join in on the blacklist
6.1. Anonymity Basics
Bitcoin addresses are public key hashes (that act kind of like pseudo-identities) rather than real identities = pseudonymity
Anonymity = pseudonymity + unlinkability
Unlinkability: Different interactions of the same user with the system should not be linkable to each other
Why is unlinkability needed?
Many Bitcoin services require real identity
If you buy something with Bitcoin at a real store, the stuff will know your physical identity
Linked profiles can be deanonymized by a variety of side channels
Example: One might be able to connect your address to your Twitter account by the similarity between when you make Bitcoin transactions and when you post tweets
-> Pseudonymity is quite fragile, so unlinkability is needed
Defining unlinkability in Bitcoin
Hard to link different addresses of the same user
Hard to link different transactions of the same user
Hard to link sender of a payment to its recipient
Indirect payment that goes through a bunch of transactions does not easily reveal the initial sender and the ultimate recipient
Complete unlinkability (among all addresses/transactions) is hard
Quantifying anonymity
Try to maximize the size of anonymity set
Anonymity set: the crowd that one attempts to blend into
To calculate anonymity set:
Define adversary model
Reason carefully about: what the adversary knows, does not know, and cannot know
Why we need anonymous cryptocurrencies?
Blockchain-based currencies are totally, publicly, and permanently traceable
-> Without anonymity, privacy is much worse than traditional banking
Money laundering
Bottleneck: moving large flows into and out of Bitcoin ("cashing out")
Similar dilemma: Tor
Anonymous communication network
Sender and receiver of message unlinkable
Used by many kinds of people including:
Journalists -> should be protected from government pressure
People who spread malware -> should be identified
Using blind signature, traditional bank can let users withdraw and deposit their money without being able to identify the user's connection.
Anonymity & decentralization: in conflict
Interactive protocols with bank are hard to decentralize
Decentralization often achieved via public traceability to enforce security
6.2. How to de-anonymize Bitcoin
Bitcoin's best practice: always receive payment at fresh address
Shared spending is evidence of joint control -> leads to linking addresses
Also, change address can be connected to the payer's address though people cannot see which address in the output is the change address
"Idioms of use"
Idiosyncratic features of wallet software
e.g. each address is used only once as a change address
Shared spending + idioms of use
You can draw a diagram of a user's addresses and transactions and guess the correspondence between address and service provider.
Instead of just guessing, you can also make transaction with the service and see how the address you interact with is connected to other addresses.
From services to users
High centralization in service providers
Most flows pass through one of these in a traceable way
Address: identity links in forums
Users sometimes post their addresses online to get donation
Network-layer de-anonymization
"The first node to inform you of a transaction is probably the source of it"
-> You can identify the IP address of the source node
Solution: use Tor (not specifically for Bitcoin, but robust and functioning) or Mix nets
6.3. Mixing
To protect anonymity, use an intermediary
By just looking at the blockchain, one cannot identify who deposited specific coin
There exist dedicated mixing services
Promise not to keep records
Don't ask for your identity
Wallet services requires your identity to connect you to your account. But if you trust these services, it's good to use them for mixing because they have bigger anonymity set.
Principles for mixing services
Use a series of mixes
If at least one mixing service deletes records, your transaction becomes untraceable
Uniform transactions (make the transactions have the same value)
Client side must be automated
Fees must be all-or-nothing
Remaining problem: trusting mixes
Stay in business and build up reputation
Users can test for themselves
Cryptographic "warranties"
最後の解決法がよくわからんかった。
6.4. Decentralized Mixing
Pros:
No bootstrapping problem
Theft impossible
Possibly better anonymity
More philosophically aligned with Bitcoin
Coinjoin
Find peers who want to mix
Exchange input/output addresses
One of the member construct the transaction
Send it around and collect signatures
Before signing, each peer checks if his output is present
Broadcast the transaction
Problems:
How to find peers
-> Solution: Use an untrusted server
Peers (or at least one of them) know your input-output mapping
-> Solution: special-purpose anonymous routing mechanism
Betrayal
After learning the other's address and signature, one can transfer the coin to his own address
-> Proposed solutions: proof of work, proof of burn, server kicks out malicious participant, cryptographic "blame" protocol
3つ目の問題があるなら、theft不可能ってわけでもなくない?
High-level flows could be identified
For example, if you receive specific amount of coin every week and immediately transfer specific portion to your other account, the pattern in the amount of money and time is identifiable
Heuristic: merge avoidance
Instead of a single payment transaction,
receiver provides multiple output addresses
sender avoids combining different inputs
6.5. Zerocoin and Zerocash
Zerocoin
protocol-level mixing (Mixing capability baked into protocol)
Advantage: cryptographic guarantee of mixing -> You don't need to trust others to achieve anonymity
Disadvantage: not currently compatible with Bitcoin
Basecoin: Bitcoin-like Altcoin
Zerocoin: Extension of Basecoin
Basecoins can be converted into zerocoins and back
-> Breaks line between original and new Basecoin
Zerocoins: cryptographic proof that you owned a Basecoin and made it unspendable
Miners can verify these proofs
Gives you the right to redeem a new Basecoin and exchange it for Zerocoin
Challenges
How to construct these proofs?
How to make sure each proof can only be "spent" once?
Zero-knowledge proofs: a way to prove a statement without revealing any other information Example:
"I know an input that hashes to xxxx"
"I know an input that hashes to some hash in the following set ..."
-> Others can't learn what the input is
Zerocoin comes with a standardized denomination
-> Any zerocoin in the block chain has the same value
Minting zerocoins
Generate serial number S (eventually made public)
Generate random secret r (never becomes public, ensures unlinkability)
Compute H(S, r) = cryptographic "commitment"
Create a transaction which have base coin as input and the computed H(S, r) as the output address
-> The basecoin in the input becomes unspendable
Spending zercoins
Reveal S
-> Miners will verify S hasn't been spent before
Create zero-knowledge proof that "I know a number r such that H(S, r) is one of the zerocoins in the block chain"
-> This gives you the right to redeem basecoin
Pick arbitrary zerocoin in block chain (to ensure anonymity) and use it as input to your new transaction
Zerocoin is anonymous because people cannot learn which zerocoin you put in the block chain as long as you keep your random number r secret.
Zerocash
Zerocoin without Basecoin
Differences
Different (and more efficient) crypto for proofs
Proposal to run system without Basecoin
All transactions are zerocoins
Supports splitting and merging
Puts transaction value inside the "commitments"
Ledger merely records existence of transactions
Random, secret inputs are required generate public parameters. These secret inputs must then be securely destroyed. No one can know them (anyone who does can break the system).
System | Type | Anonymity attacks | Deployability |
---|---|---|---|
Bitcoin | Pseudonymous | Tx graph analysis | Default |
Single mix | Mix | Tx graph analysis, bad mix | Usable today |
Mix chain | Mix | Side channels, bad mixes/peers | Bitcoin-compatible |
Zerocoin | Cryptographic mix | Side channels (possibly) | Altcoin |
Zerocash | Untraceable | None | Altcoin, tricky setup |
6.6. Tor and the Silk Road
In Tor, a sender picks three Tor nodes at random to route her message. If at least one of them is honest node, it's safe(ish).
Key challenge: hiding routing information
Individual router shouldn't learn the destination's IP address
Solution: layered encryption
What is encrypted in each layer is the destination of the message at that layer.
Hidden services
What if the server wants to hide its address?
Connect to "rendezvous point" through Tor
Publish {domain name -> rendezvous} point mapping
Domain name is not the default DNS address; the domain name used here is called "onion address"
Client connects to rendezvous point
Silk Road
Communication: Tor hidden service
Payment: Bitcoin
Security?
Anonymous shipping?
7. Community, Politics, and Regulation
ここからは、後で復習することはあまりなさそうなので、ノート取らない。でも勉強になるので聞くだけ聞いた。
Consensus about rules: depends on developers
Consensus about history: depends on miners
Consensus that coins are valuable: depends on investors, merchants, and customers
Satoshi Nakamotoさんがミステリアスなのは何となく知ってたけど、addressがわかってるから何かtransaction起こしたらブロックチェーンに公開されるっていうの聞いて「ロマンがある!!!!」ってなった。
7.7の2つ目のクイズの解答がめっちゃ謎。どうしてそうなったん?? →Mentorさんが"I thought the answers to the second question on market failure were exactly opposite to the ones given."って言ってはるからやっぱり何かの間違いかな。
Programming Assignment: Blockchain
疑問点
heightとは??そのblockがgenesisからどれくらい伸びたところにあるかってこと?
→たぶん。heightは相対的にしか使われないから、genesisを0としても1000としても何でもいい。
UTXOPoolとTransactionPoolは、forkあたり一つずつなのか、blockchainあたり一つずつなのか?
→どっちもinstructionに書いてあった笑
TransactionPool: "Maintain only one global Transaction Pool for the block chain and ..."
UTXOPool: "You have to maintain a UTXO pool corresponding to every block on top of which a new block might be created."
なぜかgenesisBlockは空っぽだろうと信じてたのでtransactionのvalidationがうまくいかなかった。genesisBlockのcoinbaseを考慮したらちゃんとできた。