VB & Lamport’s 99% Fault Tolerant Consensus

Leona Hioki (日置 玲於奈 )
6 min readSep 11, 2018

Lamport’s Paper

There’s famous “Byzantine General Problem”
Generals have to have a consensus “attack” or “retreat”,or nodes may die. There can be liars among them.

“””THEOREM 1. For any m, Algorithm OM (m ) satisfies conditions IC1 and IC2 if there are more than 3m generals and at most m traitors.”””

OM means the consensus algorithm which doesn’t use digital signature.
And this means,a consensus without signatures can accomplish 33% Fault Tolerant at most.
Simple explanation is this picture below.
These nodes have to have a consensus “attack” or “retreat”,or let one node die.

We can turn the problem of “a consensus among 3 nodes” to the problem of “commander and 2 lieutenant” consensus. They are same because the problem is whether or not they lie, not nodes’ choices . They can’t have a consensus if any liar. And this implies that 1/3 are liars, network cannot have a consensus without signatures.

[NOTICE] “attack:0:1” means that “attack” message which was signed by node0 and node1.

“””THEOREM 4. For any m and d, if there are at most m traitors and the subgraph of loyal generals has diameter d,
then Algorithm SM(m + d -1) (with the above modification) solves the Byzantine Generals Problem.
”””

Why are we using blockchain?? The answer is simple. This consensus is totally high cost and blockchain is relatively low cost. Most remarkable feature is that message(tx?) is signed by all relay nodes,which obviously make both of traceability and stress. If we use this in a network of 30,000 nodes , it needs to wait 10,000 times as long time as that of 3 nodes network, and also needs exponential amount of signed messages which is said to be uncountable.This has bad scalability, simply because as well, this is a synchronous system, and blockchain is not.(I will explain this with CAP Theorem at last of this page)

First of all,at the last of Lamport’s algorithm, a node should finalize the “last message” and make sure no more message.

(3) For each i: When Lieutenant i will receive no more messages, he obeys the order choice( Vi).

He proposed introducing time-out to finalize it.
Then pls take a look at VB’s plan. We see they can have a finalized consensus {x,y,z} on time-clock T+2D

Please make sure you understand that this consensus is among all nodes including dishonest nodes. The consensus “{x,y,w}” is including “w” which byzantine node sent.
This is necessary because dishonest byzantine node is not always malicious,they can be just out of order. Not every node is perfect.

BUT THIS MADE A PROBLEM!(comical)
In Vitalik’s page, this pattern is referred as a problem.

Suppose that a commander and some subset of k (malicious) validators produce a message v : i[1] : .... : i[k], and broadcast it directly to some “victims” just before time T + k * D. The victims see the message as being “on time”, but when they rebroadcast it, it only reaches all honest consensus-participating nodes after T + k * D, and so all honest consensus-participating nodes reject it.

So Observer1 cannot sync the consensus even if he is a good node.
This seems to be difficult to understand why this is problem,because “Colluding nodes” can’t earn profits by this operation.
But the problem is that it’s easy to disturb the network with no risk & no cost.

VB goes on

But we can plug this hole. We require D to be a bound on two times network latency plus clock disparity. We then put a different timeout on observers: an observer accepts v : i[1] : .... : i[k]before time T + (k - 0.5) * D. Now, suppose an observer sees a message an accepts it. They will be able to broadcast it to an honest node before time T + k * D, and the honest node will issue the message with their signature attached, which will reach all other observers before time T + (k + 0.5) * D, the timeout for messages with k+1 signatures.

This can make a consensus.
Message v : i[1] : .... : i[k] still cannot reach the Observer2 directly because of the time-out, but Observer2 can get it with one more sign if any honest node has this before the time-out.

To make such a condition,
(1) Observers do not sign,and just relays broadcast they accepted, D/2 earlier.

[NOTICE] in the pic above,Observer1 sends v : i[1] : .... : i[k] without his signature. Validators (including Honest Nodes) sign.

(2)Time-clock(time-out) should be divided to 2 pattern, for the sake of enabling the two roles of broadcasting ,no signing Observers and signing Nodes.

Then it seems that observers can sync each other ,and nodes can get information from observers like that way.
In principle, Fault Tolerant rate can reach near 100% when a network is big enough. If m is number of liar nodes,

Algorithm SM(m + d -1) solves the Byzantine Generals Problem.

Network consensus will get similar to this when there are many validators and latency is long enough that broadcast reaches every node.

CAP Theorem

https://towardsdatascience.com/cap-theorem-and-distributed-database-management-systems-5c2be977950e

There’s a limitation called CAP Theorem. These 3 below can’t coexist.
(1) Consistency: Every read receives the most recent write or an error

(2)Availability: Every request receives a (non-error) response — without guarantee that it contains the most recent write

(3)Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

The fact that blockchain lacks of Cosistency(1) , is against our images of BC or Ethereum,called “common database” or “world computer”. But obviously P2P network doesn’t guarantee the newest data, you know.

But why is blockchain though to be useful and said to be a breakthrough?
Because lack of Consistency(1) is easily avoid in actual usage, in which people don’t demand the very newest data at second level. People compromise when this has consistency in 30 minutes and responds at most 30 minutes old data.
This called “Eventual Consistency”.

[SPECIAL NOTE] Lightening Network ,Plasma, and all of other off-chain scaling are chasing “Eventual Consistency” , pushing down apparent consistency(2) and get free from CAP limitation.

“99%” has strong consistency which can sacrifice Partition tolerance and maybe unable to endure network splits, as VB admits some cases are considerable.

>>>The 99% tolerance will not work simply because in the case of a network split it will lead to 100 different baby networks — very bad!

VB: Agree, hence why I described the hybrid mechanism, which can survive either a network split or >33% malicious, as long as both do not happen at the same time.

In my imagination, this seems to be cut out for limited scale consensuses like PoS, DPoS or PoA. I don’t know. I’ll keep on eye on posts.

--

--