Randomized leader election using VRF

Randomized leader election using VRF

Author : Seonpyo Kim<spkim@kodebox.io>

Created: 2019-08-20

1. Introduction

Tendermint rotates through the validator set, i.e. block proposers, in a weighted round-robin fashion. Leaders can be subject to targeted denial-of-service attacks because attackers can predict exactly who will be the next proposer and direct all their bots to simultaneously attack that proposer.

Bitcoin is not susceptible to such attacks because it is impossible to predict the next block proposer before the block is propagated to the network. Only the node that solved the proof-of-work problem knows that it can propose a block and other nodes in the network can confirm that the proposed block is from a valid block proposer by verifying the nonce.

To mitigate the possible denial-service-attacks on Tendermint, we need to introduce randomness into its leader election mechanism. In this proposal, we sketch how to introduce a verifiable random function (VRF) to assign leaders to round numbers in a less predictable way.

2. Background

2.1. Verifiable random function (VRF)

In cryptography, a verifiable random function is a pseudo-random function that provides publicly verifiable proofs of its outputs’ correctness[2]. The operating principle of VRFs is as follows:

Figure1. VRF

VRFs guarantee collision resistance and pseudorandomness. Moreover, they give a unique output random hash for a given message, unlike DSAs(digital signature algorithm). Using VRFs, arbitrary values can be calculated privately by any user and can be verified publicly. Furthermore, users cannot brute force their way to get a more desirable random hash value because the output random hash is unique for the given message.

2.2. Algorand committee selection

Algorand’s cryptographic sortition aims to constitute a committee randomly, but the result of an election can be discovered privately[1]. Algorand implements cryptographic sortition as shown in the following figure:

Sortition requires a role parameter that allows distinction amongst different roles. In Algorand’s consensus, there exists the block proposer and the fork proposer. 𝜏 specifies the expected number of users selected for a certain rule. A calculated value j determines the number of elections of a user. Additionally, w refers to the weight of a user and W refers to the total weight of all users.

The sortition algorithm of Algorand is analogous to biased coin flipping with hitting probability p, namely “Bernoulli trial”. It treats each stake token as an independent coin. Thus, the total voting power is equal to the number of coins that hit success. The probability p is set by the expected number of committee over the number of candidates.

The number of hits is determined not by multiple independent trials but by one hash value calculated with the VRF function(sk, seed). To treat one VRF function result as an aggregated value of multiple independent trials, the VRF should guarantee that the output of the function is a uniform random value. More specifically, the region in which (output_hash) / 2^(hashlen) exists determines the voting power k, where the output_hash refers to the output of the VRF function and the hashlen refers to the length of VRF outputs.

Figure 2: The sketch of B(k; 2, ¼)

3. Design

3.1. Leader election

The leader election process follows Algorand’s sortition algorithm. Let’s assume each validator has equal weight. If one naively sets the expected committee number(𝜏) as 1, then p becomes 1/30. In this situation, the probability of how many block proposers are elected is as follows:

# proposer 0 1 2 3 4 5 6 7
probability 0.362 0.374 0.187 0.060 0.014 0.002 0.0003 0.00004

Though the expected number was set to 1, the chance of there being no proposers elected is as high as 36%. This is the most undesirable case since block finalization does not occur in that round. When 𝜏 is greater than 1, it is possible to secure at least one block proposer with high probability.

For 𝜏 = 6 when the total weight is 30, the chance of “no proposer” is reduced to 0.12%.

# proposer 0 1 2 3 4 5 6 7
probability 0.0012 0.0093 0.0337 0.0785 0.1325 0.1723 0.1795 0.1538
# proposer 8 9 10 11 12 13 14
probability 0.1106 0.0676 0.0355 0.0161 0.0064 0.0022 0.0007

If there are multiple proposers, they independently propose blocks and only the proposal with the highest priority is accepted. Proposal priority will be discussed in the next section.

Until now we have assumed that each node has equal weight, but it is improper because each has different voting powers. In this case, splitting stakes into multiple nodes and managing all of them are more advantageous than managing a single node that has a lot of stakes. To eliminate the chances of Sybil attacks, the weight of a node must be proportional to the amount of stake delegated to the node.

If the total weight is large enough, it is possible to recalculate the proper 𝜏 value by using the “Poisson distribution” approximation [A].

3.2. Proposal priority

One validator can be elected as a block proposer multiple times in one round with stake-proportional weight. However, even if elected multiple times, it is nonsensical to propose multiple blocks in a single round. Here’s where the priority comes into play.

The priority for a block proposal is the measure of the power of that proposal. Block proposals need to be prioritized, and proposals with more power end up gaining more priority. In a single Tendermint round, there should be only one valid proposal, which would be the proposal with the highest priority.

The priority for the block proposal is obtained by calculating Hash(VRF output_hash || sub-user index). To become higher in priority, a validator can try as many times as the number of elections. In this process, the validator iterates the sub-user index from zero to the number of election - 1. Other validators can verify the priority by using the published VRF output hash. The highest priority of all the block proposer’s selected sub-users is the priority of the height.

3.3. Gossip

Algorand gossips two different kinds of messages in the proposal step. One message only contains the priorities and proofs of the chosen block proposers, and the other message contains the entire block, the proposer’s sortition hash, and the proof. The purpose of this distinction is to quickly converge to the global highest priority proposer. These messages enable most users to learn who is the highest priority proposer, and thus quickly discard other proposed blocks.

We apply the same technique to Tendermint because fast convergence to the highest priority block will decrease the average block finalization time. Validators first propagate priorities and proofs through push gossip. Each validator discards priorities that have lower value than the proposal with the highest priority that it has seen. After a sufficient amount of time, most users will know who is the highest priority proposer. Then, these users can get the block proposal through a pull gossip to the highest priority proposer. This minimizes unnecessary message transfers in the network.

3.4. Verification

To introduce this sortition process to CodeChain, it needs two additional verification steps. One is a VRF output verification with a known public key of a certain validator. The other is proposal priority verification using the VRF output and the sub-user index validation (sub-user index < number of election).

For a single node, verifications take at least 5 times, when 𝜏 = 6. There is a trade-off between fast block finalization and correct block finalization. To vote in the global highest priority block, Tendermint module needs a looser timeout to wait and observe the actual highest priority block. However, this would make the ‘propose’ step take more time than before and force all nodes to wait full timeouts when there are no proposers. Therefore, the total confirmation latency would increase.

3.5. VRF seed

CodeChain blocks must contain the seed values. CodeChain’s leader election scheme needs the VRF seed to remain unpredictable until the current round. Hence, the seed should be dependent on parent_block_seed. Moreover, every Tendermint round requires a unique seed to determine block proposers, so the seed must be dependent on the Tendermint round value(i.e. height and view). Algorand’s idea is quite complicated because Algorand’s consensus algorithm “BA*” allows fork choices. However, Tendermint does not allow forks in its consensus, making the seed scheme simpler. The simplest form is:

current_block_seed = VRF_sk(parent_block_seed || height || view).

Attackers may want to control current_block_seed to have an advantage in the next sortition; however, one cannot be certain that they will have the highest priority in a round even though the hash value for the priority is large enough. Since nobody can predict other validators’ VRF results, priorities are also unpredictable. Malicious users may try their best to delay consensus until a certain “view” comes, which makes their VRF output in sortition bigger. However, controlling the “view” is more difficult than controlling current_block_seed because it requires a more complete control over the consensus process. The “view” value can get higher only when more than 2⁄3 of the validators in the committee fail to agree on a single block. Therefore, VRF_sk(parent_block_seed || height || view) is reasonable as a seed value of each round.

4. Safety and Liveness Analysis

Occasionally, there will be no block proposers elected in the sortition. Fortunately, it does not harm the liveness property of the Tendermint consensus since “no proposer” leads all honest nodes to wait until timeout and proceed to the next round. Malicious users could not pretend to be elected proposers because they can’t manipulate the VRF result, ultimately leading their proposal priority verification to fail.

References

[1] Algorand whitepaper

[2] Micali, Silvio; Rabin, Michael O.; Vadhan, Salil P. (1999). “Verifiable random functions”. Proceedings of the 40th IEEE Symposium on Foundations of Computer Science. pp. 120–130.

[3] CodeChain Techtalk slides

Appendix

[A] Poisson distribution approximation. The probability of U coin hits out of total K coins, where the expected number of hits is set to 𝜏, is

and it’s equal to

For relatively very large U , and fixed K

And

So the probability of sampling exactly K sub-users is :

which is poisson distribution.

2 Likes

This is the VRF function benchmark test result for secp256k1 curve which is now used in CodeChain.

CPU: i7-8750H
OS: WSL(Windows Subsystem for Linux) in Windows10

 Running target/release/deps/secp256k1_sha256_svdw-94715cb1757c4836

running 3 tests
test bench_prove              ... bench:   1,715,070 ns/iter (+/- 19,615)
test bench_prove_rand_message ... bench:   1,733,770 ns/iter (+/- 25,553)
test bench_verify             ... bench:   2,070,650 ns/iter (+/- 26,377)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out

     Running target/release/deps/secp256k1_sha256_tai-7687d83499a2d23c

running 3 tests
test bench_prove              ... bench:   1,665,140 ns/iter (+/- 68,732)
test bench_prove_rand_message ... bench:   1,753,590 ns/iter (+/- 59,381)
test bench_verify             ... bench:   2,014,280 ns/iter (+/- 54,412)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out

Hi, I have a suggestion about the election rule. How about choosing a candidate as a proposer if it has the largest (Proposer power) among all candidates? (Proposer power) is defined below. This method prevent the election process from ending in a tie, as the probability that two candidates have the same (proposer power), is 0.

(Proposer power) = X ^ {1 / V}

or,

(Proposer power) = \frac{\log X} { V }

X and V are a random number drawn from uniform distribution U(0, 1), and the voting power of the proposer, respectively.

We can show that if there are n candidates, the probability of candidate i being elected is

P(\mbox{candidate i is elected}) = \frac{V_i} { (\sum_{k=1} ^ {k=n} V_k }.

The drawback of this method is that it requires floating point operations; there is no guarantee that all kinds of CPUs give the same result.

Did you want to suggest a new probability model that guarantees the election probability is proportional to the stake value(voting power)?
I have two questions here.
Firstly, how the probability that two candidates have the same (proposer power) is 0? I think there are chances that X_i ^ {1 / V_i} = X_j ^ {1 / V_j} and the probability is higher than 256bit hash collision(A priority tie situation in my proposal).
Secondly, I failed to derive the linear probability function that you proposed: P(\mbox{candidate i is elected}) = \frac{V_i} {\sum_{k=1} ^ {k=n} V_k }. How could I get this linear probability?

Thank you for your prompt response! For the first one, I assumed that X and V are real values. In reality, our random numbers are all integers. Therefore, as you pointed out, while it is quite small, there is some probability that two candidates have the same (proposer power). In this case, I think your tie breaking method can be used.

For the second point, here is the proof for my claim:

Thank you! The proof seems clear in mathematical sense. It also does make sense in actual calculation with an additional assumption that the one unit increment of “quantized x” is small enough to approximate summation to integration.
If we introduce this probability model, we don’t have to introduce the concept of priority. In the previous model, the sortition algorithm gives stake-proportional chances to try random functions to get higher priority. Unlike the previous one, we can natively support stake proportional election with this kind of proposer power. It is really worth considering changing the model. X_i can be easily calculated from VRF function output divided by 2^(VRF_hash_len).

However, as you pointed out, the accuracy of the floating point operations is of great importance. The stake values are high so that the proposer power values will be extremely close to 1 unless we introduce a new unit. I expect the introduction of a new bundle unit and log sized \frac{\log X}{V} function will help distinguish close values.

We have a concern about network and verification cost. In the Algroand model, only some eligible validators send their priorities and proposals. However, here all of the validators have to send “validator power” values to be a leader. Every proposer power verification accompanies VRF verification which is costly.

I just watched the presentation on youtube. I could not understand a single word in Korean :slight_smile: but fortunately I followed the links in youtube description to here. Glad to meet others here.

I am using VRF in my project too. I was thinking to improve the tendermint leader election the same way. I am very happy that some smarter person think the same way and has done great job !

I have a question on the project progress. Has tendermint team considering update their tendermin algorithm, or can we just fork it to add VRF flavor into tendermint? what do you think? is there any others can help?

Here is the tendermint issue related to introducing randomized leader election.
Apart from the Tendermint team, Codechain has its own Tendermint implementation written in Rust. We are now implementing randomized leader election onto this CodeChain experimental branch. We are going to check its performance and the trade-offs. If it’s successful, we are willing to implement a safer vrf function to introduce randomized leader election.

I am a rust fan too ( not so much to Golang). I am so happy that you have the tendermint rust implementation ready , that is super cool!

does that mean we can write the abci app in rust?
will it be compatible with existing cosmos sdk? I will take a look at the code tomorrow. I already feel exciting even before reading the code.

you mentioned you are willing to implement safer vrf function. do you mean the current algorand vrf is not safe enough?

You of course can write the abci app in rust because it is not language dependent.
However, CodeChain is not using abci application and was not created by cosmos sdk. Our consensus algorithm is tendermint but was implemented independently and included in CodeChain core.

For the verifiable random function, Algorand is using elliptic curve VRF with Elligator2 hash-to-curve method. This cryptographic suite is compatible with the Algorand’s curve25519 but is incompatible with our secp256k1. Therefore, we needed a new implementation for the sake of CodeChain. We implemented svdw hash-to-curve method but it is not production-ready now. I mentioned “a safer vrf function” in that sense.

Implementation

Following the proposal, the randomized leader election process is implemented by expanding the proposal step. In the previous vanilla Tendermint, only the proposal signed by the predetermined block proposer can be accepted as a valid one. Now, any node that can prove it is eligible to be a proposer can propose a block and other nodes can verify the proposal. After a certain time interval, nodes vote for the block proposal with the highest priority that they have observed until that point. The gossip protocol also changed to request proposals only to the nodes that have a higher priority. The block proposal starts to include the VRF hash output and its proof, and the block seal starts to include the VRF seed and its proof calculated by the block proposer. This patch does not harm the safety and liveness property of Tendermint because it only changes the criterion for proposal validity.

Experiment

From CodeChain binaries with randomized leader elections, I organized the local network with a total of six nodes. The Propose timeout was set to 4000ms. This parameter defines the proposal exchange time between nodes. Every node will pre-vote for the highest priority proposal it has ever seen within the timeout period. For sortition parameters, I normalized each node’s stake, treating the total as 1000 and set the expected elected_sub_users as 7.0. Lastly, I produced the consensus and sortition data for 99829 blocks.

The first set of data is about average block generation time.

  • Average block generation time is 4.42s.
  • Total view increase cases had ocurred 123 times, it occupies 0.12% out of total 99829 blocks

Considering all nodes were positioned at the same local network, the time for network propagation was negligible. Most of the time was taken in the Propose step waiting for the convergence between the highest priority proposal of each node. The block generation time was expected to be a value slightly higher than 4 seconds. However, the reason that the deviation was quite larger than I expected is due to view increase(consensus failure) situations. An increased view implies that the nodes failed to agree to one block proposal in a round. A failure rate of 0.12% is slightly above the theoretically predicted no-proposer rate 0.0912% . Except for the no-proposer situation, some consensus failure cases can be reduced naively by increasing timeouts or by optimizing the gossip protocol.

The second set of data is about the frequency of each node being a block proposer.

Node Frequency being a leader Delegated CCS
1 16534 100000
2 16673 100000
3 16545 100000
4 16687 100000
5 16684 100000
6 16706 100000

In the ideal environment, each node’s frequency is expected to be equal. From the viewpoint of a normal distribution, each node’s election frequency follows N(np, np(1-p)) where the probability being the leader of a round is 1/6. The expected frequency and the standard deviation are 16638 and 117.8 respectively. All the frequencies are included in the 1 sigma region from the average [16520, 16756]. The result is statistically reliable to be interpreted as random.

The third set of data is about the average number of elected sub-users in each round. This data cannot be extracted from the chain information, but can be extracted from full consensus logs for 34679 blocks.

Node Average number of elections Delegated CCS
1 1.1580763046338254 100000
2 1.156911472342416 100000
3 1.1560051455970062 100000
4 1.1654436485893875 100000
5 1.151726940505774 100000
6 1.1683964332699899 100000

The average number of elected sub-users per round is 6.96 and the standard deviation is 2.631.

The number of no-proposer rounds is 19, which occupies 0.056% out of the total.

In our probability model, a random value is calculated at every round, and it is projected to the binomial cumulative distribution function. To get statistics, I repeated a binomial trial B(n, p) about 35,000 times and n is large enough that the sortition behavior would approximate to the normal distribution N(np, np(1-p)). In this point of view, the average number of elected sub-users and the standard deviation was expected to be 7.0 and 2.637 respectively. The result distribution is similar to the expected theoretical distribution. Therefore, I can say that the randomness property of VRFs is favorable to be used in randomized leader elections.