So you may have heard some news recently about ETH 2.0’s long-awaited and much-bandied-about official multi-client testnet, named Medalla, crashing and burning only to re-emerge from the flames after a few days of downtime. The event didn’t leave the network unscathed, far from it. Don’t listen to the noise, however: we’ll tell you why it’s the best thing that could have happened and why Ethereum 2.0 is on track to be the most decentralized and resilient blockchain yet, thanks to a razor-sharp focus on security.
A majority of the Medalla beacon nodes were running the Prysm client developed by Prysmatic Labs. This set the stage for the actual issue having such an outsized impact and cascading effects, and is a powerful reminder of the perils posed by a lack of client diversity and the over-representation of a single client in a blockchain network.
The ETH 2.0 beacon chain relies heavily on the assumption that all network participants share the same clock time in order to properly propose/validate blocks and perform other duties. Clock skew is a real issue, and the Prysm developers had baked in a system that used Cloudflare’s Roughtime protocol to provide digitally signed, reliable clock time to the benefit of users.
Unfortunately, it turns out that Roughtime uses a pool of servers to determine the time, and if one of these servers misbehaves and reports an incorrect time that is far in the past or future, it gets averaged in and the resulting time is way off from the real time. This is a clear case where using a median instead of an average would have almost completely mitigated the issue.

This large clock skew trickled down to ETH 2.0 nodes running Prysm because the developers had designed the Roughtime sync to automatically adjust the local system clock if it detected a deviation from the time reported by Roughtime. This is obviously a bit haphazard and should only be performed if the discrepancy is minimal, e.g. on the order of a minute or less. Since the Roughtime servers were reporting a time more than 4 hours in the future, this got propagated to anyone running a Prysm instance and effectively made all the Prysm beacon nodes unable to work with the other clients.

In the ensuing chaos and the mad scramble that followed to get everything back on track, additional mistakes were made that led to validators being slashed and network participation dropping even further; in the end it took several days for the network to reach finality again.

This catastrophic scenario was an invaluable opportunity for client teams to gather data as it was unfolding, to hone their incident response playbooks and to uncover a lot of edge cases that simply would not have emerged had the network not been in such a degraded and fragmented state. All clients suffered from massive resource usage spikes due to the sheer number of different forks and the strenuous loads they imposed. These conditions would have been almost impossible to create in a synthetic, controlled environment.
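To see why a median is so much more robust against a single misbehaving server, consider this toy sketch (the timestamps and five-server pool are made up for illustration; this is not the actual Roughtime or Prysm code):

```python
import statistics

# Five hypothetical time servers report Unix timestamps.
# Four are honest; the last one is 14,400 seconds (4 hours) in the future.
reported_times = [1_600_000_000, 1_600_000_001, 1_600_000_002,
                  1_600_000_001, 1_600_014_400]

honest_time = 1_600_000_001

# The mean is dragged roughly 48 minutes into the future by the one outlier...
mean_skew = statistics.mean(reported_times) - honest_time      # ~2879.8 s

# ...while the median ignores the outlier entirely.
median_skew = statistics.median(reported_times) - honest_time  # 0 s

print(mean_skew, median_skew)
```

One bad server out of five is enough to poison an average, but a median only fails once a majority of the pool misbehaves.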
The Medalla incident led to a multitude of improvements in all the clients (and Prysm now only uses Roughtime as a time source to warn users of possible clock skew, but does NOT alter the system clock). If you want to learn more about this specific incident, we highly recommend these very thorough and informative posts by Benjamin Edgington and the excellent postmortem authored by the Prysmatic Labs team.
It all starts with a simple yet unavoidable question: how do you actually design a blockchain to be as resilient and hardened as possible against a multitude of highly skilled adversaries and black swan events?
An immediate, binary choice with far-reaching implications is how many client implementations of the protocol will be available: a single one, or many?
Having a blockchain network comprised of more than one protocol implementation (a client) means that if something goes horribly wrong with one client, be it a bug actively exploited in the wild or a legitimate code update with unforeseen consequences, the network does not necessarily come to a standstill, thanks to the other, unaffected clients.
This begs the question: why don’t all blockchains have multiple clients? Well, developing (and maintaining!) a blockchain client is a complex and time-consuming undertaking that requires a diverse skillset. It demands a profound grasp of multidisciplinary subjects such as networking, security, cryptography, distributed systems, economics and more!
Now, for those of you looking for the fly in the ointment, there is a notable downside: different client implementations, which are often coded in different programming languages, can have subtle divergences in how they interpret and implement the protocol spec. These differences can lead to consensus bugs. A consensus bug occurs when two (or more) clients are unable to agree on a certain thing; in a blockchain context that can be a block, a transaction, or another object. This can have severe consequences for the health of the network and lead to forks, downtime and, when the dust finally settles, even reverted transactions.
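As a deliberately tiny, hypothetical illustration (not actual ETH 2.0 code), imagine two clients whose developers read the same validity rule slightly differently, say an "at most 16 transactions per block" limit where one team implements a strict inequality:

```python
MAX_TX_PER_BLOCK = 16  # hypothetical protocol limit, for illustration only

def client_a_is_valid(block: dict) -> bool:
    # Client A reads the spec as "at most 16 transactions".
    return len(block["txs"]) <= MAX_TX_PER_BLOCK

def client_b_is_valid(block: dict) -> bool:
    # Client B mistakenly implements "strictly fewer than 16".
    return len(block["txs"]) < MAX_TX_PER_BLOCK

# A block sitting exactly on the boundary splits the network:
# A accepts it and builds on top, B rejects it and forks away.
boundary_block = {"txs": ["tx"] * 16}
print(client_a_is_valid(boundary_block), client_b_is_valid(boundary_block))
```

The bug is invisible until a block lands exactly on the boundary, which is precisely why these divergences are so dangerous and so hard to catch by ordinary testing.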
As with everything else in the blockchain space, security and decentralization are a tough balancing act. There are various schools of thought on this, but we believe monocultures to be dangerous and are firmly in the camp of those who believe multiple clients offer better overall security and resiliency in a network. Another benefit is that the ecosystem, and ultimately its users, are not beholden to a single entity (the client developers) able to hold the network captive, exert disproportionate influence and dictate the overall direction of the project.
With that out of the way, let’s get back to Ethereum 2.0 and how it’s reshaping the blockchain security landscape. To start with, it has FIVE (5!) clients under active development as of now. While we fully expect this number to decrease over time as users converge on the more mature and robust implementations, this is nonetheless an amazing evolutionary process: the clients best able to thrive in the real world will emerge as the winners, comparable to natural selection, and this will ultimately greatly benefit the network. We think the inevitable consolidation phase will leave us with battle-hardened clients that have withstood a barrage of adversarial conditions and, more importantly, dev teams who have an intimate understanding of the security landscape.
So we have client diversity covered, now what? The next logical step is ensuring that these clients are as secure, performant, stable and robust as possible. As is standard practice for complex, mission-critical systems expected to hold a significant amount of value, audits are involved. All of the ETH 2.0 clients have engaged, or are in the process of engaging, 3rd-party security auditors to perform a thorough review of their codebases, simulate adversarial scenarios, uncover potential bugs/security vulnerabilities and suggest appropriate fixes and mitigations.
Selecting and working with a company to audit your codebase is a deeply involved endeavor that requires tight collaboration and a strong commitment, from the start of the process (usually an RfP, or Request for Proposals, where you delineate the scope of the audit and solicit bids from various qualified security vendors) to the end of the relationship, which usually concludes with an audit report and a request for comments.
It’s important to stress that an audit is not a fire and forget tool, but rather an ongoing undertaking and only one layer in a Defense in Depth approach to blockchain security.
It’s not only the ETH 2.0 clients that have undergone comprehensive security audits. The reputable firm Trail of Bits was engaged to assess the security of the CLI tool prospective stakers will use to generate the cryptographic keys controlling their validators. The audit uncovered 2 high-severity issues and suggested several less critical improvements, but also noted the general high quality and maturity of the code reviewed.
Another encouraging sign that the EF is deeply conscious of all the critical moving parts that will have to interact to bootstrap the beacon chain and allow validators to deposit ETH for staking duties is that they went a step further than a simple audit for a piece of code that is arguably one of the most vital puzzle pieces in ETH 2.0: the Deposit Smart Contract.
This is a smart contract deployed on the Ethereum mainnet that will act as a one-way bridge for people to transfer their ETH from the current ETH 1.0 chain to the beacon chain to be staked. Runtime Verification has conducted a formal verification audit of this smart contract at the behest of the EF. What is a formal verification audit? A thorough answer would necessitate a separate post, but the gist of it is that it’s a process that allows one to prove (or refute) the correctness of a specific algorithm in a mathematical way. This is one of the most rigorous and challenging vetting procedures available in the software realm and speaks volumes about the EF’s extraordinary dedication to shipping secure code.
We believe there are no silver bullets in the security field, but gaining actionable insights from a high-quality audit report compiled by a reputable firm is a great and necessary first step.
As befits a project as ambitious and complex as Ethereum, especially one securing billions in value, the EF is building a world-class, dedicated in-house security team whose sole focus will be ETH 2.0.
They have received numerous applications from highly qualified people and we have no doubt they will succeed in amassing a sizeable amount of InfoSec talent.
But as part of a multi-pronged approach to security, the EF has also spearheaded a Bug Bounty program covering the Phase 0 part of the ETH 2.0 spec. The pieces of the project considered in scope are well detailed, and the bounties are very generous, ranging from a minimum of $1,000 for bugs rated as low severity/no impact up to $20,000 for critical bugs that have the potential to severely impact the network.
This is in addition to the rewards offered for successfully breaking the ad hoc testnet networks bootstrapped and maintained by the EF, cheekily and aptly named attacknets.
The Ethereum Foundation initially deployed multiple attacknets dubbed beta-0, each formed solely of nodes running a single client in order to purposefully lower the overall security of the network and the barrier to entry for whitehat hackers and security professionals looking to probe and exploit specific client vulnerabilities.
These attacknets have since been deprecated and decommissioned, but not before fulfilling their role and yielding some successful exploits, which mainly targeted the networking layer and managed to prevent finality, thus being eligible for bounties as summarized in the Trophies section.
The single-client beta-0 attacknets were retired in favor of a multi-client attacknet dubbed beta-1. This attacknet is still operational, and so far no one has successfully claimed a bounty on it; so if you have a penchant for breaking things, this could be a nice way to help ETH 2.0, gain lasting fame and net some cash.
Look for more parts of the spec and client implementations to be covered under the program as the EF transitions to a dedicated ETH2 bug bounty portal with a public leaderboard, following in the footsteps of the pioneering initiative that has led to countless ETH1 security issues being reported and fixed.
The bug bounties offered are reflective of the openness of the Ethereum ecosystem and the deep rooted commitment by the EF to fostering an open and collaborative environment that extends to all facets of the security footprint for the project. We think this approach is likely to pay off big time in the long term.
All of this brings us to the final and most disruptive initiative that has emerged from the security related efforts in the Ethereum 2.0 ecosystem.
The EF generously funded a grant to develop a comprehensive fuzzing framework targeting most of the Ethereum 2.0 clients. We consider this to be the magic arrow in the EF’s quiver.
But what exactly is fuzzing?
We’re glad you asked. Fuzzing is a process in which an automated program overwhelms a target piece of software with a deluge of random (and not-so-random) inputs in the hope of inducing a crash or unexpected behavior that could manifest in the real world when certain conditions arise. Still not sure you understand? Well, this very vivid analogy, courtesy of Afri Schoedon, comes to the rescue:
“Imagine you have an unlimited number of kids of all ages asking you seemingly random questions non-stop. The moment you have a mental breakdown, the psychologist will write down the question that caused it and try to repair you, so you will withstand it next time. “
Makes it a lot clearer right?
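For the more code-minded, the core loop of a fuzzer can be sketched in a few lines. This is a deliberately toy example (the buggy parser and its length-prefix format are invented for illustration): hammer a target with random byte strings and collect every input that makes it blow up, so the bug can later be reproduced and fixed.

```python
import random

def parse_header(data: bytes) -> int:
    # Toy target: parse a 4-byte big-endian length prefix, then the payload.
    # Bug: it trusts the declared length without checking the buffer size.
    length = int.from_bytes(data[:4], "big")
    payload = data[4:4 + length]
    if length and not payload:
        # Declared length overruns the available data.
        raise ValueError("declared length exceeds available data")
    return length

def fuzz(target, rounds=10_000, seed=0):
    # Throw random byte strings at the target and record crashing inputs.
    rng = random.Random(seed)
    crashes = []
    for _ in range(rounds):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 12)))
        try:
            target(data)
        except Exception:
            crashes.append(data)  # a "mental breakdown": save the question
    return crashes

crashes = fuzz(parse_header)
print(f"found {len(crashes)} crashing inputs")
```

Real fuzzers such as those underpinning Beacon-Fuzz are vastly more sophisticated (coverage-guided mutation, corpora, crash deduplication), but the principle is exactly this loop.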
Having a fuzzing framework to uncover bugs that would be basically impossible for a human to detect complements the already well-rounded approach to a robust security posture in ETH 2.0. What is absolutely unprecedented, though, is the revolutionary way in which the EF decided to engage stakeholders as part of the process.
Traditionally, fuzzing is performed by security firms contracted by a client, or done in-house by large entities that have a dedicated security team and are well versed in shipping secure code, as it is fairly onerous and time-consuming. In most cases it is closely guarded: while the tools and techniques are sometimes open-sourced, they are very rarely disseminated together with custom-tailored code targeting a specific piece of software or property, because of the high potential for attackers to use them for nefarious purposes, e.g. finding a security vulnerability and exploiting it instead of reporting it.
The Sigma Prime devs took a very different approach in publishing the Beacon-Fuzz tool, one opposite to the security-through-obscurity credo often ingrained and embraced by many Fortune 500 companies and even by some less open-minded blockchain projects.
Not only did they release the tool publicly under a very permissive license, but they also pre-populated it with corpora (sets of predefined inputs upon which to perform mutations) designed to kickstart the fuzzing of existing ETH 2.0 clients.
They are actively encouraging members of the Ethereum community to run the tools, going as far as providing assistance in setting up the local fuzzing environment and troubleshooting issues (you can reach them in the #fuzzing channel on their Discord using this link, should you want to join the fuzzing ranks).
This effort is notable because their reasoning is that the benefits gained by having a diverse set of stakeholders run the fuzzing software far outweigh the risk of an attacker using the tool to uncover and exploit bugs in the clients, especially now that the network has not yet reached mainnet status and is not securing real value.
So far their intuition has proven largely correct: there has been an enthusiastic response from the community at large, and some dedicated community members who recognized the value provided started using the tool and eventually managed to find bugs affecting quite a few clients, as detailed in this blog post.
The number of stakeholders with a significant vested interest in a successful launch of the beacon chain, its ongoing security and pristine uptime, thanks to the economic incentives that underpin the network (namely the staking rewards for proposing and validating blocks, and the desire to avoid the stiff penalties that could result from security incidents), means there is a big and ever-growing pool of users who are highly motivated to run the tool.
The Sigma Prime crew is not resting on its laurels, though, and has recently added new capabilities to the fuzzing framework, extending its already powerful features to not only find bugs affecting a single client but also compare how each client performs the various state transitions, in order to find discrepancies from the canonical reference spec and potential consensus issues between the different implementations!
They are also in the process of adding more fuzzing targets covering the portions of the codebases that handle networking in the various clients. A more in-depth look at how structural differential fuzzing operates, and a sneak peek of their exciting roadmap, can be gleaned by visiting their blog.
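The differential idea is simple to sketch: feed the exact same input to every implementation of a state transition and flag any disagreement, because a disagreement between two clients is, by definition, a potential consensus bug. The snippet below is a made-up miniature (the deposit rule, the balance cap and both "clients" are invented for illustration and are not the real spec):

```python
MAX_EFFECTIVE_BALANCE = 32_000_000_000  # Gwei; illustrative cap only

def client_a_apply_deposit(balance: int, amount: int) -> int:
    # "Client A": applies the deposit and enforces the balance cap.
    return min(balance + amount, MAX_EFFECTIVE_BALANCE)

def client_b_apply_deposit(balance: int, amount: int) -> int:
    # "Client B": forgets the cap entirely, a subtle divergence.
    return balance + amount

def differential_fuzz(clients, cases):
    # Run every implementation on identical inputs; report any disagreement.
    mismatches = []
    for balance, amount in cases:
        results = {name: fn(balance, amount) for name, fn in clients.items()}
        if len(set(results.values())) > 1:
            mismatches.append(((balance, amount), results))
    return mismatches

clients = {"A": client_a_apply_deposit, "B": client_b_apply_deposit}
cases = [(0, 1_000_000_000), (31_000_000_000, 2_000_000_000)]
print(differential_fuzz(clients, cases))  # only the boundary case disagrees
```

Neither client needs to crash for this to work: mere disagreement on an output is the signal, which is exactly what makes differential fuzzing so well suited to hunting consensus bugs.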
Even once the whole attack surface of the various endpoints has been exhaustively covered, the tool will still prove useful for detecting potential bugs introduced by successive modifications to the protocol and client updates.
Based on our interactions with the Sigma Prime folks and other client devs, we have gained a deep respect for their steadfast commitment to strictly adhering to security best practices and their resolve to continually assess, challenge and improve the security posture of the ecosystem as a whole.
The teams have some amazingly talented security leads working in an incredibly collaborative environment, and projects such as Beacon-Fuzz are a testament to this. They will have a lasting beneficial impact as they are refined and maintained well past Phase 0.
To conclude, we believe that Ethereum 2.0 has a great shot at shipping a highly resilient and hardened network with a best-in-class overall security posture, thanks to the strenuous efforts of multiple teams and the amount of expertise they bring to the table. Other projects in the space will be hard-pressed to replicate these enviable achievements, the pervasive security culture and the virtuous cycle brought to bear by its massive, energetic community and the grassroots movement involved in the fuzzing efforts. It’s a tall ask for any project, but especially difficult to mimic for the multitude of self-styled, VC-funded “Ethereum killers” that have been kept in carefully controlled and monitored incubators since their inception and don’t have a set of well-established and documented protocols and guidelines for responding to security SNAFUs. Ethereum has weathered countless attacks, and the expertise and insight gained by its battle-hardened client developers is invaluable here and can’t be overstated.
Does this mean bugs won’t happen come mainnet time, or that all of this is enough to ward off attackers and guarantee there will be no security incidents? Certainly not. But it highlights how uniquely positioned the ETH 2.0 project is to respond quickly and effectively to such issues if and when they arise. You’d be a fool to bet against ETH 2.0, and you won’t find us on the sell side of the order book or among the ranks of traders opening short positions anytime soon.