Consensus problem in Distributed Systems

16 Nov 2019 (Sat), 09:15 AM – 05:00 PM IST


Accepting submissions till 30 Sep 2019, 11:59 PM

T-Hub, Hyderabad


##About Rootconf Hyderabad:

Rootconf Hyderabad is a platform for:

DevOps engineers
Site Reliability Engineers (SRE)
ML and data engineers
Security and DevSecOps professionals
Software engineers

to discuss real-world problems around:

Site Reliability Engineering (SRE)
Data and AI engineering
Distributed systems -- observability, microservices
Implementing Infrastructure as Code

Speakers from Flipkart, Hotstar, Intuit, GO-JEK, MadStreetDen and Trusting Social will share their experiences with the above challenges.

##Event venue:
Rootconf Hyderabad will be held at T-Hub, IIIT-Hyderabad Campus, Gachibowli, Hyderabad, Telangana - 500032

##Contact information:

For bulk ticket purchases, sponsorship, and other inquiries, contact sales@hasgeek.com or call 7676332020.


For information about the event, tickets (bulk discounts automatically apply on 5+ and 10+ tickets) and speaking, call Rootconf on 7676332020 or write to info@hasgeek.com.

Hosted by

Rootconf

Rootconf is a community-funded platform for activities and discussions on the following topics: Site Reliability Engineering (SRE). Infrastructure costs, including Cloud Costs - and optimization. Security - including Cloud Security.


This submission has been added to the schedule

Consensus problem in Distributed Systems

Submitted Sep 3, 2019

Section: Flash talk (5 mins)
Category: Distributed systems

A fundamental problem in distributed systems is reaching consensus on a data value, so that the overall system stays reliable even though it is built from unreliable components. In the real world, system components are never perfect: they are prone to hardware failures, packet drops, slow networks, clock skew, and so on. In this talk, let’s walk through a few common scenarios in a distributed system where all the components must agree on the state of the system for it to be reliable.

AUDIENCE
Aspiring Distributed Systems Developers; Technical; Beginner

KEYWORDS
Distributed System, Coordination service, Consensus Problem in Distributed Systems

Outline

What is a consensus in a distributed system?

In the context of distributed systems design, the term consensus is often used loosely to mean some form of agreement. More precisely, consensus involves multiple servers agreeing on values. Once they reach a decision on a value, that decision is final. Typical consensus algorithms make progress when any majority of their servers is available; for example, a cluster of 5 servers can continue to operate even if 2 servers fail. If more servers fail, the cluster stops making progress (but it will never return an incorrect result).

That is, a cluster needs 2f+1 nodes to survive f failed nodes.
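The 2f+1 rule can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the function names are hypothetical and chosen only for illustration.

```python
def majority_quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Largest f such that cluster_size >= 2f + 1."""
    return (cluster_size - 1) // 2


def min_cluster_size(failures: int) -> int:
    """Nodes needed so that a majority survives `failures` crashed nodes."""
    return 2 * failures + 1


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(f"n={n}: quorum={majority_quorum(n)}, "
              f"tolerates f={tolerated_failures(n)}")
```

Note that a cluster of 4 tolerates no more failures than a cluster of 3, which is one reason odd cluster sizes are the usual choice.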

There are a few properties we expect from a solution to consensus:
Agreement: All correct processes must decide the same value.
Validity: If all processes propose the same value v, then all correct processes decide v.
Termination: Every correct process eventually decides some value. A protocol that never terminates satisfies agreement only vacuously, because no process ever decides.
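To make the three properties concrete, here is a small Python sketch that checks them against the outcome of one run. It is an illustration under stated assumptions, not part of any real consensus library: `check_consensus` is a hypothetical helper, `decisions` lists only the correct (non-faulty) processes, and `None` stands for "has not decided yet".

```python
from typing import Dict, Hashable, Optional


def check_consensus(
    proposals: Dict[str, Hashable],
    decisions: Dict[str, Optional[Hashable]],
) -> Dict[str, bool]:
    """Check agreement, validity and termination for one finished run.

    proposals: value proposed by each process.
    decisions: value decided by each correct process (None = undecided).
    """
    decided = [v for v in decisions.values() if v is not None]
    proposed = set(proposals.values())

    # Agreement: no two correct processes decided different values.
    agreement = len(set(decided)) <= 1
    # Validity: if every process proposed the same v, only v may be decided.
    validity = len(proposed) != 1 or all(v in proposed for v in decided)
    # Termination: every correct process has decided.
    termination = all(v is not None for v in decisions.values())

    return {"agreement": agreement, "validity": validity,
            "termination": termination}


if __name__ == "__main__":
    # Process "c" crashed, so it does not appear among the correct processes.
    print(check_consensus({"a": "x", "b": "x", "c": "x"},
                          {"a": "x", "b": "x"}))
```

A run in which one correct process is still undecided passes agreement and validity but fails termination, which is the vacuous case described above.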

To summarize, the goal of consensus is not to negotiate an optimal value of some kind, but simply to reach collective agreement on some value that was previously proposed by one of the participating servers in that round of the consensus algorithm. With the help of consensus, the distributed system is made to act as though it were a single entity.

An example scenario:

For simplicity, let’s assume a distributed storage system in which 2f+1 nodes form a cluster. Each participant acts at its own speed, may fail at any time, and may rejoin after recovering from the failure. The nodes are connected by a network that delivers messages asynchronously, with arbitrary delay. In short, everything can fail at any time, and failed participants can recover and rejoin the system. Yes, we are looking at a fault-tolerant storage system. Because nodes can fail at various stages, it’s important to keep more than one copy of our data. For now, let’s assume all the data is replicated across all the cluster nodes (although in a real system full replication may hurt overall performance).
We also have a client, which is not part of the cluster, requesting some operation from our distributed storage, such as a read or a write to a data file. A read can be served by any node in our cluster without any issues, but a write has to be agreed upon by all cluster members before it can be committed. If two or more nodes receive a write request at the same time for the same value, how do we determine which request to process in a distributed setup? This is an example of the consensus problem in distributed systems.
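One way to picture the conflict is a toy voting round, sketched below in Python. This is a deliberately simplified illustration and not one of the real consensus algorithms: it assumes each live node votes once, for the first write it receives, and that a write commits only with votes from a strict majority of the full cluster. The names `decide_write` and `arrivals` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def decide_write(cluster_size: int,
                 arrivals: Dict[str, List[str]]) -> Optional[str]:
    """Pick the winning write for one round, or None on a split vote.

    arrivals maps each live node to the write ids in the order that node
    received them. Nodes missing from the mapping are treated as failed.
    """
    quorum = cluster_size // 2 + 1
    # Each live node casts a single vote, for the first write it saw.
    votes = Counter(order[0] for order in arrivals.values() if order)
    for write_id, count in votes.items():
        # Two different writes cannot both reach a strict majority,
        # because every node votes at most once.
        if count >= quorum:
            return write_id
    return None  # no majority: the clients would have to retry


if __name__ == "__main__":
    # Five-node cluster, n5 is down; writes A and B race each other.
    clear = {"n1": ["A", "B"], "n2": ["A", "B"],
             "n3": ["B", "A"], "n4": ["A", "B"]}
    split = {"n1": ["A", "B"], "n2": ["A", "B"],
             "n3": ["B", "A"], "n4": ["B", "A"]}
    print(decide_write(5, clear))  # A
    print(decide_write(5, split))  # None
```

The split-vote case shows why a single voting round is not enough: a practical algorithm also needs a rule for retrying safely while nodes fail, recover, and see messages in different orders.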

In this talk, let me introduce some prominent algorithms for reaching consensus in a distributed system.
