Nov 2019
11 Mon
12 Tue
13 Wed
14 Thu
15 Fri
16 Sat 09:15 AM – 05:00 PM IST
17 Sun
Geethanjali Eswaran
A fundamental problem in distributed systems is reaching consensus on some data value, so that the overall system stays reliable on top of unreliable components. In the real world, system components are never perfect: they are prone to hardware failures, packet drops, slow networks, clock skew, and so on. In this talk, let’s walk through a few common scenarios in a distributed system where all the components must agree on the state of the system for it to be reliable.
AUDIENCE
Aspiring Distributed Systems Developers; Technical; Beginner
KEYWORDS
Distributed System, Coordination service, Consensus Problem in Distributed Systems
In the context of distributed systems design, consensus is often loosely used to mean some form of agreement. More precisely, consensus involves multiple servers agreeing on values. Once they reach a decision on a value, that decision is final. Typical consensus algorithms make progress when any majority of their servers is available; for example, a cluster of 5 servers can continue to operate even if 2 servers fail. If more servers fail, they stop making progress (but will never return an incorrect result).
That is, a cluster needs 2f+1 nodes to survive f failed nodes.
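The majority arithmetic can be sketched in a few lines of Python. This is only an illustration of the 2f+1 rule stated here; the function names are hypothetical and do not come from any particular consensus library.

```python
def majority(n: int) -> int:
    """Smallest number of servers that forms a majority of an n-server cluster."""
    return n // 2 + 1


def max_failures(n: int) -> int:
    """Largest number of failed servers an n-server cluster can tolerate."""
    return (n - 1) // 2


def cluster_size_for(f: int) -> int:
    """Smallest cluster size that survives f failed servers (the 2f+1 rule)."""
    return 2 * f + 1


def can_make_progress(n: int, failed: int) -> bool:
    """True if the servers that are still up form a majority."""
    return n - failed >= majority(n)


if __name__ == "__main__":
    print(cluster_size_for(2))       # size needed to survive 2 failures
    print(can_make_progress(5, 2))   # 3 of 5 servers are still available
    print(can_make_progress(5, 3))   # only 2 of 5 servers are available
```

Note that an even-sized cluster gains nothing in fault tolerance over the next smaller odd size: 4 servers tolerate 1 failure, the same as 3.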
There are a few properties we expect from a solution to consensus:
Agreement: Every correct process must agree on the same value.
Validity: If all processes propose the same value v, then all correct processes decide v.
Termination: Every correct process eventually decides some value. If the protocol never terminates, the processes are only vacuously agreeing on the same thing, which is not the same as deciding.
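To make these three properties concrete, here is a minimal sketch that checks them against the recorded outcome of one finished run. The data layout is an assumption made for illustration: `proposals` and `decisions` are dictionaries keyed by process name, `None` means "has not decided", and `correct` is the set of processes that did not fail. Termination is really a statement about what eventually happens, so the check below only confirms that every correct process had decided by the time the run was recorded.

```python
from typing import Dict, Hashable, Optional, Set

Decisions = Dict[str, Optional[Hashable]]


def agreement(decisions: Decisions, correct: Set[str]) -> bool:
    """No two correct processes decided different values."""
    decided = {decisions[p] for p in correct if decisions.get(p) is not None}
    return len(decided) <= 1


def validity(proposals: Dict[str, Hashable], decisions: Decisions,
             correct: Set[str]) -> bool:
    """If every process proposed the same value v, correct processes decide only v."""
    proposed = set(proposals.values())
    if len(proposed) != 1:
        return True  # the property only constrains unanimous proposals
    (v,) = proposed
    return all(decisions.get(p) in (None, v) for p in correct)


def termination(decisions: Decisions, correct: Set[str]) -> bool:
    """Every correct process has decided some value in the recorded run."""
    return all(decisions.get(p) is not None for p in correct)


if __name__ == "__main__":
    proposals = {"a": "x", "b": "x", "c": "x"}
    decisions = {"a": "x", "b": "x", "c": None}  # c crashed before deciding
    correct = {"a", "b"}
    print(agreement(decisions, correct),
          validity(proposals, decisions, correct),
          termination(decisions, correct))
```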
To summarize, the goal of consensus is not to negotiate an optimal value of some kind, but simply to agree collectively on some value that was previously proposed by one of the servers participating in that round of the consensus algorithm. With the help of consensus, the distributed system is made to act as though it were a single entity.
For simplicity, let’s assume a distributed storage system in which 2f+1 nodes form a cluster. Each participant acts at its own speed, may fail at any time, and may rejoin after recovering from the failure. The nodes are connected by a network that transmits messages asynchronously at an arbitrary speed. In short, everything can fail at any time, and after a failure, participants can recover and rejoin the system. Yes, we are looking at a fault-tolerant storage system. Because these nodes can fail at various stages, it’s important to have more than one copy of our data. For now, let’s assume all the data is replicated across all the cluster nodes (although in reality this may affect overall performance).
We also have a client, which is not part of the cluster, requesting some operation from our distributed storage, such as a write to or a read from a data file. A read can be served by any node in our cluster without any issues, but a write has to be agreed upon by all cluster members before it can be committed. If two or more nodes receive a write request for the same value at the same time, how do we determine which request to process in a distributed setup? This is an example of the consensus problem in distributed systems.
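As a rough illustration of why majority voting can settle such a conflict, consider the toy sketch below. It is not Paxos, Raft, or any algorithm from the talk; it is a deliberately simplified single-round vote with no failures, retries, or networking, and the `Voter` and `try_commit` names are hypothetical. Each node grants its vote for a given write slot to the first proposal it sees, and a write commits only if it collects votes from a majority. Because any two majorities share at least one node, two conflicting writes can never both commit.

```python
from typing import Dict, List


class Voter:
    """A cluster node that votes for the first proposal it sees for a slot."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.voted_for: Dict[int, str] = {}

    def request_vote(self, slot: int, proposal_id: str) -> bool:
        # setdefault records the first proposal for this slot and returns
        # whichever proposal already holds the vote.
        granted = self.voted_for.setdefault(slot, proposal_id)
        return granted == proposal_id


def try_commit(voters: List[Voter], slot: int, proposal_id: str) -> bool:
    """Commit only if a strict majority of voters granted this proposal."""
    votes = sum(v.request_vote(slot, proposal_id) for v in voters)
    return votes > len(voters) // 2


if __name__ == "__main__":
    voters = [Voter(f"n{i}") for i in range(5)]
    # Two concurrent writes to slot 0: write-A reaches n0..n2 first,
    # write-B reaches n3 and n4 first.
    for v in voters[:3]:
        v.request_vote(0, "write-A")
    for v in voters[3:]:
        v.request_vote(0, "write-B")
    print(try_commit(voters, 0, "write-A"))  # 3 of 5 votes
    print(try_commit(voters, 0, "write-B"))  # 2 of 5 votes
```

In this toy version, a split vote (for example, three competing writes on five nodes) can leave no winner at all; handling that case safely is part of what real consensus algorithms add.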
In this talk, let me introduce some prominent algorithms used to obtain consensus in a distributed system.
Basic knowledge of distributed systems
Geethanjali Eswaran, DevOps Engineer for Large-Scale Data Cloud at Salesforce. Passionate about distributed computing, BigData cloud, Apache projects, the Kerberos protocol, and much more.
https://docs.google.com/presentation/d/1aAY7uZms0dJZ7x1PWgGnh_ZAipJ-GA5Zcz9vAapOYpw