Demystifying DHT in GlusterFS

May 2017

11 Thu 08:40 AM – 11:10 PM IST

12 Fri 08:40 AM – 06:00 PM IST

MLR Convention Centre, J P Nagar, Bangalore

Submitted Apr 7, 2017

Section: Full talk of 40 mins duration
Technical level: Intermediate

Introduction to GlusterFS: A distributed file system gives us a way to store and access files in a client/server architecture. We can use more than one server to store data, and multiple clients (local or remote) can access the data on these servers. The file system organizes and displays files and directories from multiple servers as if they were stored on your local system, presenting a simple interface to the user or application. Its main advantages are that documents are easier to distribute to multiple clients, and that storage is centralized, so client machines do not spend their own resources storing the data. There are a few terms we need to understand before diving into DHT:

Brick
Storage pool
Volume
Translator

Distributed Hash Table (DHT):
Distribution in GlusterFS is handled by DHT (the Distributed Hash Table), which is loaded in the client stack. All operations are driven by the clients, which are all equal. There are no metadata servers or special nodes holding extra information about where files are or should go. Any additional information about files or directories is stored in extended attributes, or xattrs. Xattrs are a filesystem feature that lets users associate metadata with files and directories, stored as key-value pairs. There are two main DHT-related xattrs: linkto and layout.
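To make the key-value idea concrete, here is a small Python sketch that decodes a layout xattr. It rests on an assumption this abstract does not state: that the value of trusted.glusterfs.dht is 16 bytes made of four big-endian 32-bit words, the last two being the start and end of the brick's hash range. The hex value and the `decode_layout` helper are illustrative, not taken from a real volume or from the GlusterFS code.

```python
import struct


def decode_layout(raw):
    """Decode a trusted.glusterfs.dht value into (start, stop).

    Assumption: four big-endian 32-bit words; the first two (a header and a
    type field) are ignored here, the last two are this brick's hash range.
    """
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    _header, _type, start, stop = struct.unpack(">4I", raw)
    return start, stop


# The xattr is just a key (its name) and a binary value.
xattrs = {"trusted.glusterfs.dht": bytes.fromhex("0000000100000000000000007ffffffe")}
start, stop = decode_layout(xattrs["trusted.glusterfs.dht"])
print(f"{start:#010x}-{stop:#010x}")  # prints 0x00000000-0x7ffffffe
```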

DHT creates directories on all the bricks. When a directory is created, a layout range is assigned to it and stored in the extended attribute trusted.glusterfs.dht. The full hash range spans 0x00000000 to 0xffffffff, and each brick is assigned a specific subset of this range. The layout is complete and healthy when the range 0x00000000 to 0xffffffff is distributed across the volume without any gaps or overlaps.
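The following Python sketch shows one way to split the hash range across bricks and to check the "no gaps, no overlaps" condition. It is a simplified equal split; `assign_layout` and `is_healthy` are hypothetical helpers, and the real DHT code may assign ranges differently (for example, weighting them by brick size).

```python
MAX_HASH = 0xFFFFFFFF


def assign_layout(bricks):
    """Split 0x00000000..0xffffffff into contiguous, roughly equal ranges."""
    chunk = MAX_HASH // len(bricks)
    layout = {}
    start = 0
    for i, brick in enumerate(bricks):
        # The last brick absorbs the remainder so the range ends exactly at MAX_HASH.
        stop = MAX_HASH if i == len(bricks) - 1 else start + chunk - 1
        layout[brick] = (start, stop)
        start = stop + 1
    return layout


def is_healthy(layout):
    """True if the ranges cover 0..MAX_HASH with no gaps or overlaps."""
    expected = 0
    for start, stop in sorted(layout.values()):
        if start != expected:  # start > expected: gap; start < expected: overlap
            return False
        expected = stop + 1
    return expected == MAX_HASH + 1


print(assign_layout(["brick0", "brick1"]))
```

Sorting by start value lets a single pass detect both failure modes: each range must begin exactly where the previous one ended.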

Unlike directories, a file has to be present on only one subvolume. Given a file, we compute the hash value of its name and find the brick whose range contains that value. This brick is known as the hashed brick. The brick on which the data file actually exists is the cached brick. For a newly created file, the hashed and cached bricks will usually be the same. For example, if we create a file under a directory dir1, the file will be created on only one of the bricks.
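A minimal Python sketch of finding the hashed brick, assuming a two-brick layout like an equal split of the range. GlusterFS uses its own 32-bit hash function; `zlib.crc32` is only a stand-in here, and `LAYOUT`, `name_hash`, and `hashed_brick` are hypothetical names.

```python
import zlib

# Hypothetical layout of one directory on a two-brick volume: (start, stop).
LAYOUT = {
    "brick0": (0x00000000, 0x7FFFFFFE),
    "brick1": (0x7FFFFFFF, 0xFFFFFFFF),
}


def name_hash(name):
    # Stand-in 32-bit hash of the file name; GlusterFS uses its own function.
    return zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF


def hashed_brick(name, layout=LAYOUT):
    """Return the brick whose layout range contains the hash of `name`."""
    h = name_hash(name)
    for brick, (start, stop) in layout.items():
        if start <= h <= stop:
            return brick
    # Only possible when the layout has a gap, i.e. it is not healthy.
    raise LookupError(f"no brick covers hash {h:#010x}")


print(hashed_brick("file1"))
```

Every client runs the same computation on the same layout, which is why no metadata server is needed to locate a new file.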

However, when a file is renamed, the destination name's hashed brick may differ from the source name's hashed brick. In this case, instead of moving the entire data file to the new hashed brick, DHT creates a linkto file: a 0-byte file on the new hashed brick whose linkto xattr points to the cached brick that still holds the data.
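The rename path can be sketched as a toy Python model. The bricks are plain dictionaries, the hashed brick comes from crc32 used as a stand-in hash over a hypothetical two-brick split, and `Volume`, `cached_brick`, and `rename` are illustrative names, not GlusterFS APIs. Only the name changes on the cached brick; a 0-byte linkto entry on the new hashed brick tells lookups where the data lives.

```python
import zlib

MID = 0x7FFFFFFF  # hypothetical split: brick0 gets hashes below MID, brick1 the rest


def hashed_brick(name):
    # Stand-in hash (crc32), not GlusterFS's real DHT hash function.
    h = zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF
    return "brick0" if h < MID else "brick1"


class Volume:
    """Toy volume: each brick maps a file name to its data and linkto target."""

    def __init__(self):
        self.bricks = {"brick0": {}, "brick1": {}}

    def create(self, name, data):
        # A new file is created on its hashed brick, so hashed == cached.
        self.bricks[hashed_brick(name)][name] = {"data": data, "linkto": None}

    def cached_brick(self, name):
        hashed = hashed_brick(name)
        entry = self.bricks[hashed].get(name)
        if entry is None:
            raise FileNotFoundError(name)
        # A linkto entry is a 0-byte pointer to the brick holding the data.
        return entry["linkto"] or hashed

    def read(self, name):
        return self.bricks[self.cached_brick(name)][name]["data"]

    def rename(self, src, dst):
        cached = self.cached_brick(src)
        old_hashed = hashed_brick(src)
        # Rename the data file in place on the cached brick; no data is copied.
        self.bricks[cached][dst] = self.bricks[cached].pop(src)
        if old_hashed != cached:
            # Remove the stale linkto entry left at the source's hashed brick.
            del self.bricks[old_hashed][src]
        new_hashed = hashed_brick(dst)
        if new_hashed != cached:
            # Point the destination's hashed brick at the brick holding the data.
            self.bricks[new_hashed][dst] = {"data": b"", "linkto": cached}
```

A later lookup of the new name lands on its hashed brick, finds the linkto entry, and follows it to the cached brick.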

Takeaway: The aim is to make the audience aware of GlusterFS and how it works. Since DHT is one of its core parts, attendees should have a basic understanding of how it works before diving into contributions. Last but not least, we want the audience to contribute to our upstream repo. :)

Outline

Introduction to GlusterFS
Basic terminology
What is DHT?
How does it work?
Algorithms for the major fops (file operations)
Contribution guide

Requirements

Projector, Mic

Speaker bio

Hi there!
This is Ankit Raj. I currently work at Red Hat on the Gluster team. Before joining Red Hat, I worked as a Ruby on Rails developer for a few startups. In 2015, I participated as a student developer with the Ruby organisation in Google Summer of Code.
I am a programmer, learner, presenter, and an engineer in the making. I have spoken at various national conferences and local meetups.

Rootconf 2017