Rootconf

A beginner’s guide to distributed graphs

Name: A beginner’s guide to distributed graphs
Start: 2025-05-16T14:15:00+05:30
End: 2025-05-16T16:45:00+05:30
Location: Underline Centre, 2nd floor

Hands-on workshop - Rootconf 2025 Annual Conference

May 2025

12 Mon

13 Tue

14 Wed

15 Thu

16 Fri 02:15 PM – 04:45 PM IST

17 Sat

18 Sun

Underline Centre, 2nd floor, Bengaluru

Workshop goal

To provide participants with a practical understanding of the principles, architectures, algorithms, and key challenges involved in processing and managing massive-scale graph data in distributed environments.

Note

This workshop runs for 2.5 hours.
This is an in-person, hands-on workshop.
Only 20 seats are open for workshop participation.
A recording of the workshop will be made available to Rootconf members.
This workshop is beginner-friendly.
Code & materials can be accessed at https://colab.research.google.com/drive/1-Qmv9UFCXHnuWkWKsRfuPP7MxA7DH0VY?usp=sharing

Overview

Graphs are ubiquitous! They are fundamental to modeling complex relationships, from social and recommendation networks to financial systems and knowledge graphs. But real-world graphs often contain billions or trillions of nodes and edges, far exceeding the capabilities of single-machine processing.

This workshop explores the architectural and algorithmic foundations of distributed graph systems like Spark GraphX, which are essential for scaling graph processing.

We’ll begin by examining foundational graph data models (Property Graphs and RDF) and then discuss architectural considerations in distributed graph environments, including:

Partitioning
Replication
Querying
Fault tolerance strategies

Participants will explore

Common query patterns in graph data:
- Reachability queries
- Subgraph and pattern matching queries
- Keyword search queries
- Path queries
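Why traditional batch systems (like MapReduce) are a poor fit for iterative graph workloads: each iteration runs as a separate job that writes its results to disk and reads them back, which incurs high I/O overhead.
How specialized distributed models like Pregel and Gather-Apply-Scatter (GAS) emerged, including a discussion of the “think like a vertex” programming model popularized by Google's Pregel.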
Effective graph partitioning strategies:
- Edge-Cut
- Vertex-Cut
Mechanisms for fault tolerance in graphs:
- Checkpointing
- Lineage-based recovery
Distributed querying for graphs: how query rewriting for distributed graph databases differs from the single-machine case.
Graph summarization: approaches for summarizing large graphs in a distributed setup, including parallelizing and distributing GNNs (Graph Neural Networks).
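
The Edge-Cut and Vertex-Cut strategies listed above can be contrasted with a minimal sketch. This is an illustration, not workshop material: the toy graph, the modulo placement rules and the function names `edge_cut_cost` and `vertex_cut_replication` are all assumptions. Edge-cut pays in edges that span partitions, while vertex-cut pays in vertex replicas.

```python
from collections import defaultdict

# Hypothetical toy graph: hub vertex 0 with five spokes, plus one extra edge.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]

def edge_cut_cost(edges, num_partitions):
    """Edge-cut: each vertex lives on one partition; count edges that span two."""
    def owner(vertex):
        return vertex % num_partitions  # assumed placement rule
    return sum(1 for u, v in edges if owner(u) != owner(v))

def vertex_cut_replication(edges, num_partitions):
    """Vertex-cut: each edge lives on one partition; vertices are replicated
    wherever their edges land. Returns the average replicas per vertex."""
    replicas = defaultdict(set)
    for index, (u, v) in enumerate(edges):
        part = index % num_partitions  # assumed round-robin edge placement
        replicas[u].add(part)
        replicas[v].add(part)
    return sum(len(parts) for parts in replicas.values()) / len(replicas)

cut_edges = edge_cut_cost(EDGES, 2)
replication = vertex_cut_replication(EDGES, 2)
print(cut_edges, round(replication, 2))  # 4 1.33
```

In this toy case four of the six edges are cut under edge-cut, while vertex-cut keeps every edge local and replicates only the hub and one spoke. That trade-off is the usual reason vertex-cut is considered for graphs with a few very high-degree vertices.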

Hands-On Activities

📌 Distributed Breadth-First Search (BFS) using the Pregel Model

Participants will conceptually design the compute() function for a distributed BFS, tracing message flow and state updates across supersteps to understand the vertex-centric model in action.
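
The message flow described above can be traced with a single-process simulation. This is a minimal sketch in plain Python, not the Pregel API: the names `bfs_compute` and `pregel_bfs`, the example graph, and the convention that a vertex with no incoming messages stays halted are assumptions made for illustration.

```python
import math

def bfs_compute(distance, messages):
    """Vertex program: keep the smallest distance seen so far.
    Returns (new_distance, changed)."""
    candidate = min(messages, default=math.inf)
    if candidate < distance:
        return candidate, True
    return distance, False

def pregel_bfs(adjacency, source):
    """Run supersteps until no messages are in flight."""
    distance = {vertex: math.inf for vertex in adjacency}
    inbox = {source: [0]}  # seed message activates the source vertex
    supersteps = 0
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            new_distance, changed = bfs_compute(distance[vertex], messages)
            if changed:
                distance[vertex] = new_distance
                # Only vertices whose state changed send messages onward.
                for neighbor in adjacency[vertex]:
                    outbox.setdefault(neighbor, []).append(new_distance + 1)
        inbox = outbox  # barrier: messages are delivered in the next superstep
        supersteps += 1
    return distance, supersteps

# Hypothetical directed example graph.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
distances, steps = pregel_bfs(GRAPH, "A")
print(distances, steps)
```

Vertex D receives two messages in the same superstep (one from B, one from C) and keeps the minimum, which is the kind of state update the exercise asks participants to trace by hand.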

📌 Partitioning Strategy Comparison

Using a small example graph, participants will manually simulate and compare the communication cost (cross-partition messages) for a simple query (e.g., 2-hop neighborhood) under different partitioning strategies:

Hash Partitioning
Manually optimized Edge-Cut

This highlights the direct impact of partitioning on performance.
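
The manual simulation can be mirrored in a few lines of Python. The sketch below assumes a hypothetical six-vertex graph (two triangles joined by one bridge edge), a modulo-based hash placement, and a hand-picked placement that keeps each triangle together. `two_hop_cost` is an illustrative helper, and counting one message per traversed edge is a simplifying assumption.

```python
# Hypothetical undirected graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

ADJACENCY = {}
for u, v in EDGES:
    ADJACENCY.setdefault(u, []).append(v)
    ADJACENCY.setdefault(v, []).append(u)

def two_hop_cost(adjacency, partition_of, start, hops=2):
    """Expand the neighborhood hop by hop; every frontier vertex messages all
    of its neighbors, and messages between partitions are counted."""
    frontier, visited, cross = {start}, {start}, 0
    for _ in range(hops):
        next_frontier = set()
        for u in frontier:
            for v in adjacency[u]:
                if partition_of[u] != partition_of[v]:
                    cross += 1
                if v not in visited:
                    next_frontier.add(v)
        visited |= next_frontier
        frontier = next_frontier
    return visited, cross

hash_partition = {v: v % 2 for v in ADJACENCY}          # assumed hash rule
manual_partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # one triangle per partition

reached_hash, hash_cost = two_hop_cost(ADJACENCY, hash_partition, start=0)
reached_manual, manual_cost = two_hop_cost(ADJACENCY, manual_partition, start=0)
print(hash_cost, manual_cost)  # 5 1
```

Both placements reach the same vertices, but under these assumptions hash partitioning sends five cross-partition messages against one for the manual edge-cut.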

📌 Distributed PageRank using Spark GraphX

Participants will code the steps for implementing iterative PageRank using GraphX operators like aggregateMessages, understanding how data flow and aggregation work in a data-parallel framework.
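
GraphX itself is a Scala API, so the following is only a plain-Python sketch of the data flow, not GraphX code. The helper `aggregate_messages` imitates the send-then-merge shape of aggregateMessages, and its signature, the example graph and the normalization (ranks summing to 1, no dangling vertices) are assumptions chosen for illustration.

```python
def aggregate_messages(edges, send_msg, merge_msg):
    """Emit messages per edge, then merge the messages that share a destination."""
    inbox = {}
    for src, dst in edges:
        for target, message in send_msg(src, dst):
            if target in inbox:
                inbox[target] = merge_msg(inbox[target], message)
            else:
                inbox[target] = message
    return inbox

def pagerank(edges, num_iterations=20, damping=0.85):
    vertices = sorted({vertex for edge in edges for vertex in edge})
    out_degree = {vertex: 0 for vertex in vertices}
    for src, _ in edges:
        out_degree[src] += 1
    n = len(vertices)
    rank = {vertex: 1.0 / n for vertex in vertices}
    for _ in range(num_iterations):
        # Each source splits its current rank evenly across its out-edges.
        contributions = aggregate_messages(
            edges,
            send_msg=lambda src, dst: [(dst, rank[src] / out_degree[src])],
            merge_msg=lambda a, b: a + b,
        )
        rank = {
            vertex: (1 - damping) / n + damping * contributions.get(vertex, 0.0)
            for vertex in vertices
        }
    return rank

# Hypothetical directed graph in which every vertex has an outgoing edge.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
ranks = pagerank(EDGES)
print(ranks)
```

The merge step is associative and commutative, which is what lets a data-parallel framework combine partial sums on each partition before shuffling them to the destination vertex.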

📌 Graph Summarization using GNNs

An exercise on how to summarize large graphs using Graph Neural Networks in a distributed setting.
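
As a rough illustration of the idea, the sketch below runs one round of mean neighborhood aggregation (the message-passing step at the core of many GNN layers) and then mean-pools the node embeddings into a single summary vector. Per-partition partial sums are combined at the end to mimic a distributed readout. It is an untrained toy with no learned weights; the graph, features, partition split and function names are all assumptions and may differ from the workshop's exercise.

```python
def mean_aggregate(adjacency, features):
    """One message-passing round: each node averages its own feature vector
    with those of its neighbors."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [features[node]] + [features[n] for n in neighbors]
        dim = len(features[node])
        updated[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return updated

def partition_readout(nodes, embeddings):
    """Local step on one partition: return (sum of embeddings, node count)."""
    dim = len(embeddings[nodes[0]])
    total = [sum(embeddings[n][i] for n in nodes) for i in range(dim)]
    return total, len(nodes)

def combine_readouts(partials):
    """Global step: merge partial sums into one mean-pooled summary vector."""
    dim = len(partials[0][0])
    count = sum(c for _, c in partials)
    return [sum(total[i] for total, _ in partials) / count for i in range(dim)]

# Hypothetical path graph 0-1-2-3 with two-dimensional node features.
ADJACENCY = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
FEATURES = {0: [0.0, 1.0], 1: [2.0, 1.0], 2: [4.0, 1.0], 3: [6.0, 1.0]}
PARTITIONS = [[0, 1], [2, 3]]  # assumed placement of nodes on two workers

embeddings = mean_aggregate(ADJACENCY, FEATURES)
partials = [partition_readout(nodes, embeddings) for nodes in PARTITIONS]
summary = combine_readouts(partials)
print(summary)  # [3.0, 1.0]
```

Only the small partial sums cross partition boundaries in the readout. The aggregation round, by contrast, needs neighbor features from other partitions, which is where partitioning quality again matters.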

Pre-requisites

A basic understanding of graph theory concepts (e.g., BFS, DFS).
Familiarity with distributed computing fundamentals (e.g., data partitioning, fault tolerance).
Some exposure to data-parallel paradigms (e.g., MapReduce) and a basic understanding of Spark and RDDs.
Comfort with reading and writing Python code.

Key learnings for participants

By the end of the workshop, participants will:

Understand the core principles of distributed graph processing.
Be familiar with the architectural considerations of distributed graph systems, including partitioning, fault tolerance, etc.
Gain beginner-level insights into using frameworks like Pregel and GraphX.
Walk away with conceptual tools to model and debug large-scale graph workloads.

Venue

Underline Centre, 2nd floor

24, 1st Main, 3rd Cross Road, 3rd Floor,

Above Blue Tokai 24, 3rd A Cross, 1st Main Rd,

Bengaluru - 560071

Karnataka, IN


Hosted by

Rootconf

We care about site reliability, cloud costs, security and data privacy

Related events

Rootconf 2025 Annual Conference - 16th and 17th May: On platforms, distributed data systems & security
