The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Russell Sullivan

@russellsullivan

Customizing One Database for Your Multiple Data Structures

Submitted May 10, 2013

Attendees will gain hands-on experience in managing multiple types of structured and unstructured data by customizing data structures so that they represent the data as it exists and is queried in its natural form, achieving an impedance match between data in the wild and its model. The data structures will then go through a series of customizations to optimize for performance, scalability, data size, and ease of use.

Outline

Historically, distributed databases have imposed restrictions (e.g. a structured data model, range-based partitioning) on data requiring high locality. In addition, databases have dictated that data be stored in either row-oriented or column-oriented storage, which causes an explosion in I/O operations as the size of a row or column increases, triggering adverse side effects in subsystems such as replication.

This workshop will explore the architectural decisions that we made in developing Aerospike to address these issues while being a reliable, fast, distributed, and linearly scalable database cluster.

We will introduce the LinkedRecord API, which gives end-users the ability to define linked data structures (with guaranteed data locality) within the database. Linked data structures can be used to represent a logical data structure that is physically stored as many records, which in practice keeps the I/O explosion described above in check.
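To make the idea concrete, here is a minimal sketch of one logical list stored as a chain of small physical records. It is not the LinkedRecord API: ToyStore, CHUNK_CAPACITY, the key naming scheme, and both append functions are hypothetical names invented for illustration, and the store is a plain in-memory dictionary that counts bytes written as a rough stand-in for I/O.

```python
import json

CHUNK_CAPACITY = 4  # hypothetical limit on items per physical record


class ToyStore:
    """In-memory stand-in for a key-value store that counts bytes written."""

    def __init__(self):
        self.records = {}
        self.bytes_written = 0

    def put(self, key, value):
        # Every write costs the full serialized size of the record.
        self.bytes_written += len(json.dumps(value))
        self.records[key] = value

    def get(self, key):
        return self.records[key]


def append_single_record(store, key, item):
    """Whole list lives in one record, so every append rewrites all of it."""
    record = store.records.get(key, {"items": []})
    store.put(key, {"items": record["items"] + [item]})


def append_linked(store, key, item):
    """List lives in a chain of small records; an append rewrites only the tail."""
    head = store.records.get(key)
    if head is None:
        head = {"chunks": 1}
        store.put(key, head)
        store.put(f"{key}:0", {"items": [], "next": None})
    tail_key = f"{key}:{head['chunks'] - 1}"
    tail = store.get(tail_key)
    if len(tail["items"]) == CHUNK_CAPACITY:
        # Tail is full: link it to a fresh chunk and update the head record.
        new_key = f"{key}:{head['chunks']}"
        store.put(tail_key, {"items": tail["items"], "next": new_key})
        store.put(key, {"chunks": head["chunks"] + 1})
        tail_key, tail = new_key, {"items": [], "next": None}
    store.put(tail_key, {"items": tail["items"] + [item], "next": None})


def read_linked(store, key):
    """Follow the chain from the first chunk to rebuild the logical list."""
    items = []
    chunk_key = f"{key}:0"
    while chunk_key is not None:
        chunk = store.get(chunk_key)
        items.extend(chunk["items"])
        chunk_key = chunk["next"]
    return items


if __name__ == "__main__":
    single, linked = ToyStore(), ToyStore()
    for i in range(100):
        append_single_record(single, "events", i)
        append_linked(linked, "events", i)
    print("single-record bytes written:", single.bytes_written)
    print("linked-record bytes written:", linked.bytes_written)
```

In this toy model the single-record version rewrites an ever-growing record on each append, while the linked version rewrites at most one bounded chunk plus occasional bookkeeping, so the cost of a write stays roughly constant as the logical structure grows.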

We will also cover Aerospike’s user-defined functions, the underpinnings of the LinkedRecord API, for use in both single-record and multi-record operations. Each of these features has moved Aerospike from a key-value store to a rich, robust data platform, allowing users to build databases for their applications, rather than applications for their database.
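The general shape of a user-defined function, in the single-record and multi-record cases, can be sketched as follows. This is an assumption-laden toy and not Aerospike's UDF interface: ToyServer, register, apply, apply_many, and increment are all hypothetical names, and the "server" is an in-process dictionary. The point it illustrates is that the function runs where the record lives, so only the arguments and the result cross the client boundary.

```python
class ToyServer:
    """Hypothetical in-process stand-in for a database node that runs UDFs."""

    def __init__(self):
        self.records = {}  # key -> record (a dict of fields)
        self.udfs = {}     # name -> registered function

    def register(self, name, fn):
        """Make a function available to be run next to the data."""
        self.udfs[name] = fn

    def apply(self, key, name, *args):
        """Single-record operation: run a registered function on one record."""
        record = self.records.setdefault(key, {})
        return self.udfs[name](record, *args)

    def apply_many(self, keys, name, *args):
        """Multi-record operation: run the same function on several records."""
        return [self.apply(key, name, *args) for key in keys]


def increment(record, field, amount):
    """Example UDF: add to a numeric field in place and return the new value."""
    record[field] = record.get(field, 0) + amount
    return record[field]


if __name__ == "__main__":
    server = ToyServer()
    server.register("increment", increment)
    print(server.apply("user:1", "increment", "hits", 2))
    print(server.apply_many(["user:1", "user:2"], "increment", "hits", 1))
```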

The audience of this advanced-level workshop will:

• Gain an understanding of the Aerospike data platform.

• Learn how easy it is to install and run an Aerospike cluster.

• Develop a simple application creating their own linked data structures and user-defined functions.

Requirements

Any reasonably modern laptop will do to set up an Aerospike client and server. Operating system must be CentOS 6.x or Linux 12.x; virtual machines are OK.

Speaker bio

Russell Sullivan, Aerospike’s principal architect and performance guru, is leading the integration of AlchemyDB capabilities into the Aerospike real-time NoSQL database and further enhancing its performance. An expert in high-performance and distributed systems, he was the founder and developer of AlchemyDB, an ultra-high-velocity open-source database and the first database to combine SQL, NoSQL key-value store, document store, and GraphDB. Russell was also a key contributor to the performance enhancements of the Redis open-source key-value store upon which AlchemyDB was built. Prior to founding AlchemyDB, Russell was a principal engineer of the Makara platform-as-a-service (PaaS), which was acquired by Red Hat. He served as vice president of engineering and operations with BE2.com, one of the largest and fastest growing dating sites in the world with 100 million members. He also served as a software architect at Lycos Europe, senior software engineer at 24/7 Real Media, and software programmer at the National Institutes of Health (NIH).

