Dianemo: Distributed task management to achieve faster throughput
Submitted by Santosh Gannavarapu on Thursday, 28 June 2012
Big Data Infrastructure & Processing
There is always a need to run batch/offline jobs while utilizing resources efficiently. The initial attempts are to set aside pre-configured worker boxes that will pick up jobs either by a cron scheduler or by a pre-designated manager application that is responsible for kicking off processes. Assigning boxes statically would necessitate deploying boxes that may either run out of bandwidth or may be so under-utilized that you are paying for unnecessary CPU cycles.
The need of the hour is a distributed task management system that will distribute load uniformly across boxes whose CPU cycles are available while distributing the load such that every box in the fleet is efficiently used. Such a system should be tolerant of host failures resulting in high-availability.
After evaluating various open source schedulers we decided to build our own. While distributing load across boxes is the essential requirement in many there are subtle nuances such as honoring CPU load, distributed scheduling agent or priority that are missing in many.
Dainemo has more to give than just distributing load. Some of its features are listed below:
Achieving peer-to-peer distributed scheduler: The scheduler operates by deploying the agent across all the boxes set aside as worker boxes. Each agent picks up the job by consulting a common database of jobs. Thus each host behaves as a peer agnostic of other peers in the network. Two hosts wouldn't know anything about each other. The jobs are currently time-based. The scheduler has a job Generator which checks at the various schedules and creates jobs that are later picked off by the agents.
Linear Scalability: The system achieves linear scalability and elasticity. Need for additional capacity is as easy as adding another host with the scheduling agent. Upon successful provisioning the box becomes operational by picking up unscheduled jobs. If there is excess capacity simply shutdown the agent and remove the host. Weighted Jobs: Each job will be assigned a designated number of pre-configured tokens. Similarly each host is assigned a set of tokens providing an indicator to the amount of resources available on the host. Each scheduling agent will pickup jobs amounting to the tokens available at any time.
Honoring the CPU load: The number of tokens available is directly proportional to the CPU load on the host. If the load on the box is high the tokens available are brought down to zero immediately thus disabling the scheduler from picking up any further jobs. This heuristic honors load spiked caused by any process running on the host, thus facilitating better CPU utilization.
Priority Based Scheduling: There are currently two types of priority in Dianemo priority 1 and priority 2. Priority 1 has a reserved set of tokens, thus giving scheduling time to priority 1 jobs first. Upon availability of any additional tokens priority 2 jobs are picked up. Priority 1 jobs are always picked up at the scheduled time allowing a configurable minimal grace period. Upon failures Priority 1 jobs are immediately marked failed. Priority 2 jobs are picked up upto a larger grace period. Upon failures priority 2 jobs are re-tried upto 3 times with retrial window being 30 minutes away.
Scheduling dependency jobs: A job can be marked a child of another job. Thus if the parent job is finished only then the child job is kicked off as long as the scheduled time is within proper limits.
Leader agent: A leader agent is selected at real-time within the available scheduling agents. The agent will check to see if there any jobs that aren't picked up and will mark them as failed to run. If there are any jobs whose heartbeats haven't been updated in a designated time then they are marked as UNREACHABLE.
Santosh Gannavarapu is Co-Founding Geek & CTO of Sokrati. Sokrati is the leading Search Marketing platform in India.