
Druid Cluster Setup

A Druid cluster can scale horizontally. It is made up of individual node types such as broker, coordinator, overlord, middleManager, and historical. For our convenience, we club components together to use the resources effectively.

Basic understanding of the Druid cluster:


The master server runs the Overlord and Coordinator, which manage tasks and hand them over to the MiddleManager and Historical nodes.

  • The Overlord node is responsible for accepting tasks, coordinating task distribution, creating locks around tasks, and returning statuses to callers.
  • The Coordinator node is responsible for loading new segments, dropping outdated segments, managing segment replication, and balancing segment load.
  • Each historical node maintains a constant connection to Zookeeper and watches a configurable set of Zookeeper paths for new segment information.
  • The middle manager node is a worker node that executes submitted tasks.
  • The Broker node is the node to route queries to when running a distributed cluster. It also merges the result sets from all of the individual nodes together.

A 1000 GB HDD is shared over NFS among all the servers as common storage for storing and processing segments. Supervisor is used to start the services automatically whenever a server boots or a node is killed.

| Server group   | Components                 | Boot priority |
|----------------|----------------------------|---------------|
| Master servers | Overlord + Coordinator     | 1st           |
| Data servers   | MiddleManager + Historical | 2nd           |
| Query servers  | Broker + Pivot             | 3rd           |
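
As a rough sketch of how this grouping and boot priority can be wired up with Supervisor (the install path, JVM flags, and program names below are assumptions, not the exact config used here), the master server's supervisord entries might look like:

```
# On the master server (run via sudo):
sudo tee /etc/supervisor/conf.d/druid-master.conf > /dev/null <<'EOF'
[program:coordinator]
directory=/vol/druid/druid-staging/benchmarking
command=java -server -Xmx4g -cp conf/druid/_common:conf/druid/coordinator:lib/* io.druid.cli.Main server coordinator
; lower priority values start earlier, so master components come up first
priority=100
autostart=true
autorestart=true

[program:overlord]
directory=/vol/druid/druid-staging/benchmarking
command=java -server -Xmx4g -cp conf/druid/_common:conf/druid/overlord:lib/* io.druid.cli.Main server overlord
priority=100
autostart=true
autorestart=true
EOF

sudo supervisorctl reread && sudo supervisorctl update
```

The data servers would use higher priority values (for example 200 for middleManager and historical) and the query servers higher still (for example 300 for broker and Pivot), so that each tier starts after the master components.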



### Configuration and Setup

Add a 1000 GB volume to the master server and mount it on the other servers using an NFS share. You can read more about NFS mounting and sharing here.
This is on GitHub, so let me know if I've b0rked it somewhere.
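
A minimal sketch of the NFS side, assuming the master's address is 10.2.1.98 (as used in the commands later in this document) and that the other servers sit on the same /24 subnet:

```
# On the master server: export the shared volume.
echo "/vol/druid 10.2.1.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra

# On every other server: mount it, and add an fstab entry so it survives reboots.
sudo mkdir -p /vol/druid
sudo mount 10.2.1.98:/vol/druid/ /vol/druid/
echo "10.2.1.98:/vol/druid /vol/druid nfs defaults,_netdev 0 0" | sudo tee -a /etc/fstab
```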

The same configuration can be used for the staging environment with lower machine configurations.


### Directory Structure
```
./druid-staging/benchmarking
├── bin
├── conf
│   ├── druid
│   │   ├── _common
│   │   ├── broker
│   │   ├── coordinator
│   │   ├── historical
│   │   ├── middleManager
│   │   └── overlord
│   └── tranquility
├── conf-quickstart
│   ├── druid
│   │   ├── _common
│   │   ├── broker
│   │   ├── coordinator
│   │   ├── historical
│   │   ├── middleManager
│   │   └── overlord
│   └── tranquility
├── extensions
│   ├── druid-avro-extensions
│   ├── druid-caffeine-cache
│   ├── druid-datasketches
│   ├── druid-examples
│   ├── druid-hdfs-storage
│   ├── druid-histogram
│   ├── druid-kafka-eight
│   ├── druid-kafka-extraction-namespace
│   ├── druid-kafka-indexing-service
│   ├── druid-lookups-cached-global
│   ├── druid-lookups-cached-single
│   ├── druid-s3-extensions
│   ├── druid-stats
│   └── postgresql-metadata-storage
├── hadoop-dependencies
│   └── hadoop-client
│       └── 2.3.0
├── lib
└── quickstart
```

These are the directories important for configuring Druid nodes. You can read more about Druid configuration here.
  • conf contains the configurations for all the Druid nodes in a cluster setup.
  • conf-quickstart contains the configurations for all the nodes running on a single machine.
  • extensions contains all the extensions packaged by druid-io. If you need additional extensions that the Druid team does not package (for example, because of licensing), download them manually or with pull-deps and put them in the extensions directory, as shown below.
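
For example, a not-bundled extension can be fetched with pull-deps roughly like this (the install directory, extension coordinate, and version below are illustrative, assuming a Druid 0.9.x-era layout):

```
cd /vol/druid/druid-staging/benchmarking

# Pull an extension that is not bundled (e.g. mysql-metadata-storage) into ./extensions
java -classpath "lib/*" io.druid.cli.Main tools pull-deps \
  --no-default-hadoop \
  -c "io.druid.extensions:mysql-metadata-storage:0.9.2"

# Then register it in conf/druid/_common/common.runtime.properties, e.g.:
# druid.extensions.loadList=["postgresql-metadata-storage", "mysql-metadata-storage"]
```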

Druid nodes benefit greatly from being tuned to the hardware they run on.
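
For example, on a data server the Historical node is usually sized through its jvm.config and runtime.properties (a sketch only; the numbers below are placeholders to be adapted to the actual cores, RAM, and disk):

```
cd /vol/druid/druid-staging/benchmarking

# Heap and direct memory for the Historical JVM.
cat > conf/druid/historical/jvm.config <<'EOF'
-server
-Xms8g
-Xmx8g
-XX:MaxDirectMemorySize=10g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
EOF

# Processing threads/buffers sized to CPU cores, segment cache sized to disk.
cat >> conf/druid/historical/runtime.properties <<'EOF'
druid.processing.numThreads=7
druid.processing.buffer.sizeBytes=536870912
druid.server.maxSize=130000000000
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":130000000000}]
EOF
```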

Checks and Commands:


  • Ensure NFS is mounted on all the servers except the capeve server.
  • If not, mount NFS and then restart Supervisor using the commands given below.
  • All Kafka brokers should be up and running. Make sure all 3 brokers are running; if not, check for an error file in the /var/lib/kafka directory. Generally the error file name starts with hs_

```
# Mount nfs volume
sudo mount 10.2.1.98:/vol/druid/ /vol/druid/
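
# Then restart the Druid services under Supervisor
# (program/group names depend on the local supervisord config).
sudo supervisorctl reread && sudo supervisorctl update
sudo supervisorctl restart all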

# Check that Tranquility Kafka is consuming more than 10k events per flush.
cat /vol/druid/imply-2.0.0/var/mk_node/sv/tranquility-kafka.log | grep -E "Flushed {vnk-clst={receivedCount=[1][9][0-9]{3}" | grep "2017-01-17"
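
# Check that all 3 Kafka brokers are registered in Zookeeper.
# (Broker ids 0,1,2 assumed, matching the kafkat command further below.)
cd /var/lib/kafka
echo "ls /brokers/ids" | ./bin/zookeeper-shell.sh localhost:2181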

# Check Tranquility Kafka offset (Consumed so far).
cd /var/lib/kafka
./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic vnk-clst --zookeeper localhost:2181 --group tranquility-kafka

# Check topic configs. Replication factor 3 in our case. All brokers should be in sync.
cd /var/lib/kafka
./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic vnk-clst

# To increase replication factor of existing topic.
cd /var/lib/kafka
./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --execute
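
# Example contents of increase-replication-factor.json referenced above
# (a sketch for a single-partition vnk-clst topic spread over brokers 0, 1, 2).
cat > increase-replication-factor.json <<'EOF'
{"version":1,"partitions":[{"topic":"vnk-clst","partition":0,"replicas":[0,1,2]}]}
EOF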

# Using kafkat tool. Command line utility for Kafka.
# Increasing the replication factor this way is not reliable.
kafkat set-replication-factor vnk-clst --newrf 3 --brokers  [0,1,2]

# Creating topic in kafka.
cd /var/lib/kafka
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic vnk-clst

# To change retention period of a topic.
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic vnk-clst --config retention.ms=10

# Deleting a config. 
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic vnk-clst --delete-config retention.ms

# Creating NodeJS server with pm2.
pm2 start /var/lib/capeve/server.js --name capeve -i 100 --max-memory-restart 100M --node-args="--max_old_space_size=200"

# Scaling up/down pm2 instance
pm2 scale capeve 100
```

References:


  • Kafka Setup here
  • nfs-mount here
  • Kafka MultiBroker here
  • Druid White Paper here
  • Imply Architecture here
  • Imply Cluster here
