How many nodes should an Elasticsearch cluster have?

Similar to the question of how many shards to use with an Elasticsearch index, the number of nodes your cluster should have is hard to answer in a definitive way. Ultimately, it will boil down to questions like the following:

  1. How much data are you working with?
  2. How many searches will you be processing?
  3. How complex are your searches?
  4. How much resources will each node have to work with?
  5. How many indexes/applications will you be working with?

Those are just a few of the many question that should be considered. The fifth, in particular, is something that's often not thought about. A single Elasticsearch cluster is able to work with multiple indexes, and therefore, could serve the needs of numerous different applications. You might provide a simple search for your website and also complex analysis of your organization's internal log files all on the same Elasticsearch cluster. Getting a handle on what and how many different applications the cluster will serve will help you better understand the number of nodes you need.

Also, there's a fair bit of flexibility depending on how you set things up. For example, since a large portion of the work Elasticsearch does is handled in RAM, devoting more RAM to your nodes can limit the amount of nodes you need overall. Conversely, you may have a physical or virtual limit on the RAM devoted to the nodes. Perhaps you're using a hosting provider that only allows you to use up to a certain amount of RAM per server or VM. You might then have to utilize more nodes in your setup to provide better overall throughput.

I can give a general rule of thumb, though. Start with three nodes. Why three? Mostly to prevent a situation called "split-brain" which is common in improperly configured clusters. Essentially, in a cluster, even though the idea is to distribute work evenly among nodes, you still need a "master" node. It's the job of this node to coordinate communication between all the other nodes. Among other things, the master node decides things like where the shards/replicas should be stored to optimize availability. It also coordinates tasks such as indexing and flushing data and routine index optimization that Elasticsearch performs.

The problem comes in when a node falls down or there's simply a lapse in communication between nodes for some reason. If one of the slave nodes cannot communicate with the master node, it initiates the election of a new master node from those it's still connected with. That new master node then will take over the duties of the previous master node. If the older master node rejoins the cluster or communication is restored, the new master node will demote it to a slave so there's no conflict. For the most part, this process is seemless and "just works".

However, consider a scenario where you have just two nodes - one master and one slave. If communication between the two is disrupted, the slave will be promoted to a master, but once communication is restored, you end up with two master nodes. The original master node thinks the slave dropped and should rejoin as a slave, while the new master thinks the original master dropped and should rejoin as a slave. Your cluster, therefore, has a split brain, as it were.

To prevent this, you need a sort of tie-breaker, in the form of a third node. That third node either remained with the original master and knows that the new master dropped, or saw the old master drop and participated in the election of the new master. Therefore, there's no conflict.

Split-brain can still be an issue with three or more nodes, though. To help mitigate this situation from occurring, Elasticsearch provides a config setting, discovery.zen.minimum_master_nodes. What this does is set a minimum amount of nodes that must be "alive" in the cluster before a new master can be elected. For example, in a three node cluster, a value of two would prevent a single node that became disconnected from electing itself as master and doing its own thing. Instead, it would simply have to wait until it rejoined the cluster. The formula for determining the value to use is: N / 2 + 1, where N is the total number of nodes in your cluster.

If you really want to only use two nodes, you can still prevent split-brain by using another Elasticsearch config setting, node.data. By simply setting this to false on one of the two nodes in your cluster, you effectively prevent that node from ever being a master node. However, by doing that, you also lose any failover ability as, if the master node goes down, the entire cluster is then down.

comments powered by Disqus