Here at Sisense we support high-availability, multi-server deployments for scalability and flexibility. These deployments are ready to go out of the box and are common at most of our customers' sites.
While this type of deployment meets the performance demands of most customers, we recently had a large customer come to us wanting to get even more out of their system. We decided to customize a solution using out-of-the-box Sisense capabilities to meet their specific requirements.
Here’s what we did.
Step 1: Investigate
If we take a look at a common high-availability architecture, we may find, for example, a database layer, a build layer with one build node, a query and web layer with two query nodes and two web nodes, and a load balancer, just like the image below:
In an environment like this, an end user's request passes through the load balancer to one of the two web nodes, with no control over which one is used. By default, the web node then runs the query on one of the two query nodes and returns the response to the end user.
It's important to keep in mind that each query node stores every query result in the machine's memory. So, let's say that each query node has 500GB of RAM. Once used RAM crosses a predetermined threshold, the server automatically starts cleaning up cached queries to free up available RAM.
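The eviction behavior described above can be sketched roughly as follows. This is a toy model, not Sisense's actual cache implementation; the 500GB capacity is from our example and the 90% threshold is an assumed value.

```python
from collections import OrderedDict

class QueryCache:
    """Toy model of a query node's in-memory result cache.

    Results accumulate until used RAM crosses a threshold, then the
    oldest entries are evicted until usage drops back below it.
    (Hypothetical sketch -- not Sisense's actual eviction policy.)
    """

    def __init__(self, capacity_gb=500, threshold=0.9):
        self.capacity_gb = capacity_gb
        self.threshold = threshold
        self.entries = OrderedDict()  # query id -> result size in GB
        self.used_gb = 0.0

    def store(self, query, size_gb):
        self.entries[query] = size_gb
        self.used_gb += size_gb
        # Clean up oldest results once we pass the threshold
        while self.used_gb > self.capacity_gb * self.threshold:
            _, evicted_gb = self.entries.popitem(last=False)
            self.used_gb -= evicted_gb

cache = QueryCache()
for i in range(6):
    cache.store(f"dashboard-{i}", 100)  # 600GB of results vs. a 450GB threshold
print(len(cache.entries), cache.used_gb)  # oldest results have been flushed
```

The key point is the `while` loop: every time the threshold is crossed, previously cached answers are thrown away and will have to be recomputed on their next request.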
Step 2: Brainstorm
Therein lies our question: the query nodes clear their cache once RAM nears capacity, so previously answered queries have to be recomputed, and that causes lag. How can we reduce the frequency of those cache flushes without purchasing more RAM?
I started by mapping the dashboards and roughly measuring how much RAM each utilized when queried. Because both query nodes have ~500GB of RAM in our example, I wanted to understand how much of it would be utilized and how often the cleanup process would occur. For example:
| Dashboard # | Dashboard Name | RAM Used |
| --- | --- | --- |
| 1 | Customer Churn | 100 GB |
| 4 | Help Desk Tickets | 100 GB |
| 6 | Leads Analysis | 60 GB |
| 7 | Lead Generation | 400 GB |
| 8 | Customer Satisfaction | 200 GB |
| **Total capacity** | | **995 GB** |
After listing out the dashboards and realizing that the total capacity is less than 1TB, I needed to look at the actual functionality of the different nodes. Each web node has the same metadata and each query node has the same ElastiCubes (Sisense's super-fast data stores). So, when one end user asks a question and it's directed to query node A, the answer gets "stored" there. However, if another end user asks the same question and it gets directed to query node B, it will be "stored" there too.
Ah-ha! This is not efficient: we're storing the same answers twice on two different query nodes. A query whose result takes 5GB of RAM, for example, can consume a total of 10GB when stored on both query nodes. Moreover, this shortens the interval before the RAM has to be flushed. Let's try to reconfigure the architecture to fix it.
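The arithmetic behind the waste is simple; a quick sketch using the 5GB example:

```python
# With round-robin routing, a popular query's result eventually ends up
# in BOTH nodes' caches; with dedicated routing it lives on one node only.
result_gb = 5
num_query_nodes = 2

round_robin_ram = result_gb * num_query_nodes  # same answer on A and B
dedicated_ram = result_gb                      # answer on exactly one node

print(round_robin_ram, dedicated_ram)  # 10 5
```

Across a full workload, dedicated routing roughly doubles the number of distinct answers the two caches can hold before a flush is triggered.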
Step 3: Test and Expand
To eliminate having the same queries saved on both query nodes’ RAM, I needed to create a structure that told the web nodes which query node to use based on which dashboard was being used.
Using the example list of dashboards from above, I made sure that if dashboards 1, 3, 4, 6, or 8 were being used, the queries were sent to query node A; if dashboards 2, 5, or 7 were being used, the queries were sent to query node B. This meant breaking the ElastiCube set and directing each dashboard to a dedicated query node, which would eliminate the redundancy on the query layer.
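Conceptually, the web layer's routing rule becomes a fixed lookup from dashboard to query node. The dashboard IDs are from our example and the node names are illustrative:

```python
# Pin each dashboard to one query node so its results are cached in
# exactly one place. (Illustrative mapping from the example above.)
DASHBOARD_TO_NODE = {
    1: "query-node-A",
    3: "query-node-A",
    4: "query-node-A",
    6: "query-node-A",
    8: "query-node-A",
    2: "query-node-B",
    5: "query-node-B",
    7: "query-node-B",
}

def route_query(dashboard_id: int) -> str:
    """Return the query node that should serve this dashboard's queries."""
    return DASHBOARD_TO_NODE[dashboard_id]

print(route_query(1))  # query-node-A
print(route_query(7))  # query-node-B
```

Because the mapping is static, repeated queries from the same dashboard always hit the same cache, so no answer is ever stored twice.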
For reliability purposes, if one of the query nodes is down, we want to have a fallback option that ensures all queries will succeed, even at the expense of performance. The solution was creating new ElastiCube sets like this:
Set 1: The cube on query node A is default, the cube on query node B is the fallback.
Set 2: The cube on query node B is default, the cube on query node A is the fallback.
The fallback cube does not participate in the ElastiCube Set round-robin routing and is not typically used. The fallback cube is only used when none of the other cubes in the set are available.
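The two-set default/fallback arrangement can be sketched as a routing function. This is a simplified model of the behavior described above, with illustrative node names and the example dashboard split:

```python
def pick_node(dashboard_id: int, node_up: dict) -> str:
    """Choose a dashboard's default query node, using the other node
    only as a fallback when the default is down.
    (Sketch of the two ElastiCube sets described above.)"""
    default = "A" if dashboard_id in {1, 3, 4, 6, 8} else "B"
    fallback = "B" if default == "A" else "A"
    if node_up.get(default, False):
        return default
    if node_up.get(fallback, False):
        return fallback  # slower (cold cache), but the query still succeeds
    raise RuntimeError("No query node available")

# Normal operation: dashboard 1 goes to its default node A.
print(pick_node(1, {"A": True, "B": True}))   # A
# Node A down: dashboard 1 falls back to node B.
print(pick_node(1, {"A": False, "B": True}))  # B
```

The trade-off is explicit in the code: the fallback path sacrifices cache locality (the answer may need to be recomputed on the other node) in exchange for availability.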
Now, specific queries are always routed to the same cache. Because queries are no longer duplicated in both caches, the available RAM across the two nodes is better utilized and can store a larger number of queries.
After testing this architecture and ensuring it worked for all scenarios with two query nodes, we expanded the capability to support more than two query nodes if needed. Success!
So, What’s The Big Deal?
To put it simply: the ability to boost performance without having to add hardware reduces total cost of ownership. Most people can meet their performance demands with a standard high-availability setup. But for those who can't, and who have the ability to manage this new solution, this is an exciting discovery.