A couple of weeks ago, we officially launched the Cloud-Native Sisense on Linux deployment after a successful beta release cycle that kicked off in spring 2019.
As of 2017, Linux was running 90% of the public cloud workload. It is increasingly the OS of choice for enterprises and in the cloud due to its many advantages: lower TCO, higher security, improved stability and reliability, flexibility, and more. Given its importance, we made investing in a Sisense on Linux deployment an organizational priority in late 2017.
When we sat down to plan our execution strategy, we realized there were several different ways we could approach it. For us, it was critical not only to do it right, so that we didn't waste time and resources, but also to deliver a product that would lead our customers into the future and support their needs in an ever-growing cloud environment.
Here’s how we re-architected Sisense with the right technologies and frameworks for the task at hand without simply porting the code over.
The First Few Months
When I was tasked with building a Sisense Linux deployment in late 2017, a few small steps had already been taken. Two developers had started a Linux project that initially consisted of simply porting code from one OS to the other.
They started with C/C++ code, which usually takes the longest to migrate from Windows to Linux. By the end of 2017, the team was able to show their first demo, which ran queries over the ElastiCube Manager, our high-performance database, using C/C++.
Taking Stock and Restarting the Project
Although the team continued to make progress, in January 2018 we decided to take a step back and rethink our approach to the project. It is often less scary to take what you have and what you know and carry on without questioning your approach. However, that is not always the smartest or best course of action.
Before jumping headlong into merely porting code from one OS to the other, we needed to consider whether it made sense to migrate all components as-is or, instead, to determine which language or architecture would work best for the task at hand.
We decided that where it was required and where it made sense, we would not simply port over code but rebuild the component from scratch using the most relevant stack and technology for what that component was meant to do.
There were three “buckets” in this decision-making process:
- Components that would be migrated.
- Components that would be rebuilt from scratch using the right framework while maintaining institutional knowledge and providing a similar user experience. For example, we concluded that several components had to be rewritten in Java. To enable this, we dedicated more than a month to training the entire engineering team in Java. We also recruited Java experts to help guide and govern the design.
In hindsight, this was a critical decision that paved the way for a modern, enterprise-grade, full-stack analytics application that is highly performant, reliable, and scalable. The best part is that we were able to build it in a little over a year.
The Right Technology for the Job
Let’s break this down some more. The Sisense application has a few key tasks handled by different components:
1. Sisense ElastiCube or Data Engine
The Sisense ElastiCube crunches hundreds of millions of records and needs to be highly optimized. It has to run close to the OS so that we have fine-grained control over what it does, with minimal overhead. Most of this code was written in C and C++ and was left that way.
Takeaway: C and C++ are a good fit for building highly optimized processes that run close to the OS, such as a database engine.
2. ElastiCube Management Service and Query Service
The ElastiCube Management Service and Query Service were moved away from C# and C++ and rebuilt in Java. Java is a highly portable, mature language that is well suited to building mission-critical, high-performance applications that are CPU-intensive. These components demand so much agility and carry so much complexity that we wanted to lean on the many frameworks available for Java and focus only on our application logic, without compromising on performance.
We already had (and continue to have) a big footprint in Node.js, and it would have been easier to use Node.js everywhere. However, we resisted that urge and chose the best language and framework for each job.
Node.js is great for responsive operations with a low memory footprint, and it is easy to write, debug, and develop in. However, Java offers better performance for CPU-intensive work, along with better support for caching and long-lived state. Java is also compiled and statically type-checked, which is important when merging releases and branches over the years: mistakes introduced by a merge can cause serious problems if they are not caught as compilation errors.
For example, the Management Service needs to track the status of every component, be aware of Kubernetes, and exercise a lot of control over the system. It made sense to build it in Java because the service needs to be efficient, highly available, and multi-threaded.
On the other hand, parts of the application that are tightly integrated with the UI are easier to build in Node.js. For example, the original pivot was implemented in C# as an IIS application. Because the pivot is a full-stack component, it made sense to rewrite it in Node.js, which lets a full-stack developer work on both the front end and the back end in the same technology.
For web services, we don't recommend C++ because development time is too expensive. For these reasons, we eventually went with Java and, in particular, the Spring Boot framework. We also considered a few other options, such as Guice or EJB (which we immediately disqualified).
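Both the Management Service's need to track every component's status from many threads at once and the compile-time safety argument from the previous paragraph can be illustrated in a few lines. This is a minimal sketch assuming Java 17; `ServiceStatus`, `StatusRegistry`, and the status values are hypothetical names, and the Kubernetes integration the real service has is omitted. `ConcurrentHashMap` lets many threads report statuses without explicit locking, and the exhaustive `switch` expression turns a forgotten enum constant into a compile error.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical lifecycle states; not Sisense's actual model.
enum ServiceStatus { STARTING, RUNNING, DEGRADED, STOPPED }

// Hypothetical registry of the latest known status of each service.
class StatusRegistry {
    private final Map<String, ServiceStatus> statuses = new ConcurrentHashMap<>();

    // Safe to call from many threads at once; the last write for a service wins.
    void report(String service, ServiceStatus status) {
        statuses.put(service, status);
    }

    // Exhaustive switch expression (Java 14+): if a branch adds a new
    // ServiceStatus constant and this method is not updated, the merged
    // code fails to compile instead of misbehaving at runtime.
    static boolean isServing(ServiceStatus status) {
        return switch (status) {
            case RUNNING, DEGRADED -> true;
            case STARTING, STOPPED -> false;
        };
    }

    // Unknown services are treated as not accepting traffic.
    boolean acceptsTraffic(String service) {
        ServiceStatus s = statuses.get(service);
        return s != null && isServing(s);
    }

    // Immutable copy of the current entries (weakly consistent under concurrent writes).
    Map<String, ServiceStatus> snapshot() {
        return Map.copyOf(statuses);
    }

    public static void main(String[] args) throws InterruptedException {
        StatusRegistry registry = new StatusRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 20; i++) {
            String service = "service-" + i;
            // Simulate many status reports arriving concurrently.
            pool.submit(() -> registry.report(service, ServiceStatus.RUNNING));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(registry.snapshot().size() + " services tracked"); // 20 services tracked
    }
}
```

The design choice here is to keep the shared state in a concurrent collection rather than guarding a plain map with locks, which keeps status reporting cheap even when many threads report at once.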
Takeaway: Java is useful when building mission-critical, high-performance applications that are CPU-intensive with the need for more caching, long state capabilities, and a robust set of available frameworks. Node.js, on the other hand, is useful for responsive operations with a low memory footprint and when a developer wants to work on both the front-end and back-end in the same technology (which is the genesis of Node.js).
3. Data connectors
The .NET connector framework was replaced with a new Java-based framework because .NET support on Linux comes through .NET Core, which was introduced in 2016 and does not contain all the functionality of the .NET Framework for Windows. The connector framework acts as a pipe for transferring data. On top of this, the actual drivers for accessing most database providers are written in Java, so it was only natural to write the framework in Java too. The actual data crunching is done inside the ElastiCube, which is written in C/C++.
Takeaway: Java is a natural choice for building data connectors due to its large ecosystem including database drivers and rich frameworks.
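To make the "pipe" role of the connector framework concrete, here is a minimal sketch assuming Java 17. `RowSource`, `RowSink`, and `ConnectorPipe` are hypothetical names. In a real connector the source would wrap a JDBC `ResultSet` obtained from a database driver and the sink would feed the ElastiCube. Here both are in-memory so the example runs on its own. The connector does no data crunching; it only moves rows in fixed-size batches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical abstractions; a real source would wrap a JDBC ResultSet.
interface RowSource extends Iterator<List<Object>> {}

interface RowSink {
    void accept(List<List<Object>> batch);
}

class ConnectorPipe {
    // Moves rows from source to sink in batches of `batchSize`; returns the row count.
    static long transfer(RowSource source, RowSink sink, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<Object>> batch = new ArrayList<>(batchSize);
        long total = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // hand off the full batch, start a new one
            }
        }
        if (!batch.isEmpty()) sink.accept(batch); // flush the final partial batch
        return total;
    }

    public static void main(String[] args) {
        Iterator<List<Object>> rows = List.of(
                List.<Object>of(1, "a"), List.<Object>of(2, "b"), List.<Object>of(3, "c")).iterator();
        RowSource source = new RowSource() {
            public boolean hasNext() { return rows.hasNext(); }
            public List<Object> next() { return rows.next(); }
        };
        List<Integer> batchSizes = new ArrayList<>();
        long n = transfer(source, batch -> batchSizes.add(batch.size()), 2);
        System.out.println(n + " rows in batches " + batchSizes); // 3 rows in batches [2, 1]
    }
}
```

Batching keeps memory bounded regardless of table size, which matters when the downstream engine, not the connector, does the heavy lifting.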
In summary, there are certain languages most appropriate for certain operations, and choosing the correct language for the operation at hand is key.
Another critical change in the Linux deployment was related to the architecture itself. While many components in the Windows deployment are microservice-based, given the opportunity to re-architect Sisense, we decided to build a containerized microservices application using Docker for containerization and Kubernetes for orchestration.
We initially debated between Docker Swarm and Kubernetes for orchestration but chose Kubernetes because of its rising popularity and because it was becoming the de facto standard for container orchestration. While our teams were comfortable with Docker Swarm, which is considered more of the DevOps way, Kubernetes better handled other developer requirements like versioning, upgrades, releases, and rollbacks. We made that decision with future developers in mind.
An interesting debate that comes with building a microservices architecture is the number of microservices you’ll break your application into.
Two years ago, we had a fairly monolithic application with four or five services. That is no longer the case: we have around 20 services today. As a rule of thumb, we try not to create too many microservices, especially ones that lengthen the call chain; adding services that are not on the call chain is fine. A single operation shouldn't involve every microservice in its call chain (for example, 4-5 services is okay, but not all available services). It is important to remember that while microservices are a great way to build scalable and resilient applications rapidly, they also add complexity, especially in the communication between them and, eventually, in debugging. You need to find a balance between the number of microservices you create and their supportability and maintainability.
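One way to keep an eye on this rule of thumb is to model which service synchronously calls which and measure the longest call chain an operation can trigger. The sketch below assumes Java 17, an acyclic call graph known up front (for example, assembled from tracing data), and hypothetical service names. The limit of 5 is a hypothetical value loosely following the 4-5 services guidance in the paragraph above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CallChains {
    // Number of services on the longest call chain starting at `service`.
    // Assumes the call graph has no cycles; results are memoized per service.
    static int longestChain(String service, Map<String, List<String>> calls,
                            Map<String, Integer> memo) {
        Integer cached = memo.get(service);
        if (cached != null) return cached;
        int best = 0;
        for (String callee : calls.getOrDefault(service, List.of())) {
            best = Math.max(best, longestChain(callee, calls, memo));
        }
        int result = best + 1; // count this service itself
        memo.put(service, result);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical call graph: caller -> services it calls synchronously.
        Map<String, List<String>> calls = Map.of(
                "gateway", List.of("query", "identity"),
                "query", List.of("management"),
                "management", List.of("build"),
                "identity", List.of());
        int maxAllowed = 5; // hypothetical limit
        int depth = longestChain("gateway", calls, new HashMap<>());
        System.out.println("longest chain: " + depth + (depth > maxAllowed ? " (too long)" : " (ok)"));
        // prints "longest chain: 4 (ok)"
    }
}
```

Note that adding a service that nobody calls on the hot path (such as `identity` here, which is a leaf) does not lengthen the longest chain, which matches the point that services off the call chain are fine.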
A New Way of Doing Things with Shared Storage, Updated Monitoring & Logging
Re-architecting the platform also gave us the opportunity to replace old ways of doing things with better, highly performant ones. For example, the Windows way of making data highly available is to store copies of it on multiple servers. With this re-architecture, we were able to do away with that approach and rebuild the experience on highly distributed, highly available shared storage technologies such as cloud storage providers, GlusterFS, Amazon EFS, Azure Files, Google Filestore, and many more.
Another example is logging. One of the challenges of a microservices-based architecture is debugging, because of the number of components involved and the many different places logs can be stored. One of the first steps we took to alleviate this was to build a combined log using Fluentd, which collects all the log data in a centralized place. In addition, we added Prometheus, which collects counters and other metrics about what is going on in the system, and Grafana, which provides a detailed view of those metrics.
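A centralized log is most useful for debugging when every service writes lines in the same machine-readable shape and tags each line with an ID that follows a request across services. Below is a minimal sketch, assuming Java 17, services that write one JSON object per line to stdout, and a log collector such as Fluentd configured to parse JSON. The field names (`service`, `requestId`, and so on) are hypothetical, not Sisense's actual log schema. A production service would use a logging library and a JSON library instead of this hand-rolled escaping.

```java
import java.time.Instant;
import java.util.UUID;

class JsonLog {
    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String esc(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> b.append("\\\"");
                case '\\' -> b.append("\\\\");
                case '\n' -> b.append("\\n");
                default -> {
                    if (c < 0x20) b.append(String.format("\\u%04x", (int) c));
                    else b.append(c);
                }
            }
        }
        return b.toString();
    }

    // One log event as a single JSON line; `requestId` ties together the lines
    // that different services write while handling the same request.
    static String line(String service, String requestId, String level, String msg) {
        return String.format(
                "{\"time\":\"%s\",\"service\":\"%s\",\"requestId\":\"%s\",\"level\":\"%s\",\"msg\":\"%s\"}",
                Instant.now(), esc(service), esc(requestId), esc(level), esc(msg));
    }

    public static void main(String[] args) {
        String requestId = UUID.randomUUID().toString(); // created at the edge, passed downstream
        System.out.println(line("query", requestId, "INFO", "query started"));
        System.out.println(line("management", requestId, "WARN", "cube \"sales\" is rebuilding"));
    }
}
```

With every line carrying the same `requestId`, filtering the combined log by that one value reconstructs a request's path through all the services it touched.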
Learnings Along the Way
While we have come out on the other side of a successful project, the journey was not without difficulties. Some of these were challenges that we learned from and others were limitations that we have had to work with in order to provide the best experience for the end user.
1. Embracing open source technologies
We learned that embracing well-tested, mature open source technologies is a game changer for how quickly and efficiently we can build a large-scale, enterprise-grade application. This technology is not something to be afraid of. Better yet, some of these technologies give us a completely different way of thinking about a problem (like the shared storage solution).
2. Wiping code and rebuilding where needed
We learned not to be afraid to wipe out code and rebuild. Today, we look back at a small portion of a component that we left in C++ and realize that it was a mistake: we could have saved time and done a better job by simply rewriting it. Keeping C components that were not originally written for multi-threaded operation, instead of rewriting them to be multi-threaded, ultimately proved more expensive.
3. Keeping customer and end-user experience in mind
When we embarked on the Sisense on Linux deployment, it was very clear to us that we wanted to provide the same user experience in both the Windows and the Linux deployments.
A big reason for this was to ensure that we could use our carefully curated automated testing assets across both deployments. These assets (various databases, different schemas, dashboards, and validated results) had been collected and built over the previous couple of years, and keeping them was a top priority. Testing both deployments with exactly the same assets is an important tool for ensuring that data integrity is retained between the two systems. This meant that, in certain areas, we chose not to change something on the front end that we could have changed, so that the end-user experience was not affected.
We also wanted the transition from Windows to Linux (for customers who want it) to be quick and painless. To address this, we built a migration tool that lets our customers move all their work assets from Windows to Linux seamlessly, so they do not have to worry about rework.
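Comparing the two deployments' outputs on the same test assets comes down to checking that two query results contain the same rows. Here is a minimal sketch assuming Java 17, results represented as lists of rows, row order that is not significant, and a numeric tolerance to absorb floating-point differences between engines. `ResultComparator` and the 1e-9 tolerance are hypothetical, not Sisense's actual test harness.

```java
import java.util.List;
import java.util.Objects;

class ResultComparator {
    // True if each row of `a` pairs with a distinct matching row of `b` (order ignored).
    // Greedy O(n^2) matching: fine for test-sized results, but it may report a false
    // mismatch if several rows are within tolerance of one another.
    static boolean sameResults(List<List<Object>> a, List<List<Object>> b, double tol) {
        if (a.size() != b.size()) return false;
        boolean[] used = new boolean[b.size()];
        outer:
        for (List<Object> row : a) {
            for (int j = 0; j < b.size(); j++) {
                if (!used[j] && sameRow(row, b.get(j), tol)) {
                    used[j] = true;
                    continue outer;
                }
            }
            return false; // no unused row in `b` matches this row
        }
        return true;
    }

    // Numbers (of any boxed type) are compared with close(); other values with equals().
    static boolean sameRow(List<Object> r1, List<Object> r2, double tol) {
        if (r1.size() != r2.size()) return false;
        for (int i = 0; i < r1.size(); i++) {
            Object v1 = r1.get(i), v2 = r2.get(i);
            if (v1 instanceof Number n1 && v2 instanceof Number n2) {
                if (!close(n1.doubleValue(), n2.doubleValue(), tol)) return false;
            } else if (!Objects.equals(v1, v2)) {
                return false;
            }
        }
        return true;
    }

    // Absolute tolerance for magnitudes below 1, relative tolerance above;
    // NaN and infinities must match exactly.
    static boolean close(double d1, double d2, double tol) {
        if (Double.compare(d1, d2) == 0) return true;
        if (!Double.isFinite(d1) || !Double.isFinite(d2)) return false;
        double scale = Math.max(1.0, Math.max(Math.abs(d1), Math.abs(d2)));
        return Math.abs(d1 - d2) <= tol * scale;
    }

    public static void main(String[] args) {
        List<List<Object>> windows = List.of(List.<Object>of("EMEA", 0.1 + 0.2), List.<Object>of("APAC", 42L));
        List<List<Object>> linux = List.of(List.<Object>of("APAC", 42), List.<Object>of("EMEA", 0.3));
        System.out.println(sameResults(windows, linux, 1e-9)); // true
    }
}
```

The usage example shows why a tolerance is needed: `0.1 + 0.2` is not exactly `0.3` in floating point, and one engine may return a `long` where the other returns an `int`.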
4. Organization-wide focus and cross-company collaboration
A critical factor in our success was collaboration across R&D teams and, later, with non-R&D teams across the company. The Linux deployment is a completely new platform that touches every aspect of our organization and, at any given point, a significant number of Sisense developers were contributing to it.
Additionally, this required changes outside of R&D.
- Technical support teams needed to know how to debug issues and support customers using a completely new OS and new technology.
- Pre-sales engineers needed to know how to successfully install and demo the new deployment to customers and to learn the details of the new architecture. To train the technical teams, we not only subscribed to external courses but also flew internal R&D trainers around the globe to educate teams at various Sisense locations.
- Sales and marketing teams also needed to become familiar with cloud technology and the benefits of the Cloud-Native Sisense on Linux deployment in order to convey these benefits to customers and prospects.
It was essential to gain buy-in across the organization, with full support and prioritization from senior leadership. Without a vision and cross-organization goals, no project like this could come to fruition.
The Cloud-Native Sisense on Linux deployment marked a milestone in our journey: we became the only data and analytics platform with an advanced containerized microservices architecture that is purpose-built from the ground up with best-of-breed technologies like Docker containers and Kubernetes orchestration, and that can be deployed in the cloud or on-premises. It provides the full Sisense platform, including the Elastic Data Hub, which offers both live in-database connectivity to all major cloud databases and Sisense's proprietary In-Chip™ Performance Accelerator. The deployment fits seamlessly into DevOps processes and enables faster delivery, resiliency, and scalability.
We started this journey with a vision of building a true next-gen analytics platform that will lead the way in how organizations build large-scale analytic applications. We are proud of the platform being deployed with our customers today.
We successfully made this transition in a little over a year. While we had some setbacks and difficulties (as with any project), the decisions about how to approach it, like not shying away from rewriting components where needed, not only sped up the process but also allowed us to build a platform that provides the most value to our customers in the cloud-based, web-based world we work in.
As we continue rolling out this new, full-stack Cloud-Native Sisense deployment, we are carefully working with teams across Sisense to make this a great experience for our customers and enable them to go from data to insights even faster in a highly-scalable and resilient environment.