Multi-tenancy is a software architecture, common in cloud computing, in which a single instance of software serves multiple customers, or “tenants.” A noteworthy example of multi-tenant software is Lambda, a serverless technology provided by Amazon Web Services (AWS).
Our Natural Language Generation (NLG) platform, Quill, has traditionally been hosted as a multi-tenant solution. But as our business has expanded rapidly over the years, we’ve run into a number of limitations with this deployment model, particularly around security, scalability, and reliability.
By adopting AWS Lambda as one of our primary compute platforms, we’ve been able to solve many of these problems.
Why Multi-tenancy Is a Win-Win Situation
For many years, hardware virtualization and multi-tenancy have been the building blocks of providing hosted software services to customers, while creating a cost model that worked for software providers. This model gave customers the ability to use software without the cost and hassle of hosting it internally within their own datacenters. In many cases, this was a win-win situation.
We have traditionally hosted Quill in this way, and it worked well for years. But as the company grew, we ran into a number of limitations.
Challenges with Multi-tenancy
We have faced three key challenges with multi-tenancy: reliability, security and scalability.
Typically, customers don’t think about how their actions can impact other platform users – nor should they. However, on a multi-tenant platform, reliability becomes a major concern. For example, if one customer decides to run a load test against a production endpoint that happens to be a multi-tenant environment, they can cause an outage for everybody in the environment. Companies typically safeguard against this with rate-limiting or try to absorb it with autoscaling, but these solutions don’t always work as planned.
Security is another challenge. To reduce the risk of a breach or other cyber threat, many customers demand that their data be physically separated from other customers’ data. This can be problematic in a multi-tenant environment, where customers are co-hosted on the same physical or virtual hardware.
Scalability is the third challenge. Traditional scaling techniques, such as dynamically spinning up new virtual hosts to meet demand, have their limitations. Even on a cutting-edge cloud provider such as AWS, it takes a minimum of tens of seconds to provision and initialize new hosts. If a sudden spike in traffic overwhelms a traditional multi-tenant system, even a well-architected deployment will experience increased latency or downtime. As a result, many companies provision (and pay for) enough hardware to handle spikes in traffic, even though it sits idle most of the time.
Lambda to the Rescue
So, how can we isolate customers, provide scalability without cross-impact, AND keep the economics in line with building a profitable business? Narrative Science turned to AWS Lambda to achieve all of these goals.
AWS Lambda is a compute service that allows users to run code without provisioning or managing servers. Lambda automatically manages the compute resources needed to run code in response to certain events – zero administration needed. In short, a user of Lambda uploads a ZIP file full of code, and Lambda then makes that code available via an endpoint. Send data to that endpoint, and the result of the code is returned. This simple concept can be used to build everything from simple APIs to fully featured webapps.
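The programming model behind that “upload a ZIP, get an endpoint” flow is a handler function: Lambda calls it with the event payload sent to the endpoint and returns whatever it returns. A minimal sketch in Python (the handler logic and payload fields here are illustrative, not our actual API):

```python
import json

# Minimal Lambda-style handler: Lambda invokes this function with the
# JSON payload sent to the endpoint and returns its result to the caller.
def handler(event, context):
    # Illustrative logic: build a greeting from the incoming payload.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Locally, the handler can be exercised directly; Lambda supplies a real
# context object at runtime, so None stands in for it here.
print(handler({"name": "Quill"}, None))
```

Everything else – provisioning, routing, scaling – is Lambda’s job, which is what makes the model attractive for building anything from simple APIs to full web apps.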
Lambda “just handles” scaling for us. We no longer need to run idle hardware just in case a customer suddenly makes a large volume of requests. We can also see directly how an individual customer’s transaction volume relates to the cost of supporting them.
Lambda also provides isolation. It is built on container technology, meaning we can wrap a single “instance” of all the software necessary to serve one customer into a single “function.” Each customer lives in its own Lambda function, with all of that customer’s data and code logically isolated from every other customer’s. This lets us provide a unique environment for each of our customers, eliminating a key data-security concern.
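One way to picture this one-function-per-customer layout is a routing table from tenant to function. The tenant and function names below are hypothetical, not our actual deployment:

```python
# Each tenant gets its own Lambda function, so no two customers ever share
# code, data, or compute. (Tenant and function names are hypothetical.)
TENANT_FUNCTIONS = {
    "acme-corp": "quill-acme-corp",
    "globex": "quill-globex",
}

def function_for_tenant(tenant_id: str) -> str:
    """Resolve which isolated function serves a given tenant."""
    # Fail closed: an unknown tenant gets an error, never a shared default.
    if tenant_id not in TENANT_FUNCTIONS:
        raise ValueError(f"unknown tenant: {tenant_id}")
    return TENANT_FUNCTIONS[tenant_id]

print(function_for_tenant("acme-corp"))
```

In production, the resolved name would be handed to the AWS SDK’s invoke call; the point of the sketch is that the routing layer, not the application code, enforces the isolation boundary.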
Lambda’s cost model is “pay for what you use” rather than “pay for the maximum you might need.” Narrative Science regularly sees spikes where our platform’s throughput jumps by a factor of almost 100. Before Lambda, we had to provision enough compute resources to handle these spikes; now Lambda handles scaling for us. It has cut our compute costs by more than a factor of 10 while making our systems more resilient, secure, and scalable.
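A back-of-the-envelope sketch shows why pay-per-use wins for spiky traffic. Every number below is an illustrative assumption, not our actual rates or volumes, and the pay-per-use figure ignores per-duration (GB-second) charges for simplicity:

```python
# Hypothetical workload: a steady baseline with a ~100x spike for one hour a day.
baseline_rps = 10                  # baseline requests per second (assumption)
peak_rps = baseline_rps * 100      # ~100x spike, as described above
seconds_per_day = 86_400
spike_seconds = 3_600              # one-hour spike (assumption)

requests_per_day = (baseline_rps * (seconds_per_day - spike_seconds)
                    + peak_rps * spike_seconds)

# Provisioned model: enough hosts for peak traffic, running all 24 hours.
host_capacity_rps = 50             # requests/s one host handles (assumption)
host_cost_per_hour = 0.10          # USD per host-hour (assumption)
hosts_for_peak = -(-peak_rps // host_capacity_rps)  # ceiling division
provisioned_cost = hosts_for_peak * host_cost_per_hour * 24

# Pay-per-use model: a flat per-request price (assumption: $0.20 per million).
on_demand_cost = requests_per_day / 1_000_000 * 0.20

print(f"provisioned: ${provisioned_cost:.2f}/day, "
      f"pay-per-use: ${on_demand_cost:.2f}/day")
```

Under these assumptions the provisioned model costs roughly 50x more per day, which is consistent with the order-of-magnitude savings described above: the idle capacity you pay for between spikes dominates the bill.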