“Does it scale?” really is such a vague question, and I am guilty of asking it myself. When I look back at the evaluation scorecards I have built in the past, every one of them has a “scalability” line. It sounds technical and precise, but it really isn’t.
Here at PurePredictive we get the scalability question all the time (probably because everyone starts from the same evaluation template I did, and who wants a system that doesn’t scale?). I suppose the answer someone is looking for depends on who is asking. Let me try a few perspectives on for size. I would love to get your thoughts.
When we talk to somebody about the business concept of intelligently automating machine learning, it sounds amazing. You mean I can generate high-quality predictive models specific to my data and my business really fast, and deploy them in production at the click of a button, ready to integrate into my systems or business processes RIGHT NOW? If it works, the product should sell itself. Then comes the question: does it scale? In this case, the question is about our ability to grow the business. What efficiencies are there? How much capital is required to keep it going? What is your target market (how large can it scale)?
When we tell people that it is a multi-tenant web application hosted in the cloud, the meaning shifts. How many simultaneous users can it support? Is there a limit to how many transactions can be run? How many programs can an account hold? How many predictions can I run?
Being a web application in the cloud brings some significant advantages. We can dynamically scale resources up and down based on demand, which is essential in a compute-heavy environment. Maintaining, on our own, enough infrastructure to handle peak demand at scale would be a daunting and expensive task. The cloud makes it manageable.
Of course, since our product depends on quality data, somebody is bound to ask how much data it will handle, because, of course, they have BIG data (and we all know how clear the definition of “big data” is). In reality, even if you do have petabytes of data, you wouldn’t use all of it to build a predictive model. Not all features of the data are relevant, and you don’t need every observation. A sufficiently large, representative sample is enough.
Each of these perspectives brings its own challenges, and at our current stage all three are relevant. So, if you were to ask me “Do you scale?” from any of these perspectives, I would answer yes! We scale better on some than on others, and we are constantly pushing the boundaries on all three. We will never be done.
The best way for us to answer these questions is for you to hop on the platform and put it through its paces. We have launched our product in beta, and you can register at http://www.purepredictive.com/registration. We would love to hear your feedback and see where you push us. The beta is free. And as a thank-you for your help, we will continue to host, for free, any model you deploy during the beta period for as long as there is value in it.
James Lovell, VP Product Engineering