From reference architecture to reference implementation: detailing the DevOps aspects of In-Stream Processing Service

Big Data Nov 10, 2016 Grid Dynamics

by Sergey Plastinkin, Victoria Livschitz, Anton Ovchinnikov

In the previous four posts in this series, we covered the reference architecture of a general-purpose In-Stream Processing Service blueprint. To recap, here is a list of shortcuts to the posts in that series:

In the next few posts we’ll present our reference implementation of that blueprint and open-source all of its components, so that anyone can deploy and run the entire service platform on AWS (Amazon Web Services) within a few hours using our deployment and orchestration scripts.

This is the “DevOps” part of the story: making the platform operational on dynamic cloud infrastructure for development, testing, and production purposes. The main topics will be the scalability, availability, portability, and automation of the platform’s deployment and operations on any public cloud.

We even developed a fully functional demo application for real-time sentiment analysis of Twitter feeds for Social Movie Reviews that runs on our reference implementation out of the box. You can play with the interactive web application, powered by our In-Stream Processing service, that lets you visualize the public’s historical and real-time sentiments towards the latest movies here. We also wrote a series of blog posts that explain the scientific process behind the work of the data scientists, show every step in developing the sentiment analytics application from the data scientist’s point of view, and illustrate how the machine learning models were trained, evaluated, and tuned to perform the analytics. The series is collectively called “Data Science Kitchen: a hands-on primer on how data scientists create machine learning models, using Twitter stream sentiment analysis of social movie reviews as our teaching example.” Here is a link to the first post in that series, which we strongly advise you to read, along with those that follow it.

Now let’s jump into the details of the reference implementation, starting with a discussion of the technology stack used to automate deployment and operational management.