Data quality monitoring made easy
Jul 16, 2021 • 11 min read
Specifically, we focus on the following big questions asked from the point of view of the customer’s chief architect:
The choices are captured in the following decision flow diagram:
Many vendors offer In-Stream Processing as a “feature” of a broader Big Data processing platform rather than as a separate, loosely coupled service that could also be integrated with other Big Data platforms and services. For many customers, tight coupling of the In-Stream and Big Data processing platforms is not practical. Technology decisions about Data Warehouses, Data Lakes, and Batch Analytics are made at different times, by different organizations, and against different selection criteria than those used to choose an In-Stream Processing platform. Even if a comprehensive Big Data platform is already in place, its bundled stream processing feature shouldn’t be the automatic choice, since a standalone, self-sufficient In-Stream Processing product may fit actual and prospective business requirements much better.
Big Data applications are key drivers of cloud infrastructure adoption, so it should surprise no one that all major cloud providers are investing heavily in Big Data APIs in general, and streaming APIs in particular. Choosing a specific cloud vendor’s streaming APIs has several compelling advantages, including speed of implementation, a SaaS consumption and delivery model, and integration with the provider’s other APIs. The major concern, of course, is lock-in: in all likelihood, migrating off that cloud platform later will not be practical without massive costs.
The decision to adopt a specific cloud API should not be made lightly. If your company has already made a strategic commitment to a specific cloud and its APIs, the point may be moot: that provider’s APIs should be considered the default choice because, presumably, that’s why you chose that provider. However, if your company has not yet made such a commitment or has adopted a more balanced multi-cloud strategy, cloud portability is an essential consideration. The preferred choice would most likely be open source technologies or vendor products that can be deployed and run on any cloud with minimal operational implications.
Blueprints, sometimes called reference architectures, can be powerful accelerators and enablers for companies that have decided to build their own systems using open source technologies deployable on any cloud rather than to buy vendor products.
Such companies are Grid Dynamics’ traditional customers. They face a substantial battle to figure out:
Knowing how to make the right design choices and how to answer these and similar questions is our business. Grid Dynamics is an engineering services company specializing in Big Data in general, and In-Stream Processing in particular, using open source technologies and cloud environments.
Besides working on customer projects, we have a research lab where our architects identify repeatable business use cases that can be addressed with repeatable design patterns, then turn these design patterns into reusable blueprints. These blueprints are our intellectual property, and we make them freely available to our user community.
When a blueprint matches the business use case closely, the time-to-market can be 30% to 50% faster than starting from scratch. That’s because a lot of design choices have already been made, tools pre-integrated, and environments pre-configured. Making modifications to a proven design is much faster than creating a brand-new design.
Grid Dynamics makes money by consulting on design modifications, providing implementation services, and managing the resulting systems according to SLAs. This works well for our customers, who can rely on us as a design, implementation, and managed services partner to supplement their in-house teams, using 100% open solutions developed in a fraction of the time and at a fraction of the cost of proprietary alternatives. Needless to say, this works well for us, too: we get to monetize our experience and research by providing value other vendors can’t. And if a customer chooses to use our blueprints without our help, we are still delighted, as that’s how we gain loyal friends in high places.
For all these reasons, we have created a blueprint called In-Stream Processing Service that will be described in detail
Sergey Tryuber, Anton Ovchinnikov, Victoria Livschitz