Turn Data Into Insights Faster With Grid Dynamics Analytical Data Platform Accelerator on AWS Cloud

Oct 20, 2020 • 6 min read
Grid Dynamics

Every company wants to turn its data into actionable insights. Data can be used to build a customer 360 view, reduce customer churn, plan marketing campaigns, and optimize pricing, inventory, and supply chain. It is also key to increasing the productivity and efficiency of the workforce. To manage the increasing volume, velocity, and variety of data without sacrificing quality, security, or accessibility, companies need to build a robust analytical data platform.

To reliably generate high-quality insights, augment decision making with AI and ML, and increase the level of business intelligence, the data analytics department needs to follow three steps:

Steps two and three are usually repeated, because new data and insights are continually added and generated over time. But business value is created only in step three, and it depends on the quality of the insights that are generated. So while executives want to get to insights as soon as possible, data analysts, scientists, and engineers will not be productive without a robust foundation in the form of an analytical data platform.

To help companies reduce the cost, effort, time, and risk of building analytical data platforms, Grid Dynamics together with AWS created an accelerator that should satisfy the data analytics needs of small technology startups and large enterprises.

What is an analytical data platform?

In the article describing the journey from data lakes to analytical data platforms, we analyzed why traditional data lakes do not satisfy the needs of modern data analytics teams. The blueprint of an analytical data platform includes capabilities for easy and secure access to data by analysts and scientists, data governance tooling, stream analytics, data monitoring and quality, enterprise data warehousing, reporting and visualization tooling, as well as AI/ML platforms.

Together, all these capabilities facilitate DataOps and MLOps processes and provide a solid foundation for data scientists, analysts, and engineers to generate business value quickly and reliably.

One accelerator for three use cases

Companies can use the analytical data platform accelerator in three different cases:

  1. Cloud migration – if a company has a data lake or enterprise data warehouse on-premises and plans to migrate data analytics to the cloud, it can use the accelerator to set up an analytical data platform in the AWS cloud and start migrating data and implementing AI/ML use cases.
  2. Build a new platform – if a company is launching an enterprise-wide initiative to improve data analytics, it can use the accelerator to set up an analytical data platform in the AWS cloud to start onboarding data sources and implementing AI/ML use cases.
  3. Upgrade data lake – if a company has already built a basic data lake in AWS but lacks enterprise data analytics capabilities and tooling for DataOps and MLOps, it can use the accelerator to add them.

No matter what the starting point, the analytical data platform accelerator built by Grid Dynamics and AWS can provide the following benefits:

  • Feature completeness – the platform has all the necessary capabilities to satisfy the data analytics needs of modern enterprises and uses cloud architecture and security best practices.
  • Supports DataOps and MLOps – the accelerator contains key technology enablers for DataOps and MLOps best practices, many of which we described in our “5 Technology Enablers for DataOps” article.
  • Low risk and high reliability – all components have been battle-tested and pre-integrated to ensure seamless end-to-end functionality.
  • High speed to market – the entire platform can be provisioned in one day and integrated with the company's ecosystem with onboarding of the first data sources in a matter of weeks.
  • No additional license costs – the accelerator is based on AWS cloud services and open source components, so the company only needs to pay the cloud costs.

Architecture and technology stack

The analytical data platform is built with AWS cloud services and open source components, and it includes other Grid Dynamics accelerators for data quality, data monitoring, and anomaly detection. It has a modular structure, so companies don’t have to provision the entire platform. Although the core accelerator is industry-agnostic, it contains two sample AI/ML use cases from the retail industry that demonstrate the end-to-end data and ML pipeline: image recognition for automatic product attribution, and promotion planning.

The high-level architecture of the accelerator is outlined in the diagram below:

The following capabilities of the analytical data platform are implemented as separate modules and can be provisioned independently:

  • Base platform – hosts foundational components required for operating the rest of the platform.
  • Data lake – built on top of S3, with the data catalog and data lineage available in Apache Atlas.
  • Batch analytics – contains the necessary components for typical batch analytics use cases, such as a data lake with EMR, an EDW with Amazon Redshift, and data pipeline orchestration with Apache Airflow.
  • Stream analytics – covers the stream processing and stream analytics use cases and examples with Amazon Kinesis, Apache Spark, and pipeline orchestration with Apache Airflow.
  • Data governance – provides tools for data catalog, data glossary, data lineage, data monitoring, and data quality. The data catalog, glossary, and lineage are implemented with Apache Atlas. Data monitoring and quality capabilities are implemented with a Grid Dynamics accelerator based on Elasticsearch, Grafana, Kubernetes, and a number of custom applications.
  • Anomaly detection – for more information about anomaly detection architecture and the technology stack, refer to a separate article on how to add anomaly detection to your data pipelines.
  • CI/CD – deployment automation is implemented on top of the native AWS CloudFormation stack. The deployment pipeline is responsible for provisioning services such as EMR or EKS, building Docker images and publishing them to ECR, and deploying executable code such as JARs for batch and streaming use cases and Airflow DAGs.
  • AI/ML use cases – we included two AI use cases in the accelerator to demonstrate end-to-end functionality. Both use cases are implemented with Amazon SageMaker and Jupyter Notebooks and use the data we prepared in the data lake and EDW. One of the use cases contains a model to detect attributes in an e-commerce product catalog, while the second implements price optimization and promotion planning. We kept the use case implementations simple for demo purposes. If you are interested in a production implementation of these use cases, please read these articles about product attribution with image recognition and price and promotion optimization or reach out to us.
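
Since the modules can be provisioned independently, a deployment usually begins by picking a subset and working out the order in which to provision it. The following is a minimal sketch of that idea in Python, not part of the accelerator itself. The module identifiers and the dependency map are hypothetical: the only dependencies taken from the descriptions above are that the base platform underlies every other module and that the AI/ML use cases read data from the data lake and the EDW.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical module identifiers and dependencies, for illustration only.
# Each module maps to the set of modules that must be provisioned before it.
DEPENDS_ON = {
    "base_platform": set(),
    "data_lake": {"base_platform"},
    "batch_analytics": {"base_platform", "data_lake"},
    "stream_analytics": {"base_platform"},
    "data_governance": {"base_platform"},
    "anomaly_detection": {"base_platform"},
    "ai_ml_use_cases": {"base_platform", "data_lake", "batch_analytics"},
}


def provisioning_order(selected):
    """Return the selected modules plus their prerequisites, prerequisites first."""
    needed = set()
    stack = list(selected)
    # Collect the transitive closure of prerequisites for the selection.
    while stack:
        module = stack.pop()
        if module not in DEPENDS_ON:
            raise KeyError(f"unknown module: {module}")
        if module not in needed:
            needed.add(module)
            stack.extend(DEPENDS_ON[module])
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {module: DEPENDS_ON[module] for module in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(provisioning_order(["ai_ml_use_cases"]))
```

Selecting only the AI/ML use cases pulls in the base platform, data lake, and batch analytics modules ahead of it, while a module such as stream analytics brings in the base platform alone.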

Following enterprise cloud best practices, different parts of the accelerator can be provisioned in separate VPCs or cloud projects.

This is an example of a single cloud project installation:

How to get started?

The easiest way to get started with the analytical data platform is by using the AWS Service Catalog. In advanced cases, a company may decide to provide the platform via a self-service portal. ServiceNow, Jira, or custom portals can be integrated with the AWS Service Catalog to orchestrate provisioning of necessary capabilities and AI use cases, guaranteeing compliance with the internal security and IT policies.
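
For teams that script provisioning rather than use the console, a Service Catalog product can also be launched through the AWS SDK. The sketch below assembles the arguments for the `provision_product` call of the boto3 `servicecatalog` client. The product name, version name, and parameter keys shown are hypothetical placeholders, not the accelerator's actual values; the real ones come from the product as published in your own Service Catalog portfolio.

```python
def build_provision_request(product_name, artifact_name, instance_name, parameters):
    """Build keyword arguments for the Service Catalog ProvisionProduct API."""
    return {
        "ProductName": product_name,
        "ProvisioningArtifactName": artifact_name,  # the product version to launch
        "ProvisionedProductName": instance_name,    # name of this particular installation
        "ProvisioningParameters": [
            # Service Catalog expects string values, so convert everything.
            {"Key": key, "Value": str(value)}
            for key, value in sorted(parameters.items())
        ],
    }


def provision(request):
    """Submit the request; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without AWS access

    client = boto3.client("servicecatalog")
    return client.provision_product(**request)


if __name__ == "__main__":
    # All names and parameters below are hypothetical examples.
    request = build_provision_request(
        product_name="analytical-data-platform",
        artifact_name="v1",
        instance_name="adp-dev",
        parameters={"Environment": "dev", "EnableStreaming": True},
    )
    print(request)
```

Keeping the request builder separate from the API call makes it easy to review or log what will be provisioned before anything is created, which is also where a self-service portal integration could apply policy checks.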

In the next article, we’ll provide a step-by-step getting-started guide for provisioning the solution using the Service Catalog.

Grid Dynamics offers services to plan, design, prototype, integrate, and implement the analytical data platform accelerator in the client ecosystem, along with onboarding of batch and streaming data sources, migration of data and platform components from on-premises to the cloud, and implementation of required AI/ML use cases. The typical engagement models can be found below:

Conclusion

Building an analytical data platform can be a daunting task. Most companies want to get to insights as soon as possible and avoid investing too much time and effort in the foundational capabilities. To address this need, Grid Dynamics created a pre-integrated accelerator for the modern analytical data platform and published it on the AWS Service Catalog. We are excited about this new solution and welcome everybody to use it to increase their speed to market and reduce the risk and effort of implementing the platform from scratch. If you are interested in a demo or would like to explore how the accelerator can help you, please reach out to us.
