Multi-agent deep reinforcement learning for multi-echelon supply chain optimization
Jun 10, 2020 • 11 min read
Pricing decisions are critically important for any business, as pricing is directly linked to consumer demand and company profits. Even a slightly suboptimal decision-making process inevitably leads to tangible losses, and major mistakes can have grave consequences.
Optimal pricing is a challenging problem for several reasons. One is the complex structure of the price waterfall, which often includes multiple variables such as list prices, discounts, and special offers that need to be optimized. Another reason is the complexity of demand and profit forecasting, which makes it difficult to evaluate new pricing strategies accurately. Finally, non-optimal pricing decisions are often caused by a lack of coordination between the teams that are responsible for various aspects of pricing.
Although the fundamentals of price optimization are well understood, our experience with dozens of leading retailers clearly indicates that retail practitioners are struggling with certain pricing decisions. Even those that presumably make the right pricing decisions are often uncertain whether they have unharvested profits or avoidable losses due to suboptimal pricing, because they lack the tools and techniques to measure this quantitatively.
At Grid Dynamics, we know that economic modeling and machine learning can greatly help improve the quality of pricing decisions. To demonstrate how it works and simplify the development of similar solutions for our clients, we created a reference implementation of a price management tool that showcases the main capabilities of AI-based price optimization and also features several advanced techniques.
The price management process has to deal with many variables and use cases because pricing typically has a complex structure. As a basic example, consider a retailer who buys a certain product from a supplier at a supplier price, adds a markup to obtain a list price, optionally applies one or more markdowns, and finally accounts for variable and fixed costs to calculate the profit margin. The price structure of this imaginary retailer is visualized in the chart below (the so-called price waterfall):
Even this oversimplified environment requires building a price management system that understands the difference between markups and markdowns, knows how each of these two price components influences the demand and profit, and can find the optimal trade-off between the two. A more realistic setup is likely to include many more variables such as different promotion types, inventory constraints, differences between stores and regions, and product substitution effects. Ideally, the system should be designed as an extensible framework that provides fundamental capabilities such as demand prediction, and new input variables or constraints can be plugged in depending on a particular business model and set of use cases. We will describe how to achieve this goal in the sections dedicated to solution design and predictive modeling.
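The waterfall arithmetic for the imaginary retailer can be sketched in a few lines. This is only an illustration: the class name, field names, and sample numbers are assumptions made for this example, and markdowns are assumed to be applied sequentially as fractions of the current price.

```python
from dataclasses import dataclass, field


@dataclass
class PriceWaterfall:
    """Hypothetical container for the waterfall steps; all names are illustrative."""
    supplier_price: float                 # what the retailer pays per unit
    markup_pct: float                     # markup as a fraction of the supplier price
    markdown_pcts: list[float] = field(default_factory=list)  # applied one after another
    variable_cost: float = 0.0            # per-unit variable cost
    fixed_cost_per_unit: float = 0.0      # fixed costs already allocated per unit

    def list_price(self) -> float:
        # Supplier price plus markup.
        return self.supplier_price * (1.0 + self.markup_pct)

    def sales_price(self) -> float:
        # List price after every markdown has been applied in turn.
        price = self.list_price()
        for markdown in self.markdown_pcts:
            price *= 1.0 - markdown
        return price

    def unit_margin(self) -> float:
        # What is left per unit after the supplier price and all costs.
        return (self.sales_price() - self.supplier_price
                - self.variable_cost - self.fixed_cost_per_unit)


# Made-up numbers: 100% markup, then 20% and 10% markdowns.
waterfall = PriceWaterfall(50.0, 1.0, [0.2, 0.1],
                           variable_cost=5.0, fixed_cost_per_unit=7.0)
print(waterfall.list_price(), waterfall.sales_price(), waterfall.unit_margin())
```

Even in this toy form, the markup and the markdowns pull the margin in opposite directions, which is the trade-off the optimization has to resolve.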
Although the core system can be extensible enough to support a wide range of use cases, its interface and consumed data elements need to explicitly support concrete use cases. We decided to use promotion price optimization as the primary business case, and create case-specific services and interfaces that demonstrate how this aspect of price management can be solved using our system.
To define the problem in a more formal and detailed way, we have chosen a set of assumptions that are typical for many apparel retailers. First, we assumed that the product list prices are fixed, and that merchandisers maintain a database of promotions in which each promotion is configured by triggering rules (e.g., the purchase total must be more than $100) and actions (e.g., provide a $20 discount). For each transaction, the pricing engine pulls active promotions from this database based on products in the shopping cart, and then calculates the final sales price applying these promotions, as illustrated in the figure below:
Next, we decided to support several types of promotions that are typical for apparel retailers and department stores:
The cart-based promotions are particularly tricky. Since the rules are applied to individual shopping carts, the final sales price of a product is unique for each transaction and can vary depending on other products in the cart. Consequently, the sales price of any given product at any moment can be described only as a statistical distribution of prices rather than as a single number. This can be a significant challenge to building the price-demand model needed for promotion optimization, but this promotion type is very common and thus worth some research.
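A minimal sketch can make the rule-and-action mechanics, and the resulting spread of sales prices, more tangible. Everything here is hypothetical: the `Promotion` class, the `effective_prices` helper, and the choice to allocate a cart-level discount to items in proportion to their list prices are assumptions for illustration, not a description of the actual pricing engine.

```python
from dataclasses import dataclass
from typing import Callable

# A cart maps a product id to its list price (one unit per product, for simplicity).
Cart = dict[str, float]


@dataclass
class Promotion:
    name: str
    trigger: Callable[[Cart], bool]  # triggering rule evaluated on the whole cart
    discount: float                  # action: absolute discount on the cart total


def effective_prices(cart: Cart, promotions: list[Promotion]) -> Cart:
    """Spread the triggered cart-level discounts across items proportionally (assumed rule)."""
    total = sum(cart.values())
    if total == 0:
        return dict(cart)
    discount = sum(p.discount for p in promotions if p.trigger(cart))
    discount = min(discount, total)  # never discount below zero
    factor = 1.0 - discount / total
    return {product: price * factor for product, price in cart.items()}


# "$20 off when the purchase total is more than $100"
twenty_off = Promotion("20 off over 100", lambda cart: sum(cart.values()) > 100, 20.0)

small_cart = effective_prices({"shirt": 40.0}, [twenty_off])
large_cart = effective_prices({"shirt": 40.0, "jacket": 120.0}, [twenty_off])
print(small_cart["shirt"], large_cart["shirt"])
```

The same shirt ends up with two different effective prices depending on what else is in the cart, which is why its sales price is better described by a distribution than by a single number.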
In the settings defined above, a merchandiser needs to make a number of decisions:
These questions are traditionally answered using last year’s data combined with keen tracking of the current sales data and trends. This approach may be more or less efficient depending on the nature of merchandise, properties of the customer base, skillfulness of merchandisers, and external factors such as market growth or decline. Our experience with many large retailers indicates that, on average, traditional techniques are not optimal in the sense that profits are partly lost due to “harmful” discounts that should be removed. Revenues are also much lower than they could be because of a mismatch between the price and demand. The impact of these issues may or may not be significant, but it is quite remarkable that most companies do not have the tools and analytical techniques needed to estimate this gap in optimal pricing, and thus do not really know whether they have a problem or not.
If we can build a digital model of a retailer and its customers that allows a what-if analysis of promotion-related scenarios (e.g., how profit will change if we introduce a 5% discount), one of the immediate benefits would be the ability to quantitatively assess pricing decisions. For example, a merchandiser would be able to forecast the incremental profit delivered by a new promotion and make sure that it does not interfere with other promotions before running a promotional campaign. This capability alone can help to prevent losses and streamline the promotion management process. The model can then be used to automatically answer all the questions listed at the beginning of this section: find profit-optimal combinations of promotions, tune promotion properties, and find new promotional opportunities.
We envisioned a system that provides a merchandiser with a simple but powerful promotion optimization workflow. The flow starts with the selection of a product category, and once the category is selected, the system takes the merchandiser through the steps illustrated with the wireframes below:
The solution described above relies on the ability to accurately predict revenue, profit, and demand, taking the parameters of the planned promotional campaigns into account. It requires building predictive models for these values that can later be used for manual what-if analysis or automatic optimization. To better understand how these models can be designed and used, let us briefly review the basics of price management theory.
The most basic scenario one can consider is the static optimization of the list price for a single product. Assuming that we have historical data where product price varied over time, we can try to train a predictive model to learn the price-demand dependency, and find a price point that maximizes the revenue, which is a product of price and demand (the gray rectangle in the figure below):
The above figure also suggests that having any single price may not be optimal because some revenue (and therefore profit) remains unharvested. At any given price point, we are likely to have customers who would buy the product even at a higher price (and thus deliver additional profits). At the same time, we would have customers who do not buy the product, but potentially could at a lower but still profitable price. Marketers typically work around this limitation by dividing the market into segments and setting different price points in each segment to capture additional profits, as illustrated below:
Examples of such segmentation include the separation of mainstream and luxury sub-brands and different prices in areas with high and low incomes. Limited-time promotions are also a segmentation technique, but the segmentation is done over time: a retailer first captures profits at a regular price from less price-sensitive customers, and then captures additional profits at a discounted price from more price-sensitive customers.
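Both ideas, the revenue-maximizing single price and the extra revenue captured through segmentation, can be illustrated with a toy calculation. The linear demand curves, their coefficients, and the grid search over integer prices are assumptions chosen for this sketch; a real system would use a fitted price-demand model instead.

```python
def demand(price: float, intercept: float, slope: float) -> float:
    """Toy linear demand curve (an assumption): units sold fall as price rises."""
    return max(0.0, intercept - slope * price)


def best_price(segments, candidate_prices):
    """Grid search for the one price that maximizes revenue summed over `segments`.

    Each segment is an (intercept, slope) pair for the toy demand curve.
    """
    def revenue(price):
        return sum(price * demand(price, a, b) for a, b in segments)

    price = max(candidate_prices, key=revenue)
    return price, revenue(price)


prices = range(0, 101)
less_sensitive = (100, 1)  # demand drops slowly with price
more_sensitive = (100, 2)  # demand drops quickly with price

# One price for the whole market.
single_price, single_revenue = best_price([less_sensitive, more_sensitive], prices)

# A separate price for each segment.
segmented_revenue = sum(best_price([segment], prices)[1]
                        for segment in (less_sensitive, more_sensitive))

print(single_price, single_revenue, segmented_revenue)
```

With these made-up curves, pricing each segment separately yields more revenue than the best single price, which mirrors the argument for segmentation above.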
We can conclude that the price optimization system should be designed to predict future revenue, profit, or demand for a certain period of time as a function of variables like list price, discount depth, and competitor prices. This is done for various segments that can be defined in terms of customers, store locations, time periods, and products. The total profit can then be summed across all of the segments:
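One plausible way to write this sum is given below. The notation is an assumption introduced here and is not taken from the original formula:

```latex
\Pi_{\text{total}} = \sum_{s \in S} \pi_s(x_s)
```

Here $S$ is the set of segments, $x_s$ is the vector of decision variables for segment $s$ (for example list price and discount depth), and $\pi_s(x_s)$ is the profit predicted for that segment.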
At a conceptual level, a solution designed this way provides promotion optimization capability (where time is the segmentation dimension and discount is the variable to be optimized), but also provides enough flexibility to support other business cases ranging from list price optimization to assortment optimization.
In some environments, revenue and profit can be straightforwardly and deterministically calculated from the demand: revenue is a product of price and demand, and profit is a product of margin and demand. Consequently, it can be enough to build only a demand prediction model and multiply its output by price or margin to obtain a revenue or profit function that, in turn, can be plugged into an optimization algorithm. Unfortunately, this is not possible in the environment we described above, as promotions can be applied to a shopping cart, which breaks down the simple relationship between the quantity sold and revenue: one product can be sold at different prices depending on other products in the cart. To work around this issue, we decided to create three different models to predict revenue, profits, and demand, respectively. These models share the same feature design, but are fitted separately for these three training labels.
The model is designed to predict the metric of interest (revenue, profit, or demand) for a given individual product at a given date. The input feature vector includes the following groups of variables:
Finally, price and promotion variables were calculated not only for the date to be predicted, but also for historical dates with lags of one week, one month, and one year to account for autocorrelations, as shown below:
To build the forecast, the system moves from left to right and fills in the predicted values for every single day. Note that the earliest predictions on the left can be used to build lagged features for the later predictions on the right. For example, the forecast for the 14th day can use the forecast for the 7th day as an input.
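The left-to-right procedure can be sketched as a simple recursive loop. The function name, the single one-week lag, and the stand-in `toy_model` are assumptions for illustration; in the system described here the predictor would be the trained model and the feature vector would carry many more variables.

```python
from typing import Callable, Sequence


def recursive_forecast(history: Sequence[float],
                       horizon: int,
                       predict: Callable[[list[float]], float],
                       lags: Sequence[int] = (7,)) -> list[float]:
    """Forecast `horizon` days, feeding earlier predictions back in as lagged features."""
    if len(history) < max(lags):
        raise ValueError("history must cover the longest lag")
    series = list(history)
    for _ in range(horizon):
        # Lagged values come from real history at first, then from earlier forecasts.
        lagged = [series[-lag] for lag in lags]
        series.append(predict(lagged))
    return series[len(history):]


def toy_model(lagged: list[float]) -> float:
    """Stand-in for a trained model (hypothetical): last week's value plus one."""
    return lagged[0] + 1.0


history = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
forecast = recursive_forecast(history, horizon=14, predict=toy_model)
print(forecast)
```

In this sketch the forecast for the 14th day is computed from the forecast for the 7th day rather than from observed data, so any error in the early predictions propagates to the later ones.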
Another important thing to consider is that the feature vector incorporates product attributes, and thus the same design can be used to forecast the demand and profit for new products without sales histories or with short sales histories.
For training and validation, we used statistics from real retailers to create a probabilistic generative model that reflects the main purchasing and promotional patterns for orders that occur in the real world. This model was used to generate 3 years of order history for one product category of 10K products and 50K products, making up a total of 500K orders. This order history included 60 promotional campaigns total. The gradient boosted decision trees model was trained on this historical data, and was then used to predict a validation sample of 100 days. An example validation chart for the profit for one category is shown below:
The prediction accuracy of the model is generally good for practical purposes of promotion evaluation. However, one of the main challenges is accurately predicting sales spikes for slow-moving products, as the model tends to underestimate such spikes. This problem partly stems from the limited amount of historical data that we used, which meant that the spikes in our data are rare. This issue can be mitigated by training on a larger number of spikes, or through more elaborate modeling specifically for this case.
Once we have revenue, profit, and demand prediction models for individual products, we can then plug them into various optimization algorithms:
The technical solution includes the modeling subsystem and the promotion optimization subsystem. The modeling subsystem consolidates transactional, promotional, and catalog data, and makes this data available to a data scientist who does feature engineering and designs the predictive models. The models are scheduled for regular re-training and made available to the promotion optimization subsystem. Technologically, the modeling part of the solution is based on Spark, Python, and Python ML libraries.
The optimization system includes a user portal where promotions and campaigns can be configured, and an optimization server that forecasts the performance of individual promotions or the entire promotion mix. This architecture is illustrated below:
The user portal was created based on the solution vision mockups we described earlier in the corresponding section. The following screen recording demonstrates how a merchandiser does a what-if analysis of the promotion mix:
Before we wrap up this post, let’s briefly discuss several questions that were frequently asked in connection with this work:
 Marshall Fisher and Ananth Raman, The New Science of Retailing: How Analytics are Transforming the Supply Chain and Improving Performance, Harvard Business Review Press, 2010