
**Ilya Katsov**


Aug 07, 2018 • **13 min read**

Pricing decisions are critically important for any business, as pricing is directly linked to consumer demand and company profits. Even a slightly suboptimal decision-making process inevitably leads to tangible losses, and major mistakes can have grave consequences.

Optimal pricing is a challenging problem for several reasons. One is the complex structure of the price waterfall, which often includes multiple variables such as list prices, discounts, and special offers that need to be optimized. Another reason is the complexity of demand and profit forecasting, which makes it difficult to evaluate new pricing strategies accurately. Finally, non-optimal pricing decisions are often caused by a lack of coordination between the teams that are responsible for various aspects of pricing.

Although the fundamentals of price optimization are well understood, our experience with dozens of leading retailers clearly indicates that retail practitioners struggle with certain pricing decisions. Even those that presumably make the right pricing decisions are often uncertain whether they are leaving profits unharvested or incurring avoidable losses due to suboptimal pricing, because they lack the tools and techniques to measure this quantitatively.

At Grid Dynamics, we know that economic modeling and machine learning can greatly help improve the quality of pricing decisions. To demonstrate how it works and simplify the development of similar solutions for our clients, we created a reference implementation of a price management tool that showcases the main capabilities of AI-based price optimization and also features several advanced techniques.

The price management process has to deal with many variables and use cases because pricing typically has a complex structure. As a basic example, consider a retailer who buys a certain product from a supplier at a supplier price, adds a markup to obtain a list price, optionally applies one or more markdowns, and finally accounts for variable and fixed costs to calculate the profit margin. The price structure of this imaginary retailer is visualized in the chart below (the so-called price waterfall):

Even this oversimplified environment requires building a price management system that understands the difference between markups and markdowns, knows how each of these two price components influences demand and profit, and can find the optimal trade-off between the two. A more realistic setup is likely to include many more variables, such as different promotion types, inventory constraints, differences between stores and regions, and product substitution effects. *Ideally, the system should be designed as an extensible framework that provides fundamental capabilities such as demand prediction, and into which new input variables or constraints can be plugged depending on a particular business model and set of use cases*. We will describe how to achieve this goal in the sections dedicated to solution design and predictive modeling.
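To make the waterfall arithmetic concrete, here is a minimal Python sketch. It assumes that markdowns are applied multiplicatively to the list price and that all costs are expressed per unit; the `PriceWaterfall` class and its fields are hypothetical and not part of the reference implementation.

```python
from dataclasses import dataclass

@dataclass
class PriceWaterfall:
    """Per-unit price waterfall of one product (hypothetical structure)."""
    supplier_price: float
    markup_pct: float                  # markup over the supplier price, 0.6 means +60%
    markdowns_pct: tuple = ()          # markdowns applied one after another to the list price
    variable_cost: float = 0.0         # per-unit variable cost, e.g. shipping and handling
    fixed_cost_per_unit: float = 0.0   # fixed costs allocated to one unit

    def list_price(self) -> float:
        return self.supplier_price * (1 + self.markup_pct)

    def sales_price(self) -> float:
        price = self.list_price()
        for markdown in self.markdowns_pct:
            price *= 1 - markdown
        return price

    def unit_margin(self) -> float:
        # What is left after paying the supplier and covering variable and fixed costs.
        return (self.sales_price() - self.supplier_price
                - self.variable_cost - self.fixed_cost_per_unit)

waterfall = PriceWaterfall(supplier_price=50.0, markup_pct=0.6, markdowns_pct=(0.25,),
                           variable_cost=4.0, fixed_cost_per_unit=2.0)
print(waterfall.list_price(), waterfall.sales_price(), waterfall.unit_margin())
```

With these illustrative numbers, the list price is 80, the sales price after a 25% markdown is 60, and the unit margin shrinks to 4: a markdown may raise demand, but it quickly erodes a thin margin, which is the trade-off the system has to model.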

Although the core system can be made extensible enough to support a wide range of use cases, its interface and consumed data elements need to explicitly support concrete use cases. We decided to use promotion price optimization as the primary business case, and to create case-specific services and interfaces that demonstrate how this aspect of price management can be solved using our system.

To define the problem in a more formal and detailed way, we have chosen a set of assumptions that are typical for many apparel retailers. First, we assumed that the product list prices are fixed, and that merchandisers maintain a database of promotions in which each promotion is configured by triggering rules (e.g., the purchase total must be more than $100) and actions (e.g., provide a $20 discount). For each transaction, the pricing engine pulls active promotions from this database based on products in the shopping cart, and then calculates the final sales price applying these promotions, as illustrated in the figure below:

Next, we decided to support several types of promotions that are typical for apparel retailers and department stores:

- Product level percent-off. The promotion is defined for a group of products that meet certain filtering criteria, such as a brand name. For example, “5% off on Calvin Klein”.
- Product level dollar-off. The same as above, but with a discount defined in dollars rather than percentages. For example, “$10 off on Calvin Klein dresses”.
- Buy One Get One (BOGO). This is essentially a percent-off promotion with an additional condition to buy two product units. Once two product units are purchased, a customer effectively gets a 50% discount on each.
- Cart-based percent-off. The promotion is applied to a shopping cart rather than individual products. For example, “20% off on orders over $200”.
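The triggering rules and actions described above can be sketched as a tiny pricing engine. This is a simplified illustration, assuming that product-level promotions are applied before cart-level ones and that the cart threshold is checked against the already discounted total; the `Item`, `Promotion`, and `price_cart` names (and the “Acme” brand) are hypothetical, and a production engine would also need stacking and exclusion rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Item:
    sku: str            # assumed unique within a cart
    brand: str
    list_price: float
    qty: int

@dataclass
class Promotion:
    name: str
    kind: str                           # "pct_off", "dollar_off", "bogo" or "cart_pct_off"
    applies_to: Callable[[Item], bool]  # product filter for product-level promotions
    value: float = 0.0                  # fraction for percent-off, dollars for dollar-off
    min_cart_total: float = 0.0         # triggering rule for cart-level promotions

def price_cart(items: List[Item], promos: List[Promotion]) -> Dict[str, float]:
    """Return the final per-unit sales price of every SKU in the cart."""
    unit_price = {it.sku: it.list_price for it in items}
    # Step 1: product-level actions.
    for p in promos:
        if p.kind == "cart_pct_off":
            continue
        for it in items:
            if not p.applies_to(it):
                continue
            if p.kind == "pct_off":
                unit_price[it.sku] *= 1 - p.value
            elif p.kind == "dollar_off":
                unit_price[it.sku] = max(0.0, unit_price[it.sku] - p.value)
            elif p.kind == "bogo" and it.qty >= 2:
                # Every second unit is free; spread the saving over all units.
                paid_units = it.qty - it.qty // 2
                unit_price[it.sku] *= paid_units / it.qty
    # Step 2: cart-level actions, triggered by the discounted cart total.
    cart_total = sum(unit_price[it.sku] * it.qty for it in items)
    for p in promos:
        if p.kind == "cart_pct_off" and cart_total > p.min_cart_total:
            for sku in unit_price:
                unit_price[sku] *= 1 - p.value
    return unit_price

cart = [Item("dress-1", "Calvin Klein", 200.0, 1), Item("tee-7", "Acme", 50.0, 2)]
promos = [
    Promotion("5% off on Calvin Klein", "pct_off", lambda it: it.brand == "Calvin Klein", 0.05),
    Promotion("BOGO on Acme tees", "bogo", lambda it: it.brand == "Acme"),
    Promotion("20% off on orders over $200", "cart_pct_off", lambda it: True, 0.20, 200.0),
]
print(price_cart(cart, promos))
```

Note how the final price of the dress depends on the tees in the same cart: without them, the cart total would stay below $200 and the cart-level discount would not trigger.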

The cart-based promotions are particularly tricky. Since the rules are applied to individual shopping carts, the final sales price of a product is unique for each transaction and can vary depending on other products in the cart. Consequently, the sales price of any given product at any moment can be described only as a statistical distribution of prices rather than as a single number. This can be a significant challenge for building the price-demand model needed for promotion optimization, but this promotion type is very common and thus worth some research.
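To see why the sales price becomes a distribution, the following sketch simulates many carts containing the same $60 product under a hypothetical “20% off on orders over $200” promotion. The distribution of the rest of the cart (exponential with a mean of $120) is a made-up assumption used only for illustration.

```python
import random
from collections import Counter

def simulate_sales_prices(list_price, threshold, pct_off, n_carts=10_000, seed=0):
    """Sales price of one product across many simulated carts that contain it.

    Assumption: the value of the other items in a cart is exponentially
    distributed with a mean of $120 (a made-up stand-in for real cart data).
    """
    rng = random.Random(seed)
    prices = Counter()
    for _ in range(n_carts):
        cart_total = list_price + rng.expovariate(1 / 120.0)
        price = list_price * (1 - pct_off) if cart_total > threshold else list_price
        prices[round(price, 2)] += 1
    return prices

dist = simulate_sales_prices(list_price=60.0, threshold=200.0, pct_off=0.20)
n = sum(dist.values())
share_discounted = dist[48.0] / n
mean_price = sum(price * count for price, count in dist.items()) / n
print(dict(dist), round(share_discounted, 3), round(mean_price, 2))
```

Under these assumptions roughly 30% of the simulated carts clear the threshold, so the same product sells at $48 in some transactions and at $60 in others, and the price-demand model has to work with this mix rather than with a single price.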

In the settings defined above, a merchandiser needs to make a number of decisions:

- Does a particular promotion yield additional profit, or would it be better to disable it?
- What is the optimal discount value for a certain group of products?
- What are the best start and end dates for a given promotion?
- Is it possible to harvest additional profits or meet turnover targets by creating new promotions, and what would be the parameters of these promotions?

These questions are traditionally answered using last year’s data combined with keen tracking of the current sales data and trends. This approach may be more or less efficient depending on the nature of merchandise, properties of the customer base, skillfulness of merchandisers, and external factors such as market growth or decline. Our experience with many large retailers indicates that, on average, traditional techniques are not optimal in the sense that profits are partly lost due to “harmful” discounts that should be removed. Revenues are also much lower than they could be because of a mismatch between the price and demand. The impact of these issues may or may not be significant, but it is quite remarkable that most companies do not have the tools and analytical techniques needed to estimate this gap in optimal pricing, and thus do not really know whether they have a problem or not.

If we can build a digital model of a retailer and its customers that allows a what-if analysis of promotion-related scenarios (e.g., how profit will change if we introduce a 5% discount), one of the immediate benefits would be the ability to quantitatively assess pricing decisions. For example, a merchandiser would be able to forecast the incremental profit delivered by a new promotion and make sure that it does not interfere with other promotions before running a promotional campaign. This capability alone can help to prevent losses and streamline the promotion management process. The model can then be used to automatically answer all of the questions listed at the beginning of this section: find profit-optimal combinations of promotions, tune promotion properties, and find new promotional opportunities.
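At its core, a what-if analysis compares two forecasts: one for the baseline promotion mix and one for the candidate scenario. The sketch below uses a made-up exponential demand response in place of a trained profit model; `CATALOG`, `forecast_profit`, and `what_if` are hypothetical names, and all numbers are illustrative.

```python
import math

# Hypothetical catalog: sku -> (list price, unit cost, baseline daily units, discount sensitivity).
CATALOG = {
    "dress": (80.0, 45.0, 20.0, 3.0),
    "shirt": (40.0, 30.0, 50.0, 1.5),
}

def forecast_profit(discounts, days=30):
    """Toy stand-in for a trained profit model: profit over the planning period
    for a scenario given as {sku: discount fraction}."""
    total = 0.0
    for sku, (price, cost, base_units, sensitivity) in CATALOG.items():
        d = discounts.get(sku, 0.0)
        units = base_units * math.exp(sensitivity * d) * days  # deeper discount, more units
        total += (price * (1 - d) - cost) * units
    return total

def what_if(baseline, scenario):
    """Incremental profit of a scenario relative to the baseline promotion mix."""
    return forecast_profit(scenario) - forecast_profit(baseline)

print(round(what_if({}, {"dress": 0.10})))   # positive uplift
print(round(what_if({}, {"shirt": 0.10})))   # negative uplift: a "harmful" discount
```

With these numbers, a 10% discount on the dress pays for itself, while the same discount on the low-margin shirt loses profit, which is the kind of “harmful” discount mentioned earlier.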

We envisioned a system that provides a merchandiser with a simple, but powerful promotion optimization workflow. The flow starts with the selection of a product category, and once the category is selected, the system takes the merchandiser through the steps illustrated with the wireframes below:

- The first step allows a merchandiser to create a pool of individual promotions where each promotion is defined by a product filter, cart-level conditions, and a discount amount or percentage.
- The pool of promotions is then used to compose promotional campaigns. A campaign is a group of promotions that has start and end dates, and typically has a clear business meaning and goal (e.g., Spring Clearance Sale). The merchandiser uses this screen to make a draft version of a promotional calendar where promotions are logically grouped and positioned on a time scale.
- The first two steps are merely a configuration exercise, but once the promotional calendar is sketched, machine learning comes into play. The system has demand, profit, and revenue prediction models to forecast the performance of individual promotional campaigns, as well as the overall performance of all campaigns together. A merchandiser can experiment with the promotion mix, turn campaigns on and off, and instantly see the forecast for profits, revenues, and expected quantity sold, as well as the uplift compared to the baseline.
- Although the merchandiser can manually search for the optimal combination of promotions by switching campaigns on and off and analyzing the forecast, the system can automatically find the optimal offer combination based on a profit, revenue, or quantity maximization objective. This helps a merchandiser detect promotions that should be turned off.
- The system can not only find the optimal combination of campaigns, but it can also find new promotion opportunities, and propose additional campaigns that drive more profits. The merchandiser can review the proposals with the corresponding profit forecasts and accept or reject them. This capability helps detect unharvested profits and capture them.
- Finally, the manually created campaigns and the opportunities discovered by the system can be merged and optimized together to obtain the best promotional campaign combination.
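Finding the best combination of campaigns is a combinatorial search over on/off switches. The following sketch uses simple bit-flip hill climbing over a toy objective in which overlapping campaigns cannibalize each other. The campaign names, uplift values, and penalties are hypothetical, and the solver behind the actual portal may work differently.

```python
from itertools import combinations

# Hypothetical forecasts: standalone profit uplift per campaign, and the
# uplift lost when two campaigns overlap and cannibalize each other.
UPLIFT = {"spring_sale": 1200.0, "ck_5pct": 300.0, "bogo_tees": -150.0, "cart_20pct": 900.0}
OVERLAP_PENALTY = {
    frozenset({"spring_sale", "cart_20pct"}): 1000.0,
    frozenset({"spring_sale", "ck_5pct"}): 100.0,
}

def total_uplift(active):
    value = sum(UPLIFT[c] for c in active)
    for a, b in combinations(sorted(active), 2):
        value -= OVERLAP_PENALTY.get(frozenset({a, b}), 0.0)
    return value

def hill_climb(campaigns, start):
    """Toggle the single campaign that improves the objective most; stop
    when no toggle helps (a local optimum)."""
    active = frozenset(start)
    best = total_uplift(active)
    while True:
        neighbors = [active ^ {c} for c in campaigns]
        candidate = max(neighbors, key=total_uplift)
        value = total_uplift(candidate)
        if value <= best:
            return set(active), best
        active, best = candidate, value

# Start from the merchandiser's draft calendar with every campaign enabled.
selection, uplift = hill_climb(list(UPLIFT), start=UPLIFT.keys())
print(sorted(selection), uplift)
```

In this toy setup, the search turns off the loss-making BOGO campaign and the cart-level campaign that overlaps with the spring sale. Hill climbing only guarantees a local optimum, so for larger campaign pools it can be combined with random restarts or another metaheuristic.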

The solution described above relies on the ability to accurately predict revenue, profit, and demand, taking the parameters of the planned promotional campaigns into account. It requires building predictive models for these values that can later be used for manual what-if analysis or automatic optimization. To better understand how these models can be designed and used, let us briefly review the basics of price management theory.

The most basic scenario one can consider is the static optimization of the list price for a single product. Assuming that we have historical data where product price varied over time, we can try to train a predictive model to learn the price-demand dependency, and find a price point that maximizes the revenue, which is a product of price and demand (the gray rectangle in the figure below):
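As a minimal illustration of this idea, assume that demand is linear in price, `q(p) = a + b·p` with `b < 0`. Revenue `R(p) = p·q(p)` is then a downward parabola maximized at `p* = -a / (2b)`. The sketch fits `a` and `b` by ordinary least squares on made-up historical observations; in a real system the linear fit would be replaced by the machine learning models discussed later in this post.

```python
def fit_linear_demand(prices, units):
    """Ordinary least squares fit of units = a + b * price (linear demand is an assumption)."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_q = sum(units) / n
    cov = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, units))
    var = sum((p - mean_p) ** 2 for p in prices)
    b = cov / var
    a = mean_q - b * mean_p
    return a, b

def revenue_optimal_price(a, b, p_min, p_max):
    """Maximize R(p) = p * (a + b * p). For b < 0 the optimum is p* = -a / (2b);
    it is clamped to the observed price range to avoid extrapolation."""
    if b >= 0:
        raise ValueError("demand must decrease with price")
    return min(max(-a / (2 * b), p_min), p_max)

# Made-up history: price points and the units sold at each of them.
history_prices = [15.0, 20.0, 25.0, 30.0, 35.0]
history_units = [70.0, 60.0, 50.0, 40.0, 30.0]
a, b = fit_linear_demand(history_prices, history_units)
best_price = revenue_optimal_price(a, b, min(history_prices), max(history_prices))
print(a, b, best_price, best_price * (a + b * best_price))
```

For profit rather than revenue, the same approach applies with `(p - c)·q(p)` as the objective, where `c` is the unit cost.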

The above figure also suggests that having any single price may not be optimal because some revenue (and therefore profit) remains unharvested. At any given price point, we are likely to have customers who would be buying the product even at a higher price (and thus delivering additional profits). At the same time, we would have customers who do not buy the product, but potentially could at a lower, but still profitable price. Marketers typically work around this limitation by dividing the market into segments and setting different price points in each segment to capture additional profits, as illustrated below:

Examples of such segmentation include the separation of mainstream and luxury sub-brands, and different prices in areas with high and low incomes. Limited-time promotions are also a segmentation technique, but the segmentation is done over time: a retailer first captures profits at a regular price from less price-sensitive customers, and then captures additional profits at a discounted price from more price-sensitive customers.
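The gain from segmentation can be quantified with a toy example. Assume ten hypothetical customers whose willingness to pay ranges from $10 to $100 in $10 steps, and that the two segments can be perfectly separated (for instance, over time, as with a limited-time promotion). The sketch searches for the best single price and the best pair of prices.

```python
from itertools import combinations

def revenue_single(price, wtp):
    """Everyone whose willingness to pay is at least the price buys one unit."""
    return price * sum(1 for v in wtp if v >= price)

def revenue_two_segments(p_high, p_low, wtp):
    """Customers with WTP >= p_high buy at p_high; those in [p_low, p_high) buy at p_low.
    Assumes the segments can be perfectly separated."""
    high = sum(1 for v in wtp if v >= p_high)
    low = sum(1 for v in wtp if p_low <= v < p_high)
    return p_high * high + p_low * low

wtp = [10.0 * k for k in range(1, 11)]   # hypothetical willingness to pay of 10 customers
candidates = sorted(set(wtp))
best_single = max(revenue_single(p, wtp) for p in candidates)
best_two = max(revenue_two_segments(hi, lo, wtp)
               for lo, hi in combinations(candidates, 2))
print(best_single, best_two)
```

A single price earns at most $300 here, while the best pair of prices earns $400: the higher price is paid by less price-sensitive customers, and the lower price captures some customers who would not buy otherwise.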

We can conclude that the price optimization system should be designed to predict future revenue, profit, or demand for a certain period of time as a function of variables like list price, discount depth, and competitor prices. This is done for various segments that can be defined in terms of customers, store locations, time periods, and products. The total profit can then be summed across all of the segments:
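Using notation introduced here only for illustration, let $\mathcal{S}$ be the set of segments, $\mathbf{x}_s$ the pricing variables of segment $s$ (for example, list price, discount depth, and competitor prices), and $\hat{\pi}_s$ the profit predicted for that segment. The total profit and the optimization target can then be written as:

$$
\Pi(\mathbf{x}) = \sum_{s \in \mathcal{S}} \hat{\pi}_s(\mathbf{x}_s),
\qquad
\mathbf{x}^{*} = \arg\max_{\mathbf{x}} \, \Pi(\mathbf{x})
$$

Revenue or demand can be substituted for profit in the same form, and business rules such as minimum margins can be expressed as constraints on $\mathbf{x}$.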

At a conceptual level, a solution designed this way provides promotion optimization capability (where time is the segmentation dimension and discount is the variable to be optimized), but also provides enough flexibility to support other business cases ranging from list price optimization to assortment optimization.

In some environments, revenue and profit can be straightforwardly and deterministically calculated from the demand: revenue is a product of price and demand, and profit is a product of margin and demand. Consequently, it can be enough to build only a demand prediction model and multiply its output by price or margin to obtain a revenue or profit function that, in turn, can be plugged into an optimization algorithm. Unfortunately, this is not possible in the environment we described above, as promotions can be applied to a shopping cart, which breaks down the simple relationship between the quantity sold and revenue: one product can be sold at different prices depending on other products in the cart. To work around this issue, we decided to create three different models to predict revenue, profits, and demand, respectively. These models share the same feature design, but are fitted separately for these three training labels.
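The following sketch shows why revenue cannot be recovered from demand alone under cart-level promotions, and how the three training labels can be aggregated from order lines. The order data and its field layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical order lines: (date, sku, units, unit sales price, unit cost).
# The same dress sells at two prices on the same day because a
# "20% off on orders over $200" promotion depends on the rest of each cart.
order_lines = [
    ("2018-06-01", "dress", 1, 80.0, 45.0),   # cart below the threshold
    ("2018-06-01", "dress", 1, 64.0, 45.0),   # cart above the threshold
    ("2018-06-01", "dress", 2, 64.0, 45.0),   # cart above the threshold
]

def build_labels(lines):
    """Aggregate the three training labels per (date, sku)."""
    labels = defaultdict(lambda: {"demand": 0, "revenue": 0.0, "profit": 0.0})
    for day, sku, units, price, cost in lines:
        row = labels[(day, sku)]
        row["demand"] += units
        row["revenue"] += units * price
        row["profit"] += units * (price - cost)
    return dict(labels)

row = build_labels(order_lines)[("2018-06-01", "dress")]
list_price = 80.0
print(row, "demand x list price =", row["demand"] * list_price)
```

Four dresses were sold, but multiplying demand by the $80 list price gives $320 instead of the actual $272 in revenue, which is why revenue and profit get their own models rather than being derived from the demand forecast.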

The model is designed to predict the metric of interest (revenue, profit, or demand) for a given individual product at a given date. The input feature vector includes the following groups of variables:

- *Catalog attributes of the product.* These are mainly categorical variables such as color, product type, size, and material. These attributes are modeled as one-hot groups of binary flags, which is a common way to deal with categorical inputs.
- *Calendar attributes and events.* This is a group of binary flags that indicate whether the input date is a business day, a holiday, Black Friday, and so on.
- *Prices and promotions.* This is the most complicated group: it includes list prices, promotion prices, promotion types, purchase prices, and several ratios (e.g., markup, which is the ratio between list price and purchase price).
- *Shopping cart statistics.* This group is a little tricky, and was created specifically for cart-based promotions. As we mentioned in the section about use cases, some promotions are applied to the shopping cart as a whole, so the discount amount for any one product is undefined when this product is considered in isolation. We worked around this issue by calculating the average total of the shopping carts that contained a given product, and then determining the discount values that are most likely to be applied to such carts. These values were used as additional input features.
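The four feature groups can be assembled into a single vector roughly as follows. The vocabularies, the holiday calendar, and the rule that picks the deepest cart discount whose threshold the average cart clears are simplifying assumptions made for illustration, not the exact feature design of the reference implementation.

```python
from datetime import date

# Hypothetical vocabularies and event calendar.
COLORS = ["black", "blue", "red"]
PRODUCT_TYPES = ["dress", "shirt", "jeans"]
PROMO_TYPES = ["none", "pct_off", "dollar_off", "bogo"]
HOLIDAYS = {date(2018, 12, 25)}
BLACK_FRIDAY = {date(2018, 11, 23)}

def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]

def likely_cart_discount(avg_cart_total, cart_promos):
    """Deepest cart-level discount whose threshold the average cart containing
    the product clears; cart_promos is a list of (threshold, fraction) pairs."""
    return max((pct for threshold, pct in cart_promos if avg_cart_total > threshold),
               default=0.0)

def feature_vector(product, day, list_price, promo_price, purchase_price,
                   promo_type, avg_cart_total, cart_promos):
    catalog = one_hot(product["color"], COLORS) + one_hot(product["type"], PRODUCT_TYPES)
    calendar = [float(day.weekday() < 5 and day not in HOLIDAYS),  # business day
                float(day in HOLIDAYS),
                float(day in BLACK_FRIDAY)]
    prices = [list_price, promo_price, purchase_price,
              list_price / purchase_price,       # markup ratio
              1.0 - promo_price / list_price]    # discount depth
    prices += one_hot(promo_type, PROMO_TYPES)
    cart = [avg_cart_total, likely_cart_discount(avg_cart_total, cart_promos)]
    return catalog + calendar + prices + cart

x = feature_vector({"color": "blue", "type": "dress"}, date(2018, 11, 23),
                   list_price=80.0, promo_price=60.0, purchase_price=40.0,
                   promo_type="pct_off", avg_cart_total=150.0,
                   cart_promos=[(100.0, 0.10), (200.0, 0.20)])
print(len(x), x)
```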

Finally, price and promotion variables were calculated not only for the date to be predicted, but also for historical dates with lags of one week, one month, and one year to account for autocorrelations, as shown below:

To build the forecast, the system moves from left to right and fills in the predicted values for every single day. Note that the earliest predictions on the left can be used to build lagged features for the later predictions on the right. For example, the forecast for the 14th day can use the forecast for the 7th day as an input.
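The left-to-right procedure can be sketched as a recursive loop: lagged prices come from the history or the planned future prices, and lagged demand comes from observations where they exist and from earlier forecasts otherwise. The `predict` callable and the `toy_model` below are hypothetical stand-ins for the trained model.

```python
def recursive_forecast(demand_history, price_history, future_prices, predict, lags=(7, 30, 365)):
    """Forecast day by day from left to right.

    `predict(price, lagged_prices, lagged_demand)` stands in for the trained model;
    lags that reach before the start of the data are passed as None.
    """
    demand = list(demand_history)
    prices = list(price_history) + list(future_prices)
    forecasts = []
    for _ in future_prices:
        t = len(demand)                      # index of the day being predicted
        lagged_prices = [prices[t - lag] if t >= lag else None for lag in lags]
        lagged_demand = [demand[t - lag] if t >= lag else None for lag in lags]
        y_hat = predict(prices[t], lagged_prices, lagged_demand)
        demand.append(y_hat)                 # later days can use this forecast as a lag
        forecasts.append(y_hat)
    return forecasts

def toy_model(price, lagged_prices, lagged_demand):
    """Hypothetical model: last week's demand, lifted 20% when the price is below 50."""
    last_week = lagged_demand[0] if lagged_demand[0] is not None else 10.0
    return last_week * (1.2 if price < 50 else 1.0)

fc = recursive_forecast([10.0] * 7, [50.0] * 7, [40.0] * 14, toy_model)
# fc[13] (the 14th forecast day) is built from fc[6] (the 7th forecast day).
print(fc)
```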

Another important point is that the feature vector incorporates product attributes, so the same design can be used to forecast demand and profit for new products that have no sales history or only a short one [1].

For training and validation, we used statistics from real retailers to create a probabilistic generative model that reflects the main purchasing and promotional patterns of real-world orders. This model was used to generate 3 years of order history for one product category of 10K products and 50K products, making up a total of 500K orders. This order history included 60 promotional campaigns in total. A gradient boosted decision tree model was trained on this historical data and was then used to predict a validation sample of 100 days. An example validation chart for the profit for one category is shown below:
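The article does not specify which gradient boosting library was used. The sketch below is a self-contained, pure-Python stand-in: gradient boosting on squared loss with depth-1 trees (stumps), trained on a made-up daily profit series and validated on the last 100 days with a time-ordered split. The data generator is hypothetical and much simpler than the probabilistic model used in the study.

```python
import random

def fit_stump(X, residuals):
    """Depth-1 regression tree: the single split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:                          # no valid split: predict zero correction
        return lambda row: 0.0
    _, feat, thr, lv, rv = best
    return lambda row: lv if row[feat] <= thr else rv

def fit_gbdt(X, y, n_rounds=100, learning_rate=0.3):
    """Gradient boosting on squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]   # negative gradient
        tree = fit_stump(X, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(row) for pi, row in zip(pred, X)]
    return lambda row: base + learning_rate * sum(t(row) for t in trees)

# Made-up daily profit: weekend lift plus a concave response to discount depth.
rng = random.Random(42)
X, y = [], []
for day in range(3 * 365):
    discount = rng.choice([0.0, 0.1, 0.2, 0.3])
    weekend = 1.0 if day % 7 in (5, 6) else 0.0
    X.append([discount, weekend])
    y.append(500 + 200 * weekend + 2000 * discount - 8000 * discount ** 2 + rng.gauss(0, 20))

split = len(X) - 100                          # validate on the last 100 days
model = fit_gbdt(X[:split], y[:split])
train_mean = sum(y[:split]) / split
mae = sum(abs(model(r) - t) for r, t in zip(X[split:], y[split:])) / 100
baseline_mae = sum(abs(train_mean - t) for t in y[split:]) / 100
print(f"validation MAE: model {mae:.1f}, mean baseline {baseline_mae:.1f}")
```

Stumps can only learn additive effects; real GBDT libraries grow deeper trees to capture interactions, such as a discount that works differently on weekends.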

The prediction accuracy of the model is generally good for the practical purposes of promotion evaluation. However, one of the main challenges is accurately predicting sales spikes for slow-moving products, as the model tends to underestimate such spikes. This problem partly stems from the limited amount of historical data that we used, in which spikes are rare. This issue can be mitigated by training on a larger number of spikes, or through more elaborate modeling specifically for this case.

Once we have revenue, profit, and demand prediction models for individual products, we can then plug them into various optimization algorithms:

- First, we used a heuristic solver to search for the optimal combination of promotional campaigns that maximizes the total profit summed over all products in the category. This helps merchandisers find effective promotional strategies without having to manually test different options.
- Searching for new promotion opportunities is challenging from a computational standpoint because the search space is very large: it includes the products to be promoted, discount values, and promotion dates. We solved this problem using simulation and multi-armed bandit algorithms: the system starts with an initial set of promotions and runs a virtual experiment, changing the promotion parameters and using the demand prediction function as a source of “sales data”. This process gradually converges to near-optimal parameters.
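The bandit-based search can be illustrated with epsilon-greedy, one of the simplest multi-armed bandit algorithms; the article does not say which bandit variant was used. Each arm is a discount level, and every “pull” queries a simulated profit function that stands in for the prediction model. The product parameters and the noise level are hypothetical.

```python
import math
import random

def simulated_profit(discount, rng):
    """Stand-in for the prediction model used as a simulator (hypothetical
    numbers), plus noise that mimics day-to-day variation."""
    price, cost, base_units, sensitivity = 80.0, 45.0, 20.0, 3.0
    units = base_units * math.exp(sensitivity * discount)
    return (price * (1 - discount) - cost) * units + rng.gauss(0, 20)

def epsilon_greedy(arms, n_rounds=3000, epsilon=0.1, seed=7):
    rng = random.Random(seed)
    pulls = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for t in range(n_rounds):
        if t < len(arms):
            arm = arms[t]                              # try every discount once
        elif rng.random() < epsilon:
            arm = rng.choice(arms)                     # explore
        else:
            arm = max(arms, key=lambda a: means[a])    # exploit the best estimate
        reward = simulated_profit(arm, rng)
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]   # incremental mean
    return max(arms, key=lambda a: means[a]), pulls

best_discount, pulls = epsilon_greedy([0.0, 0.1, 0.2, 0.3])
print(best_discount, pulls)
```

With these numbers, expected profit peaks at a 10% discount, and the algorithm concentrates most of its virtual experiments on that arm while still occasionally exploring the others.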

The technical solution includes the modeling subsystem and promotion optimization subsystem. The modeling subsystem consolidates transactional, promotional, and catalog data, and makes this data available for a data scientist who does feature engineering and designs the predictive models. The models are scheduled for regular re-training and made available for the promotion optimization system. Technologically, the modeling part of the solution is based on Spark, Python, and Python ML libraries.

The optimization system includes a user portal where promotions and campaigns can be configured, and an optimization server that forecasts the performance of individual promotions or the entire promotion mix. This architecture is illustrated below:

The user portal was created based on the solution vision mockups we described earlier in the corresponding section. The following screen recording demonstrates how a merchandiser does a what-if analysis of the promotion mix:

Before we wrap up this post, let’s briefly discuss several questions that were frequently asked in connection with this work:

- *Can this solution be used for optimization of regular prices?* Yes. The prediction model already supports it directly, so it mainly requires reconfiguring the optimization framework.
- *Is it possible to predict the demand for new products?* Yes. The attribute-based feature design makes this possible. The accuracy of prediction can be improved significantly by adding the initial sales data once it is available [1].
- *Does this solution account for product substitution effects?* No, but it should be possible to extend it in this direction. In fact, product substitution effects can sometimes be estimated using relatively simple statistical analysis, and such estimates can be sufficient for use cases like assortment optimization [1].
- *Can this solution help with supply chain management and inventory planning?* Yes, but it requires some extensions. For example, for this use case it would be better to predict the distribution of demand rather than the average demand [1].
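To illustrate the last answer: with a predicted demand distribution, the stock level can be chosen as a quantile matching a target service level, whereas stocking the average demand leaves the shelf empty whenever demand is above average. This is a minimal sketch, not part of the described solution; the demand samples and function names are hypothetical.

```python
import math

def stock_for_service_level(demand_samples, service_level):
    """Smallest sampled demand value that covers at least `service_level`
    of the scenarios (an empirical quantile of the predicted distribution)."""
    ordered = sorted(demand_samples)
    k = max(math.ceil(service_level * len(ordered)) - 1, 0)
    return ordered[k]

def in_stock_probability(stock, demand_samples):
    """Share of scenarios in which demand does not exceed the stock level."""
    return sum(d <= stock for d in demand_samples) / len(demand_samples)

# Hypothetical predicted distribution of weekly demand for one SKU, with occasional spikes.
samples = [3] * 40 + [5] * 30 + [8] * 20 + [20] * 10
mean_demand = sum(samples) / len(samples)           # 6.3 units
print(in_stock_probability(mean_demand, samples))  # stocking the mean covers 70% of scenarios
print(stock_for_service_level(samples, 0.90))       # 8 units cover 90% of scenarios
```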

- A wide range of price and assortment optimization tasks can be solved using a small set of fundamental capabilities, given that these capabilities are carefully designed using the best practices of economic modeling.
- The ability to accurately predict revenue, profit, and demand as a function of prices and discounts is the key to price and promotion optimization.
- Once the prediction models are developed, a number of optimization use cases can be supported by plugging the models into an optimization framework. Examples of such use cases are the automatic discovery of promotion opportunities, and the automatic tuning of promotional campaigns.
- Proper model design and feature engineering allow us to predict the performance of new products, optimize assortment, and solve many inventory management tasks.

[1] Marshall Fisher and Ananth Raman, *The New Science of Retailing: How Analytics Are Transforming the Supply Chain and Improving Performance*, Harvard Business Review Press, 2010.