A Multi-Objective Price Optimization Framework in Stores Using Reinforcement Learning
Inventory optimization is one of the core merchandising operations for retailers. A key objective of the inventory optimization workstream is to reduce inventory on or by a specific date, often through price reduction. Typically, this is done to make space for new items, reduce inventory of overstocked or seasonal items, or move perishable items such as meat or milk, though there are other reasons as well. As the world’s largest retailer and grocer, Walmart strives to find the optimal price point for its general merchandise and groceries. However, when inventory reduction became necessary, Walmart traditionally implemented a price reduction strategy that could require up to three different pricing changes to help move unsold inventory from its stores.
Inventory reduction is a challenge for every retailer, but Walmart’s size and scale mean the challenges, opportunities and costs are greater than those of other retailers:
- More than 4,750 U.S. retail stores, including more than 3,500 Supercenters that can carry more than 125,000 SKUs each
- More than 800 product categories across Grocery, Health and General Merchandise, of which more than 200 categories experience price reductions each year
- Shelf space containing slow-moving or unsold merchandise needs to be freed for better-selling items, which affects revenue and increases costs
- Merchants’ input into the suggested price reductions needs to be taken into consideration, which increases the complexity of inventory optimization operations and affects anticipated revenue
With Walmart’s previous strategy, the time and labor cost required to re-label discounted merchandise up to three times, across all stores and product categories, was substantial. Reduction strategies had to be determined on a store-by-store basis, as not all stores had the same inventory optimization issues. Afterwards, the effort to remove and replace merchandise was also significant. To maximize sales revenue and reduce operating costs, Walmart created an intelligent algorithm that accelerates sales of time-sensitive merchandise across individual stores, categories and SKUs by optimizing price reductions and minimizing the associated costs of relabeling and removing time-sensitive goods.
Walmart’s inventory optimization algorithm ingests data from individual stores, including aggregated sales data and operating costs, how much and what types of merchandise to reduce, and the dynamic time frame in which the merchandise must be sold to make way for new merchandise. The core algorithms used to determine the repricing policy originate from mathematical optimization and deep reinforcement learning techniques. This approach applies data analytics, reinforcement learning, and dynamic optimization to make automated decisions (e.g., when to initiate a price reduction and what markdown percentage to apply) for each individual product at each store. This results in a high-performance model and a price adjustment policy tailored to each store.
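To make the trade-off concrete, the following is a minimal sketch of choosing a single markdown for one item at one store. It is not Walmart’s algorithm: instead of reinforcement learning it uses a plain grid search over candidate markdowns, and it assumes a constant-elasticity demand model. The `Item` fields, the cost parameters and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical per-store, per-SKU inputs (illustrative only)."""
    price: float          # current shelf price
    inventory: int        # units that must clear by the deadline
    weekly_demand: float  # expected units sold per week at the current price
    elasticity: float     # assumed price elasticity of demand (> 0)
    weeks_left: int       # weeks until the shelf space is needed


def expected_units_sold(item, markdown):
    """Units sold by the deadline under an assumed constant-elasticity model."""
    price_ratio = 1.0 - markdown
    demand = item.weekly_demand * price_ratio ** (-item.elasticity) * item.weeks_left
    return min(item.inventory, demand)


def score(item, markdown, relabel_cost, removal_cost_per_unit):
    """Revenue minus relabeling labor and the cost of removing unsold units."""
    sold = expected_units_sold(item, markdown)
    leftover = item.inventory - sold
    revenue = sold * item.price * (1.0 - markdown)
    labor = relabel_cost if markdown > 0 else 0.0  # no relabel if price is unchanged
    return revenue - labor - removal_cost_per_unit * leftover


def best_single_markdown(item, relabel_cost=5.0, removal_cost_per_unit=1.0,
                         candidates=None):
    """Grid search over candidate markdowns (0%, 5%, ..., 90% by default)."""
    if candidates is None:
        candidates = [i / 20 for i in range(0, 19)]
    return max(candidates,
               key=lambda m: score(item, m, relabel_cost, removal_cost_per_unit))


if __name__ == "__main__":
    item = Item(price=10.0, inventory=100, weekly_demand=10.0,
                elasticity=2.0, weeks_left=4)
    print(f"chosen markdown: {best_single_markdown(item):.0%}")
```

For this made-up item the search settles on a 35% markdown: a deeper cut would sell out but give away margin, while a shallower one leaves units to be removed. A production system would presumably learn demand from store data and revisit the decision over time rather than rely on a fixed elasticity.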
Using machine learning to find the single best optimized price for merchandise and grocery items, Walmart is now able to successfully sell up to 80 percent of time-sensitive items with its initial price reduction, rather than going through the process three separate times. This has resulted in lowered operating costs and increased sales, with some stores experiencing up to 15% higher sales of these designated items.
Freed from the constant task of re-labeling items, Walmart associates are able to perform higher-value tasks, such as helping more customers and providing better service. The algorithm has also had a significant impact on the daily work of about 3,000 merchants by shifting their responsibilities from repetitive, manual data entry and manual decision-making on price changes to handling exceptions and strategic scenario planning.
Walmart’s new inventory reduction protocol was implemented in early 2019 and has achieved outstanding sell-through rates and cost-saving performance. The system has successfully reduced excess inventory while boosting sales of that inventory, resulting in significant cost savings. It has also improved customer purchasing power and reduced associate workload and related costs in stores. The operational savings and efficiency gains from Walmart’s inventory optimization algorithm get passed along to customers in the form of its Everyday Low Prices, so customers get the right product at the right price every time.