The M5 Competition

The M5 Competition will start on March 2, 2020 and finish on June 30 of the same year. It differs from the previous four competitions in four important ways, some of which were suggested by the discussants of the M4 Competition.

  • First, it uses hierarchical sales data, generously made available by Walmart, starting at the item level and aggregating up to departments, product categories, stores, and three geographical areas: the US states of California, Texas, and Wisconsin.
  • Second, besides the time series data, it also includes explanatory variables such as price, promotions, day of the week, and special events (e.g. Super Bowl, Valentine’s Day, and Orthodox Easter) that affect sales and can be used to improve forecasting accuracy.
  • Third, in addition to point forecasts, the distribution of uncertainty will be assessed by asking participants to provide four indicative prediction intervals and the median.
  • Fourth, the majority of the more than 43,000 time series display intermittency (sporadic sales including zeros).
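
To make the hierarchical structure and the notion of intermittency concrete, here is a minimal Python sketch. It sums item-level series up to coarser levels and measures how often a series records zero sales. The data, the level names, the identifiers such as "item_A", and the helper functions `aggregate` and `zero_share` are all invented for illustration; they are not the M5 dataset or its file format, and the share of zero-sales periods is only one simple way to describe intermittency.

```python
# Toy item-level daily unit sales (made-up numbers, NOT M5 data).
# Key: (state, store, category, department, item); value: units sold per day.
LEVELS = ("state", "store", "category", "department", "item")

ITEM_SALES = {
    ("CA", "store_1", "cat_1", "dept_1", "item_A"): [0, 2, 0, 0, 1, 0],
    ("CA", "store_1", "cat_1", "dept_1", "item_B"): [3, 0, 0, 4, 0, 0],
    ("CA", "store_1", "cat_2", "dept_2", "item_C"): [0, 0, 0, 1, 0, 0],
    ("TX", "store_2", "cat_1", "dept_1", "item_A"): [1, 1, 0, 2, 0, 5],
}


def aggregate(item_sales, by):
    """Sum item-level series into one series per combination of the levels in `by`.

    `by` is a tuple of level names; an empty tuple yields the grand total.
    """
    positions = [LEVELS.index(name) for name in by]
    totals = {}
    for key, series in item_sales.items():
        group = tuple(key[p] for p in positions)
        if group in totals:
            # Add this item's sales to the running total, day by day.
            totals[group] = [a + b for a, b in zip(totals[group], series)]
        else:
            totals[group] = list(series)
    return totals


def zero_share(series):
    """Fraction of periods with zero sales, a simple intermittency indicator."""
    return sum(1 for x in series if x == 0) / len(series)


by_state = aggregate(ITEM_SALES, ("state",))
by_store_category = aggregate(ITEM_SALES, ("store", "category"))
grand_total = aggregate(ITEM_SALES, ())

item_key = ("CA", "store_1", "cat_1", "dept_1", "item_A")
print(by_state[("CA",)])
print(zero_share(ITEM_SALES[item_key]), zero_share(by_state[("CA",)]))
```

In this toy example the item-level series has a larger share of zero-sales days than the state-level series built from it, which illustrates why intermittency tends to be most pronounced at the lowest levels of such a hierarchy.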

Aim

The aim of the M5 Competition is similar to that of the previous four: to identify the most appropriate method(s) for different types of situations that require predictions and uncertainty estimates. Its ultimate purpose is to advance the theory of forecasting and improve its use by business and non-profit organizations. A further goal is to compare the accuracy and uncertainty estimates of machine learning (ML) and deep learning (DL) methods with those of standard statistical ones, and to weigh any improvements against the extra complexity and higher costs of using the various methods.

Expectations & Methods content

The previous four M-Competitions were successful, attracted a considerable number of participants, and made significant contributions that fundamentally changed the field of forecasting. Similar or even greater achievements are expected from the M5 Competition, which is aimed at the fast-growing data science community. That community will have easy access to the M5 dataset, as the competition will be run on the Kaggle platform. We therefore expect several thousand participants.