Date: 2018-12-11

The Dataset

The M4 dataset consists of 100,000 time series of Yearly, Quarterly, Monthly and Other (Weekly, Daily and Hourly) data.

The minimum number of observations is 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series.
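If you filter or validate series by length, the minimums above can be kept in a lookup table. The following is a minimal sketch that uses only the numbers stated on this page. The names `MIN_OBSERVATIONS` and `meets_minimum_length` are hypothetical, and the sketch assumes the minimum applies to the observations you have in hand for a series.

```python
# Minimum number of observations per frequency, as stated on this page.
MIN_OBSERVATIONS = {
    "Yearly": 13,
    "Quarterly": 16,
    "Monthly": 42,
    "Weekly": 80,
    "Daily": 93,
    "Hourly": 700,
}


def meets_minimum_length(frequency: str, series: list[float]) -> bool:
    """Return True if `series` has at least the stated minimum length.

    Hypothetical helper: raises ValueError for a frequency not listed above.
    """
    try:
        minimum = MIN_OBSERVATIONS[frequency]
    except KeyError:
        raise ValueError(f"unknown frequency: {frequency!r}") from None
    return len(series) >= minimum


print(meets_minimum_length("Yearly", [1.0] * 13))   # True
print(meets_minimum_length("Hourly", [1.0] * 699))  # False
```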

The 100,000 time series of the dataset come mainly from the Economic, Finance, Demographics and Industry areas, while also including data from Tourism, Trade, Labor and Wage, Real Estate, Transportation, Natural Resources and the Environment.

The M4 Competition series, like those of the M1 and M3 Competitions, aim to represent the real world as closely as possible. The series were selected randomly from a database of 900,000 series on December 28, 2017. Professor Makridakis chose the seed number for generating the random sample that determined the M4 Competition data. Predefined filters were applied beforehand to achieve the desired characteristics, such as the length of the series, the percentage of Yearly, Quarterly, Monthly, Weekly, Daily, and Hourly data, and their type (Micro, Macro, Finance, Industry, Demographic, Other).

You can download the training dataset here: M4Dataset-train (.rar) | M4Dataset-train (.zip)
You can download the test dataset here: M4Dataset-test (.rar) | M4Dataset-test (.zip)
The dataset can also be found on the M4 GitHub, together with the M4 participants' code.
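This page does not describe the layout of the downloaded files. The sketch below assumes a wide CSV layout: a header row of column labels, then one series per row with the series ID in the first column and the observations in the remaining columns, with shorter series padded by empty trailing cells. Check the files you download against this assumption before relying on it. The function name `read_wide_csv` is hypothetical, and the sample rows are made-up values, not M4 data.

```python
import csv
import io


def read_wide_csv(handle) -> dict[str, list[float]]:
    """Parse a wide CSV (assumed layout): header row, then ID followed by values.

    Empty trailing cells, used to pad shorter series, are skipped.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row, assumed to hold column labels only
    series = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        series_id, *cells = row
        series[series_id] = [float(c) for c in cells if c.strip() != ""]
    return series


# Made-up sample in the assumed layout; "Y2" is one value shorter than "Y1".
sample = io.StringIO(
    '"V1","V2","V3","V4"\n'
    '"Y1",10.5,11.0,12.25\n'
    '"Y2",2070,2104,\n'
)
data = read_wide_csv(sample)
print(data["Y2"])  # [2070.0, 2104.0]
```

For a real file, open it with `open(path, newline="")` and pass the handle to the same function.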

If you are using R, the dataset is also available here: M4comp2018
(We would like to thank Rob J Hyndman's PhD students Pablo Montero-Manso, Carla Netto, and Thiyanga Talagala for putting the M4 data into an R package on GitHub.)

Additional information on the type, the frequency, the number of forecasts required per series and the starting dates can be found here: Info. The starting dates were not originally available to the participants.
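One common use of the number of forecasts required per series is to hold out that many observations from the end of a training series, so a method can be checked locally before it is scored against the test set. The sketch below shows that split. The function name `split_for_validation` is hypothetical, and the horizon of 6 in the usage line is only an illustrative value; take the real per-series value from the info file.

```python
def split_for_validation(
    series: list[float], horizon: int
) -> tuple[list[float], list[float]]:
    """Split a training series into a fitting part and the last `horizon` points.

    Hypothetical helper: requires at least one observation left for fitting.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if len(series) <= horizon:
        raise ValueError("series is too short to hold out `horizon` points")
    return series[:-horizon], series[-horizon:]


# Illustrative 13-point series with an assumed horizon of 6.
fit, holdout = split_for_validation([float(x) for x in range(1, 14)], horizon=6)
print(len(fit), len(holdout))  # 7 6
```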