
Use of scanner data and webscraping in price statistics

Objective and implementation

The calculation of price indices has a tradition of more than one hundred years at Statistics Austria; time series of the consumer price index date back to 1958. Data collection is organised by Statistics Austria as a primary statistical survey and takes place in shops as well as by e-mail, by telephone or on the internet. To improve data collection continuously, new data sources are regularly evaluated to supplement the indices. Two data sources have already been successfully integrated into the statistical production process: scanner data and web scraping.

Scanner data were introduced into the index calculation for the first product groups in January 2022. From this point on, the on-site survey of food and drugstore goods was replaced by the use of scanner data. Because of Covid-related missing prices in spring and winter 2020, scanner data had already been used ahead of this date to compensate for missing price reports. Other product groups will follow in the coming years, for example clothing, electrical appliances or furniture.

Legal framework:

Scanner data:

Since 2019, a new CPI-Regulation has governed the delivery of scanner data from the large supermarket chains to Statistics Austria. Among other things, the regulation legally defines the sampling units, the periodicity, the time range and the variables of the data delivery.

Webscraping:

The following general conditions should be taken into account for webscraping:

To meet these requirements, our web scraping activities comply with the guidelines developed by Eurostat.

Innovation within the project

The project is innovative as it uses new data sources and improves the quality of price indices: data acquisition is more efficient, more up-to-date (among other things, no late price reports) and can cover higher data volumes. In the long term, scanner data can be used to cover a complete range of goods instead of a sample (limited to food and drugstore trade for the time being).

Interpretation of the results

The new data sources and their high data volume allow for different price index calculation methods, which can lead to different index properties. Their advantages and disadvantages have to be analysed and evaluated before deciding on their ultimate use for price statistics.

The new calculation methods may lead to higher price index volatility. Whether and to what extent the results deviate from those already published as official statistics is still unclear.
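To illustrate how the choice of method can change the result, the following minimal sketch compares two well-known index formulas on the same matched products: the unweighted Jevons index (geometric mean of price relatives) and the Törnqvist index, which weights each price relative by the product's average turnover share. The source does not state which methods Statistics Austria evaluates; the two formulas, the function names and the input layout (dictionaries keyed by a product identifier) are assumptions made for illustration only.

```python
import math


def jevons_index(prices_base, prices_current):
    """Unweighted geometric mean of price relatives over matched products."""
    matched = prices_base.keys() & prices_current.keys()
    if not matched:
        raise ValueError("no products present in both periods")
    log_sum = sum(
        math.log(prices_current[k] / prices_base[k]) for k in matched
    )
    return 100.0 * math.exp(log_sum / len(matched))


def tornqvist_index(prices_base, prices_current, turnover_base, turnover_current):
    """Geometric mean of price relatives, weighted by the average of the
    turnover shares in the base and the current period."""
    matched = (
        prices_base.keys()
        & prices_current.keys()
        & turnover_base.keys()
        & turnover_current.keys()
    )
    if not matched:
        raise ValueError("no products present in both periods")
    total_base = sum(turnover_base[k] for k in matched)
    total_current = sum(turnover_current[k] for k in matched)
    log_sum = 0.0
    for k in matched:
        # Average turnover share of product k across both periods.
        weight = 0.5 * (
            turnover_base[k] / total_base + turnover_current[k] / total_current
        )
        log_sum += weight * math.log(prices_current[k] / prices_base[k])
    return 100.0 * math.exp(log_sum)


# Hypothetical data: product "a" doubles in price but has a small turnover share.
prices_base = {"a": 1.0, "b": 2.0}
prices_current = {"a": 2.0, "b": 2.0}
turnover_base = {"a": 10.0, "b": 90.0}
turnover_current = {"a": 10.0, "b": 90.0}

jevons = jevons_index(prices_base, prices_current)
tornqvist = tornqvist_index(
    prices_base, prices_current, turnover_base, turnover_current
)
print(f"Jevons: {jevons:.2f}, Törnqvist: {tornqvist:.2f}")
```

In this made-up example the unweighted index reacts much more strongly to the price jump of the low-turnover product than the weighted one. Only scanner data provide the turnover information that a weighted formula of this kind requires.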

Further information and project results

The first price indices based on prices collected automatically via web scraping were included in the official index during 2021. New lettings of flats and houses were selected as the first area of application. Since then, the collection of web scraping data has been continuously expanded, most recently to electricity and gas, municipal fees, mobile phone tariffs and clothing. In some cases, these data are used not only to calculate the index but also to check price data for plausibility.

Last updated on 2023-01-30.