The calculation of price indices has a tradition of more than one hundred years at Statistics Austria; time series of the consumer price index date back to 1958. Data collection is organised by Statistics Austria on a primary statistical basis and takes place in shops as well as by e-mail, by telephone or on the internet. To continuously improve data collection, new data sources are regularly evaluated as supplements to the indices. Two such sources, scanner data and web scraping data, have already been successfully integrated into the statistical production process.
Scanner data was first used in the index calculation in January 2022 for an initial set of product groups. From that point on, the on-site collection of prices for food and drugstore goods was replaced by scanner data. Because of Covid-related gaps in price collection in spring and winter 2020, scanner data had already been used before then to compensate for missing price reports.
Since January 2025, scanner data has also been used for the product groups clothing and shoes. As these markets are more fragmented than grocery retail and involve more players, a larger number of companies must be included in the sample. Even so, scanner data alone cannot achieve 80-90% coverage, so traditional price collection remains a crucial component alongside it. Online retail is also gaining importance: web scraping was introduced in the clothing sector in 2022 and extended to footwear in 2025. As a result, from 2025 onwards the indices for clothing and shoes are based on three pillars: scanner data, web scraping data, and traditional price collection.
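The coverage argument can be illustrated with a short calculation. The sketch below assumes that coverage is measured as the share of market turnover accounted for by the companies in a sample; the company names and turnover figures are invented, and `coverage` and `combined_coverage` are hypothetical helpers, not part of any Statistics Austria system.

```python
from __future__ import annotations


def coverage(turnover_by_company: dict[str, float], sample: set[str]) -> float:
    """Share of total market turnover accounted for by the sampled companies."""
    total = sum(turnover_by_company.values())
    covered = sum(v for company, v in turnover_by_company.items() if company in sample)
    return covered / total


def combined_coverage(turnover_by_company: dict[str, float], *samples: set[str]) -> float:
    """Coverage of the union of several samples (each company counted only once)."""
    return coverage(turnover_by_company, set().union(*samples))


# Invented annual turnover (EUR million) of the companies in one market.
market = {"A": 30.0, "B": 20.0, "C": 15.0, "D": 10.0, "E": 10.0, "F": 8.0, "G": 7.0}
scanner = {"A", "B", "C"}       # hypothetical companies delivering scanner data
web = {"D", "E"}                # hypothetical online shops covered by web scraping
traditional = {"F"}             # hypothetical shops visited by price collectors

print(coverage(market, scanner))                              # 0.65
print(combined_coverage(market, scanner, web, traditional))   # 0.93
```

With these made-up figures, the scanner-data companies alone cover 65% of turnover; adding web-scraped retailers and traditionally surveyed shops raises coverage to 93%.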
In the coming years, additional product groups will be included, such as furniture, electronics, and DIY products.
Legal framework:
Scanner data:
Since 2019, a new CPI-Regulation has governed the delivery of scanner data from the large supermarket chains to Statistics Austria. Among other things, it legally defines the sampling units, the periodicity, the time range and the variables of the data delivery.
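The regulation's actual variable list is not reproduced here. As a hedged illustration of what a delivery record and a basic validation step might look like, the sketch below uses hypothetical field names (`outlet_id`, `gtin`, `week_start`, `turnover`, `quantity`) and assumes weekly periodicity; none of these details are taken from the regulation.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScannerRecord:
    """One hypothetical line of a scanner data delivery."""
    outlet_id: str      # sampling unit (here: a single store); hypothetical
    gtin: str           # product identifier, e.g. the barcode number
    week_start: date    # first day of the (assumed) weekly reporting period
    turnover: float     # sales value in EUR
    quantity: float     # number of units sold


def validate(record: ScannerRecord, range_start: date, range_end: date) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if not range_start <= record.week_start <= range_end:
        problems.append("period outside agreed time range")
    if record.quantity <= 0:
        problems.append("non-positive quantity")
    if record.turnover < 0:
        problems.append("negative turnover")
    return problems


# Invented example records.
ok = ScannerRecord("store-001", "9001234567890", date(2024, 3, 4), 25.90, 10)
bad = ScannerRecord("store-001", "9001234567890", date(2023, 12, 25), -5.0, 0)
print(validate(ok, date(2024, 1, 1), date(2024, 12, 31)))   # []
print(validate(bad, date(2024, 1, 1), date(2024, 12, 31)))
```

The second record fails all three checks. Returning a list of problems rather than raising on the first one lets a whole delivery be screened in one pass.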
Web scraping:
The following general conditions should be taken into account for web scraping:
To meet these requirements, our web scraping activities comply with the guidelines developed by Eurostat.
In recent years, technical developments and the increasing specialization of consumer offerings (significantly broader assortments, stronger segmentation of product groups and changing pricing strategies) have posed major challenges for price determination in the Consumer Price Index (CPI/HICP). The use of scanner data and web scraping data represents a significant qualitative advance in consumer price statistics. Integrating sales volumes and turnover data, together with comprehensive coverage of reporting periods and product ranges, contributes substantially to the quality of the CPI/HICP. These new data sources also make data collection faster and more efficient and reduce the burden on the price collection infrastructure.
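Because scanner data contain both turnover and quantities sold, a period's average transaction price per product can be derived as a unit value, i.e. total turnover divided by total quantity. A minimal sketch, assuming transactions arrive as hypothetical `(product_id, turnover, quantity)` tuples for a single period:

```python
from collections import defaultdict


def unit_values(transactions):
    """Unit value per product: total turnover / total quantity in one period.

    `transactions` is an iterable of (product_id, turnover, quantity) tuples.
    Products with zero total quantity are skipped to avoid division by zero.
    """
    turnover = defaultdict(float)
    quantity = defaultdict(float)
    for product_id, value, qty in transactions:
        turnover[product_id] += value
        quantity[product_id] += qty
    return {
        product_id: turnover[product_id] / quantity[product_id]
        for product_id in turnover
        if quantity[product_id] > 0
    }


# Invented weekly sales: milk sold at two different prices during the week.
week = [("milk", 10.00, 10), ("milk", 4.50, 5), ("bread", 6.00, 3)]
print(unit_values(week))  # milk: 14.50 / 15 units, bread: 6.00 / 3 units = 2.0
```

Unlike a single price observed on a collection day, the unit value reflects every sale in the period, including discounted ones, weighted by how many units were sold at each price.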
The development work for the implementation of new data sources was co-financed by the European Union.
The variety of data sources and the large volume of data allow different methods of price index calculation to be applied. These methods can produce indices with different properties, so the advantages and disadvantages of each must be carefully weighed before deciding which to use in price statistics. Comparing indices based on traditional price surveys, scanner data, and web scraping data is crucial for assessing their consistency with regard to seasonal patterns and price trends. Despite substantial methodological differences, the results show a high degree of consistency between traditional methods and those based on alternative data sources: the new data sources do not paint a completely different picture, but a more differentiated one.
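The text does not name the index methods under consideration. To illustrate how the choice of formula alone changes the result, the sketch below compares two standard bilateral formulas on matched products: the unweighted Jevons index (geometric mean of price relatives) and the Törnqvist index, which weights each price relative by the average of its expenditure shares in the two periods and therefore needs the quantity information that scanner data provide. The product data are invented, and both functions assume at least one product is present in both periods.

```python
import math


def jevons(p0, p1):
    """Unweighted geometric mean of price relatives over matched products."""
    matched = p0.keys() & p1.keys()
    return math.exp(sum(math.log(p1[k] / p0[k]) for k in matched) / len(matched))


def tornqvist(p0, p1, q0, q1):
    """Geometric mean of price relatives weighted by average expenditure shares."""
    matched = p0.keys() & p1.keys()
    e0 = sum(p0[k] * q0[k] for k in matched)   # base-period expenditure
    e1 = sum(p1[k] * q1[k] for k in matched)   # current-period expenditure
    log_index = 0.0
    for k in matched:
        weight = 0.5 * (p0[k] * q0[k] / e0 + p1[k] * q1[k] / e1)
        log_index += weight * math.log(p1[k] / p0[k])
    return math.exp(log_index)


# Invented data: product "a" gets 10% dearer and buyers shift towards "b".
p0, p1 = {"a": 1.0, "b": 2.0}, {"a": 1.1, "b": 2.0}
q0, q1 = {"a": 10, "b": 5}, {"a": 8, "b": 6}
print(round(jevons(p0, p1), 4))             # 1.0488
print(round(tornqvist(p0, p1, q0, q1), 4))  # 1.045
```

Both indices use the same prices, yet the Törnqvist result is lower because it gives less weight to the product whose expenditure share falls, which is one example of the different index characteristics that have to be weighed.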
The first price indices based on prices collected automatically via web scraping were included in the official index in 2021, with new rentals of flats and houses as the first area of application. Since then, web scraping has been continuously expanded, most recently to electricity and gas, municipal fees, mobile phone tariffs and clothing. In some cases, this data is used not only to calculate the index but also to check price data for plausibility.
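Using scraped prices to check other price data for plausibility could, for example, mean comparing a collected price with the scraped price for the same product and flagging large deviations for manual review. The sketch below assumes matching product identifiers across both sources and a hypothetical 20% tolerance; the text does not specify the actual checks used.

```python
def cross_check(collected, scraped, tolerance=0.2):
    """Flag products whose collected price deviates from the scraped price.

    Returns {product_id: relative deviation} for deviations above `tolerance`
    (a hypothetical threshold). Products without a usable scraped price are skipped.
    """
    flagged = {}
    for product_id, price in collected.items():
        reference = scraped.get(product_id)
        if reference is None or reference <= 0:
            continue
        deviation = abs(price - reference) / reference
        if deviation > tolerance:
            flagged[product_id] = round(deviation, 4)
    return flagged


# Invented monthly prices (EUR) for three mobile phone tariffs.
collected = {"tariff_a": 19.99, "tariff_b": 35.00, "tariff_c": 9.99}
scraped = {"tariff_a": 19.99, "tariff_b": 25.00}
print(cross_check(collected, scraped))  # {'tariff_b': 0.4}
```

A flagged product is not necessarily wrong; the check only points a price analyst to observations worth verifying.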