Factors of attrition in online panels: Evidence from four waves of the Values in Crisis survey in Russia

Boris Sokolov (HSE University, Yale University) - Russian Federation
Vyaleta Korsunava (HSE University) - Belarus
Yuri Rykov (Okko) - Russian Federation

Keywords: Web surveys; opt-in panels; longitudinal surveys; respondent attrition


Abstract

Non-probability online surveys are increasingly used in numerous academic and practical applications, including longitudinal studies. While the strengths and weaknesses of online surveys are generally well known, limited sample representativeness remains a significant drawback. Panel online surveys may additionally suffer from respondent attrition, a problem common to all longitudinal designs. Such attrition can further exacerbate bias in both univariate and multivariate estimates derived from these surveys.

We analyze factors of respondent attrition in online longitudinal surveys using Russian data from the "Values in Crisis" (VIC) project, an international investigation into the societal impact of the COVID-19 pandemic that initially involved 18 countries. Russia is the only country where four waves of data collection were conducted: June 2020 (N = 1527), April-May 2021 (N = 1169), November-December 2021 (N = 1203), and July-September 2022 (N = 1205). Despite replenishing the sample with new participants to maintain quotas, attrition remained significant. Only 606 individuals (39.7% of the initial wave) participated in all four waves. While dropped-out respondents were replaced, they were also allowed to re-enter the study in subsequent waves.

Data was collected through a commercial opt-in panel. This panel overrepresented women and young individuals while underrepresenting the elderly, unmarried, and rural residents compared to the general population. The initial VIC sample also underrepresented the elderly, highly educated, and rural respondents. Moreover, VIC respondents exhibited significantly more liberal views than those in the World Values Survey, even after controlling for demographic differences between the samples.

We begin by examining demographic differences between two groups of initial (W1) participants: those who completed all four waves and those who dropped out after the first, second, or third wave. The share of women among completers was significantly lower than in the baseline sample. While women initially constituted approximately 53% of the sample (mirroring the female population share in the 2010 census), their share among those still participating declined to 43% by the final wave, indicating substantial attrition among female respondents. Attrition rates were also higher among participants with lower levels of education, those residing in rural areas, those with lower socioeconomic status, and those who were childless or unmarried.
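The gender comparison above can be illustrated with a simple chi-square test of independence. This is a hedged sketch, not the authors' code: the counts below are reconstructed from the figures reported in the abstract (1527 wave-1 respondents, 53% of them women; 606 completers, 43% of them women), and the column names are invented for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy reconstruction of the reported pattern: 810 women and 717 men at
# wave 1; 260 women and 346 men among the 606 four-wave completers.
# (Counts are back-calculated from the percentages in the abstract.)
df = pd.DataFrame({
    "female":    [1] * 810 + [0] * 717,
    "completed": [1] * 260 + [0] * 550 + [1] * 346 + [0] * 371,
})

# 2x2 table of sex by four-wave completion status.
table = pd.crosstab(df["female"], df["completed"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.2g}")
```

With these reconstructed counts the sex difference in completion rates (roughly 32% for women versus 48% for men) is highly significant, consistent with the substantial female attrition described above.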

Next, we trained several models (regularized logistic regression, random forest, and XGBoost) to predict respondent survival across all four waves, using the full set of more than 100 VIC questions as predictors. We then ranked the available demographic and attitudinal variables by their relative importance in the model predictions. Overall, predictive performance was moderate, with accuracy of about 0.61 and ROC-AUC of about 0.67-0.68. The strongest predictors of survival were greater age, longer interview length, and higher scores on Schwartz's Conservation and Self-Transcendence values.
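A workflow of this kind can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the authors' pipeline: it uses synthetic data in place of the VIC items, substitutes scikit-learn's random forest and L2-regularized logistic regression for the full model set (XGBoost would slot in the same way), and uses impurity-based feature importances as one of several possible importance measures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for ~100 baseline survey items and a binary
# "survived all four waves" outcome.
X, y = make_classification(n_samples=1527, n_features=100,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

models = {
    # L2-regularized logistic regression (C controls penalty strength).
    "logit_l2": LogisticRegression(C=1.0, max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=300, random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    print(name,
          "acc:", round(accuracy_score(y_te, model.predict(X_te)), 3),
          "auc:", round(roc_auc_score(y_te, proba), 3))

# Rank predictors by impurity-based importance from the forest.
top10 = np.argsort(models["rf"].feature_importances_)[::-1][:10]
```

In a real application the feature indices in `top10` would map back to named survey variables (age, interview length, value scores, and so on), which is how an importance ranking like the one reported above is produced.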

Our findings suggest that respondent attrition in online opt-in panels can be substantial and challenging to predict at baseline. However, our analysis reveals that certain demographic and attitudinal variables are non-trivially associated with respondent survival. This information can be valuable for survey practitioners in designing more effective longitudinal online surveys.