
Exploring Methodological Alternatives in the ICT Panel Survey: Insights from Open-Ended Questions

Thiago Meireles (Cetic.br|Nic.br) - Brazil
Winston Oyadomari (Cetic.br|Nic.br) - Brazil
Marcelo Pitta (Cetic.br|Nic.br) - Brazil

Keywords: Open-ended questions, Non-probability sampling, Online Panel



Abstract

This study provides and evaluates methodological alternatives for analyzing open-ended responses in large-scale digital surveys. By leveraging advanced text analysis techniques, it addresses the challenges of handling qualitative data at scale while uncovering insights into societal issues. Traditional quantitative surveys often forgo the depth of insight that open-ended questions can provide. As digital technologies increasingly permeate everyday life, understanding users' perceptions and behaviors is essential for informing public policy and fostering responsible digital practices. The ICT Panel Survey, conducted in Brazil, offers a unique platform for investigating these issues: by incorporating open-ended responses, it allows experimentation with alternative methodologies that bridge the gap between qualitative and quantitative research and enhance the depth and applicability of survey findings.

The research uses Computer Assisted Web Interviewing (CAWI) to collect data from a quota-based sample of internet users aged 16 and above. Pseudo-weighting methods adjust for selection and coverage biases, using the ICT Households survey as a reference. Open-ended responses are analyzed through a combination of manual categorization and supervised machine learning models, supported by preprocessing steps such as stop-word removal, stemming, and topic modeling; bootstrap methods are applied to estimate confidence intervals for the categorized responses. Two key issues served as a testbed for evaluating these approaches: privacy (2021) and electronic waste (2023). Each survey edition collected data from slightly over 2,500 respondents, from which a subsample of 500 open-ended answers on respondents' interpretations of privacy and electronic waste was manually categorized to train and validate the machine learning models.

The study evaluated the supervised models by comparing their classifications of textual data with the manual categorizations, and examined how topic modeling and bootstrap confidence intervals enhance the reliability and interpretability of findings and support the presentation of robust estimates. The approach successfully categorized complex qualitative data. In the privacy study, 50% of respondents defined privacy in terms of rights such as freedom and individuality, while 33% associated it with data protection. In the electronic waste study, only 29% correctly identified it with physical waste and recycling, with many respondents instead focusing on digital nuisances such as spam and leftover data.

These results highlight the flexibility and depth that open-ended questions provide when paired with advanced analytical methods. They demonstrate the value of integrating qualitative and quantitative methodologies in large-scale surveys, offering robust alternatives for analyzing open-ended responses that yield richer insights into digital behavior and perceptions, inform policy development, and advance the field of survey research.
By addressing the challenges of integrating qualitative data analysis into large-scale surveys, the study contributes to methodological innovation in social research. Illustrative sketches of the main analytical steps, under stated assumptions, appear below.
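
A minimal sketch of one common pseudo-weighting approach for non-probability samples: a propensity model fitted on the stacked panel and reference samples, with the reference survey's design weights. The function, column names, and model choice are illustrative assumptions, not the survey's documented implementation.

# Hypothetical sketch: inverse-propensity pseudo-weights for a non-probability
# web panel, estimated against a probability reference sample (standing in
# for ICT Households). Column names and model choice are assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def pseudo_weights(panel, reference, covariates, ref_weight_col):
    # Stack panel (non-probability) and reference (probability) records.
    X = pd.get_dummies(pd.concat([panel[covariates], reference[covariates]]))
    z = np.r_[np.ones(len(panel)), np.zeros(len(reference))]  # 1 = panel unit
    # Reference units keep their design weights; panel units get weight 1.
    w = np.r_[np.ones(len(panel)), reference[ref_weight_col].to_numpy()]
    model = LogisticRegression(max_iter=1000).fit(X, z, sample_weight=w)
    p = model.predict_proba(X)[: len(panel), 1]  # estimated P(panel | x)
    return (1.0 - p) / p  # pseudo-weight for each panel respondent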
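The abstract names stop-word removal and stemming as preprocessing steps; a sketch for Portuguese answers follows. The use of NLTK and its RSLP stemmer is an assumption about tooling.

# Illustrative preprocessing for Portuguese open-ended answers; the steps
# (lowercasing, stop-word removal, stemming) follow the abstract, while the
# NLTK tooling is assumed. Newer NLTK versions may also need "punkt_tab".
import nltk
from nltk.corpus import stopwords
from nltk.stem import RSLPStemmer
from nltk.tokenize import word_tokenize

for resource in ("stopwords", "punkt", "rslp"):
    nltk.download(resource, quiet=True)

STOPWORDS = set(stopwords.words("portuguese"))
stemmer = RSLPStemmer()

def preprocess(answer):
    tokens = word_tokenize(answer.lower(), language="portuguese")
    return [stemmer.stem(t) for t in tokens
            if t.isalpha() and t not in STOPWORDS]

# Prints the stemmed content words of a sample answer.
print(preprocess("Privacidade é proteger os meus dados pessoais"))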
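For the topic-modeling step, a minimal sketch using latent Dirichlet allocation over a bag-of-words matrix; the toy corpus, vectorizer settings, and number of topics are purely illustrative.

# Toy corpus standing in for preprocessed open-ended answers.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

answers = [
    "proteger dados pessoais",
    "liberdade individualidade direito",
    "proteger dados senha",
    "direito liberdade escolha",
]
vec = CountVectorizer()
X = vec.fit_transform(answers)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:5]]  # top terms per topic
    print(f"topic {k}: {', '.join(top)}")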
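A hedged sketch of the supervised classification step: a model trained on the manually categorized subsample and evaluated against the manual codes on held-out answers. The TF-IDF plus logistic regression pipeline and the category labels are assumptions; the abstract does not name the algorithm used.

# Toy stand-in for the ~500 manually coded answers; real texts and labels
# would come from the manual categorization described in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["proteger meus dados", "liberdade de escolha", "senha e sigilo",
         "direito de ser deixado em paz", "dados pessoais seguros",
         "individualidade e autonomia", "controle sobre informações",
         "liberdade individual"]
labels = ["data_protection", "rights", "data_protection", "rights",
          "data_protection", "rights", "data_protection", "rights"]

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.25,
                                          stratify=labels, random_state=0)
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)  # train on the manually coded answers
# Compare model output with the held-out manual categorizations.
print(classification_report(y_te, clf.predict(X_te)))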
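Finally, a minimal nonparametric bootstrap for the share of answers falling in a given category. Resampling respondents with replacement, the number of replicates, and the unweighted estimator are simplifying assumptions; in practice the pseudo-weights would enter the estimate.

# Percentile bootstrap confidence interval for a category proportion.
import numpy as np

def bootstrap_ci(labels, category, B=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = len(labels)
    props = np.empty(B)
    for b in range(B):
        resample = labels[rng.integers(0, n, n)]  # draw n with replacement
        props[b] = np.mean(resample == category)
    lo, hi = np.quantile(props, [alpha / 2, 1 - alpha / 2])
    return np.mean(labels == category), (lo, hi)

# Point estimate and 95% CI for the share of "data_protection" answers
# in a toy label vector.
est, (lo, hi) = bootstrap_ci(["rights", "data_protection", "rights"] * 100,
                             "data_protection")
print(f"{est:.2%} (95% CI {lo:.2%} to {hi:.2%})")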