Exploring Cross-Cultural Social Science Research with LLM-Generated Synthetic Respondents: Insights from Subjective Life Expectancy Probing
Mao Li (University of Michigan) - United States
Xinyi Chen (University of Michigan) - United States
Zeyu Lou (University of Michigan) - United States
Kaidar Nurumov (University of Michigan) - United States
Stephanie Morales (University of Michigan) - United States
Sunghee Lee (University of Michigan) - United States
Keywords: Large Language Models, Synthetic Response, Cross-Cultural Survey, Subjective Life Expectancy, Open-ended Question
Abstract
The subjective life expectancy (SLE) question, which asks respondents to rate their probability of living beyond certain ages on a 0-to-100 scale, provides valuable insights into mortality-related behaviors. However, interpreting why respondents give the specific answers they do remains a persistent challenge. Probing questions, such as "Why did you choose ABC response?", are commonly employed in survey research to capture the reasoning behind respondents' answers. To deepen our understanding of these responses, this study leverages Large Language Models (LLMs) to address three key objectives: (1) to evaluate the quality of reasoning using a Natural Language Inference (NLI) framework that categorizes probing responses as supportive of, contradictory to, or unrelated to the corresponding SLE ratings, thereby uncovering cognitive biases; (2) to predict SLE ratings from the probing responses alone and assess the accuracy of these predictions; and (3) to examine the effectiveness and response patterns of LLMs across three languages (English, Spanish, and German). The alignment between LLM-generated responses and human responses was evaluated using data from a nonprobability web panel survey (N=1,793) and the Survey of Consumer Attitudes (N=506).
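To make the NLI framework concrete, the sketch below illustrates one way such a classification step could be set up: the probing response serves as the premise, a natural-language statement derived from the respondent's numeric SLE rating serves as the hypothesis, and the NLI labels entailment, contradiction, and neutral are mapped to supportive, contradictory, and unrelated. This is a minimal illustration, not the authors' pipeline; the model name, the 50-point cutoff, and the target age are assumptions for demonstration only.

```python
# Minimal sketch (not the authors' pipeline): classify a probing response against
# a hypothesis derived from the respondent's SLE rating using an off-the-shelf NLI
# model. MODEL_NAME, the 50-point cutoff, and target_age are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "facebook/bart-large-mnli"  # English-only NLI model, used here for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

NLI_TO_CATEGORY = {
    "entailment": "supportive",
    "contradiction": "contradictory",
    "neutral": "unrelated",
}

def categorize_probe(probe_text: str, sle_rating: int, target_age: int = 75) -> str:
    """Map an open-ended probing response to supportive / contradictory / unrelated."""
    # Turn the numeric SLE rating into a natural-language hypothesis.
    direction = "likely" if sle_rating >= 50 else "unlikely"
    hypothesis = f"The respondent thinks it is {direction} they will live past age {target_age}."
    inputs = tokenizer(probe_text, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Read the label mapping from the model config rather than hard-coding indices.
    label = model.config.id2label[int(logits.argmax(dim=-1))]
    return NLI_TO_CATEGORY[label.lower()]

print(categorize_probe("My grandparents all lived into their nineties.", sle_rating=85))
```

A multilingual NLI checkpoint would be needed to cover the Spanish and German responses in the same way; the English-only model above is used purely to keep the example self-contained.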
We found that (1) most probing responses supported respondents' self-estimations of SLE, while a smaller proportion (~20%) did not. The NLI framework revealed that such misalignments often stemmed from heuristic or emotional factors influencing lifespan estimation, and socio-demographic patterns pointed to diverse response strategies; (2) the model's predictions of SLE ratings deviated from human responses by an average of 15%, indicating reasonable alignment and highlighting the potential of LLMs to estimate respondents' subjective evaluations; and (3) LLM-generated responses exhibited lower variance than human responses, reflecting the model's tendency to provide consistent, specific answers. This reduction in variance raises important questions for survey science, particularly whether it is more pronounced in certain languages, with implications for cross-linguistic comparability and survey design.
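As a small worked example of the comparisons reported above, the following sketch computes the average absolute deviation between LLM-predicted and human SLE ratings on the 0-to-100 scale and contrasts response variance by survey language. It is illustrative only; the file name and column names (human, llm, language) are hypothetical.

```python
# Illustrative sketch (not the authors' code): evaluate predicted vs. human SLE ratings.
import pandas as pd

# Hypothetical dataset with one row per respondent:
# columns "human" and "llm" hold 0-100 SLE ratings, "language" the survey language.
df = pd.read_csv("sle_predictions.csv")

# Average absolute deviation on the 0-100 scale (the abstract reports ~15% on average).
mad = (df["llm"] - df["human"]).abs().mean()
print(f"Mean absolute deviation: {mad:.1f} points")

# Variance of LLM vs. human ratings, split by language, to probe whether the
# reduced variance of LLM responses is more pronounced in some languages.
print(df.groupby("language")[["human", "llm"]].var())
```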
While LLMs demonstrate considerable potential as synthetic respondents for enhancing data quality and understanding public attitudes, this research underscores their limitations in replicating the complexity of human reasoning. These findings emphasize the need for methodological rigor when integrating LLMs into survey research, particularly in cross-cultural and multilingual contexts.