Foundations of a methodology for quantifying measurement error of social media text posts: Proof of concept using social media posts about the TikTok ban by highly active authors

Michael Elasmar (Boston University) - United States

Keywords: social media, measurement error, validity, reliability, methodology


Abstract

The past decade has witnessed exponential growth in the expression of opinion as text posted on social media. This paper deals with the type of social media text posts that are not bot-generated and that can potentially reflect the thoughts of individuals regarding specific topics. Researchers interested in extracting opinion from social media text typically analyze whole tweets, whole product reviews, or whole posts on Facebook or other platforms. The corpus that is analyzed and reported on consists of the aggregation of, and the patterns found across, whole text posts, and the focus of opinion extraction research has mostly been the sentiment present in that whole content. There have been no attempts to quantify measurement error in this context, since analyses have so far focused solely on whole text content and not on the people generating it.

This paper proposes and tests a methodology that aims to quantify validity and reliability in the context of social media text expressions. First, a search was conducted to locate a topic that had been the focus of intense debate on social media. The topic chosen was the TikTok ban proposed by the U.S. government in 2024. A search for “TikTok ban” on X (formerly known as Twitter) was conducted via Brandwatch, covering the period from January through May 2024; it yielded 53,806 tweets.

After retweets were removed, an AI tool was used to break down the content of each post into the types of expression typically present in public opinion studies, e.g., beliefs, attitudes, intentions, and behaviors. After the expression types present in each post were identified, the AI tool was used to identify the specific objects of those expressions. An object, in this context, is the main subject of an expression (e.g., the U.S. Congress, President Biden).

After all expression-type-by-object combinations had been coded, the data were reorganized so that the unit of analysis became the author of the tweet rather than the text of the tweet itself. This was done by matching tweet authors using their Twitter identification numbers. We focused on highly active authors: those who expressed themselves more than once about the TikTok ban using the same expression-type-by-object combination (e.g., a belief about the U.S. Congress’s motivation for the TikTok ban). We computed the sentiment score present within each combination on a -9 to +9 scale, which resulted in pairs of expression-type-by-object scores for each highly active author.

The data were then subjected to a confirmatory factor analysis, followed by the computation of a reliability coefficient. The methodology yielded proof of concept for quantifying the measurement error present in text content reflecting beliefs about the TikTok ban among highly active authors on Twitter. Lessons learned, limitations, and implications for future similar endeavors are discussed.
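The abstract does not name the AI tool used for coding. A minimal sketch of how the expression-type and object coding might be implemented is shown below, using the OpenAI chat completions API as a hypothetical stand-in for the actual tool; the model name, prompt wording, and output schema are illustrative assumptions, not the paper’s actual protocol.

```python
# Illustrative sketch only: the abstract does not name the AI tool that was
# used. This shows one plausible way to code a post into expression types
# (belief, attitude, intention, behavior), their objects, and a -9 to +9
# sentiment score, here via the OpenAI chat completions API.
import json
from openai import OpenAI  # assumes the openai package and an API key

client = OpenAI()

PROMPT = (
    "You are coding social media posts for a public opinion study. Return a "
    "JSON object with one key, 'expressions', listing every expression in the "
    "post below. Each item must have keys: 'type' (belief, attitude, "
    "intention, or behavior), 'object' (the main subject of the expression, "
    "e.g. 'U.S. Congress'), and 'sentiment' (an integer from -9 to +9).\n\n"
    "Post: {post}"
)

def code_post(post_text: str) -> list[dict]:
    """Return coded (type, object, sentiment) records for one post."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # illustrative model choice
        messages=[{"role": "user", "content": PROMPT.format(post=post_text)}],
        response_format={"type": "json_object"},  # force parseable output
        temperature=0,                            # deterministic coding
    )
    return json.loads(response.choices[0].message.content)["expressions"]
```

Requesting a JSON object and setting the temperature to 0 keeps the coding machine-parseable and repeatable, which matters when the coded scores feed a downstream reliability analysis.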
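The reshaping step, from one row per coded expression to one row per highly active author with a pair of scores, can be sketched in pandas; the column names and sample values below are illustrative assumptions, since the abstract does not specify the dataset layout.

```python
import pandas as pd

# One row per coded expression; column names and values are illustrative.
coded = pd.DataFrame({
    "author_id":       ["a1", "a1", "a2", "a2", "a3", "a4", "a4"],
    "expression_type": ["belief"] * 4 + ["attitude"] + ["belief"] * 2,
    "object":          ["U.S. Congress"] * 4 + ["TikTok"] + ["U.S. Congress"] * 2,
    "sentiment":       [-6, -4, 3, 5, 7, 2, 1],  # scored on the -9 to +9 scale
})

# "Highly active authors": 2+ expressions for the same expression-type-by-object
# combination. Keep their first two scores as a pair of repeated measures.
key = ["author_id", "expression_type", "object"]
active = coded[coded.groupby(key)["sentiment"].transform("size") >= 2].copy()
active["occasion"] = active.groupby(key).cumcount() + 1
pairs = (
    active[active["occasion"] <= 2]
    .pivot_table(index=key, columns="occasion", values="sentiment")
    .rename(columns={1: "score_1", 2: "score_2"})
    .reset_index()
)
print(pairs)  # one row per author x expression-type x object, two score columns
```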
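The abstract reports a confirmatory factor analysis followed by a reliability coefficient but does not give the model specification. As a simpler illustrative stand-in (not the paper’s method), the two repeated scores per author can be treated as parallel measures, with reliability estimated from their Pearson correlation via the Spearman-Brown formula; the sketch assumes the pairs frame built above.

```python
# With two repeated scores per author for the same expression-type-by-object
# combination, a simple reliability estimate treats them as parallel measures:
# the Pearson correlation between the pair, stepped up via Spearman-Brown.
# (The paper itself uses a confirmatory factor analysis; this is a simpler
# illustrative alternative, using the `pairs` frame built above.)
r = pairs["score_1"].corr(pairs["score_2"])
reliability = 2 * r / (1 + r)  # Spearman-Brown, two parallel parts
print(f"pairwise r = {r:.3f}, estimated reliability = {reliability:.3f}")
```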