The Use of WhatsApp to Predict Demographic Characteristics
WhatsApp is currently the most popular messaging application, with the widest name recognition, by far the largest user base, and the strongest corporate backing following its acquisition by Facebook in 2014. It allows people to easily share texts, pictures and audio files.
In a recent study, researchers Avi Rosenfeld, Sigal Sina, David Sarne, Or Avidov and Sarit Kraus showed how the messaging service WhatsApp could be used to facilitate data analysis. The study represents the first exhaustive analysis of WhatsApp messages. The authors collected over 5 million encrypted messages from over 100 students between the ages of 18 and 34.
Using open source tools in R and Weka, they developed models to predict WhatsApp usage patterns across different types of user groups without accessing the message content. The work is data-driven, so the findings are based on the algorithms’ output.
The first assumption was that differences between WhatsApp users can be predicted using only general statistics about usage, without any access to the content. For example, one might find that women write more, while men write shorter messages. Other criteria were response time, average conversation length, age differences, and the time of day a message is sent. The second assumption was that different types of group usage can be predicted from general group attributes: for example, which groups will have certain types of content, such as file attachments or shorter messages, or which groups will show certain kinds of user activity (a larger quantity of messages, more frequent messaging, quicker response times).
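As a rough illustration of what content-free usage statistics might look like, the following Python sketch computes a few of the criteria mentioned above (message count, message length, time of day, response time) from metadata alone. It is not the authors' code, which used R and Weka; the record layout, the sample data and the function name `usage_features` are all hypothetical, and "response time" is simplified here to the gap since the previous message from a different sender in the same group.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical metadata records: (sender, group_id, timestamp, length_in_chars).
# No message text is stored, mirroring the content-free setting described above.
messages = [
    ("alice", "g1", datetime(2016, 1, 1, 9, 0, 0), 42),
    ("bob", "g1", datetime(2016, 1, 1, 9, 2, 0), 10),
    ("alice", "g1", datetime(2016, 1, 1, 9, 3, 0), 30),
    ("bob", "g1", datetime(2016, 1, 1, 21, 0, 0), 8),
]


def usage_features(records):
    """Return per-sender usage statistics computed from metadata only."""
    lengths = defaultdict(list)
    hours = defaultdict(list)
    response_secs = defaultdict(list)
    last_in_group = {}  # group_id -> (sender, timestamp) of the previous message

    # Process messages in chronological order.
    for sender, group, ts, length in sorted(records, key=lambda r: r[2]):
        lengths[sender].append(length)
        hours[sender].append(ts.hour)
        prev = last_in_group.get(group)
        # Simplification: a "response" is any message that follows a message
        # from a different sender in the same group, however long the gap.
        if prev is not None and prev[0] != sender:
            response_secs[sender].append((ts - prev[1]).total_seconds())
        last_in_group[group] = (sender, ts)

    return {
        s: {
            "message_count": len(lengths[s]),
            "mean_length": mean(lengths[s]),
            "mean_hour": mean(hours[s]),
            "mean_response_secs": mean(response_secs[s]) if response_secs[s] else None,
        }
        for s in lengths
    }


features = usage_features(messages)
for sender, stats in features.items():
    print(sender, stats)
```

Per-user rows like these could then be fed, together with known demographic labels, to any standard classifier. A realistic pipeline would likely also cap the response gap so that a message sent many hours later is not counted as a reply.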
Rosenfeld and his colleagues found that many message and group characteristics differ significantly across users of different demographics, such as gender and age. Additionally, they found that data analytics can be used to predict users’ gender, age and group activity. The results provide several new insights into WhatsApp usage. Younger users in this dataset used the network more frequently. Age and years of education are both positive factors in predicting how frequently people send file attachments. Overall, women use the network more often than men, while men are more often members of larger communication groups and send shorter messages. Messages in larger groups are also typically shorter than those in private, one-on-one conversations.
The authors believe that their methodology might be of general interest to other groups, such as demographers and government bodies, as a way to facilitate data analysis without encroaching on users’ privacy. They recommend that additional studies be undertaken to improve upon and extend the work they presented.