Count Time Series Modelling of Twitter Data and Topic Modelling: A Case of Indonesia Flood Events
Last modified: 2021-06-14
Twitter data offer rich information on the dynamics of public perception, which can be used to build situational awareness and inform disaster mitigation strategies during critical times. In this paper, we apply topic modeling via Latent Dirichlet Allocation to extract topics from a collection of tweets about the Indonesia flood events of February 2021, retrieved with the query “banjir” (Indonesian for “flood”). The extracted topics serve as one of the features in a generalized linear count time series model with a Negative Binomial distribution. The topic model yields seven major topics, among which tweets about the government’s role in handling the situation dominate the conversation. Using a simple intervention analysis, we demonstrate a statistically significant change in users’ behavior before and after the severe Jakarta flood on 20 February 2021. Moreover, an evaluation of model accuracy metrics shows that a covariate marking the turning point of the Jakarta flood event helps build a more accurate count time series model of the tweets.
Disaster surveillance; Floods; Time series; Topic modeling; Twitter
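
The intervention idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model or data. It fits a log-link Negative Binomial regression to synthetic counts with a step covariate that switches on at a hypothetical turning point. The assumptions are a fixed dispersion parameter `alpha`, no autoregressive terms, and made-up values for the series length, the change point, and the coefficients. The helper name `fit_nb_glm` is likewise hypothetical.

```python
import numpy as np


def fit_nb_glm(X, y, alpha=0.2, n_iter=100, tol=1e-8):
    """Fit a log-link Negative Binomial (NB2) GLM by IRLS.

    Assumes a fixed dispersion alpha (variance = mu + alpha * mu**2)
    and that column 0 of X is an intercept.
    Returns the coefficient estimates and their standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)  # start from the overall mean level
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        w = mu / (1.0 + alpha * mu)   # IRLS weights for NB2 with log link
        z = eta + (y - mu) / mu       # working response
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Standard errors from the inverse Fisher information at the estimate.
    mu = np.exp(X @ beta)
    w = mu / (1.0 + alpha * mu)
    cov = np.linalg.inv((X.T * w) @ X)
    return beta, np.sqrt(np.diag(cov))


# Synthetic tweet counts (illustrative values only, not the paper's data).
rng = np.random.default_rng(0)
n_steps, change_point = 200, 120      # hypothetical series length and turning point
alpha_true = 0.2                      # hypothetical dispersion
beta_true = np.array([3.0, 0.8])      # baseline log-rate and step effect

step = (np.arange(n_steps) >= change_point).astype(float)  # 0 before, 1 after
X = np.column_stack([np.ones(n_steps), step])
mu_true = np.exp(X @ beta_true)

# NumPy parameterization: n = 1/alpha and p = n / (n + mu) give mean mu.
size = 1.0 / alpha_true
y = rng.negative_binomial(size, size / (size + mu_true))

beta, se = fit_nb_glm(X, y, alpha=alpha_true)
z_step = beta[1] / se[1]              # Wald statistic for the step covariate
```

A large absolute value of `z_step` suggests that the level of the series differs before and after the turning point. Topic-based features, such as a per-interval topic share, could be appended as further columns of `X` in the same way.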