Abstract:
"Road accidents, a major cause of death and dismemberment in Sri Lanka, are a serious issue that affects the livelihoods of its population as well as its economy. Despite various measures taken by law enforcement and government bodies, the problem continues to grow. Many statistical analyses of road accidents have been conducted in Sri Lanka as well as in other countries. Logistic regression and other methods have been used to identify the most significant factors affecting road accidents and their severity. With the rise in popularity of social media, there has been a trend of using social media data for various kinds of statistical analysis. Several studies in the recent past have used social media data for traffic analysis, accident analysis, disaster management, etc.
This research aims to explore the possibility of using social media to gain meaningful insights into road accidents in a Sri Lankan context. All research conducted on accidents in Sri Lanka has been based on data periodically published by government entities. As an alternative approach, this research attempts to use social media posts as a data source.
In the initial scope of the research, accident news posts will be extracted from a social media handle that specializes in reporting crowdsourced accident news in the Sinhala language. The extracted textual posts will then be cleansed, tagged, and categorized to obtain significant information from them. The feasibility of this approach will also be measured in the research project.
The cleansed, categorized, and structured data will then be analyzed using various machine learning techniques to discover the most significant factors that contribute to fatality in road accidents. The relationships among the variables will be explored. In addition, descriptive statistics on the various factors will be presented. The accuracy of the approach and its challenges and shortcomings will be discussed, and an evaluation will be conducted of whether using social media posts in the local language as a data source for accident data analysis is a feasible option."
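As a rough illustration of the tag-and-categorize step and the descriptive statistics mentioned above, the following minimal sketch turns free-text posts into structured records and computes a fatality rate per vehicle type. It is not the method used in the research: the keyword lists, the function names `tag_post` and `fatality_rate_by_vehicle`, and the example posts are all hypothetical, the posts are written in English rather than Sinhala, and simple substring matching stands in for whatever tagging and machine learning techniques the project actually applies.

```python
from collections import Counter

# Hypothetical keyword lists for illustration only; a real pipeline
# would need Sinhala terms and more robust matching than substrings.
VEHICLE_KEYWORDS = {
    "motorcycle": ["motorcycle", "motorbike"],
    "bus": ["bus"],
    "three-wheeler": ["three-wheeler", "tuk"],
}
FATAL_KEYWORDS = ["died", "killed", "fatal"]


def tag_post(text):
    """Turn one free-text post into a structured record."""
    lowered = text.lower()
    vehicles = sorted(
        name
        for name, words in VEHICLE_KEYWORDS.items()
        if any(word in lowered for word in words)
    )
    fatal = any(word in lowered for word in FATAL_KEYWORDS)
    return {"vehicles": vehicles, "fatal": fatal}


def fatality_rate_by_vehicle(records):
    """Share of posts mentioning each vehicle type that report a death."""
    totals, fatals = Counter(), Counter()
    for record in records:
        for vehicle in record["vehicles"]:
            totals[vehicle] += 1
            if record["fatal"]:
                fatals[vehicle] += 1
    return {vehicle: fatals[vehicle] / totals[vehicle] for vehicle in totals}


# Made-up example posts, not real data.
posts = [
    "Motorcycle collided with a bus near the junction, rider died",
    "Three-wheeler overturned, two passengers injured",
    "Bus skidded off the road, no injuries reported",
    "Motorbike hit a lamp post, rider killed on the spot",
]
records = [tag_post(post) for post in posts]
rates = fatality_rate_by_vehicle(records)
print(rates)
```

Structured records of this kind could then serve as input to a model such as logistic regression, with the fatality flag as the outcome variable. Substring matching is deliberately naive here (for example, "bus" would also match "business"), which hints at the kind of accuracy challenges the research sets out to evaluate.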