To the end of our project it is time to dedicate a section to the final results and findings of the whole work. It turned out that the most time consuming tasks were not only to run the exploratory analysis on our data but also to gather and preprocess the data itself. Especially the processing of the twitter data has held some unexpected challenges, namely the proper handling of emojis and the overall handling of the Twitter API, which turned out to be more complex than expected. Nonetheless we were able to generate meaningful data, that contains all variables needed to run some interesting analysis on it. Especially Twitter offered a wide range of metadata that got delivered with each tweet (e.g. the retweet count, which was vital for one of our correlation approaches).
With the exploratory data analysis we wanted to answer our initial research questions. For this purpose we ran different approaches over the data to check, whether there is a significant correlation between the average sentiment of tweets a player receives before a game and his performance in-game. Unfortunately, it is to say, that our hypothesis doesn’t hold and there is no such significance to be observed.
To further investigate the findings we made, we wanted to elaborate the impact of the sentiments on a prediction model that predicts the players BPM performance score. The idea: If the sentiments highly contribute to the prediction and this prediction then is relatively good, it could be interpreted as indicator, that the significant correlation exists after all and we just made some mistakes in the analysis setup. And indeed did the majority of the sentiment scores highly contribute to the prediction. But unfortunately the prediction was quite poor. This could be interpreted as the sentiments pushing the predictor into a wrong learning direction and therefore are not significantly correlated to the prediction outcome.
Reflecting the overall project and its outcome some consideration regarding the whole project setup can be done:
1.For the twitter data, domain specific and more advanced sentiment extraction methods could be found or existing analyzers be tuned
2.For the prediction model, the input variables should be reviewed to gain a better prediction outcome
3.Generally the correlation analysis could be decoupled from the BPM performance variable and be applied to component variables of the BMP score (e.g. a correlation between the sentiments and the 3-point-field-goal percentage)
4.Further assumptions could be incorporated into the process, like the fact that some players don’t manage their own twitter accounts at all but let professional social-media agencies monitor the activities. Such accounts are of course irrelevant for our analysis