When talking about Big Data one of the possible interesting data sources are social media data. In this short 2 part series I will show 3 approaches to Social Media Analytics based on available Microsoft BI technologies.
In this first part I will cover the basics of using the statistics of different social media channels as a datasource with the goal of one BI Solution that covers an overview for all channels a company is active in. In the second part we will find out how get deeper information about the content of userpostings, like sentiment – an automated assessment of the content.
Let’s start with part 1. Imagine you are a marketeer for a global player. Of course you use the internet and social media to connect with your (potential) customers and try to make use of e.g. facebook, twitter, youtube, google analytics for the companies websites or klout. All of these channels have their own statstics providing meta-information about the activities on the channel like: hitrates, users, followers and so on. Of course you could use the statistics website of each of these channels, but that would be a very time-consuming process. For to check the statistics websites of the 5 names channels for e.g. 20 countries would mean a daily check of 100 statistics sites without having the chance for an aggregated overview for all the countries.
The solution is solved via the BI standard architecture – automatisation of collecting all the relevant statistical information, transforming the data for comprehensive analysis and storing them in a data warehouse. With this approach you also have the chance to do historical analysis – most of the native statistics of the channels only have a short data history provided.
One of the challenges for this approach is the flexibility of the architectural Data Warehouse layout. Channels change their statistical offerings from time to time, measurement get an new name or are replaced by new measurements. So the classical approach in the dimensional model where you have a fixed (named) column in the fact table for each measurement would not provide this flexibility. A more flexible approach would be to have a “Measure-Dimension”.
With this generic model you have to possibility to introduce new measures or adapt existing just by configuring the “Dimension Measure” underlying table, which could look like this:
As you can see you can also handle the format and even the aggregation type this way. This is especially relevant because some of the statistical measurements are so called “snapshot”-measurements, e.g. Facebook LifetimeTotalLikes which must not be summed up. When yesterday you had 20 fans on your facebook page and today you have 21 fans it is obvious, that on a weekly level you must not sum them up.
To see the possible result and convenience of such an BI approach to social media statistic data collected into one system here are some dashboard screenshots based on the underlying architecture as described. The dashboards are realized with Tableau Software.