Wednesday, May 27, 2009

The Secret of Correlating Blog Posts with Sales!

Is it possible to determine your brand's sales by monitoring social media? The answer is yes, but how precisely remains to be determined. To help answer this question, we looked at beer sales data by brand from Convenience Store Decisions and at blog post counts by brand from keyword searches on Technorati and IceRocket. The data are shown as follows:

Using linear regression, it almost seems like child's play at this point to show the correlation between sales data and blog posts. Even blocking for the possible effect of using two different search engines proved unnecessary: with this small data set, the differences between the two engines were not important. One thing that was important in fitting the model was transforming the sales data to its natural logarithm. In a previous post I discussed how, as an item increases in popularity in the blogs, an exponential effect is seen in its postings. That appears to be the case with this data as well.
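The post doesn't say which software produced the fit, so the Python below is only a minimal sketch of the mechanics: take the natural log of sales, then fit it by ordinary least squares on blog hits plus a 0/1 search-engine dummy, solving the normal equations with Gaussian elimination so that nothing beyond the standard library is needed. The helper names and input numbers are hypothetical; the sales values are generated from the coefficients in the model reported in this post rather than taken from the Convenience Store Decisions data, so the fit should simply recover those coefficients.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_sales(blog_hits, block, case_sales):
    """OLS fit of ln(sales) = b0 + b1*hits + b2*block; returns [b0, b1, b2]."""
    rows = [[1.0, float(h), float(k)] for h, k in zip(blog_hits, block)]
    y = [math.log(s) for s in case_sales]  # the natural-log transform
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical inputs: block is 0 for Technorati counts, 1 for IceRocket counts.
hits = [1200, 2500, 4100, 1350, 2600, 3900]
block = [0, 0, 0, 1, 1, 1]
# Synthetic sales generated from the post's reported coefficients (not real data).
sales = [math.exp(2.812 + 0.00044 * h - 0.00138 * k) for h, k in zip(hits, block)]
b0, b1, b2 = fit_log_sales(hits, block, sales)
print(f"ln(sales) = {b0:.3f} {b1:+.5f}*BlogHits {b2:+.5f}*Block")
```

With real, noisy data a library routine such as `numpy.linalg.lstsq` is numerically safer than forming the normal equations by hand; the hand-rolled version is here only to keep every step visible.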

The model that was developed is Ln(Case Sales) = 2.812 + 0.00044*BlogHits - 0.00138*Block.
The blocking variable is 0 for Technorati and 1 for IceRocket. The data and regression equation look as follows. There is an upward trend in sales as blog posts increase.
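Because the model is fitted on the log scale, each coefficient acts multiplicatively on sales. Exponentiating gives e^(0.00044 × 1000) = e^0.44, about 1.55, so 1,000 extra blog hits corresponds to roughly 55% higher predicted case sales, while the search-engine dummy shifts predictions by a factor of only e^(-0.00138), about 0.9986, which fits the observation that blocking made little difference. The small sketch below uses the coefficients reported above; the function names are hypothetical, and predictions come out in whatever units the original Case Sales figures used.

```python
import math

# Coefficients as reported in the post's fitted model.
B0, B_HITS, B_BLOCK = 2.812, 0.00044, -0.00138


def predicted_case_sales(blog_hits, block):
    """Back-transform the fitted ln(Case Sales) to the original sales scale.

    block is 0 for Technorati counts and 1 for IceRocket counts.
    """
    return math.exp(B0 + B_HITS * blog_hits + B_BLOCK * block)


def sales_multiplier(extra_hits):
    """Factor by which predicted sales change for `extra_hits` more blog hits."""
    return math.exp(B_HITS * extra_hits)


print(round(sales_multiplier(1000), 3))  # 1,000 more hits: roughly 55% higher sales
print(round(math.exp(B_BLOCK), 5))       # engine effect: a fraction of a percent
```

One caveat on the design: exponentiating a prediction made on the log scale estimates something closer to typical (median) sales than to average sales, which is usually adequate for spotting trends.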

Here is the ANOVA table for the regression analysis. The regression is significant, with a p-value of 0.0005. The R-squared indicates that the model explains only about 58% of the variance in log sales. It's not the best model, but what matters is that the correlation exists.
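The headline numbers of an ANOVA table follow from sums of squares. For a least-squares fit with an intercept, k predictors and n observations, R² = SS_reg / SS_tot and F = (SS_reg / k) / (SS_res / (n - k - 1)); an R² of about 0.58 therefore means the residual sum of squares is about 42% of the total. The sketch below is a hypothetical helper, not the software the post used. Turning F into a p-value needs the F distribution's survival function (for example `scipy.stats.f.sf(F, k, n - k - 1)`), which is left out to keep the sketch standard-library only.

```python
def regression_summary(y, y_hat, n_predictors):
    """Return (R-squared, F statistic) for a least-squares fit that includes an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)            # total variation
    ss_res = sum((v - f) ** 2 for v, f in zip(y, y_hat))  # unexplained variation
    ss_reg = ss_tot - ss_res                              # explained by the model (OLS only)
    df_reg = n_predictors
    df_res = n - n_predictors - 1
    r_squared = ss_reg / ss_tot
    f_stat = (ss_reg / df_reg) / (ss_res / df_res)
    return r_squared, f_stat


# Hypothetical data: the OLS fit of y on x = 1..5 is y_hat = 0.6 + 0.8x.
y = [2, 1, 4, 3, 5]
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]
r2, f = regression_summary(y, y_hat, n_predictors=1)
```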

So what can we infer from this analysis?

1) Blog posts about certain beer brands are correlated with sales of those brands.
2) We cannot infer any cause-and-effect relationship.

Blog posts don’t cause sales, and sales don’t cause the blog posts either. The hidden (lurking) variable we are really trying to quantify here is the relative consumer sentiment about a brand. High consumer sentiment leads to both more sales and more blog posts about a particular product. Blog posts could in fact be an indicator of consumer sentiment, and this is the value of the analysis.
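The lurking-variable argument can be illustrated with a small simulation. In the sketch below every number is made up for illustration: a hidden "sentiment" score drives both blog posts and sales, and neither of those two affects the other. Posts and sales still come out strongly correlated; once the sentiment component is subtracted, what remains is essentially uncorrelated noise.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, seed=1):
    """Hidden sentiment drives both posts and sales; neither drives the other."""
    rng = random.Random(seed)
    sentiment = [rng.gauss(0, 1) for _ in range(n)]
    posts = [s + rng.gauss(0, 0.5) for s in sentiment]
    sales = [s + rng.gauss(0, 0.5) for s in sentiment]
    # Raw correlation: strong, despite no causal link between posts and sales.
    raw = pearson(posts, sales)
    # Remove the (here known) sentiment component and correlate what is left.
    adjusted = pearson([p - s for p, s in zip(posts, sentiment)],
                       [q - s for q, s in zip(sales, sentiment)])
    return raw, adjusted


raw, adjusted = simulate()
```

With these settings the population correlation between posts and sales is 1 / 1.25 = 0.8, even though neither causes the other. In real data sentiment is not observed directly, which is exactly why post counts are interesting as a stand-in for it.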

There are some pitfalls in this analysis. First, we can’t say for certain whether the blog traffic is positive or negative. Is Natural Light receiving more blog traffic because of negative rather than positive consumer sentiment? Perhaps another factor is involved. Second, it may be wrong to extend this type of analysis to certain other types of consumer products. Products that don’t rise above the noise level in the blogosphere probably cannot be indexed in this fashion. Then there is timing. If sentiment changes over time, how can we be sure this will be reflected in the blog posts just by the frequency of keywords? Trending keywords over time can help with this.
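One simple way to trend keywords is to smooth the raw post counts per period and watch the direction of the smoothed series. The sketch below assumes weekly counts have already been collected from a search engine; the function name and the counts are hypothetical.

```python
def moving_average(counts, window):
    """Trailing moving average: one value per complete window of `window` periods."""
    if not 1 <= window <= len(counts):
        raise ValueError("window must be between 1 and len(counts)")
    total = sum(counts[:window])
    smoothed = [total / window]
    for i in range(window, len(counts)):
        # Slide the window: add the newest period, drop the oldest.
        total += counts[i] - counts[i - window]
        smoothed.append(total / window)
    return smoothed


# Hypothetical weekly post counts for one brand keyword.
weekly_posts = [120, 135, 128, 150, 170, 165, 190, 210]
trend = moving_average(weekly_posts, window=4)
rising = trend[-1] > trend[0]
```

A longer window smooths out week-to-week noise but reacts to a real shift in sentiment a few periods later; a shorter window responds faster but is noisier.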

There is nothing new here. It has already been established that we can easily trend Technorati, Twitter, IceRocket and other social media posts for keywords and topics of interest. The question is: has anyone attempted to model this interest and correlate it directly with sales of their products?

My vision is that this type of analysis can be fine-tuned to produce a consumer blog index for various types of products, based not on votes for a product's popularity but on chatter. If such an index correlates well with sales, it would be of significant interest to businesses promoting products. Are producers interested in this data, and are consumers interested in seeing it?

Can this method substitute for some conjoint analysis? If I am producing cameras and want to know what distribution of colors to produce, can I Google it? Or use IceRocket? Would a search with a logical combination of words representing colors and cameras yield useful results? The answers could be obtained in hours, saving time and thousands of dollars on market research.
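If hit counts for searches combining a product term with each color could be collected, turning them into a production mix is simple arithmetic. The sketch below is hypothetical throughout: the hit counts are invented, and it does not show how to query Google or IceRocket, since query syntax and hit-count reporting vary by engine. It allocates units in proportion to hits, using largest-remainder rounding so that the whole-unit allocations add up to the production run.

```python
def production_mix(hit_counts, total_units):
    """Split a production run in proportion to search hits per option.

    Uses largest-remainder rounding so the integer allocations sum to total_units.
    """
    total_hits = sum(hit_counts.values())
    if total_hits == 0:
        raise ValueError("no hits to allocate from")
    exact = {k: total_units * v / total_hits for k, v in hit_counts.items()}
    alloc = {k: int(x) for k, x in exact.items()}  # floor, since counts are non-negative
    leftover = total_units - sum(alloc.values())
    # Hand the remaining units to the options with the largest fractional parts.
    for k in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical hit counts for "camera" plus each color.
hits_by_color = {"black": 4200, "silver": 2900, "red": 1300, "blue": 600}
mix = production_mix(hits_by_color, total_units=5000)
```

Whether raw hit counts track buyers' color preferences at all is exactly the open question raised above; the arithmetic only becomes useful once that correlation has been checked against real sales.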
I would love to hear comments from those of you already engaged in this analysis.