Well, They Didn’t Ask Me.

Have you ever been surprised by the results of a survey and said to yourself, “well, they didn’t ask me”? I know I have. The fact is that the size and makeup of the sample used to determine the results of a survey can skew the results if it is poorly or subjectively selected. This is also true when doing analytics.

Always leverage as much data as possible when determining insights using business intelligence. If you can’t use everything, then select an objective sample that represents the data as accurately as you can. Once I was trying to identify what was influencing website usage for a large organization using Google Analytics traffic data. I had just seen a demo of Power BI’s new (at the time) Key Influencers visual and was excited to give it a try. I loaded it with millions of rows of data, expecting it to take a while to process, and was surprised at how quickly it returned its results. I was even more surprised that the results didn’t make sense. I knew the data well enough to realize something was wrong. Further research revealed that this visual only uses 10,000 rows of data regardless of the size of your dataset. It is supposed to select a representative sample of the data. In my case, it did not, and the resulting insights were flawed. I’m glad I knew the data. It would have been terrible if I had published misleading insights.
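
To make "an objective sample" a little more concrete, here is a minimal Python sketch of one common approach, proportional stratified sampling. It is not how Power BI picks its rows, and the function name, the "device" column, and the row counts are all made up for illustration. The idea is simply that each group appears in the sample in roughly the same proportion as in the full data.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, sample_size, seed=0):
    """Draw roughly sample_size rows, keeping each group's share of the data.

    Hypothetical helper for illustration only. Because of rounding, and
    because every group gets at least one row, the result may be slightly
    larger or smaller than sample_size.
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable

    # Bucket the rows by the value of the stratification column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    total = len(rows)
    sample = []
    for members in groups.values():
        # Proportional allocation, never more rows than the group has.
        k = max(1, round(sample_size * len(members) / total))
        k = min(k, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up traffic data: 90% desktop, 9% mobile, 1% tablet.
rows = (
    [{"device": "desktop"}] * 9000
    + [{"device": "mobile"}] * 900
    + [{"device": "tablet"}] * 100
)

sample = stratified_sample(rows, key="device", sample_size=100)
sample_mix = Counter(row["device"] for row in sample)
print(sample_mix)
```

Comparing the mix of values in the sample against the mix in the full dataset, as the last two lines do, is a quick sanity check before trusting any insight built on the sample.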

Also, when trying to obtain insights from your data, make sure you have enough of it and that you use it. If you happen to be scouring through acquired data trying to determine all the combinations of attributes needed to produce a junk dimension, make sure to evaluate the entire dataset. Use a group-by query to find all the valid combinations and their corresponding counts. Counts that are extremely low typically indicate combinations that were created erroneously in the transactional system and should not be included as dimension values. Instead, let an inferred dimension handle this bad data. You can then use the results to encourage improvements in the application producing the data.
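
The group-by step can be sketched in a few lines of Python. In practice this would more likely be a SQL GROUP BY against the source system; the attribute names, sample rows, and the cutoff of 10 below are assumptions chosen purely for illustration, and the right threshold depends on your data.

```python
from collections import Counter


def combination_counts(rows, attributes):
    """Count how often each combination of attribute values occurs."""
    return Counter(tuple(row[a] for a in attributes) for row in rows)


def split_by_threshold(counts, min_count):
    """Separate common combinations from suspiciously rare ones."""
    valid = {combo: n for combo, n in counts.items() if n >= min_count}
    suspect = {combo: n for combo, n in counts.items() if n < min_count}
    return valid, suspect


# Made-up transactional rows with two low-cardinality attributes.
rows = (
    [{"channel": "web", "paid": "Y"}] * 500
    + [{"channel": "web", "paid": "N"}] * 300
    + [{"channel": "store", "paid": "Y"}] * 200
    + [{"channel": "store", "paid": "?"}] * 2  # likely a data-entry error
)

counts = combination_counts(rows, ["channel", "paid"])
valid, suspect = split_by_threshold(counts, min_count=10)

print("junk dimension candidates:", valid)
print("review with the source application team:", suspect)
```

Running this over every row, not a sample, matters here: a rare but legitimate combination can easily be missing from a sample, and the counts are only meaningful against the full dataset.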

I recently ran into a related pitfall when using Power BI for data modeling. I needed to filter rows from a dataset based on a Y or N value, keeping only the Ys. When I went to apply my filter, I noticed that the only value available was Y. I instantly knew this was because Power BI operates on a sample of the data by default. I selected Y and everything appeared to be OK. Then, out of the corner of my eye, I noticed that the M code for the step had not changed. In fact, no filter had been applied at all. It seems that Power BI compared my filter against the sampled data and decided it was unnecessary because all the rows already matched. That would not be the case for the entire dataset. To remedy this, I had to adjust the sample size to include all rows. Only then did Power BI apply my filter appropriately.
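
The same trap is easy to reproduce outside Power BI. The Python sketch below is only an analogy, not a description of how Power BI works internally: the 1,000-row preview size, the "active" column, and the row counts are assumptions. It shows how a preview can hide values that exist further down in the data, and why the filter still has to be applied to every row.

```python
def distinct_values(rows, column, limit=None):
    """Return the distinct values of a column, optionally from a preview only."""
    subset = rows if limit is None else rows[:limit]
    return {row[column] for row in subset}


# Made-up data: the first 1,000 rows are all "Y"; the "N" rows come later.
rows = [{"id": i, "active": "Y"} for i in range(1000)]
rows += [{"id": 1000 + i, "active": "N"} for i in range(50)]

preview_values = distinct_values(rows, "active", limit=1000)
all_values = distinct_values(rows, "active")

print("values seen in the preview:", preview_values)
print("values in the full data:", all_values)

# The filter must run against every row, whatever the preview suggests.
kept = [row for row in rows if row["active"] == "Y"]
print("rows kept:", len(kept), "of", len(rows))
```

Judging only by the preview, the filter looks redundant. Against the full data it removes 50 rows.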

ConradBI can help you identify objective insight-filled datasets within your organization for analysis. We want your business to become the best version of itself. We will use as much of your data as possible to tell you how.