One of the early steps in the Business Intelligence (BI) process is acquiring data. Essentially, you create a copy of any available data that might contribute to your analysis. The goal of BI is to optimize a business process, so grab everything you can that could possibly have an impact.
Do we need an actual copy of the data, or can we use the source data directly? You should always make a copy. The benefits far outweigh the low cost of disk space. One reason to make a copy is to avoid slowing down production systems while performing intensive BI operations. Another reason, even more important, is that a copy allows data from disparate sources to be joined together effectively for analysis.
When determining how you are going to capture data, there are a few things to consider. First, not all sources are easy to acquire. You should evaluate your data profile and determine the technologies needed to copy the data. You may need to grab some data from an internal database while pulling other data from an online social media platform. The technological differences between these two sources are significant. You’ll need either talent (data engineers) or technology to get it done. Once you have the initial copy, you’ll need to determine how to keep the data updated.
There are two standard approaches to keeping data updated: the full load and the incremental load. The full load deletes everything you currently have and imports the source data fresh. Depending on the system you are using, this can be very fast. The incremental load, by contrast, updates, inserts, and deletes rows in the existing data. This method can be efficient for syncing a large data source that doesn’t change frequently. There are many variations on each of these approaches, and they will become necessary over time to keep the data in your BI platform current.
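To make the difference concrete, here is a minimal sketch of both approaches. It assumes each table is a plain Python dictionary of rows keyed by a primary key; the names `full_load` and `incremental_load` are made up for illustration and do not come from any particular BI tool.

```python
from typing import Dict, Tuple

Row = Dict[str, object]
Table = Dict[int, Row]  # assumption: rows are keyed by an integer primary key


def full_load(source: Table) -> Table:
    """Throw away the existing copy and re-import every source row."""
    return {key: dict(row) for key, row in source.items()}


def incremental_load(target: Table, source: Table) -> Tuple[int, int, int]:
    """Insert, update, and delete rows in `target` so it matches `source`.

    Returns the (inserted, updated, deleted) row counts.
    """
    inserted = updated = deleted = 0

    # Insert rows that are new and update rows whose contents changed.
    for key, row in source.items():
        if key not in target:
            target[key] = dict(row)
            inserted += 1
        elif target[key] != row:
            target[key] = dict(row)
            updated += 1

    # Delete rows that no longer exist in the source.
    for key in list(target):
        if key not in source:
            del target[key]
            deleted += 1

    return inserted, updated, deleted
```

In practice the incremental load usually relies on something cheaper than comparing every row, such as a last-modified timestamp or a change log from the source system. The row-by-row comparison above is only there to keep the sketch self-contained.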
Sometimes just getting the latest data can be challenging. Some data providers throttle your download throughput. You should factor in the cost of this delay when determining how frequently you need to update your dataset. Data providers may also require you to download your data in chunks instead of all at once. You’ll have to determine how to paginate your requests and stitch the results back together. Challenges like these add up and may quietly erode the benefit gained from your analysis work.
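As a rough illustration of paginating and stitching, the sketch below pulls rows one page at a time and pauses between requests to stay under a provider's throttle. The `fetch_page(offset, limit)` callable is a hypothetical stand-in for whatever your provider's API offers, and `page_size` and `delay_seconds` are values you would have to tune to that provider's limits.

```python
import time
from typing import Callable, List


def fetch_all(
    fetch_page: Callable[[int, int], List[dict]],  # hypothetical provider call
    page_size: int = 100,
    delay_seconds: float = 0.0,
) -> List[dict]:
    """Download every row by requesting one page at a time."""
    rows: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)  # stitch this page onto what we already have
        if len(page) < page_size:
            break  # a short (or empty) page means we reached the end
        offset += page_size
        time.sleep(delay_seconds)  # pause to respect the provider's throttle
    return rows
```

Many real APIs use page tokens or cursors instead of offsets, so treat the offset arithmetic here as one possible scheme rather than the general case.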
If you are loading data directly into your BI tool (like Power BI), you will probably find that processing becomes slower and slower over time. You may have to choose between keeping the historical data needed for deeper analysis and keeping your reports timely. To remedy this, you should consider moving your processing to a tool designed for that type of work, like a database and cube system. You can then use your BI tool primarily for visualization, which will stay fast.
These are just some of the things you will need to consider when deciding how to acquire your data. Here at ConradBI, we have years of experience and would love to handle these challenges for you. We have consultants ready to help, and we are looking forward to the future release of our consumer toolset.