While “big data” technologies help us handle data, the fact of the matter is that the “data process” is more important than the technology itself. What do I mean by that? Let’s say that management poses a question. The data scientist must now acquire the data, clean and restructure it (which will consume the majority of the time), and conclude with some combination of visualization and statistical analysis. Rinse and repeat. Of course, management is busy. So how does one go about optimizing their time? On paper, the simple answer is this: interactive BI dashboards that allow management to perform some level of data mining on their own.
One great technology to work into this data process: Qlik Sense.
Until my vacation last week, I was at least superficially familiar with most of the functionality found in Qlik Sense. The one feature that I had not explored was geographic visualization. As I wanted to explore its capability, and I am a very passionate member of Toastmasters, I figured that I would run an analysis on the clubs in my region (District 38). All information presented here is public and was acquired via the Find a Club feature of the Toastmasters International website.
To begin, I needed to construct my data process. Simple enough, right? Certainly... as long as you know where to look. My process looked like this:
- Web-scrape the club data
- Reshape the data via R and regular expressions
- Compute summary statistics
- Use the Google Maps API to pinpoint the longitude/latitude coordinates of each club
- Visualize the resultant data with Qlik Sense
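The reshaping and summary steps above can be sketched in a few lines. I did this work in R, so the Python below is only an illustration of the idea, not my actual script. The raw record layout, the club names, and the helper names `parse_record` and `clubs_per_division` are all invented for this sketch; the real scraped text will look different and need its own pattern.

```python
import re
from collections import Counter

# Hypothetical raw lines standing in for scraped "Find a Club" text.
# The layout and every value here are assumptions made for this sketch.
RAW_RECORDS = [
    "Example Speakers Club | Club 1001 | Division A, Area 1 | Exampleville, PA",
    "Sample Orators | Club 1002 | Division A, Area 2 | Sampletown, NJ",
    "Placeholder Talkers | Club 1003 | Division B, Area 5 | Placeholderburg, PA",
    "malformed line with no fields",
]

# One named group per field we want as a column in the reshaped data.
RECORD_PATTERN = re.compile(
    r"^(?P<name>[^|]+?)\s*\|\s*Club\s+(?P<number>\d+)\s*\|\s*"
    r"Division\s+(?P<division>[A-Z]),\s*Area\s+(?P<area>\d+)\s*\|\s*"
    r"(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s*$"
)


def parse_record(line):
    """Turn one raw line into a dict of fields, or None if it does not match."""
    match = RECORD_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["number"] = int(record["number"])
    record["area"] = int(record["area"])
    return record


def clubs_per_division(records):
    """Summary statistic: number of clubs in each Division."""
    return Counter(record["division"] for record in records)


# Reshape: keep only the lines that parsed cleanly.
clubs = [r for r in map(parse_record, RAW_RECORDS) if r is not None]
division_counts = clubs_per_division(clubs)

for division, count in sorted(division_counts.items()):
    print(f"Division {division}: {count} club(s)")
```

Dropping unparseable lines silently is a choice made to keep the sketch short; in a real run it is probably worth logging them, since a changed page layout would otherwise just shrink the dataset without warning.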
The results? We can easily pinpoint each Division. In the image below, the colors of the bubbles distinguish each Division. Furthermore, the size of the bubbles represents the number of clubs in close proximity. A larger bubble represents more clubs in that particular locale.
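To give a feel for what feeds a map like this, here is a rough Python sketch of the geocoding and bubble-sizing idea. It assumes the JSON shape returned by the Google Maps Geocoding web service (a `status` field and `results[0].geometry.location`), and it approximates "close proximity" by rounding coordinates into grid cells. That rounding is my own simplification for illustration; it is not how Qlik Sense sizes its bubbles. The coordinates, club names, and the helpers `extract_lat_lng` and `bubble_sizes` are made up for the example.

```python
from collections import Counter


def extract_lat_lng(payload):
    """Pull (lat, lng) out of a geocoding response dict, or None on failure."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def bubble_sizes(clubs, precision=1):
    """Count clubs per (division, rounded lat, rounded lng) grid cell.

    Rounding to one decimal place groups clubs that sit roughly within
    the same locale; this is an assumed stand-in for the map's own logic.
    """
    sizes = Counter()
    for club in clubs:
        cell = (
            club["division"],
            round(club["lat"], precision),
            round(club["lng"], precision),
        )
        sizes[cell] += 1
    return sizes


# Invented response in the assumed Geocoding API shape.
SAMPLE_RESPONSE = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 40.0, "lng": -75.0}}}],
}

# Invented clubs with invented coordinates, purely for illustration.
CLUBS = [
    {"name": "Example Speakers Club", "division": "A", "lat": 40.01, "lng": -75.02},
    {"name": "Sample Orators", "division": "A", "lat": 40.03, "lng": -75.04},
    {"name": "Placeholder Talkers", "division": "B", "lat": 40.61, "lng": -75.47},
]

coordinates = extract_lat_lng(SAMPLE_RESPONSE)
sizes = bubble_sizes(CLUBS)

for (division, lat, lng), count in sorted(sizes.items()):
    print(f"Division {division} near ({lat}, {lng}): {count} club(s)")
```

Keeping the response parsing separate from the network call means the same function can be exercised against saved responses, which helps when the whole pipeline is meant to rerun unattended.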
We can also easily drill down to the club level. Here we have modified the dashboard to incorporate contact information for each club. It is important to note that this “black box” process of acquiring and processing the data is completely automated. Now that the script exists, it takes maybe 30 seconds to update this dashboard via a single click of a button. If new clubs are created, or clubs are removed, our dataset will be updated to reflect that fact.
Before I continue, I want to emphasize that these dashboards are interactive. The data on each sheet is completely tied together. Hence, a single click will instantaneously update all of the data and charts that appear before you. To explore this further, see Qlik Web Example. With more data, only the sky (and your imagination) is the limit! Furthermore, this entire process could easily be adapted for maintaining internal organizational visibility, analyzing competitors, or investigating distributors. One such example is included in the free version of Qlik Sense:
In summary, people who can quickly and accurately acquire, process, analyze, and visualize procurement data with tools like Python or R can set management up for success by incorporating powerful but easily manipulated data visualization tools. Qlik Sense is an excellent example of one such tool.
For those interested, I intend to follow up this post with one about integrating data mining tools and techniques into this process to perform tasks like market or customer segmentation with subsequent Qlik Sense visualization.