Thanks in large part to advances in software and technology, we can now acquire or create data on just about everything. This newfound ability to generate and store data easily (and cheaply) is likely to have a profound impact on the procurement profession. This is the first of many posts on data science and its value to the procurement professional. Future posts will focus on data acquisition, data cleaning, data visualization, and predictive analytics. They will show how data can be processed and manipulated to help drive savings and support spend management.

It is important to note that these articles will be technical in nature. Because my background is in mathematics and statistics, I will generally include pertinent computations, derivations, and technical details. I intend to offer an elementary example in each post, including code, that an organization pursuing strategic procurement might use. I will draw on a mix of technologies that we frequently employ at Source One, such as R, Python, Bash/DOS, SQL, VBA, and C/C++. Occasionally I may mix in parallel/distributed programming.

Before we proceed any further, we should clearly define data science. According to Wikipedia, data science is the extraction of knowledge from structured or unstructured data. We will go one step further: our intended use of data science is developing and extracting knowledge from data in order to reduce costs through effective decision making. We add this last clause to emphasize that the data must yield actionable insights.

What exactly is data science useful for? Simply put: anything relating to automation and prediction. With regard to automation, we can perform accurate and efficient computations that let us make informed business decisions quickly, saving man-hours while ensuring that the results are correct. Beyond automation, once we have a means to aggregate and cleanse data, we can use that data to increase organizational visibility and mine it for predictive purposes. Specific uses within the realm of strategic spend include baselining and comparative analysis, spend analysis, customer profiling, geographic analysis and optimization, and tracking of external factors (such as weather).

For my first blog post series, I will use the R programming language. R is a free software environment for statistical computing and data visualization. I have chosen R for two reasons. First, it is a statistical language with excellent visualization capabilities, as demonstrated here: R statistical software. Second, R is likely to become a very useful business tool in the near future, as it is currently being integrated into Microsoft’s existing technology (see Revolution Analytics).

Because we need data before we can perform any analysis, the first series I publish will be on Data Acquisition. I’ll update the links below as the articles go live.

  1. Data Acquisition & Web Scraping via R: Structured Data I
  2. Data Acquisition & Web Scraping via R: Structured Data II
  3. Data Acquisition & Web Scraping via R: Structured Data III
  4. Data Acquisition & Web Scraping via R: Unstructured Data I
  5. Data Acquisition & Web Scraping via R: Unstructured Data II
  6. Data Acquisition & Web Scraping via R: Unstructured Data III

James Patounas