Last week, I proposed a New Year’s resolution for us to clean up our spend analysis data. After all, if we’re going to hit the ground running next week, we need a clean foundation to build 2018 sourcing initiatives on.

I recognize this week for the vacation time that it is, so I will keep this brief. In fact, I’d like to treat this list as a high-level set of suggestions rather than a drill-down into a defined process. We all have too much egg nog to drink to bother with a long post.

Data Cleansing Best Practices

Long story short, we need to ensure that we apply a simple “onboarding” process whenever we incorporate a new set of data into our overarching spend analyses:
  • Keep tabs on data sources. As we identify the data sources we need to capture, we’ll get more of a handle on how much of a lift it will be to get them in shape. This will help us allocate needed resources and set deadlines in advance of the cleansing process. Are we pulling general ledger reports? Do we need to go down to a more granular level, such as invoice reports?

  • Collect and consolidate the data. We’re likely going to be working with multiple systems – we’ll want to convert the data to a simple standard, such as Excel files. Choosing a single, widely accessible format like this ensures that everyone who needs to work with the data can actually get at it. However, this is also one of the first opportunities to fracture our data further…

  • Don’t put this consolidation on autopilot. Let’s stick with the Excel example. Excel does its best to interpret data the way it thinks you want it to. However, let’s say our data has characters like double quotes (maybe representing the lengths of screws in our MRO description field). Excel may misinterpret that double quote and throw our columns out of whack – those screws would screw us! Check your data over for any questionable characters.

  • Cleanse and normalize. We’ll want to think about standardizing our approach here. It is simple enough to say “correct any errors we find,” but what about non-errors that are still problematic? Let’s go back to our AT&T example – we need to pick a convention (for example, always converting the word “and” to an ampersand) and stick with it.

  • Append. We’ll want to identify what data is important, and ensure we collect it where possible. Not all sources will include all the fields we want; some appending may be in order. In advance of this cleansing, think about what type of data is relevant to your goals. For example, we may want to identify any organizational parent/child relationships – in other words, do we want to retain the knowledge that AT&T Mobility is part of the AT&T family?

  • Set up a schedule. This is not a one-and-done exercise. When we review new data, we validate on an intra-data set basis. We want to think about the bigger inter-data set picture as well, ensuring rules set today govern new data tomorrow. Schedule regular data-wide reviews to ensure consistent rules are applied. You’ll also want to validate that new data points with previously unseen variables don’t disrupt any current-day automation.
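To make a few of these steps concrete, here’s a minimal Python sketch of the onboarding flow. The vendor data, the `normalize_vendor` convention, and the `PARENTS` parent/child map are all made-up illustrations, not anything pulled from a real spend file – the point is simply quote-safe parsing, the “and”-to-ampersand convention, and appending a parent field:

```python
import csv
import io

# Hypothetical raw export. Note the MRO description contains a double quote
# (an inch mark on a screw length), which naive splitting on commas would mangle.
raw = io.StringIO(
    'vendor,description,amount\n'
    'Fastenal,"Screws, 2"" hex head",150.00\n'
    'Johnson and Johnson,Medical supplies,980.00\n'
    'AT&T Mobility,Wireless service,320.50\n'
)

# The csv module handles quoted fields and doubled quotes correctly,
# unlike a bare line.split(",").
rows = list(csv.DictReader(raw))

def normalize_vendor(name: str) -> str:
    """Apply one convention consistently: the word 'and' becomes '&'."""
    return " ".join("&" if word.lower() == "and" else word
                    for word in name.split())

# Hypothetical parent/child map; vendors not listed are their own parent.
PARENTS = {"AT&T Mobility": "AT&T"}

for row in rows:
    row["vendor"] = normalize_vendor(row["vendor"])
    row["parent"] = PARENTS.get(row["vendor"], row["vendor"])

for row in rows:
    print(row["vendor"], "->", row["parent"])
```

The same rules captured here (the quoting check, the naming convention, the parent map) are exactly what those scheduled data-wide reviews should re-apply when next quarter’s files arrive.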

2018, Here We Come!

If I had to boil it all down into a single key takeaway for you to consider moving into 2018, it would be this: our spend analyses are the foundation of our strategic sourcing initiatives. Spend data are, in turn, the foundation of our analyses.

Keep the GIGO concept in mind, and do what you can to give 2018 the strongest foundation you can.

Brian Seipel
