Clean data! Sounds maybe not so fun but a must to be able to use the raw data we have obtained as our analysis or report basis. We will look at this in the upcoming posts and it does not have to be so complicated or time consuming if you know the tricks.
Want to read the post in Swedish? Click here
Raw data is a concept that comes from receiving raw data from a system. It can be an ERP (finance) system, BI system or for example a CRM system where customer information is stored.
The data can come in all forms and have formatting that may not fit our Excel models at all. The data needs to be “cleaned” or processed before it can be used in a report. It can be about adjusting headings, columns or rows so that you can make a pivot table or with a SUM.IF formula link to a standard format. In the BI system or ERP system, it is advantageous to create a raw data report that is ready to use as a basis for, for example, a general report so that you do not have to clean the content every month. Otherwise, the problems, in what you get out of the system, are often:
- Text format in the raw data
- Blank cells
- Merged columns
- The element labels are not repeated
- Punctuation mark instead of comma or comma when thousands
- Multiple factors in the same cell instead of in individual columns
- The time format is incorrect
Bad data… bad analysis
If you do not spend time or set processes for how to deal with the raw data, it will affect the quality of what you get in the end. By mapping the different parts that need to be cleaned, you can reduce the time required. The best thing is to set up reports in, say, the BI system that are as useful as possible so that there is as little “cleaning” as possible needed. Then, with each extract, just follow the protocol for what you need to fix so that it goes quickly and automatically. Possibly you can have a prepared model that processes and cleans the raw data.
If the raw data is not processed correctly, the analysis or the results report will be more or less incorrect. To illustrate how wrong it can go, I have an example below. Do you see the error…? Cell C5 has a punctuation mark instead of a comma, which means that Q1’s total revenue lacks 8% as the 100,000 SEK is not included. You do not want to bring such a mistake to a management team meeting with the CEO! If you work with numbers day in and day out, you can easily become number blind, but a CEO will definitely question the data behind the report☹.
We will continue to clean data in the next post…
Absolute and relative cell reference
Sounds awkward right? Do not worry, it’s very easy.✌ The starting point is that a cell reference is a relative reference, which means that the reference is variable and not absolute or in other words definitive.
Without adjustment if you copy a formula that picks up data from a cell (A1) and paste it a column to the right, the formula will look for the value in cell B2… this becomes wrong.
If you want the cell reference to be absolute, meaning the formula to continue to pick up the value from only A1, you “lock” the cell using a dollar sign ($) through F4.
Working with absolute cell references and learning how to lock cells definitively, horizontally or vertically can make your calculations much easier.
The F4 button is the button that can do wonders for you in your work.
- One click on F4: the reference is absolute and the cell is locked in all directions ($A$2)
- Two clicks on F4: the reference is mixed and the cell is locked in horizontal direction (A$2)
- Three clicks on F4: the reference is mixed and the cell is locked in vertical direction ($A2)
- No click on F4: the reference is relative and the cell is not locked in any direction (A2)
You can see examples of the benefits of only relative cell reference in the analysis below. Here we want the formula to run both horizontally and vertically as we want to calculate all months and accounts against each other. The formula was written in cell J5 and then copied all the way down to L25.
The use for relative+absolute (mixed) cell reference with horizontal locking can be seen below. Here we want to calculate the cost part of the monthly income. No matter how we copy the formula, the same revenue lines are locked and used.
If you look at how I work myself, I think I use relative+absolute cell reference with vertical locking most often. The benefit of this function is great for calculations where the column should be the same as below where we calculate the month’s share of the quarter total.
In this post, we have talked about the benefits of absolute and relative cell reference in calculations and analyses and the importance of processing data before it is used. In the next post I will present practical examples of cleaning data.
Carl Stiller in cooperation with Learnesy