Wednesday, September 21, 2016

Simple file comparison test using Excel

Background

While testing, very often, we have to compare flat files and identify differences. Let's see how we can use this easily using VLOOKUP formula.

Flat files are simple files that can be opened in any editor like notepad. These files have data in tabular format with columns of data separated by either comma(,) or any other delimitors like pipe (|) or tab.

Example:  File1.txt

Year|Month|Date|Day
2016|09|23|Friday
2016|09|30|Friday
2016|10|07|Friday
2016|09|26|Monday
2016|10|03|Monday
2016|10|10|Monday
2016|09|24|Saturday


Files having fields/columns separated by comma are called CSV (comma separated values) files.

Say, we have two files to compare - File1.txt and File2.txt. Ours are pipe delimited files. 
Scroll to the bottom to download the files used here.
 
Here are the steps you have to take:


The Process

 

1. Open first file in Excel

a) Right click on the the first file - File1.txt, select 'Open with' and select Microsoft Office Excel.

This will open the file in Excel, but as you can see in the screenshot, the entire file contents will be in the first Excel column.
b) To split the columns into separate Excel columns using the pipe (|) delimitor, in Excel select the entire Column A, then go to Data tab (Alt+A) and click "Text to Columns".


c) This will open the "Convert Text to Columns Wizard". Here select "Delimited" (because our file is a delimited file, not a fixed width one) and click Next button.
d) In the next screen in the list of Delimiters select the checkbox against "Other" and type | in the box provided. You should be able to see a preview of the operation in the Data Preview area.

e) Click Finish.
Now you should be able to see the entire data properly formatted into columns in Excel.
 
 Notice that the file has a single worksheet with the name as File1

 2. Open second file in Excel

Now we will open the second file - File2.txt 
a) Click the Office button and then Open (else, press Ctrl+O), navigate and select the second file. If you are not able to see the file make sure "All files" are selected in the extensions or file types. Click Open.
b) The "Text Import Wizard" should open and use it in the same way as done for the first file. This should import the second file into a second Excel workbook in a worksheet named File2.
c) We will move this sheet to the other file to make comparisons easier. Right-click on the sheet name "File2" and select "Move or Copy". 
d) In the "Move selected sheets To book" dropdown select "File1.txt"
e) In the "Before sheet" option select "move to end" and click OK button
Now we should have a workbook having 2 sheets, each having one file open and properly formatted into Excel columns. 
We are ready to do the actual comparisons.

3. Choosing the Primary or Unique keys

For good comparisons we should understand the files properly - we should identify primary keys to uniquely identify each row. Ambiguous primary keys can lead to erroneous comparisons.
In our present example, we have 4 columns - Year, Month, Date and Day. Year can't be the primary key because multiple rows have the same Year. Due to same reason Month, Date or Day can't be primary key either. 
But, a combination of Year, Month and Date can give us a unique primary key. When we use this combination as Primary key then we can refer to a unique row in each file without ambiguity.
So, let's insert a new column at the beginning to each worksheet and name them "Key". We will use a formula to populate this column's values. Use this formula:
=CONCATENATE(B2,"|",C2,"|",D2)
Now extend the formula to populate Keys in all the rows.
 
Do this in both the sheets.
We can now use these Keys in each sheet to compare the other data elements (example Day, in this case)
 

4. Comparisons

a) In Worksheet File1, add a new column to store the values of "Day" from File 2 against each key.
b) Use VLOOKUP formula to get these values. Use this formula:
=VLOOKUP(A2,'File 2'!A:E,5,0)
c) Now extend the formula to populate all the values in this column.


d) Add another column at the end to store the results of the comparison. Use this formula to get the results:
=E2=F2
e) Extend the formula to compare all the results
 

Bingo! We have done the first round of comparisons and can immediately see the results. 

f) The second round is to implement the same this the other way round, that is, get the "Day" values from File1 into File2 sheet and compare. Like this:
 

It is always recommended to do 2-way checks as we did above. This helps in identifying any missing records in any file. Without the 2-way checks any missing record might not be highlighted in your tests and you may miss a critical defect.

4. Saving evidence

Last but not the least, we have to save down our test evidence. So, go ahead and save the Excel sheet as a .xlsx file with appropriate name.
     

Pros and Cons

Pros
1. It is very simple to do
2. No fancy tools are required

Cons
1. This is not very efficient for huge files as the formulas might take long to calculate or your machine might hang while processing
2. Difficult to analyse results in cases where there is no clearly defined Primary/Unique keys
        
Well, thanks for reading till the end. Give it a try and let me know your comments. Anyways, it is always handy to have this as a tool at your fingertips.

Files used here can be downloaded here:
File 1
File 2
Results