Stitching together library data with Excel

Part of my contribution to the Library Impact Data Project has been compiling the data from different sources into a single spreadsheet. We have been able to gather data on items borrowed in a given year, visits to the Kimberlin library building, logins to the network in the libraries and use of an Athens account to access electronic resources. Some of the techniques used were new to me, and may be worth knowing about (or remembering if I have to run the same jobs again next year).

Data merging scenario

You have two spreadsheets, one with data about items loaned from the library and one from the Access Control system showing who entered the building. Both cover the same date range and are organised by some kind of user ID.

Loans file

p123456 17
p098765 12
p456098 6

Visits file

123456 45
098765 20
674523 23

To match the two spreadsheets, we need a common field. The first column in each nearly does this, but we need to fix the missing ‘p’ from the start of each ID number. There is a merge columns function in Excel, so you could create a new first column, copy ‘p’s into each cell and then merge the two columns into one in Column C. This is a bit tedious if you have thousands of lines of data. Another way of achieving the same results would be to adapt another function, using concatenate to add a prefix letter to each cell, e.g. =CONCATENATE(“p”,A1). Copy this formula into the cells in the inserted B column to populate with the correct ID data that we can use to match with the other spreadsheet.

Column format

At this point you might complain that nothing of the sort happens, that all you get is a repeated view of the formula itself and not its result. Most likely this is because the column you want to use to create the merged data has been set to be in text format. Quite rightly, at least in its own terms, Excel it treating what you type in a text, rather than instruction. Change the format to ‘General’ and the formula will perform its magic.

There may be a good reason why you set the column to text though. Excel generally expects to be working with numbers, rather than strings. Unless the column is set to ‘text’ it is likely to quietly amend an ID like ‘098765’ to ‘98765’.

When importing .csv files this kind of change is carried out automatically, so a way round this is to change the file format to .txt and then when you import it, you get more control over how each column is to be handled.

Merging spreadsheets with VLOOKUP

If you don’t already have sensible labels for your columns, here is a good moment to put them in.
With both speadsheets open you can begin to merge the data. In the first cell of the column where the data is to be inserted, click on the fx button. A popup list of functions will appear. Search through the list until you see ‘VLOOKUP’. Select that and a new menu appears which enables you to set how the process is to work. The four values required are for: Lookup_value; Table-array; Col_index_num; Range_lookup. There is more on these fields on the Microsoft Excel site, but basically:

  1. Lookup_value is the identifier common to both spreadsheets. In the spreadsheet to which you want to bring the new data, click on the first relevant cell (A2).
  2. In the spreadsheet that you want to collect the new data from, click on the top left and bottom right cells to highlight the relevant data.
  3. The data that you want to import is in the second, right hand column, so type ‘2’ into the Col_index_num field
  4. Some of the rows will not produce matches. Someone could regularly visit the library without borrowing anything. In that case we only want the exact matches, so ‘false’ is the correct answer for this field.

Apply this with ‘OK’ and you should get either a value inserted into your cell, or an ‘N/A’ if there is no match. Copying and dragging this formula into the rest of the cells for the column should bring in the remaining data. This can be the most tedious part of the process if you have thousands of rows to copy into.
So now you should have:

Loans file

Identifier Loans for year Visits for year
p123456 17 45
p098765 12 20
p456098 6 N/A

Checking your merges

For the sake of data integrity and general peace of mind you will want to check that the data has been correctly carried across. You can use VLOOKUP again to check on the merge, but this time reverse the spreadsheets so that you are copying back data from the original destination sheet.

About Philip Adams

Senior Assistant Librarian at De Montfort University. I am interested in digital preservation and the use of data to measure a library's impact. All comments own.
This entry was posted in lidp, Statistics. Bookmark the permalink.