Update: I have made some changes to the Import tool – you can download the latest version at the original link listed below. See more info at the bottom of the post.
After creating and writing up my last post on the process to automate the download of 23andme Ancestry Finder data files for your matches, I could not resist working on a way to combine all of those data files into one.
I figured Microsoft Excel was the best common denominator to use for this process – there are any number of tools I could use to combine the files and work with them at the command line, but that would limit the number of people who could use it.
So, here’s the quick scoop: My Excel spreadsheet is designed for use with the .CSV data files you have already downloaded from 23andme’s Ancestry Finder. (Again, see my Ancestry Finder Download Tool post for more info on that process.)
- It will import ALL .CSV data files in a selected folder
- It will add a column with the name of the 23andme user the data file came from. (It uses the data file name for that, so don’t rename them after you download them!)
Then it will make you a sandwich. A nice juicy data sandwich.
Each data row will have the source person AND matching person’s name in it – otherwise you would have a list of a few hundred thousand matches without knowing whom they belonged to!
It is a fairly simple tool to use. All you need to do is:
- Download the Excel spreadsheet and open it
- Allow macros to run (if Excel prompts you about them)
- Click the big IMPORT button
The Excel macro will do all the work. Once all the files are imported, you can then copy and save the data into a new sheet if you want to use the import tool again at a later date. See the READ ME tab in the sheet for more info.
If you have many data files AND many of those are from people of Ashkenazi ancestry (read: large files!) then you may run into trouble if you are using an older version of Excel that has a limit of 65,536 data rows. I have not tested for that situation, but I expect either the macro might not finish properly AND/OR you just may not end up with all of your data. (Which is worse!) The only thing I can suggest is that you upgrade to at least Excel 2007.
The amount of time required to run the import depends on the speed of your computer and the number and size of your data files.
I have about 170 data files and with a 4 year old 2-processor PC with 4GB RAM it took less than a minute. A small group of people that share with me have Ashkenazi ancestry. Their data files are much larger data files due to the larger number of Ancestry Finder matches they have, so if you have a lot more of those than I do it will certainly take longer.
My next plan of action when I have time to look at all the data files some more is to create a simple way to highlight one or more people and/or chromosome locations in the data.
For example, you might sort the data by Chromosome, Start & End Segments. Then you could select a drop down box at the top of the sheet, choose your name and then all of the lines with your name on them could be highlighted. This would make looking through the data easier to manage.
Updates as of 14 May 2012:
Always use the latest version of the tool as all previous issues have been fixed. All can be downloaded at same, original links or via this folder link.
Version 1: (V1) Original version.
V2: Fixed V1 issue with moving comma-separated source text to columns in V1.
V3: Fixed issue with first grandparent column in data being deleted during macro cleanup.
V4: Worked around Excel “feature” where text-to-columns function used last method of delimiter (usually “Tab”) by adding code to force using “Comma” as separation character.
Also added this Revision Info and a simple message at the end of the macro to let users know it has completed.