 
      JPNor
Shared on Thu, 02/28/2013 - 09:46I know we have a number of people on the site who know this stuff inside and out. Here's my situation.
I will be receiving a spreadsheet with about 2200 rows on it (including people's first and last names in individual columns). Let's call this Speadsheet A.
Spreadsheets B and C are massive files - B has about 54,000 rows and C has 118,000 rows. Both also have individual columns for first and last name.
I need to check each name on Spreadsheet A against Spreadsheets B and C to determine if any of the people on Spreadsheet A are on either of the others. Rather than searching each name by typing it in (twice) I am looking for a way to automate the process.
Can anyone provide some ideas? I would prefer to insource this project instead of paying our IT company to do it. From my google searching it appears that this is very possible, but I have no idea where to start. Where are my nerds?
- JPNor's blog
- Log in or register to post comments
 
          
Comments
Submitted by Lbsutke on Thu, 02/28/2013 - 11:32
yep. You can import all three spreadsheets into access creating each on as a seperate table. You could then create queries to compare the tables to see if the names showed up in b and c when compared against A.
If that is all you are doing it is really pretty simple. The only issues you might have if you have multiple spellings or variations of names. Fore example Tom Smith might be just one person or could be multiple people. If you know for an absolute fact that would not be the case then it really is not an issue.
Also hopefully your data is formated the same way as well, but you will just need to do a check of the number of records and compare them against the original source. Remembering to subtract 1 record from the spreadsheet total of records. Due to the fact Access will not count the header row as a record.
I use to do this stuff at my old job so I am pretty familular with it.
If you have any questions let me know.
Submitted by tocleora on Thu, 02/28/2013 - 11:48
Yeah I forgot Access has that interface for querying, which will be a lot easier than doing the raw sql query. Good call and good points on the name variants as well!
Submitted by JPNor on Fri, 03/01/2013 - 08:23
Awesome, thanks for the advice. Unfortunately the client sent me a spreadsheet with only one column for the name (LAST, FIRST) instead of split. So I have to wait again before I can get this started.
LB your Tom Smith example brings up a great point. Validating the identity of the subject is simple enough with additional identifiers (this is required under federal law in my line of work anyway). However, your example reminds me of another issue - Tom Smith could also be Thomas Smith, just like William/Will/Bill/Billy. So I will need to keep a very close eye on the results. Either way, this query process looks like it will save me a TON of time.
Submitted by Lbsutke on Fri, 03/01/2013 - 10:27
Yep. This was part of my process for our yearly Sarbanes Oxley review and it made me want to punch puppies and then set them on fire.
If you have an questions let me know.
Submitted by tocleora on Thu, 02/28/2013 - 11:47
I'm somewhat new to excel and access (I've been on the linux side of things) but if you can import those spreadsheets into tables in access (or better yet if there's a way to query excel spreadsheets directly) you can do a simple sql query similar to:
I don't have a way to test these but doing a quick google I found these two links that may help you either query directly with excel or import excel into access.
http://support.microsoft.com/kb/141228
http://smallbusiness.chron.com/use-sql-statements-ms-excel-41193.html
Submitted by Lbsutke on Thu, 02/28/2013 - 11:58
yep you can link directly to the spreadsheets. But make sure you original data is not shit. One of my favority terms I learned from my boss. Shit in gives you shit out.
Submitted by Blue_Stiehl on Fri, 03/01/2013 - 10:54
You can use the COUNTIF function in Excel for this.