Excel gurus in the house?

9 posts / 0 new
Last post
#1 Sun, 07/08/2012 - 21:58
HadOne2Many's picture
Offline
Last seen: 10 years 7 months ago
Joined: 11/14/2006 - 23:00

Excel gurus in the house?

I have a problem I've been trying to work out and can't seem to come up with a good solution, was wondering if the collaborative hive mind here could help.

Here's the situation.  I have a spreadsheet with over a half a million rows of data corresponding to approximately 42,000 separate items.  I have three columns that contain data I need.  A = item name.  B = attribute name.  C = attribute value.  Currently thre are duplicates in column A, for example A could have item "abc123" listed multiple times with different values in B and C (such as B=color: C=blue, B=height; C=1, B=width; C=2, B=length; C=2.5, etc.)

I'm looking for a way to break each attribute name in column B into it's own column width the contents of C as the data in the cell.  Can this been done with a few formulas?

With such a large amount of data doing this by hand could take me weeks, and I'm trying to avoid that.

Appeciate the help ahead of time.

Mon, 07/09/2012 - 07:29
Blue_Stiehl's picture
Offline
Last seen: 4 years 10 months ago
Joined: 08/18/2005 - 23:00
Currently Playing: 

The first thing you need to determine is how many unique attributes are in column B. Current versions of Excel have a limit of just over 16,000 columns.

Mon, 07/09/2012 - 08:34
HadOne2Many's picture
Offline
Last seen: 10 years 7 months ago
Joined: 11/14/2006 - 23:00
I have 1913 unique attributes, so I should be good to go on that part.
Tue, 07/10/2012 - 12:14
HadOne2Many's picture
Offline
Last seen: 10 years 7 months ago
Joined: 11/14/2006 - 23:00
OK, I think I may have this one figured out, it worked perfectly in a smaller test environment, but I'm not sure what is going to happen when I try it on the actual data. First thing I've done is remove all the duplicate entries in the "Attribute Name" column. I did this by copy/pasting the data onto a new spreadsheet, sorting by name and running a "=IF(A1=A2,1,0)" formula in the second column. Then I copied this new column and did a paste special to remove the formula, leaving only the values. From here it was just a matter of sorting again by the 1's and 0's, then deleting all the 1's. What I am left with is a column with just the 1913 unique entries I mentioned in my previous post. EDIT: Neverming, the vlookup below isn't working, gotta rethink this. The next step was to copy these 1913 values and to a "paste special > transpose" into the first row starting with column D on my original spreadsheet. I now have headers for each attribute, but I still needed to get the columns populated with the values. I put this formula into cell D2 "=VLOOKUP(D$1,$B$1:$C$600000,2,FALSE)" then copied and pasted it into the rest of the spreadsheet. Gonna reboot my computer now to free up some resources, then give it a shot and see what happens.
Tue, 07/10/2012 - 23:22 (Reply to #4)
FadeIntoBlack's picture
Offline
Last seen: 1 year 10 months ago
Joined: 03/23/2005 - 23:00

HadOne2Many wrote:
I put this formula into cell D2 "=VLOOKUP(D$1,$B$1:$C$600000,2,FALSE)" then copied and pasted it into the rest of the spreadsheet. Gonna reboot my computer now to free up some resources, then give it a shot and see what happens.

 

I'll admit that your instructions about what you are trying to accomplish are kind of confusing to me.  Based on what I read, you are trying to break out all the information in column B into their own unique columns based on what the value is that is returned?  If that is the case, you may have to create a formula in each column.  I am sure I am missig something key that you are trying to get across because my first thought was why you aren't putting that much data in a pivot table. 

 

But from what I read, columns 'A', 'B', and 'C' are where your data exists and other columns are empty, so your VLOOKUP above won't work because the first argument is your lookup value, and you are pointing it toward an empty column.  If you can take a few pieces of the data and take a screenshot of it in it's current format, and then take two or three rows, scrub them to show how you want it to look and take another screenshot, then post both pics, I could help you the right formulas or VLOOKUPS maybe.  If the data came out of an EDW or something, post your SQL script and I can see if I can help there to maybe get your result without so much manual EXCEL work.

I fully admit though that it has been a long day so I am tired and what you typed may have made perfect sense.  ;)

Tue, 07/10/2012 - 13:54
Blue_Stiehl's picture
Offline
Last seen: 4 years 10 months ago
Joined: 08/18/2005 - 23:00
Currently Playing: 
Tue, 07/10/2012 - 16:51
SirPoonga's picture
Offline
Last seen: 10 years 5 months ago
Joined: 02/20/2006 - 23:00

Uhg, using a spreadsheet as a database...  using the right tool would help...

I could do it in an excel macro, but the programmer in me always goes for macros over excel functions.  If you can't get vlookup to work I might be able to make a macro for you.

Tue, 07/10/2012 - 19:46 (Reply to #7)
HadOne2Many's picture
Offline
Last seen: 10 years 7 months ago
Joined: 11/14/2006 - 23:00
I do have Microsoft Access as well, but I'm not familiar enough with it to make it do what I need and my end result needs to be a csv file so I'm thinking I'll need to run it through excel no matter what. I could be wrong though. I got a little frustrated after my first attemp fell flat and worked on something else today. I will take another stab at it tomorrow and see what I can do.
Fri, 07/13/2012 - 18:37
NorthernPlato's picture
Offline
Last seen: 9 years 6 months ago
Joined: 08/23/2005 - 23:00

A pivot table, as suggested by Fade, would likely be the easiest way to format the data you want though I'm not sure if the table can be exported into a .csv file in the manner it seems you're looking for.

Don't use the vlookup() for matching two criteria.  You should be able to do the same thing with nested index(match()) formula.

My understanding is that your original data is in three columns (part name, duplicated; series of property names, duplicated for each part; data, unique to each part name & property name), and that the table you've created has the part name (unique values only) in column A, and the property names in row 1, and you want to create a table pulling the values from column C in the original dataset to populate interected cells in the new dataset.

try the following (dry-coded, you will likely need to play with the cell references)

=index(sheet1!$a$1:$c$600000,match($A2&B$1,sheet1!$A$1:$A$60000&$B$1:$B$600000,0),3)

This should (I believe) pull the value from column C that corresponds to the value of the part name in the first column on the second sheet (you should put your new dataset in a new sheet) and the property name in row 1

Copy the formula to all the cells you want data in, then select the range of data, press F2 to open one of the cells for data entry and Ctrl+Shift+Enter to enter all the cells as an array formula.

Without a select of the dataset, it's the best I can do unfortunately.

 

If it works, select all the cells in the new worksheet, copy and paste special to paste values and overwrite the formulas because that many index formuli will seriously bog shit down

 

Edit:

I figured I'd get around to testing this out and seeing if it would work.  It does and the formula I used for my test data was {=INDEX(Sheet1!$A$1:$C$15,MATCH($A2&B$1,Sheet1!$A$1:$A$15&Sheet1!$B$1:$B$15,),3)}.  Enter it into one cell as an array, then copy right and down for the rest of the range. Obviously, the ranges and references would need to be changed.  I tried to attach a copy of the file, but it seems we can't, which is probably for the best considering some of the people here. (I'm looking at you, denizens of site chat and  links to lemonparty.org as a "halo stat site, check it out!"  You know who you are)

Join our Universe

Connect with 2o2p