(This was an attempt at a Christmas picture from last weekend. How big is baby Stella? Soooo big. Not big enough to lie down with her brothers and stay put, but soooo big.)
Oh, Google, sometimes you are stupid. You are sending people to my post called "More R Tricks" even though the tricks therein are not remotely tricksy to someone googling "R tricks." I should amend it so the first line says "Keep on googling, because you are in the wrong place. Unless you are an English major."
I do know that these posts have been useful to at least a couple of people, so I am going to keep writing them from time to time. One of the things that made me scratch my head when I was first contemplating my big messy dataset was the need to merge in variables from a couple of even larger and messier datasets. It turns out to be easy-peasy. You can find helpful merging directions here, but this is what I do.
First, I make a new object with just the ID variable and the variables of interest. I am working with a subset of a giant sample (600 kids, 1500 variables), and I could wind up with a boatload of superfluous data. So:
> attach(big.sample) [this lets me work with variable names directly, instead of typing big.sample$ before each one]
> subsample <- data.frame(ID1, IQ, digit.span, zygosity)
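If you'd rather skip attach(), there are two good reasons to: it works on a copy, so later changes to big.sample won't show up, and the attached names can mask other objects that share them. Here's a minimal sketch that grabs the same columns by name instead. The toy big.sample is made up for illustration; only the column names come from this post.

```r
# Toy stand-in for big.sample (made-up values, for illustration only)
big.sample <- data.frame(
  ID1        = c(101, 102, 103, 104),
  sex        = c("F", "M", "F", "M"),
  IQ         = c(98, 112, 105, 120),
  digit.span = c(6, 7, 5, 8),
  zygosity   = c("MZ", "DZ", "DZ", "MZ")
)

# Select columns by name, no attach() needed
keep <- c("ID1", "IQ", "digit.span", "zygosity")
subsample <- big.sample[, keep]

# The same thing with with(), which looks the names up inside big.sample
subsample2 <- with(big.sample, data.frame(ID1, IQ, digit.span, zygosity))

names(subsample)  # "ID1" "IQ" "digit.span" "zygosity"
```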
Now I have the variables of interest in a smaller data frame called "subsample." I could also do it by column number, if I wanted, as discussed in the post linked above, with a command like this:
> subsample <- big.sample[, c(1, 600, 602, 14)] [this would give me all the rows in those four specified columns]
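Counting to column 602 by hand is error-prone, and the numbers go stale if the columns ever get reordered. Here's a sketch, assuming a small made-up big.sample, that has match() look up the column numbers for you:

```r
# Toy data frame (made-up values); its columns sit at positions 1 to 5,
# not the 1, 600, 602, 14 of the real dataset
big.sample <- data.frame(
  ID1        = c(101, 102, 103),
  sex        = c("F", "M", "F"),
  IQ         = c(98, 112, 105),
  digit.span = c(6, 7, 5),
  zygosity   = c("MZ", "DZ", "DZ")
)

# Find the column numbers from the names instead of counting
cols <- match(c("ID1", "IQ", "digit.span", "zygosity"), names(big.sample))
cols  # 1 3 4 5

# Empty row index = all rows; the vector picks the columns, in that order
by.number <- big.sample[, cols]
by.name   <- big.sample[, c("ID1", "IQ", "digit.span", "zygosity")]
identical(by.number, by.name)  # TRUE
```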
Next I'm going to merge those three new variables into my smaller dataset, year1. I'm going to tell R to match by participant ID and keep only the IDs that appear in both datasets, like this:
> year1 <- merge(year1, subsample, by = "ID1", all = FALSE) [all = FALSE is also merge's default, so you could leave it out]
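To see what all = FALSE actually does, here's a toy version with made-up IDs, where a couple of kids show up in only one of the two datasets:

```r
# Toy versions of the two datasets (made-up IDs and values)
year1 <- data.frame(
  ID1 = c(101, 102, 103, 105),
  sex = c("F", "M", "F", "M")
)
subsample <- data.frame(
  ID1        = c(101, 102, 103, 104),
  IQ         = c(98, 112, 105, 120),
  digit.span = c(6, 7, 5, 8),
  zygosity   = c("MZ", "DZ", "DZ", "MZ")
)

# all = FALSE keeps only the IDs found in both data frames
merged <- merge(year1, subsample, by = "ID1", all = FALSE)
merged$ID1    # 101 102 103 (105 is only in year1, 104 only in subsample)
nrow(merged)  # 3
names(merged) # "ID1" "sex" "IQ" "digit.span" "zygosity"
```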
If you specify "all = TRUE," as in the examples I linked to above, you will wind up with every ID from both datasets, with NA filled in wherever one dataset has no match for the other. Since the command as written tells R to overwrite your old dataset with the new one, a mistake here could induce a fair amount of teeth-gnashing. You might prefer to create a transitional object while you are getting the hang of it:
> year1.for.nervous.nellies <- merge(year1, subsample, by = "ID1", all = FALSE)
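Here's a sketch (toy data again, all values made up) of checking a transitional object before you commit, plus what all = TRUE and its one-sided cousin all.x = TRUE do:

```r
year1 <- data.frame(ID1 = c(101, 102, 105), sex = c("F", "M", "M"))
subsample <- data.frame(ID1 = c(101, 102, 104), IQ = c(98, 112, 120))

# Inner merge into a transitional object, leaving year1 untouched
inner <- merge(year1, subsample, by = "ID1", all = FALSE)

# Which year1 IDs would be lost if year1 were overwritten with the inner merge?
setdiff(year1$ID1, inner$ID1)  # 105

# all = TRUE keeps every ID from both data frames, filling gaps with NA
outer <- merge(year1, subsample, by = "ID1", all = TRUE)
outer$ID1  # 101 102 104 105
outer$IQ   # 98 112 120 NA

# all.x = TRUE keeps every year1 row but drops IDs found only in subsample
left <- merge(year1, subsample, by = "ID1", all.x = TRUE)
left$ID1   # 101 102 105
```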
If your identifying variables have different names in their respective datasets, there is no need to fret. Just do it like this:
> year1 <- merge(year1, subsample, by.x = "ID", by.y = "ID1", all = FALSE) [by.x is the ID variable's name in the first dataset, by.y its name in the second]
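A quick toy check of the by.x/by.y version, with made-up IDs. Note that the merged ID column keeps its name from the first dataset:

```r
year1 <- data.frame(ID = c(101, 102, 103), sex = c("F", "M", "F"))
subsample <- data.frame(ID1 = c(101, 103, 104), IQ = c(98, 105, 120))

# Match year1$ID against subsample$ID1
merged <- merge(year1, subsample, by.x = "ID", by.y = "ID1", all = FALSE)
names(merged)  # "ID" "sex" "IQ"; the key keeps the first data frame's name
merged$ID      # 101 103
```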
Now you can check and see if it worked:
> names(year1)
The last three items should be IQ, digit.span, and zygosity. Now suppose you want some frequency counts for your snazzy new data. Do the boys and the girls shake out equally as identical and fraternal twins? All you need to do is type this--
> table(year1$sex, year1$zygosity)
--and R will tell you all about it.
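Here's a sketch with a made-up year1 showing what comes back, plus two base R helpers for turning the counts into totals and proportions:

```r
# Toy year1 (made-up values)
year1 <- data.frame(
  sex      = c("F", "M", "F", "M", "F", "M"),
  zygosity = c("MZ", "MZ", "DZ", "DZ", "DZ", "MZ")
)

# Rows are sex, columns are zygosity; with() also labels the two dimensions
counts <- with(year1, table(sex, zygosity))
counts["F", "DZ"]  # 2
counts["M", "MZ"]  # 2

addmargins(counts)     # adds row and column totals
prop.table(counts, 1)  # proportions within each sex, so each row sums to 1
```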