Remember those posts I used to write about R, my stats program? This is a post about R. I am going to see if I can make this interesting, but you'll have to let me know if I succeed.
I am doing a presentation in June that involves a bigger sample than most of my recent work -- about 700 kids. As the database gets bigger, it's easier to get lost in it or to miss goofy things that would be more obvious in a smaller sample.
Weird fact about me: I love playing in data. I decided to get a PhD because I thought I would love teaching. And I do -- I had a blast in my class this morning, and I think they did too. (I mean, I know it's not really about having a blast. If I can cover the material and keep them laughing, I think that's a win-win.) I didn't know how much I would enjoy the research side of things.
I'm looking at a novel measure of language development that I devised for my dissertation. Like most measures of language development, it's sensitive to age. I wanted to see what was left if I pulled out the effects of age in my sample. In other words:
> newdata$stres <- rstandard(lm(new.measure ~ age, data = newdata, na.action = na.exclude))
That last little bit cost me some head-scratching: na.action = na.exclude still leaves anybody with missing data out of the model, but it holds their place with an "NA" in the results. That way the residuals come back with one entry per row, and I can match up the right kid with his very own standardized residual. (Don't you hate it when somebody gives you the wrong standardized residual?)
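If you want to see the difference for yourself, here's a minimal sketch with a toy data frame. The name "toy" and every number in it are invented for illustration; none of this is my real data.

```r
# Toy data, invented for illustration: six kids, one missing the measure.
toy <- data.frame(
  age         = c(24, 30, 36, 42, 48, 54),
  new.measure = c(3.1, 4.0, NA, 5.2, 6.1, 6.3)
)

# Default handling (na.omit) drops the incomplete row, so the residual
# vector is shorter than the data frame.
res.omit <- rstandard(lm(new.measure ~ age, data = toy))
length(res.omit)   # 5

# na.exclude fits the same model but pads the result with NA,
# so there is one residual per row, in row order.
res.excl <- rstandard(lm(new.measure ~ age, data = toy,
                         na.action = na.exclude))
length(res.excl)   # 6

# Lines up row by row; the third kid gets NA.
toy$stres <- res.excl
```

With the default, that final assignment would fail, because five residuals can't fill a six-row column.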
Next up, I wanted to get each family's data on one line -- both twins, one row. I probably shouldn't feel like a genius that I figured this out, but here's the truth: if I had a roll of aluminum foil in my office I would make a little medal that said "CONQUEROR." Here's the trick: if you separate the firstborn twins (the odd-numbered IDs in my data)--
> newdata.odd <- subset(newdata, ID %% 2 > 0)
--from the second-born twins--
> newdata.even <- subset(newdata, ID %% 2 == 0)
--then you can merge them back together by family ID! TADA!!
> newdata.merge <- merge(newdata.odd, newdata.even, by = "familyID", all = TRUE)
In the SPSS version of this syntax that I was combing through, you have to rename all the variables yourself. But R knows better than to give you multiple columns with the same name: when a variable shows up in both halves, merge() tacks ".x" onto the copy from the first data frame and ".y" onto the copy from the second. You'll have newdata.merge$stres.x for co-twin A, and newdata.merge$stres.y for co-twin B. And if you want to estimate genetic vs. environmental influences on your shiny new measure, you're most of the way there.
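Here's the whole split-and-merge routine as a small sketch you can run. The toy data frame and its values are made up for illustration, and I'm assuming the same convention as above: odd ID for the firstborn, even ID for the second-born. The suffixes argument is optional; it just swaps the default labels for ones I find easier to remember.

```r
# Toy data, invented for illustration: two complete twin pairs plus one
# family where only the firstborn (odd ID) has data.
toy <- data.frame(
  ID       = c(1, 2, 3, 4, 5),
  familyID = c(101, 101, 102, 102, 103),
  stres    = c(0.5, -0.2, 1.1, 0.9, -1.3)
)

firstborn  <- subset(toy, ID %% 2 == 1)   # odd IDs
secondborn <- subset(toy, ID %% 2 == 0)   # even IDs

# all = TRUE keeps families with only one twin, filling the gap with NA.
# suffixes replaces the default ".x"/".y" labels on shared column names.
wide <- merge(firstborn, secondborn, by = "familyID",
              all = TRUE, suffixes = c(".A", ".B"))

names(wide)   # "familyID" "ID.A" "stres.A" "ID.B" "stres.B"
```

Family 103 still gets its row, with NA where the missing co-twin's values would go. Leave out all = TRUE and that family disappears from the merged data.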
I bet you can hardly stand the excitement! Preliminary results suggest that shared environment is more important for this conversational language measure than for other related measures. Extra! Extra! Read all about it in two years when I get it written up and wangled through peer review!
If you are still reading, I appreciate it! (And if you know more stats than I do and you spotted something goofy, let me know.) If you skimmed down to this paragraph in hopes that it might contain something of actual interest, no hard feelings. Regularly scheduled programming will resume soon.
Who knew a post about statistics software could contain this many exclamation points?