Two years ago I was taking a class in regression analysis. During the week that my second son turned eight and I needed to produce Halloween costumes for four children, I also had a big stats assignment due. I was doing this assignment with a freebie student version of SAS when the computer on which it lived died a sudden and yet harrowing death.
Because I live an hour from campus with its computer labs, I opted to complete the assignment using a powerful and yet deeply unfriendly tool called R. Say it with a pirate-y inflection: Aaaarrrrrrr. A mean and nasty kind of pirate inflection, like you'd hear from a hook-handed pirate with three species of lice dwelling in the beard he set ablaze. (Don't ask me how the lice manage to thrive in a flaming beard. Just go with it.) I blogged about it here and here -- it was No Fun.
Recently I was reflecting on this experience and thinking it was providential. R is exactly the software I want to use for my dissertation, but I needed a kick in the pants to get started. Today I started analyzing my data, which might make me say huzzah if it didn't make me say harrumph. It has not been a successful afternoon, in large part because I have forgotten lots of things I used to know about R.
In addition to remembering the forgotten stuff, I am also going to need some new R skills. I have just learned that missing data will cause R to seize up and curl into a ball, like our pugnacious pirate post-bayoneting. "I can't give you a correlation coefficient," it said to me scornfully, while the lice jumped ship in search of a less fiery home. The thing about longitudinal studies is that families drop out, which means I have missing data aplenty. AaaRRRRRgghh, I say. I might be saying that a lot in the coming weeks.
Lots of things have changed in the two years since I first tried to bend R to my will. My children are older, I have more of them, and I'm only months away from finishing my degree now. But one thing remains the same: the R manual is still written and edited by people with only a tenuous grasp of what a manual is actually for. Check out this paragraph from the section on dealing with missing values:
Notice that the logical expression x == NA is quite different from is.na(x) since NA is not really a value but a marker for a quantity that is not available. Thus x == NA is a vector of the same length as x all of whose values are NA as the logical expression itself is incomplete and hence undecidable.
I am a pretty peaceable person, my friends, but it's a good thing I don't have any pikes or cutlasses handy at this moment.
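For what it's worth, the manual's paragraph seems to boil down to this: comparing anything to NA with == gives NA back, while is.na() actually reports which entries are missing. Here is a minimal sketch of that, plus the correlation complaint from earlier. The vectors x, a, and b are made up for illustration, and use = "complete.obs" is only one of several options cor() accepts for missing data, not necessarily the right choice for a longitudinal study.

```r
# Made-up vector with one missing value (illustration only)
x <- c(1, NA, 3)

# == asks "is this equal to an unknown quantity?", so every answer is unknown
eq <- x == NA      # all three results are NA

# is.na() asks "is this entry missing?", which always has an answer
miss <- is.na(x)   # FALSE TRUE FALSE

# Made-up paired measurements, each with a gap
a <- c(1, 2, NA, 4, 5)
b <- c(2, 4, 6, NA, 10)

# With the default settings, any missing value makes the result NA
r_default <- cor(a, b)

# Keep only the rows where both a and b are present (rows 1, 2, and 5)
r_complete <- cor(a, b, use = "complete.obs")
```

Dropping incomplete rows is the bluntest fix available: it gets a number out of cor(), but every family that dropped out takes its data with it.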