Kieran Healy

Posted
29 January 2003 @ 1pm

Tagged
Sociology

The Ecology of Open Source Software

I’ve been working on a paper [pdf] about Open Source Software development with Alan Schussman, a graduate student in my department. It’s still in a very early state (we’re really just kicking around a few ideas), but some of the findings are interesting. The basic idea is to try to look at the social structure of the OSS development community a bit more closely. Here’s a neat picture of the basic finding, which is that OSS projects are fantastically skewed on several different measures of activity. (They have ‘power law’ type distributions.) We’re not the first people to notice this by any means, though I don’t know of other work that looks at as many projects as we do, and Internet-related research on power laws has focused on the topology of the Internet rather than the organization of software development communities. The working paper has much more by way of context, prior research and speculation about what explains this.

Here’s the abstract.

The Ecology of Open Source Software Development.

Kieran Healy and Alan Schussman.

University of Arizona.

Abstract: Open Source Software (OSS) is an innovative method of developing software applications that has been very successful over the past eight to ten years. A number of theories have emerged to explain its success, mainly from economics and law. We analyze a very large sample of OSS projects and find striking patterns in the overall structure of the development community. The distribution of projects on a range of activity measures is spectacularly skewed, with only a relatively tiny number of projects showing evidence of the strong collaborative activity which is supposed to characterize OSS. Our findings are consistent with prior, smaller-scale empirical research. We argue that these findings pose problems for the dominant accounts of OSS. We suggest that the gulf between active and inactive projects may be explained by social-structural features of the community which have received little attention in the existing literature. We suggest some hypotheses that might better predict the observed ecology of projects.


3 Comments

Posted by
Iain J Coleman
29 January 2003 @ 5pm

A very interesting paper. Just one wee word of warning. It’s pretty unsafe to infer anything about the power law characteristics of a distribution if you’ve only sampled it across a couple of decades (i.e. a couple of factors of ten) of data. In your paper, the main point about the distributions is their skewness, and the power-law-like behaviour is a side point. That’s fine, and I’ve no quibbles with it. I just want to caution you against going too far down the power-law route in future work, unless you’re sure your data set covers enough orders of magnitude to support such an interpretation.
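To make the "decades" point concrete, here is a minimal sketch of how one might check the span of a sample before reading a power law into it. The function name `decades_spanned` and the commit counts are hypothetical and purely illustrative; none of this comes from the paper or its data.

```python
import math


def decades_spanned(values):
    """Return how many factors of ten separate the smallest and
    largest positive values in a sample."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    return math.log10(max(positive) / min(positive))


# Hypothetical per-project activity counts (made up for illustration).
commits = [1, 2, 2, 3, 5, 8, 20, 45, 130]
print(f"sample spans {decades_spanned(commits):.2f} decades")
```

A sample that spans only about two decades, like this one, can look straight on a log-log plot without that telling you much about the underlying distribution.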


Posted by
Kieran Healy
29 January 2003 @ 6pm

Fair enough. As far as I can tell, Internet research (and social science more generally) offers little substantive interpretation of ‘power law’ type features such as the topology of the Internet. The stuff I’ve seen just says ‘Look at this shape: interesting, eh?’ but says much less about the mechanism producing that distribution. My main interest is in why so few projects are active, and what’s driving that, and (as you say) that point holds without making too much of the power law thing. It’s just highly skewed.


Posted by
Neel Krishnaswami
31 January 2003 @ 6am

A quick comment: there are many kinds of very important community assistance that don’t show up in the CVS logs. I’m doing a small language-development effort, and if you look at the logs I’m the only committer, and I commit relatively infrequently. However, that radically understates the amount of help I’ve gotten from the outside world.

There were a lot of theoretical CS papers that were essential to my design, and the authors have been uniformly receptive and helpful when I have emailed them about their papers. In fact, I’m applying to graduate school in CS this year, and it would be fair to say that all of my recommendation letters were made possible by meeting computer scientists who were interested in and supportive of what I have been doing. I guess this is a reputational effect like Tirole talks about, but I should point out that I got lots of help even before I started publishing my code, so community assistance is what got me to the point where I could start generating any reputational capital in the first place!

Incidentally, one substantial difference between computer science and economics is that you can find most papers for free in CS. Citeseer is free; JSTOR isn’t. This makes it much easier to track down references and try out state-of-the-art things when messing around with CS-type stuff.