Recently a blogger wrote an article comparing mailing list activity in the communities around major open source infrastructure projects. It was a personal project, put together from various data sources available on the internet. But the post kickstarted a debate among the punditry about whether OpenStack or CloudStack is the top-ranking infrastructure project, with each side pointing to the metrics that suit them. Even though the blogger's intentions were completely different, the cloud chatterati (or clouderati or pundits or whatever term you want to use here) are going wild analyzing these metrics. Not to be left out, I thought I would jump in and offer my analysis of this analysis 🙂
What are the usual metrics used to study open source projects and why it matters
Though different folks lean on different metrics to make their point, the most popular among them are:
- Software downloads
- Mailing list activity
- Production deployment claims
- Conference/PR case studies
- Github activity
- Google Trends
- Job board metrics
There are other metrics too, but these are the ones most widely used in any discussion or debate. In this post, I will offer my thoughts on some of these metrics and see if the discussion eventually leads to a more comprehensive metric that can be used to measure the health of an open source project.
To begin with, I want to dismiss two of these metrics right away as completely meaningless: software download numbers and mailing list activity (the metric behind the current brouhaha). Even though software downloads give some idea of the interest in a project, they offer no granular information such as how many downloads are intended for production systems, how many led to installations, how many of those installations are actually being used, etc. Without this insight, the number of software downloads means nothing. Similarly, activity on the mailing list is another meaningless metric. Even though mailing lists are like oxygen for open source projects, the activity on these lists can range from serious discussion to unnecessary flame wars. In fact, on many mailing lists flame wars are a daily affair. Measuring the activity says nothing about the project's health. If mailing list activity were an accurate indication of a project's health, we could conclude that OpenStack was healthiest during the board elections fiasco. #justsayin.
There are some suggestions that metrics on production deployments can accurately describe the health of an open source project. Yes, they describe it accurately from the user's point of view, but they are still not a good metric. First, it is difficult to get this data uniformly for all open source projects; many end users may not be willing to talk publicly about how they use a project. Second, even if we assume we can get the numbers accurately, they only describe usage of the project and not developer activity. To understand the actual health of an open source project, we need data on both usage and developer contributions. Not only is this metric difficult to obtain, it gives only partial information about a project. The same argument applies to conference/PR case studies.
Github activity, including pull requests, commits, number of lines of code, etc., is a good indicator of developer activity, but it says nothing about actual usage of the software. Though it is possible to make an indirect inference about usage from developer activity (why would developers spend their time if there were no traction for the software?), it still doesn't give accurate information on how the software is used. Github activity is easier to obtain, but it still gives only partial information on the health of a project. Google Trends offers some interesting insight into the interest in a particular project, but it too is partial information. Job board metrics (like the number of job openings on sites like indeed.com) are a very good signal of actual usage of the software, but still an indirect one.
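As an aside, the developer-activity side of this is easy to sample. Here is a minimal sketch that pulls a few coarse numbers from the public GitHub API; the repository named below is just an example, and a real comparison would need authentication and pagination:

```python
# Minimal sketch: pull a few coarse developer-activity numbers from the
# public GitHub API. The repository below is only an example; unauthenticated
# calls are rate-limited and the contributors endpoint is paginated, so a
# real comparison would need authentication and pagination.
import json
import urllib.request

def repo_activity(owner, repo):
    base = "https://api.github.com/repos/{}/{}".format(owner, repo)
    with urllib.request.urlopen(base) as resp:
        meta = json.loads(resp.read().decode("utf-8"))
    with urllib.request.urlopen(base + "/contributors") as resp:
        contributors = json.loads(resp.read().decode("utf-8"))
    return {
        "forks": meta.get("forks_count"),
        "open_issues": meta.get("open_issues_count"),
        "contributors_on_first_page": len(contributors),
        "top_contributor_commits": contributors[0]["contributions"] if contributors else 0,
    }

if __name__ == "__main__":
    print(repo_activity("apache", "cloudstack"))  # example repository
```

Even a few minutes with numbers like these makes the point above: they tell you who is writing code, not who is running it.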
So, if someone is interested in getting a grip on the health of an open source project, it is important that they take into account many relevant metrics so that they can build an accurate story covering all the bases. Talking about the health of a project based on a single metric is meaningless. It is definitely a waste of time to talk about the health of a project based on metrics like the number of software downloads or mailing list activity. #justsayin.
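To make the "many metrics" point concrete, here is a minimal sketch of how several normalized signals could be folded into one rough score. The metric names and weights are purely illustrative assumptions on my part, not a recommendation:

```python
# Sketch: combine several normalized signals into a single rough "health"
# score. The weights and metric names are illustrative assumptions; the
# point is only that no single number tells the whole story.
WEIGHTS = {
    "commit_activity": 0.3,
    "contributor_count": 0.2,
    "deployment_reports": 0.2,
    "job_postings": 0.2,
    "search_interest": 0.1,
}

def health_score(normalized_metrics):
    """normalized_metrics: dict of metric name -> value scaled to 0..1."""
    return sum(WEIGHTS[name] * normalized_metrics.get(name, 0.0)
               for name in WEIGHTS)

example = {  # hypothetical, already-normalized values
    "commit_activity": 0.8,
    "contributor_count": 0.6,
    "deployment_reports": 0.4,
    "job_postings": 0.5,
    "search_interest": 0.7,
}
print(round(health_score(example), 2))  # -> 0.61
```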
We definitely need more meaningful signals built into Github. I measure API SDKs and libraries using the metrics you describe, and they are definitely not enough. I would love to see deeper discussions about which connections around downloaded and forked code are meaningful.
I agree with your point that only a mixture of metrics covering users, media, development and more is meaningful for assessing the health of a project. You may want to check out http://talesfromthecommunity.wordpress.com/2012/06/16/viewing-communities-as-funnels/
I use a mixture of about 30 metrics in my community funnel and only look at long-term trends (within my project). I find that useful to identify problems, design remedies and see whether they are having an impact.
Comparing one community against another is an entirely different matter, a much harder problem fraught with difficulties. It also leaves the door open to spinning the results one way or another. For example, I had a go at comparing developer activity on KVM and Xen last week, which I didn't publish. On the face of it, this should be quite simple. BUT it actually turns out to be extremely hard. For example, KVM re-uses a lot of the Linux kernel as well as most of QEMU. So does Xen: but architectural differences between the two projects mean that the KVM community optimizes and develops a lot more code in QEMU, whereas the Xen community focuses on only a small portion of QEMU and is otherwise content with what others do in QEMU. Even though both projects use most of QEMU, would it be fair to credit all activity in QEMU to both KVM and Xen? It gets worse if you look at codelines: for example, the KVM codebase is essentially a clone of the Linux kernel (for convenience). So do I equate KVM with the Linux kernel, or do I pick out the files and directories that actually make up KVM? And it goes on and on when you try to do this seriously.
In essence you have to answer the question of what constitutes the project and what you actually count. An almost intractable problem, in particular when you have complex software with dependencies, maybe even implemented in different languages. The lesson clearly is that even when different projects solve a similar problem, their software architecture, the motivation of the people behind the projects, their culture, and decisions made in the past skew the metrics and can make a direct comparison of two projects hard or even meaningless.
+1 to Lars. Measuring one community is complex but doable. You need all of the dimensions mentioned by Krishnan and more (like "are there books published on the project?"). You don't want to look at raw download numbers, but you do want to look at trends: are downloads increasing, decreasing or stalling? That is a more meaningful dimension than the pure number.
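For what it's worth, one trivial way to classify such a trend is to fit a least-squares slope to the monthly counts; the numbers in the sketch below are made up purely to illustrate the idea:

```python
# Sketch: classify a download series as increasing, decreasing or stalling
# by fitting a least-squares slope to monthly counts. The numbers here are
# made up purely to illustrate the idea.
def trend(counts, tolerance=0.02):
    n = len(counts)
    xs = range(n)
    mean_x = sum(xs) / float(n)
    mean_y = sum(counts) / float(n)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts)) / \
            sum((x - mean_x) ** 2 for x in xs)
    relative = slope / mean_y if mean_y else 0.0
    if relative > tolerance:
        return "increasing"
    if relative < -tolerance:
        return "decreasing"
    return "stalling"

monthly_downloads = [1200, 1350, 1280, 1500, 1620, 1580]  # hypothetical data
print(trend(monthly_downloads))  # -> "increasing"
```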
Comparing different communities is a radically different topic. I think Qyinye is doing a good job trying to make this comparison, and I look at his research only to see the trends for the other projects. If there are radical changes (spikes or valleys), I go look at the data sources to understand what really happened: was there a flame war? Are commit logs/requests for review being sent to the regular mailing list?
This has been a big interest of mine.
For those talking about git numbers, gitstats is pretty good — but could possibly benefit from different output modes — http://gitstats.sourceforge.net/ — and an older attempt I made — https://github.com/mpdehaan/lookatgit. My version is too object oriented though and will break down if you run it against the kernel, and probably has some stats errors 🙂
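A stripped-down sketch of the same idea, counting commits per author email straight from `git log`, looks roughly like this (run it inside any clone; it assumes the git binary is on the PATH):

```python
# Sketch: count commits per author email from `git log`, similar in spirit
# to gitstats/lookatgit but far smaller. Run inside any git clone; assumes
# the `git` binary is on the PATH.
import subprocess
from collections import Counter

def commits_by_author(repo_path="."):
    out = subprocess.check_output(
        ["git", "log", "--pretty=format:%ae"], cwd=repo_path
    ).decode("utf-8", "replace")
    return Counter(line.strip() for line in out.splitlines() if line.strip())

if __name__ == "__main__":
    for author, count in commits_by_author().most_common(10):
        print("{:6d}  {}".format(count, author))
```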
I previously also wrote a mailing list scanner for Red Hat — to determine which project lists were reasonably healthy — though it had some flaws. The main thing I wanted to track was traffic by user domain (to decide outside interest vs internal company interest), but this is difficult because lots of folks (including me) use personal accounts. One goal I never really achieved in all of that was managing the “people who post here also post there”, which is one thing that I think would be very interesting to gather. The other goal was to connect the two apps together to form more complete views into OSS communities.
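A bare-bones version of that kind of scanner, tallying posts by sender domain from a local mbox archive, might look like the sketch below (the archive path is just a placeholder, and as noted above, personal accounts will still skew the picture):

```python
# Sketch: tally mailing-list posts by sender domain from a local mbox
# archive, roughly the idea behind the scanner described above. The archive
# path is a placeholder; personal email accounts will still skew the result.
import mailbox
import re
from collections import Counter

def posts_by_domain(mbox_path):
    domains = Counter()
    for msg in mailbox.mbox(mbox_path):
        sender = msg.get("From", "")
        match = re.search(r"@([\w.-]+)", sender)
        if match:
            domains[match.group(1).lower()] += 1
    return domains

if __name__ == "__main__":
    for domain, count in posts_by_domain("list-archive.mbox").most_common(10):
        print("{:6d}  {}".format(count, domain))
```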
In any event, the character of individual communities can be very different — depending on the user base, folks may like to discuss things a lot, or less — and real discussions may happen on IRC, in hallways, in github pull requests, or in person. Mailing lists are hard to look at. One of the trickier things is that the list with more traffic can sometimes be the one with more churn (versus forward momentum), or the one with a lot of confused or new users — and there's no good way to tell which is which programmatically from the data alone.