Director, OpenShift Strategy at Red Hat. Founder of Rishidot Research, a research community focused on the services world. His focus is on Platform Services, Infrastructure and the role of Open Source in the services era. Krish has been writing at CloudAve since its inception and has also been part of the GigaOm Pro Analyst Group. The opinions expressed here are his own and are neither representative of his employer, Red Hat, nor CloudAve, nor its sponsors.

6 responses to “Open Source Metrics: Let Us Get Realistic”

  1. Kin Lane

    We definitely need more meaningful signals built into GitHub. I measure API SDKs and libraries using the metrics you describe, and they are definitely not enough. I would love to see deeper discussions about which connections are meaningful around downloaded and forked code.

  2. Lars Kurth (@lars_kurth)

    I agree with your point that only a mixture of metrics covering users, media, development and others is meaningful for assessing the health of a project. You may want to check out

    I apply a mixture of about 30 metrics to my community funnel and only look at long-term trends (within my project). I find that useful to identify problems, design remedies and see whether they are having an impact.

    Comparing one community against another is an entirely different matter: a much harder problem, fraught with difficulties. It also leaves the door open to spinning results one way or another. For example, I had a go at comparing developer activity on KVM and Xen last week, which I didn’t publish. On the face of it, this should be quite simple, but it actually turns out to be extremely hard. For example, KVM re-uses a lot of the Linux kernel as well as most of QEMU. So does Xen, but architectural differences between the two projects mean that the KVM community optimizes and develops a lot more code in QEMU, whereas the Xen community focuses on only a small portion of QEMU and is otherwise content with what others do in QEMU. Even though both projects use most of QEMU, would it be fair to credit all activity in QEMU to both KVM and Xen? It gets worse if you look at codelines: for example, the KVM codebase is essentially a clone of the Linux kernel (for convenience). So do I equate KVM with the Linux kernel, or do I pick out the files and directories that actually make up KVM? And it goes on and on when you actually try to do this seriously.

    In essence, you have to answer the question of what “constitutes” the project and what you actually count. This is an almost intractable problem, in particular when you have complex software with dependencies, perhaps even implemented in different languages. The lesson clearly is that even when different projects solve a similar problem, their software architecture, the motivation of the people behind the projects, their culture, and decisions made by a project in the past skew the metrics and can make a direct comparison of two projects hard or even meaningless.
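To make the "what do you actually count" question concrete, here is a minimal Python sketch, not taken from the comment above. It assumes commit data has already been extracted into `(commit_id, changed_paths)` pairs; the sample records and the function name `count_commits_touching` are made up for illustration, and the two path prefixes are simply one possible answer to "which directories make up KVM".

```python
def count_commits_touching(commits, prefixes):
    """Count commits that change at least one file under any given path prefix."""
    prefixes = tuple(prefixes)
    return sum(
        1
        for _commit_id, paths in commits
        if any(path.startswith(prefixes) for path in paths)
    )


# Hypothetical sample data: (commit id, files changed by that commit).
commits = [
    ("c1", ["arch/x86/kvm/mmu.c"]),
    ("c2", ["drivers/net/foo.c"]),
    ("c3", ["virt/kvm/kvm_main.c", "include/linux/kvm_host.h"]),
    ("c4", ["fs/ext4/inode.c"]),
]

# Treating the whole kernel clone as "the project": every commit counts.
whole_tree = count_commits_touching(commits, [""])

# Treating only selected directories as "the project" (an assumed choice).
kvm_only = count_commits_touching(commits, ["arch/x86/kvm/", "virt/kvm/"])

print(whole_tree, kvm_only)
```

The same history yields different activity numbers depending on the scope chosen, which is why the definition of the project has to be stated alongside any comparison.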

  3. smaffulli

    +1 to Lars. Measuring one community is complex but doable. You need all of the dimensions mentioned by Krishnan and more (like “are there books published on the project?”). You don’t want to look at download numbers, but you want to look at trends: are downloads increasing, decreasing or stalling? That is a more meaningful dimension than the pure number.

    Comparing different communities is a radically different topic. I think Qyinye is doing a good job trying to make this comparison and I look at his research only to see the trends for the other projects. If there are radical changes (spikes or valleys) I go look at the data sources to understand what really happened: was there a flame? are commit logs/requests for review being sent to the regular mailing list?
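As a rough illustration of looking at the trend rather than the raw number, the following Python sketch fits a least-squares line to a series of per-period download counts and labels the direction. The function name `classify_trend`, the 2% tolerance, and the sample figures are all assumptions made for this example, not something the comment prescribes.

```python
def classify_trend(counts, tolerance=0.02):
    """Label a series as increasing, decreasing or stalling.

    The slope of a least-squares line is divided by the mean count, so the
    tolerance reads as "average change per period, relative to typical volume".
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    covariance = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(counts))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    slope = covariance / variance
    relative = slope / mean_y if mean_y else 0.0
    if relative > tolerance:
        return "increasing"
    if relative < -tolerance:
        return "decreasing"
    return "stalling"


# Hypothetical monthly download counts.
print(classify_trend([100, 120, 140, 160]))
print(classify_trend([500, 480, 450, 430]))
print(classify_trend([1000, 1005, 995, 1000]))
```

A spike or valley would still need a look at the underlying data sources, as described above; a fitted slope only summarizes direction.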

  4. Michael DeHaan (@laserllama)

    This has been a big interest of mine.

    For those talking about git numbers, gitstats is pretty good, but could possibly benefit from different output modes. There is also an older attempt I made. My version is too object oriented though and will break down if you run it against the kernel, and probably has some stats errors 🙂

    I previously also wrote a mailing list scanner for Red Hat — to determine which project lists were reasonably healthy — though it had some flaws. The main thing I wanted to track was traffic by user domain (to decide outside interest vs internal company interest), but this is difficult because lots of folks (including me) use personal accounts. One goal I never really achieved in all of that was managing the “people who post here also post there”, which is one thing that I think would be very interesting to gather. The other goal was to connect the two apps together to form more complete views into OSS communities.

    In any event, the character of individual communities can be very different. Depending on the user base, folks may like to discuss things a lot, or less, and real discussions may happen on IRC, in hallways, in GitHub pull requests, or in person. Mailing lists are hard to look at. One of the trickier things is that the list with more traffic can sometimes be the one with more churn (versus forward momentum), or may have a lot of confused or new users, and there’s no real way to tell which is which programmatically from the data alone.
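The two mailing list measurements mentioned above, traffic by sender domain and "people who post here also post there", could be sketched in Python roughly as follows. This is not the scanner described in the comment. It assumes the `From` headers of each list have already been collected into plain lists of strings, and the helper names and sample addresses are hypothetical.

```python
from collections import Counter
from email.utils import parseaddr


def sender_address(from_header):
    """Extract the lower-cased email address from a From header value."""
    return parseaddr(from_header)[1].lower()


def domain_counts(from_headers):
    """Count posts per sender domain."""
    counts = Counter()
    for header in from_headers:
        address = sender_address(header)
        if "@" in address:
            counts[address.rsplit("@", 1)[1]] += 1
    return counts


def shared_posters(list_a, list_b):
    """Return addresses that appear as senders on both lists."""
    senders_a = {sender_address(h) for h in list_a}
    senders_b = {sender_address(h) for h in list_b}
    return (senders_a & senders_b) - {""}  # drop headers that failed to parse


# Hypothetical From headers from two project lists.
list_a = ["Alice <alice@example.com>", "bob@example.org", "Alice <alice@example.com>"]
list_b = ["ALICE@example.com", "Carol <carol@example.net>"]

print(domain_counts(list_a))
print(shared_posters(list_a, list_b))
```

The limitation raised in the comment applies directly: a contributor posting from a personal account is counted under that account's domain, so domain counts understate company involvement, and the same person using two addresses will not show up as a shared poster.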