Have you seen the LibreOffice stats shown at FOSDEM? The project has had a lot of momentum from the very beginning and seems to be doing well. I’d like to see the source of those stats, though, to compare how they build their report with how we build ours.
-
Andrés
-
Andrés
«Just as demagogues may subvert democracy, so self-promotion may subvert meritocracy.»
Open Source Projects and the meritocracy myth -
Andrés
Analysis of free software communities (III): activity and manpower
- Images: on the left, the number of changes to the codebase (commits) aggregated by year. On the right, the number of developers with at least 1 commit that year.
- Data: trunk from project repositories during the period 1999-2010.
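As a rough illustration of how the two series in the charts could be derived from a repository log, here is a minimal sketch. It assumes the log was exported with `git log --date=short --pretty=format:'%ad|%ae'`; the sample lines and the function name `activity_by_year` are hypothetical and are not taken from the study's own scripts.

```python
from collections import defaultdict

# Hypothetical sample of `git log --date=short --pretty=format:'%ad|%ae'` output.
SAMPLE_LOG = """\
2009-03-14|alice@example.org
2009-07-02|bob@example.org
2009-11-30|alice@example.org
2010-01-05|alice@example.org
"""


def activity_by_year(log_text):
    """Return {year: (commits, developers)} from 'YYYY-MM-DD|author' lines."""
    commits = defaultdict(int)
    authors = defaultdict(set)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        date, author = line.split("|", 1)
        year = int(date[:4])
        commits[year] += 1
        # A developer counts for a year as soon as they have 1 commit in it.
        authors[year].add(author.strip().lower())
    return {year: (commits[year], len(authors[year])) for year in sorted(commits)}


if __name__ == "__main__":
    for year, (n_commits, n_devs) in activity_by_year(SAMPLE_LOG).items():
        print(year, n_commits, n_devs)
```

Identifying developers by e-mail address is itself an assumption: the same person can commit under several identities, which inflates the developer count.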
Is there something we can extrapolate from the data?
Certainly not the number of features developed or bugs fixed. It is barely even possible to compare activity between projects, as there is high variability in changesets: some people send several small changesets where others send one big change, and some projects have policies that affect the results (e.g., making one commit that reformats the code according to the style rules and another with the actual changes). Some people could even argue that the language a project is written in affects the number of changes (GRASS is written in C, gvSIG in Java and QGIS in C++) because of the libraries available or the semantics of each language. So, is it possible to find out anything? Well, in my opinion, we can trace at least the following:
- the internal evolution of a project.
- how a project is doing in terms of adding new blood.
So, once again, let’s try to find out what’s happening here:
GRASS
- The activity curve of the project is striking: growth in periods (2001-2004 and 2005-2007) with local maxima in 2004 and 2007. Our hypothesis was that this was due to the way the project works: the developers make changes both in trunk and in the branch of the product to be released (be it 6.4 or 6.5) at the same time, with a lot of changesets moved between trunk and the branches (that is, heavy backporting). In a recent conversation, Markus Neteler explained to me in more detail how they work, and I guess the rhythm we see in the graphs is due to that.
- In terms of number of developers, GRASS showed continuous growth until 2008; since then, the number of regular developers has stabilized.
gvSIG
- gvSIG shows an incredibly high period of activity during 2006-2008 (4500 changesets per year and more than 30 people involved!). To understand the bell-shaped activity curve, you need to know the background of the project: gvSIG development has been contract-driven, which means that all activities (planning, development, testing, etc.) were driven by the needs of the client who paid for them. Only recently have these processes been opened to a broader community (firms and volunteers collaborating in the project within the gvSIG Association). So it makes sense that the early years saw less activity (heavy planning phases) and that afterwards the project managed to bring together so many people in such a short period of time.
- But in 2010 development came to a sudden stop (only 233 changes to the codebase were made, compared with a pace of 4500 changes per year in previous years). This decrease in activity is highly correlated with the number of developers involved. It’s hard to say why it happened: could it be that efforts were redirected to gvSIG 2.0 development? Could it be due to the reorganization of the project and the creation of the gvSIG Association? Little can be said in this respect with the data available; further research is required to determine that.
QGIS
- Steady growth both in contributions and in contributors. The years 2004 and 2008 mark two peaks of activity and of people participating in development. Our preliminary hypothesis was that this was due to the release of the first stable version and the release of 1.0, as well as QGIS becoming an official OSGeo project. Gary Sherman has confirmed that in a recent post (history of QGIS committers) and an interview (part 1 and part 2). Besides, he pointed out that in 2007 the project added Python support for plugin development, which was possibly one of the reasons for the growth in 2008 and afterwards.
- An interesting finding is that every 4 years the project has doubled the number of developers involved, with slower but steady growth in activity.
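To put the "doubling every 4 years" observation in more familiar terms, it can be converted into an equivalent compound annual growth rate. This is only a back-of-the-envelope sketch: it assumes growth is evenly spread across the years, which the charts do not guarantee, and the function name is illustrative.

```python
def annual_growth_rate(factor, years):
    """Compound annual rate that multiplies a quantity by `factor` in `years` years."""
    return factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = annual_growth_rate(2, 4)  # doubling in 4 years
    print(f"{rate:.1%}")  # prints 18.9%
```

In other words, doubling the number of developers every 4 years is equivalent to growing by roughly 19% each year.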
Well, I hope these graphs have helped us better understand each project’s activity and the manpower it is able to gather around it. The next posts in the series will focus on the developers involved and the culture surrounding them. Looking forward to your feedback! -
Jorge
Regarding gvSIG, I guess you were looking at the gvSIG main repo, I don’t know. Just for the record, I want to note that gvSIG 2.0 development has been split across several OSOR projects and, because of Maven modularity, there are many different locations where activity happens. César Ordiñana has been maintaining the list of repos at the gvSIG Desktop 2.0 entry at Ohloh http://www.ohloh.net/p/gvsig-desktop-2/enlistments
I agree that gvSIG development has decreased in activity from “main contracts”, but contributions are increasing from the small contracts that public administrations award to improve specific parts of the products. I like this way a lot, as it demonstrates how well decision makers in public bodies have come to understand what free software is (paying for improvements and maintenance, not just for new ultra-cool features).
There is more to discuss here but, well, it’s enough for a blog comment
Nice reports!!
-
amaneiro
Yep, the report depicts the activity in gvSIG 1.X line.
-
Andrés
Analysis of free software communities (II): adoption trends
Find below the statistics for mailing list activity in GRASS, gvSIG and QGIS during the period 2008-2010. The first one shows data from the general user mailing lists for each project. Take into account that the data for gvSIG aggregate both the international and the Spanish mailing lists, for the reasons stated here.
The next one shows the same data (number of people writing and number of messages by month) for the developer mailing lists.
Is there something we can extrapolate from the data?
Well, certainly not the user base. The data only hint at the trends, not the real user base. The model we adopted to study the projects reflects just a part of the community (arguably the engine of the project), so don’t take the data as the number of users of each project. For sure, each one of our favorite projects has more users than those participating in (these) mailing lists!
Anyway, here is some food for thought:
- GRASS: it decreases smoothly both in number of messages and in people writing, and this happens among users and developers alike. The trend is not clear, though.
- gvSIG: the data show a steadily increasing number of users participating in the mailing lists. On the other hand, although it is the project with the most people subscribed to the developer mailing list, it shows the least activity of the three projects (in terms of number of messages in developer lists): few technical conversations seem to have happened through the mailing lists during that period.
- QGIS: according to the data, the community is clearly growing. In the period under study (3 years), the number of users and developers participating in mailing lists has doubled!
Andrés
Analysis of free software communities (I): a quantitative study on GRASS, gvSIG and QGIS
When selecting an application, it’s very common to weigh technological factors (what does the application enable us to do?) and economic ones (how much money do we need?). And yet there is a third factor to take into account, the social aspects of the project: the community of users and developers who support it and keep it alive.
In a series of posts beginning with this one, I’m going to show a quantitative analysis of the communities of 3 reference projects in the GIS arena: GRASS, gvSIG and QGIS. We selected them because they are seen as the most mature desktop GIS projects, they are under the OSGeo Foundation umbrella, and they differ in the actors who bootstrapped them and manage them today.
What have we done?
During its more than 25 years, the free software movement has delighted us with the great capacity a community-based model has for fostering creation and innovation. In recent years, that model has proved its viability in other areas too: content creation (Wikipedia), cartographic data creation (OpenStreetMap), book translation, etc. Yet little is known about “how to bootstrap and grow a community”. The only thing we can do is observe what others have done and learn from their experience.
To contribute to the understanding of how a community-based project works, I’ve worked with Francisco Puga and other people from Cartolab to put together some of the public information the projects generate and make some sense of it. The actors in a community interact with each other and, when that happens over the internet, a trail is left (messages to mailing lists carry author information and a date, version control systems log information about the authors too, …). Building on this public information, and standing on the shoulders of giants (i.e., reviewing a lot of research similar to what we wanted to build), we have developed a quantitative analysis of the communities supporting GRASS, gvSIG and QGIS.
How did we do it?
The first step was to evaluate and gather all the public information of each project, which we wanted to do in an automated way. But, as we had to compare the 3 projects, the data had to be homogeneous: it had to exist for all 3 and be in a comparable format. Taking these constraints into account (and the limited time we had for this!), we collected information from 2 different systems:
- Version control systems: for every project, we cloned all the information available in its repositories to a local git repo, in order to parse the log of changes. This allowed us to study the whole history of the projects, from the very beginning to December 2010.
- Mailing lists: by means of the mailingliststats tool (built mainly by our friend Israel Herráiz, thanks bro!) we gathered data from March 2008 to December 2010.
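For readers curious about what the mailing list side of this boils down to, here is a minimal sketch that counts messages and distinct senders per month using only Python's standard library. It is not how mailingliststats works internally (the source does not describe that); the sample messages and the function name `monthly_participation` are hypothetical.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical raw messages; a real study would read them from list archives.
RAW_MESSAGES = [
    "From: Ana <ana@example.org>\nDate: Mon, 03 Mar 2008 10:00:00 +0000\n\nHello",
    "From: Ben <ben@example.org>\nDate: Tue, 04 Mar 2008 11:30:00 +0000\n\nHi",
    "From: Ana <ANA@example.org>\nDate: Wed, 02 Apr 2008 09:15:00 +0000\n\nRe: Hello",
]


def monthly_participation(raw_messages):
    """Return {'YYYY-MM': (messages, distinct senders)} for the given messages."""
    messages = defaultdict(int)
    senders = defaultdict(set)
    for raw in raw_messages:
        msg = message_from_string(raw)
        sent = parsedate_to_datetime(msg["Date"])
        month = f"{sent.year:04d}-{sent.month:02d}"
        _, address = parseaddr(msg["From"])
        messages[month] += 1
        # Lower-case the address so the same sender is not counted twice.
        senders[month].add(address.lower())
    return {m: (messages[m], len(senders[m])) for m in sorted(messages)}


if __name__ == "__main__":
    for month, (n_msgs, n_people) in monthly_participation(RAW_MESSAGES).items():
        print(month, n_msgs, n_people)
```

For a real archive in mbox format, the standard library's `mailbox.mbox` class could supply the messages instead of the hard-coded list.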
Some disclaimers:
- Projects have a number of branches, plugins and so on. We focused the study on the main product: what a user gets when she downloads it. Further study of the plugin ecosystem is needed, and it will give us more fine-grained information.
- Projects have more mailing lists than those we have studied (translators, steering committee, other local/regional mailing lists, etc.), varying in each case. The analysis focused on the developer and user lists because we think they are representative enough to mark the trend. We are not interested in giving an exact number (which may be impossible to measure!) but in drawing the long-term fluctuation of participation. Our intuition and past experience say that participation in those mailing lists correlates with participation in the larger community surrounding the projects.
- In the particular case of the gvSIG user mailing lists, we have studied the Spanish and English lists jointly. It makes sense to do so, as the Spanish list still holds the core of contributions from Spanish-speaking countries, while non-Spanish speakers interact through the international list. It is as if the project had two hearts.
- Unfortunately, data quality has limited the period under study: the range is from March 2008 to December 2010. Before that, not all projects have information available, due to mailing list migrations.
What is it useful for?
It’s possible to analyze a community from a variety of points of view. Ours is a quantitative approach, by means of a common model which groups users by their level of participation:
- Leaders: those who build the product and make the decisions.
- Power users: those who adapt it to their needs and use it intensively.
- Casual users: those who use it for a specific task.
This approach allows us to better understand the size of the community and how its members interact, as the value provided by someone who sent only 1 mail to a mailing list in 6 months is not the same as that provided by someone who spent that time sending more than 100 patches to the code.
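To make the idea of grouping by level of participation concrete, here is a purely illustrative sketch. The rules (any commit makes a "leader", 10 or more messages make a "power user") and the threshold are assumptions made up for this example; they are not the criteria used in the study.

```python
def participation_level(commits, messages, power_user_messages=10):
    """Bucket one participant; the rules and threshold are illustrative assumptions."""
    if commits > 0:
        return "leader"        # contributes code to the product
    if messages >= power_user_messages:
        return "power user"    # frequent presence on the lists
    return "casual user"       # occasional presence on the lists


if __name__ == "__main__":
    # Hypothetical participants: (commits, mailing list messages) in the period.
    people = {"ana": (120, 40), "ben": (0, 25), "eva": (0, 1)}
    levels = {name: participation_level(c, m) for name, (c, m) in people.items()}
    print(levels)
```

Whatever the exact rules, the point is that the three groups are counted separately instead of being lumped into a single head count.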
With these constraints, we managed to build the following indicators:
- Adoption trend within users and developers: based on mailing list data.
- Status: post published.
- Activity and manpower: based on code contributions (commits).
- Status: post published.
- Composition of the community: based on code contributions (commits).
- Status: still to be published.
- Generational analysis: based on code contributions (commits).
- Status: still to be published.
Over the next weeks, I will be publishing the results of the study, to help us understand how different free software communities work and what we can learn from them. Stay tuned!
Coda
The results shown here are borrowed from a paper I led jointly with Francisco Puga, Alberto Varela and Adrián Eirís from Cartolab, a university GIS research laboratory based in A Coruña. The results were presented at the V Jornadas de SIG Libre, Girona 2010. If you are fluent in Spanish (reading or listening), you can benefit from these resources:
- (in Spanish) The complete paper [PDF].
- (in Spanish) The slides [PDF].
- (in Spanish) Video explaining the highlights - not my best performance though
-
Markus Neteler
Hi,
a quick bit of feedback: in table “Tabla 3: Top 10 desarrolladores – GRASS” the committers “markus” and “neteler” are the same person… that’s me. In a future version of the document, maybe put them together into one line as “markus|neteler”.
cheers
Markus Neteler
-
amaneiro
Yep, we suspected as much. Yours is not the only case, though, but we couldn’t find the time to research this in more depth (for example: asking the users themselves, matching the e-mail addresses, …).
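Once an alias like this one is known, folding it into the per-committer totals is straightforward. A minimal sketch, assuming a hand-maintained alias table; the commit counts below are made-up numbers and `merge_committers` is a hypothetical helper, not part of the study's tooling.

```python
from collections import Counter

# Alias -> canonical committer name. This single entry comes from the comment above.
ALIASES = {"markus": "neteler"}


def merge_committers(commit_counts, aliases):
    """Sum commit counts of aliased committers under one canonical name."""
    merged = Counter()
    for name, count in commit_counts.items():
        merged[aliases.get(name, name)] += count
    return dict(merged)


if __name__ == "__main__":
    # Made-up commit counts, for illustration only.
    counts = {"markus": 5, "neteler": 7, "someone": 3}
    print(merge_committers(counts, ALIASES))
```

The hard part remains building the alias table itself, which needs the kind of manual research mentioned above.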
-
-
Markus Neteler
A comment concerning the GRASS GIS repository. Of course it is a fact that the first version was published in 1984. But since neither a civilian internet nor any distributed versioning system existed then, its history is only traceable back to 1999. We decided to put GRASS into CVS the day before the famous “year 2000” bug… So slide 4 of your presentation should be corrected (likewise the document). See also http://wiki.osgeo.org/wiki/Open_Source_GIS_History
-
Markus Neteler
The “user trends” of just 2.x years (2008-2011) are too short for multi-year projects. Find the mailing list statistics since 1999 (note that the GRASS lists were started in 1992!) here:
http://markmail.org/search/?q=qgis
-
amaneiro
Oh, what an amount of data for a research junkie like me
I’ll compare that to our findings. Thanks!
-
-
Cameron Shorter
Hi,
I’m fascinated by studies such as the one you have described, as users are regularly asking us at LISAsoft for recommendations on which Open Source project they should use, and I’d love to be able to base my response upon some solid metrics. In particular, I’d love to be able to point people at metric results for all the 50-odd projects which have been included on the OSGeoLive DVD. http://live.osgeo.org
On a related note, I’ve written a more subjective description about the keys to success building the OSGeoLive community here: http://cameronshorter.blogspot.com/2011/06/memoirs-of-cat-herder-coordinating.html
Cameron Shorter
-
Barend Köbben
Putting the Spanish and English lists of gvSIG together is basically cheating… You should have included all non-English lists for all the software packages.
-
amaneiro
Barend, I don’t think so. The indicator tries to measure the trend, not the exact number. It’s very wrong to see it as if it were the user base, which I suppose was your point. If we tried to do the latter, summing both lists would be very inappropriate, as you suggest. But if you try the former, I think it makes sense, as the community is split into Spanish-speaking and English-speaking parts (which doesn’t happen in the other projects). Basically, the project has 2 hearts with activity, and the trend in one place can affect the other.
Although measuring all mailing lists would be the ideal situation, we couldn’t afford that.
Nevertheless, the trend is the same whether we aggregate both lists or take them into account separately, so it supports our initial guesses.
-
Andrés
Growing a community: some texts
I’ve long been passionate about community-oriented products: I’ve researched how they work, have led one to its goal and participate in some. It’s nothing new that they are considered a powerful way to build products (sometimes a better one than doing it through the market or internally in a firm/closed group of people). Nevertheless, I’m still looking for good resources to learn more. For those who like the topic, here are some I found useful (and I’d like to hear your recommendations!):
- Producing Open Source Software: the best book I’ve read on how to manage free software projects. It not only gives a good review of several tools, but also takes into account the policies, which are what give sense to the community and glue it together. Very practical.
- Coase’s Penguin, or Linux and the Nature of the Firm, by Yochai Benkler. The best academic text I’ve ever read on the matter. Benkler tries to explain why, in the 21st century, communities emerge as a new way to build products. You will find parallels with the text where Coase explained why firms emerged in the 19th century and replaced local markets as the preferred option. I think some more work is needed to formalize this concept in the academic arena, but the paper is clear and understandable, and lays the basis for further research. It’s a pioneer.
- Community antipatterns: a good talk by Dave Neary. Although it’s also focused on software development, I think it has lessons for communities in general. Sometimes, and much more so in recently discovered fields, we have no idea what has worked, but we know what has not.
- Others’ experiences. In particular, I’ve found these texts very useful:
- Contribute/BestPractices, by Mozilla.
- How Wikipedia works.
- Others I can’t remember now.
Andrés
The costs of not working upstream
Imagine the following case: you want to use a free software application to build your own ad-hoc solution on top of it. And you will do so many times, for different clients/products. How do you approach it? Do you build your solution with your improvements for yourself, modifying whatever is needed, or do you integrate your improvements into the upstream version, in the original project?
If that is your case, I recommend you read these 2 articles. The first focuses on the economic and social aspects, the second on the technical and social ones:
- The cost of going it alone, by Dave Neary. A good historical review with cases such as Softway with GCC (changes related to Windows NT), Nokia with GNOME (changes related to Maemo) or Google and IBM with the kernel (the former for changes in Android, the latter for changes related to drivers for handling virtual disks).
- Working with upstream: an interview with Laszlo Peter, by Stormy Peters. Laszlo Peter was a release engineer at Sun, that is, the person who had to make sure everything went well in each new Solaris release.
-
Nacho V
Jim Zemlin (Linux Foundation chief) talking about the issue of contributing back, “[It's] not the right thing to do because of some moral issue or because we say you should do it. It’s because you are an idiot if you don’t. You’re an idiot because the whole reason you’re using open source is to collectively share in development and collectively maintain the software. Let me tell you, maintaining your own version of Linux ain’t cheap, and it ain’t easy.”
http://www.networkworld.com/news/2011/083011-zemlin-250234.html
Andrés
«If we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”»
Edsger Dijkstra. Quoted in the article “Are all patches created equal?” by Jonathan Corbet, a must-read.
Andrés
«We (free software communities) already have a system much better than elections: you can choose which leader(s) you wish to follow and how much. If you want to be a leader, start leading, and see who wants to help.»
Richard Stallman, during an old interview.
Andrés
Don’t miss this post by Tim Sutton, QGIS release manager, which in a way summarizes the debates in the community over the last few months. Worth highlighting are 2 of the usual suspects: the complex balance between stability and new features, and the project’s funding model. As a companion post, it’s worth rereading how and why KCube Consulting donated 6 months of a developer’s time to the QGIS community.






cesare, 18:33 on 7 October 2011
Hi, very interesting post! Just one question: where did you find the data that you’ve used in the charts?
amaneiro, 17:29 on 8 October 2011
Hello Cesare, the data come from the code repository of each project. We parsed it and generated the stats. If you are interested in playing with them, find them here.