Categories
All CartoLab English Software Libre

Analysis on free software communities (I): a quantitative study on GRASS, gvSIG and QGIS

Disclaimer: this post is part of a series: I (this one), II, III, IV and V.

When selecting an application, it’s very common to weigh technological factors (what does the application enable us to do?) and economic ones (how much money do we need?). Yet there is a third factor to take into account, the social aspects of the project: the community of users and developers who support it and keep it alive.

In the series of posts that begins with this one, I’m going to show a quantitative analysis of the communities of 3 reference projects in the GIS arena: GRASS, gvSIG and QGIS. We selected those because they are viewed as the most mature desktop GIS projects, they are under the OSGeo Foundation umbrella, and they differ in the actors who bootstrapped them and manage them today.

What have we done?

Over its more than 25 years, the free software movement has delighted us with the high capacity for fostering creation and innovation that a community-based model has. In recent years, that model has proved its viability in other areas too: content creation (Wikipedia), cartographic data creation (OpenStreetMap), translating books, etc. Yet little is known about “how to bootstrap and grow a community”. The only thing we can do is observe what others have done and learn from their experience.

In order to contribute to the understanding of how a community-based project works, I’ve worked with Francisco Puga and other people from Cartolab to put together some of the public information the projects generate and make some sense of it. The actors in a community interact with each other and, when that happens through the internet, a trail is left (messages to mailing lists carry author information and a date, version control systems log information about the authors too, …). Basing our work on this available and public information, and standing on the shoulders of giants (i.e. reviewing a lot of research work similar to what we wanted to build), we have developed a quantitative analysis of the communities supporting GRASS, gvSIG and QGIS.

How did we do it?

The first step was to evaluate and gather all the public information of each project, which we wanted to do in an automated way. But, as we had to compare the 3 projects, the data had to be homogeneous: it had to exist for all 3 and be in a comparable format. Taking these constraints into account (and the limited time we had for this!), we collected information from 2 different systems:

  • Version control systems: for every project, we cloned all the information available in its repositories to a local git repo, in order to parse the log of changes. This allowed us to study the whole history of the projects, from the very beginning to December 2010.
  • Mailing lists: by means of the mailingliststats tool (built mainly by our friend Israel Herráiz, thanks bro!) we gathered data from March 2008 to December 2010.
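
To give an idea of what parsing the log of changes can look like, here is a minimal sketch. It is not the tooling we actually used. It assumes the log of the local git repo was exported with something like `git log --pretty=format:'%an|%ad' --date=short`, so each line reads `Author Name|YYYY-MM-DD`; the class and method names are made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: assumes one "Author Name|YYYY-MM-DD" line per commit.
final class CommitLogStats {

    // Counts commits per author.
    static Map<String, Integer> commitsPerAuthor(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            int sep = line.lastIndexOf('|');
            if (sep <= 0) {
                continue; // skip blank or malformed lines
            }
            String author = line.substring(0, sep).trim();
            counts.merge(author, 1, Integer::sum);
        }
        return counts;
    }

    // Counts commits per month ("YYYY-MM") to follow activity over time.
    static Map<String, Integer> commitsPerMonth(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            int sep = line.lastIndexOf('|');
            if (sep <= 0 || line.length() < sep + 8) {
                continue; // no date, or date too short to hold "YYYY-MM"
            }
            String month = line.substring(sep + 1, sep + 8);
            counts.merge(month, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "alice|2010-11-03",
            "bob|2010-11-20",
            "alice|2010-12-01");
        System.out.println(commitsPerAuthor(sample));
        System.out.println(commitsPerMonth(sample));
    }
}
```

Because every project ends up as the same kind of line-per-commit text, the same counting code can be run over the 3 histories, which is what makes the data comparable.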

Some disclaimers:

  • Projects have a number of branches, plugins and so on. We focused the study on the main product, what a user gets when she downloads it. Further study of the plugin ecosystem is needed, and it would give us finer-grained information.
  • Projects have more mailing lists than the ones we studied (translators, steering committee, other local/regional mailing lists, etc.), varying in each case. The analysis focused on the developer and user lists because we think they are representative enough to mark the trend. We are not interested in giving an exact number (which may be impossible to measure!) but in drawing the long-term fluctuation of participation. Our intuition and past experience say that participation in those mailing lists correlates with that of the larger community surrounding the projects.
  • In the particular case of the gvSIG user mailing lists, we studied the Spanish and English lists jointly. It makes sense to do so, as the Spanish list still holds the core of contributions from Hispanic American countries, while non-Spanish speakers interact through the international list. It is as if the project had two hearts.
  • Unfortunately, the quality of the data limited the period under study: the range is from March 2008 to December 2010. Prior to that, not all projects have information, due to mailing list migrations.

What is it useful for?

It’s possible to analyze a community from a variety of points of view. Our approach is quantitative, by means of a common model which aggregates users depending on their level of participation:

  • Leaders: those who build the product and make the decisions.
  • Power users: those who adapt it to their needs and use it intensively.
  • Casual users: those who use it for a concrete task.

This approach allows us to better understand the size of the community and how its members interact, as the value provided by someone who sent only 1 mail to a mailing list in 6 months is not the same as that of a person who spent that time sending more than 100 patches to the code.
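
As a rough illustration of aggregating people by level of participation, here is a small sketch. The cut-off values and all the names are invented for the example (the study’s real criteria are not stated in this post), so read it as one possible way to encode the model, nothing more.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative only: thresholds below are hypothetical, not the study's.
final class ParticipationModel {

    enum Level { LEADER, POWER_USER, CASUAL_USER }

    // Hypothetical cut-offs for a 6-month window.
    static final int LEADER_MIN_COMMITS = 100;
    static final int POWER_USER_MIN_MAILS = 10;

    // Maps one person's activity in the window to a participation level.
    static Level classify(int commits, int mails) {
        if (commits >= LEADER_MIN_COMMITS) {
            return Level.LEADER;
        }
        if (commits > 0 || mails >= POWER_USER_MIN_MAILS) {
            return Level.POWER_USER;
        }
        return Level.CASUAL_USER;
    }

    // activity[i] holds {commits, mails} for member i; returns the size of each group.
    static Map<Level, Integer> aggregate(int[][] activity) {
        Map<Level, Integer> sizes = new EnumMap<>(Level.class);
        for (Level level : Level.values()) {
            sizes.put(level, 0);
        }
        for (int[] member : activity) {
            sizes.merge(classify(member[0], member[1]), 1, Integer::sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[][] activity = { {120, 3}, {0, 1}, {0, 25}, {2, 0} };
        System.out.println(aggregate(activity));
    }
}
```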


With these constraints, we managed to build the following indicators:

  • Adoption trend among users and developers: based on mailing list data.
  • Activity and manpower: based on code contributions (commits).
  • Composition of the community: based on code contributions (commits).
  • Generational analysis: based on code contributions (commits).

Over the next weeks, I will be publishing the results of the study, in order to help us understand how different free software communities work and what we can learn from that. Stay tuned!

Coda

The results shown here are borrowed from a paper I led jointly with Francisco Puga, Alberto Varela and Adrián Eirís from Cartolab, a GIS university research laboratory based in A Coruña. The results were presented at the V Jornadas de SIG Libre, Girona 2010. If you are fluent in Spanish (reading or listening), you can benefit from these resources:

For those who can’t, I’ll summarize the main points in short posts, one on each topic of the paper. The original authors have not reviewed the text as published on my blog, so consider any opinion expressed here as my own (having them review my texts is a boring and time-consuming task I’m sure they prefer to skip). Please, excuse my English.
Categories
All English

Wiki update

I did some reorganization of the wiki contents and wrote a bit on refactoring and code smells. I’m proud of the pace and the themes along which the wiki is evolving: the software development topics have grown quite a bit, which is a reflection of my readings and focus in recent years. Although this could evolve later, the topics on software development are organized in 3 subcategories:

Categories
All English Fortunes Radar

I will never stop learning. I won’t just work on things that are assigned to me. I know there’s no such thing as a status quo. I will build our business sustainably through passionate and loyal customers. I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything. I am more motivated by impact than money, and I know that Open Source is one of the most powerful ideas of our generation. I will communicate as much as possible, because it’s the oxygen of a distributed company. I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.

— Automattic Creed, Matt Mullenweg.
Categories
All English Software Libre

Growing a community: some texts

I’ve long been passionate about community-oriented products: I’ve researched how they work, have led one to its goal and participate in some. It’s not news that they are considered a powerful way to build products (sometimes a better one than doing it through the market or internally in a firm/closed group of people). Nevertheless, I’m still looking for good resources to learn more. For those who like the topic, here are some I found useful (and I’d like to hear your recommendations!):

  • Producing Open Source Software: the best book I’ve read on how to manage free software projects. It is not only a good review of several tools; it also takes into account the policies, which give sense to the community and glue it together. Very practical.
  • Coase’s Penguin, or Linux and the nature of the firm, by Yochai Benkler. The best academic text I’ve ever read on the matter. Benkler tries to explain why, in the 21st century, communities emerge as a new way to build products. You will find parallels with the text where Coase explained why firms emerged in the 19th century and replaced local markets as the preferred option. I think some more work is needed to formalize this concept in the academic arena, but the paper is clear and understandable, and it lays the basis for further research. It’s a pioneer.
  • Community antipatterns: a good talk by Dave Neary. Although it’s focused on software development, I think it has lessons for communities in general. Sometimes, and much more so in recently discovered fields, we have no idea what has worked, but we know what has not worked.
  • Others’ experiences. In particular, I’ve found these texts very useful:

On the road to understanding how a community fully works, you will review topics such as economics, group interaction and even anthropology! I find it very instructive. As broad as the theme is, it leaves plenty of room to learn from other sciences. So being involved in a community, studying one or just reading about them is a good opportunity to learn.
Categories
All English iCarto

I’m not such a fan of comparisons to rank things. But I find them useful to know your pros and cons, or at least to know how the surrounding community perceives your product. While having a coffee today I found this article on gis @ stackexchange: QGIS and gvSIG comparison. It made me happy that 2 out of 6 gvSIG pros are tools I’m engaged with: NavTable and OpenCADTools. Keep rocking, Cartolab and iCarto!

Categories
All English Radar

«It may be impossible to despise your client or users and still deliver a quality product.»

— Avdi Grimm, There is no such a thing as a good field programmer

A good story on leading by example. Or why code quality matters.

Categories
All English Fortunes Radar

Real artists ship.

Steve Jobs, 1983. Also: how Apple releases its products and why it’s one of its strengths.
Categories
All English

«My friends and I have been coddled long enough by a billionaire-friendly Congress. It’s time for our government to get serious about shared sacrifice.»

— Warren Buffett, NYT 15/08/11

On the differences between taxes on labour and taxes on capital in the USA.

Categories
All English iCarto Programming pills

How gvSIG manages the snappers

Last week I paired with Francisco Puga to review the status of opencadtools. As Fran is doing great work preparing the integration of opencadtools as the default CAD tools in gvSIG, I wanted to know first hand how it was going. iCarto and Cartolab were kind enough to sponsor this pairing session. One of the results, apart from working with Fran (which is always motivating and enjoyable in itself), was a deeper understanding of how snappers work in gvSIG, which is something I had wondered about at times. And, as one of the improvements of opencadtools is a followgeometry snapper, it seemed a good opportunity to review that part of the project. Find the summary below:

The CADToolAdapter class in the extCAD extension maintains a list of snappers and a list of layers to snap to from the editing layer. When the mouse is moved, the snappers are recalculated following this algorithm (note that the code below is the core of the method; some other parts, casts and boilerplate code are missing):

ArrayList snappers = SnapConfigPage.getActivesSnappers();
ILayerEdited layerInEdition =
    CADExtension.getEditionManager().getActiveLayerEdited();
ArrayList layersToSnap = layerInEdition.getLayersToSnap();

for (FLyrVect layer : layersToSnap) {

    // Getting the set of geometries within the envelope
    // The envelope is calculated based on the tolerance the user wants
    SpatialCache cache = layer.getSpatialCache();
    List geometries = cache.query(envelope);

    // Updating the nearest point
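    for (Feature geomToSnap : geometries){
        for (int i = 0; i < snappers.size(); i++){
            // omitted here: each active snapper computes its snap point for
            // geomToSnap and the distance from it to the mouse position
            if (minimumDistance > distance){
                minimumDistance = distance;
            }
        }
    }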
}

This algorithm is executed every time the user moves the mouse and is very quick if you have few layers to snap to. But, as the number of layers to check increases, the editing process becomes very slow. Besides, as a comment on software design: after reviewing this part of the code, I like the way the snappers fit into the gvSIG CAD tools. If you want to add a new snapper, you just need to implement the ISnapperVectorial interface and make the getSnapToPoint method return the nearest point to the position of the mouse. So, designing your own snappers is very easy!
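
To make the design idea concrete without depending on gvSIG itself, here is a self-contained sketch. The `Snapper` interface below is a simplified stand-in I made up for the example: it is not the real ISnapperVectorial signature, and geometries are reduced to plain lists of vertices. It only shows the shape of the pattern: each snapper proposes a point, and the caller keeps the closest proposal.

```java
import java.awt.geom.Point2D;
import java.util.List;

// Illustrative only: a simplified stand-in, not gvSIG's actual API.
final class SnapSketch {

    // Hypothetical contract; returns null when nothing lies within the tolerance.
    interface Snapper {
        Point2D getSnapToPoint(Point2D mouse, List<Point2D> vertices, double tolerance);
    }

    // A snapper that proposes the closest vertex of a geometry.
    static final class NearestVertexSnapper implements Snapper {
        @Override
        public Point2D getSnapToPoint(Point2D mouse, List<Point2D> vertices, double tolerance) {
            Point2D best = null;
            double bestDistance = tolerance;
            for (Point2D vertex : vertices) {
                double distance = vertex.distance(mouse);
                if (distance <= bestDistance) {
                    bestDistance = distance;
                    best = vertex;
                }
            }
            return best;
        }
    }

    // Tries every snapper on every candidate geometry and keeps the nearest proposal.
    static Point2D snap(Point2D mouse, List<List<Point2D>> geometries,
                        List<Snapper> snappers, double tolerance) {
        Point2D nearest = null;
        double minimumDistance = Double.MAX_VALUE;
        for (List<Point2D> geometry : geometries) {
            for (Snapper snapper : snappers) {
                Point2D candidate = snapper.getSnapToPoint(mouse, geometry, tolerance);
                if (candidate == null) {
                    continue;
                }
                double distance = candidate.distance(mouse);
                if (minimumDistance > distance) {
                    minimumDistance = distance;
                    nearest = candidate;
                }
            }
        }
        return nearest;
    }
}
```

The nested loops also show where the cost comes from: the work grows with the number of candidate geometries times the number of active snappers, and it is repeated on every mouse move.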

By the way, if you feel like replying with how other GIS applications (QGIS, uDig, …) manage their snappers, I’d be more than happy to hear and learn!

Categories
All English Radar
Two stories on developing software applications. In both cases the client is the public administration, but the way each one was managed and built was quite different:
  • Lean from the trenches, by Henrik Kniberg. It tells the story of how PUST was built: PUST is an acronym of “Polisens mobila Utrednings Stöd”, a nationwide application for the Swedish police.
  • Who killed the virtual case file, a very instructive account of the famous failure of VCF, a case management application developed by the FBI.