Tag archive: QGIS

Analysis of free software communities: coda

As you can see in my last posts (I, II, III, IV and V), I finally managed to translate the paper we presented last year at the V Jornadas de SIG Libre (please, excuse my English!). It took me a year, and having my wisdom teeth removed, to find the time.

Our intention (Fran’s and mine) when the idea for this paper first popped into our heads was to foster debate on best practices around free software projects. While at CartoLab, we presented the idea to Alberto; he encouraged us to work on it and gave us the time and resources needed, and in the later stages he also helped polish the trends and conclusions. I’m deeply grateful for all his patience and empathy.

I’m very proud of the work we have done: the first study of its kind in the GIS arena, and in some way a picture of 10 years of FOSS4G software development (on the desktop side). I hope the study was worth the effort and that it continues to spark debate on how to work better together.

Analysis of free software communities (V): generational analysis

Disclaimer - this post is part of a series: I, II, III, IV and V (this one).

  • Images: on the left, contributions of top 3 developers along the project history; on the right, evolution of developers participating during 2010.
  • Data: trunk from project repositories during the period 1999-2010.

Is there anything we can extrapolate from the data?

This indicator gives us some sense of how leadership changed and how knowledge transfer happened in each project. The paper elaborates a bit more on turnover and the integration of new blood into the project (highly correlated with this indicator), using statistics for the top 10 developers.
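The two figures discussed below (the share of work done by the top 3 and how many of an earlier top 10 are still active) can be computed from a plain list of commit authors. This is a minimal sketch assuming one author name per commit; the function names and the toy data are hypothetical, not the code or data used in the paper.

```python
from collections import Counter

def top_authors(authors, n):
    """Names of the n most active authors; `authors` holds one name per commit."""
    return [name for name, _ in Counter(authors).most_common(n)]

def top_share(authors, n=3):
    """Fraction of all commits made by the n most active authors."""
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total

def still_active(earlier_top, later_authors):
    """How many people from an earlier top-n list still commit in a later period."""
    later = set(later_authors)
    return sum(1 for name in earlier_top if name in later)

# Toy data (not from the study): one entry per commit made in 2010.
commits_2010 = ["ana"] * 40 + ["dan"] * 30 + ["eva"] * 20 + ["bob"] * 6 + ["cai"] * 4
print(top_share(commits_2010, n=3))                        # 0.9
print(still_active(["ana", "bob", "zoe"], commits_2010))  # 2
```

In practice `earlier_top` would be `top_authors(...)` applied to the commits of an earlier year, so the "4 out of 10" or "2 out of 10" figures below are `still_active` over a top-10 list.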

Together, these give us some insights into each project:

GRASS

  • The charts and data depict how a new generation took over the leadership from 2005 onwards. The process seems to have happened in a very organic way (in the sense that people grew their skills at a steady pace over a long time) and it also went deep to the roots: of the top 10, only 4 people continue collaborating with the project.
  • The data also shows that the top 3 represent half of the work in the project, which suggests that several developers are highly involved with no one having too much influence (in 2010, the top contributor alone accounted for 40% of the work).

gvSIG

  • The charts and data depict a highly distributed team with a high rate of turnover. The top 3 are responsible for less than half of the contributions, and the top 10 for around 60%. The change of leadership happened very quickly around 2007, and only 2 of the top 10 contributors were still working on the project in 2010.
  • Moreover, the top 10 show homogeneous involvement in terms of number of contributions, which may reflect that all of them had a similar role and impact in the development of gvSIG.

QGIS

  • The charts and data depict a project dependent on its top 3 but with a contribution-friendly culture. The top 3 account for a high share of all contributions, but the project seems to have integrated new blood well: 9 of the 10 most active developers in QGIS started in different years and are still involved.
  • The top 10 have different levels of involvement, ranging from 6% to 50%, which may reflect the heterogeneity of its core developer base (from volunteers to full-time developers).

Analysis of free software communities (IV): community workhours

Disclaimer - this post is part of a series: I, II, III, IV (this one) and V.

  • Images: on the left, number of changes to the codebase (commits) aggregated by hour of day. On the right, number of commits grouped by day of the week.
  • Data: trunk from project repositories during the period 1999-2010.

Is there anything we can extrapolate from the data?

This indicator is intended to give us some information on the behavior patterns of contributors. Specifically, we can track what a typical week looks like for the core developers of each project. Note that the timeline shows when changes were integrated, not when the work was actually done; so it tells us the story of the people with commit permissions, whom we call the leaders.
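Both charts are histograms of commit timestamps. A minimal sketch, assuming timezone-aware timestamps (for example parsed from git log) so that every project can be normalised to GMT; the function names and the toy timestamps are hypothetical:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def by_hour_and_weekday(timestamps):
    """Count commits per GMT hour (0-23) and per weekday (0 = Monday ... 6 = Sunday).

    Timestamps must be timezone-aware so they can be normalised to UTC/GMT.
    """
    hours, weekdays = Counter(), Counter()
    for ts in timestamps:
        utc = ts.astimezone(timezone.utc)
        hours[utc.hour] += 1
        weekdays[utc.weekday()] += 1
    return hours, weekdays

def weekend_share(weekdays):
    """Fraction of commits integrated on Saturday or Sunday."""
    total = sum(weekdays.values())
    return (weekdays[5] + weekdays[6]) / total if total else 0.0

# Toy example: a commit at 10:00 in a GMT+1 zone on Monday 1 March 2010,
# and another at 16:30 GMT on Saturday 6 March 2010.
cet = timezone(timedelta(hours=1))
hours, weekdays = by_hour_and_weekday([
    datetime(2010, 3, 1, 10, 0, tzinfo=cet),
    datetime(2010, 3, 6, 16, 30, tzinfo=timezone.utc),
])
print(sorted(hours.items()))    # [(9, 1), (16, 1)]
print(weekend_share(weekdays))  # 0.5
```

Normalising to GMT before counting is what makes the hourly charts of the three projects comparable; a per-project weekend share is one simple number behind the "drop during the weekend" observations below.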

Let’s try to extract some information from there:

GRASS

  • Internationalization: the hourly chart shows a bell curve centered on 15:00 GMT, which in most European countries would be after lunch and in the Americas would be morning. That could reflect that these two continents account for the vast majority of core committers. Nevertheless, the work is fairly well distributed across time zones.
  • Volunteers: the daily chart shows a slight drop in work during the weekend, likely due to paid developers or people who contribute mostly within their working hours. Nevertheless, a high rate of contributions is still being integrated during the weekend, which may be a sign of a well-established volunteer base of core developers.

gvSIG

  • Internationalization: almost all the integration happens on a working schedule, from Monday to Friday, between 09:00 and 20:00 GMT. That correlates strongly with the opening hours of a typical shop in Spain and reflects how the application was built in that period: led by a public body which contracted development out to Spanish firms.
  • Volunteers: volunteer work on the core seems to have been close to zero, which reflects the original nature of the project in that period.

QGIS

  • Internationalization: the hourly chart is close to a flat rate of contributions, which is a strong sign of leadership highly distributed around the world. It’s even difficult to say which zones are the most prominent in terms of developers.
  • Volunteers: the daily chart reflects steady work throughout the week, with no sign of falling during the weekend, which may be related to a strong base of volunteer core committers.

Analysis of free software communities (III): activity and manpower

Disclaimer - this post is part of a series: I, II, III (this one), IV and V.

  • Images: on the left, the number of changes to the codebase (commits) aggregated by year. On the right, the number of developers with at least 1 commit that year.
  • Data: trunk from project repositories during the period 1999-2010.
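Both yearly series (commits, and developers with at least 1 commit) come from the same list of commits. A minimal sketch, assuming commits are available as (author, datetime) pairs; the function name and the toy data are hypothetical:

```python
from collections import defaultdict
from datetime import datetime

def yearly_activity(commits):
    """Map each year to (number of commits, number of developers with at least 1 commit).

    `commits` is an iterable of (author, datetime) pairs, for example parsed from git log.
    """
    counts = defaultdict(int)
    developers = defaultdict(set)
    for author, when in commits:
        counts[when.year] += 1
        developers[when.year].add(author)
    return {year: (counts[year], len(developers[year])) for year in sorted(counts)}

# Toy data, not from the study.
sample = [
    ("ana", datetime(2009, 1, 10)),
    ("bob", datetime(2009, 6, 2)),
    ("ana", datetime(2009, 7, 15)),
    ("ana", datetime(2010, 2, 3)),
]
print(yearly_activity(sample))  # {2009: (3, 2), 2010: (1, 1)}
```

Counting distinct authors per year is sensitive to how identities are recorded (the same person may commit under several names), so the developer counts are best read as trends rather than exact figures.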

Is there anything we can extrapolate from the data?

Certainly not the number of features developed or bugs fixed. It is even barely possible to compare activity between projects, as changesets vary a lot: some people send several small changesets and others just 1 big change, some projects have special policies that affect the results (e.g., making one commit that formats the code according to the style rules and another with the actual changes), etc. Some people could even argue that the language each project is written in affects the number of changes (GRASS is written in C, gvSIG in Java and QGIS in C++) due to the libraries available or the semantics of each language. So, is it possible to find out anything? Well, in my opinion, we can trace at least the following:

  • the internal evolution of a project.
  • how a project is doing in terms of adding new blood.

So, let’s do the exercise again and find out what’s happening here:

GRASS

  • The activity curve of the project draws attention: growth in periods (2001-2004 and 2005-2007) with local maxima in 2004 and 2007. Our hypothesis was that this was due to the way the project works: developers make changes both in trunk and in the branch of the product being released (be it 6.4 or 6.5) at the same time, with many changesets moved between trunk and the branches (that is, heavy backporting). In a recent conversation, Markus Neteler explained to me in more detail how they work, and I guess the rhythm we see in the graphics is due to that.
  • In terms of number of developers, GRASS showed continuous growth until 2008; since then, the number of regular developers has stabilized.

gvSIG

  • gvSIG shows an incredibly high period of activity during 2006-2008 (4500 changesets per year and more than 30 people involved!). To understand this bell curve of activity, one needs to know the background of the project: gvSIG development has been carried out by contract, which means that all activities (planning, development, testing, etc.) were driven by the needs of the client who paid for them. Only recently have these processes been opened to a broader community (firms and volunteers collaborating in the project within the gvSIG Association). So it makes sense that the beginning saw less activity (long planning phases) and that afterwards the project managed to bring together so many people in such a short period of time.
  • But in 2010 development came to a sudden stop (only 233 changes to the codebase were made, compared with a pace of 4500 changes per year in the previous years). This decrease in activity is highly correlated with the number of developers involved. It’s hard to say why it happened: could it be because efforts were directed to gvSIG 2.0 development? Could it be due to the reorganization of the project and the creation of the gvSIG Association? Little can be said about this with the data available; further research is required to determine it.

QGIS

  • Steady growth both in contributions and in contributors. The years 2004 and 2008 mark two peaks of activity and of people participating in development. Our preliminary hypothesis was that this was due to the release of the first stable version and the release of 1.0, as well as QGIS becoming an official OSGeo project. Gary Sherman has confirmed that in a recent post (history of QGIS committers) and an interview (part 1 and part 2). He also pointed out that in 2007 the project added Python support for plugin development, which was possibly one of the reasons for the growth in 2008 and afterwards.
  • An interesting finding is that every 4 years the project has doubled the number of developers involved, with slower but steady growth in activity.
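"Doubling every 4 years" can be checked against any two yearly developer counts. Assuming constant exponential growth between them, the doubling time is the elapsed years times ln 2 divided by ln(N_end / N_start). A minimal sketch; the function name and the numbers are illustrative, not QGIS figures:

```python
import math

def doubling_time(n_start, n_end, years):
    """Years needed to double, assuming constant exponential growth between two counts."""
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("expected 0 < n_start < n_end")
    return years * math.log(2) / math.log(n_end / n_start)

# Toy numbers: 10 developers growing to 40 over 8 years doubles every 4 years,
# which corresponds to roughly 19% more developers per year (2 ** (1 / 4) ≈ 1.19).
print(round(doubling_time(10, 40, 8), 6))  # 4.0
print(round(2 ** (1 / 4), 2))              # 1.19
```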
Well, I hope these graphics have helped us better understand the activity of each project and the manpower each one is able to gather around it. The next posts in the series will focus on the developers involved and the culture surrounding them. Looking forward to your feedback!

Analysis of free software communities (II): adoption trends

Disclaimer - this post is part of a series: I, II (this one), III, IV and V.

Below are the statistics for mailing list activity in GRASS, gvSIG and QGIS during the period 2008-2010. The first chart shows data from the general user mailing lists of each project. Note that the gvSIG data aggregates both the international and the Spanish mailing lists, for the reasons stated here.

The next one shows the same data (number of people writing and number of messages per month) for the developer mailing lists.
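The study used the mailingliststats tool for these figures; the sketch below is not that tool, only a simplified illustration of the two monthly series, assuming messages are available as (sender address, datetime) pairs, for example read from an mbox archive with Python’s standard `mailbox` module. Function names and sample data are hypothetical.

```python
import mailbox
from collections import defaultdict
from datetime import datetime
from email.utils import parseaddr, parsedate_to_datetime

def monthly_list_stats(messages):
    """Map (year, month) to (number of messages, number of distinct people writing).

    `messages` is an iterable of (sender address, datetime) pairs.
    """
    counts = defaultdict(int)
    senders = defaultdict(set)
    for sender, when in messages:
        key = (when.year, when.month)
        counts[key] += 1
        senders[key].add(sender.strip().lower())  # crude identity merge by address
    return {key: (counts[key], len(senders[key])) for key in sorted(counts)}

def messages_from_mbox(path):
    """Yield (sender address, datetime) pairs from an mbox archive, skipping bad dates."""
    for msg in mailbox.mbox(path):
        try:
            when = parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):
            continue
        yield parseaddr(msg["From"] or "")[1], when

# Toy data, not from the study.
sample = [
    ("Ana@example.org", datetime(2008, 3, 1)),
    ("ana@example.org", datetime(2008, 3, 20)),
    ("bob@example.org", datetime(2008, 4, 2)),
]
print(monthly_list_stats(sample))  # {(2008, 3): (2, 1), (2008, 4): (1, 1)}
```

Joining the gvSIG international and Spanish lists amounts to concatenating both message streams before calling `monthly_list_stats`.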

Is there anything we can extrapolate from the data?

Well, certainly not the user base. The data only hints at trends, not at the real user base. The model we adopted to study the projects reflects just part of the community (arguably the engine of the project), so don’t take the data as the number of users of each project. For sure, each of our favorite projects has more users than those participating in (these) mailing lists!

Anyway, here some food for thought:

  • GRASS: both the number of messages and the number of people writing decrease smoothly, among users as well as developers. The trend is not clear, though.
  • gvSIG: the data shows a steadily increasing number of users participating in the mailing lists. On the other hand, although it is the project with the most people subscribed to the developer mailing list, it shows the least activity of the three projects (in terms of number of messages on the developer lists): few technical conversations seem to have happened through the mailing lists during that period.
  • QGIS: according to the data, the community is clearly growing. In the period studied (3 years), the number of users and developers participating in the mailing lists has doubled!
Not much more can be said; I hope the graphics are self-explanatory! Looking forward to your feedback.

Analysis of free software communities (I): a quantitative study on GRASS, gvSIG and QGIS

Disclaimer - this post is part of a series: I (this one), II, III, IV and V.

When selecting an application, it’s very common to weigh technological factors (what does the application enable us to do?) and economic ones (how much money do we need?). And yet there is a third factor to take into account, the social aspects of the project: the community of users and developers who support it and keep it alive.

In a series of posts beginning with this one, I’m going to present a quantitative analysis of the communities of 3 reference projects in the GIS arena: GRASS, gvSIG and QGIS. We selected these because they are viewed as the most mature desktop GIS projects, they are under the OSGeo Foundation umbrella, and they differ in the actors who bootstrapped them and who manage them today.

What have we done?

Over the more than 25 years of the free software movement, we have been delighted by the great capacity of a community-based model to foster creation and innovation. In recent years, that model has proven viable in other areas too: content creation (Wikipedia), cartographic data creation (OpenStreetMap), translating books, etc. Yet little is known about “how to bootstrap and grow a community”. The only thing we can do is observe what others have done and learn from their experience.

In order to contribute to the understanding of how a community-based project works, I’ve worked with Francisco Puga and other people from Cartolab to put together some of the public information the projects generate and make some sense of it. The actors in a community interact with each other, and when that happens over the internet, a trail is left (messages to mailing lists carry author and date information, version control systems log information about the authors too, …). Basing our work on this available, public information, and standing on the shoulders of giants (i.e., reviewing many research works similar to what we wanted to build), we have developed a quantitative analysis of the communities supporting GRASS, gvSIG and QGIS.

How did we do it?

The first step was to evaluate and gather all the public information about each project, which we wanted to do in an automated way. But since we had to compare the 3 projects, the data had to be homogeneous: at the very least, it had to exist for all 3 and be in a comparable format. Taking these constraints into account (and the limited time we had for this!), we collected information from 2 different systems:

  • Version control systems: for every project, we cloned all the information available in their repositories into a local git repo in order to parse the change log. This allowed us to study the whole history of the projects, from the very beginning to December 2010.
  • Mailing lists: using the mailingliststats tool (built mainly by our friend Israel Herráiz, thanks bro!), we gathered data from March 2008 to December 2010.
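Once a repository is available as a local git clone, its history can be reduced to (author, date) pairs, which is all the commit-based indicators need. A minimal sketch, assuming a custom `git log` format of author name, a tab, and the strict ISO 8601 author date (`%an%x09%aI`); this format and the helper names are assumptions, not the scripts used in the study.

```python
import subprocess
from datetime import datetime

# Assumed format: author name, a tab (%x09), then the author date in strict ISO 8601 (%aI).
LOG_FORMAT = "%an%x09%aI"

def read_git_log(repo_path):
    """Return raw `git log` output for the checked-out branch of a local clone (needs git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

def parse_log(text):
    """Turn 'author<TAB>ISO date' lines into (author, timezone-aware datetime) pairs."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        author, _, date = line.rpartition("\t")
        if date.endswith("Z"):  # newer git versions print UTC as 'Z'
            date = date[:-1] + "+00:00"
        commits.append((author, datetime.fromisoformat(date)))
    return commits

# Toy log text, not from the study.
sample = "Ana\t2010-12-01T10:00:00+01:00\nBob\t2009-01-02T03:04:05Z\n"
commits = parse_log(sample)
print(len(commits))  # 2
```

Running `parse_log(read_git_log(path))` on the trunk of each clone yields the input expected by the yearly, hourly and top-contributor calculations in the later posts.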

Some disclaimers:

  • Projects have a number of branches, plugins and so on. We focused the study on the main product, what a user gets when she downloads it. Further study of the plugin ecosystem is needed, and it will give us more fine-grained information.
  • Projects have more mailing lists than the ones we studied (translators, steering committee, other local/regional lists, etc.), varying in each case. The analysis focused on the developer and user lists because we think they are representative enough to mark the trend. We are not interested in giving an exact number (which may be impossible to measure!) but in drawing the long-term fluctuation of participation. Our intuition and past experience say that participation in those mailing lists correlates with the larger community surrounding the projects.
  • In the particular case of the gvSIG user mailing lists, we studied the Spanish and English lists jointly. It makes sense to do so, as the Spanish list still carries the core of contributions from Spanish-speaking countries, while non-Spanish speakers interact through the international list. It is as if the project had two hearts.
  • Unfortunately, data quality limited the study period: the range is from March 2008 to December 2010. Before that, not all projects have information, due to mailing list migrations.

What is it useful for?

It’s possible to analyze a community from a variety of points of view. Our approach is quantitative, using a common model which groups users according to their level of participation:

  • Leaders: those who build the product and make the decisions.
  • Power users: those who adapt it to their needs and use it intensively.
  • Casual users: those who use it for a specific task.

This approach allows us to better understand the size of the community and how its members interact, since the value provided by someone who sent only 1 mail to a mailing list in 6 months is not the same as that of someone who spent that time sending more than 100 patches to the code.
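The post does not state the exact criteria used to place people in each level, so the following is only a hypothetical sketch of how such a split could look: people with commit rights are treated as leaders, and an arbitrary activity threshold separates power users from casual users.

```python
from collections import Counter

def classify(activity, leaders, power_threshold=10):
    """Assign each participant to one level of the three-level model.

    activity: Counter of contributions (messages, patches...) per person in the period.
    leaders: people who build the product and make decisions, e.g. those with commit rights
             (a hypothetical criterion).
    power_threshold: arbitrary minimum number of contributions for a power user.
    """
    levels = {}
    for person, n in activity.items():
        if person in leaders:
            levels[person] = "leader"
        elif n >= power_threshold:
            levels[person] = "power user"
        else:
            levels[person] = "casual user"
    return levels

activity = Counter({"ana": 120, "bob": 15, "cai": 1})
print(classify(activity, leaders={"ana"}))
# {'ana': 'leader', 'bob': 'power user', 'cai': 'casual user'}
```

The threshold is the main design choice: moving it changes the size of each group, which is why trends over time are more meaningful than the absolute numbers.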


With these constraints, we managed to build the following indicators:

  • Adoption trend within users and developers: based on mailinglists data.
  • Activity and manpower: based on code contributions (commits).
  • Composition of the community: based on code contributions (commits).
  • Generational analysis: based on code contributions (commits).

Over the next few weeks, I will be publishing the results of the study, to help us understand how different free software communities work and what we can learn from them. Stay tuned!

Coda

The results shown here are borrowed from a paper I led jointly with Francisco Puga, Alberto Varela and Adrián Eirís from Cartolab, a GIS university research laboratory based in A Coruña. The results were presented at the V Jornadas de SIG Libre, Girona 2010. If you are fluent in Spanish (reading or listening), you can benefit from these resources:

For those who can’t, I’ll summarize the main points in small posts on each topic of the paper. The original authors have not reviewed the text as published on my blog, so consider any opinion expressed here as my own (having them review my texts is a boring and time-consuming task I’m sure they’d prefer to skip). Please, excuse my English.

qgis 1.7: the community debates

Don’t miss this post by Tim Sutton, QGIS release manager, which in some way summarizes the debates in the community over the last few months. Worth highlighting are 2 of the usual suspects: the complex balance between stability and new features, and the project’s funding model. As a companion post, it’s worth rereading how and why KCube Consulting donated 6 months of a developer’s time to the QGIS community.

Analysis of Free Software communities (I): results of a study on GRASS, gvSIG and QGIS

When selecting an application, we usually weigh technological factors (what the application lets us do) and economic ones (how much what we need costs us). And we forget a third factor that deserves serious consideration: the social aspects of the project, the community of users and developers who keep it alive.

In a series of posts starting today, I am going to present an analysis of the communities of 3 reference projects in the GIS world: GRASS, gvSIG and QGIS. During the selection process we kept these 3 because we consider them the most important and mature desktop GIS; they are also under the umbrella of the OSGeo Foundation and differ in the actors who manage them.

What have we done?

In the more than 25 years of the free software movement, a community-centered model has demonstrated a great capacity for creation. It is a model that has also shown its viability by expanding into other areas: content creation (Wikipedia), cartographic data creation (OpenStreetMap), book translation, etc. But although we know its power, we know little about “how to create and manage a community”. All we can do is observe what others have done and how it went for them. Experiment. Try to extrapolate heuristics from the experience of others.

To contribute to the understanding of how free software communities work, Francisco Puga, other people from Cartolab and I have carried out an analysis of the communities based on the public information they generate. The actors in a community interact with each other, and when that happens over the internet, the interactions leave a trail:

  • Mailing lists: messages contain the date, the author, etc.
  • Wiki: it is possible to obtain information about the author, the creation date, the number of edits of a page, etc.
  • Bug trackers: information about who reported an issue and when, whether it has been resolved or not, etc.
  • Version control systems: we can measure activity on the application based on the number of changes (commits), and find out who made them, when, etc.

Based on this available public information, what we have done is a quantitative study of the communities that surround and sustain these projects.

How did we do it?

Thanks to the availability of certain tools that made gathering the information easier, and taking into account the quality of the data so that the projects could be compared, this is what we finally managed to do:

  • Version control systems: we dumped all the available information into the git version control system and then parsed its history. This allowed us to study the entire development history of the projects up to December 2010. Data for grass, gvsig, qgis.
  • Mailing lists: for these we used the mailingliststats tool (built mainly by Israel Herráiz, thanks bro!), with data from March 2008 to December 2010, based on:

Some clarifications about the mailing list study:

  • The 3 projects have many more lists for various purposes (translations, project steering, local lists, etc.). We focused on these because we believe they are enough to mark the trend, which is what really interests us, not the headline numbers, which would be misleading.
  • For the user lists of gvSIG, we studied the Spanish list in addition to the international one. The Spanish list is where the project was born, and it still shows the main activity. Leaving it out would introduce bias.
  • Unfortunately, data quality limited the study period: we were able to analyze from March 2008 to December 2010.

What is it useful for?

A community can be studied from different approaches. Ours is based on the model that divides the community into 3 levels of participation and involvement:

  • Leaders: those who build the product.
  • Power users: those who adapt it to their needs and use it intensively.
  • Casual users: those who use it for a specific task.

This approach makes it easier to understand how the community really works, since the contribution of someone who sends a single message to a mailing list is not the same as that of someone who has spent 6 months building the application. It also gives us information on the adoption of the tools, as well as on patterns of participation and activity among the different actors.


With this approach and methodology, we managed to build the following indicators:

  • Adoption trends among users: based on the mailing lists.
  • Adoption trends among developers: based on the mailing lists.
  • Activity and manpower: based on code contributions (commits).
  • Community composition analysis: based on code contributions.
  • Generational analysis: based on code contributions.

In the coming weeks we will be publishing the results of the study, in order to better understand how a free software community works. Stay tuned!

Corporate participation in free software: the curious case of QGIS and KCube Consulting

QGIS is one of the most popular free software applications in the world of Geographic Information Systems. As in any good free software project, people of very different backgrounds take part in it, driven by different motivations. The story I’d like to tell today is that of the company KCube Consulting, which has offered the QGIS community a full-time developer for 6 months.

Donating time: direct and indirect

Nowadays it is fairly common for companies to take part in free software projects. In general, however, a company participates by focusing on the parts of the project that interest it most. For example, if you are a company that sells video cards, you will probably take part in Linux kernel development by adding support for your graphics card, so that users can use it on Linux platforms. That kind of “indirect time donation” is already very common.

KCube Consulting’s offer, which we could call a “direct time donation”, is also a route already explored in the free software world, although it is less common than the former. However, as Tim Sutton nicely summarizes in the post announcing the offer, this approach has great advantages:

  • Technical: the organization will work side by side with the project’s core developers. Besides forging social relationships with the community, it will develop deep technical knowledge of how QGIS works. In the medium term, this will allow it to offer its clients better QGIS services.
  • Brand: the company KCube Consulting becomes associated with the QGIS product. Anyone who visits the blog looking for companies that provide QGIS services will see KCube Consulting as a good option. Other aspects still have to be evaluated, but KCube already makes a good first impression. In terms of marketing and communication, few actions could have such a good return.

The process for managing their participation

Once the offer was on the table, it was time to think: how should it be managed? Does the QGIS community have the resources to support this offer? Does it have a technical leadership able to prioritize the tasks and make the most of it? Let’s see how it went about it.

  1. The first step was that Tim Sutton, one of the project’s main maintainers, posted KCube Consulting’s offer on his blog, opening the process to the community.
  2. Through a wiki page on the project itself, anyone participating in the project could submit proposals for the work to be done by the KCube developer during the 6 months.
  3. With those ideas and a subsequent poll on the project’s page, the general lines of work for these 6 months are being drawn up.

This process has not finished yet, and we will see the results in 6 months. But what is clear is that opening the decision-making process to the community participating in the project is a sign of great maturity. This anecdote shows that the technical leadership is shared and that decisions can be (and are) made in an open and transparent way, together with the community. They are also made through the project’s own online channels, to ensure that anyone can participate.

Conclusions

  • Donating time directly to a free software project not only helps train the company’s technical staff; at first glance, it also positions the company in the eyes of the market as one of the providers worth considering.
  • In free software projects, having technical leadership that is open to the community is not only possible, but also enriches decision-making and sends a clear message to everyone: this is built together.