(Geo) Database evolution while developing

During last year, I followed with interest the different approaches on how to evolve the design of a database being discussed within the postgresql community. Following is my take on that one: how this year I developed a project with an intense evolving DB design using an agile approach.

The context

My requisites for this project were twofold:

  • An evolving DB design: at the beginning of the project I didn’t know how the DB design was to going to be. I had set to use some advanced techniques for data modeling which never had used in production (dynamic segmentation and linear referencing with PostgreSQL/PostGIS) and needed an approach which supported my evolving understanding of the domain.
  • Intense collaboration with analists: the project needed some intense work on data-processing to polish and create the data for the application. I knew this was to be an iterative process where both developers and analists would collaborate together to define and clarify the model we needed.

My approach

So, in the process of improving and automating my delivery pipeline, I set some rules for the project:

  • DB management through SQL and control versioning: the database was created from DDL scripts and data was stored as CSV (if alphanumeric) or SQL (generated from Shapefiles to store geographical information).
  • Application and database evolve together: so their code should too, which in practice means I put the app and DB directories/projects under the same git repo.
  • Test driven development: I needed to break the problem in small chunks I could deal with, while my understanding of the domain improved. Besides, when refactoring the DB (schemas, triggers, functions, etc) -which happened frequently- I needed to know all the pieces were working OK. I decided to use pgTap for that.

And how it turned out?

  • The pipeline worked smoothly: both the analists and developers were working in their confort zone with the proper tools; desktop GIS applications the formers, command-line and SQL the laters.
  • git provides an excelent mechanism for versioning text, so I had powerful tools at hand for versioning SQL structure and data (diff, cherry-pick, interative rebases, etc). Besides, see where the data was varying (name and type of fields, its values, etc) allowed us to early discovered some bugs and problems.
  • Database and application evolving to the same pace. By tagging the versions we can build in seconds the binaries needed for any version of the application with the proper DB.
  • Tests at DB level are a life-saver. pgTap allowed me to refactor the database whith no risk and a lot of confidence on what I was doing. I had all kind of tests: check if a trigger is launch if an UPDATE happens, a function is working, data integrity and model validation after the initial restore, etc.
  • Same process for deplying to developing, staging and production environments, which resulted in fewer errors and no panic-moments.
  • Having the data in the repo and regenerating BD from scracth was very comfy and quick (less than a minute in my laptop the whole DB: 100Mb of raw SQL) and similar numbers when deploying to stage through the wire. In a daily bases I only had to regenerate specific schemas of the DB, so waitings was an order of seconds.


We should consider the database as other deliverable to our clients and set the same quality, standards and methodology to develop it. In that sense, agile philosophy and practices match very well with the DB evolution.

At the tools level, I was reluctant to introduce new tools and steps I didn’t know very well in such a tight schedule, so I decided to stick to the basic and spartan (git repo, shell scripts, pgTap and SQL), then iterate and grow a solution for our specific case. Although I missed some refactoring tools, it turned out to be a good approach and now I´m in good position to know the tradeoffs of the process, which in next projects will help me to choose a specialized tool, if necessary.

Designing for growth

«Code should grow by addition rather than mutation.»

The best example of that axiom I ever found is the one in Beck’s Implementation patterns. What goes next is almost an exact reproduction of the book. After reading this post, if you liked, I’d strongly recommend you to buy a copy.

Imagine a graphic editor where one of the main metaphors is the Figure. It has methods like display(). So, if you want to support several figures in the editor, you can start by writing just a simple conditional:

public void display() {
    switch (getType()) {
        case RECTANGLE :
        case OVAL :
        case TEXT :
        default :

This naïve example, shows early its cons: if you want to add a new kind of figure, you will need to modify the GraphicEditor class adding a new case statement in every switch you have written along the class. Not only it is tedious and error-prone, but it also requires modify a working functionality, putting it at risk. Besides, you should coordinate the changes with other developers using that class. At this point, likely you have thought on better ways to manage change: make the code growth by addition rather than mutation. How is that? Well, two of the tools you have available are inheritance (using classes) and delegation (using interfaces):

If you should choose one of another is a matter of context. It depends. For example, if you would want to rewrite the display method using delegation you would write:

public void display() {

while let the rectangle and other figures to implement the logic of displaying themselves.

This change will allow us to add new kind of figures dinamically. We will only need to implement the Figure interface and use the mechanisms of GraphicEditor to select our new figure. No needs to change the existent code neither coordination costs or try to understand the whole picture before making any change. Just need to focus on implementing our tool. That’s the power of designing for growth: managing complexity.

wiki: entradas sobre desarrollo de software

Durante el fin de semana, a ratos libres, he ido completando el wiki con una entrada sobre los principios del desarrollo de software que todavía tienen que evolucionar. Aunque la programación es una tensión continua entre varios heurísticos, tener en mente lo anterior y seguir unas técnicas de trabajo adecuadas me ayuda a tomar decisiones.