Turns out algorithms are racist

«Technology is neither good nor bad; nor is it neutral.»

Melvin Kranzberg’s six laws of technology

One of the things I was very into a decade ago was studying how technology, culture, and society intertwine. From those years, I developed a sensitivity about my role as an engineer, or as an enabler of possible worlds.

This is one of the things I wanted to avoid:

A person isn’t able to clean his hands because the machine’s sensors are only prepared to detect white hands! That’s a horror story that could make a Black Mirror episode.

This made me think about the mainstream perception of Machine Learning and Artificial Intelligence technology. Lately, some friends of mine have been sharing clickbait news with me, like “Facebook shuts down robots after they invent their own language”. They ask me if robots could take over soon. Well, I can tell you something: at this stage of technology, I am not worried about robots taking over. What I do worry about is how our inability to understand technology creates racist algorithms that reinforce our biases.

Psychological safety

(…) the number-one indicator of a successful team wasn’t tenure, seniority or salary levels, but psychological safety.

Think of a team you work with closely. How strongly do you agree with these five statements?

  1. If I take a chance, and screw up, it will be held against me
  2. Our team has a strong sense of culture that can be hard for new people to join.
  3. My team is slow to offer help to people who are struggling.
  4. Using my unique skills and talents come second to the objectives of the team.
  5. It’s uncomfortable to have open honest conversations about our team’s sensitive issues.

Teams that score high on questions like these can be deemed to be “unsafe”. Unsafe to innovate, unsafe to resolve conflict, unsafe to admit they need help.

— Engineering a culture of psychological safety

Code and decision trees

An example of how changing the language for thinking may help us simplify our programs.

A lot of what programs do is transforming inputs into outputs. Take, for example, a piece of JavaScript code like this:

itemsToMarkup( items, viewType, galleryType ) {

    let markup;

    switch ( viewType ) {
        case 'gallery':
            if ( 'individual' === galleryType ) {
                markup = getHTML( items );
            } else {
                markup = getShortcode( items );
            }
            break;

        case 'image':
            markup = getHTML( items );
            break;
    }

    return markup;
}

What’s the purpose of this code? It takes some input data structures and outputs markup, either proper HTML or a shortcode to be processed by later stages of the pipeline. At its core, what we are doing is making a decision based on the input’s state, so it can be modeled as a decision tree:


By restating the problem in a simpler language, the structure becomes more evident. We are free of the biases that code as a language for thinking introduces (code size, good-looking indentation, a certain preference for switch or if statements, etc.). In this case, conflating the two checks into one reduces the tree depth and the number of leaves:


Translated back to code, this could be something like:

itemsToMarkup( items, viewType, galleryType ) {

    let markup;
    const isGalleryButNotIndividual = ( view, gallery ) => view === 'gallery' && gallery !== 'individual';

    if ( isGalleryButNotIndividual( viewType, galleryType ) ) {
        markup = getShortcode( items );
    } else {
        markup = getHTML( items );
    }

    return markup;
}


By having a simpler decision tree, the second piece of code makes the input/output mapping more explicit and concise.
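One way to gain confidence that both versions encode the same decision tree is to enumerate the input states and compare the outputs. The following is a minimal sketch under some assumptions: getHTML and getShortcode are replaced by hypothetical stubs (the real ones aren't shown in this post), both versions are written as standalone functions with made-up names, and 'default' is an invented stand-in for any gallery type other than 'individual'.

```javascript
// Hypothetical stand-ins for the real renderers, which this post doesn't show.
const getHTML = ( items ) => 'html:' + items.join( ',' );
const getShortcode = ( items ) => 'shortcode:' + items.join( ',' );

// First version: a switch with a nested if.
function itemsToMarkupNested( items, viewType, galleryType ) {
    let markup;
    switch ( viewType ) {
        case 'gallery':
            if ( 'individual' === galleryType ) {
                markup = getHTML( items );
            } else {
                markup = getShortcode( items );
            }
            break;
        case 'image':
            markup = getHTML( items );
            break;
    }
    return markup;
}

// Second version: both checks conflated into one.
function itemsToMarkupFlat( items, viewType, galleryType ) {
    const isGalleryButNotIndividual = viewType === 'gallery' && galleryType !== 'individual';
    return isGalleryButNotIndividual ? getShortcode( items ) : getHTML( items );
}

// Walk every combination of the input states and record disagreements.
const viewTypes = [ 'gallery', 'image' ];
const galleryTypes = [ 'individual', 'default' ]; // 'default' is an assumed value
const mismatches = [];

for ( const viewType of viewTypes ) {
    for ( const galleryType of galleryTypes ) {
        const nested = itemsToMarkupNested( [ 'a', 'b' ], viewType, galleryType );
        const flat = itemsToMarkupFlat( [ 'a', 'b' ], viewType, galleryType );
        if ( nested !== flat ) {
            mismatches.push( { viewType, galleryType } );
        }
    }
}

console.log( mismatches.length ); // 0
```

Note that the two versions only agree for the view types the switch knows about: for any other viewType the switch version returns undefined, while the single-check version falls back to getHTML.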

A new middle class?

In this paper, we make the case that the high-productivity digital firms are starting to generate a new middle class. It’s a virtuous circle. Consumers flock to those firms because they offer lower prices and better service. Workers migrate there from low-productivity firms because the high-productivity firms offer better wages for the same occupations—and, often, steadier hours and better benefits.

— The Creation of a New Middle Class?: A Historical and Analytic Perspective on Job and Wage Growth in the Digital Sector. [PDF]

The Google repository

I’ve been reading how Google organizes its codebase: they maintain a hyper-large repository containing everything, since the beginning of the company. I guess you may find Gmail, Photos, or AdWords there. You won’t find Android or Chrome, though – these are open source projects.

The repository holds 86 TB of data, 1 billion files, and 35 million commits. To manage this complexity, they needed to build their own tools: a home-grown version control system that can work effectively with a repository at this scale, editor integration, build and automated testing tools, etc.

They develop all the code against trunk/master, meaning that if you are updating a library, you’ll also need to fix all applications that depend on it. Every project will be up-to-date, even abandoned projects.

The advantages

The main reasons they claim this approach works for them are that it makes it easier to reuse blocks of knowledge company-wide and that it reduces the friction of contributing across projects and teams. UI primitives, build tools, etc. are shared by any project that wants them; it’s just a matter of depending on the master version. It minimizes the costs of versioning and integration, and the curse of being left behind when something is updated and you cannot keep up with the changes (the experts will do it for you!).

As a side effect, when working on libraries/frameworks it’s easier to understand the performance/impact/etc of a specific change (you can run tests on real projects) and to put together a task-force to fix issues affecting several applications.

The disadvantages

This approach comes with downsides as well. They mention the amount of maintenance this setup requires, even with all the tooling they have already built. With a monolithic repo, it’s easy to run into unnecessary dependencies that bloat the binary size of a project (and they do), and there are costs inherent to updating basic blocks used throughout the whole company.

Another point is that it makes it difficult to have external contributors. Although they have a space in the repository for public/open-sourced projects, the article is unclear on how they manage third-party contributions there: external programmers don’t have access to the internal build tools that Google programmers have. High-profile products like Android or Chrome, where outside contributors are expected and encouraged, have walked away from this approach.


I highly recommend reading the paper. It’s a pretty unique approach, and the article does a good job of presenting a balanced perspective.

How comparing things is faster and simpler with immutability

The third post of the series about the differences between values and references is focused on a practical example, the same trick that is at the core of React and Redux performance.

In the previous post of the series, I wrote about the nature of value and reference data types, and the differences between shallow and deep operations. In particular, the fact that we need to rely on deep operations to compare things is a major source of complexity in our codebases. But we can do better.

Comparing mutable structures

When working with mutable data structures, determining whether an object has changed is not so simple:

var film = {
    'title': 'Pirates of the Caribbean',
    'released': 2003
};

// At some point, we receive an object and one of its properties
// might have changed. But how do we know?
var newFilm = doSomething( film );

film === newFilm; // What does a shallow equality yield?

If we are allowed to mutate objects, although film and newFilm identifiers are equal, the payload might have been updated: a shallow equality check won’t suffice, we’ll need to perform a deep equality operation with the original object to know.
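To make the cost concrete, here is a minimal sketch of a deep equality check. It is an illustration under stated assumptions, not a complete implementation: deepEqual and renameInPlace are hypothetical names, and the check only handles plain objects, arrays, and primitives (no cycles, Dates, Maps, or NaN handling). Note that we also have to keep a snapshot of the original around, because the mutation happens in place.

```javascript
// Minimal deep equality sketch: plain objects, arrays and primitives only.
function deepEqual( a, b ) {
    if ( a === b ) {
        return true; // same primitive value or same identifier
    }
    if ( typeof a !== 'object' || typeof b !== 'object' || a === null || b === null ) {
        return false; // different primitives, or only one side is an object
    }
    if ( Array.isArray( a ) !== Array.isArray( b ) ) {
        return false;
    }
    const keysA = Object.keys( a );
    const keysB = Object.keys( b );
    if ( keysA.length !== keysB.length ) {
        return false;
    }
    // Every own property must exist on both sides and be deeply equal.
    return keysA.every( ( key ) =>
        Object.prototype.hasOwnProperty.call( b, key ) && deepEqual( a[ key ], b[ key ] )
    );
}

const original = { title: 'Pirates of the Caribbean', released: 2003 };
// Snapshot taken before the mutation, so there is something to compare against.
const snapshot = Object.assign( {}, original );

// Hypothetical function that mutates its argument and returns it.
function renameInPlace( film ) {
    film.title = 'The Curse of the Black Pearl';
    return film;
}

const result = renameInPlace( original );

const sameIdentifier = original === result; // true, yet the payload changed
const samePayload = deepEqual( snapshot, result ); // false

console.log( sameIdentifier, samePayload );
```

The deep check has to visit every property of the structure, so its cost grows with the size of the payload, whereas the shallow check is a single comparison.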

Comparing immutable structures

In JavaScript, primitives (numbers, strings, …) are immutable, and reference data types (object, arrays, …) are not. But if mutable structures are the reason why comparing things is difficult, what would happen if we worked with reference data types as if they were immutable?

Let’s see how this would work:

If something changes, instead of mutating the original object, we’ll create a new one with the adequate properties. As the new and the old object will have different identifiers, a shallow equality check will set them apart.

var film = {
    'title': 'Pirates of the Caribbean',
    'released': 2003
};

var doSomething = function( film ) {
    // ...
    // Copy film into a new object, overriding the title.
    return Object.assign(
        {},
        film,
        {'title': 'The curse of the Black Pearl'}
    );
};

var newFilm = doSomething( film );

film === newFilm; // false

If nothing changes, we’ll return the same object. Because the identifier is the same, the shallow equality check will yield true.

var film = {
    'title': 'Pirates of the Caribbean',
    'released': 2003
};

var doSomething = function( film ) {
    // ...
    return film;
};

var newFilm = doSomething( film );

film === newFilm; // true
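Both cases can be folded into a single helper. The sketch below assumes a hypothetical setProp function (not part of any library): it returns the very same object when the value wouldn't change, and a new object otherwise, leaving the original untouched.

```javascript
// Hypothetical helper: same object back if nothing changes,
// a new object (original left untouched) if the value differs.
function setProp( obj, key, value ) {
    if ( obj[ key ] === value ) {
        return obj;
    }
    return Object.assign( {}, obj, { [ key ]: value } );
}

const film = { title: 'Pirates of the Caribbean', released: 2003 };

const unchanged = setProp( film, 'released', 2003 );
const renamed = setProp( film, 'title', 'The Curse of the Black Pearl' );

console.log( unchanged === film ); // true
console.log( renamed === film ); // false
console.log( film.title ); // 'Pirates of the Caribbean'
```

Callers can then use a plain === to find out whether anything changed.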

It is easier to tell what has changed when reference data types are immutable because we can leverage shallow equality operations.

As a side-effect, it takes less effort to build a whole lot of systems that depend on calculating differences: undo/redo operations, memoization and cache invalidation, state machines, frameworks to build interfaces with the immediate mode paradigm, etc.
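Memoization is perhaps the easiest of these to sketch. The example below is a simplified illustration, not how React or Redux implement it: memoizeLast and describeFilm are hypothetical names, and the cache only remembers the last argument. It is only safe under the assumption that callers never mutate the objects they pass in.

```javascript
// Hypothetical single-entry memoizer: recomputes only when the
// argument's identifier changes (a shallow equality check).
function memoizeLast( fn ) {
    let hasRun = false;
    let lastArg;
    let lastResult;
    return function ( arg ) {
        if ( ! hasRun || arg !== lastArg ) {
            lastArg = arg;
            lastResult = fn( arg );
            hasRun = true;
        }
        return lastResult;
    };
}

let calls = 0;
const describeFilm = memoizeLast( ( film ) => {
    calls += 1; // count how many times the real work is done
    return film.title + ' (' + film.released + ')';
} );

const film = { title: 'Pirates of the Caribbean', released: 2003 };
describeFilm( film );
describeFilm( film ); // same identifier: served from the cache

// A change produces a new object, so the cache is invalidated.
const updated = Object.assign( {}, film, { released: 2006 } );
describeFilm( updated );

console.log( calls ); // 2
```

If film were mutated in place instead, the memoized function would keep returning the stale description, which is why the immutability convention matters here.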


One of the reasons I started this series of posts was to explain how using immutable reference data types is one of the tricks at the core of Redux and React. Their success is teaching us a valuable lesson: immutability and pure functions are the core ideas of the current cycle of building applications, just as the separation between API and interface was the dominant idea of the previous cycle.

I have already mentioned this some time ago but, at the time, I wasn’t fully aware of how quickly these ideas would spread to other areas of the industry or how that would force us to gain a deeper understanding of language fundamentals.

I’m glad they did because I believe that investing in core concepts is what really matters to stay relevant and make smart decisions in the long term.

How equality and copy operations work

This is the second post of a series about how fundamental operations work depending on the nature of the data they work with. JavaScript is used for the examples.

In the introductory post of this series we talked about the differences between value and reference data types:

  • Value data types store their payload as the contents of the variable.
  • Reference data types store an identifier as the contents of the variable, and that identifier is a reference to the actual payload in an external structure.

Throughout this post we’ll see how the equality and copy operations use the content of the variable, meaning that they’ll use the payload for value data types and the identifier for reference data types.

Working with value data types

Let’s say we have the following value variables:

In plain JavaScript, this would be:

var foo = 42;
var bar = 42;
foo === bar; // this yields true

If we were copying variables instead:

var foo = 42;
var bar = foo;
foo === bar; // true

foo = 23;
foo === bar; // false

As the content of the variables is the mere payload, the operations are straightforward.

Working with reference data types

Let’s say now that we are working with reference data type variables:

In JavaScript, this would translate as:

var x = {'42': 'is the answer to the ultimate question'};
var y = {'42': 'is the answer to the ultimate question'};
x === y; // This yields false.

When we create new reference data type variables, they are going to have a brand new identifier, no matter whether the payload is actually the same as that of another existing variable. Because the language interpreter is comparing identifiers, and they are different, the equality check yields false.

What if we were copying variables instead?

var x = {'42': 'is the answer to the ultimate question'};
var y = x; // Copies x identifier to y.
x === y; // This yields true.

It is important to realize why these are equal: because their identifiers are equal, meaning that both variables are indexing the same payload.

With that in mind, what would happen on modifying the payload?

x['42'] = 'the meaning of life'; // Changes the payload.

x === y; // Still true, the identifiers haven't changed.
console.log(y['42']); // Yields 'the meaning of life'.

What if, instead of modifying the payload, we assigned a new object to x?


var x = {'42': 'is the answer to the ultimate question'};
var y = x; // Copies x identifier
x === y; // We already know this is true.

x = {'42': 'the meaning of life'}; // New identifier and payload.

x === y; // This would yield false.
console.log(x['42']); // 'the meaning of life'
console.log(y['42']); // 'is the answer to the ultimate question'

The reason is that x = {'42': 'the meaning of life'} assigns a new identifier to x, which references a different payload, so we’re back to the first scenario shown in this section.

(A short aside: in the introduction, I mentioned that references and pointers were different. The above case is a good example of how they’re different: if y were a pointer, it would index the contents of x, so both variables would remain equal after the contents of x change.)

In computer science, the operations that work with the contents of the variable (be it values or reference identifiers) are called shallow operations, meaning that they don’t go the extra step to find and work with the actual payload. On the other hand, deep operations do the extra lookup and work with the actual payload. Languages usually have shallow/deep equality checks and shallow/deep copy operations.

JavaScript, in particular, doesn’t provide a built-in deep equality check, and for a long time it didn’t provide a built-in deep copy either (recent browsers and Node.js versions offer structuredClone). For the rest, we either build these operations ourselves or use an external library.
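As an illustration of what building it ourselves involves, here is a minimal deep copy sketch. The deepCopy name is hypothetical, and the function assumes the payload is made only of plain objects, arrays, and primitives, with no cycles, Dates, Maps, or class instances.

```javascript
// Minimal deep copy sketch: follows every identifier and copies its payload.
function deepCopy( value ) {
    if ( Array.isArray( value ) ) {
        return value.map( ( element ) => deepCopy( element ) );
    }
    if ( value !== null && typeof value === 'object' ) {
        const result = {};
        for ( const key of Object.keys( value ) ) {
            result[ key ] = deepCopy( value[ key ] );
        }
        return result;
    }
    return value; // value data types are copied as they are
}

const source = { answer: 42, nested: { list: [ 1, 2, 3 ] } };
const clone = deepCopy( source );

console.log( clone === source ); // false, new identifier
console.log( clone.nested === source.nested ); // false, nested payload was copied too
console.log( clone.nested.list[ 0 ] === source.nested.list[ 0 ] ); // true, same value
```

Unlike a shallow copy, no part of clone shares an identifier with source, at the price of walking the whole structure.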

An example with nested reference data types

A JavaScript idiom to create new objects by reusing parts of existing ones is using the method Object.assign(target, …sources):

var x = {'42': 'meaning of life'};
var y = Object.assign({}, x);
x === y; // Yields false, identifiers are different.
x[42] === y[42]; // Yields true, we are comparing values.

Object.assign creates a shallow copy of every own property in the source objects into the target object. If the target has the same prop, it’ll be overwritten. In the example above, we’re assigning a new identifier to the variable y, whose own properties will be the ones present in the object x.
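A short sketch of the overwriting behavior, using made-up objects: Object.assign mutates and returns the target, and when several sources define the same property, the later source wins.

```javascript
const target = { title: 'Untitled', pages: 100 };
const source = { title: 'The Dispossessed' };

// The target itself is modified and returned.
const returned = Object.assign( target, source );

console.log( returned === target ); // true
console.log( target.title ); // 'The Dispossessed', overwritten by the source
console.log( target.pages ); // 100, untouched

// With several sources, later ones overwrite earlier ones.
const merged = Object.assign( {}, { genre: 'Fantasy' }, { genre: 'Science fiction' } );
console.log( merged.genre ); // 'Science fiction'
```

This is why the idiom passes an empty object literal as the target: it keeps every source object unmodified.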

This works as expected for objects whose own properties are value data structures, such as string or number. If any property is a reference data structure, we need to remember that we’ll be working with the identifiers.

For example:

var book = {
    'title': 'The Dispossessed',
    'genre': 'Science fiction',
    'author': {
        'name': 'Ursula K. Le Guin',
        'born': '1929-10-21'
    }
};

// We are creating a newBook object:
// * the identifier would be new
// * the payload would be created by shallow copying 
//   every book's own property
var newBook = Object.assign({}, book);

newBook === book; // false, identifiers are different

// Compare value data types properties:
newBook['title'] === book['title']; // true
newBook['genre'] === book['genre']; // true

// Compare reference data types properties:
newBook['author'] === book['author']; // true

Both the newBook and book objects have the same identifier for the property author, which references the same payload. Effectively, we have two different objects with some shared parts:

If we change some properties, but not the author identifier, both book and newBook will still see the same author payload:

book['title'] = 'Decisive moments in History';
book['genre'] = 'Historical fiction';
book['author']['name'] = 'Stefan Zweig';
book['author']['born'] = '1881-11-28';

newBook === book; // Yields false, identifiers are still different.

// Value variables have diverged.
newBook['title'] === book['title']; // false
newBook['genre'] === book['genre']; // false

// The author identifier hasn't changed, its payload did.
newBook['author'] === book['author']; // true 
newBook['author']['name'] === book['author']['name']; // true 
newBook['author']['born'] === book['author']['born']; // true

For both objects to be completely separate entities, we need to assign a new author identifier in one of them. For example:

book['title'] = 'Red Star';
book['genre'] = 'Science fiction';
book['author'] = { // this assigns a new identifier and payload
    'name': 'Alexander Bogdanov',
    'born': '1873-08-22'
};

newBook === book; // Yields false, identifiers are still different.

// Reference identifier for author changed,
// book.author and newBook.author are different objects now.
newBook['author'] === book['author']; // false
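The separation can also be done at copy time. As a sketch (the variable names are made up for illustration), the nested author can get its own shallow copy while the top-level object is being copied, so the two books share nothing from the start:

```javascript
const book = {
    title: 'The Dispossessed',
    genre: 'Science fiction',
    author: {
        name: 'Ursula K. Le Guin'
    }
};

// Copy the top level, then override author with its own shallow copy.
const independentBook = Object.assign( {}, book, {
    author: Object.assign( {}, book.author )
} );

console.log( independentBook.author === book.author ); // false

// Changing the original author payload no longer affects the copy.
book.author.name = 'Stefan Zweig';
console.log( independentBook.author.name ); // 'Ursula K. Le Guin'
```

This only goes one level deep: if author held further nested objects, those would still be shared and would need the same treatment.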


Humans have superpowers when it comes to pattern matching, so we are biased towards using that superpower whenever we can. That may be the reason why the reference abstraction is sometimes confusing and why the behavior of shallow operations might seem inconvenient. In the end, we just want to manipulate some payload, so why would we be interested in working with identifiers?

The thing to remember is that programming is a space-time bound activity: we want to work with potentially big data structures quickly, and without running out of memory. Achieving that goal requires trade-offs, and one that most languages make is having fixed memory structures (for the value data types and reference identifiers) and dynamic memory structures (for the reference payload). This is an oversimplification, but I believe it helps us to understand the role of these abstractions. Having fast equality checks is a side effect of comparing fixed memory structures, and we can write more memory-efficient programs because the copy operation works with identifiers instead of the actual payload.

Working with abstractions is both a burden and a blessing, and we need to understand them and learn how to use them to write code that is simple. In the next post, we shall talk about one of the tricks that we have: immutable data structures.