Values and References in JavaScript

  1. Value and reference data types
    • Boxing and unboxing
  2. Identity and copy operations
  3. Use case: How comparing things is faster and simpler with immutability
  4. Editing values and references (to be published soon)
  5. Passing arguments to functions (to be published soon)

1. Value and reference data types

Let’s say we are using a hypothetical language and we define two variables, foo and bar, whose value is the same: 42. foo is a value data type while bar is a reference data type. How would they be different?

The difference is how these types behave when we operate on them: checking for identity, copying, editing their values, passing them to functions, etc.:

  • value data types operate with the value of the variable
  • reference data types operate with a reference to the actual value

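A useful way to picture this: a value variable holds the payload itself, while a reference variable holds an identifier that leads to a payload stored somewhere else.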

Why is it like that? Essentially, for memory reasons. We’ll look at this in more depth in later sections, when we see how the copy operation and passing arguments to functions work.

Now, let’s talk JavaScript. The language defines the following data types: undefined, null, number, bigint (since ES2020), boolean, string, symbol and object (note that array is a particular kind of object). It also considers every data type but object a primitive. Unlike objects, primitives in JavaScript are immutable and don’t have properties or methods of their own.

Although the JavaScript standard doesn’t explicitly mention value data types versus reference data types, it does so implicitly through the way it defines the operations of the language. It is safe to say that:

  • primitive types behave like value data types
  • the object type behaves like a reference data type (and so do any subtypes, such as array)
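
As a quick check of the types listed above, here is a minimal sketch that relies only on the standard typeof operator and Array.isArray. The variable names are illustrative, not part of any API.

```javascript
// typeof reports the data type of a value.
var types = [
    typeof undefined,      // 'undefined'
    typeof 42,             // 'number'
    typeof true,           // 'boolean'
    typeof 'life',         // 'string'
    typeof Symbol( 'id' ), // 'symbol'
    typeof {}              // 'object'
];

// Two well-known quirks: null reports 'object' for historical reasons,
// and arrays report 'object' because they are a particular kind of object.
var nullType = typeof null;        // 'object'
var arrayType = typeof [];         // 'object'
var isArray = Array.isArray( [] ); // true

// Primitives are immutable: string methods return a new value
// and leave the original untouched.
var lower = 'meaning of life';
var upper = lower.toUpperCase();

console.log( types, nullType, arrayType, isArray );
console.log( lower, upper );
```

Note that typeof cannot tell arrays apart from other objects, which is why Array.isArray exists.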

Boxing and unboxing

Languages with both value and reference data types tend to provide ways to convert values into references, and vice versa. This is called boxing and unboxing. It is common for each value type to have a reference counterpart, and languages tend to provide automatic boxing and unboxing in some situations.

Let’s talk JavaScript again. Notice how it has reference data types for the corresponding value data types:

Type                               | Primitive (value)            | Object (reference)
string primitive / String object   | var str = 'meaning of life'; | var str = new String( 'meaning of life' );
number primitive / Number object   | var number = 42;             | var number = new Number( 42 );
boolean primitive / Boolean object | var bool = true;             | var bool = new Boolean( true );

JavaScript primitives don’t have methods or extra properties like the reference objects have. Yet, they’ll be automatically boxed to the equivalent reference object whenever you try to use one of its methods or properties. This is why the following works:

var foo = 'meaning of life'; // Primitive string.
foo.toUpperCase(); // Yields 'MEANING OF LIFE'.

Although foo is a primitive, we can use the object methods thanks to auto-boxing. It is similar to a type cast in other languages: ((String) foo).toUpperCase(). Auto-boxing is also at the root of some confusing behavior in JavaScript:

var foo = 'meaning of life'; // Primitive string

foo.constructor === String; // Yields true.
// When we access a property or method belonging to the String object,
// the foo variable will be automatically boxed,
// so it behaves like the String object.

foo instanceof String; // Yields false.
// In this case foo is in its natural state (unboxed),
// so we are comparing the primitive to the reference.

typeof foo; // Yields 'string'
// In this case, foo is in its natural state (unboxed),
// so we are asking the system what kind of variable it is.
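
Boxing and unboxing can also be done by hand, which makes the difference between the two forms easier to observe. The following is a small sketch using the standard String constructor and valueOf method. The variable names are only illustrative.

```javascript
var primitive = 'meaning of life';

// Explicit boxing: wrap the primitive in its String object counterpart.
var boxed = new String( primitive );
var otherBox = new String( primitive );

var primitiveType = typeof primitive; // 'string'
var boxedType = typeof boxed;         // 'object'

// A boxed string is a reference data type, so each box has its own
// identity even when the wrapped payload is the same.
var sameBox = boxed === otherBox; // false

// instanceof only recognizes the boxed form.
var boxedIsString = boxed instanceof String;         // true
var primitiveIsString = primitive instanceof String; // false

// Explicit unboxing: valueOf() returns the primitive inside the box.
var unboxed = boxed.valueOf();
var sameValue = unboxed === primitive; // true

console.log( primitiveType, boxedType, sameBox, sameValue );
```

In everyday code there is rarely a reason to box by hand. Auto-boxing covers method calls, and primitives are the ones that compare by value.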

2. Identity and copy operations

Equality and copy operations use the content of the variable in the stack-like structure that holds them all. This means that these operations will use the value when working with value data types and the reference when working with reference data types.

Value data types

Let’s say we have two value variables, foo and bar, that hold the same value. In plain JavaScript, this would be:

var foo = 42;
var bar = 42;
foo === bar; // true

If we were copying variables instead:

var foo = 42;
var bar = foo;
foo === bar; // true

foo = 23;
foo === bar; // false

As the content of the variables is the mere payload, the operations are straightforward.

Reference data types

Let’s say now that we are working with two reference variables, x and y, whose payloads are identical. In JavaScript, this would translate as:

var x = { '42': 'is the answer to the ultimate question' };
var y = { '42': 'is the answer to the ultimate question' };
x === y; // Yields false.

When we create reference variables, they are assigned a new identifier, no matter whether the payload is actually the same as other existing variables. Now remember that the equality operation works with the content of the stack-like structure (identifiers). So, it yields false because it is comparing different identifiers, not the actual payload.

What if we were copying variables instead?

var x = { '42': 'is the answer to the ultimate question' };
var y = x; // Copies x identifier to y.
x === y; // Yields true because we're comparing identifiers.

So far, so good. What would happen when the payload is changed?

x['42'] = 'the meaning of life'; // Changes the payload.
x === y; // Still true, the identifiers haven't changed.
console.log( y['42'] ); // Yields 'the meaning of life'.

But then:

var x = { '42': 'is the answer to the ultimate question' };
var y = x; // Copies x identifier to y.
x === y; // Yields true.

x = {'42': 'the meaning of life'}; // New identifier (and payload).

x === y; // Yields false.
console.log( x['42'] ); // 'the meaning of life'
console.log( y['42'] ); // 'is the answer to the ultimate question'

(A short aside: in the introduction, I mentioned that references and pointers were different. The above case is a good example of how they differ: if y were a pointer to x, it would keep following x, so both variables would remain equal after x is given new contents.)

These operations work with the contents of the variable and don’t take the extra step of finding and working with the actual payload, so they’re called shallow operations. Deep operations, on the other hand, do the extra lookup and work with the actual payload. Languages usually have shallow/deep equality checks and shallow/deep copy operations. The JavaScript language, in particular, doesn’t define built-in mechanisms for deep equality checks or deep copy operations; these are mostly left to application-land (although modern browsers and Node.js do provide structuredClone for deep copies).
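
To make the shallow/deep distinction concrete, here is a minimal sketch of what a deep equality check could look like in application code. deepEqual is a hypothetical helper, not a built-in, and it only handles plain objects, arrays and primitives. It does not cover dates, maps, sets, cyclic references or the NaN edge case.

```javascript
// Hypothetical helper: compares plain objects and arrays by payload.
function deepEqual( a, b ) {
    // Same primitive value, or same identifier.
    if ( a === b ) {
        return true;
    }
    // From here on, both sides must be non-null objects.
    if ( typeof a !== 'object' || typeof b !== 'object' || a === null || b === null ) {
        return false;
    }
    // An array is never equal to a non-array object.
    if ( Array.isArray( a ) !== Array.isArray( b ) ) {
        return false;
    }
    var keysA = Object.keys( a );
    var keysB = Object.keys( b );
    if ( keysA.length !== keysB.length ) {
        return false;
    }
    // Every own property of a must exist in b with a deeply equal value.
    return keysA.every( function( key ) {
        return Object.prototype.hasOwnProperty.call( b, key ) &&
            deepEqual( a[ key ], b[ key ] );
    } );
}

var x = { '42': 'is the answer to the ultimate question' };
var y = { '42': 'is the answer to the ultimate question' };

var shallowResult = x === y;        // false, the identifiers differ
var deepResult = deepEqual( x, y ); // true, the payloads match

console.log( shallowResult, deepResult );
```

The deep check has to visit every property at every level, so its cost grows with the size of the payload, while the shallow check is a single comparison.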

Nested reference data types

There’s a JavaScript idiom to create new objects by reusing parts of existing ones: Object.assign( target, …sources ).

var x = { '42': 'meaning of life' };
var y = Object.assign( {}, x );
x === y; // Yields false, identifiers are different.
x[42] === y[42]; // Yields true, we are comparing values.

Object.assign creates a shallow copy of every own enumerable property in the source objects into the target object. If the target already has a property with the same key, it’ll be overwritten. In the example above, we’re assigning a new identifier to the variable y, whose own properties will be the ones present in the object x.
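
The overwriting behavior is easier to see with more than one source. A short sketch follows; the object names (defaults, overrides, target) are made up for the illustration.

```javascript
var defaults = { 'title': 'Untitled', 'genre': 'Unknown' };
var overrides = { 'genre': 'Science fiction' };

// Sources are applied from left to right, so later ones win.
var merged = Object.assign( {}, defaults, overrides );

// Object.assign mutates its first argument (the target) and returns it.
var target = { 'title': 'Untitled' };
var returned = Object.assign( target, { 'title': 'The Dispossessed' } );
var sameTarget = returned === target; // true

// The source objects are never modified.
var defaultGenre = defaults['genre']; // 'Unknown'

console.log( merged, target, sameTarget, defaultGenre );
```

Passing an empty object as the target, as in the first call, is what turns Object.assign into a copy operation instead of a mutation of an existing object.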

This works as expected for objects whose own properties are value data structures, such as string or number. If any property is a reference data structure, we need to remember that we’ll be working with the identifiers.

For example:

var book = {
    'title': 'The Dispossessed',
    'genre': 'Science fiction',
    'author': {
        'name': 'Ursula K. Le Guin',
        'born': '1929-10-21'
    }
};

// We are creating a newBook object:
// * the identifier would be new
// * the payload would be created by shallow copying 
//   every book's own property
//
var newBook = Object.assign({}, book);

newBook === book; // false, identifiers are different

// Compare value data types properties:
newBook['title'] === book['title']; // true
newBook['genre'] === book['genre']; // true

// Compare reference data types properties:
newBook['author'] === book['author']; // true

Both newBook and book objects have the same identifier for the property author, which references the same payload. Effectively, we have two different objects with some shared parts.

If we change some properties, but not the author identifier, both book and newBook will still see the same author payload:

book['title'] = 'Decisive moments in History';
book['genre'] = 'Historical fiction';
book['author']['name'] = 'Stefan Zweig';
book['author']['born'] = '1881-11-28';

newBook === book; // Yields false, identifiers are still different.

// Value variables have diverged.
newBook['title'] === book['title']; // false
newBook['genre'] === book['genre']; // false

// The author identifier hasn't changed, its payload did.
newBook['author'] === book['author']; // true 
newBook['author']['name'] === book['author']['name']; // true 
newBook['author']['born'] === book['author']['born']; // true

For both objects to be completely separate entities, we need to replace the author identifier in one of them. For example:

book['title'] = 'Red Star';
book['genre'] = 'Science fiction';
book['author'] = { // this assigns a new identifier and payload
    'name': 'Alexander Bogdanov',
    'born': '1873-08-22'
};

newBook === book; // Yields false, identifiers are still different.

// Reference identifier for author changed,
// book.author and newBook.author are different objects now.
newBook['author'] === book['author']; // false
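
Another option is to avoid sharing the nested object in the first place, by copying each level when the copy is made. The following is a sketch under that assumption. copyBook is a hypothetical helper that knows about the author property and would need extending for deeper structures.

```javascript
// Hypothetical helper: copies a book and its nested author object,
// so that the two books share no reference data.
function copyBook( book ) {
    return Object.assign( {}, book, {
        'author': Object.assign( {}, book['author'] )
    } );
}

var book = {
    'title': 'The Dispossessed',
    'genre': 'Science fiction',
    'author': {
        'name': 'Ursula K. Le Guin'
    }
};

var newBook = copyBook( book );

// The author identifiers differ from the start.
var sameAuthor = newBook['author'] === book['author']; // false

// Changing one author payload no longer affects the other.
book['author']['name'] = 'Stefan Zweig';
var copiedName = newBook['author']['name']; // 'Ursula K. Le Guin'

console.log( sameAuthor, copiedName );
```

The trade-off is that every nested level has to be copied explicitly, which is the extra work that a shallow copy avoids.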

Coda

Humans have superpowers when it comes to pattern matching, so we are biased towards using them whenever we can. That may be the reason why the reference abstraction is sometimes confusing and why the behavior of shallow operations might seem inconvenient. In the end, we just want to manipulate some payload, so why would we be interested in working with identifiers?

The thing to remember is that programming is a space-time bound activity: we want to work with potentially big data structures quickly and without running out of memory. Achieving that goal requires trade-offs, and one that most languages make is having fixed memory structures (for the value data types and reference identifiers) and dynamic memory structures (for the reference payload). This is an oversimplification, but I believe it helps us understand the role of these abstractions. Fast equality checks are a side effect of comparing fixed memory structures, and we can write more memory-efficient programs because the copy operation works with identifiers instead of the actual payload.

Working with abstractions is both a burden and a blessing, and we need to understand them and learn how to use them to write code that is simple. In the next post, we shall talk about one of the tricks that we have: immutable data structures.

3. How comparing things is faster and simpler with immutability

In the previous sections, I wrote about the nature of value and reference data types, and the differences between shallow and deep operations. In particular, the fact that we need to rely on deep operations to compare things is a major source of complexity in our code. But we can do better.

Comparing mutable structures

When working with mutable data structures, things that can be modified, determining whether an object has actually been changed or not is not so straightforward:

var film = {
    'title': 'Pirates of the Caribbean',
    'released': 2003
};

// At some point, we receive an object and one of its properties
// might have changed. How do we know?
var newFilm = doSomething( film );

film === newFilm; // What does a shallow equality yield?

If we are allowed to mutate objects, the payload might have been updated even though the film and newFilm identifiers are equal. A shallow equality check won’t suffice: we’ll need to perform a deep equality operation against the original payload (which means keeping a copy of it around) to know.
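
Here is a small sketch of that situation. The body of doSomething is an assumption made for the example (it mutates its argument and returns it), and snapshot is an illustrative name for the copy we would have to keep.

```javascript
var film = {
    'title': 'Pirates of the Caribbean',
    'released': 2003
};

// Assumed implementation: mutates the object it receives and returns it.
var doSomething = function( film ) {
    film['title'] = 'The Curse of the Black Pearl';
    return film;
};

// To detect the change afterwards, we need a snapshot taken beforehand.
var snapshot = Object.assign( {}, film );

var newFilm = doSomething( film );

// The identifiers are equal, yet the payload is not what it used to be.
var sameIdentifier = film === newFilm;                     // true
var titleChanged = snapshot['title'] !== newFilm['title']; // true

console.log( sameIdentifier, titleChanged );
```

Both the snapshot and the property-by-property comparison are extra work whose only purpose is to find out whether anything changed.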

Comparing immutable structures

In JavaScript, primitives (numbers, strings, …) are immutable, and reference data types (object, arrays, …) are not. But if mutable structures are the reason why comparing things is difficult, what would happen if we worked with reference data types as if they were immutable?

Let’s see how this would work:

If something changes, instead of mutating the original object, we’ll create a new one with the adequate properties. As the new and the old object will have different identifiers, a shallow equality check will set them apart.

var film = {
    'title': 'Pirates of the Caribbean',
    'released': 2003
};

// Returns a new object instead of mutating the one it receives.
var doSomething = function( film ) {
    // ...
    return Object.assign(
        {},
        film,
        {'title': 'The Curse of the Black Pearl'}
    );
};

var newFilm = doSomething( film );

film === newFilm; // false

If nothing changes, we’ll return the same object. Because the identifier is the same, the shallow equality check will yield true.

var film = {
    'title': 'Pirates of the Caribbean',
    'released': 2003
};

// Nothing changes, so the original object is returned as is.
var doSomething = function( film ) {
    // ...
    return film;
};

var newFilm = doSomething( film );

film === newFilm; // true

It is easier to tell what has changed when reference data types are immutable because we can rely on shallow equality operations.
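
Both cases can be folded into a single update function. The sketch below assumes a hypothetical setTitle helper that follows the convention described above: return the same object when nothing changes, and a new one otherwise.

```javascript
// Hypothetical helper: never mutates the film it receives.
function setTitle( film, title ) {
    if ( film['title'] === title ) {
        return film; // nothing to change, keep the same identifier
    }
    // Something changed: build a new object with a new identifier.
    return Object.assign( {}, film, { 'title': title } );
}

var film = {
    'title': 'Pirates of the Caribbean',
    'released': 2003
};

var unchanged = setTitle( film, 'Pirates of the Caribbean' );
var changed = setTitle( film, 'The Curse of the Black Pearl' );

var unchangedIsSame = film === unchanged; // true
var changedIsSame = film === changed;     // false

console.log( unchangedIsSame, changedIsSame, film['title'] );
```

With this convention in place, callers only need a shallow equality check to know whether an update had any effect.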

As a side-effect, it takes less effort to build a whole lot of systems that depend on calculating differences: undo/redo operations, memoization and cache invalidation, state machines, frameworks to build interfaces with the immediate mode paradigm, etc.
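
Memoization is a good example of such a system. The following sketch assumes that arguments are treated as immutable, so comparing identifiers is enough to decide whether a cached result can be reused. memoizeLast and describeFilm are hypothetical names, and only the most recent call is cached.

```javascript
// Hypothetical helper: caches the last result and reuses it
// for as long as the argument keeps the same identifier.
function memoizeLast( fn ) {
    var hasCache = false;
    var lastArg;
    var lastResult;
    return function( arg ) {
        if ( hasCache && arg === lastArg ) {
            return lastResult; // shallow equality check, no deep comparison
        }
        lastArg = arg;
        lastResult = fn( arg );
        hasCache = true;
        return lastResult;
    };
}

var calls = 0;
var describeFilm = memoizeLast( function( film ) {
    calls += 1;
    return film['title'] + ' (' + film['released'] + ')';
} );

var film = { 'title': 'Pirates of the Caribbean', 'released': 2003 };
var first = describeFilm( film );  // computed
var second = describeFilm( film ); // served from the cache

// An update produces a new identifier, which invalidates the cache.
var newFilm = Object.assign( {}, film, { 'released': 2006 } );
var third = describeFilm( newFilm ); // computed again

console.log( first, second, third, calls );
```

If film were mutated in place instead, the cache would keep returning the stale description, which is why this approach depends on the immutability convention.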

Coda

One of the reasons I started this series of posts was to explain how using immutable reference data types is one of the tricks at the core of Redux and React. Their success is teaching us a valuable lesson: immutability and pure functions are the core ideas of the current cycle of building applications, just as the separation between API and interface was the dominant idea of the previous cycle.

I have already mentioned this some time ago, but, at the time, I wasn’t fully aware of how quickly these ideas would spread to other areas of the industry or how that would force us to gain a deeper understanding of language fundamentals.

I’m glad they did because I believe that investing in core concepts is what really matters to stay relevant and make smart decisions in the long term.