What’s up with zip?

November 25, 2013

Did you know that across modern programming languages there are 4 completely different implementations of the zip function?

If you zip two lists of different lengths in Ruby, Python, JavaScript, and F#, you’ll get a completely different result in each language.

Join me as I take a quick dive into how (and why) the functionality of zip varies across different languages, with easy-to-understand examples.

What is zip?

According to Wikipedia, zip (known as convolution in computer science) is a function that “maps a tuple of sequences into a sequence of tuples.” Rephrased and simplified, zip takes a pair of lists and outputs a list of pairs.

When you have lists of equal lengths, the functionality is straightforward. However, when you have lists of differing lengths, there are a few different ways the function can behave.
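As a minimal sketch of the straightforward equal-length case, here is Python's built-in zip applied to two small lists (the variable names are purely illustrative):

```python
# Two lists of equal length: zip pairs them up element by element.
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

# In Python 3, zip returns a lazy iterator, so wrap it in list() to see the pairs.
pairs = list(zip(letters, numbers))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]
```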

I was surprised to find out that modern programming languages do not agree on the implementation of this case.

Minimum length

Languages like Python, Clojure, and Common Lisp use the length of the shortest list to determine the length of the returned list of pairs.

I emailed Raymond Hettinger, one of the core contributors to Python, and he had the following to say about this design decision.

The history of Python’s zip() is documented here: http://www.python.org/dev/peps/pep-0201/

Barry [the author of that PEP] is an Emacs guy, so ELisp is part of his core vocabulary. The roots of all zips trace back to Lisp where zipping stops with the shortest input list: http://jtra.cz/stuff/lisp/sclr/mapcar.html

Stopping with the shortest is a useful behavior because it allows infinite input streams to be combined with finite streams (a reasonably common functional programming technique).

The potential downside of this implementation is that the extra elements of the longer input sequence are silently dropped, so you can destroy data without noticing.
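To make both the truncation and the infinite-stream argument concrete, here is a short Python sketch. The sample lists are made up for illustration; `itertools.count` is the standard library's endless counter.

```python
from itertools import count

letters = ["a", "b", "c"]
numbers = [1, 2, 3, 4, 5]

# zip stops as soon as the shortest input is exhausted,
# so 4 and 5 never appear in the result.
truncated = list(zip(letters, numbers))
print(truncated)  # [('a', 1), ('b', 2), ('c', 3)]

# The same rule makes it safe to zip against an infinite stream:
# count(1) never ends, but zip stops when letters runs out.
numbered = list(zip(count(1), letters))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```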

Max length

While JavaScript doesn’t have a built-in zip, one of the most popular JavaScript libraries, Underscore.js, provides one. In this implementation, zip has the opposite functionality of the Python implementation: it uses the length of the longest list and fills the pairs with undefined when the shorter list runs out.

In a GitHub issue discussing this topic, the maintainer of the library, Jeremy Ashkenas, decided to leave the implementation as-is:

Without any super strong arguments in any particular direction (infinite lists are unconvincing in JavaScript, because _.zip has to be eager), I think we should leave this as-is. Not destroying any of the incoming data is a nice feature, and you can always stop iterating when you see undefined values, or compact out undefined parts of your result.
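The max-length behavior can be sketched in Python with `itertools.zip_longest` from the standard library. This is only an analogue, not Underscore's code: Python's `None` stands in for JavaScript's `undefined`, and the sample lists are made up.

```python
from itertools import zip_longest

letters = ["a", "b", "c"]
numbers = [1, 2, 3, 4, 5]

# zip_longest keeps going until the longest input is exhausted,
# filling the gaps with fillvalue (None by default).
padded = list(zip_longest(letters, numbers))
print(padded)
# [('a', 1), ('b', 2), ('c', 3), (None, 4), (None, 5)]

# As the quote suggests, incomplete pairs can be filtered out afterwards.
complete = [pair for pair in padded if None not in pair]
print(complete)  # [('a', 1), ('b', 2), ('c', 3)]
```

Note that filtering on `None` only works when `None` is not a legitimate value in either input, which is the same caveat that applies to checking for `undefined`.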

Lengths must be equal

F# (and map-based implementations of zip in Racket) won’t even let you use zip with lists of different lengths.

Personally, I think this implementation poses the least risk for people new to a language: if you try a case that could return different results across different languages, the length check raises an error, protecting you from losing data (or generating confusing null values).
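Here is a rough Python sketch of the strict behavior. `zip_equal` is a hypothetical helper written for this illustration, not a function from F#, Racket, or Python's standard library.

```python
def zip_equal(first, second):
    """Hypothetical helper: pair up two lists, refusing mismatched lengths."""
    if len(first) != len(second):
        raise ValueError(
            f"length mismatch: {len(first)} != {len(second)}"
        )
    return list(zip(first, second))


print(zip_equal(["a", "b"], [1, 2]))  # [('a', 1), ('b', 2)]

try:
    zip_equal(["a", "b", "c"], [1, 2])
except ValueError as error:
    # The mismatch is reported instead of silently dropping or padding data.
    print(error)  # length mismatch: 3 != 2
```

Python 3.10 and later offer the same safeguard directly through `zip(first, second, strict=True)`, which raises a `ValueError` when the inputs differ in length.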

Length of first list

Ruby may have the best (theoretically) and worst (functionally) implementation of zip: it uses the length of the list that you call zip on (it’s a method on the Array object) as the length of the final list of pairs.

With this implementation, if we use zip correctly, we can get the best of both worlds. That said, it’s a language subtlety that will almost certainly be lost on the vast majority of Ruby users, potentially adding more confusion than it’s worth.
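To show what "the best of both worlds" means, here is a Python sketch that mimics the receiver-length rule. `zip_like_receiver` is a hypothetical helper, and `None` stands in for Ruby's `nil`; this is not how Ruby implements it.

```python
from itertools import chain, repeat


def zip_like_receiver(receiver, other):
    """Hypothetical helper: the result always has len(receiver) pairs."""
    # Extend `other` with an endless supply of None so it can never run out
    # first; zip then stops exactly when `receiver` is exhausted.
    return list(zip(receiver, chain(other, repeat(None))))


short = ["a", "b", "c"]
long = [1, 2, 3, 4, 5]

# Calling it on the shorter list truncates, like the minimum-length rule.
print(zip_like_receiver(short, long))
# [('a', 1), ('b', 2), ('c', 3)]

# Calling it on the longer list pads, like the maximum-length rule.
print(zip_like_receiver(long, short))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, None), (5, None)]
```

The caller picks truncation or padding simply by choosing which list goes first, which is exactly the subtlety that is easy to miss.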

Is there a right answer?

If we look at the Wikipedia definition of zip, we see the following line:

Let $\ell$ denote the length of the longest word, i.e. the maximum of $|x|$, $|y|$

This suggests that the length of the longest list should be used; however, Wikipedia turns right back on itself and adds:

A variation of the convolution operation is defined…where $\underline{\ell}$ is the minimum length of the input words

In other words, no one really knows. I certainly don’t, bringing me to my next point…

Do you know more?

If you know more about where zip comes from, what the “correct” definition is, or why there’s such variation, please enlighten us — post a comment, tweet at me, or send me an email and I’ll add any additional information to this piece.

Thanks to Joe Wegner and Avinash D’Souza for reading drafts of this. 
