Did you know that across modern programming languages there are 4 completely different implementations of the zip function?
Join me for a quick dive, with easy-to-understand examples, into how (and why) zip behaves differently across languages.
According to Wikipedia, zip (known as convolution in computer science) is a function that “maps a tuple of sequences into a sequence of tuples.” Rephrased and simplified, zip takes a pair of lists and outputs a list of pairs.
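As a minimal sketch of that definition, here is Python's built-in zip applied to two equal-length lists. The variable names and data are only illustrative.

```python
# Two lists of equal length (illustrative data).
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

# In Python 3, zip returns a lazy iterator, so wrap it in list() to see the pairs.
pairs = list(zip(letters, numbers))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]
```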
When you have lists of equal lengths, the functionality is straightforward. However, when you have lists of differing lengths, there are a few different ways the function can behave.
I was surprised to find out that modern programming languages do not agree on how to handle this case.
Languages like Python, Clojure, and Common Lisp use the length of the shortest list to determine the length of the returned list of pairs.
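A small sketch of the shortest-list behavior, using Python's built-in zip. The lists here are made up for illustration, and the `dropped` variable is just a way to make visible which items never reach the output.

```python
# The first list is longer than the second (illustrative data).
letters = ["a", "b", "c", "d"]
numbers = [1, 2]

# zip stops as soon as the shortest input runs out.
pairs = list(zip(letters, numbers))
print(pairs)  # [('a', 1), ('b', 2)]

# The tail of the longer list never appears in the result.
dropped = letters[len(pairs):]
print(dropped)  # ['c', 'd']
```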
I emailed Raymond Hettinger, one of the core contributors to Python, and he had the following to say about this design decision.
The history of Python’s zip() is documented here: http://www.python.org/dev/
The potentially negative side effect of this implementation is that you can destroy the data of the longer input sequence.
_.zip has to be eager), I think we should leave this as-is. Not destroying any of the incoming data is a nice feature, and you can always stop iterating when you see undefined values, or compact out undefined parts of your result.
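The longest-list behavior described in the quote can be approximated in Python with itertools.zip_longest. This is only a sketch: Python's None stands in for the undefined values the quote mentions, and the final list comprehension is one assumed way of "compacting out" the padded pairs.

```python
from itertools import zip_longest

# The first list is longer than the second (illustrative data).
letters = ["a", "b", "c", "d"]
numbers = [1, 2]

# zip_longest keeps going until the longest input is exhausted,
# filling the gaps with None by default.
padded = list(zip_longest(letters, numbers))
print(padded)  # [('a', 1), ('b', 2), ('c', None), ('d', None)]

# Drop every pair that contains a padding value.
complete = [pair for pair in padded if None not in pair]
print(complete)  # [('a', 1), ('b', 2)]
```

Note that the filter would also discard pairs where None is a legitimate value in the input, so it only works when the data itself never contains None.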
F# (and map-based implementations of zip in Racket) won’t even let you use zip with lists of different lengths.
Personally, I think this implementation carries the least risk for people who are new to a language: if you try a case that could return different results in different languages, the length check protects you from losing data (or generating confusing null values).
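To make the strict behavior concrete, here is a rough Python sketch. The helper name zip_strict is hypothetical (it is not F#'s or Racket's API), and it assumes both inputs are lists so that their lengths can be compared up front.

```python
def zip_strict(first, second):
    """Hypothetical helper: pair up two lists, refusing mismatched lengths."""
    if len(first) != len(second):
        raise ValueError(f"length mismatch: {len(first)} != {len(second)}")
    return list(zip(first, second))


# Equal lengths work exactly like a normal zip.
print(zip_strict(["a", "b"], [1, 2]))  # [('a', 1), ('b', 2)]

# Unequal lengths raise an error instead of silently truncating or padding.
try:
    zip_strict(["a", "b", "c"], [1, 2])
except ValueError as error:
    print(f"rejected: {error}")  # rejected: length mismatch: 3 != 2
```

Python 3.10 and later offer a similar check on the built-in through zip(..., strict=True), which raises ValueError when the inputs turn out to have different lengths.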
Ruby may have the best (theoretically) and worst (functionally) implementation of zip: it uses the length of the list that you call zip on (it’s a method on the Array object) as the length of the final list of pairs.
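Ruby's rule can be imitated in Python to show why the order of the arguments matters. This is only a sketch of the behavior described above: zip_like_receiver is a hypothetical name, None stands in for Ruby's nil, and both inputs are assumed to be lists.

```python
def zip_like_receiver(receiver, other):
    """Hypothetical helper: the result always has len(receiver) pairs."""
    # Pad `other` with None if it is shorter than the receiver.
    missing = max(0, len(receiver) - len(other))
    padded_other = list(other) + [None] * missing
    # zip stops at the receiver's length if `other` is longer.
    return list(zip(receiver, padded_other))


# Longer list as the receiver: nothing is lost, gaps are padded.
print(zip_like_receiver([1, 2, 3], ["a", "b"]))  # [(1, 'a'), (2, 'b'), (3, None)]

# Shorter list as the receiver: the extra item in the other list is dropped.
print(zip_like_receiver([1, 2], ["a", "b", "c"]))  # [(1, 'a'), (2, 'b')]
```

The same two lists give a padded result or a truncated result depending solely on which one zip is called on.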
With this implementation, if we use zip correctly, we can get the best of both worlds. That said, it’s a language subtlety that will almost certainly be lost on the vast majority of Ruby users, potentially adding more confusion than it’s worth.
If we look at the Wikipedia definition of zip, we see the following line:
Let ℓ denote the length of the longest word, i.e. the maximum of |x|, |y|
This suggests that the length of the longest list should be used; however, Wikipedia turns right back on itself and adds:
A variation of the convolution operation is defined…where is the minimum length of the input words
In other words, no one really knows. I certainly don’t, bringing me to my next point…
If you know more about where zip comes from, what the “correct” definition is, or why there’s such variation, please enlighten us — post a comment, tweet at me, or send me an email and I’ll add any additional information to this piece.
Thanks to Joe Wegner and Avinash D’Souza for reading drafts of this.