Zipping (computer science)
Template:Short description Template:About In computer science, zipping is a function which maps a tuple of sequences into a sequence of tuples. This name zip derives from the action of a zipper in that it interleaves two formerly disjoint sequences. The inverse function is unzip.
ExampleEdit
Given the three words cat, fish and be where |cat| is 3, |fish| is 4 and |be| is 2. Let <math>\ell</math> denote the length of the longest word which is fish; <math>\ell = 4</math>. The zip of cat, fish, be is then 4 tuples of elements:
- <math> (c,f,b)(a,i,e)(t,s,\#)(\#,h,\#)</math>
where # is a symbol not in the original alphabet. In Haskell this truncates to the shortest sequence <math>\underline{\ell}</math>, where <math>\underline{\ell} = 2</math>:
<syntaxhighlight lang="haskell"> zip3 "cat" "fish" "be" -- [('c','f','b'),('a','i','e')] </syntaxhighlight>
DefinitionEdit
Let Σ be an alphabet, # a symbol not in Σ.
Let x1x2... x|x|, y1y2... y|y|, z1z2... z|z|, ... be n words (i.e. finite sequences) of elements of Σ. Let <math>\ell</math> denote the length of the longest word, i.e. the maximum of |x|, |y|, |z|, ... .
The zip of these words is a finite sequence of n-tuples of elements of Template:Math, i.e. an element of <math>((\Sigma\cup\{\# \})^n)^*</math>:
- <math> (x_1,y_1,\ldots)(x_2,y_2,\ldots)\ldots(x_\ell,y_\ell,\ldots)</math>,
where for any index Template:Math, the wi is #.
The zip of x, y, z, ... is denoted zip(x, y, z, ...) or x ⋆ y ⋆ z ⋆ ...
The inverse to zip is sometimes denoted unzip.
A variation of the zip operation is defined by:
- <math> (x_1,y_1,\ldots)(x_2,y_2,\ldots)\ldots(x_{\underline{\ell}},y_{\underline{\ell}},\ldots)</math>
where <math>\underline{\ell}</math> is the minimum length of the input words. It avoids the use of an adjoined element <math>\#</math>, but destroys information about elements of the input sequences beyond <math>\underline{\ell}</math>.
In programming languagesEdit
Zip functions are often available in programming languages, often referred to as Template:Mono. In Lisp-dialects one can simply Template:Mono the desired function over the desired lists, Template:Mono is variadic in Lisp so it can take an arbitrary number of lists as argument. An example from Clojure:<ref>map from ClojureDocs</ref>
<syntaxhighlight lang="clojure">
- `nums' contains an infinite list of numbers (0 1 2 3 ...)
(def nums (range)) (def tens [10 20 30]) (def firstname "Alice")
- To zip (0 1 2 3 ...) and [10 20 30] into a vector, invoke `map vector' on them; same with list
(map vector nums tens) ; ⇒ ([0 10] [1 20] [2 30]) (map list nums tens) ; ⇒ ((0 10) (1 20) (2 30)) (map str nums tens) ; ⇒ ("010" "120" "230")
- `map' truncates to the shortest sequence; note missing \c and \e from "Alice"
(map vector nums tens firstname) ; ⇒ ([0 10 \A] [1 20 \l] [2 30 \i]) (map str nums tens firstname) ; ⇒ ("010A" "120l" "230i")
- To unzip, apply `map vector' or `map list'
(apply map list (map vector nums tens firstname))
- ⇒ ((0 1 2) (10 20 30) (\A \l \i))
</syntaxhighlight>
In Common Lisp: <syntaxhighlight lang="lisp"> (defparameter nums '(1 2 3)) (defparameter tens '(10 20 30)) (defparameter firstname "Alice")
(mapcar #'list nums tens)
- ⇒ ((1 10) (2 20) (3 30))
(mapcar #'list nums tens (coerce firstname 'list))
- ⇒ ((1 10 #\A) (2 20 #\l) (3 30 #\i)) — truncates on shortest list
- Unzips
(apply #'mapcar #'list (mapcar #'list nums tens (coerce firstname 'list)))
- ⇒ ((1 2 3) (10 20 30) (#\A #\l #\i))
</syntaxhighlight>
Languages such as Python provide a Template:Mono function.<ref name="pydocs">map(function, iterable, ...) from section Built-in Functions from Python v2.7.2 documentation</ref> Template:Mono in conjunction with the Template:Mono operator unzips a list:<ref name="pydocs"/> <syntaxhighlight lang="pycon"> >>> nums = [1, 2, 3] >>> tens = [10, 20, 30] >>> firstname = 'Alice'
>>> zipped = list(zip(nums, tens)) >>> zipped [(1, 10), (2, 20), (3, 30)]
>>> list(zip(*zipped)) # unzip [(1, 2, 3), (10, 20, 30)]
>>> zipped2 = list(zip(nums, tens, list(firstname))) >>> zipped2 # zip, truncates on shortest [(1, 10, 'A'), (2, 20, 'l'), (3, 30, 'i')] >>> list(zip(*zipped2)) # unzip [(1, 2, 3), (10, 20, 30), ('A', 'l', 'i')] </syntaxhighlight>
Haskell has a method of zipping sequences but requires a specific function for each arity (Template:Mono for two sequences, Template:Mono for three etc.),<ref>zip :: [a] -> [b] -> [(a, b)] from Prelude, Basic libraries</ref> similarly the functions Template:Mono and Template:Mono are available for unzipping: <syntaxhighlight lang="haskell"> -- nums contains an infinite list of numbers [1, 2, 3, ...] nums = [1..] tens = [10, 20, 30] firstname = "Alice"
zip nums tens -- ⇒ [(1,10), (2,20), (3,30)] — zip, truncates infinite list unzip $ zip nums tens -- ⇒ ([1,2,3], [10,20,30]) — unzip
zip3 nums tens firstname -- ⇒ [(1,10,'A'), (2,20,'l'), (3,30,'i')] — zip, truncates unzip3 $ zip3 nums tens firstname -- ⇒ ([1,2,3], [10,20,30], "Ali") — unzip </syntaxhighlight>
Language comparisonEdit
List of languages by support of zip:
Language | Unzip | Unzip 3 tuples | Unzip n tuples | Notes |
---|---|---|---|---|
Clojure | Template:Mono | Template:Mono | Template:Mono | |
Common Lisp | Template:Codett | Template:Codett | Template:Codett | |
F# | Template:Mono Template:Mono Template:Mono |
Template:Mono Template:Mono Template:Mono |
||
Haskell | Template:Mono | Template:Mono | Template:Mono | Template:Mono for n > 3 is available in the module Template:Mono |
Python | Template:Mono | Template:Mono | Template:Mono |
See alsoEdit
ReferencesEdit
<references/>