{{Short description|Programming idiom for efficiently sorting a list by a computed key}}
{{More footnotes needed|date=February 2024}}
In [[computer programming]], the '''Schwartzian transform''' is a technique used to improve the efficiency of [[sorting]] a list of items. This [[programming idiom|idiom]]<ref>{{cite book |year=2002 |chapter=2.3 Sorting While Guaranteeing Sort Stability |editor1-last=Martelli |editor1-first=Alex |editor2-last=Ascher |editor2-first=David |title=Python Cookbook |publisher=O'Reilly & Associates |page=[https://archive.org/details/pythoncookbook00mart/page/43 43] |isbn=0-596-00167-3 |quote=This idiom is also known as the 'Schwartzian transform', by analogy with a related Perl idiom. |chapter-url-access=registration |chapter-url=https://archive.org/details/pythoncookbook00mart/page/43 }}</ref> is appropriate for [[comparison sort|comparison-based sorting]] when the ordering is actually based on the ordering of a certain property (the ''key'') of the elements, where computing that property is an intensive operation that should be performed a minimal number of times. The Schwartzian transform is notable in that it does not use named temporary arrays.

The Schwartzian transform is a version of a [[Lisp programming language|Lisp]] idiom known as '''decorate-sort-undecorate''', which avoids recomputing the sort keys by temporarily associating them with the input items. This approach is similar to {{not a typo|[[memoization]]}}, which avoids repeating the calculation of the key corresponding to a specific input value. By comparison, this idiom ensures that each input item's key is calculated exactly once, which may still result in repeated calculations if the input data contains duplicate items.
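The difference shows up when counting how often the key is computed for input that contains a duplicate. In the following Perl sketch, the key function <code>slow_key</code> and the call counter are hypothetical; the memoized variant caches keys in a hash (<code>%cache</code>) consulted from inside the comparison block, which relies on the <code>//=</code> operator (Perl 5.10 or later).

<syntaxhighlight lang="perl">
use strict;
use warnings;

my @input = ("b", "a", "b");   # "b" appears twice

# Hypothetical key function that counts how often it is called.
my $calls = 0;
sub slow_key { my ($s) = @_; $calls++; return uc $s }

# Schwartzian transform: one key computation per input item (3 here).
$calls = 0;
my @st_sorted = map  { $_->[0] }
                sort { $a->[1] cmp $b->[1] }
                map  { [$_, slow_key($_)] } @input;
my $st_calls = $calls;

# Memoization: one key computation per distinct value (2 here).
$calls = 0;
my %cache;
my @memo_sorted = sort {
    ($cache{$a} //= slow_key($a)) cmp ($cache{$b} //= slow_key($b))
} @input;
my $memo_calls = $calls;

print "transform: $st_calls calls, memoized: $memo_calls calls\n";
</syntaxhighlight>

Both variants produce the same order; they differ only in how many times the key is computed.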

The idiom is named after [[Randal L. Schwartz]], who first demonstrated it in [[Perl]] shortly after the release of Perl 5 in 1994. For a number of years the term "Schwartzian transform" applied solely to Perl [[programming language|programming]], but it was later adopted by some users of other [[programming language|languages]], such as [[Python (programming language)|Python]], to refer to similar idioms in those languages. However, the algorithm was already in use in other languages (under no specific name) before Schwartz popularized it in the Perl community in the form of that particular idiom. The term "Schwartzian transform" denotes that specific idiom, ''not'' the algorithm in general.

For example, to sort the word list ("aaaa","a","aa") by word length, first build the list (["aaaa",4],["a",1],["aa",2]), then sort it by the numeric values to obtain (["a",1],["aa",2],["aaaa",4]), and finally strip off the numbers to get ("a","aa","aaaa"). That is the general algorithm, which by itself does not count as a Schwartzian transform. The Schwartzian transform proper is the following Perl idiom:
<syntaxhighlight lang="perl">
@sorted = map  { $_->[0] }
          sort { $a->[1] <=> $b->[1] or $a->[0] cmp $b->[0] } # Use numeric comparison, fall back to string sort on original
          map  { [$_, length($_)] }    # Calculate the length of the string
               @unsorted;
</syntaxhighlight>
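As a self-contained script, with the word list from the example assigned to <code>@unsorted</code> and <code>use strict</code> added, the snippet above can be run directly; it is expected to print the words in order of increasing length.

<syntaxhighlight lang="perl">
use strict;
use warnings;

my @unsorted = ("aaaa", "a", "aa");

my @sorted = map  { $_->[0] }                    # undecorate: keep the word
             sort { $a->[1] <=> $b->[1]          # compare lengths numerically
                    or $a->[0] cmp $b->[0] }     # tie-break on the word itself
             map  { [$_, length($_)] }           # decorate: [word, length]
             @unsorted;

print join(",", @sorted), "\n";   # prints a,aa,aaaa
</syntaxhighlight>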

==The Perl idiom==
The general form of the Schwartzian transform is:

<syntaxhighlight lang="perl">
@sorted = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] }
          map  { [$_, foo($_)] }
               @unsorted;
</syntaxhighlight>

Here <code>foo($_)</code> represents an expression that takes <code>$_</code> (each item of the list in turn) and produces the corresponding value that is to be compared in its stead.

Reading from right to left (or from the bottom to the top):
* the original list <code>@unsorted</code> is fed into a <code>map</code> operation that wraps each item into a (reference to an anonymous 2-element) array consisting of the item itself and the calculated value that will determine its sort order (list of item ⇒ list of [item, value]);
* then the list of lists produced by <code>map</code> is fed into <code>sort</code>, which sorts it according to the previously calculated values (list of [item, value] ⇒ sorted list of [item, value]);
* finally, another <code>map</code> operation unwraps each anonymous array, keeping the original item and discarding the value used for sorting, which yields the items of the original list in sorted order (sorted list of [item, value] ⇒ sorted list of item).

Because the 2-element arrays are anonymous and are not stored in any named variable, Perl can reclaim their memory immediately after the sorting is done.
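The general form can be packaged as a reusable subroutine that takes the key function as a code reference. This is a sketch, not a core Perl facility: the name <code>schwartz_sort_by</code> is hypothetical, and keys are compared as strings with <code>cmp</code>, as in the general form above.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Hypothetical helper: sort @items by the string returned from $key_fn,
# calling $key_fn exactly once per item (Schwartzian transform).
sub schwartz_sort_by {
    my ($key_fn, @items) = @_;
    return map  { $_->[0] }                          # 3. unwrap original items
           sort { $a->[1] cmp $b->[1]                # 2. compare cached keys
                  or $a->[0] cmp $b->[0] }           #    tie-break on the item
           map  { [$_, $key_fn->($_)] }              # 1. wrap as [item, key]
           @items;
}

# Case-insensitive sort: the key is the lowercased string.
my @fruit = schwartz_sort_by(sub { lc $_[0] }, "banana", "Apple", "cherry");
print "@fruit\n";   # prints Apple banana cherry
</syntaxhighlight>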

==Efficiency analysis==
Without the Schwartzian transform, the sorting in the example above would be written in Perl like this:
<syntaxhighlight lang="perl">
@sorted = sort { foo($a) cmp foo($b) } @unsorted;
</syntaxhighlight>

While it is shorter to code, the naive approach here could be much less efficient if the key function (called {{mono|foo}} in the example above) is expensive to compute, because the code inside the braces is evaluated each time two elements are compared. An optimal [[comparison sort]] performs ''[[big o notation|O]]''(''n'' log ''n'') comparisons (where ''n'' is the length of the list), with two calls to {{mono|foo}} per comparison, resulting in ''O''(''n'' log ''n'') calls to {{mono|foo}}. By contrast, the Schwartzian transform makes only one call to {{mono|foo}} per element, in the decorating {{mono|map}} stage, for a total of ''n'' calls.

However, if the function {{mono|foo}} is relatively cheap, the extra overhead of the Schwartzian transform (building an anonymous array for every element and dereferencing it in every comparison) may outweigh the savings.
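The difference in the number of key computations can be observed directly by counting calls. In this Perl sketch, <code>foo</code> is a hypothetical stand-in that is cheap but counts its invocations; with 100 elements, the transform calls it exactly 100 times, while the naive sort calls it twice for every comparison it performs.

<syntaxhighlight lang="perl">
use strict;
use warnings;

my $calls = 0;
# Hypothetical stand-in for an expensive key function; it counts its calls.
sub foo { $calls++; return $_[0] % 10 }

my @unsorted = reverse 1 .. 100;   # n = 100

# Naive: foo runs twice per comparison.
$calls = 0;
my @naive = sort { foo($a) <=> foo($b) or $a <=> $b } @unsorted;
my $naive_calls = $calls;

# Schwartzian transform: foo runs once per element.
$calls = 0;
my @st = map  { $_->[0] }
         sort { $a->[1] <=> $b->[1] or $a->[0] <=> $b->[0] }
         map  { [$_, foo($_)] } @unsorted;
my $st_calls = $calls;

printf "naive: %d calls, transform: %d calls\n", $naive_calls, $st_calls;
</syntaxhighlight>

The exact naive count depends on Perl's sort implementation and on the input order; only the transform's count is fixed at ''n''.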

==Example==

For example, to sort a list of files by their [[mac times|modification times]], a naive approach might be as follows:

 '''function''' naiveCompare(file a, file b) {
     '''return''' modificationTime(a) < modificationTime(b)
 }
 
 ''// Assume that sort(list, comparisonPredicate) sorts the given list using''
 ''// the comparisonPredicate to compare two elements.''
 sortedArray := sort(filesArray, naiveCompare)

Unless the modification times are {{not a typo|memoized}} for each file, this method requires re-computing them every time a file is compared in the sort. Using the Schwartzian transform, the modification time is calculated only once per file.

A Schwartzian transform uses the functional idiom described above, which does not use named temporary arrays.
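For comparison with the pseudocode above, the file example can be sketched in Perl. This is an illustration under stated assumptions: the subroutine names are hypothetical, and the modification time is read from field 9 of Perl's built-in <code>stat</code> (seconds since the epoch). The naive version calls <code>stat</code> twice per comparison; the transform calls it once per file.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Modification time (seconds since the epoch) is field 9 of stat().
sub modification_time { return (stat $_[0])[9] }

# Naive (hypothetical name): stat() is called twice for every comparison.
sub naive_sort_by_mtime {
    my @files = @_;
    return sort { modification_time($a) <=> modification_time($b) } @files;
}

# Schwartzian transform (hypothetical name): stat() is called once per file.
sub st_sort_by_mtime {
    my @files = @_;
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] }
           map  { [$_, modification_time($_)] } @files;
}
</syntaxhighlight>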

The same algorithm can be written procedurally to better illustrate how it works, but this requires named temporary arrays and is therefore not a Schwartzian transform. The following pseudocode implements the algorithm in this way:

 '''for each''' file '''in''' filesArray
     insert array(file, modificationTime(file)) at end of transformedArray
 
 '''function''' simpleCompare(array a, array b) {
     '''return''' a[2] < b[2]
 }
 
 transformedArray := sort(transformedArray, simpleCompare)
 
 '''for each''' pair '''in''' transformedArray
     insert pair[1] at end of sortedArray
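The procedural pseudocode translates to Perl roughly as follows. The subroutine names are hypothetical, and because Perl arrays are indexed from 0, the pseudocode's 1-based indices 1 and 2 become 0 and 1.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Modification time (seconds since the epoch) is field 9 of stat().
sub modification_time { return (stat $_[0])[9] }

sub procedural_sort_by_mtime {
    my @files = @_;

    # Decorate: build a named temporary array of [file, mtime] pairs.
    my @transformed;
    for my $file (@files) {
        push @transformed, [$file, modification_time($file)];
    }

    # Sort the pairs by their cached modification times.
    @transformed = sort { $a->[1] <=> $b->[1] } @transformed;

    # Undecorate: copy the files back out in sorted order.
    my @sorted;
    for my $pair (@transformed) {
        push @sorted, $pair->[0];
    }
    return @sorted;
}
</syntaxhighlight>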

==History==

The first known online appearance of the Schwartzian transform is a December 16, 1994 [http://groups.google.com/group/comp.unix.shell/browse_frm/thread/31da970cebb30c6d?hl=en posting by Randal Schwartz] to a thread in the comp.unix.shell [[Usenet newsgroup]], crossposted to comp.lang.perl. (The [http://history.perl.org/PerlTimeline.html Perl Timeline] incorrectly gives a later date in 1995.) The thread began with a question about how to sort a list of lines by their "last" word:

 adjn:Joshua Ng
 adktk:KaLap Timothy Kwong
 admg:Mahalingam Gobieramanan
 admln:Martha L. Nangalama

Schwartz responded with:

<syntaxhighlight lang="perl">
#!/usr/bin/env perl
require 5; # New features, new bugs!
print
    map { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map { [$_, /(\S+)$/] }
    <>; 
</syntaxhighlight>

This code produces the result:

 admg:Mahalingam Gobieramanan
 adktk:KaLap Timothy Kwong
 admln:Martha L. Nangalama
 adjn:Joshua Ng
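To try the posted program without an input file, the four sample lines can be embedded in an array. The following self-contained adaptation adds <code>use strict</code> and <code>use warnings</code> and replaces the <code>&lt;&gt;</code> input with a hypothetical <code>@lines</code> array; it is expected to print the same output as above.

<syntaxhighlight lang="perl">
use strict;
use warnings;

my @lines = (
    "adjn:Joshua Ng\n",
    "adktk:KaLap Timothy Kwong\n",
    "admg:Mahalingam Gobieramanan\n",
    "admln:Martha L. Nangalama\n",
);

# Key = last whitespace-separated word; /(\S+)$/ returns it in list context
# ($ matches just before the trailing newline).
my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [$_, /(\S+)$/] }
             @lines;

print @sorted;
</syntaxhighlight>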

Schwartz noted in the post that he was "Speak[ing] with a lisp in Perl", a reference to the idiom's [[Lisp (programming language)|Lisp]] origins.

The term "Schwartzian transform" itself was coined by [[Tom Christiansen]] in a follow-up reply. Later posts by Christiansen made it clear that he had not intended to ''name'' the construct, but merely to refer to it as the one from the original post. His later attempt to name it "The Black Transform" did not take hold ("Black" here being a pun on "schwar[t]z", which means black in German).

==Comparison to other languages==
Some other languages provide a convenient interface to the same optimization as the Schwartzian transform:
* In [[Python (programming language)|Python]] 2.4 and above, both the {{mono|sorted()}} function and the in-place {{mono|list.sort()}} method take a {{mono|1=key=}} parameter that allows the user to provide a "key function" (like {{mono|foo}} in the examples above). In Python 3, a key function is the only way to specify a custom sort order (the previously supported {{mono|1=cmp=}} parameter, which allowed the user to provide a "comparison function", was removed). Before Python 2.4, developers would use the Lisp-originated decorate–sort–undecorate (DSU) idiom,<ref>{{cite web |title=How To/Sorting/Decorate Sort Undecorate |url=https://wiki.python.org/moin/HowTo/Sorting#The_Old_Way_Using_Decorate-Sort-Undecorate}}</ref> usually by wrapping the objects in a (sortkey, object) tuple.
* In [[Ruby (programming language)|Ruby]] 1.8.6 and above, the {{mono|Enumerable}} module (which is mixed into {{mono|Array}}) provides a {{mono|sort_by}}<ref name="Module: Enumerable">{{cite web |title=Module Enumerable |url=https://docs.ruby-lang.org/en/master/Enumerable.html#method-i-sort_by}}</ref> method, which allows specifying the "key function" (like {{mono|foo}} in the examples above) as a code block.
* In [[D (programming language)|D]] 2 and above, the {{mono|schwartzSort}} function is available. It might require less temporary data and be faster than the Perl idiom or the decorate–sort–undecorate idiom present in Python and Lisp, because the sorting is done in place and only minimal extra data (one array of transformed elements) is created.
* [[Racket (programming language)|Racket's]] core <code>sort</code> function accepts a <code>#:key</code> keyword argument with a function that extracts a key, and an additional <code>#:cache-keys?</code> argument requests that the resulting keys be cached during sorting. For example, a convenient way to shuffle a list is {{code|2=racket|(sort l < #:key (λ (_) (random)) #:cache-keys? #t)}}.
* In [[PHP (programming language)|PHP]] 5.3 and above, the transform can be implemented with {{mono|array_walk}}, e.g. to work around the unstable sort algorithms of PHP versions before 8.0.<syntaxhighlight lang="php">
function spaceballs_sort(array& $a)
{
    // Decorate: replace each value with array(value, original key).
    array_walk($a, function(&$v, $k) { $v = array($v, $k); });
    // Sort by value, keeping keys; equal values are ordered by original key.
    asort($a);
    // Undecorate: restore the original values.
    array_walk($a, function(&$v, $_) { $v = $v[0]; });
}</syntaxhighlight>
* In [[Elixir (programming language)|Elixir]], the {{mono|Enum.sort_by/2}} and {{mono|Enum.sort_by/3}} functions allow users to perform a Schwartzian transform for any module that implements the {{mono|Enumerable}} protocol.
* In [[Raku (programming language)|Raku]], supplying a comparator lambda that takes only one argument makes {{mono|sort}} perform a Schwartzian transform under the hood: <syntaxhighlight lang="Perl6">@a.sort( { $^a.Str } ) # or shorter: @a.sort(*.Str)</syntaxhighlight> would sort on the string representation using a Schwartzian transform, while <syntaxhighlight lang="Perl6">@a.sort( { $^a.Str cmp $^b.Str } )</syntaxhighlight> would produce the same order but convert both elements to strings just before each comparison.
* In [[Rust (programming language)|Rust]], somewhat confusingly, the {{mono|slice::sort_by_key}} method does ''not'' perform a Schwartzian transform: it allocates no additional storage for the keys and instead calls the key function on both elements at every comparison. The {{mono|slice::sort_by_cached_key}} method, by contrast, computes each key only once per element.
* In [[Haskell (programming language)|Haskell]], the <code>sortOn</code> function from the <code>Data.List</code> module of the base library performs a Schwartzian transform.

==References==
<references/>

==External links==
{{Wikibooks|Algorithm Implementation/Sorting|Schwartzian transform}}
* [http://www.stonehenge.com/merlyn/UnixReview/col64.html Sorting with the Schwartzian Transform by Randal L. Schwartz]
* [http://perl.plover.com/TPC/1998/Hardware-notes.html#Schwartzian_Transform Mark-Jason Dominus explains the Schwartzian Transform]
* http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52234
* Python Software Foundation (2005). [https://www.python.org/doc/faq/programming/#i-want-to-do-a-complicated-sort-can-you-do-a-schwartzian-transform-in-python 1.5.2 I want to do a complicated sort: can you do a Schwartzian Transform in Python?]. Retrieved June 22, 2005.
* [https://metacpan.org/module/Memoize Memoize Perl module - making expensive functions faster by caching their results.]

[[Category:Sorting algorithms]]
[[Category:Articles with example Perl code]]
[[Category:Articles with example Racket code]]
[[Category:Perl]]