Editing Multiple dispatch (section)

== Understanding dispatch ==
Developers of computer software typically organize [[source code]] into named blocks variously called [[subroutine]]s, procedures, subprograms, functions, or methods. The code in the function is executed by ''calling'' it – executing a piece of code that references its ''name''. This transfers control temporarily to the called function; when the function's execution has completed, control is typically transferred back to the instruction in the ''caller'' that follows the reference.

Function names are usually selected so as to be descriptive of the function's purpose. It is sometimes desirable to give several functions the same name, often because they perform conceptually similar tasks, but operate on different types of input data. In such cases, the name reference at the function call site is not sufficient for identifying the block of code to be executed. Instead, the number and type of the arguments to the function call are also used to select among several function implementations.

In more conventional, i.e., [[Dynamic dispatch#Single and multiple dispatch|single-dispatch]] [[object-oriented programming]] languages, when invoking a method (''sending a message'' in [[Smalltalk]], ''calling a member function'' in [[C++]]), one of its arguments is treated specially and used to determine which of the (potentially many) classes of methods of that name is to be applied. In many languages, the ''special'' argument is indicated syntactically; for example, a number of programming languages put the special argument before a dot in making a method call: <code>special.method(other, arguments, here)</code>, so that <code>lion.sound()</code> would produce a roar, whereas <code>sparrow.sound()</code> would produce a chirp.

In contrast, in languages with multiple dispatch, the selected method is simply the one whose arguments match the number and type of the function call. There is no ''special'' argument that ''owns'' the function/method carried out in a particular call.

Multiple dispatch should be distinguished from [[function overloading]], in which static typing information, such as a term's declared or inferred type (or base type in a language with subtyping) is used to determine which of several possibilities will be used at a given call site, and that determination is made at compile or link time (or some other time before program execution starts) and is thereafter invariant for a given deployment or run of the program.  Many languages such as C++ offer robust function overloading but do not offer dynamic multiple dispatch (C++ only permits dynamic single dispatch through use of virtual functions).

=== Data types ===
When working with languages that can discriminate [[data type]]s at [[compile time]], selecting among the alternatives can occur then. The act of creating such alternative functions for compile time selection is usually referred to as [[Function overloading|overloading]] a function.

In programming languages that defer data type identification until run time (i.e., [[late binding]]), selection among alternative functions must occur then, based on the dynamically determined types of function arguments. Functions whose alternative implementations are selected in this manner are referred to most generally as ''multimethods''.

There is some run-time cost associated with dynamically dispatching function calls. In some languages,{{Citation needed|date=December 2007}} the distinction between overloading and multimethods can be blurred, with the compiler determining whether compile time selection can be applied to a given function call, or whether slower run time dispatch is needed.

=== Issues ===

There are several known issues with dynamic-dispatch, both single and multiple.  While many of these issues are solved for single-dispatch, which has been a standard feature in object-oriented programming languages for decades, these issues become more complicated in the multiple-dispatch case.

====Expressiveness and modularity====

In most popular programming languages, source code is delivered and deployed in granules of functionality which we will here call ''packages''; actual terminology for this concept varies between language.  Each package may contain multiple type, value, and function definitions, packages are often compiled separately in languages with a compilation step, and a non-cyclical dependency relationship may exist.  A complete program is a set of packages, with a ''main package'' which may depend on several other packages, and the whole program consisting of the transitive closure of the dependency relationship.

The so-called [[expression problem]] relates to the ability for code in a depending package to extend behaviors (functions or datatypes) defined in a base package from within an including package, without modifying the source to the base package.  Traditional single-dispatch OO languages make it trivial to add new datatypes but not new functions; traditional functional languages tend to have the opposite effect, and multiple dispatch, if implemented correctly, allows both.  It is desirable for an implementation of multiple dispatch to have the following properties:

* It is possible to define different "cases" of a multi-method from within different packages without modifying the source of a base package.
* Inclusion of another package in the program should not change the behavior of a given multi-method call, when the call does not use any datatypes defined in the package.
* Conversely, if a datatype is defined in a given package, and a multi-method extension using that type is also defined in the same package, and a value of that type is passed (through a base type reference or into a generic function) into another package with no dependency on that package, and then the multi-method is invoked with that value as an argument, the multi-method case defined in the package which includes the type should be employed.  To put it another way—within a given program, the same multi-method invoked with the same set of arguments should resolve to the same implementation, regardless of the location of the call site, and whether or not a given definition is "in scope" or "visible" at the point of the method call.

====Ambiguity====

It is generally desirable that for any given invocation of a multi-method, there be at most one "best" candidate among implementation cases of the multi-method, and/or that if there is not, that this be resolved in a predictable and deterministic fashion, including failure.  Non-deterministic behavior is undesirable.  Assuming a set of types with a non-circular subtyping relationship, one can define that one implementation of a multi-method is "better" (more specific) if all dynamically-dispatched arguments in the first are subtypes of all dynamically-dispatched arguments specified in the second, and at least one is a strict subtype.  With single dispatch and in the absence of [[multiple inheritance]], this condition is trivially satisfied, but with multiple dispatch, it is possible for two or more candidates to satisfy a given actual argument list, but neither is more specific than the other (one dynamic argument being the subtype in one case, another being the subtype in the other case).  This particularly can happen if two different packages, neither depending on the other, both extend some multi-method with implementations concerning each package's types, and then a third package that includes both (possibly indirectly) then invokes the multi-method using arguments from both packages.

Possible resolutions include:

* Treating any ambiguous calls as an error.  This might be caught at compile time (or otherwise before deployment), but might not be detected until runtime and produce a runtime error.
* Ordering the arguments, so e.g. the case with the most specific first argument is selected, and subsequent arguments are not considered for ambiguity resolution unless the first argument is insufficient to resolve the issue.
* Construction of other rules for resolving an ambiguity in one direction or another.  Sometimes, such rules might be arbitrary and surprising.  In the rules for static overload resolution in C++, for instance, a type which matches exactly is understandably considered a better match than a type which matches through a base type reference or a generic (template) parameter.  However, if the only possible matches are either through a base type or a generic parameter, the generic parameter is preferred over the base type, a rule that sometimes produces surprising behavior.

====Efficiency====

Efficient implementation of single-dispatch, including in programming languages that are separately compiled to object code and linked with a low-level (not-language-aware) linker, including dynamically at program load/start time or even under the direction of the application code, are well known.  The "[[vtable]]" method developed in C++ and other early OO languages (where each class has an array of function pointers corresponding to that class's virtual functions) is nearly as fast as a static method call, requiring O(1) overhead and only one additional memory lookup even in the un-optimized case.  However, the vtable method uses the function name and not the argument type as its lookup key, and does not scale to the multiple dispatch case.  (It also depends on the object-oriented paradigm of methods being features of classes, not standalone entities independent of any particular datatype).

Efficient implementation of multiple-dispatch remains an ongoing research problem.

=== Use in practice ===
To estimate how often multiple dispatch is used in practice, Muschevici et al.<ref name=Muschevici>{{cite book |last1=Muschevici |first1=Radu |last2=Potanin |first2=Alex |last3=Tempero |first3=Ewan |last4=Noble |first4=James |title=Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications |chapter=Multiple dispatch in practice |year=2008 |series=OOPSLA '08 |pages=563–582 |doi=10.1145/1449764.1449808 |publisher=ACM |location=Nashville, TN, USA |isbn=9781605582153|s2cid=7605233 |url=https://figshare.com/articles/thesis/Multiple_Dispatch_in_Practice/16959112 }}</ref> studied programs that use dynamic dispatch. They analyzed nine applications, mostly compilers, written in six different languages: [[Common Lisp Object System]], [[Dylan (programming language)|Dylan]], [[Cecil (programming language)|Cecil]], MultiJava, Diesel, and Nice. Their results show that 13–32% of generic functions use the dynamic type of one argument, while 2.7–6.5% of them use the dynamic type of multiple arguments. The remaining 65–93% of generic functions have one concrete method (overrider), and thus are not considered to use the dynamic types of their arguments. Further, the study reports that 2–20% of generic functions had two and 3–6% had three concrete function implementations. The numbers decrease rapidly for functions with more concrete overriders.

Multiple dispatch is used much more heavily in [[Julia (programming language)|Julia]], where multiple dispatch was a central design concept from the origin of the language: collecting the same statistics as Muschevici on the average number of methods per generic function, it was found that the Julia [[standard library]] uses more than double the amount of overloading than in the other languages analyzed by Muschevici, and more than 10 times in the case of [[Operator (computer programming)|binary operators]].<ref name=julia-review/>

The data from these papers is summarized in the following table, where the dispatch ratio <code>DR</code> is the average number of methods per generic function; the choice ratio <code>CR</code> is the mean of the square of the number of methods (to better measure the frequency of functions with a large number of methods);<ref name=Muschevici/><ref name=julia-review/> and the degree of specialization <code>DoS</code> is the average number of type-specialized arguments per method (i.e., the number of arguments that are dispatched on):

{| class="wikitable sortable"
|-
! Language
! Average # methods (DR)
! Choice ratio (CR)
! Degree of specialization (DoS)
|-
| [[Cecil (programming language)|Cecil]]<ref name=Muschevici/> 
| 2.33 
| 63.30 
| 1.06
|- 
| [[Common Lisp]] ([[CMU Common Lisp|CMU]])<ref name=Muschevici/> 
| 2.03 
| 6.34 
| 1.17 
|- 
| Common Lisp ([[McCLIM]])<ref name=Muschevici/> 
| 2.32 
| 15.43 
| 1.17 
|-
| Common Lisp ([[Steel Bank Common Lisp|Steel Bank]])<ref name=Muschevici/> 
| 2.37 
| 26.57 
| 1.11 
|-
| Diesel<ref name=Muschevici/> 
| 2.07 
| 31.65 
| 0.71
|- 
| [[Dylan (programming language)|Dylan]] (Gwydion)<ref name=Muschevici/> 
| 1.74 
| 18.27 
| 2.14
|-
| Dylan (OpenDylan)<ref name=Muschevici/> 
| 2.51 
| 43.84 
| 1.23 
|- 
| [[Julia (programming language)|Julia]]<ref name=julia-review/> 
| 5.86 
| 51.44 
| 1.54 
|- 
| Julia (operators only)<ref name=julia-review/> 
| 28.13 
| 78.06 
| 2.01 
|-
| MultiJava<ref name=Muschevici/> 
| 1.50 
| 8.92 
| 1.02
|-
| Nice<ref name=Muschevici/> 
| 1.36 
| 3.46 
| 0.33
|- 
|}

=== Theory ===
The theory of multiple dispatching languages was first developed by Castagna et al., by defining a model for overloaded functions with [[late binding]].<ref>{{cite journal |last1=Castagna |first1=Giuseppe |last2=Ghelli |first2=Giorgio |last3=Longo |first3=Giuseppe |name-list-style=amp |year=1995 |title=A calculus for overloaded functions with subtyping |journal=Information and Computation |volume=117 |issue=1 |pages=115–135 |doi=10.1006/inco.1995.1033 |doi-access=free }}</ref><ref>{{cite book |last=Castagna |first=Giuseppe |title=Object-Oriented Programming: A Unified Foundation |url=https://www.springer.com/birkhauser/computer+science/book/978-0-8176-3905-1 |year=1996 |publisher=Birkhäuser |isbn=978-0-8176-3905-1 |page=384 |series=Progress in Theoretical Computer Science}}</ref> It yielded the first formalization of the [[Covariance and contravariance (computer science)|problem of covariance and contravariance]] of object-oriented languages<ref>{{cite journal |last=Castagna |first=Giuseppe |year=1995 |title=Covariance and contravariance: conflict without a cause |journal=ACM Transactions on Programming Languages and Systems|volume=17 |issue=3 |pages=431–447 |doi=10.1145/203095.203096 |citeseerx=10.1.1.115.5992|s2cid=15402223 }}</ref> and a solution to the problem of binary methods.<ref>{{cite journal |last1=Bruce |first1=Kim |last2=Cardelli |first2=Luca |last3=Castagna |first3=Giuseppe |last4=Leavens |first4=Gary T. |last5=Pierce |first5=Benjamin |year=1995 |title=On binary methods |journal=Theory and Practice of Object Systems |volume=1 |issue=3 |pages= 221–242|doi= 10.1002/j.1096-9942.1995.tb00019.x|url=http://dl.acm.org/citation.cfm?id=230849.230854 |access-date=2013-04-19|url-access=subscription }}</ref>