===Performance issues===
Mach was originally intended to be a replacement for classical monolithic UNIX, and for this reason contained many UNIX-like ideas. For instance, Mach provided a permissions and security system similar to that used by UNIX's file system. Since the kernel was privileged (running in ''kernel-space'') over other OS servers and software, it was possible for malfunctioning or malicious programs to send it commands that would cause damage to the system, and for this reason the kernel checked every message for validity. Additionally, most of the operating system functionality was to be located in user-space programs, so the kernel needed some way to grant these programs additional privileges, e.g. to directly access hardware.

Some of Mach's more esoteric features were also based on this same IPC mechanism. For instance, Mach was able to support multi-processor machines with ease. In a traditional kernel, extensive work needs to be carried out to make it [[reentrancy (computing)|reentrant]] or ''interruptible'', as programs running on different processors could call into the kernel at the same time. Under Mach, the components of the operating system are isolated in servers, which are able to run, like any other program, on any processor. Although in theory the Mach kernel would also have to be reentrant, in practice this is not an issue because its response times are so fast that it can simply wait and serve requests in turn. Mach also included a server that could forward messages not just between programs, but even over the network, which was an area of intense development in the late 1980s and early 1990s.

Unfortunately, the use of IPC for almost all tasks turned out to have a serious performance impact. Benchmarks on 1997 hardware showed that Mach 3.0-based [[UNIX]] single-server implementations were about 50% slower than native UNIX.<ref name="condict94">{{cite web |author1=M. Condict |author2=D. Bolinger |author3=E. McManus |author4=D. Mitchell |author5=S. Lewontin |title=Microkernel modularity with integrated kernel performance |url=http://www.cs.utah.edu/~lepreau/osdi94/condict/abstract.html |date=April 1994 |access-date=February 19, 2019 |archive-date=June 19, 2017 |archive-url=https://web.archive.org/web/20170619131352/http://www.cs.utah.edu/~lepreau/osdi94/condict/abstract.html |url-status=dead}}</ref><ref name="hartig97p67">{{cite conference |doi= 10.1145/269005.266660|title=The performance of μ-kernel-based systems |conference= 16th ACM symposium on Operating systems principles (SOSP'97)|location=Saint-Malo, France |date=October 1997 |volume= 31 |isbn=0-89791-916-5 |issue= 5 |url=http://os.inf.tu-dresden.de/pubs/sosp97/|page=67 |first1=Hermann |last1=Härtig |first2=Michael |last2=Hohmuth |first3=Jochen |last3=Liedtke |first4=Sebastian |last4=Schönberg |first5=Jean |last5=Wolter |author-link3=Jochen Liedtke|doi-access=free }}</ref>

Study of the exact nature of the performance problems turned up a number of interesting facts. One was that the IPC mechanism itself was not the problem: there was some overhead associated with the memory mapping needed to support it, but this added only a small amount of time to making a call. The rest, about 80% of the time spent, was due to additional tasks the kernel performed on the messages. Primary among these were port rights checking and message validation. In benchmarks on an [[i486|486]]DX-50, a standard UNIX system call took an average of 21[[microsecond|μs]] to complete, while the equivalent operation with Mach IPC averaged 114μs. Only 18μs of this was hardware related; the rest was the Mach kernel running various routines on the message.<ref name="liedtke93">{{cite conference |author= Jochen Liedtke |title= Improving IPC by Kernel Design |book-title= Proceedings of the 14th ACM Symposium on Operating System Principles (SOSP) |year=1993 |isbn=978-0-89791-632-5 |author-link=Jochen Liedtke |citeseerx=10.1.1.55.9939 |doi=10.1145/168619.168633}}</ref> Given a syscall that does nothing, a full round-trip under BSD would require about 40μs, whereas on a user-space Mach system it would take just under 500μs.
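
The arithmetic behind these figures can be checked with a short sketch. The three constants are the 486DX-50 numbers quoted above; the function and variable names are hypothetical and chosen only for illustration. The resulting software share is broadly in line with the 80% figure cited.

```python
# Figures quoted in the text (486DX-50 benchmark); everything else is derived.
UNIX_SYSCALL_US = 21   # average native UNIX system call
MACH_IPC_US = 114      # equivalent operation over Mach IPC
HARDWARE_US = 18       # part of the Mach figure attributed to hardware


def ipc_breakdown(total_us, hardware_us, baseline_us):
    """Split an IPC call into hardware and software time and compare it
    with a baseline system call. All arguments are in microseconds."""
    software_us = total_us - hardware_us
    return {
        "software_us": software_us,
        "software_share": software_us / total_us,
        "slowdown": total_us / baseline_us,
    }


result = ipc_breakdown(MACH_IPC_US, HARDWARE_US, UNIX_SYSCALL_US)
print(f"software overhead: {result['software_us']} us "
      f"({result['software_share']:.0%} of the call)")
print(f"slowdown vs. native syscall: {result['slowdown']:.1f}x")
```

On these numbers, 96μs of the 114μs is spent in kernel routines rather than in hardware, and the Mach call is a little over five times slower than the native one.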

When Mach was first being seriously used in the 2.x versions, performance was slower than that of traditional monolithic operating systems, perhaps by as much as 25%.<ref name="JyEdfG" /> This cost was not considered particularly worrying, however, because the system was also offering multi-processor support and easy portability. Many felt this was an expected and acceptable cost to pay. When Mach 3 attempted to move most of the operating system into user-space, the overhead became higher still: benchmarks between Mach and [[Ultrix]] on a MIPS [[R3000]] showed a performance hit as great as 67% on some workloads.<ref name="chen93">{{cite journal |title= The impact of operating system structure on memory system performance |last1= Chen |first1= J B |last2= Bershad |first2= B N |journal= ACM SIGOPS Operating Systems Review |volume= 27 |page= 133 |year= 1993 |citeseerx= 10.1.1.52.4651 |issue= 5|doi= 10.1145/173668.168629}}</ref>

For example, getting the system time involves an IPC call to the user-space server maintaining the [[system clock]]. The caller first traps into the kernel, causing a context switch and memory mapping. The kernel then checks that the caller has the required access rights and that the message is valid. If it is, there is another context switch and memory mapping to complete the call into the user-space server. The process must then be repeated to return the results, adding up to a total of four context switches and memory mappings, plus two message verifications. This overhead rapidly compounds with more complex services, where there are often code paths passing through many servers.
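
The step count described above can be written out as a small model. This is an illustrative sketch, not Mach code: the step names and the `round_trip_steps` function are hypothetical, and the model assumes that a request passing through several servers costs one full request/reply exchange per server.

```python
from collections import Counter

# One direction of a message (caller -> kernel -> receiver), as described
# in the text: trap into the kernel, verify the message, hand it off.
ONE_WAY_STEPS = [
    "context_switch_and_mapping",  # sender traps into the kernel
    "message_verification",        # kernel checks access rights and validity
    "context_switch_and_mapping",  # kernel completes the call into the receiver
]


def round_trip_steps(servers_on_path=1):
    """Count the steps for a call that passes through `servers_on_path`
    servers, assuming each hop is a full request/reply exchange."""
    counts = Counter()
    for _ in range(servers_on_path):
        for step in ONE_WAY_STEPS * 2:  # request plus reply
            counts[step] += 1
    return dict(counts)


print(round_trip_steps())   # a single server, e.g. the clock server
print(round_trip_steps(3))  # a code path through three servers
```

For a single server the model gives the four context switches and two verifications counted in the text; under the stated assumption, a path through three servers triples both counts.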

This was not the only source of performance problems. Another centered on the problems of trying to handle memory properly when physical memory ran low and paging had to occur. In traditional monolithic operating systems, the authors had direct experience with which parts of the kernel called which others, allowing them to fine-tune their pager to avoid paging out code that was about to be used. Under Mach this was not possible because the kernel had no real idea what the operating system consisted of. Instead they had to use a single one-size-fits-all solution, which added to the performance problems. Mach 3 attempted to address this problem by providing a simple pager, relying on user-space pagers for better specialization, but this turned out to have little effect. In practice, any benefits it had were wiped out by the expensive IPC needed to call it.

Other performance problems were related to Mach's support for [[multiprocessor]] systems. From the mid-1980s to the early 1990s, commodity CPUs grew in performance at a rate of about 60% a year, but the speed of memory access grew at only 7% a year. This meant that the cost of accessing memory, relative to processor speed, grew tremendously over this period, and since Mach was based on mapping memory around between programs, any "cache miss" made IPC calls slow.
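
The effect of these two growth rates can be illustrated by compounding them. The sketch below uses only the 60% and 7% figures quoted above; the function name is hypothetical, and treating both rates as constant over the whole period is a simplifying assumption.

```python
CPU_GROWTH = 0.60     # yearly CPU performance growth, from the text
MEMORY_GROWTH = 0.07  # yearly memory access speed growth, from the text


def relative_memory_cost(years):
    """Factor by which one memory access grows in cost, measured in CPU
    work forgone, after `years` of compounding at the two rates."""
    return ((1 + CPU_GROWTH) / (1 + MEMORY_GROWTH)) ** years


for years in (1, 5):
    print(f"after {years} year(s): {relative_memory_cost(years):.1f}x")
```

At these rates the gap widens by roughly 1.5 times each year, so after five years a memory access costs about 7.5 times as much processor work as it did at the start.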