Pipeline (Unix)

{{Short description|Mechanism for inter-process communication using message passing}}
{{About|the original implementation for shells|software pipelines in general|Pipeline (software)}}
[[File:Pipeline.svg|thumb|280px|A pipeline of three program processes run on a text terminal]]
In [[Unix-like]] computer [[operating system]]s, a '''pipeline''' is a mechanism for [[inter-process communication]] using message passing. A pipeline is a set of [[process (computing)|process]]es chained together by their [[standard streams]], so that the output text of each process (''[[stdout]]'') is passed directly as input (''[[stdin]]'') to the next one. The second process is started while the first process is still executing, and they are executed [[concurrency (computer science)|concurrently]].

The concept of pipelines was championed by [[Douglas McIlroy]] at [[Unix]]'s ancestral home of [[Bell Labs]], during the development of Unix, shaping its [[Unix philosophy|toolbox philosophy]]. It is named by analogy to a physical [[pipeline transport|pipeline]]. A key feature of these pipelines is their "hiding of internals". This in turn allows for more clarity and simplicity in the system.

The '''pipes''' in the pipeline are [[anonymous pipe]]s (as opposed to [[named pipe]]s), where data written by one process is buffered by the operating system until it is read by the next process, and this uni-directional channel disappears when the processes are completed. The standard [[Shell (computing)|shell]] syntax for [[anonymous pipe]]s is to list multiple commands, separated by vertical bars ("[[Pipe (character)|pipes]]" in common Unix verbiage).

==History==
The pipeline concept was invented by [[Douglas McIlroy]]<ref name="crea">{{cite web |title=The Creation of the UNIX Operating System |url=http://csdev.cas.upm.edu.ph/~pfalcone/compsci/unix/unix-history1.html |archive-url=https://web.archive.org/web/20040914025332/http://csdev.cas.upm.edu.ph/~pfalcone/compsci/unix/unix-history1.html |archive-date=September 14, 2004 |publisher=Bell Labs}}</ref> and first described in the [[man pages]] of [[Version 3 Unix]].<ref name="reader">{{cite tech report|first1=M. D.|last1=McIlroy|author-link1=Doug McIlroy|year=1987|url=http://www.cs.dartmouth.edu/~doug/reader.pdf|title=A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986|series=CSTR|number=139|institution=Bell Labs}}</ref><ref>{{cite tech report|date=February 1973|vauthors=Thompson K, Ritchie DM|author-link1=Ken Thompson|author-link2=Dennis Ritchie|url=https://dspinellis.github.io/unix-v3man/v3man.pdf#page=178|title=UNIX Programmer's Manual Third Edition|institution=Bell Labs|edition=3rd|page=178}}</ref> McIlroy noticed that much of the time [[Unix shell|command shells]] passed the output file from one program as input to another. He championed the concept at [[Bell Labs]] during the development of Unix, shaping its [[Unix philosophy|toolbox philosophy]].<ref>{{cite web |last=Mahoney |first=Michael S. |title=The Unix Oral History Project: Release.0, The Beginning |url=http://www.princeton.edu/~hos/Mahoney/expotape.htm |quote=McIlroy: It was one of the only places where I very nearly exerted managerial control over Unix, was pushing for those things, yes.}}</ref><ref>{{cite web |title=Prophetic Petroglyphs |url=http://www.bell-labs.com/usr/dmr/www/mdmpipe.html |url-status=live |archive-url=https://web.archive.org/web/19990508221104/http://cm.bell-labs.com/cm/cs/who/dmr/mdmpipe.html |archive-date=8 May 1999 |access-date=22 May 2022 |website=www.bell-labs.com}}</ref>

His ideas were implemented in 1973 when ("in one feverish night", wrote McIlroy) [[Ken Thompson]] added the <code>pipe()</code> system call and pipes to the [[Thompson shell|shell]] and several utilities in Version 3 Unix. "The next day", McIlroy continued, "saw an unforgettable orgy of one-liners as everybody joined in the excitement of plumbing." McIlroy also credits Thompson with the <code>|</code> notation, which greatly simplified the description of pipe syntax in [[Version 4 Unix|Version 4]].<ref>{{cite web |date=August 23, 2006 |orig-date=Created April 29, 2004 |title=Pipes: A Brief Introduction |url=http://www.linfo.org/pipe.html |access-date=January 7, 2024 |publisher=The Linux Information Project}}</ref>{{r|reader}}

Although developed independently, Unix pipes are related to, and were preceded by, the 'communication files' developed by Ken Lochner<ref>{{cite web |title=Dartmouth Timesharing |url=http://www.cs.rit.edu/~swm/history/DTSS.doc |access-date=January 7, 2024 |website=[[Rochester Institute of Technology]] |format=DOC}}</ref> in the 1960s for the [[Dartmouth Time-Sharing System]].<ref>{{cite web |title=Data |url=https://www.bell-labs.com/usr/dmr/www/hist.html |url-status=live |archive-url=https://web.archive.org/web/19990220165130/http://cm.bell-labs.com/who/dmr/hist.html |archive-date=20 February 1999 |access-date=22 May 2022 |website=www.bell-labs.com}}</ref>

===Other operating systems===
{{Main article|Pipeline (software)}}

This feature of [[Unix]] was borrowed by other operating systems, such as [[MS-DOS]] and the [[CMS Pipelines]] package on [[VM/CMS]] and [[MVS]], and eventually came to be designated the [[pipeline (software)|pipes and filters design pattern]] of [[software engineering]].

=== Further concept development ===
In [[Tony Hoare|Tony Hoare's]] [[communicating sequential processes]] (CSP), McIlroy's pipes are further developed.<ref>{{cite web |last=Cox |first=Russ |title=Bell Labs and CSP Threads |url=https://swtch.com/~rsc/thread/ |access-date=January 7, 2024 |website=Swtchboard}}</ref>

==Implementation==
A pipeline is a mechanism for [[inter-process communication]] using message passing: a set of [[process (computing)|process]]es chained together by their [[standard streams]], so that the output text of each process (''[[stdout]]'') is passed directly as input (''[[stdin]]'') to the next one. The second process is started while the first process is still executing, and they run [[concurrency (computer science)|concurrently]]. The mechanism is named by analogy to a physical [[pipeline transport|pipeline]]. A key feature of these pipelines is their "hiding of internals",<ref>Ritchie & Thompson, 1974</ref> which in turn allows for more clarity and simplicity in the system.
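
The concurrent execution of the stages can be observed with a rough timing sketch. It assumes a POSIX shell and a <code>date</code> command that supports <code>+%s</code> (seconds since the epoch); the two-second delays are arbitrary. If the stages ran one after the other, the pipeline would need about four seconds; because they run side by side, it needs about two.

```shell
# Time a pipeline whose two stages each sleep for two seconds.
start=$(date +%s)
sleep 2 | sleep 2          # both processes are started together
end=$(date +%s)

elapsed=$((end - start))
echo "pipeline took about ${elapsed}s (sequential execution would take at least 4s)"
```

The measurement has one-second granularity, so the reported figure may be 2 or 3 depending on when the clock ticks.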

In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected<!--details please: buffering-->, and managed by the [[Scheduling (computing)|scheduler]] together with all other processes running on the machine. <!--death of a process, broken pipes, signal handling, etc.--> An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of [[Buffer (computer science)|buffering]]: for example, a sending program may produce 5000 [[bytes]] per [[second]] while a receiving program can accept only 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in the buffer, and the receiving program reads from the buffer when it is ready. If the buffer fills, the sending program is stopped (blocked) until the receiver removes at least some data from the buffer. In Linux, the default size of the buffer is 65,536 bytes (64 KiB). An open source third-party filter called [https://linux.die.net/man/1/bfr bfr] is available to provide larger buffers if required.
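
The blocking behavior can be illustrated with a small sketch, assuming a POSIX shell with <code>awk</code> and <code>wc</code>. The producer writes 20,000 numbered lines (about 109 KB, more than a 64 KiB buffer can hold), while the consumer deliberately waits one second before reading anything. The line count and the delay are arbitrary choices for illustration.

```shell
# Producer: 20000 numbered lines, more data than a 64 KiB pipe buffer holds.
# Consumer: sleeps first, so the producer blocks once the buffer is full,
# then resumes as the consumer drains it. No line is dropped.
lines=$(awk 'BEGIN { for (i = 1; i <= 20000; i++) print i }' |
    { sleep 1; wc -l; } |
    tr -d ' ')     # some wc implementations pad the count with spaces

echo "lines received: $lines"
```

The consumer still receives every line, because the producer is suspended rather than having its output discarded.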

===Network pipes===
Tools like [[netcat]] and [[socat]] can connect pipes to TCP/IP [[Internet socket|sockets]].

== Pipelines in command line interfaces ==
{{anchor|pipe character}}
All widely used Unix shells have a special syntax construct for the creation of pipelines. In every case, the commands are written in sequence, separated by the [[ASCII]] [[vertical bar]] character <code>|</code> (which, for this reason, is often called the "pipe character"). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of [[buffer (computer science)|buffer]] storage).

The pipeline uses [[anonymous pipe]]s. For anonymous pipes, data written by one process is buffered by the operating system until it is read by the next process, and this uni-directional channel disappears when the processes are completed; this differs from [[named pipe]]s, where messages are passed to or from a pipe that is given a name by creating it as a file, and that remains after the processes are completed. The standard [[Shell (computing)|shell]] syntax for anonymous pipes is to list multiple commands, separated by [[vertical bar]]s ("pipes" in common Unix parlance): <syntaxhighlight lang="bash" style="width: 50%">command1 | command2 | command3</syntaxhighlight>

For example, to list files in the current directory ({{mono|[[ls]]}}), retain only the lines of {{mono|ls}} output containing the string {{mono|"key"}} ({{mono|[[grep]]}}), and view the result in a scrolling page ({{mono|[[Less (Unix)|less]]}}), a user types the following into the command line of a terminal:

<syntaxhighlight lang="bash" style="width: 50%">ls -l | grep key | less</syntaxhighlight>

The command <code>ls -l</code> is executed as a process, the output (stdout) of which is piped to the input (stdin) of the process for <code>grep key</code>; the output of <code>grep key</code> is likewise piped to the process for <code>less</code>. Each [[process (computing)|process]] takes input from the previous process and produces output for the next process via ''[[standard streams]]''. Each <code>|</code> tells the shell to connect the standard output of the command on the left to the standard input of the command on the right by an [[inter-process communication]] mechanism called an [[anonymous pipe|(anonymous) pipe]], implemented in the operating system. Pipes are unidirectional; data flows through the pipeline from left to right.<!-- Shouldn't this be in the shell article? As with all shell commands, a command line can be extended over multiple physical lines by using a '\' character before the newline. -->
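
The left-to-right flow can be reproduced without depending on the contents of a real directory. In the following sketch, the function <code>listing</code> and its file names are hypothetical stand-ins for the output of <code>ls</code>, and <code>sort</code> replaces the interactive <code>less</code> so that the result can be captured.

```shell
# Hypothetical stand-in for `ls`: prints a fixed set of file names.
listing() {
    printf '%s\n' 'id_rsa.key' 'notes.txt' 'keyboard.md' 'README'
}

# Stage 1 produces names, stage 2 keeps those containing "key",
# stage 3 sorts what is left.
matches=$(listing | grep key | sort)

printf '%s\n' "$matches"
```

Only the two names containing <code>key</code> reach the last stage; the other lines are dropped by <code>grep</code> before <code>sort</code> ever sees them.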

=== Example ===
Below is an example of a pipeline that implements a kind of [[spell checker]] for the [[World Wide Web|web]] resource indicated by a [[Uniform Resource Locator|URL]]. An explanation of what it does follows.

<syntaxhighlight lang="bash" line="">
curl 'https://en.wikipedia.org/wiki/Pipeline_(Unix)' |
sed 's/[^a-zA-Z ]/ /g' |
tr 'A-Z ' 'a-z\n' |
grep '[a-z]' |
sort -u |
comm -23 - <(sort /usr/share/dict/words) |
less
</syntaxhighlight>

# '''<code>[[CURL|curl]]</code>''' obtains the [[HTML]] contents of a web page (could use <code>[[wget]]</code> on some systems).
# '''<code>[[sed]]</code>''' replaces all characters (from the web page's content) that are not spaces or letters, with spaces. ([[Newline]]s are preserved.)
# '''<code>[[tr (program)|tr]]</code>''' changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line).
# '''<code>[[grep]]</code>''' includes only lines that contain at least one lowercase [[alphabetical]] character (removing any blank lines).
# '''<code>[[Sort (Unix)|sort]]</code>''' sorts the list of 'words' into alphabetical order, and the <code>-u</code> switch removes duplicates.
# '''<code>[[comm (Unix)|comm]]</code>''' compares two sorted files line by line; <code>-23</code> suppresses lines unique to the second file and lines common to both, leaving only those found only in the first file named. The <code>-</code> in place of a filename causes <code>comm</code> to use its standard input (from the pipeline in this case). <code>sort /usr/share/dict/words</code> sorts the contents of the <code>words</code> file alphabetically, as <code>comm</code> expects, and <code>&lt;( ... )</code> makes the sorted output available under a file name (via [[process substitution]], typically implemented with a <code>/dev/fd</code> entry or a [[named pipe]] rather than a temporary file), which <code>comm</code> reads. The result is a list of words (lines) that are not found in <code>/usr/share/dict/words</code>.
# '''<code>[[less (Unix)|less]]</code>''' allows the user to page through the results.
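
The same stages can be tried offline on a small scale. The following sketch is an illustrative variant, not the pipeline above: the sample sentence and the eight-word dictionary are made up, a temporary file (created with <code>mktemp</code>, assumed to be available) replaces the process substitution so that the script runs in a plain POSIX shell, and <code>LC_ALL=C</code> is set so that <code>sort</code> and <code>comm</code> agree on ordering.

```shell
LC_ALL=C
export LC_ALL

# Made-up input text containing two misspellings.
text='The quick brwon fox jumps over the lazy dgo.'

# Made-up dictionary, sorted as comm expects.
dict=$(mktemp)
printf '%s\n' the quick brown fox jumps over lazy dog | sort > "$dict"

misspelled=$(printf '%s\n' "$text" |
    sed 's/[^a-zA-Z ]/ /g' |      # punctuation becomes spaces
    tr 'A-Z ' 'a-z\n' |           # lowercase, one word per line
    grep '[a-z]' |                # drop blank lines
    sort -u |                     # sorted, duplicates removed
    comm -23 - "$dict")           # keep words absent from the dictionary

rm -f "$dict"
printf '%s\n' "$misspelled"
```

The script prints <code>brwon</code> and <code>dgo</code>, the two words that do not appear in the dictionary.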

===Error stream===
By default, the [[standard error stream]]s ("[[stderr]]") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the [[system console|console]]. However, many shells have additional syntax for changing this behavior. In the [[C shell|csh]] shell, for instance, using <code>|&</code> instead of <code>|</code> signifies that the standard error stream should also be merged with the standard output and fed to the next process. The [[Bash (Unix shell)|Bash]] shell can also merge standard error into the pipe, either with <code>|&</code> (since version 4.0<ref>{{cite web |title=Bash release notes |url=https://tiswww.case.edu/php/chet/bash/NEWS |access-date=2017-06-14 |website=tiswww.case.edu}}</ref>) or with <code>2>&1</code>, and can redirect it to a different file.
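
The difference can be seen by counting the lines that reach the second command. In this sketch the function <code>emit</code> is a hypothetical helper that writes one line to each stream; only the portable <code>2>&1</code> form is used, so it runs in any POSIX shell.

```shell
# Hypothetical helper: one line on standard output, one on standard error.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Plain pipe: only standard output reaches grep. The braces discard the
# stray stderr line purely to keep this demonstration quiet.
piped_only=$( { emit | grep -c 'to std'; } 2>/dev/null )

# 2>&1 before the pipe: both streams reach grep.
piped_both=$(emit 2>&1 | grep -c 'to std')

echo "plain pipe: $piped_only line(s); with 2>&1: $piped_both line(s)"
```

With the plain pipe <code>grep</code> counts one line, and with <code>2>&1</code> it counts two.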

===Pipemill===
<!-- section header used in redirect -->
In the most commonly used simple pipelines the shell connects a series of sub-processes via pipes, and executes external commands within each sub-process.  Thus the shell itself is doing no direct processing of the data flowing through the pipeline.

However, it is possible for the shell to perform processing directly, using a so-called '''mill''' or '''pipemill''' (since a <code lang="bash">while</code> command is used to "mill" over the results from the initial command). This construct generally looks something like:
<syntaxhighlight lang="bash">
command | while read -r var1 var2 ...; do
    # Process each line, using the fields as parsed into var1, var2, etc.
    # (Note that the loop may run in a subshell, in which case var1, var2,
    # etc. will not be available after the while loop terminates. Some
    # shells, such as zsh and newer versions of Korn shell, instead run the
    # commands to the left of the pipe operator in a subshell.)
done
</syntaxhighlight>

Such a pipemill may not perform as intended if the body of the loop includes commands, such as <code>cat</code> and <code>ssh</code>, that read from <code>[[stdin]]</code>:<ref>{{cite web |date=6 March 2012 |title=Shell Loop Interaction with SSH |url=http://72.14.189.113/howto/shell/while-ssh/ |url-status=dead |archive-url=https://web.archive.org/web/20120306135439/http://72.14.189.113/howto/shell/while-ssh/ |archive-date=6 March 2012}}</ref> on the loop's first iteration, such a program (call it ''the drain'') will read the remaining output from <code>command</code>, and the loop will then terminate (with results depending on the specifics of the drain). There are two common ways to avoid this behavior. First, some drains support an option to disable reading from <code>stdin</code> (e.g. <code>ssh -n</code>). Alternatively, if the drain does not ''need'' to read any input from <code>stdin</code> to do something useful, it can be given <code>&lt; /dev/null</code> as input.
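
The effect is easy to reproduce with <code>cat</code> standing in for the drain. In this sketch the three input lines are arbitrary; the first loop lets <code>cat</code> inherit the loop's standard input, and the second gives it <code>/dev/null</code> instead.

```shell
# Drained: cat inherits the loop's stdin and consumes 'b' and 'c',
# so read finds nothing left and the loop ends after one iteration.
drained=$(printf 'a\nb\nc\n' | while read -r item; do
    cat > /dev/null
    echo "saw $item"
done)

# Fixed: cat reads from /dev/null, leaving the remaining lines for read.
fixed=$(printf 'a\nb\nc\n' | while read -r item; do
    cat < /dev/null > /dev/null
    echo "saw $item"
done)

printf 'drained loop:\n%s\n' "$drained"
printf 'fixed loop:\n%s\n' "$fixed"
```

The drained loop reports only <code>saw a</code>, while the fixed loop reports all three items.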

As all components of a pipeline are run in parallel, a shell typically forks a subprocess (a subshell) to run the loop, making it impossible to propagate variable changes to the outside shell environment. To remedy this issue, the "pipemill" can instead be fed from a [[here document]] containing a [[command substitution]], which waits for the pipeline to finish running before milling through the contents. Alternatively, a [[named pipe]] or a [[process substitution]] can be used for parallel execution. [[GNU bash]] also has a {{code|lastpipe}} option to disable forking for the last pipe component.<ref>{{cite web |author=John1024 |title=How can I store the "find" command results as an array in Bash |url=https://stackoverflow.com/a/23357277 |website=Stack Overflow}}</ref>
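
The here-document remedy can be sketched as follows, assuming a POSIX shell. The variable names and the three input lines are illustrative only. The value left in <code>piped_count</code> depends on which side of the pipe the shell runs in a subshell, so the sketch makes no claim about it.

```shell
# Loop fed by a pipe: in shells that fork a subshell for the loop
# (for example bash or dash), the increments are lost afterwards.
piped_count=0
printf 'a\nb\nc\n' | while read -r line; do
    piped_count=$((piped_count + 1))
done
echo "after piped loop: $piped_count (shell-dependent)"

# Loop fed by a here-document holding a command substitution: the loop
# runs in the current shell, so the counter survives.
heredoc_count=0
while read -r line; do
    heredoc_count=$((heredoc_count + 1))
done <<EOF
$(printf 'a\nb\nc\n')
EOF
echo "after here-document loop: $heredoc_count"
```

The second loop leaves <code>heredoc_count</code> at 3 in the calling shell, at the cost of waiting for the inner command to finish before the loop starts.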

==Creating pipelines programmatically==
Pipelines can be created under program control. The Unix <code>pipe()</code> [[system call]] asks the operating system to construct a new [[anonymous pipe]] object. This results in two new, opened file descriptors in the process: the read-only end of the pipe, and the write-only end. The pipe ends appear to be normal, anonymous [[file descriptor]]s, except that they have no ability to seek.

To avoid [[Deadlock (computer science)|deadlock]] and exploit parallelism, the Unix process with one or more new pipes will then, generally, call <code>[[fork (system call)|fork()]]</code> to create new processes. Each process will then close the end(s) of the pipe that it will not be using before producing or consuming any data. Alternatively, a process might create new [[pthreads|threads]] and use the pipe to communicate between them.

''[[Named pipe]]s'' may also be created using <code>mkfifo()</code> or <code>[[mknod]]()</code> and then presented as the input or output file to programs as they are invoked. They allow multi-path pipes to be created, and are especially effective when combined with standard error redirection, or with <code>[[tee (command)|tee]]</code>.
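
From the shell, the same facility is available through the <code>mkfifo</code> utility. The following is a minimal sketch, assuming <code>mktemp -d</code> is available for creating a scratch directory; the file names are arbitrary. A reader is started in the background on the named pipe, and a writer then sends it one line.

```shell
# Scratch directory for the named pipe and the output file.
dir=$(mktemp -d)
fifo="$dir/demo.fifo"
mkfifo "$fifo"

# Reader: runs in the background and uppercases whatever arrives.
tr 'a-z' 'A-Z' < "$fifo" > "$dir/out.txt" &

# Writer: opening the FIFO blocks until the reader has opened it too.
printf 'hello\n' > "$fifo"

wait                          # let the reader finish
result=$(cat "$dir/out.txt")
rm -rf "$dir"

echo "$result"
```

Unlike an anonymous pipe, the FIFO exists as a directory entry until it is removed, so unrelated processes can open it by name.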

== Popular culture ==
<!-- Image with inadequate rationale removed: [[File:Automator Icon.png|thumb|75px|right|Apple Automator logo]]  -->
The robot in the icon for [[Apple Computer|Apple]]'s [[Automator (software)|Automator]], which also uses a pipeline concept to chain repetitive commands together, holds a pipe in homage to the original Unix concept.

==See also==
* [[Everything is a file]] – describes one of the defining features of Unix; pipelines act on "files" in the Unix sense
* [[Anonymous pipe]] – a FIFO structure used for interprocess communication
* [[GStreamer]] – a pipeline-based multimedia framework
* [[CMS Pipelines]]
* [[Iteratee]]
* [[Named pipe]] – persistent pipes used for interprocess communication
* [[Process substitution]] – shell syntax for connecting multiple pipes to a process
* [[GNU parallel]]
* [[Pipeline (computing)]] – other computer-related pipelines
* [[Redirection (computing)]]
* [[Tee (command)]] – a general command for tapping data from a pipeline
* [[XML pipeline]] – for processing of XML files
* [[xargs]]

==References==
{{reflist|30em}}

{{Refbegin}}
* [[Sal Soghoian]] on [[MacBreak]] Episode 3 "Enter the Automatrix"
{{Refend}}

==External links==
* [https://www.bell-labs.com/usr/dmr/www/hist.html#pipes History of Unix pipe notation] {{Webarchive|url=https://web.archive.org/web/20150408054606/http://cm.bell-labs.com/cm/cs/who/dmr/hist.html#pipes |date=2015-04-08 }}
** [http://doc.cat-v.org/unix/pipes/ Doug McIlroy's original 1964 memo], proposing the concept of a pipe for the first time
* {{man|sh|pipe|SUS|create an interprocess channel}}
* [http://www.linfo.org/pipe.html Pipes: A Brief Introduction] by The Linux Information Project (LINFO)
* [http://www.softpanorama.org/Scripting/pipes.shtml Unix Pipes – powerful and elegant programming paradigm (Softpanorama)]
* [http://en.wikibooks.org/w/index.php?title=Ad_Hoc_Data_Analysis_From_The_Unix_Command_Line ''Ad Hoc Data Analysis From The Unix Command Line'' at Wikibooks] – Shows how to use pipelines composed of simple filters to do complex data analysis.
* [https://web.archive.org/web/20170911203245/https://debian-administration.org/article/145/use_and_abuse_of_pipes_with_audio_data Use And Abuse Of Pipes With Audio Data] – Gives an introduction to using and abusing pipes with netcat, nettee and fifos to play audio across a network.
* [https://stackoverflow.com/questions/19122/bash-pipe-handling stackoverflow.com] – A Q&A about bash pipeline handling.

[[Category:Inter-process communication]]
[[Category:Unix]]

[[sv:Vertikalstreck#Datavetenskap]]