== Considerations ==
Traditionally, a program is modelled as a series of operations happening in a specific order; this may be referred to as sequential,<ref name=advances>{{cite journal|last=Johnston|first=Wesley M. |author2=J.R. Paul Hanna |author3=Richard J. Millar|title=Advances in Dataflow Programming Languages|journal=ACM Computing Surveys|date=March 2004|volume=36|pages=1–34|url=http://www.cs.ucf.edu/~dcm/Teaching/COT4810-Spring2011/Literature/DataFlowProgrammingLanguages.pdf|access-date=15 August 2013|doi=10.1145/1013208.1013209|s2cid=5257722 }}</ref>{{rp|p.3}}
procedural,<ref name=lucid>{{cite book|last=Wadge|first=William W.|title=Lucid, the Dataflow Programming Language|year=1985|publisher=Academia Press|isbn=9780127296500|url=https://archive.org/details/luciddataflowpro00wadg_0|edition=illustrated|author2=Edward A. Ashcroft|access-date=15 August 2013|url-access=registration}}</ref>
[[control flow]]<ref name=lucid/> (indicating that the program chooses a specific path), or [[imperative programming]]. The program focuses on commands, in line with the [[John von Neumann|von Neumann]]<ref name=advances/>{{rp|p.3}} vision of sequential programming, where data is normally "at rest".<ref name=lucid/>{{rp|p.7}}

In contrast, dataflow programming emphasizes the movement of data and models programs as a series of connections. Explicitly defined inputs and outputs connect operations, which function like [[black box]]es.<ref name=lucid/>{{rp|p.2}} An operation runs as soon as all of its inputs become valid.<ref name=labview>{{cite web|title=Dataflow Programming Basics|url=http://www.ni.com/gettingstarted/labviewbasics/dataflow.htm|work=Getting Started with NI Products|publisher=National Instruments Corporation|access-date=15 August 2013}}</ref> Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems.<ref name=advances/>{{rp|p.3}}<ref>{{cite web|last=Harter|first=Richard|title=Data Flow languages and programming - Part I|url=http://richardhartersworld.com/cri/2009/dataflow1.html|work=Richard Harter's World|access-date=15 August 2013|archive-url=https://web.archive.org/web/20151208165213/http://richardhartersworld.com/cri/2009/dataflow1.html|archive-date=8 December 2015|url-status=dead}}</ref>
<ref>{{cite web|title=Why Dataflow Programming Languages are Ideal for Programming Parallel Hardware|url=http://www.ni.com/white-paper/6098/en/|work=Multicore Programming Fundamentals Whitepaper Series|publisher=National Instruments Corporation|access-date=15 August 2013}}</ref>

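The firing rule described above, in which an operation runs as soon as all of its inputs become valid, can be sketched in Python. This is a minimal illustration under stated assumptions and is not drawn from any particular dataflow language; the <code>Node</code> class, its methods, and the port names are hypothetical.

```python
class Node:
    """Hypothetical operation with named input ports.

    The node behaves like a black box: it fires as soon as every
    input port has received a value, and forwards its result downstream.
    """

    def __init__(self, func, input_names):
        self.func = func
        self.inputs = {name: None for name in input_names}
        self.filled = set()      # ports that currently hold valid data
        self.listeners = []      # (downstream node, port name) pairs
        self.result = None

    def connect(self, node, port):
        """Wire this node's output to an input port of another node."""
        self.listeners.append((node, port))

    def receive(self, port, value):
        """Accept data on one port; fire once all ports are valid."""
        self.inputs[port] = value
        self.filled.add(port)
        if len(self.filled) == len(self.inputs):
            self.fire()

    def fire(self):
        self.result = self.func(**self.inputs)
        for node, port in self.listeners:
            node.receive(port, self.result)


add = Node(lambda a, b: a + b, ["a", "b"])
double = Node(lambda x: 2 * x, ["x"])
add.connect(double, "x")

add.receive("a", 3)        # nothing fires yet: port "b" is still empty
pending = double.result    # still None at this point
add.receive("b", 4)        # add fires, and its output makes double fire
```

No code in the sketch states when <code>double</code> should run; its execution follows only from the arrival of data on its input.
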
===State===
One of the key concepts in computer programming is the idea of [[State (computer science)|state]], essentially a snapshot of various conditions in the system. Most programming languages require a considerable amount of state information, which is generally hidden from the programmer. Often, the computer itself has no idea which piece of information encodes the enduring state. This is a serious problem, as the state information needs to be shared across multiple processors in [[Parallel computing|parallel processing]] machines. Most languages force the programmer to add extra code to indicate which data and parts of the code are important to the state. This code tends to be both expensive in terms of performance and difficult to read or debug. [[Explicit parallelism]] is one of the main reasons for the poor performance of [[Enterprise Java Beans]] when building data-intensive, non-[[Online transaction processing|OLTP]] applications.{{Citation needed|date=March 2017}}

Where a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an [[assembly line]], each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track, and are all "ready" at the same time.
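
The assembly-line analogy can be illustrated with a short Python sketch in which each worker is a thread that waits on a queue and acts whenever material arrives. This is only one possible rendering of the idea; the names <code>worker</code>, <code>run_line</code>, and <code>DONE</code> are hypothetical, and real dataflow runtimes may schedule work quite differently.

```python
import queue
import threading

DONE = object()  # hypothetical sentinel marking the end of the stream


def worker(task, inbox, outbox):
    """One station on the line: process items as they become available."""
    while True:
        item = inbox.get()           # blocks until material is available
        if item is DONE:
            outbox.put(DONE)         # pass the shutdown signal downstream
            return
        outbox.put(task(item))


def run_line(tasks, materials):
    """Chain the tasks with queues and feed the materials through them."""
    queues = [queue.Queue() for _ in range(len(tasks) + 1)]
    threads = [
        threading.Thread(target=worker, args=(task, queues[i], queues[i + 1]))
        for i, task in enumerate(tasks)
    ]
    for thread in threads:
        thread.start()
    for material in materials:
        queues[0].put(material)
    queues[0].put(DONE)

    results = []
    while (item := queues[-1].get()) is not DONE:
        results.append(item)
    for thread in threads:
        thread.join()
    return results


line_output = run_line([lambda x: x + 1, lambda x: x * 10], [1, 2, 3])
```

Each worker holds no state between items; the only thing that determines when it acts is whether an input is waiting in its queue.
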

===Representation===
Dataflow programs are represented in different ways. A traditional program is usually represented as a series of text instructions, which is reasonable for describing a serial system that pipes data between small, single-purpose tools that receive, process, and return it. Dataflow programs start with an input, perhaps the [[command line]] parameters, and illustrate how that data is used and modified. The flow of data is explicit, often visually illustrated as a line or pipe.

In terms of encoding, a dataflow program might be implemented as a [[hash table]], with uniquely identified inputs as the keys, used to look up pointers to the instructions. When any operation completes, the program scans down the list of operations until it finds the first operation where all inputs are currently valid, and runs it. When that operation finishes, it will typically output data, thereby making another operation ready to run.
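
The encoding described above can be sketched in Python using a dictionary as the hash table. The sketch assumes that each key is a tuple of uniquely named inputs and that each value pairs a function with the name of its output; the function <code>run</code> and all data names are hypothetical, and two operations with identical inputs would need a richer key than this simple version uses.

```python
def run(table, initial_values):
    """Repeatedly run the first operation whose inputs are all valid.

    table maps a tuple of input names to (function, output name).
    Returns the final values and the order in which outputs were produced.
    """
    values = dict(initial_values)    # names that currently hold valid data
    remaining = dict(table)          # operations that have not run yet
    fired = []

    while True:
        # Scan down the operations for the first one that is ready.
        ready = next(
            (key for key in remaining if all(name in values for name in key)),
            None,
        )
        if ready is None:
            break                    # nothing else can run
        func, output = remaining.pop(ready)
        values[output] = func(*(values[name] for name in ready))
        fired.append(output)         # this output may make another operation ready

    return values, fired


table = {
    # Listed first, but not ready until "sum" has been produced.
    ("sum", "c"): (lambda s, c: s * c, "product"),
    ("a", "b"): (lambda a, b: a + b, "sum"),
}
values, fired = run(table, {"a": 2, "b": 3, "c": 4})
```

The order of execution follows from data availability rather than from the order in which the operations are listed.
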

For parallel operation, only the list needs to be shared; it is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's [[Run-time system|runtime]]. On machines with a single processor core where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.
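
The idea that the same program can be handed to different runtimes can be sketched as follows. The program here is assumed to be a dictionary mapping a tuple of input names to a function and an output name; <code>run_sequential</code>, <code>run_parallel</code>, and <code>ready_ops</code> are hypothetical names, and the parallel version uses Python's standard <code>ThreadPoolExecutor</code> purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def ready_ops(table, values, done):
    """Operations that have not run and whose inputs are all valid."""
    return [
        key for key in table
        if key not in done and all(name in values for name in key)
    ]


def run_sequential(table, initial):
    """Single-core runtime: run one ready operation at a time."""
    values, done = dict(initial), set()
    while ready := ready_ops(table, values, done):
        key = ready[0]
        func, output = table[key]
        values[output] = func(*(values[name] for name in key))
        done.add(key)
    return values


def run_parallel(table, initial, workers=4):
    """Parallel runtime: run every ready operation at once, wave by wave."""
    values, done = dict(initial), set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while ready := ready_ops(table, values, done):
            futures = {
                key: pool.submit(table[key][0], *(values[name] for name in key))
                for key in ready
            }
            for key, future in futures.items():
                values[table[key][1]] = future.result()
                done.add(key)
    return values


table = {
    ("a",): (lambda a: a + 1, "x"),
    ("b",): (lambda b: b * 2, "y"),
    ("x", "y"): (lambda x, y: x + y, "z"),
}
initial = {"a": 1, "b": 5}
sequential_result = run_sequential(table, initial)
parallel_result = run_parallel(table, initial)
```

The program itself (<code>table</code>) is unchanged between the two calls; only the runtime that interprets it differs.
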

=== Incremental updates ===
Some recent dataflow libraries such as [[Differential Dataflow|Differential]]/[[Timely Dataflow|Timely]] Dataflow have used [[incremental computing]] for much more efficient data processing.<ref name="sigops" /><ref name="differential-paper">{{cite web |last1=McSherry |first1=Frank |last2=Murray |first2=Derek |last3=Isaacs |first3=Rebecca |last4=Isard |first4=Michael |title=Differential dataflow |website=[[Microsoft]] |url=https://www.microsoft.com/en-us/research/publication/differential-dataflow/ |access-date=31 July 2022 |date=5 January 2013}}</ref><ref name="differential-github">{{cite web |title=Differential Dataflow |url=https://github.com/TimelyDataflow/differential-dataflow |publisher=Timely Dataflow |access-date=31 July 2022 |date=30 July 2022}}</ref>
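
The general idea of incremental computing can be illustrated with a small Python sketch that maintains word counts from a stream of changes instead of recounting all input. This does not use or reproduce the API of Differential or Timely Dataflow; the class <code>IncrementalWordCount</code> and the representation of changes as <code>(word, diff)</code> pairs are assumptions made for the example.

```python
from collections import Counter


class IncrementalWordCount:
    """Hypothetical operator that updates counts from input changes only."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, changes):
        """Apply (word, diff) pairs, where diff is +1 for an insertion
        and -1 for a removal.

        Returns only the outputs that changed, as
        {word: (old_count, new_count)}.
        """
        before = {}
        for word, diff in changes:
            before.setdefault(word, self.counts[word])
            self.counts[word] += diff
            if self.counts[word] == 0:
                del self.counts[word]    # drop words that no longer occur
        return {
            word: (old, self.counts[word])
            for word, old in before.items()
            if old != self.counts[word]
        }


wc = IncrementalWordCount()
first = wc.apply([("a", 1), ("b", 1), ("a", 1)])
second = wc.apply([("b", -1), ("c", 1)])   # touches only "b" and "c"
```

The work done by the second call is proportional to the number of changes rather than to the size of the accumulated input, which is the property that incremental approaches aim for.
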