Dataflow programming
{{Short description|Computer programming paradigm}}
In [[computer programming]], '''dataflow programming''' is a [[programming paradigm]] that models a program as a [[directed graph]] of the data flowing between operations, thus implementing [[dataflow]] principles and architecture.<ref name="sigops">{{cite web |last1=Schwarzkopf |first1=Malte |title=The Remarkable Utility of Dataflow Computing |url=https://www.sigops.org/2020/the-remarkable-utility-of-dataflow-computing/ |website=ACM SIGOPS |access-date=31 July 2022 |date=7 March 2020}}</ref> Dataflow [[programming language]]s share some features of [[functional language]]s, and were generally developed to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term ''datastream'' instead of ''[[dataflow]]'' to avoid confusion with dataflow computing or [[dataflow architecture]], based on an indeterministic machine paradigm. Dataflow programming was pioneered by [[Jack Dennis]] and his graduate students at MIT in the 1960s.

== Considerations ==
Traditionally, a program is modelled as a series of operations happening in a specific order; this may be referred to as sequential,<ref name=advances>{{cite journal|last=Johnston|first=Wesley M. |author2=J.R. Paul Hanna |author3=Richard J. Millar|title=Advances in Dataflow Programming Languages|journal=ACM Computing Surveys|date=March 2004|volume=36|pages=1–34|url=http://www.cs.ucf.edu/~dcm/Teaching/COT4810-Spring2011/Literature/DataFlowProgrammingLanguages.pdf|access-date=15 August 2013|doi=10.1145/1013208.1013209|s2cid=5257722 }}</ref>{{rp|p.3}} procedural,<ref name=lucid>{{cite book|last=Wadge|first=William W.|title=Lucid, the Dataflow Programming Language|year=1985|publisher=Academic Press|isbn=9780127296500|url=https://archive.org/details/luciddataflowpro00wadg_0|edition=illustrated|author2=Edward A. Ashcroft|access-date=15 August 2013|url-access=registration}}</ref> [[control flow]]<ref name=lucid/> (indicating that the program chooses a specific path), or [[imperative programming]]. The program focuses on commands, in line with the [[John von Neumann|von Neumann]]<ref name=advances/>{{rp|p.3}} vision of sequential programming, where data is normally "at rest".<ref name=lucid/>{{rp|p.7}}

In contrast, dataflow programming emphasizes the movement of data and models programs as a series of connections. Explicitly defined inputs and outputs connect operations, which function like [[black box]]es.<ref name=lucid/>{{rp|p.2}} An operation runs as soon as all of its inputs become valid.<ref name=labview>{{cite web|title=Dataflow Programming Basics|url=http://www.ni.com/gettingstarted/labviewbasics/dataflow.htm|work=Getting Started with NI Products|publisher=National Instruments Corporation|access-date=15 August 2013}}</ref> Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems.<ref name=advances/>{{rp|p.3}}<ref>{{cite web|last=Harter|first=Richard|title=Data Flow languages and programming - Part I|url=http://richardhartersworld.com/cri/2009/dataflow1.html|work=Richard Harter's World|access-date=15 August 2013|archive-url=https://web.archive.org/web/20151208165213/http://richardhartersworld.com/cri/2009/dataflow1.html|archive-date=8 December 2015|url-status=dead}}</ref><ref>{{cite web|title=Why Dataflow Programming Languages are Ideal for Programming Parallel Hardware|url=http://www.ni.com/white-paper/6098/en/|work=Multicore Programming Fundamentals Whitepaper Series|publisher=National Instruments Corporation|access-date=15 August 2013}}</ref>

===State===
One of the key concepts in computer programming is the idea of [[State (computer science)|state]], essentially a snapshot of various conditions in the system.
Most programming languages require a considerable amount of state information, which is generally hidden from the programmer. Often, the computer itself has no idea which piece of information encodes the enduring state. This is a serious problem, as the state information needs to be shared across multiple processors in [[Parallel computing|parallel processing]] machines. Most languages force the programmer to add extra code to indicate which data and parts of the code are important to the state. This code tends to be both expensive in terms of performance and difficult to read or debug. [[Explicit parallelism]] is one of the main reasons for the poor performance of [[Enterprise Java Beans|Enterprise JavaBeans]] when building data-intensive, non-[[Online transaction processing|OLTP]] applications.{{Citation needed|date=March 2017}}

Whereas a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an [[assembly line]], each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track and are all "ready" at the same time.

===Representation===
Dataflow programs are represented in different ways. A traditional program is usually represented as a series of text instructions, which is reasonable for describing a serial system that pipes data between small, single-purpose tools that receive, process, and return data. Dataflow programs start with an input, perhaps the [[command line]] parameters, and illustrate how that data is used and modified. The flow of data is explicit, often visually illustrated as a line or pipe.

In terms of encoding, a dataflow program might be implemented as a [[hash table]], with uniquely identified inputs as the keys, used to look up pointers to the instructions.
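The following Python code is a minimal sketch of that kind of encoding and of the firing rule that an operation runs once all of its inputs are valid. It is a hypothetical toy, not the design of any particular dataflow system: each operation is a tuple of input names, an output name, and a function, and a hash table (a Python <code>dict</code>) maps each input name to the operations that consume it. Rather than rescanning every operation after each completion, the sketch uses the table to check only the operations that consume the value that has just become valid. The names <code>run</code>, <code>Op</code> and <code>graph</code> are illustrative assumptions.

```python
from collections import deque
from typing import Callable

# Hypothetical toy operation: (names of its inputs, name of its output, function).
Op = tuple[tuple[str, ...], str, Callable[..., float]]


def run(ops: list[Op], initial: dict[str, float]) -> dict[str, float]:
    """Fire each operation once all of its inputs are valid; return every value."""
    # Hash table keyed by input name -> indices of the operations that consume it.
    consumers: dict[str, list[int]] = {}
    for i, (inputs, _output, _fn) in enumerate(ops):
        for name in inputs:
            consumers.setdefault(name, []).append(i)

    values = dict(initial)    # data that is currently valid, by name
    fired: set[int] = set()   # operations that have already run
    arrived = deque(initial)  # names of values that have just become valid

    while arrived:
        name = arrived.popleft()
        # Only the operations that consume the new value can have become ready.
        for i in consumers.get(name, []):
            inputs, output, fn = ops[i]
            if i in fired or not all(n in values for n in inputs):
                continue  # already ran, or still waiting for an input
            fired.add(i)
            values[output] = fn(*(values[n] for n in inputs))
            arrived.append(output)  # the output may make further operations valid
    return values


# Usage: (a + b) * (a - b) as a three-operation graph.
graph: list[Op] = [
    (("a", "b"), "sum", lambda x, y: x + y),
    (("a", "b"), "diff", lambda x, y: x - y),
    (("sum", "diff"), "product", lambda x, y: x * y),
]
result = run(graph, {"a": 5, "b": 3})
print(result["product"])  # prints 16
```

Here <code>sum</code> and <code>diff</code> depend only on the initial inputs and not on each other, so a parallel runtime could in principle dispatch them to different processors; this sketch simply runs ready operations one at a time, and operations with no inputs never fire.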
When any operation completes, the program scans down the list of operations until it finds the first operation whose inputs are all currently valid, and runs it. When that operation finishes, it typically outputs data, thereby making another operation valid.

For parallel operation, only the list needs to be shared; it is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's [[Run-time system|runtime]]. On machines with a single processor core, where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.

=== Incremental updates ===
Some recent dataflow libraries, such as [[Differential Dataflow|Differential]]/[[Timely Dataflow|Timely]] Dataflow, have used [[incremental computing]] to process data much more efficiently: when the input changes, previously computed results are updated rather than recomputed from scratch.<ref name="sigops" /><ref name="differential-paper">{{cite web |last1=McSherry |first1=Frank |last2=Murray |first2=Derek |last3=Isaacs |first3=Rebecca |last4=Isard |first4=Michael |title=Differential dataflow |website=[[Microsoft]] |url=https://www.microsoft.com/en-us/research/publication/differential-dataflow/ |access-date=31 July 2022 |date=5 January 2013}}</ref><ref name="differential-github">{{cite web |title=Differential Dataflow |url=https://github.com/TimelyDataflow/differential-dataflow |publisher=Timely Dataflow |access-date=31 July 2022 |date=30 July 2022}}</ref>

==History==
A pioneer dataflow language was BLOck DIagram ([[BLODI]]), published in 1961 by [[John Larry Kelly, Jr.]], Carol Lochbaum and [[Victor A. Vyssotsky]] for specifying [[sampled data systems]].<ref name="Kelly1961">{{Cite journal|journal=Bell System Tech. J.|title=A block diagram compiler|author1=John L. Kelly Jr. |author2=Carol Lochbaum |author3=V. A. Vyssotsky |volume=40|issue=3|year=1961|pages=669–678|doi=10.1002/j.1538-7305.1961.tb03236.x}}</ref> A BLODI specification of functional units (amplifiers, adders, delay lines, etc.) and their interconnections was compiled into a single loop that updated the entire system for one clock tick.

In a 1966 Ph.D. thesis, ''The On-line Graphical Specification of Computer Procedures'',<ref name="sutherland1966">{{cite thesis |last= Sutherland |first= William Robert |author-link=Bert Sutherland |date= January 1966 |title= The on-line graphical specification of computer procedures |publisher= [[Massachusetts Institute of Technology|MIT]] |hdl= 1721.1/13474 |type= PhD thesis |url= https://dspace.mit.edu/handle/1721.1/13474 |access-date=2022-08-25}}</ref> [[Bert Sutherland]] created one of the first graphical dataflow programming frameworks to make parallel programming easier.

Subsequent dataflow languages were often developed at large [[supercomputer]] labs. POGOL, an otherwise conventional data-processing language developed at the [[NSA]], compiled large-scale applications composed of multiple file-to-file operations, such as merge, select, summarize, or transform, into efficient code that avoided creating or writing to intermediate files as far as possible.<ref>{{Cite conference | book-title=POPL '73: Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages | author=Gloria Lambert | title=Large scale file processing: POGOL | publisher=[[Association for Computing Machinery|ACM]] | year=1973 |pages=226–234}}</ref>

[[SISAL]], a popular dataflow language developed at [[Lawrence Livermore National Laboratory]], looks like most statement-driven languages, but variables must be [[single assignment|assigned exactly once]]. This allows the [[compiler]] to easily identify the inputs and outputs of each operation.
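If every name is assigned exactly once, each use of a name refers to exactly one defining statement, so the inputs and outputs of every statement, and hence the dataflow graph, can be read directly off the program text. The following Python code is a minimal sketch of that analysis on a hypothetical straight-line program written as <code>(target, operation, arguments)</code> triples; it does not use SISAL syntax, and the helper names <code>dependency_graph</code>, <code>stage</code> and <code>parallel_stages</code> are illustrative assumptions. It rejects a second assignment to the same name, builds the dependency graph, and groups statements into stages whose members do not depend on one another.

```python
# Hypothetical straight-line program: each statement is
# (target name, operation name, argument names). Names that are never
# assigned (here a and b) are treated as external inputs.
Stmt = tuple[str, str, tuple[str, ...]]


def dependency_graph(program: list[Stmt]) -> dict[str, set[str]]:
    """Map each assigned name to the names it directly depends on."""
    deps: dict[str, set[str]] = {}
    for target, _op, args in program:
        if target in deps:
            # A second assignment would give one name two possible producers.
            raise ValueError(f"{target!r} is assigned more than once")
        deps[target] = set(args)
    return deps


def stage(name: str, deps: dict[str, set[str]], memo: dict[str, int]) -> int:
    """One more than the latest stage name depends on; external inputs are -1."""
    if name not in deps:
        return -1
    if name not in memo:
        # Assumes no name depends on itself, directly or indirectly.
        memo[name] = 1 + max((stage(d, deps, memo) for d in deps[name]), default=-1)
    return memo[name]


def parallel_stages(program: list[Stmt]) -> list[list[str]]:
    """Group statements so that no statement depends on another in its own stage."""
    deps = dependency_graph(program)
    memo: dict[str, int] = {}
    stages: dict[int, list[str]] = {}
    for target, _op, _args in program:
        stages.setdefault(stage(target, deps, memo), []).append(target)
    return [stages[k] for k in sorted(stages)]


program: list[Stmt] = [
    ("s", "add", ("a", "b")),
    ("d", "sub", ("a", "b")),
    ("p", "mul", ("s", "d")),
]
print(parallel_stages(program))  # prints [['s', 'd'], ['p']]
```

Here <code>s</code> and <code>d</code> depend only on the external inputs and not on each other, so they form one stage that could run in parallel; <code>p</code> needs both and comes after them.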
A number of offshoots of SISAL have been developed, including [[SAC programming language|SAC]], ''Single Assignment C'', which tries to remain as close to the popular [[C (programming language)|C programming language]] as possible.

The United States Navy funded development of signal processing graph notation (SPGN) and ACOS starting in the early 1980s. They are in use on a number of platforms in the field today.<ref>Underwater Acoustic Data Processing, Y.T. Chan</ref>

A more radical concept is [[Prograph]], in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Prograph was originally written on the [[Apple Macintosh|Macintosh]], which remained single-processor until the introduction of the [[DayStar Digital|DayStar Genesis MP]] in 1996.{{citation needed|date=September 2023}}

There are many hardware architectures oriented toward the efficient implementation of dataflow programming models.{{vague|date=September 2023}} MIT's tagged token dataflow architecture was designed by [[Greg Papadopoulos]].{{undue inline|date=September 2023}}

Data flow has been proposed{{by whom|date=September 2023}} as an abstraction for specifying the global behavior of distributed system components: in the [[live distributed object]]s programming model, [[distributed data flow]]s are used to store and communicate state, and as such, they play a role analogous to variables, fields, and parameters in Java-like programming languages.{{OR|date=January 2025}}

==Languages==
{{more citations needed|section|date=February 2019}}
Dataflow programming languages include:
*[[Céu (programming language)]]
*[[ETAS Group#ASCET|ASCET]]
*[[AviSynth]] scripting language, for video processing
*[[BMDFM]] Binary Modular Dataflow Machine
*[[CAL Actor Language|CAL]]
*[[Cuneiform (programming language)|Cuneiform]], a [[Functional Programming|functional]] workflow language.
*[[CMS Pipelines]]
*[[Hume (programming language)|Hume]]
*[[Joule (programming language)|Joule]]
*[[Keysight VEE]]
*[[KNIME]], a free and open-source data analytics, reporting and integration platform
*[[LabVIEW]], G<ref name=labview />
*[[Linda (coordination language)|Linda]]
*[[Lucid (programming language)|Lucid]]<ref name=lucid/>
*[[Lustre (programming language)|Lustre]]
*[[Max/MSP]]
*[[Microsoft Visual Programming Language]] - A component of [[Microsoft Robotics Studio]] designed for [[robotics]] programming
*[[Nextflow]]: a workflow language
*[[Orange (software)|Orange]] - An open-source, visual programming tool for [[data mining]], statistical [[data analysis]], and [[machine learning]].
*[[Oz (programming language)|Oz]] now also distributed since 1.4.0
*[[Pipeline Pilot]]
*[[Prograph]]
*[[Pure Data]]
*[[Quartz Composer]] - Designed by [[Apple Inc.|Apple]]; used for graphic animations and effects
*[[SAC programming language|SAC]] Single assignment C
*[[SIGNAL programming language|SIGNAL]] (a dataflow-oriented synchronous language enabling multi-clock specifications)
*[[Simulink]]
*[[SISAL]]
*[[SystemVerilog]] - A hardware description language
*[[Verilog]] - A hardware description language absorbed into the SystemVerilog standard in 2009
*[[VisSim]] - A block diagram language for simulation of dynamic systems and automatic firmware generation
*[[VHDL]] - A hardware description language
*Wapice IOT-TICKET implements an unnamed visual dataflow programming language for [[Internet_of_things|IoT]] data analysis and reporting.
*[[XEE (Starlight)]] XML engineering environment
*[[XProc]]

== Libraries ==
* [[Apache Beam]]: Java/Scala SDK that unifies streaming (and batch) processing with several execution engines supported (Apache Spark, Apache Flink, Google Dataflow, etc.)
* [[Apache Flink]]: Java/Scala library that allows streaming (and batch) computations to be run atop a distributed Hadoop (or other) cluster
* [[Apache Spark]]
* [[SystemC]]: Library for C++, mainly aimed at hardware design.
* [[TensorFlow]]: A machine-learning library based on dataflow programming.

==See also==
* [[Actor model]]
* [[Data-driven programming]]
* [[Digital signal processing]]
* [[Event-driven programming]]
* [[Flow-based programming]]
* [[Functional reactive programming]]
* [[Glossary of reconfigurable computing]]
* [[High-performance reconfigurable computing]]
* [[Incremental computing]]
* [[Parallel programming model]]
* [[Partitioned global address space]]
* [[Pipeline (Unix)]]
* [[Quantum circuit]]
* [[Signal programming]]
* [[Stream processing]]
* [[Yahoo Pipes]]

==References==
{{Reflist}}

==External links==
*[https://web.archive.org/web/20131017033905/http://deepfriedcode.com/ Book: Dataflow and Reactive Programming Systems]
*[http://www.codeproject.com/Articles/107121/Basics-of-Dataflow-Programming-in-F-and-C Basics of Dataflow Programming in F# and C#]
* [http://paginas.fe.up.pt/~prodei/dsie12/papers/paper_17.pdf Dataflow Programming - Concept, Languages and Applications]
*[http://ptolemy.eecs.berkeley.edu/publications/papers/87/staticscheduling/ Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing]
* [http://drdobbs.com/database/231400148 Handling huge loads without adding complexity] The basic concepts of dataflow programming, Dr. Dobb's, Sept. 2011

{{Programming paradigms navbox}}
{{Types of programming languages}}

[[Category:Concurrent programming languages]]
[[Category:Programming paradigms]]