== New technologies ==
In recent years, the demands on applications and their underlying hardware have grown significantly. For example, building pipelines from single-node applications that process data row by row is no longer feasible given the volume and variety of [[big data]]. Distributed data processing frameworks such as [[Hadoop]] and, more recently, [[Apache Spark]] make it possible to split large datasets across multiple processing nodes that work in parallel, which can make applications several hundred times more efficient than was previously thought possible. As a result, even a mid-level PC that uses distributed processing in this fashion can build and run big data pipelines.<ref>[https://www.datapipelines.com/blog/what-is-a-data-pipeline/ What is a Data Pipeline?] Published by Data Pipelines, retrieved 11 March 2021</ref>
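
The difference between row-by-row processing on a single node and partitioned processing can be illustrated with a minimal sketch. It uses only the Python standard library and does not use the Hadoop or Spark APIs. The function names (<code>partition</code>, <code>count_words</code>, <code>distributed_word_count</code>) are hypothetical, and a pool of threads stands in for the separate processing nodes that a real cluster would provide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts chunks of roughly equal size (round-robin)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def count_words(rows):
    """Count words row by row; this is the whole job on a single node,
    and the per-partition step in the partitioned version."""
    counts = Counter()
    for row in rows:
        counts.update(row.split())
    return counts


def distributed_word_count(rows, n_workers=4):
    """Process each partition concurrently, then merge the partial counts.

    Assumption: threads stand in for cluster nodes. A real framework would
    also move data between machines and recover from node failures.
    """
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    data = ["a b a", "b c", "a"]
    print(count_words(data))                # single pass over every row
    print(distributed_word_count(data, 2))  # same totals, computed per partition
```

Both functions return the same totals. The partitioned version differs only in that each chunk can be processed independently, which is the property that lets frameworks of this kind spread the work across many nodes.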