{{short description|Method of running software}}
Computerized '''batch processing''' is a method of running software programs called '''[[Job (computing)|jobs]]''' in batches automatically. Users must submit the jobs, but no further user interaction is needed to process the batch. Batches may be run automatically at scheduled times or when sufficient computer resources become available.

==History==
The term "batch processing" originates in the traditional classification of [[methods of production]] as [[job production]] (one-off production), [[batch production]] (production of a "batch" of multiple items at once, one stage at a time), and [[flow production]] (mass production, all stages in process at once).

===Early history===
Early computers were capable of running only one program at a time. Each user had sole control of the machine for a scheduled period of time. They would arrive at the computer with program and data, often on [[punch card|punched paper card]]s and magnetic or paper tape, and would load their program, run and debug it, and carry off their output when done.

As computers became faster the setup and takedown time became a larger percentage of available computer time. Programs called ''monitors'', the forerunners of [[operating system]]s, were developed which could process a series, or "batch", of programs, often from [[magnetic-tape data storage|magnetic tape]] prepared offline. The monitor would be loaded into the computer and run the first job of the batch. At the end of the job it would regain control and load and run the next until the batch was complete. Often the output of the batch would be written to magnetic tape and printed or punched offline. Examples of monitors were IBM's ''Fortran Monitor System'', SOS (Share Operating System), and finally [[IBM 7090/94 IBSYS|IBSYS]] for IBM's [[IBM 700/7000 series|709x]] systems in 1960.<ref>{{cite web |title=The Direct Couple for the IBM 7090 |website=SoftwarePreservationGroup.org |url=http://www.softwarepreservation.org/projects/os/dc.html |quote=IBSYS was an operating system for the 7090 that evolved from SOS (SHARE Operating System)}}</ref><ref>{{cite web |title=History of Operating Systems |url=https://courses.cs.washington.edu/courses/cse451/16wi/readings/lecture_readings/LCM_OperatingSystemsTimeline_Color_acd_newsize.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://courses.cs.washington.edu/courses/cse451/16wi/readings/lecture_readings/LCM_OperatingSystemsTimeline_Color_acd_newsize.pdf |archive-date=2022-10-09}}</ref>

===Third-generation systems===
{{clarify|reason=Identify what became ubiquitous in the 3rd generation but originated in the 2nd.|text=[[Third-generation computer]]s|date=March 2022}}<ref>{{cite news  |newspaper=[[The Register]]
   |url=https://www.theregister.co.uk/2014/04/07/ibm_s_360_50_anniversary
   |title=Why won't you DIE? IBM's S/360 and its legacy at 50
   |date=April 7, 2014}}</ref> capable of [[multiprogramming]] began to appear in the 1960s. Instead of running one batch job at a time, these systems can have multiple batch programs running at the same time in order to keep the system as busy as possible. One or more programs might be awaiting input, one actively running on the CPU, and others generating output. Instead of offline input and output, programs called [[spooler]]s read jobs from cards, disk, or remote terminals and place them in a [[batch queue|job queue]] to be run. In order to prevent [[deadlock (computer science)|deadlock]]s the [[job scheduler]] needs to know each job's resource requirements—memory, magnetic tapes, mountable [[disk storage|disks]], etc., so various scripting languages were developed to supply this information in a structured way. Probably the most well-known is IBM's ''[[Job Control Language]]'' (JCL). Job schedulers select jobs to run according to a variety of criteria, including priority, memory size, etc.  [[Remote job entry|Remote batch]] is a procedure for submitting batch jobs from remote terminals, often equipped with a [[punch card reader]] and a [[line printer]].<ref>{{cite web  |website=BitSavers
   |url=http://bitsavers.org/pdf/cdc/terminal/82128000_200_User_Terminal_Hardware_Reference_Jul68.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://bitsavers.org/pdf/cdc/terminal/82128000_200_User_Terminal_Hardware_Reference_Jul68.pdf |archive-date=2022-10-09 |url-status=live
   |title=CDC User Terminal Hardware Reference manual}}</ref> Sometimes [[asymmetric multiprocessing]] is used to spool batch input and output for one or more large computers using an attached smaller and less-expensive system, as in the IBM System/360 [[Attached Support Processor]].{{efn|Use of satellite computers for this purpose began earlier, e.g., in IBM [[IBM 7090#7094/7044 Direct Coupled System|7094/7044 Direct Coupled System]].}}
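
The selection step described above can be illustrated with a minimal sketch. It assumes only two criteria, a numeric priority and a declared memory requirement; the <code>Job</code> class, the <code>select_next</code> function, and the job names are hypothetical and do not correspond to any particular scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # assumption of this sketch: lower number = more urgent
    memory_kb: int  # memory requirement declared when the job was submitted


def select_next(queue, free_memory_kb):
    """Return the most urgent queued job that fits in free memory, or None."""
    runnable = [job for job in queue if job.memory_kb <= free_memory_kb]
    if not runnable:
        return None
    # min() returns the first minimal element, so ties keep submission order.
    return min(runnable, key=lambda job: job.priority)


queue = [
    Job("PAYROLL", priority=1, memory_kb=512),
    Job("REPORT", priority=2, memory_kb=128),
    Job("BACKUP", priority=3, memory_kb=64),
]

# PAYROLL is the most urgent job but does not fit in 256 KB, so REPORT is chosen.
chosen = select_next(queue, free_memory_kb=256)
print(chosen.name if chosen else "nothing fits")
```

Real schedulers weigh many more resources (tapes, mountable disks, time limits) and typically add safeguards so that large jobs are not postponed indefinitely.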

===Later history===
[[File:CDC NOS batch file.jpg|right|thumb|[[Control Data Corporation|CDC]] [[NOS (operating system)|NOS]] batch file to get the file STARTRK and output it to the card punch]]
The first general purpose time sharing system, [[Compatible Time-Sharing System]] (CTSS), was compatible with batch processing.  This facilitated transitioning from batch processing to [[interactive computing]].<ref>{{cite web |url=https://multicians.org/thvv/compatible-time-sharing-system.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://multicians.org/thvv/compatible-time-sharing-system.pdf |archive-date=2022-10-09 |url-status=live |title=Compatible Time-Sharing System (1961-1973): Fiftieth Anniversary Commemorative Overview |editor-last1=Walden |editor-first1=David |editor-last2=Van Vleck |editor-first2=Tom |editor2-link=Tom Van Vleck |date=2011 |publisher=IEEE Computer Society |access-date=February 20, 2022 |quote=CTSS was called “compatible” in the sense that [[History of IBM mainframe operating systems#FORTRAN Monitor System|FMS]] could be run in B-core as a “back-ground” user, nearly as efficiently as on a bare machine, and also because programs compiled for FMS batch could be loaded and executed in the “foreground” time-sharing environment (with some limitations). ... This feature allowed the Computation Center to make the transition from batch to timesharing gradually}}</ref>

From the late 1960s onwards, interactive computing, such as via text-based [[computer terminal]] interfaces (as in [[Unix shell]]s or [[read-eval-print loop]]s) and later [[graphical user interface]]s, became common. Non-interactive computation, both one-off jobs such as compilation and the processing of multiple items in batches, became retrospectively referred to as ''batch processing'', and the term ''batch job'' (in early use often "batch ''of'' jobs") became common. Early use is particularly found at the [[University of Michigan]], around the [[Michigan Terminal System]] (MTS).<ref>{{cite journal
|journal=Research News
|publisher=University of Michigan
|title=The Computing Center: Coming to Terms with the IBM System/360 Model 67
|volume=20
|year=1969
|issue=Nov/Dec
|page=[https://books.google.com/books?id=Qs9VAAAAMAAJ&pg=RA1-PA122&dq=%22batch+job%22 10]
}}</ref>

Although timesharing existed, it was not robust enough for corporate data processing. None of this was related to the earlier [[unit record equipment]], which was human-operated.

===Ongoing===
Non-interactive computation remains pervasive in computing, both for general data processing and for system "housekeeping" tasks (using [[system software]]). A high-level program (executing multiple programs, with some additional "glue" logic) is today most often called a ''script'' and is written in a [[scripting language]], particularly as a [[shell script]] for system tasks; in [[IBM PC DOS]] and [[MS-DOS]] it is instead known as a [[batch file]]. Scripts are used on [[Unix|UNIX]]-based computers, [[Microsoft Windows]], [[macOS]] (whose Unix foundation is derived in part from [[Berkeley Software Distribution|BSD]]), and even [[smartphone]]s. A running script, particularly one executed from an interactive [[login session]], is often known as a [[job (Unix)|job]], but that term is used very ambiguously.

"There is no direct counterpart to z/OS batch processing in PC or UNIX systems. Batch jobs are typically executed at a scheduled time or on an as-needed basis. Perhaps the closest comparison is with processes run by an [[at (command)|at]] or [[cron]] command in UNIX, although the differences are significant."<ref name="whatis">{{cite web|last1=IBM Corporation|title=What is batch processing?|url=https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zconcepts/zconc_whatisbatch.htm|access-date=Oct 10, 2019|website=zOS Concepts}}</ref>

== Modern systems ==
Batch applications are still critical in most organizations in large part because many common business processes are amenable to batch processing. While online systems can also function when manual intervention is not desired, they are not typically optimized to perform high-volume, repetitive tasks. Therefore, even new systems usually contain one or more batch applications for updating information at the end of the day, generating reports, printing documents, and other non-interactive tasks that must complete reliably within certain business deadlines.

Some applications are amenable to flow processing, namely those that only need data from a single input at once (not totals, for instance): start the next step for each input as it completes the previous step. In this case flow processing lowers [[latency (engineering)|latency]] for individual inputs, allowing them to be completed without waiting for the entire batch to finish. However, many applications require data from all records, notably computations such as totals. In this case the entire batch must be completed before one has a usable result: partial results are not usable.
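
The distinction can be made concrete with a short sketch. It assumes a hypothetical input of monetary amounts in cents: converting each amount depends on a single input and suits flow processing, whereas the total is only meaningful once every record has been read. The function names are illustrative only.

```python
def flow_process(records):
    """Yield each converted record as soon as it is ready (flow processing)."""
    for amount_cents in records:
        yield amount_cents / 100  # depends only on this one input


def batch_total(records):
    """Return the sum of all records; usable only after the whole batch is read."""
    total = 0
    for amount_cents in records:
        total += amount_cents
    return total


amounts_cents = [1250, 399, 10000]

# The first converted value is available before the later inputs are touched.
first = next(flow_process(amounts_cents))

# The total needs every record; a partial sum would not be a usable result.
total = batch_total(amounts_cents)
print(first, total)
```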

Modern batch applications use batch frameworks such as Jem The Bee, [[Spring Batch]]<ref>{{Cite book |last=Minella |first=Michael |url=https://books.google.com/books?id=2tOSxcKmdyoC&dq=history+of+batch+processing&pg=PA2 |title=Pro Spring Batch |date=2011-10-13 |publisher=Apress |isbn=978-1-4302-3453-1 |language=en}}</ref> or implementations of [[Java Specification Request|JSR]] 352<ref>{{cite web|url=https://www.jcp.org/en/jsr/detail?id=352|title=Batch Applications for the Java Platform|publisher=Java Community Process|access-date=2015-08-03}}</ref> written for [[Java (programming language)|Java]], as well as frameworks for other programming languages, to provide the [[fault tolerance]] and [[scalability]] required for high-volume processing. To ensure high-speed processing, batch applications are often integrated with [[grid computing]] solutions to [[partition of a set|partition]] a batch job over a large number of processors, although doing so poses significant programming challenges. High-volume batch processing also places particularly heavy demands on system and application architectures. Architectures that feature strong [[input/output]] performance and vertical [[scalability]], including modern [[mainframe computers]], tend to provide better batch performance than alternatives.

[[Scripting languages]] became popular as they evolved along with batch processing.<ref>{{cite web
|publisher=IBM.com
|title=JSR352 null
|url=https://www-01.ibm.com/support/docview.wss?uid=tss1wp102544&aid=5
|quote=JSR 352, the open standard specification for Java batch processing. ... The programming languages used evolved over time based on what was available
|access-date=2018-10-19
|archive-date=2018-10-20
|archive-url=https://web.archive.org/web/20181020011704/https://www-01.ibm.com/support/docview.wss?uid=tss1wp102544&aid=5
|url-status=dead
}}</ref>

==Batch window== 
A ''batch window'' is "a period of less-intensive online activity",<ref>{{cite web
|publisher=IBM Corporation |title=Mainframes working after hours: Batch processing
|url=http://publib.boulder.ibm.com/infocenter/zos/basics/index.jsp?topic=/com.ibm.zos.zmainframe/zconc_batchproc.htm|work=Mainframe concepts|access-date=June 20, 2013}}</ref> when the computer system is able to run batch jobs without interference from, or with, interactive online systems.

A bank's ''end-of-day (EOD)'' jobs require the concept of ''cutover'', where transactions and data are cut off for a particular day's batch activity ("deposits after 3 PM will be processed the next day").
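
A cutover rule of this kind can be sketched as a function that assigns each transaction to a batch day. This is a minimal illustration using the 3 PM example; the name <code>processing_date</code> is hypothetical, a transaction received exactly at the cutoff is assumed to roll over to the next day, and weekends and bank holidays are ignored.

```python
from datetime import date, datetime, time, timedelta

CUTOFF = time(15, 0)  # 3 PM, taken from the example in the text


def processing_date(received: datetime, cutoff: time = CUTOFF) -> date:
    """Return the batch day that a transaction belongs to.

    Assumption of this sketch: anything received at or after the cutoff
    is deferred to the next calendar day.
    """
    if received.time() < cutoff:
        return received.date()
    return received.date() + timedelta(days=1)


early = processing_date(datetime(2024, 3, 4, 14, 59))  # same-day batch
late = processing_date(datetime(2024, 3, 4, 15, 30))   # next day's batch
print(early, late)
```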

As requirements for online systems uptime expanded to support [[globalization]], the [[Internet]], and other business needs, the batch window shrank<ref name=Rbook>{{cite book 
|title=Batch Processing: Design – Build – Run: Applied Practices and Principles
|url=https://www.oreilly.com/library/view/design-build/9780470257630/9780470257630_batch.html
|isbn=9780470257630 |publisher=Oreilly|date=2009-02-24
}}</ref><ref>"Traditionally batch was an overnight activity, with jobs processing millions of ... Today the batch window is ever decreasing with 24/7 availability requirements."</ref> and increasing emphasis was placed on techniques that keep online data available for as much of the time as possible.

==Batch size==
The ''batch size'' refers to the number of work units to be processed within one batch operation. Some examples are:
* The number of lines from a file to load into a database before [[Commit_(data_management)|committing]] the transaction.
* The number of messages to dequeue from a queue.
* The number of requests to send within one payload.
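
The first example can be sketched with Python's standard <code>sqlite3</code> module, committing once per batch rather than once per line. The table name, the helper functions <code>batched</code> and <code>load_lines</code>, and the batch size of 4 are assumptions chosen for illustration.

```python
import sqlite3
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def load_lines(conn, lines, batch_size):
    """Insert lines, committing once per batch; return the number of commits."""
    commits = 0
    for chunk in batched(lines, batch_size):
        conn.executemany(
            "INSERT INTO lines (text) VALUES (?)",
            [(line,) for line in chunk],
        )
        conn.commit()  # one transaction per batch of lines
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lines (text TEXT)")

# Ten lines with a batch size of 4 are loaded as batches of 4, 4 and 2.
commits = load_lines(conn, [f"line {i}" for i in range(10)], batch_size=4)
row_count = conn.execute("SELECT COUNT(*) FROM lines").fetchone()[0]
print(commits, row_count)
```

A larger batch size means fewer commits but more work to redo if a batch fails; a smaller one has the opposite trade-off.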

==Common batch processing usage==
* Efficient bulk database updates and automated [[transaction processing]], as contrasted to interactive [[online transaction processing]] (OLTP) applications. The [[extract, transform, load]] (ETL) step in populating [[data warehouses]] is inherently a batch process in most implementations.
* Performing bulk operations on [[digital image]]s such as resizing, conversion, watermarking, or otherwise editing a group of image files.
* Converting computer files from one format to another. For example, a batch job may convert proprietary and legacy files to common standard formats for end-user queries and display.
* Training [[machine learning]] models. For example, an [[e-commerce]] website might process customer transactions in an hourly batch to update the model that produces related [[product recommendations]], in order to save [[Computational resource|computational resources]].<ref>{{Cite web |last=Gutkovich |first=Ben |date=10 February 2023 |title=Why Real-Time Machine Learning will be the Buzzword of 2023 |url=https://superlinked.com/insights/why-real-time-machine-learning-will-be-the-buzzword-of-2023 |access-date=11 April 2023 |website=Superlinked}}</ref>

== Notable batch scheduling and execution environments ==
The [[IBM mainframe]] [[z/OS]] [[operating system]] or platform has arguably the most highly refined and evolved set of batch processing facilities owing to its origins, long history, and continuing evolution. Today such systems commonly support hundreds or even thousands of concurrent online and batch tasks within a single [[operating system]] image. Technologies that aid concurrent batch and online processing include [[Job Control Language]] (JCL), scripting languages such as [[REXX]], Job Entry Subsystem ([[JES2]] and [[JES3]]), [[Workload Manager]] (WLM), Automatic Restart Manager (ARM), Resource Recovery Services (RRS), [[IBM Db2]] data sharing, [[Parallel Sysplex]], unique performance optimizations such as [[HiperDispatch]], [[Channel I/O|I/O channel architecture]], and several others.

The Unix programs <code>[[cron (Unix)|cron]]</code>, <code>[[at (command)|at]]</code>, and <code>[[batch (Unix)|batch]]</code> (today <code>batch</code> is a variant of <code>at</code>) allow for complex scheduling of jobs. Windows has a [[job scheduler]]. Most [[high-performance computing]] [[Cluster (computing)|clusters]] use batch processing to maximize cluster usage.<ref>{{cite web 
|date=January 25, 2018
|url=https://www.slideshare.net/raamana/high-performance-computing-with-checklist-and-tips-optimal-cluster-usage
|title=High performance computing tutorial, with checklist and tips to optimize
|quote=a multi-user, shared and smart batch processing system improves the scale ..... Most ''HPC'' clusters are in Linux}}</ref>

==See also==
<!-- New links in alphabetical order please -->
* [[Background process]]
* [[Batch file]]
* [[Batch renaming]] – to rename lots of files automatically without human intervention, in order to save time and effort
* [[BatchPipes]] – a utility that increases batch performance
* [[High-throughput computing]]
* [[Processing modes]]
* [[Production support]] – for batch job/schedule/stream support

==Notes==
{{Notelist}}

==References==
{{Reflist}}

[[Category:Job scheduling]]