{{Short description|Text processing programming language}} {{About|the programming language}} {{Infobox programming language | logo = The-AWK-Programming-Language.svg | logo_size = 200px | screenshot = Awk-example-usage-gimp.gif | screenshot_size = 250px | screenshot caption = Usage of AWK in shell to check matching fields in two files | name = AWK | paradigm = [[scripting language|Scripting]], [[procedural programming|procedural]], [[data-driven programming|data-driven]]<ref name=developerworks>{{cite web|url=https://www6.software.ibm.com/developerworks/education/au-gawk/au-gawk-a4.pdf|title=Get started with GAWK: AWK language fundamentals|last=Stutz|first=Michael|date=September 19, 2006|work=developerWorks|publisher=[[IBM]]|access-date=2015-01-29|quote=[AWK is] often called a data-driven language -- the program statements describe the input data to match and process rather than a sequence of program steps|archive-date=2015-04-27|archive-url=https://web.archive.org/web/20150427143548/https://www6.software.ibm.com/developerworks/education/au-gawk/au-gawk-a4.pdf|url-status=live}}</ref> | year = {{start date and age|1977}} | latest_release_version = [http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html IEEE Std 1003.1-2008] (POSIX) / 1985 | designer = [[Alfred Aho]], [[Peter J. Weinberger|Peter Weinberger]], and [[Brian Kernighan]] | typing = none; can handle strings, integers and floating-point numbers; regular expressions | implementations = awk, GNU Awk, mawk, nawk, MKS AWK, Thompson AWK (compiler), Awka (compiler) | dialects = ''old awk'' oawk 1977, ''new awk'' nawk 1985, ''GNU Awk'' gawk | influenced_by = [[C (programming language)|C]], [[sed]], [[SNOBOL]]<ref>{{cite book |title=UNIX Workshop |publisher=Macmillan International Higher Education |author=Andreas J. 
Pilavakis |year=1989 |pages=196 }}</ref><ref>{{cite book |title=Effective Awk Programming: Universal Text Processing and Pattern Matching |edition=4th |publisher=O'Reilly Media |author=Arnold Robbins |year=2015 |pages=560}}</ref> | influenced = [[Tcl]], [[AMPL]], [[Perl]]<!--1987-->, [[Korn Shell]] (''ksh93''<!--1993-->, ''dtksh'', ''tksh''), [[Lua]]<!--1993--> | operating_system = [[Cross-platform]] | website = }}

'''AWK''' ({{IPAc-en|ɔː|k}}<ref name=awkLC.DR/>) is a [[domain-specific language]] designed for text processing and typically used as a [[data extraction]] and reporting tool. Like [[sed]] and [[grep]], it is a [[filter (software)|filter]],<ref name=awkLC.DR>{{cite magazine |magazine=Digital Review |date=May 2, 1988 |page=91 |author=James W. Livingston |title=The Great awk Program is No Birdbrain}}</ref> and it is a standard feature of most [[Unix-like|Unix-like operating systems]].

The AWK language is a [[data-driven programming|data-driven]] [[scripting language]] consisting of a set of actions to be taken against [[Stream (computing)|streams]] of textual data – either run directly on files or used as part of a [[pipeline (Unix)|pipeline]] – for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the [[string (computer science)|string]] [[datatype]], [[associative array]]s (that is, arrays indexed by key strings), and [[regular expression]]s. While AWK has a limited intended [[Domain (software engineering)|application domain]] and was especially designed to support [[one-liner program]]s, the language is [[Turing-complete]], and even the early Bell Labs users of AWK often wrote well-structured large AWK programs.<ref>{{cite web | url = http://www.faqs.org/docs/artu/ch08s02.html#awk | title = Applying Minilanguages | first = Eric S. | last = Raymond | author-link = Eric S.
Raymond | work = The Art of Unix Programming | at = Case Study: awk | archive-url = https://web.archive.org/web/20080730063308/http://www.faqs.org/docs/artu/ch08s02.html#awk | archive-date = July 30, 2008 | access-date = May 11, 2010 | quote = The awk action language is Turing-complete, and can read and write files. }}</ref>

AWK was created at [[Bell Labs]] in the 1970s,<ref>{{cite tech report |url=https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1299 |title=Awk – A Pattern Scanning and Processing Language (Second Edition) |last1=Aho |first1=Alfred V. |author-link1=Alfred Aho |last2=Kernighan |first2=Brian W. |author-link2=Brian Kernighan |last3=Weinberger |first3=Peter J. |author-link3=Peter J. Weinberger |date=September 1, 1978 |publisher=Bell Telephone Laboratories, Inc. |series=Unix Seventh Edition Manual, Volume 2 |access-date=February 1, 2020}}</ref> and its name is derived from the surnames of its authors: [[Alfred Aho]] (author of [[egrep]]), [[Peter J. Weinberger|Peter Weinberger]] (who worked on tiny relational databases), and [[Brian Kernighan]]. The acronym is pronounced the same as the name of the bird species [[auk]], which is illustrated on the cover of ''[[The AWK Programming Language]]''.<ref name="AWK1" /> When written in all lowercase letters, as <code>awk</code>, it refers to the [[Unix]] or [[Plan 9 from Bell Labs|Plan 9]] program that runs scripts written in the AWK programming language.

==History==
According to Brian Kernighan, one of the goals of AWK was to have a tool that would easily manipulate both numbers and strings.
AWK was also inspired by [[Marc Rochkind]]'s programming language that was used to search for patterns in input data, and was implemented using [[yacc]].<ref>{{cite web|url=https://www.youtube.com/watch?v=vT_J6xc-Az0 | archive-url=https://ghostarchive.org/varchive/youtube/20211122/vT_J6xc-Az0| archive-date=2021-11-22 | url-status=live|title=UNIX Special: Profs Kernighan & Brailsford |work=Computerphile |date=September 30, 2015 }}{{cbignore}}</ref> As one of the early tools to appear in [[Version 7 Unix]], AWK added computational features to a Unix [[pipeline (Unix)|pipeline]] besides the [[Bourne shell]], the only scripting language available in a standard Unix environment. It is one of the mandatory utilities of the [[Single UNIX Specification]],<ref>{{Cite web |url=http://www.unix.org/version3/apis/cu.html |title=The Single UNIX Specification, Version 3, Utilities Interface Table |access-date=2005-12-18 |archive-url=https://web.archive.org/web/20180105030249/http://www.unix.org/version3/apis/cu.html |archive-date=2018-01-05 |url-status=dead }}</ref> and is required by the [[Linux Standard Base]] specification.<ref>{{cite tech report |url=https://refspecs.linuxfoundation.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic.html#COMMAND |title=Linux Standard Base Core Specification 4.0 |chapter=Chapter 15. 
Commands and Utilities |institution=Linux Foundation |date=2008 |access-date=2020-02-01 |archive-date=2019-10-16 |archive-url=https://web.archive.org/web/20191016015828/https://refspecs.linuxfoundation.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic.html#COMMAND |url-status=live }}</ref>

In 1983, AWK was one of several UNIX tools available for Charles River Data Systems' [[UNOS (operating system)|UNOS]] operating system under [[Bell Laboratories]] license.<ref>{{Cite book|year=1983|title=The Insider's Guide To The Universe|publisher=Charles River Data Systems, Inc.|url=https://www.1000bit.it/ad/bro/charles/CharlesRiverSystem-Universe.pdf|page=13}}</ref>

AWK was significantly revised and expanded in 1985–88, resulting in the [[Gawk (GNU package)|GNU AWK]] implementation written by Paul Rubin, [[Jay Fenlason]], and [[Richard Stallman]], released in 1988.<ref name=robbins/> GNU AWK may be the most widely deployed version<ref>{{cite book|last1=Dougherty|first1=Dale|last2=Robbins|first2=Arnold|title=sed & awk|date=1997|publisher=O'Reilly|location=Sebastopol, CA|isbn=1-565-92225-5|page=221|edition=2nd}}</ref> because it is included with GNU-based Linux packages. GNU AWK has been maintained solely by [[Arnold Robbins]] since 1994.<ref name=robbins>{{cite web |url=http://www.skeeve.com/gnu-awk-and-me-2014.pdf |title=The GNU Project and Me: 27 Years with GNU AWK |website=skeeve.com |first=Arnold |last=Robbins |date=March 2014 |access-date=October 4, 2014 |archive-date=October 6, 2014 |archive-url=https://web.archive.org/web/20141006081656/http://www.skeeve.com/gnu-awk-and-me-2014.pdf |url-status=live }}</ref>

[[Brian Kernighan]]'s [[nawk]] (New AWK) source was first released in 1993 unpublicized, and publicly since the late 1990s; many BSD systems use it to avoid the GPL license.<ref name=robbins/>

AWK was preceded by [[sed]] (1974). Both were designed for text processing.
They share the line-oriented, data-driven paradigm, and are particularly suited to writing [[one-liner program]]s, due to the implicit [[main loop]] and current line variables. The power and terseness of early AWK programs – notably the powerful regular expression handling and conciseness due to implicit variables, which facilitate one-liners – together with the limitations of AWK at the time, were important inspirations for the [[Perl]] language (1987). In the 1990s, Perl became very popular, competing with AWK in the niche of Unix text-processing languages.

== Structure of AWK programs ==
[[File:POSIX awk.pdf|thumb]]
{{Blockquote|AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.|author= Alfred V. Aho<ref>{{cite web |first=Naomi |last=Hamilton |url=https://www.computerworld.com/article/2535126/the-a-z-of-programming-languages--awk.html |title=The A-Z of Programming Languages: AWK |date=May 30, 2008 |work=[[Computerworld]] |access-date=2008-12-12 |archive-date=2020-02-01 |archive-url=https://web.archive.org/web/20200201095859/https://www.computerworld.com/article/2535126/the-a-z-of-programming-languages--awk.html |url-status=live }}</ref>}}

An AWK program is a series of pattern–action pairs, written as:

<syntaxhighlight lang="awk">
condition { action }
condition { action }
...
</syntaxhighlight>

where ''condition'' is typically an expression and ''action'' is a series of commands. The input is split into records, where by default records are separated by newline characters so that the input is split into lines. The program tests each record against each of the conditions in turn, and executes the ''action'' for each expression that is true. Either the condition or the action may be omitted. The condition defaults to matching every record. The default action is to print the record. This is the same pattern-action structure as sed.
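
As a minimal sketch of these defaults, the following shell script runs a program containing one rule with only a condition and one rule with only an action. The three-record input, the threshold of 10 and the variable names <code>input</code>, <code>out</code> and <code>count</code> are made up for illustration:

<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical sample input: three records of two fields each.
input='alpha 1
beta 20
gamma 3'

# Rule 1: condition only, so the default action (print the record) applies.
# Rule 2: action only, so it runs for every record.
# END:    runs once after the last record has been read.
out=$(printf '%s\n' "$input" | awk '
$2 > 10
{ count++ }
END { print count " records read" }
')

printf '%s\n' "$out"
</syntaxhighlight>

The first rule prints <code>beta 20</code>, the only record whose second field exceeds 10, and the <code>END</code> rule then prints <code>3 records read</code>. The opening brace of an action has to start on the same line as its condition; here the condition and the following action sit on separate lines, so they are read as two independent rules.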
In addition to a simple AWK expression, such as <code>foo == 1</code> or <code>/^foo/</code>, the condition can be <code>BEGIN</code> or <code>END</code> causing the action to be executed before or after all records have been read, or ''pattern1, pattern2'' which matches the range of records starting with a record that matches ''pattern1'' up to and including the record that matches ''pattern2'' before again trying to match against ''pattern1'' on subsequent lines.

In addition to normal arithmetic and logical operators, AWK expressions include the tilde operator, <code>~</code>, which matches a [[regular expression]] against a string. As handy [[syntactic sugar]], ''/regexp/'' without using the tilde operator matches against the current record; this syntax derives from [[sed]], which in turn inherited it from the [[Ed (text editor)|ed]] editor, where <code>/</code> is used for searching. This syntax of using slashes as [[delimiter]]s for regular expressions was subsequently adopted by [[Perl]] and [[ECMAScript]], and is now common. The tilde operator was also adopted by Perl.

== Commands ==
AWK commands are the statements that are substituted for ''action'' in the examples above. AWK commands can include function calls, variable assignments, calculations, or any combination thereof. AWK contains built-in support for many functions; many more are provided by the various flavors of AWK. Also, some flavors support the inclusion of [[dynamically linked library|dynamically linked libraries]], which can also provide more functions.

=== The ''print'' command ===
The ''print'' command is used to output text. The output text is always terminated with a predefined string called the output record separator (ORS) whose default value is a newline. The simplest form of this command is:

; <code>print</code>
: This displays the contents of the current record.
In AWK, records are broken down into ''fields'', and these can be displayed separately:

; <code>print $1</code>
: Displays the first field of the current record
; <code>print $1, $3</code>
: Displays the first and third fields of the current record, separated by a predefined string called the output field separator (OFS) whose default value is a single space character

Although these fields (''$X'') may bear resemblance to variables (the $ symbol indicates variables in the usual Unix shells and in [[Perl]]), they actually refer to the fields of the current record. A special case, ''$0'', refers to the entire record. In fact, the commands "<code>print</code>" and "<code>print $0</code>" are identical in functionality.

The ''print'' command can also display the results of calculations and/or function calls:

<syntaxhighlight lang="awk">
/regex_pattern/ {
    # Actions to perform in the event the record (line) matches the above regex_pattern
    print 3+2
    print foobar(3)
    print foobar(variable)
    print sin(3-2)
}
</syntaxhighlight>

Output may be sent to a file:

<syntaxhighlight lang="awk">
/regex_pattern/ {
    # Actions to perform in the event the record (line) matches the above regex_pattern
    print "expression" > "file name"
}
</syntaxhighlight>

or through a [[pipe (Unix)|pipe]]:

<syntaxhighlight lang="awk">
/regex_pattern/ {
    # Actions to perform in the event the record (line) matches the above regex_pattern
    print "expression" | "command"
}
</syntaxhighlight>

=== Built-in variables ===
AWK's built-in variables include the field variables: $1, $2, $3, and so on ($0 represents the entire record). They hold the text or values in the individual text-fields in a record.

Other variables include:
* <code>NR</code>: Number of Records. Keeps a current count of the number of input records read so far from all data files. It starts at zero, but is never automatically reset to zero.<ref name="GNU.org Records">{{Cite book|chapter-url=https://www.gnu.org/software/gawk/manual/html_node/Records.html#index-FNR-variable|chapter=Records|url=https://www.gnu.org/software/gawk/manual/|title=GAWK: Effective AWK Programming: A User's Guide for GNU Awk|date=September 2024|edition=5.3|access-date=2025-01-24}}</ref>
* <code>FNR</code>: File Number of Records. Keeps a current count of the number of input records read so far ''in the current file.'' This variable is automatically reset to zero each time a new file is started.<ref name="GNU.org Records" />
* <code>NF</code>: Number of Fields. Contains the number of fields in the current input record. The last field in the input record can be designated by $NF, the 2nd-to-last field by $(NF-1), the 3rd-to-last field by $(NF-2), etc.
* <code>FILENAME</code>: Contains the name of the current input-file.
* <code>FS</code>: Field Separator. Contains the "field separator" used to divide fields in the input record. The default, "white space", allows any sequence of space and tab characters. FS can be reassigned with another character or character sequence to change the field separator.
* <code>RS</code>: Record Separator. Stores the current "record separator" character. Since, by default, an input line is the input record, the default record separator character is a "newline".
* <code>OFS</code>: Output Field Separator. Stores the "output field separator", which separates the fields when awk prints them. The default is a "space" character.
* <code>ORS</code>: Output Record Separator. Stores the "output record separator", which separates the output records when awk prints them. The default is a "newline" character.
* <code>OFMT</code>: Output Format. Stores the format for numeric output. The default format is "%.6g".
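
The following sketch shows several of these variables working together. The colon-separated sample records and the shell variable names <code>input</code> and <code>out</code> are hypothetical; <code>FS</code> and <code>OFS</code> are assigned in a <code>BEGIN</code> block so that they are in effect before the first record is read:

<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical sample input: two colon-separated records of different lengths.
input='root:x:0:0
daemon:x:1:1:extra'

# FS splits the input on colons; OFS is placed between the comma-separated
# arguments of print. NR is the record number, NF the field count,
# $1 the first field and $NF the last field of the current record.
out=$(printf '%s\n' "$input" | awk '
BEGIN { FS = ":"; OFS = " | " }
{ print NR, NF, $1, $NF }
')

printf '%s\n' "$out"
</syntaxhighlight>

For this input the script prints <code>1 | 4 | root | 0</code> followed by <code>2 | 5 | daemon | extra</code>: <code>$NF</code> picks the last field of each record even though the two records have different numbers of fields.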
=== Variables and syntax ===
Variable names can use any of the characters [A-Za-z0-9_], with the exception of language keywords, and cannot begin with a numeric digit. The operators ''+ - * /'' represent addition, subtraction, multiplication, and division, respectively. For string [[concatenation]], simply place two variables (or string constants) next to each other. It is optional to use a space in between if string constants are involved, but two variable names placed adjacent to each other require a space in between. Double quotes [[delimit]] string constants. Statements need not end with semicolons. Finally, comments can be added to programs by using ''#'' as the first character on a line, or behind a command or sequence of commands.

=== User-defined functions ===
In a format similar to [[C (programming language)|C]], function definitions consist of the keyword <code>function</code>, the function name, argument names and the function body. Here is an example of a function.

<syntaxhighlight lang="awk">
function add_three(number) {
    return number + 3
}
</syntaxhighlight>

This function can be invoked as follows:

<syntaxhighlight lang="awk">
(pattern) {
    print add_three(36)     # Outputs 39
}
</syntaxhighlight>

Functions can have variables that are in the local scope. The names of these are added to the end of the argument list, though values for these should be omitted when calling the function. It is convention to add some [[whitespace character|whitespace]] in the argument list before the local variables, to indicate where the parameters end and the local variables begin.

== Examples ==

=== Hello, World! ===
Here is the customary [["Hello, World!" program]] written in AWK:

<syntaxhighlight lang="awk">
BEGIN {
    print "Hello, world!"
    exit
}
</syntaxhighlight>

=== Print lines longer than 80 characters ===
Print all lines longer than 80 characters. The default action is to print the current line.
<syntaxhighlight lang="awk">
length($0) > 80
</syntaxhighlight>

=== Count words ===
Count words in the input and print the number of lines, words, and characters (like [[wc (Unix)|wc]]):

<syntaxhighlight lang="awk">
{
    words += NF
    chars += length + 1 # add one to account for the newline character at the end of each record (line)
}
END { print NR, words, chars }
</syntaxhighlight>

As there is no pattern for the first line of the program, every line of input matches by default, so the increment actions are executed for every line. <code>words += NF</code> is shorthand for <code>words = words + NF</code>.

=== Sum last word ===
<syntaxhighlight lang="awk">
{ s += $NF }
END { print s + 0 }
</syntaxhighlight>

<code>s</code> is incremented by the numeric value of <code>$NF</code>, which is the last word on the line as defined by AWK's field separator (by default, white-space). <code>NF</code> is the number of fields in the current line, e.g. 4. Since <code>$4</code> is the value of the fourth field, <code>$NF</code> is the value of the last field in the line regardless of how many fields this line has, or whether it has more or fewer fields than surrounding lines. <code>$</code> is actually a unary operator with the highest [[operator precedence]]. (If the line has no fields, then <code>NF</code> is 0, <code>$0</code> is the whole line, which in this case is empty apart from possible white-space, and so has the numeric value 0.)

At the end of the input, the <code>END</code> pattern matches, so <code>s</code> is printed. However, since there may have been no lines of input at all, in which case no value has ever been assigned to <code>s</code>, <code>s</code> will be an empty string by default. Adding zero to a variable is an AWK idiom for coercing it from a string to a numeric value. This results from AWK's arithmetic operators, like addition, [[Type conversion|implicitly casting]] their operands to numbers before computation as required.
(Similarly, concatenating a variable with an empty string coerces from a number to a string, e.g., <code>s ""</code>. Note, there is no operator to concatenate strings, they are just placed adjacently.) On an empty input, the coercion in <code>{ print s + 0 }</code> causes the program to print <code>0</code>, whereas with just the action <code>{ print s }</code>, an empty line would be printed.

=== Match a range of input lines ===
<syntaxhighlight lang="awk">
NR % 4 == 1, NR % 4 == 3 { printf "%6d %s\n", NR, $0 }
</syntaxhighlight>

The action statement prints each line numbered. The printf function emulates the standard C [[printf]] and works similarly to the print command described above. The pattern to match, however, works as follows: ''NR'' is the number of records, typically lines of input, AWK has so far read, i.e. the current line number, starting at 1 for the first line of input. ''%'' is the [[modulo operation|modulo]] operator. ''NR % 4 == 1'' is true for the 1st, 5th, 9th, etc., lines of input. Likewise, ''NR % 4 == 3'' is true for the 3rd, 7th, 11th, etc., lines of input. The range pattern is false until the first part matches, on line 1, and then remains true up to and including when the second part matches, on line 3. It then stays false until the first part matches again on line 5.

Thus, the program prints lines 1,2,3, skips line 4, and then 5,6,7, and so on. For each line, it prints the line number (on a 6 character-wide field) and then the line contents. For example, when executed on this input:

 Rome
 Florence
 Milan
 Naples
 Turin
 Venice

The previous program prints:

      1 Rome
      2 Florence
      3 Milan
      5 Turin
      6 Venice

==== Printing the initial or the final part of a file ====
As a special case, when the first part of a range pattern is constantly true, e.g. ''1'', the range will start at the beginning of the input. Similarly, if the second part is constantly false, e.g. ''0'', the range will continue until the end of input.
For example,

<syntaxhighlight lang="awk">
/^--cut here--$/, 0
</syntaxhighlight>

prints lines of input from the first line matching the regular expression ''^--cut here--$'', that is, a line containing only the phrase "--cut here--", to the end.

=== Calculate word frequencies ===
[[Word frequency]] using [[associative array]]s:

<syntaxhighlight lang="awk">
BEGIN {
    FS="[^a-zA-Z]+"
}
{
    for (i=1; i<=NF; i++)
        words[tolower($i)]++
}
END {
    for (i in words)
        print i, words[i]
}
</syntaxhighlight>

The BEGIN block sets the field separator to any sequence of non-alphabetic characters. Separators can be regular expressions. After that, we get to a bare action, which performs the action on every input line. In this case, for every field on the line, we add one to the number of times that word, first converted to lowercase, appears. Finally, in the END block, we print the words with their frequencies. The line

 for (i in words)

creates a loop that goes through the array ''words'', setting ''i'' to each ''subscript'' of the array. This is different from most languages, where such a loop goes through each ''value'' in the array. The loop thus prints out each word followed by its frequency count. <code>tolower</code> was an addition to the One True awk (see below) made after the book was published.

=== Match pattern from command line ===
This program can be represented in several ways. The first one uses the [[Bourne shell]] to make a shell script that does everything. It is the shortest of these methods:

<syntaxhighlight lang="bash">
#!/bin/sh

pattern="$1"
shift
awk '/'"$pattern"'/ { print FILENAME ":" $0 }' "$@"
</syntaxhighlight>

The <code>$pattern</code> in the awk command is deliberately left outside the single quotes so that the shell expands the variable, but it is enclosed in double quotes so that patterns containing spaces are handled properly. A pattern by itself in the usual way checks to see if the whole line (<code>$0</code>) matches.
<code>FILENAME</code> contains the current filename. awk has no explicit concatenation operator; two adjacent strings are simply concatenated. <code>$0</code> expands to the original unchanged input line.

There are alternate ways of writing this. This shell script accesses the environment directly from within awk:

<syntaxhighlight lang="bash">
#!/bin/sh

export pattern="$1"
shift
awk '$0 ~ ENVIRON["pattern"] { print FILENAME ":" $0 }' "$@"
</syntaxhighlight>

This is a shell script that uses <code>ENVIRON</code>, an array introduced in a newer version of the One True awk after the book was published. The subscript of <code>ENVIRON</code> is the name of an environment variable; its result is the variable's value. This is like the [[getenv]] function in various standard libraries and [[POSIX]]. The shell script makes an environment variable <code>pattern</code> containing the first argument, then drops that argument and has awk look for the pattern in each file.

<code>~</code> checks to see if its left operand matches its right operand; <code>!~</code> is its inverse. A regular expression is just a string and can be stored in variables.

The next way uses command-line variable assignment, in which an argument to awk can be seen as an assignment to a variable:

<syntaxhighlight lang="bash">
#!/bin/sh

pattern="$1"
shift
awk '$0 ~ pattern { print FILENAME ":" $0 }' pattern="$pattern" "$@"
</syntaxhighlight>

Alternatively, the ''-v var=value'' command-line option can be used (e.g. ''awk -v pattern="$pattern" ...'').
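
A minimal sketch of the ''-v'' form, reading from standard input rather than from named files. The sample lines, the pattern <code>ap+le</code> and the shell variable <code>out</code> are made up for illustration; because <code>FILENAME</code> is not meaningful for standard input, the record number <code>NR</code> is printed instead:

<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical pattern: "a", one or more "p", then "le".
pattern='ap+le'

# -v assigns the awk variable before the program starts, so it is already
# visible inside the BEGIN block. The string is then used as a dynamic
# regular expression on the right-hand side of the ~ operator.
out=$(printf '%s\n' 'apple pie' 'banana' 'aple tart' |
    awk -v pattern="$pattern" '
BEGIN { print "searching for: " pattern }
$0 ~ pattern { print NR ": " $0 }
')

printf '%s\n' "$out"
</syntaxhighlight>

This prints <code>searching for: ap+le</code>, then <code>1: apple pie</code> and <code>3: aple tart</code>. An assignment given as an ordinary operand after the program text, as in the previous script, only takes effect when awk reaches that operand while working through its arguments, so its value would not yet be set inside <code>BEGIN</code>.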
Finally, this is written in pure awk, without help from a shell or without the need to know too much about the implementation of the awk script (as the variable assignment on command line one does), but is a bit lengthy:

<syntaxhighlight lang="awk">
BEGIN {
    pattern = ARGV[1]
    for (i = 1; i < ARGC; i++) # remove first argument
        ARGV[i] = ARGV[i + 1]
    ARGC--
    if (ARGC == 1) { # the pattern was the only thing, so force read from standard input (used by book)
        ARGC = 2
        ARGV[1] = "-"
    }
}
$0 ~ pattern { print FILENAME ":" $0 }
</syntaxhighlight>

The <code>BEGIN</code> is necessary not only to extract the first argument, but also to prevent it from being interpreted as a filename after the <code>BEGIN</code> block ends. <code>ARGC</code>, the number of arguments, is always guaranteed to be ≥1, as <code>ARGV[0]</code> is the name of the command that executed the script, most often the string <code>"awk"</code>. <code>ARGV[ARGC]</code> is the empty string, <code>""</code>. <code>#</code> initiates a comment that extends to the end of the line.

Note the <code>if</code> block. awk only checks to see if it should read from standard input before it runs the command. This means that <code>awk 'prog'</code> only works because the fact that there are no filenames is only checked before <code>prog</code> is run! If you explicitly set <code>ARGC</code> to 1 so that there are no arguments, awk will simply quit because it concludes there are no more input files. Therefore, you need to explicitly say to read from standard input with the special filename <code>-</code>.

== Self-contained AWK scripts ==
On Unix-like operating systems self-contained AWK scripts can be constructed using the [[shebang (Unix)|shebang]] syntax.
For example, a script that sends the content of a given file to standard output may be built by creating a file named <code>print.awk</code> with the following content:

<syntaxhighlight lang="awk">
#!/usr/bin/awk -f
{ print $0 }
</syntaxhighlight>

It can be invoked with: <code>./print.awk <filename></code>

The <code>-f</code> tells awk that the argument that follows is the file to read the AWK program from, which is the same flag that is used in sed. Since they are often used for one-liners, both these programs default to executing a program given as a command-line argument, rather than a separate file.

== Versions and implementations ==
AWK was originally written in 1977 and distributed with [[Version 7 Unix]].

In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book ''[[The AWK Programming Language]]'', published 1988, and its implementation was made available in releases of [[UNIX System V]]. To avoid confusion with the incompatible older version, this version was sometimes called "new awk" or ''nawk''. This implementation was released under a [[free software license]] in 1996 and is still maintained by Brian Kernighan (see external links below).{{citation needed|date=February 2020}}

Old versions of Unix, such as [[UNIX/32V]], included <code>awkcc</code>, which converted AWK to C. Kernighan wrote a program to turn awk into {{nowrap|C++}}; its state is not known.<ref>{{cite conference |first=Brian W.
|last=Kernighan |date=April 24β25, 1991 |url=https://www.cs.princeton.edu/~bwk/btl.mirror/awkc++.pdf |title=An AWK to C++ Translator |event=Usenix C++ Conference |location=Washington, DC |pages=217β228 |conference= |access-date=2020-02-01 |archive-date=2020-06-22 |archive-url=https://web.archive.org/web/20200622061725/https://www.cs.princeton.edu/~bwk/btl.mirror/awkc++.pdf |url-status=live }}</ref> * '''BWK awk''', also known as '''nawk''', refers to the version by [[Brian Kernighan]]. It has been dubbed the "One True AWK" because of the use of the term in association with the book that originally described the language and the fact that Kernighan was one of the original authors of AWK.<ref name = "AWK1">{{cite book | title= The AWK Programming Language |first1=Alfred V. |last1=Aho |first2=Brian W. |last2=Kernighan |first3=Peter J. |last3=Weinberger | year= 1988 | publisher= Addison-Wesley Publishing Company | isbn= 9780201079814 |url = https://archive.org/details/pdfy-MgN0H1joIoDVoIC7 | access-date = 16 May 2015 }}</ref> FreeBSD refers to this version as ''one-true-awk''.<ref>{{cite web |url=http://www.freebsd.org/cgi/cvsweb.cgi/src/contrib/one-true-awk/FREEBSD-upgrade?rev=1.9&content-type=text/x-cvsweb-markup |archive-url=https://web.archive.org/web/20130908180035/http://www.freebsd.org/cgi/cvsweb.cgi/src/contrib/one-true-awk/FREEBSD-upgrade?rev=1.9&content-type=text%2Fx-cvsweb-markup |archive-date=September 8, 2013 |title=FreeBSD's work log for importing BWK awk into FreeBSD's core |date=May 16, 2005 |access-date=September 20, 2006 |url-status=live }}</ref> This version also has features not in the book, such as <code>tolower</code> and <code>ENVIRON</code> that are explained above; see the FIXES file in the source archive for details. This version is used by, for example, [[Android (operating system)|Android]], [[FreeBSD]], [[NetBSD]], [[OpenBSD]], [[macOS]], and [[illumos]]. 
Brian Kernighan and Arnold Robbins are the main contributors to a source repository for ''nawk'': {{URL|https://github.com/onetrueawk/awk}}. * '''gawk''' ([[GNU]] awk) is another free-software implementation and the only implementation that makes serious progress implementing [[internationalization and localization]] and TCP/IP networking. It was written before the original implementation became freely available. It includes its own debugger, and its [[profiling (computer programming)|profiler]] enables the user to make measured performance enhancements to a script. It also enables the user to extend functionality with shared libraries. Some [[Linux distribution]]s include ''gawk'' as their default AWK implementation.{{Citation needed|date=September 2018}} As of version 5.2 (September 2022) ''gawk'' includes a persistent memory feature that can remember script-defined variables and functions from one invocation of a script to the next and pass data between unrelated scripts, as described in the Persistent-Memory ''gawk'' User Manual: {{URL|https://www.gnu.org/software/gawk/manual/pm-gawk/}}. ** '''gawk-csv'''. The [[Comma-separated values|CSV]] extension of ''gawk'' provides facilities for inputting and outputting CSV formatted data.<ref>{{cite web | title=CSV Processing With gawk (using the gawk-csv extension)| website=gawkextlib |year=2018 | url=https://gawkextlib.sourceforge.net/csv/gawk-csv.htmlurl-statuse=live|archive-url=https://web.archive.org/web/20200325201153/http://gawkextlib.sourceforge.net/csv/gawk-csv.html |archive-date=2020-03-25}}</ref> * '''mawk''' is a very fast AWK implementation by Mike Brennan based on a [[bytecode]] interpreter. * '''libmawk''' is a fork of mawk, allowing applications to embed multiple parallel instances of awk interpreters. * '''awka''' (whose front end is written atop the ''mawk'' program) is another translator of AWK scripts into C code. 
When compiled, statically including the author's libawka.a, the resulting executables are considerably sped up and, according to the author's tests, compare very well with other versions of AWK, [[Perl]], or [[Tcl]]. Small scripts will turn into programs of 160–170 kB.
* '''tawk''' (Thompson AWK) is an AWK [[compiler]] for [[Solaris (operating system)|Solaris]], [[DOS]], [[OS/2]], and [[Microsoft Windows|Windows]], previously sold by Thompson Automation Software (which has ceased its activities).<ref>{{cite news |url=https://www.drdobbs.com/tools/examining-the-tawk-compiler/184410193 |work=[[Dr. Dobb's Journal]] |author=James K. Lawless |date=May 1, 1997 |title=Examining the TAWK Compiler |access-date=February 21, 2020 |archive-date=February 21, 2020 |archive-url=https://web.archive.org/web/20200221191605/https://www.drdobbs.com/tools/examining-the-tawk-compiler/184410193 |url-status=live }}</ref>
* '''Jawk''' is a project to implement AWK in [[Java (programming language)|Java]], hosted on SourceForge.<ref>{{Cite web |url=http://sourceforge.net/projects/jawk/ |title=''Jawk'' at SourceForge |access-date=2006-08-23 |archive-date=2007-05-27 |archive-url=https://web.archive.org/web/20070527021808/http://sourceforge.net/projects/jawk |url-status=live }}</ref> Extensions to the language are added to provide access to Java features within AWK scripts (e.g., Java threads, sockets, and collections).
* '''xgawk''' is a fork of ''gawk''<ref>{{Cite web |url=http://gawkextlib.sourceforge.net/ |title=''xgawk'' Home Page |access-date=2013-05-07 |archive-date=2013-04-18 |archive-url=https://web.archive.org/web/20130418224130/http://gawkextlib.sourceforge.net/ |url-status=live }}</ref> that extends ''gawk'' with dynamically loadable libraries. The XMLgawk extension was integrated into the official GNU Awk release 4.1.0.
* '''QSEAWK''' is an embedded AWK interpreter implementation included in the QSE library that provides embedding [[application programming interface]] (API) for [[C (programming language)|C]] and [[C++]].<ref>{{Cite web |url=https://github.com/hyung-hwan/qse |title=QSEAWK at GitHub |website=[[GitHub]] |access-date=2017-09-06 |archive-date=2018-06-11 |archive-url=https://web.archive.org/web/20180611001042/https://github.com/hyung-hwan/qse |url-status=live }}</ref>
* '''libfawk''' is a very small, function-only, reentrant, embeddable interpreter written in C.
* '''[[BusyBox]]''' includes an AWK implementation written by Dmitry Zakharov. This is a very small implementation suitable for embedded systems.
* '''CLAWK''' by Michael Parker provides an AWK implementation in [[Common Lisp]], based upon the regular expression library of the same author.<ref>{{Cite web |url=https://github.com/sharplispers/clawk |title=CLAWK at GitHub |website=[[GitHub]] |access-date=2021-06-01 |archive-date=2021-08-25 |archive-url=https://web.archive.org/web/20210825102602/https://github.com/sharplispers/clawk |url-status=live }}</ref>
* '''goawk''' is an AWK implementation in Go with a few convenience extensions by Ben Hoyt, hosted on [https://github.com/benhoyt/goawk GitHub].

The gawk manual has a list of more AWK implementations.<ref>{{cite book |chapter-url=https://www.gnu.org/software/gawk/manual/html_node/Other-Versions.html |title=GAWK: Effective AWK Programming: A User's Guide for GNU Awk |chapter=B.5 Other Freely Available awk Implementations|date=September 2024|edition=5.3|access-date=2025-01-24}}</ref>

== Books ==
* {{cite book |last1 = Aho |first1 = Alfred V. |author-link1 = Alfred Aho |last2 = Kernighan |first2 = Brian W. |author-link2 = Brian Kernighan |last3 = Weinberger |first3 = Peter J. |author-link3 = Peter J.
Weinberger |title = The AWK Programming Language |url = https://archive.org/details/awkprogrammingla00ahoa |access-date = 2017-01-22 |date = 1988-01-01 |publisher = [[Addison-Wesley]] |location = New York, NY |isbn = 0-201-07981-X |url-access = registration }}
* {{cite book |last1 = Aho |first1 = Alfred V. |author-link1 = Alfred Aho |last2 = Kernighan |first2 = Brian W. |author-link2 = Brian Kernighan |last3 = Weinberger |first3 = Peter J. |author-link3 = Peter J. Weinberger |title = The AWK Programming Language, Second Edition |date = 2023-09-06 |publisher = [[Addison-Wesley|Addison-Wesley Professional]] |location = Hoboken, New Jersey |isbn = 978-0-13-826972-2 |url = https://awk.dev |url-status = live |access-date = 2023-11-03 |archive-url = https://web.archive.org/web/20231027062708/https://awk.dev/ |archive-date = 2023-10-27 }}
* {{cite book | last1 = Robbins | first1 = Arnold | title = Effective awk Programming | url = http://www.oreilly.com/catalog/awkprog3/ | access-date = 2009-04-16 | edition = 3rd | date = 2001-05-15 | publisher = [[O'Reilly Media]] | location = Sebastopol, CA | isbn = 0-596-00070-7 }}
* {{cite book | last1 = Dougherty | first1 = Dale | author-link1 = Dale Dougherty | last2 = Robbins | first2 = Arnold | title = sed & awk | url = http://www.oreilly.com/catalog/sed2/ | access-date = 2009-04-16 | edition = 2nd | date = 1997-03-01 | publisher = O'Reilly Media | location = Sebastopol, CA | isbn = 1-56592-225-5 }}
* {{cite book | last1 = Robbins | first1 = Arnold | title = Effective Awk Programming: A User's Guide for Gnu Awk | url = https://www.gnu.org/software/gawk/manual/ | access-date = 2009-04-16 | edition = 1.0.3 | year = 2000 | publisher = [[iUniverse]] | location = Bloomington, IN | isbn = 0-595-10034-1 | archive-url= https://web.archive.org/web/20090412190359/https://www.gnu.org/software/gawk/manual/| archive-date= 12 April 2009 | url-status= live}}

== See also ==
* [[Data transformation]]
* [[Event-driven programming]]
* [[List of Unix commands]]
* [[sed]]

== References ==
{{reflist|30em}}

== Further reading ==
* {{cite web |url=https://www.fosslife.org/awk-power-and-promise-40-year-old-language |title=Awk: The Power and Promise of a 40-Year-Old Language |work=Fosslife |author=Andy Oram |date=May 19, 2021 |accessdate=June 9, 2021}}
* {{cite web |first=Naomi |last=Hamilton |url=https://www.computerworld.com/article/2535126/the-a-z-of-programming-languages--awk.html |title=The A-Z of Programming Languages: AWK |date=May 30, 2008 |work=[[Computerworld]] |access-date=2008-12-12}} – Interview with Alfred V. Aho on AWK
* {{cite web | url = http://www.ibm.com/developerworks/library/l-awk1/ | title = Awk by example, Part 1: An intro to the great language with the strange name | access-date = 2009-04-16 | last = Robbins | first = Daniel | author-link = Daniel Robbins (computer programmer) | date = 2000-12-01 | work = Common threads | publisher = IBM DeveloperWorks }}
* {{cite web | url = http://www.ibm.com/developerworks/library/l-awk2/ | title = Awk by example, Part 2: Records, loops, and arrays | access-date = 2009-04-16 | last = Robbins | first = Daniel | date = 2001-01-01 | work = Common threads | publisher = IBM DeveloperWorks }}
* {{cite web | url = http://www.ibm.com/developerworks/library/l-awk3/ | title = Awk by example, Part 3: String functions and ... checkbooks?
| access-date = 2009-04-16 | last = Robbins | first = Daniel | date = 2001-04-01 | work = Common threads | publisher = IBM DeveloperWorks | archive-url= https://web.archive.org/web/20090519074032/http://www.ibm.com/developerworks/linux/library/l-awk3.html| archive-date= 19 May 2009 | url-status= live}}
* [https://web.archive.org/web/20081031084509/http://www.think-lamp.com/2008/10/awk-a-boon-for-cli-enthusiasts/ AWK – Become an expert in 60 minutes]
* {{man|cu|awk|SUS|pattern scanning and processing language}}
* {{man|1|gawk|Linux}}

== External links ==
{{Wikibooks|An Awk Primer}}
* [http://doc.cat-v.org/henry_spencer/amazing_awk_assembler/ The Amazing Awk Assembler] by [[Henry Spencer]].
* {{cite web |title=AWK (formerly) at Curlie |url=https://curlie.org/Computers/Programming/Languages/Awk |website=[[Curlie]] |archive-url=https://web.archive.org/web/20220318040955/https://curlie.org/Computers/Programming/Languages/Awk |archive-date=2022-03-18 |language=en |url-status=dead}}
* [http://awklang.org awklang.org] The site for things related to the awk language
* {{webarchive |url=https://web.archive.org/web/20160403181356/http://awk.info/ |title=Awk Community Portal}}

{{Unix commands}}
{{Plan 9 commands}}
{{Authority control}}

{{DEFAULTSORT:Awk}}
[[Category:1977 software]]
[[Category:Cross-platform software]]
[[Category:Domain-specific programming languages]]
[[Category:Free and open source interpreters]]
[[Category:Pattern matching programming languages]]
[[Category:Plan 9 commands]]
[[Category:Programming languages created in 1977]]
[[Category:Scripting languages]]
[[Category:Standard Unix programs]]
[[Category:Text-oriented programming languages]]
[[Category:Unix SUS2008 utilities]]
[[Category:Unix text processing utilities]]