== Examples ==

=== Hello, World! ===
Here is the customary [["Hello, World!" program]] written in AWK:

<syntaxhighlight lang="awk">
BEGIN {
    print "Hello, world!"
    exit
}
</syntaxhighlight>

=== Print lines longer than 80 characters ===
Print all lines longer than 80 characters. The default action is to print the current line.

<syntaxhighlight lang="awk">
length($0) > 80
</syntaxhighlight>

=== Count words ===
Count words in the input and print the number of lines, words, and characters (like [[wc (Unix)|wc]]):

<syntaxhighlight lang="awk">
{
    words += NF
    chars += length + 1 # add one to account for the newline character at the end of each record (line)
}
END { print NR, words, chars }
</syntaxhighlight>

As there is no pattern for the first line of the program, every line of input matches by default, so the increment actions are executed for every line. <code>words += NF</code> is shorthand for <code>words = words + NF</code>.

=== Sum last word ===
<syntaxhighlight lang="awk">
{ s += $NF }
END { print s + 0 }
</syntaxhighlight>

<code>s</code> is incremented by the numeric value of <code>$NF</code>, which is the last word on the line as defined by AWK's field separator (by default, white-space). <code>NF</code> is the number of fields in the current line, e.g. 4. Since <code>$4</code> is the value of the fourth field, <code>$NF</code> is the value of the last field in the line regardless of how many fields this line has, or whether it has more or fewer fields than surrounding lines. <code>$</code> is actually a unary operator with the highest [[operator precedence]]. (If the line has no fields, then <code>NF</code> is 0, <code>$0</code> is the whole line, which in this case is empty apart from possible white-space, and so has the numeric value 0.)

At the end of the input, the <code>END</code> pattern matches, so <code>s</code> is printed.
However, since there may have been no lines of input at all, in which case no value has ever been assigned to <code>s</code>, <code>s</code> will be an empty string by default. Adding zero to a variable is an AWK idiom for coercing it from a string to a numeric value. This results from AWK's arithmetic operators, like addition, [[Type conversion|implicitly casting]] their operands to numbers before computation as required. (Similarly, concatenating a variable with an empty string coerces from a number to a string, e.g., <code>s ""</code>. Note that there is no operator to concatenate strings; they are simply placed adjacently.) On an empty input, the coercion in <code>{ print s + 0 }</code> causes the program to print <code>0</code>, whereas with just the action <code>{ print s }</code>, an empty line would be printed.

=== Match a range of input lines ===
<syntaxhighlight lang="awk">
NR % 4 == 1, NR % 4 == 3 { printf "%6d %s\n", NR, $0 }
</syntaxhighlight>

The action statement prints each line numbered. The printf function emulates the standard C [[printf]] and works similarly to the print command described above. The pattern to match, however, works as follows: ''NR'' is the number of records, typically lines of input, AWK has so far read, i.e. the current line number, starting at 1 for the first line of input. ''%'' is the [[modulo operation|modulo]] operator. ''NR % 4 == 1'' is true for the 1st, 5th, 9th, etc., lines of input. Likewise, ''NR % 4 == 3'' is true for the 3rd, 7th, 11th, etc., lines of input. The range pattern is false until the first part matches, on line 1, and then remains true up to and including when the second part matches, on line 3. It then stays false until the first part matches again on line 5.

Thus, the program prints lines 1, 2, 3, skips line 4, and then prints lines 5, 6, 7, and so on. For each line, it prints the line number (in a six-character-wide field) and then the line contents.
For example, when executed on this input:

 Rome
 Florence
 Milan
 Naples
 Turin
 Venice

The previous program prints:

      1 Rome
      2 Florence
      3 Milan
      5 Turin
      6 Venice

==== Printing the initial or the final part of a file ====
As a special case, when the first part of a range pattern is constantly true, e.g. ''1'', the range will start at the beginning of the input. Similarly, if the second part is constantly false, e.g. ''0'', the range will continue until the end of input. For example,

<syntaxhighlight lang="awk">
/^--cut here--$/, 0
</syntaxhighlight>

prints lines of input from the first line matching the regular expression ''^--cut here--$'', that is, a line containing only the phrase "--cut here--", to the end.

=== Calculate word frequencies ===
[[Word frequency]] using [[associative array]]s:

<syntaxhighlight lang="awk">
BEGIN {
    FS="[^a-zA-Z]+"
}
{
    for (i=1; i<=NF; i++)
        words[tolower($i)]++
}
END {
    for (i in words)
        print i, words[i]
}
</syntaxhighlight>

The BEGIN block sets the field separator to any sequence of non-alphabetic characters. Separators can be regular expressions. After that, we get to a bare action, which performs the action on every input line. In this case, for every field on the line, we add one to the number of times that word, first converted to lowercase, appears. Finally, in the END block, we print the words with their frequencies. The line

 for (i in words)

creates a loop that goes through the array ''words'', setting ''i'' to each ''subscript'' of the array. This is different from most languages, where such a loop goes through each ''value'' in the array. The loop thus prints out each word followed by its frequency count. <code>tolower</code> was an addition to the One True awk (see below) made after the book was published.

=== Match pattern from command line ===
This program can be represented in several ways. The first one uses the [[Bourne shell]] to make a shell script that does everything.
It is the shortest of these methods:

<syntaxhighlight lang="bash">
#!/bin/sh

pattern="$1"
shift
awk '/'"$pattern"'/ { print FILENAME ":" $0 }' "$@"
</syntaxhighlight>

The <code>$pattern</code> in the awk command is not protected by single quotes, so the shell expands the variable; it is placed in double quotes so that patterns containing spaces are handled properly. A pattern by itself is, as usual, matched against the whole line (<code>$0</code>). <code>FILENAME</code> contains the current filename. awk has no explicit concatenation operator; two adjacent strings are simply concatenated. <code>$0</code> expands to the original unchanged input line.

There are alternative ways of writing this. This shell script accesses the environment directly from within awk:

<syntaxhighlight lang="bash">
#!/bin/sh

export pattern="$1"
shift
awk '$0 ~ ENVIRON["pattern"] { print FILENAME ":" $0 }' "$@"
</syntaxhighlight>

This is a shell script that uses <code>ENVIRON</code>, an array introduced in a newer version of the One True awk after the book was published. The subscript of <code>ENVIRON</code> is the name of an environment variable; the corresponding element is the variable's value. This is like the [[getenv]] function in various standard libraries and [[POSIX]]. The shell script makes an environment variable <code>pattern</code> containing the first argument, then drops that argument and has awk look for the pattern in each file.

<code>~</code> checks to see if its left operand matches its right operand; <code>!~</code> is its inverse. A regular expression is just a string and can be stored in variables.

The next way uses command-line variable assignment, in which an argument to awk can be seen as an assignment to a variable:

<syntaxhighlight lang="bash">
#!/bin/sh

pattern="$1"
shift
awk '$0 ~ pattern { print FILENAME ":" $0 }' pattern="$pattern" "$@"
</syntaxhighlight>

Alternatively, you can use the ''-v var=value'' command-line option (e.g. ''awk -v pattern="$pattern" ...'').
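The two assignment styles differ in when the value becomes visible to the program: a <code>var=value</code> operand is processed only when awk reaches it in the argument list, after the <code>BEGIN</code> block has run, whereas a <code>-v</code> assignment is made before <code>BEGIN</code>. The following is a minimal sketch of that difference; the helper names <code>show_operand</code> and <code>show_v</code> are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; each reads text on standard input
# and takes the pattern as its first argument.

# Operand assignment: awk performs it only when it reaches that operand,
# which happens after BEGIN has already run, so BEGIN sees an empty value.
show_operand() {
    awk 'BEGIN { printf "BEGIN:[%s] ", pattern }
         $0 ~ pattern { print "match:" $0 }' pattern="$1" -
}

# -v assignment: performed before BEGIN runs, so BEGIN sees the value.
show_v() {
    awk -v pattern="$1" 'BEGIN { printf "BEGIN:[%s] ", pattern }
         $0 ~ pattern { print "match:" $0 }'
}
```

For example, <code>printf 'foo\nbar\n' | show_operand ba</code> prints <code>BEGIN:[] match:bar</code>, while the same input piped to <code>show_v ba</code> prints <code>BEGIN:[ba] match:bar</code>. Either style works for the script above, because it uses <code>pattern</code> only in a main rule and never in <code>BEGIN</code>.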
Finally, this is written in pure awk, without help from a shell and without the need to know too much about the implementation of the awk script (as the command-line variable assignment does), but it is a bit lengthy:

<syntaxhighlight lang="awk">
BEGIN {
    pattern = ARGV[1]
    for (i = 1; i < ARGC; i++) # remove first argument
        ARGV[i] = ARGV[i + 1]
    ARGC--
    if (ARGC == 1) { # the pattern was the only thing, so force read from standard input (used by book)
        ARGC = 2
        ARGV[1] = "-"
    }
}
$0 ~ pattern { print FILENAME ":" $0 }
</syntaxhighlight>

The <code>BEGIN</code> block is necessary not only to extract the first argument, but also to prevent it from being interpreted as a filename after the <code>BEGIN</code> block ends. <code>ARGC</code>, the number of arguments, is always guaranteed to be ≥1, as <code>ARGV[0]</code> is the name of the command that executed the script, most often the string <code>"awk"</code>. <code>ARGV[ARGC]</code> is the empty string, <code>""</code>. <code>#</code> initiates a comment that extends to the end of the line.

Note the <code>if</code> block. awk checks whether it should read from standard input only before it runs the program. This means that <code>awk 'prog'</code> works only because the absence of filenames is checked before <code>prog</code> is run. If you explicitly set <code>ARGC</code> to 1 so that there are no arguments, awk will simply quit because it concludes there are no more input files. Therefore, you need to explicitly tell it to read from standard input with the special filename <code>-</code>.
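To see what <code>ARGC</code> and <code>ARGV</code> contain before any shifting is done, the argument vector can be dumped from a <code>BEGIN</code> block. This is only an illustrative sketch, and the function name <code>show_args</code> is hypothetical. Because the program consists of a <code>BEGIN</code> block alone, awk reads no input, so the arguments need not name existing files.

```shell
#!/bin/sh
# Hypothetical helper for illustration: print every element of ARGV,
# starting from ARGV[0], the name of the awk command itself.
show_args() {
    awk 'BEGIN {
        for (i = 0; i < ARGC; i++)
            printf "ARGV[%d]=%s\n", i, ARGV[i]
    }' "$@"
}
```

Calling <code>show_args a b</code> prints three lines: <code>ARGV[0]</code> with the command name (typically <code>awk</code>, though the exact string depends on the implementation), followed by <code>ARGV[1]=a</code> and <code>ARGV[2]=b</code>. With no arguments, only the <code>ARGV[0]</code> line is printed, consistent with <code>ARGC</code> being at least 1.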