== Encoding ==
The encoding process consists of the following steps:
# Converting the source message to a string of bits
# Computing the necessary symbol size and mode message, which determines the Reed–Solomon codeword size
# [[Bit-stuffing]] the message into Reed–Solomon codewords
# Padding the message to a codeword boundary
# Appending check codewords
# Arranging the complete message in a spiral around the core

All conversion between bit strings and other forms is performed according to the [[big-endian]] (most significant bit first) convention.
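
As a minimal sketch of this convention, a value can be converted to and from a most-significant-bit-first bit string as follows. The helper names <code>to_bits</code> and <code>from_bits</code> are illustrative only and are not taken from any specification.

```python
def to_bits(value: int, width: int) -> str:
    """Return `value` as exactly `width` bits, most significant bit first."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the requested width")
    return format(value, f"0{width}b")


def from_bits(bits: str) -> int:
    """Interpret a bit string as an integer, most significant bit first."""
    return int(bits, 2)
```

For example, <code>to_bits(5, 4)</code> yields <code>"0101"</code>, and <code>from_bits("0101")</code> yields 5.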

=== Character set ===
All 8-bit values can be encoded, plus two escape codes:
* FNC1, an escape symbol used to mark the presence of an application identifier, in the same way as in the [[GS1-128]] standard.
* ECI, an escape followed by a 6-digit [[Extended Channel Interpretation]] code, which specifies the character set used to interpret the following bytes.

By default, codes 0–127 are interpreted according to ANSI X3.4 ([[ASCII]]), and 128–255 are interpreted according to [[ISO/IEC 8859-1]]: Latin Alphabet No. 1.  This corresponds to ECI 000003.

Bytes are translated into 4- and 5-bit codes, based on a current decoding mode, with shift and latch codes for changing modes.  Byte values not available this way may be encoded using a general "binary shift" code, which is followed by a length and a number of 8-bit codes.

For changing modes, a ''shift'' affects only the interpretation of the single following code, while a ''latch'' affects all following codes.  Most modes use 5-bit codes, but Digit mode uses 4-bit codes.

{|class="wikitable" style="text-align:center"
|+ Aztec code character encoding
!rowspan=2 scope="col"| Code ||colspan=5 scope="col"| Mode
|rowspan=18|
!rowspan=2 scope="col"| Code ||colspan=4 scope="col"| Mode
|-
!scope="col"| Upper ||scope="col"| Lower ||scope="col"| Mixed ||scope="col"| Punct ||scope="col"| Digit
!scope="col"| Upper ||scope="col"| Lower ||scope="col"| Mixed ||scope="col"| Punct
|-
!scope="row"| 0
| P/S || P/S || P/S || FLG(''n'') || P/S
!scope="row"| 16
| O || o || ^\ || +
|-
!scope="row"| 1
| SP || SP || SP || CR || SP
!scope="row"| 17
| P || p || ^] || ,
|-
!scope="row"| 2
| A || a || ^A || CR LF || 0
!scope="row"| 18
| Q || q || ^^ || -
|-
!scope="row"| 3
| B || b || ^B || . SP || 1
!scope="row"|  19
| R || r || ^_ || .
|-
!scope="row"| 4
| C || c || ^C || , SP || 2
!scope="row"| 20
| S || s || @ || /
|-
!scope="row"| 5
| D || d || ^D || : SP || 3
!scope="row"| 21
| T || t || \ || :
|-
!scope="row"| 6
| E || e || ^E || ! || 4
!scope="row"| 22
| U || u || ^ || ;
|-
!scope="row"| 7
| F || f || ^F || " || 5
!scope="row"| 23
| V || v || _ || <
|-
!scope="row"| 8
| G || g || ^G || # || 6
!scope="row"| 24
| W || w || ` || =
|-
!scope="row"| 9
| H || h || ^H || $ || 7
!scope="row"| 25
| X || x || {{pipe}} || >
|-
!scope="row"| 10
| I || i || ^I || % || 8
!scope="row"| 26
| Y || y || ~ || ?
|-
!scope="row"| 11
| J || j || ^J || & || 9
!scope="row"| 27
| Z || z || ^? || [
|-
!scope="row"| 12
| K || k || ^K || ' || ,
!scope="row"| 28
| L/L || U/S || L/L || ]
|-
!scope="row"| 13
| L || l || ^L || ( || .
!scope="row"| 29
| M/L || M/L || U/L || {
|-
!scope="row"| 14
| M || m || ^M || ) || U/L
!scope="row"| 30
| D/L || D/L || P/L || }
|-
!scope="row"| 15
| N || n || ^[ || * || U/S
!scope="row"| 31
| B/S || B/S || B/S || U/L
|}
* Initial mode is "Upper"
* x/S = Shift to mode x for one character; B/S = shift to 8-bit binary
* x/L = Latch to mode x for following characters
* Punct codes 2–5 encode two bytes each
* The table lists ASCII characters, but it is the byte values that are encoded, even if a non-ASCII character set is in use
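
The use of the table can be sketched for a small subset of it. The following illustrative encoder handles only the Upper and Digit columns and always switches with a latch (D/L from Upper, U/L from Digit); it does not search for the shortest encoding, and the function name <code>encode_upper_digit</code> is an assumption made for this example.

```python
# Code values copied from the Upper and Digit columns of the table.
UPPER = {" ": 1, **{chr(ord("A") + i): 2 + i for i in range(26)}}
DIGIT = {" ": 1, **{str(i): 2 + i for i in range(10)}, ",": 12, ".": 13}
UPPER_DL = 30  # Upper-mode code for "latch to Digit" (5 bits)
DIGIT_UL = 14  # Digit-mode code for "latch to Upper" (4 bits)


def encode_upper_digit(text: str) -> str:
    """Encode text using only Upper (5-bit) and Digit (4-bit) codes."""
    mode = "upper"  # the initial mode is Upper
    out = []
    for ch in text:
        if mode == "upper" and ch not in UPPER:
            if ch not in DIGIT:
                raise ValueError(f"unsupported character: {ch!r}")
            out.append(format(UPPER_DL, "05b"))
            mode = "digit"
        elif mode == "digit" and ch not in DIGIT:
            if ch not in UPPER:
                raise ValueError(f"unsupported character: {ch!r}")
            out.append(format(DIGIT_UL, "04b"))
            mode = "upper"
        table, width = (UPPER, 5) if mode == "upper" else (DIGIT, 4)
        out.append(format(table[ch], f"0{width}b"))
    return "".join(out)
```

Under these assumptions, <code>encode_upper_digit("A1")</code> produces the 5-bit code for A (00010), the 5-bit D/L latch (11110) and the 4-bit code for the digit 1 (0011).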

B/S (binary shift) is followed by a 5-bit length.  If non-zero, this indicates that 1–31 8-bit bytes follow.  If zero, 11 additional length bits encode the number of following bytes less 31.  (Note that for 32–62 bytes, two 5-bit byte shift sequences are more compact than one 11-bit.)  At the end of the binary sequence, the previous mode is resumed.
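
The two length formats can be illustrated with a short sketch. It assumes the current mode is Upper, Lower or Mixed, where B/S is code 31, and the function name <code>binary_shift</code> is hypothetical.

```python
def binary_shift(data: bytes) -> str:
    """Return the bits of one binary-shift sequence carrying `data`."""
    n = len(data)
    if n == 0:
        raise ValueError("a binary shift must carry at least one byte")
    bits = format(31, "05b")  # B/S code in Upper, Lower and Mixed modes
    if n <= 31:
        bits += format(n, "05b")  # short form: 5-bit length
    elif n - 31 < (1 << 11):
        # long form: 5 zero bits, then 11 bits holding (length - 31)
        bits += "00000" + format(n - 31, "011b")
    else:
        raise ValueError("too many bytes for one binary shift")
    return bits + "".join(format(b, "08b") for b in data)
```

A 40-byte run encoded this way takes 5 + 5 + 11 + 320 = 341 bits, whereas two short sequences of 31 and 9 bytes take 2 × 10 + 320 = 340 bits, which is the saving mentioned in the note above.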

FLG(''n'') is followed by a 3-bit ''n'' value.  ''n''=0 encodes FNC1.  ''n''=1–6 is followed by 1–6 digits (in digit mode) which are zero-padded to make a 6-digit ECI identifier.  ''n''=7 is reserved and currently illegal.
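
As an illustrative sketch, the bits for an FLG(''n'') sequence can be assembled as follows. The example assumes the encoder is in Upper mode and reaches FLG(''n'') through a punctuation shift (P/S, code 0), and that the encoder sends the ECI number without leading zeros; the function name <code>flg_from_upper</code> is hypothetical.

```python
def flg_from_upper(eci=None) -> str:
    """Bits for P/S + FLG(n); eci=None encodes FNC1 (n = 0)."""
    bits = "00000" + "00000"  # P/S (Upper code 0), then FLG(n) (Punct code 0)
    if eci is None:
        return bits + "000"  # n = 0: FNC1, no digits follow
    if not 0 <= eci <= 999999:
        raise ValueError("ECI identifiers have at most 6 digits")
    digits = str(eci)  # 1 to 6 digits; the decoder zero-pads to 6
    bits += format(len(digits), "03b")  # 3-bit n
    for d in digits:
        bits += format(int(d) + 2, "04b")  # Digit-mode codes: '0' is code 2
    return bits
```

For the default interpretation, ECI 000003, this gives ''n''=1 followed by the single Digit-mode code for 3 (0101).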

=== Mode message ===
The mode message encodes the number of layers (''L'' layers encoded as the integer ''L''−1), and the number of data codewords (''D'' codewords, encoded as the integer ''D''−1) in the message.  All remaining codewords are used as check codewords.

For compact Aztec codes, the number of layers is encoded as a 2-bit value, and the number of data codewords as a 6-bit value, resulting in an 8-bit mode word.  For full Aztec codes, the number of layers is encoded in 5 bits, and the number of data codewords is encoded in 11 bits, making a 16-bit mode word.
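
A minimal sketch of packing the mode word follows. The function name <code>mode_word</code> and its parameters are illustrative; the field widths are those given above.

```python
def mode_word(layers: int, data_codewords: int, compact: bool) -> str:
    """Return the 8-bit (compact) or 16-bit (full) mode word as a bit string."""
    layer_bits, data_bits = (2, 6) if compact else (5, 11)
    if not 1 <= layers <= (1 << layer_bits):
        raise ValueError("layer count out of range")
    if not 1 <= data_codewords <= (1 << data_bits):
        raise ValueError("data codeword count out of range")
    # Both fields are stored minus one, most significant bit first.
    return (format(layers - 1, f"0{layer_bits}b")
            + format(data_codewords - 1, f"0{data_bits}b"))
```

For example, a 1-layer compact symbol with 5 data codewords has the mode word <code>00000100</code>.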

The mode word is broken into two or four 4-bit codewords in [[Galois field|GF(16)]], and 5 or 6 Reed–Solomon check words are appended, making a 28- or 40-bit mode message, which is wrapped in a 1-pixel layer around the core. Thus a (15,10) or (15,9) Reed–Solomon code over GF(16), shortened to (7,2) or (10,4) respectively, is used.

Because an L+1-layer compact Aztec code can hold more data than an L-layer full code, full codes with fewer than 4 layers are rarely used.

Most importantly, the number of layers determines the size of the Reed–Solomon codewords used.  This varies from 6 to 12 bits:
{|class="wikitable" style="text-align:center;"
|+ Aztec code finite field polynomials
! Bits || Field || Primitive polynomial || Generator polynomial (decimal coefficients) || Used for
|-
| 4 || GF(16) || ''x''<sup>4</sup>+''x''+1 || ''x''<sup>5</sup>+11''x''<sup>4</sup>+4''x''<sup>3</sup>+6''x''<sup>2</sup>+2''x''+1 (Compact code) <br />''x''<sup>6</sup>+7''x''<sup>5</sup>+9''x''<sup>4</sup>+3''x''<sup>3</sup>+12''x''<sup>2</sup>+10''x''+12 (Full code)|| Mode message
|-
| 6 || GF(64) || ''x''<sup>6</sup>+''x''+1 ||depends on number of error correction words|| 1–2 layers
|-
| 8 || GF(256) || ''x''<sup>8</sup>+''x''<sup>5</sup>+''x''<sup>3</sup>+''x''<sup>2</sup>+1 ||depends on number of error correction words|| 3–8 layers
|-
| 10 || GF(1024) || ''x''<sup>10</sup>+''x''<sup>3</sup>+1 ||depends on number of error correction words|| 9–22 layers
|-
| 12 || GF(4096) || ''x''<sup>12</sup>+''x''<sup>6</sup>+''x''<sup>5</sup>+''x''<sup>3</sup>+1 ||depends on number of error correction words || 23–32 layers
|}
The codeword size ''b'' is the smallest even number which ensures that the total number of codewords in the symbol is less than the limit of 2<sup>''b''</sup>−1 which can be corrected by a Reed–Solomon code.
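
This rule can be checked against the table with a short sketch. It uses the matrix capacity formulas quoted under "Laying out the message", (112+16''L'')''L'' bits for full and (88+16''L'')''L'' bits for compact symbols; the function names are illustrative.

```python
def capacity_bits(layers: int, compact: bool = False) -> int:
    """Total data-area capacity of the symbol in bits."""
    return ((88 if compact else 112) + 16 * layers) * layers


def codeword_bits(layers: int, compact: bool = False) -> int:
    """Smallest even b (6..12) with fewer than 2**b - 1 codewords in the symbol."""
    total = capacity_bits(layers, compact)
    for b in (6, 8, 10, 12):
        if total // b < (1 << b) - 1:
            return b
    raise ValueError("symbol too large for 12-bit codewords")
```

For instance, a 22-layer full symbol holds 10208 bits, or 1020 ten-bit codewords, just under the limit of 1023, while a 23-layer symbol would need 1104 ten-bit codewords and therefore moves to 12-bit codewords.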

As mentioned above, it is recommended that at least 23% of the available codewords, plus 3, are reserved for correction, and a symbol size is chosen such that the message will fit into the available space.
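
As a rough sketch of this recommendation, the split between data and check codewords can be computed as below. Rounding the 23% share up to a whole codeword is an assumption of this example, as is the function name <code>split_codewords</code>.

```python
def split_codewords(total: int):
    """Return (data, check) counts with 23% of codewords, plus 3, for checks."""
    check = (23 * total + 99) // 100 + 3  # integer ceiling of 23%, plus 3
    data = total - check
    if data < 1:
        raise ValueError("symbol too small to hold any data codewords")
    return data, check
```

Under this assumption, a 1-layer compact symbol with 17 codewords reserves 7 for checking and leaves 10 for data.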

=== Bit stuffing ===
The data bits are broken into codewords, with the first bit corresponding to the most significant coefficient.  While doing this, code words of all zeros and all ones are avoided by [[bit stuffing]]: if the first ''b''−1 bits of a code word have the same value, an extra bit with the complementary value is inserted into the data stream.  This insertion takes place regardless of whether the last bit of the code word would have had the same value.

Also, note that this only applies to strings of ''b''−1 bits ''at the beginning of a code word''.  Longer strings of identical bits are permitted as long as they straddle a code word boundary.
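
The stuffing rule can be sketched as follows. The function name <code>bit_stuff</code> is illustrative; it returns the complete codewords together with any leftover bits that do not yet fill a codeword, leaving padding to a later step.

```python
def bit_stuff(bits: str, b: int):
    """Split `bits` into b-bit codewords, stuffing a complementary bit
    whenever the first b-1 bits of a codeword are identical."""
    words = []
    i = 0
    while len(bits) - i >= b - 1:
        head = bits[i:i + b - 1]
        if head == "0" * (b - 1):
            words.append(head + "1")  # stuffed 1; only b-1 data bits consumed
            i += b - 1
        elif head == "1" * (b - 1):
            words.append(head + "0")  # stuffed 0; only b-1 data bits consumed
            i += b - 1
        elif len(bits) - i >= b:
            words.append(bits[i:i + b])  # ordinary codeword
            i += b
        else:
            break  # b-1 mixed bits left: not a full codeword yet
    return words, bits[i:]
```

With 6-bit codewords, the input <code>000000101010</code> becomes the codewords <code>000001</code> and <code>010101</code> with one leftover bit <code>0</code>: the stuffed bit shifts every later bit by one position.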

When decoding, a code word of all zeros or all ones may be assumed to be an [[erasure code|erasure]], and corrected more efficiently than a general error.

This process makes the message longer, and the final number of data codewords recorded in the mode message is not known until it is complete.  In rare cases, it may be necessary to jump to the next-largest symbol and begin the process all over again to maintain the minimum fraction of check words.

=== Padding ===
After bit stuffing, the data string is padded to the next codeword boundary by appending 1 bits.  If this would result in a code word of all ones, the last bit is changed to zero (and will be ignored by the decoder as a bit-stuffing bit).  On decoding, the padding bits may be decoded as shift and latch codes, but that will not affect the message content.  The reader must accept and ignore a partial code at the end of the message, as long as it is all-ones.
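
A minimal sketch of this padding step follows. The argument <code>partial</code> stands for the bits left over after the last complete codeword, and the function name <code>pad_last</code> is hypothetical.

```python
def pad_last(partial: str, b: int) -> str:
    """Pad a partial final codeword to b bits with 1 bits."""
    if not partial:
        return ""  # already on a codeword boundary: nothing to add
    if len(partial) >= b:
        raise ValueError("partial codeword must be shorter than b bits")
    word = partial + "1" * (b - len(partial))
    if word == "1" * b:
        word = word[:-1] + "0"  # avoid an all-ones codeword
    return word
```

With 6-bit codewords, a leftover <code>0</code> is padded to <code>011111</code>, while a leftover <code>11</code> becomes <code>111110</code> rather than six ones.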

Additionally, if the total number of data bits available in the symbol is not a multiple of the codeword size, the data string is prefixed with an appropriate number of 0 bits to occupy the extra space.  These bits are not included in the check word computation.

=== Check codewords ===
Both the mode word and the data must have check words appended to fill out the available space.  These are computed by appending ''K'' check words such that the entire message is a multiple of the Reed–Solomon polynomial (''x''−2)(''x''−4)...(''x''−2<sup>''K''</sup>).
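
The computation can be sketched with straightforward polynomial long division over GF(2<sup>''b''</sup>). This is an unoptimized illustration, not a reference implementation: the function names are assumptions, and <code>poly</code> is the primitive polynomial written as an integer bit mask (for example <code>0x13</code> for ''x''<sup>4</sup>+''x''+1).

```python
def gf_mul(a: int, b: int, poly: int, bits: int) -> int:
    """Multiply two elements of GF(2**bits), reducing by `poly`."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> bits:  # degree reached `bits`: subtract the primitive polynomial
            a ^= poly
    return r


def generator(k: int, poly: int, bits: int) -> list:
    """Coefficients, highest degree first, of (x-2)(x-4)...(x-2**k)."""
    g = [1]
    root = 1
    for _ in range(k):
        root = gf_mul(root, 2, poly, bits)
        nxt = g + [0]  # g * x
        for i, c in enumerate(g):  # add g * root (subtraction is XOR here)
            nxt[i + 1] ^= gf_mul(c, root, poly, bits)
        g = nxt
    return g


def check_words(data: list, k: int, poly: int, bits: int) -> list:
    """Return the k check words that make data + checks a multiple of g."""
    g = generator(k, poly, bits)
    rem = list(data) + [0] * k  # data polynomial times x**k
    for i in range(len(data)):
        coef = rem[i]
        if coef:
            for j, gc in enumerate(g):
                rem[i + j] ^= gf_mul(coef, gc, poly, bits)
    return rem[-k:]
```

In GF(16), <code>generator(5, 0x13, 4)</code> reproduces the compact mode-message generator listed in the table above, with coefficients 1, 11, 4, 6, 2, 1. For the compact mode word <code>00000100</code>, split into the codewords 0 and 4, <code>check_words([0, 4], 5, 0x13, 4)</code> gives 10, 3, 11, 8, 4.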

Note that check words are ''not'' subject to bit stuffing, and may be all-zero or all-one.  Thus, it is not possible to detect the erasure of a check word.

=== Laying out the message ===
[[File:Aztec-Code-With-Reference-Grid.png|alt=|thumb|9-layer (53×53) Aztec code with reference grid highlighted in red.]]
A full Aztec code symbol has, in addition to the core, a "reference grid" of alternating black and white pixels occupying every 16th row and column. A compact Aztec code does not contain this grid. These known pixels allow a reader to maintain alignment with the pixel grid over large symbols. For up to 4 layers (31×31 pixels), this consists only of single lines extending outward from the core, continuing the alternating pattern.  Inside the 5th layer, however, additional rows and columns of alternating pixels are inserted ±16 pixels from the center, so the 5th layer is located ±17 and ±18 pixels from the center, and a 5-layer symbol is 37×37 pixels.

Likewise, additional reference grid rows and columns are inserted ±32 pixels from the center, making a 12-layer symbol 67×67 pixels.  In this case, the 12th layer occupies rings ±31 and ±33 pixels from the center.  The pattern continues indefinitely outward, with 15-pixel blocks of data separated by rows and columns of the reference grid.

One way to construct the symbol is to delete the reference grid entirely and begin with a 14×14-pixel core centered on a 2×2-pixel white square.  Then break it into 15×15-pixel blocks and insert the reference grid between them.
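
Following this construction, the side length of a full symbol can be sketched as below: start from the grid-free size, then add back the central grid line and one line per 15-pixel block boundary on each side. The function name <code>full_symbol_size</code> is illustrative.

```python
def full_symbol_size(layers: int) -> int:
    """Side length in pixels of a full Aztec symbol with `layers` layers."""
    if not 1 <= layers <= 32:
        raise ValueError("full symbols have 1 to 32 layers")
    base = 14 + 4 * layers  # size with the reference grid deleted
    extra_per_side = (base // 2 - 1) // 15  # grid lines beyond the central one
    return base + 1 + 2 * extra_per_side
```

This reproduces the sizes quoted above: 31 for 4 layers, 37 for 5 layers, 53 for 9 layers and 67 for 12 layers.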

The mode message begins at the top-left corner of the core and wraps around it clockwise in a 1-bit thick layer.  It begins with the most significant bit of the number of layers and ends with the check words.  For a compact Aztec code, it is broken into four 7-bit pieces to leave room for the orientation marks.  For a full Aztec code, it is broken into four 10-bit pieces, and those pieces are each divided in half by the reference grid.

In some cases, the total capacity of the matrix is not a whole number of code words. In such cases, the main message is padded with 0 bits at the beginning. These bits are not included in the check word calculation and should be skipped during decoding. The total matrix capacity in bits can be calculated as (112+16*L)*L for a full Aztec code and (88+16*L)*L for a compact Aztec code, where L is the number of layers.<ref>{{cite web |url=http://recog.ru/blog/standarts/6.html |title=Спецификация Aztec Code (без Small Aztec) |language=ru |trans-title=Aztec Code Specification (without Small Aztec) |url-status=dead |archive-url=https://web.archive.org/web/20200225193653/http://recog.ru/blog/standarts/6.html |archive-date=2020-02-25}}</ref>  As an example, the total matrix capacity of a compact Aztec code with 1 layer is 104 bits.  Since code words are six bits, this gives 17 code words and two extra bits.  Two zero bits are prepended to the message as padding and must be skipped during decoding.

The padded main message begins at the outer top-left of the entire symbol and spirals around it ''counterclockwise'' in a 2-bit thick layer, ending directly above the top-left corner of the core.  This places the bit-stuffed data words, for which erasures can be detected, in the outermost layers of the symbol, which are most prone to erasures.  The check words are stored closer to the core. The last check word ends just above the top left corner of the bull's eye.

With the core in its standard orientation, the first bit of the first data word is placed in the upper-left corner, with additional bits placed in a 2-bit-wide column left-to-right and top-to-bottom.  This continues until 2 rows from the bottom of the symbol when the pattern rotates 90 degrees counterclockwise and continues in a 2-bit high row, bottom-to-top and left-to-right.  After 4 equal-sized quarter layers, the spiral continues with the top-left corner of the next-inner layer, finally ending one pixel above the top-left corner of the core.

Finally, 1 bits are printed as black squares, and 0 bits are printed as white squares.