Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Symbol table
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Data structure used by a language translator such as a compiler or interpreter}} {{More citations needed|date=November 2012}} In [[computer science]], a '''symbol table''' is a [[data structure]] used by a language [[Translator_(computing)|translator]] such as a [[compiler]] or [[interpreter (computing)|interpreter]], where each [[Identifier (computer languages)|identifier]], [[Symbol_(programming)|symbol]], [[Constant_(computer_programming)|constant]], [[Method_(computer_programming)|procedure]] and [[Subroutine|function]] in a program's [[source code]] is associated with information relating to its [[Declaration (computer programming)|declaration]] or appearance in the source. In other words, the entries of a symbol table store the information related to the entry's corresponding symbol.{{sfn|Copper|Torczon|2011|p=253}} ==Background== A symbol table may only exist in memory during the translation process, or it may be embedded in the output of the translation, such as in an [[Application binary interface|ABI]] [[object file]] for later use. For example, it might be used during an interactive [[debugger|debugging session]], or as a resource for formatting a diagnostic report during or after [[execution (computers)|execution]] of a program.<ref>{{cite book|last1=Nguyen|first1=Binh|title=Linux Dictionary|date=2004|page=1482|url=https://books.google.com/books?id=vdZWBQAAQBAJ|accessdate=Apr 14, 2018}}</ref> ==Description== The minimum information contained in a symbol table used by a translator and [[intermediate representation]] (IR) includes the symbol's name and its location or address. For a compiler targeting a platform with a concept of [[relocatability (computing)|relocatability]], it will also contain relocatability attributes (absolute, relocatable, etc.) and needed relocation information for relocatable symbols. Symbol tables for [[high-level programming language]]s may store the symbol's type: string, integer, floating-point, etc., its size, and its dimensions and its bounds. Not all of this information is included in the output file, but may be provided for use in [[debugging]]. In many cases, the symbol's [[cross-reference]] information is stored with or linked to the symbol table. Most compilers print some or all of this information in symbol table and cross-reference listings at the end of translation.{{sfn|Copper|Torczon|2011|p=253}} ==Implementation== Numerous [[data structure]]s are available for implementing tables. Trees, linear lists and [[self-organizing list]]s can all be used to implement a symbol table. The symbol table is accessed by most phases of a compiler, beginning with [[lexical analysis]], and continuing through optimization. A compiler may use one large symbol table for all symbols or use separated, or hierarchical symbol tables for different [[Scope (programming)|scopes]]. For example, in a scoped language such as [[ALGOL|Algol]] or [[PL/I]] a symbol "p" can be declared separately in several procedures, perhaps with different attributes. The scope of each declaration is the section of the program in which references to "p" resolve to that declaration. Each declaration represents a unique identifier "p". The symbol table must have some means of differentiating references to the different "p"s. A common data structure used to implement symbol tables is the [[hash table]]. The time for searching in hash tables is independent of the number of elements stored in the table, so it is efficient for a large number of elements. It also simplifies the classification of literals in tabular format by including the classification in calculation of the hash key.{{sfn|Copper|Torczon|2011|p=254}} As the lexical analyser spends a great proportion of its time looking up the symbol table, this activity has a crucial effect on the overall speed of the compiler. A symbol table must be organised in such a way that entries can be found as quickly as possible. Hash tables are usually used to organise a symbol table, where the keyword or identifier is 'hashed' to produce an array subscript. [[hash collision|Collisions]] are inevitable in a hash table, and a common way of handling them is to store the synonym in the next available free space in the table. ==Applications== An [[object file]] will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a [[Linker (computing)|linker]] will identify and resolve these symbol references. Usually all undefined external symbols will be searched for in one or more [[Library (computing)|object libraries]]. If a module is found that defines that symbol it is linked together with the first object file, and any undefined external identifiers are added to the list of identifiers to be looked up. This process continues until all external references have been resolved. It is an error if one or more remains unresolved at the end of the process. While [[reverse engineering]] an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been [[strip (Unix)|stripped]] or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program. ==Example== Consider the following program written in [[C (programming language)|C]]: <syntaxhighlight lang="C"> // Declare an external function extern double bar(double x); // Define a public function double foo(int count) { double sum = 0.0; // Sum all the values bar(1) to bar(count) for (int i = 1; i <= count; i++) sum += bar((double) i); return sum; } </syntaxhighlight> A C compiler that parses this code will contain at least the following symbol table entries: {| class="wikitable" |- style="background:silver" !style="width:8em" | Symbol name !style="width:10em" | Type !style="width:10em" | Scope |- | <code>bar</code> || function, double || extern |- | <code>x</code> || double || function parameter |- | <code>foo</code> || function, double || global |- | <code>count</code> || int || function parameter |- | <code>sum</code> || double || block local |- | <code>i</code> || int || for-loop statement |- |} In addition, the symbol table may also contain entries generated by the compiler for intermediate expression values (e.g., the expression that casts the <code>i</code> loop variable into a <code>double</code>, and the return value of the call to function <code>bar()</code>), statement labels, and so forth. ==Example: SysV ABI== {| class="wikitable floatright" |+Example table: SysV ABI |- ! Address !! Type !! Name |- style="font-size:105%;font-family:monospace;;" |00000020 || style="text-align:center;" | a || T_BIT |- style="font-size:105%;font-family:monospace;;" |00000040 || style="text-align:center;" | a || F_BIT |- style="font-size:105%;font-family:monospace;;" |00000080 || style="text-align:center;" | a || I_BIT |- style="font-size:105%;font-family:monospace;;" |20000004 || style="text-align:center;" | t || irqvec |- style="font-size:105%;font-family:monospace;;" |20000008 || style="text-align:center;" | t || fiqvec |- style="font-size:105%;font-family:monospace;;" |2000000c || style="text-align:center;" | t || InitReset |- style="font-size:105%;font-family:monospace;;" |20000018 || style="text-align:center;" | T || _main |- style="font-size:105%;font-family:monospace;;" |20000024 || style="text-align:center;" | t || End |- style="font-size:105%;font-family:monospace;;" |20000030 || style="text-align:center;" | T || AT91F_US3_CfgPIO_useB |- style="font-size:105%;font-family:monospace;;" |2000005c || style="text-align:center;" | t || AT91F_PIO_CfgPeriph |- style="font-size:105%;font-family:monospace;;" |200000b0 || style="text-align:center;" | T || [[Entry point|main]] |- style="font-size:105%;font-family:monospace;;" |20000120 || style="text-align:center;" | T || AT91F_DBGU_Printk |- style="font-size:105%;font-family:monospace;;" |20000190 || style="text-align:center;" | t || AT91F_US_TxReady |- style="font-size:105%;font-family:monospace;;" |200001c0 || style="text-align:center;" | t || AT91F_US_PutChar |- style="font-size:105%;font-family:monospace;;" |200001f8 || style="text-align:center;" | T || AT91F_SpuriousHandler |- style="font-size:105%;font-family:monospace;;" |20000214 || style="text-align:center;" | T || AT91F_DataAbort |- style="font-size:105%;font-family:monospace;;" |20000230 || style="text-align:center;" | T || AT91F_FetchAbort |- style="font-size:105%;font-family:monospace;;" |2000024c || style="text-align:center;" | T || AT91F_Undef |- style="font-size:105%;font-family:monospace;;" |20000268 || style="text-align:center;" | T || AT91F_UndefHandler |- style="font-size:105%;font-family:monospace;;" |20000284 || style="text-align:center;" | T || AT91F_LowLevelInit |- style="font-size:105%;font-family:monospace;;" |200002e0 || style="text-align:center;" | t || AT91F_DBGU_CfgPIO |- style="font-size:105%;font-family:monospace;;" |2000030c || style="text-align:center;" | t || AT91F_PIO_CfgPeriph |- style="font-size:105%;font-family:monospace;;" |20000360 || style="text-align:center;" | t || AT91F_US_Configure |- style="font-size:105%;font-family:monospace;;" |200003dc || style="text-align:center;" | t || AT91F_US_SetBaudrate |- style="font-size:105%;font-family:monospace;;" |2000041c || style="text-align:center;" | t || AT91F_US_Baudrate |- style="font-size:105%;font-family:monospace;;" |200004ec || style="text-align:center;" | t || AT91F_US_SetTimeguard |- style="font-size:105%;font-family:monospace;;" |2000051c || style="text-align:center;" | t || AT91F_PDC_Open |- style="font-size:105%;font-family:monospace;;" |2000059c || style="text-align:center;" | t || AT91F_PDC_DisableRx |- style="font-size:105%;font-family:monospace;;" |200005c8 || style="text-align:center;" | t || AT91F_PDC_DisableTx |- style="font-size:105%;font-family:monospace;;" |200005f4 || style="text-align:center;" | t || AT91F_PDC_SetNextTx |- style="font-size:105%;font-family:monospace;;" |20000638 || style="text-align:center;" | t || AT91F_PDC_SetNextRx |- style="font-size:105%;font-family:monospace;;" |2000067c || style="text-align:center;" | t || AT91F_PDC_SetTx |- style="font-size:105%;font-family:monospace;;" |200006c0 || style="text-align:center;" | t || AT91F_PDC_SetRx |- style="font-size:105%;font-family:monospace;;" |20000704 || style="text-align:center;" | t || AT91F_PDC_EnableRx |- style="font-size:105%;font-family:monospace;;" |20000730 || style="text-align:center;" | t || AT91F_PDC_EnableTx |- style="font-size:105%;font-family:monospace;;" |2000075c || style="text-align:center;" | t || AT91F_US_EnableTx |- style="font-size:105%;font-family:monospace;;" |20000788 || style="text-align:center;" | T || __aeabi_uidiv |- style="font-size:105%;font-family:monospace;;" |20000788 || style="text-align:center;" | T || __udivsi3 |- style="font-size:105%;font-family:monospace;;" |20000884 || style="text-align:center;" | T || __aeabi_uidivmod |- style="font-size:105%;font-family:monospace;;" |2000089c || style="text-align:center;" | T || __aeabi_idiv0 |- style="font-size:105%;font-family:monospace;;" |2000089c || style="text-align:center;" | T || __aeabi_ldiv0 |- style="font-size:105%;font-family:monospace;;" |2000089c || style="text-align:center;" | T || __div0 |- style="font-size:105%;font-family:monospace;;" |200009a0 || style="text-align:center;" | D || _data |- style="font-size:105%;font-family:monospace;;" |200009a0 || style="text-align:center;" | A || _etext |- style="font-size:105%;font-family:monospace;;" |200009a4 || style="text-align:center;" | A || __bss_end__ |- style="font-size:105%;font-family:monospace;;" |200009a4 || style="text-align:center;" | A || __bss_start |- style="font-size:105%;font-family:monospace;;" |200009a4 || style="text-align:center;" | A || __bss_start__ |- style="font-size:105%;font-family:monospace;;" |200009a4 || style="text-align:center;" | A || _edata |- style="font-size:105%;font-family:monospace;;" |200009a4 || style="text-align:center;" | A || _end |} An example of a symbol table can be found in the [[SysV]] [[Application Binary Interface]] (ABI) specification, which mandates how [[Symbol (programming)|symbols]] are to be laid out in a binary file, so that different compilers, linkers and loaders can all consistently find and work with the symbols in a compiled object. The SysV ABI is implemented in the [[GNU Binary Utilities|GNU binutils']] [[nm (Unix)|nm]] utility. This format uses a sorted [[memory address]] field, a "symbol type" field, and a symbol identifier (called "Name").<ref>{{cite web |title=nm |url=http://sourceware.org/binutils/docs-2.17/binutils/nm.html#nm|website=sourceware.org |accessdate=May 30, 2020}}</ref> The symbol types in the SysV ABI (and nm's output) indicate the nature of each entry in the symbol table. Each symbol type is represented by a single character. For example, symbol table entries representing initialized data are denoted by the character "d" and symbol table entries for functions have the symbol type "t" (because executable code is located in the ''text'' section of an object file). Additionally, the capitalization of the symbol type indicates the type of linkage: lower-case letters indicate the symbol is local and upper-case indicates external (global) linkage. == Example: the Python symbol table == The [[Python (programming language)|Python]] programming language includes extensive support for creating and manipulating symbol tables.<ref>symtable β [https://docs.python.org/3/library/symtable.html Python documentation]</ref> Properties that can be queried include whether a given symbol is a [[free variable]] or a [[bound variable]], whether it is [[block scope]] or [[global scope]], whether it is imported, and what [[namespace]] it belongs to. == Example: Dynamic symbol tables == Some programming languages allow the symbol table to be manipulated at run-time, so that symbols can be added at any time. [[Racket (programming language)|Racket]] is an example of such a language.<ref>Symbols - [https://docs.racket-lang.org/reference/symbols.html Racket Documentation]</ref> Both the [[LISP]] and the [[Scheme (programming language)|Scheme]] programming languages allow arbitrary, generic properties to be associated with each symbol.<ref>Symbols - [https://www.gnu.org/software/guile/manual/html_node/Symbols.html Guile Documentation]</ref> The [[Prolog]] programming language is essentially a symbol-table manipulation language; symbols are called ''atoms'', and the relationships between symbols can be reasoned over. Similarly, [[OpenCog]] provides a dynamic symbol table, called the ''atomspace'', which is used for [[knowledge representation]]. ==See also== *[[Debug symbol]] *[[.debug_info]] ==References== {{Reflist}} ==Bibliography== * {{cite book|title= Engineering a Compiler|edition=2|doi=10.1016/C2009-0-27982-7|publisher=[[Elsevier]], [[Rice University]]|isbn=978-0-12-088478-0|location=[[Houston]], [[Texas]]|first1=Keith D.|last1=Copper|first2=Linda|last2=Torczon|date=18 January 2011|s2cid=40425497|url=https://www.elsevier.com/books/engineering-a-compiler/cooper/978-0-12-088478-0}} {{Authority control}} [[Category:Compiler structures]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Authority control
(
edit
)
Template:Cite book
(
edit
)
Template:Cite web
(
edit
)
Template:More citations needed
(
edit
)
Template:Reflist
(
edit
)
Template:Sfn
(
edit
)
Template:Short description
(
edit
)