===Token threading===
Token-threaded code implements the thread as a list of indices into a table of operations; the index width is naturally chosen to be as small as possible for density and efficiency. One byte (8 bits) is the natural choice for ease of programming, but smaller sizes such as 4 bits, or larger ones such as 12 or 16 bits, can be used depending on the number of operations supported. As long as the index width is narrower than a machine pointer, token threading will be more compact than the other threading types without much special effort by the programmer. It is usually half to three-fourths the size of other threadings, which are themselves a quarter to an eighth the size of non-threaded code. The table's pointers can be either indirect or direct.

Some Forth compilers produce token-threaded code. Some programmers consider the "[[p-code machine|p-code]]" generated by some [[Pascal (programming language)|Pascal]] compilers, as well as the [[bytecode]]s used by [[.NET Framework|.NET]], [[Java (programming language)|Java]], BASIC and some [[C (programming language)|C]] compilers, to be token threading.

Historically, a common approach is [[bytecode]], which typically uses 8-bit opcodes with a stack-based virtual machine. The archetypal bytecode [[Interpreter (computing)|interpreter]] is known as a "decode and dispatch interpreter" and follows the form:

<syntaxhighlight lang="c">
start:
    vpc = &thread
dispatch:
    addr = decode(&vpc) // Convert the next bytecode operation to a pointer to machine code that implements it
    // Any inter-instruction operations are performed here (e.g. updating global state, event processing, etc.)
    jump addr

CODE_PTR decode(BYTE_CODE **p) {
    // In a more complex encoding, there may be multiple tables to choose between or control/mode flags
    return table[*(*p)++];
}

thread:  /* Contains bytecode, not machine addresses. Hence it is more compact. */
    1 /*pushA*/
    2 /*pushB*/
    0 /*add*/

table:
    &add    /* table[0] = address of machine code that implements bytecode 0 */
    &pushA  /* table[1] ... */
    &pushB  /* table[2] ... */

pushA:
    *sp++ = A
    jump dispatch
pushB:
    *sp++ = B
    jump dispatch
add:
    addend1 = *--sp
    addend2 = *--sp
    *sp++ = addend1 + addend2
    jump dispatch
</syntaxhighlight>

If the virtual machine uses only byte-size instructions, <code>decode()</code> is simply a fetch from <code>thread</code>, but often there are commonly used 1-byte instructions plus some less common multibyte instructions (see [[complex instruction set computer]]), in which case <code>decode()</code> is more complex. The decoding of single-byte opcodes can be handled simply and efficiently by a branch table, using the opcode directly as an index.

For instructions where the individual operations are simple, such as "push" and "add", the [[Computational overhead|overhead]] involved in deciding what to execute is larger than the cost of actually executing it, so such interpreters are often much slower than machine code. However, for more complex ("compound") instructions, the overhead percentage is proportionally less significant.

Token-threaded code can sometimes run faster than the equivalent machine code when that machine code is too large to fit in the physical CPU's L1 instruction cache. The higher [[code density]] of threaded code, especially token-threaded code, can allow it to fit entirely in the L1 cache when it otherwise would not, thereby avoiding cache thrashing. However, threaded code consumes both instruction cache (for the implementation of each operation) and data cache (for the bytecode and tables), whereas machine code consumes only instruction cache; this means threaded code eats into the budget for the amount of data that the CPU can hold for processing at any given time.
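The pseudocode above can be made concrete in portable C by replacing the machine-code labels with ordinary functions and the address table with a function-pointer table. This is a sketch, not the article's canonical form: the names (<code>op_pushA</code>, <code>op_halt</code>), the operand values, and the extra halt token (needed so the dispatch loop can terminate) are all illustrative.

<syntaxhighlight lang="c">
#include <stdio.h>

typedef unsigned char byte_code;

static int stack[16], *sp = stack;   /* operand stack for the virtual machine */
static const int A = 2, B = 3;       /* illustrative operand values */

/* One C function per virtual operation; each returns 0 to halt, 1 to continue. */
static int op_add(void)   { int b = *--sp, a = *--sp; *sp++ = a + b; return 1; }
static int op_pushA(void) { *sp++ = A; return 1; }
static int op_pushB(void) { *sp++ = B; return 1; }
static int op_halt(void)  { return 0; }

/* table[token] = implementation of that token, mirroring "table:" above */
static int (*const table[])(void) = { op_add, op_pushA, op_pushB, op_halt };

int main(void) {
    /* The thread: one byte per operation, as in the article's example,
       plus a trailing halt token (3). */
    static const byte_code thread[] = { 1 /*pushA*/, 2 /*pushB*/, 0 /*add*/, 3 /*halt*/ };
    const byte_code *vpc = thread;

    while (table[*vpc++]())          /* decode and dispatch */
        ;
    printf("%d\n", sp[-1]);          /* top of stack: 2 + 3 = 5 */
    return 0;
}
</syntaxhighlight>

Note that each token costs one array index and one indirect call here; a real Forth-style system would instead jump directly between operation bodies, avoiding the call/return overhead.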
In any case, if the problem being computed involves applying a large number of operations to a small amount of data, then using threaded code may be an ideal optimization.<ref name="heller"/>
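The density claims above can be checked with quick arithmetic: with 1-byte tokens, a thread of ''n'' operations occupies ''n'' bytes, versus ''n'' times the pointer size for a direct- or indirect-threaded version. A sketch, assuming a hypothetical 1000-operation program; pointer size is platform-dependent (typically 8 bytes on 64-bit machines):

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void) {
    const size_t n_ops = 1000;  /* hypothetical program of 1000 operations */
    size_t token_thread  = n_ops * sizeof(unsigned char);  /* 1-byte tokens */
    size_t direct_thread = n_ops * sizeof(void *);         /* pointer-wide entries */
    printf("token-threaded:  %zu bytes\n", token_thread);
    printf("direct-threaded: %zu bytes\n", direct_thread);
    return 0;
}
</syntaxhighlight>

On a 64-bit machine this gives 1000 bytes versus 8000 bytes, consistent with the token thread being a small fraction of the size of a pointer-wide thread (though the operation table and implementations still occupy additional memory in both cases).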