Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
String interning
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Data structure for reusing strings}} In computer science, '''string interning''' is a method of storing only one copy of each distinct [[String (computer science)|string]] value, which must be [[Immutable object|immutable]].<ref>{{cite web|title=String.Intern Method (String)|url=https://msdn.microsoft.com/en-us/library/system.string.intern(v=vs.110).aspx|website=Microsoft Developer Network|access-date=25 March 2017}}</ref> Interning strings makes some string processing tasks more time-efficient or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a '''string intern pool'''. The single copy of each string is called its ''intern'' and is typically looked up by a method of the string class, for example String.intern()<ref>{{Javadoc:SE|java/lang|String|intern()}}</ref> in [[Java (programming language)|Java]]. All compile-time constant strings in Java are automatically interned using this method.<ref>{{cite web|url=https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html|title=Chapter 15. Expressions|website=docs.oracle.com|accessdate=30 January 2019}}</ref> String interning is supported by some modern [[object-oriented]] [[programming language]]s, including Java, [[Python (programming language)|Python]], [[PHP]] (since 5.4), [[Lua (programming language)|Lua]]<ref>{{cite web|url=http://lua-users.org/wiki/ImmutableObjects|title=lua-users wiki: Immutable Objects|author=|date=|website=lua-users.org|accessdate=30 January 2019}}</ref> and [[List of CLI languages|.NET languages]].<ref>{{cite web|url=https://docs.microsoft.com/en-us/dotnet/api/system.string|title=String Class (System)|last=rpetrusha|date=|website=docs.microsoft.com|accessdate=30 January 2019}}</ref> [[Lisp (programming language)|Lisp]], [[Scheme (programming language)|Scheme]], [[Julia (programming language)|Julia]], [[Ruby (programming language)|Ruby]] and [[Smalltalk]] are among the languages with a [[Symbol (programming)|symbol]] type that are basically interned strings. The library of the [[Standard ML of New Jersey]] contains an <code>atom</code> type that does the same thing. [[Objective-C]]'s selectors, which are mainly used as method names, are interned strings. Objects other than strings can be interned. For example, in Java, when primitive values are [[Object type (object-oriented programming)#Boxing|boxed]] into a [[Primitive wrapper class|wrapper object]], certain values (any <code>boolean</code>, any <code>byte</code>, any <code>char</code> from 0 to 127, and any <code>short</code> or <code>int</code> between β128 and 127) are interned, and any two boxing conversions of one of these values are guaranteed to result in the same object.<ref>{{cite web|url=https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html|title=Chapter 5. Conversions and Promotions|website=docs.oracle.com|accessdate=30 January 2019}}</ref> ==History== [[Lisp (programming language)|Lisp]] introduced the notion of interned strings for its [[Symbol (programming)|symbols]]. Historically, the data structure used as a string intern pool was called an ''oblist'' (when it was implemented as a linked list) or an ''obarray'' (when it was implemented as an array). Modern Lisp dialects typically distinguish symbols from strings; interning a given string returns an existing symbol or creates a new one, whose ''name'' is that string. Symbols often have additional properties that strings do not such as storage for associated values, or namespacing. The distinction is also useful to prevent accidentally comparing an interned string with a not-necessarily-interned string, which could lead to intermittent failures depending on usage patterns. ==Motivation== String interning speeds up string comparisons, which are sometimes a performance bottleneck in applications (such as [[compiler]]s and [[dynamic programming language]] runtimes) that rely heavily on [[associative array]]s with string keys to look up the attributes and methods of an object. Without interning, comparing two distinct strings may involve examining every character of both. This is slow for several reasons: it is inherently [[Time complexity#Linear time|O(n)]] in the length of the strings; it typically requires reads from several regions of [[Computer data storage#Primary storage|memory]], which take time; and the reads fill up the processor cache, meaning there is less cache available for other needs. With interned strings, a simple [[Identity (object-oriented programming)|object identity test]] suffices after the original intern operation; this is typically implemented as a pointer equality test, normally just a single machine instruction with no memory reference at all. String interning also reduces memory usage if there are many instances of the same string value; for instance, it is read from a [[computer network|network]] or from [[Computer storage|storage]]. Such strings may include [[magic number (programming)|magic numbers]] or [[Protocol (computing)|network protocol]] information. For example, XML parsers may intern names of tags and attributes to save memory. Network transfer of objects over Java RMI serialization object streams can transfer strings that are interned more efficiently, as the String object's handle is used in place of duplicate objects upon serialization.<ref>{{cite web|url=https://docs.oracle.com/javase/7/docs/platform/serialization/spec/serial-arch.html|title=Java Object Serialization Specification: 1 - System Architecture|author=|date=|website=docs.oracle.com|accessdate=30 January 2019}}</ref> ==Issues== ===Multithreading=== If the interned strings are not immutable, one source of drawbacks is that string interning may be problematic when mixed with [[Multithreading (software)|multithreading]]. In many systems, string interns are required to be global across all threads within an address space (or across any contexts which may share pointers), thus the intern pool(s) are global resources that should be synchronized for safe concurrent access. While this only affects string creation (where the intern pool must be checked and modified if necessary), and [[double-checked locking]] may be used on platforms where this is a safe optimization, the need for mutual exclusion when modifying the intern pool can be expensive.<ref>{{cite web|url=http://java-performance.info/string-intern-java-6-7-8-multithreaded-access/|title=String.intern in Java 6, 7 and 8 - multithreaded access|date=3 September 2013|website=java-performance.info|accessdate=30 January 2019}}</ref> Contention can also be reduced by partitioning the string space into multiple pools, which can be synchronized independently of one another. ===Reclaiming unused interned strings=== Many implementations of interned strings do not attempt to reclaim (manually or otherwise) strings that are no longer used. For applications where the number of interned strings is small or fixed, or which are short-lived, the loss of system resources may be tolerable. But for long-running systems where large numbers of string interns are created at runtime, the need to reclaim unused interns may arise. This task can be handled by a [[Garbage collection (computer science)|garbage collector]], though for this to work correctly [[weak reference]]s to string interns must be stored in the intern pool. ==See also== *[[Flyweight pattern]] ==References== {{Reflist}} ==External links== *[http://msdn2.microsoft.com/en-us/library/ms177906.aspx Visual J# String class] *[http://msdn2.microsoft.com/en-us/library/system.string.intern.aspx .NET String Class] *[http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/Interners.java Guava Java Library - Interner - Non-permgen String.intern and supports other immutable types with weak and strong referenced implementations] *[https://websparrow.org/java/string-intern-method-in-java Understanding Java's intern() Method for Strings] [[Category:Software optimization]] [[Category:String (computer science)|Interning]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Cite web
(
edit
)
Template:Javadoc:SE
(
edit
)
Template:Reflist
(
edit
)
Template:Short description
(
edit
)