==Elements==
Every programming language includes fundamental elements for describing data and the operations or transformations applied to them, such as adding two numbers or selecting an item from a collection. These elements are governed by syntactic and semantic rules that define their structure and meaning, respectively.

===Syntax===
{{Main|Syntax (programming languages)}}
[[File:Python add5 parse.png|thumb|367px|[[Parse tree]] of [[Python (programming language)|Python code]] with inset tokenization]]
[[File:Python add5 syntax.svg|thumb|292px|[[Syntax highlighting]] is often used to aid programmers in recognizing elements of source code. The language above is [[Python (programming language)|Python]].]]
A programming language's surface form is known as its [[syntax (programming languages)|syntax]]. Most programming languages are purely textual; they use sequences of text including words, numbers, and punctuation, much like written natural languages. On the other hand, some programming languages are [[visual programming language|graphical]], using visual relationships between symbols to specify a program.

The syntax of a language describes the possible combinations of symbols that form a syntactically correct program. The meaning given to a combination of symbols is handled by semantics (either [[Formal semantics of programming languages|formal]] or hard-coded in a [[Reference implementation (computing)|reference implementation]]). Since most languages are textual, this article discusses textual syntax.

The programming language syntax is usually defined using a combination of [[regular expression]]s (for [[lexical analysis|lexical]] structure) and [[Backus–Naur form]] (for [[context-free grammar|grammatical]] structure). Below is a simple grammar, based on [[Lisp (programming language)|Lisp]]:

<syntaxhighlight lang="bnf">
expression ::= atom | list
atom       ::= number | symbol
number     ::= [+-]?['0'-'9']+
symbol     ::= ['A'-'Z''a'-'z'].*
list       ::= '(' expression* ')'
</syntaxhighlight>

This grammar specifies the following:
* an ''expression'' is either an ''atom'' or a ''list'';
* an ''atom'' is either a ''number'' or a ''symbol'';
* a ''number'' is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;
* a ''symbol'' is a letter followed by zero or more of any characters (excluding whitespace); and
* a ''list'' is a matched pair of parentheses, with zero or more ''expressions'' inside it.

The following are examples of well-formed token sequences in this grammar: <code>12345</code>, <code>()</code> and <code>(a b c232 (1))</code>.
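For illustration, such a definition can be turned directly into a small parser. The sketch below is not part of any language specification; it uses Python, the function names <code>tokenize</code> and <code>parse_expression</code> are arbitrary, and the symbol rule is narrowed so that a symbol stops at whitespace or a parenthesis. The [[regular expression]] implements the lexical rules, and the [[recursive descent parser|recursive-descent]] function implements the grammatical ones:

<syntaxhighlight lang="python">
import re

# Lexical structure: parentheses, signed integers, and symbols.
# (For practicality, a symbol here ends at whitespace or a parenthesis.)
TOKEN = re.compile(r"[()]|[+-]?[0-9]+|[A-Za-z][^\s()]*")

def tokenize(text):
    """Split source text into a list of tokens."""
    return TOKEN.findall(text)

def parse_expression(tokens, i=0):
    """Parse one expression starting at index i; return (value, next index)."""
    token = tokens[i]
    if token == "(":                         # list ::= '(' expression* ')'
        items, i = [], i + 1
        while tokens[i] != ")":
            item, i = parse_expression(tokens, i)
            items.append(item)
        return items, i + 1                  # skip the closing ')'
    if re.fullmatch(r"[+-]?[0-9]+", token):  # number
        return int(token), i + 1
    return token, i + 1                      # symbol

print(parse_expression(tokenize("(a b c232 (1))"))[0])
# Output: ['a', 'b', 'c232', [1]]
</syntaxhighlight>

A parser of this kind checks only form: it accepts <code>(a b c232 (1))</code> without attaching any meaning to it, which is the role of semantics.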
Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules, and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit [[undefined behavior]]. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.

Using [[natural language]] as an example, it may not be possible to assign a meaning to a grammatically correct sentence, or the sentence may be false:
* "[[Colorless green ideas sleep furiously]]." is grammatically well-formed but has no generally accepted meaning.
* "John is a married bachelor." is grammatically [[well-formedness|well-formed]] but expresses a meaning that cannot be true.

The following [[C (programming language)|C language]] fragment is syntactically correct, but performs operations that are not semantically defined (the operation <code>*p >> 4</code> has no meaning for a value having a complex type and <code>p->im</code> is not defined because the value of <code>p</code> is the [[null pointer]]):

<syntaxhighlight lang="c">
complex *p = NULL;
complex abs_p = sqrt(*p >> 4 + p->im);
</syntaxhighlight>

If the [[type declaration]] on the first line were omitted, the program would trigger an error on the undefined variable <code>p</code> during compilation. However, the program would still be syntactically correct since type declarations provide only semantic information.

The grammar needed to specify a programming language can be classified by its position in the [[Chomsky hierarchy]]. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are [[context-free grammar]]s.<ref>{{cite book|author=Michael Sipser|year=1996|title=Introduction to the Theory of Computation|publisher=PWS Publishing|isbn=978-0-534-94728-6 |author-link=Michael Sipser|title-link=Introduction to the Theory of Computation}} Section 2.2: Pushdown Automata, pp.101–114.</ref> Some languages, including Perl and Lisp, contain constructs that allow execution during the parsing phase. Languages that have constructs that allow the programmer to alter the behavior of the parser make syntax analysis an [[undecidable problem]], and generally blur the distinction between parsing and execution.<ref>Jeffrey Kegler, "[http://www.jeffreykegler.com/Home/perl-and-undecidability Perl and Undecidability] {{webarchive|url=https://web.archive.org/web/20090817183115/http://www.jeffreykegler.com/Home/perl-and-undecidability |date=17 August 2009 }}", ''The Perl Review''. Papers 2 and 3 prove, using respectively [[Rice's theorem]] and direct reduction to the [[halting problem]], that the parsing of Perl programs is in general undecidable.</ref> In contrast to [[Lisp macro|Lisp's macro system]] and Perl's <code>BEGIN</code> blocks, which may contain general computations, C macros are merely string replacements and do not require code execution.<ref>Marty Hall, 1995, [http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.html Lecture Notes: Macros] {{webarchive|url=https://web.archive.org/web/20130806054148/http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.html |date=6 August 2013 }}, [[PostScript]] [http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.ps version] {{webarchive|url=https://web.archive.org/web/20000817211709/http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.ps |date=17 August 2000 }}</ref>

===Semantics===
{{Logical connectives sidebar}}
The term [[Semantics#Computer science|''semantics'']] refers to the meaning of languages, as opposed to their form ([[#Syntax|syntax]]).

====Static semantics====
Static semantics defines restrictions on the structure of valid texts that are hard or impossible to express in standard syntactic formalisms.<ref name="Aaby 2004"/>{{Failed verification|date=January 2023|reason=This site says nothing about "static semantics" or any connection between semantics and "structure" or "restrictions".}} For compiled languages, static semantics essentially include those semantic rules that can be checked at compile time. Examples include checking that every [[identifier]] is declared before it is used (in languages that require such declarations) or that the labels on the arms of a [[case statement]] are distinct.<ref>Michael Lee Scott, ''Programming language pragmatics'', Edition 2, Morgan Kaufmann, 2006, {{ISBN|0-12-633951-1}}, p. 18–19</ref> Many important restrictions of this type, like checking that identifiers are used in the appropriate context (e.g. not adding an integer to a function name), or that [[subroutine]] calls have the appropriate number and type of arguments, can be enforced by defining them as rules in a [[logic]] called a [[type system]]. Other forms of [[static code analysis|static analyses]] like [[data flow analysis]] may also be part of static semantics. Programming languages such as [[Java (programming language)|Java]] and [[C Sharp (programming language)|C#]] have [[definite assignment analysis]], a form of data flow analysis, as part of their respective static semantics.<ref name=":1">{{Cite book |last=Winskel |first=Glynn |url=https://books.google.com/books?id=JzUNn6uUxm0C |title=The Formal Semantics of Programming Languages: An Introduction |date=5 February 1993 |publisher=MIT Press |isbn=978-0-262-73103-4 |language=en}}</ref>

====Dynamic semantics====
{{Main|Semantics of programming languages}}
{{unreferenced|section|date=April 2024}}
Once data has been specified, the machine must be instructed to perform operations on the data. For example, the semantics may define the [[evaluation strategy|strategy]] by which expressions are evaluated to values, or the manner in which [[control flow|control structures]] conditionally execute [[Statement (computer science)|statements]]. The ''dynamic semantics'' (also known as ''execution semantics'') of a language defines how and when the various constructs of a language should produce a program behavior. There are many ways of defining execution semantics. Natural language is often used to specify the execution semantics of languages commonly used in practice. A significant amount of academic research goes into [[formal semantics of programming languages]], which allows execution semantics to be specified in a formal manner. Results from this field of research have seen limited application to programming language design and implementation outside academia.<ref name=":1" />
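For illustration, the toy Lisp-like language above can be given an execution semantics by stating how each construct behaves: a number evaluates to itself, a symbol names an operation, and a list applies its first element to the evaluated remaining elements. The Python sketch below encodes one such choice of rules; the operator names <code>add</code> and <code>mul</code> and the eager (call-by-value) [[evaluation strategy]] are arbitrary decisions made for the example, which a real language specification would fix in prose or in a formal system:

<syntaxhighlight lang="python">
# One possible execution semantics for the toy Lisp-like language above.
OPERATORS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def evaluate(expression):
    if isinstance(expression, int):          # a number evaluates to itself
        return expression
    if isinstance(expression, str):          # a symbol names a built-in operator
        return OPERATORS[expression]
    operator, *arguments = [evaluate(item) for item in expression]
    return operator(*arguments)              # eager (call-by-value) application

# The nested list is the parsed form of the expression (add 1 (mul 2 3)).
print(evaluate(["add", 1, ["mul", 2, 3]]))   # Output: 7
</syntaxhighlight>

Two implementations could accept exactly the same syntax yet disagree on such behavioral decisions; pinning them down is precisely what a semantic specification, whether in natural language or in a formal notation, is for.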
===Type system===
{{Main|Data type|Type system|Type safety}}
A [[data type]] is a set of allowable values and operations that can be performed on these values.{{sfn|Sebesta|2012|p=244}} Each programming language's [[type system]] defines which data types exist, the type of an [[Expression (mathematics)|expression]], and how [[type equivalence]] and [[type compatibility]] function in the language.{{sfn|Sebesta|2012|p=245}}

According to [[type theory]], a language is fully typed if the specification of every operation defines types of data to which the operation is applicable.<ref name="typing">{{cite web|url=http://www.acooke.org/comp-lang.html|author=Andrew Cooke|title=Introduction To Computer Languages|access-date=13 July 2012|url-status=live|archive-url=https://web.archive.org/web/20120815140215/http://www.acooke.org/comp-lang.html|archive-date=15 August 2012}}</ref> In contrast, an untyped language, such as most [[assembly language]]s, allows any operation to be performed on any data, generally sequences of bits of various lengths.<ref name="typing"/> In practice, while few languages are fully typed, most offer a degree of typing.<ref name="typing"/>

Because different types (such as [[integer]]s and [[floating point|floats]]) represent values differently, unexpected results will occur if one type is used when another is expected. [[Type checking]] will flag this error, usually at [[compile time]] (runtime type checking is more costly).{{sfn|Sebesta|2012|pp=15, 408–409}} With [[Strongly-typed programming language|strong typing]], [[type error]]s can always be detected unless variables are explicitly [[type conversion|cast]] to a different type. [[Weak typing]] occurs when languages allow implicit casting—for example, to enable operations between variables of different types without the programmer making an explicit type conversion. The more cases in which this [[type coercion]] is allowed, the fewer type errors can be detected.{{sfn|Sebesta|2012|pp=303–304}}
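The difference can be illustrated by how an operation on mismatched types is treated. In the Python fragment below (an example only; Python checks types at run time and performs relatively little coercion), adding a string to an integer is reported as a [[type error]] unless the programmer converts one operand explicitly, whereas a more weakly typed language might silently coerce one operand and continue:

<syntaxhighlight lang="python">
count = 3
label = "7"

# The mismatch is detected rather than coerced away.
try:
    total = count + label
except TypeError as error:
    print("type error:", error)   # unsupported operand type(s) for +: 'int' and 'str'

# An explicit conversion makes the intent unambiguous.
print(count + int(label))         # 10

# Some implicit coercion is still allowed: the int operand is widened to float.
print(count + 0.5)                # 3.5
</syntaxhighlight>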
====Commonly supported types====
{{See also|Primitive data type}}
Early programming languages often supported only built-in, numeric types such as the [[integer]] (signed and unsigned) and [[floating point]] (to support operations on [[real number]]s that are not integers). Most programming languages support multiple sizes of floats (often called [[Single-precision floating-point format|float]] and [[Double-precision floating-point format|double]]) and integers depending on the size and precision required by the programmer. Storing an integer in a type that is too small to represent it leads to [[integer overflow]]. The most common way of representing negative numbers with signed types is [[twos complement]], although [[ones complement]] is also used.{{sfn|Sebesta|2012|pp=246–247}} Other common types include [[Boolean data type|Boolean]]—which is either true or false—and [[Character (computing)|character]]—traditionally one [[byte]], sufficient to represent all [[ASCII]] characters.{{sfn|Sebesta|2012|p=249}}

[[array (data type)|Arrays]] are a data type whose elements, in many languages, must consist of a single type of fixed length. Other languages define arrays as references to data stored elsewhere and support elements of varying types.{{sfn|Sebesta|2012|p=260}} Depending on the programming language, sequences of multiple characters, called [[string (computing)|strings]], may be supported as arrays of characters or their own [[primitive type]].{{sfn|Sebesta|2012|p=250}} Strings may be of fixed or variable length; variable length enables greater flexibility at the cost of increased storage space and more complexity.{{sfn|Sebesta|2012|p=254}} Other data types that may be supported include [[list (computing)|lists]],{{sfn|Sebesta|2012|pp=281–282}} [[associative arrays|associative (unordered) arrays]] accessed via keys,{{sfn|Sebesta|2012|pp=272–273}} [[record (computer science)|record]]s in which data is mapped to names in an ordered structure,{{sfn|Sebesta|2012|pp=276–277}} and [[tuple]]s—similar to records but without names for data fields.{{sfn|Sebesta|2012|p=280}} [[Pointer (computer programming)|Pointer]]s store memory addresses, typically referencing locations on the [[Heap (programming)|heap]] where other data is stored.{{sfn|Sebesta|2012|pp=289–290}}

The simplest [[user-defined type]] is an [[Ordinal data type|ordinal type]], often called an [[enumeration]], whose values can be mapped onto the set of positive integers.{{sfn|Sebesta|2012|p=255}} Since the mid-1980s, most programming languages also support [[abstract data types]], in which the representation of the data and operations are [[information hiding|hidden from the user]], who can only access an [[Interface (computing)|interface]].{{sfn|Sebesta|2012|pp=244–245}} The benefits of [[data abstraction]] can include increased reliability, reduced complexity, less potential for [[name collision]], and allowing the underlying [[data structure]] to be changed without the client needing to alter its code.{{sfn|Sebesta|2012|p=477}}
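The short Python example below shows a few of these types in use. It is purely illustrative, and because Python's built-in integers are arbitrary-precision, fixed-width behavior such as two's-complement representation and overflow is simulated by masking to eight bits:

<syntaxhighlight lang="python">
from enum import Enum

# Two's complement: in an 8-bit signed type, -5 is stored as the bit
# pattern 11111011, the same bits as the unsigned value 251 (256 - 5).
print(format(-5 & 0xFF, "08b"), -5 & 0xFF)   # 11111011 251

# Overflow in an 8-bit type, simulated by masking: 200 + 100 wraps modulo 256.
print((200 + 100) & 0xFF)                    # 44

flag = True                                  # Boolean
name = "Ada"                                 # string (a primitive type in Python)
scores = [90, 75, 88]                        # list
ages = {"Ada": 36, "Alan": 41}               # associative array (keys map to values)
point = (3, 4)                               # tuple: positional fields without names

class Color(Enum):                           # enumeration, a simple ordinal type
    RED = 1
    GREEN = 2

print(flag, name, scores[0], ages["Ada"], point[1], Color.RED.name)
# Output: True Ada 90 36 4 RED
</syntaxhighlight>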
====Static and dynamic typing====
In [[static typing]], all expressions have their types determined before a program executes, typically at compile-time.<ref name="typing"/> Most widely used, statically typed programming languages require the types of variables to be specified explicitly. In some languages, types are implicit; one form of this is when the compiler can [[type inference|infer]] types based on context. The downside of [[implicit typing]] is the potential for errors to go undetected.{{sfn|Sebesta|2012|p=211}} Complete type inference has traditionally been associated with functional languages such as [[Haskell]] and [[ML (programming language)|ML]].<ref>{{Cite conference |last=Leivant |first=Daniel |date=1983 |title=Polymorphic type inference |conference=ACM SIGACT-SIGPLAN symposium on Principles of programming languages |language=en |location=Austin, Texas |publisher=ACM Press |pages=88–98 |doi=10.1145/567067.567077 |isbn=978-0-89791-090-3|doi-access=free }}</ref>

With dynamic typing, the type is not attached to the variable but only to the value encoded in it. A single variable can be reused for a value of a different type. Although this provides more flexibility to the programmer, it is at the cost of lower reliability and less ability for the programming language to check for errors.{{sfn|Sebesta|2012|pp=212–213}} Some languages allow variables of a [[union type]] to which any type of value can be assigned, in an exception to their usual static typing rules.{{sfn|Sebesta|2012|pp=284–285}}

===Concurrency===
{{see also|Concurrent computing}}
In computing, multiple instructions can be executed simultaneously. Many programming languages support instruction-level and subprogram-level concurrency.{{sfn|Sebesta|2012|p=576}} By the twenty-first century, additional processing power on computers was increasingly coming from the use of additional processors, which requires programmers to design software that makes use of multiple processors simultaneously to achieve improved performance.{{sfn|Sebesta|2012|p=579}} [[Interpreted language]]s such as [[Python (programming language)|Python]] and [[Ruby (programming language)|Ruby]] do not support the concurrent use of multiple processors.{{sfn|Sebesta|2012|p=585}} Other programming languages do support managing data shared between different threads by controlling the order of execution of key instructions via the use of [[Semaphore (programming)|semaphore]]s, controlling access to shared data via [[monitor (synchronization)|monitor]]s, or enabling [[message passing]] between threads.{{sfn|Sebesta|2012|pp=585–586}}
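The sketch below illustrates one of these mechanisms, message passing between threads, using Python's standard <code>threading</code> and <code>queue</code> modules (the producer/worker structure and the sentinel value <code>None</code> are choices made for the example, not a fixed pattern; in the reference Python implementation the two threads also share a single processor, as noted above):

<syntaxhighlight lang="python">
import threading
import queue

messages = queue.Queue()          # thread-safe channel between the two threads

def worker():
    """Consume messages until the sentinel value None arrives."""
    while True:
        item = messages.get()
        if item is None:
            break
        print("processed", item)

consumer = threading.Thread(target=worker)
consumer.start()

for task in ["a", "b", "c"]:      # the main thread produces messages
    messages.put(task)
messages.put(None)                # signal the worker to stop

consumer.join()                   # wait for the worker thread to finish
</syntaxhighlight>

Because the queue serializes access to the shared data, neither thread manipulates locks or semaphores directly; a semaphore- or monitor-based version would instead guard the shared structure explicitly.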
===Exception handling===
{{main|Exception handling}}
Many programming languages include exception handlers, a section of code triggered by [[runtime error]]s that can deal with them in two main ways:{{sfn|Sebesta|2012|pp=630, 634}}
*Termination: shutting down and handing over control to the [[operating system]]. This option is considered the simplest.
*Resumption: resuming the program near where the exception occurred. This can trigger a repeat of the exception, unless the exception handler is able to modify values to prevent the exception from reoccurring.

Some programming languages support dedicating a block of code to run regardless of whether an exception occurs before the code is reached; this is called finalization.{{sfn|Sebesta|2012|p=635}}

There is a tradeoff between increased ability to handle exceptions and reduced performance.{{sfn|Sebesta|2012|p=631}} For example, even though array index errors are common,{{sfn|Sebesta|2012|p=261}} C does not check them for performance reasons.{{sfn|Sebesta|2012|p=631}} Although programmers can write code to catch user-defined exceptions, this can clutter a program. Standard libraries in some languages, such as C, use their return values to indicate an exception.{{sfn|Sebesta|2012|p=632}} Some languages and their compilers have the option of turning on and off error handling capability, either temporarily or permanently.{{sfn|Sebesta|2012|pp=631, 635–636}}
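The Python fragment below is an illustrative example of these ideas: the out-of-range index raises an exception, the handler deals with it in the termination style (execution does not resume at the failing expression), and the <code>finally</code> block provides finalization:

<syntaxhighlight lang="python">
values = [10, 20, 30]

try:
    print(values[5])              # raises IndexError: list index out of range
except IndexError as error:
    # Termination-style handling: the failed lookup is abandoned and
    # execution continues here rather than resuming at values[5].
    print("handled:", error)
finally:
    # Finalization: this block runs whether or not an exception occurred.
    print("cleanup always runs")
</syntaxhighlight>

By contrast, standard library routines in C report error conditions through their return values, which the caller must remember to check.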