Editing Version control (section)

==Structure==
Revision control manages changes to a set of data over time. These changes can be structured in various ways.

Often the data is thought of as a collection of many individual items, such as files or documents, and changes to individual files are tracked. This accords with intuitions about separate files but causes problems when identity changes, such as during renaming, splitting or merging of files. Accordingly, some systems such as [[Git]], instead consider changes to the data as a whole, which is less intuitive for simple changes but simplifies more complex changes.

When data that is under revision control is modified, after being retrieved by ''checking out,'' this is not in general immediately reflected in the revision control system (in the ''repository''), but must instead be ''checked in'' or ''committed.'' A copy outside revision control is known as a "working copy". As a simple example, when editing a computer file, the data stored in memory by the editing program is the working copy, which is committed by saving. Concretely, one may print out a document, edit it by hand, and only later manually input the changes into a computer and save it. For source code control, the working copy is instead a copy of all files in a particular revision, generally stored locally on the developer's computer;<ref group="note">In this case, edit buffers are a secondary form of working copy, and not referred to as such.</ref> in this case saving the file only changes the working copy, and checking into the repository is a separate step.

If multiple people are working on a single data set or document, they are implicitly creating branches of the data (in their working copies), and thus issues of merging arise, as discussed below. For simple collaborative document editing, this can be prevented by using [[file locking]] or simply avoiding working on the same document that someone else is working on.

Revision control systems are often centralized, with a single authoritative data store, the ''repository,'' and check-outs and check-ins done with reference to this central repository. Alternatively, in [[distributed revision control]], no single repository is authoritative, and data can be checked out and checked into any repository. When checking into a different repository, this is interpreted as a merge or patch.

===Graph structure===
[[File:Revision controlled project visualization-2010-24-02.svg|thumb|upright|Example history graph of a revision-controlled project; trunk is in green, branches in yellow, and graph is not a tree due to presence of merges (the red arrows).]]
In terms of [[graph theory]], revisions are generally thought of as a line of development (the ''trunk'') with branches off of this, forming a directed tree, visualized as one or more parallel lines of development (the "mainlines" of the branches) branching off a trunk. In reality the structure is more complicated, forming a [[directed acyclic graph]], but for many purposes "tree with merges" is an adequate approximation.

Revisions occur in sequence over time, and thus can be arranged in order, either by revision number or timestamp.<ref group="note">In principle two revisions can have identical timestamp, and thus cannot be ordered on a line. This is generally the case for separate repositories, though is also possible for simultaneous changes to several branches in a single repository. In these cases, the revisions can be thought of as a set of separate lines, one per repository or branch (or branch within a repository).</ref> Revisions are based on past revisions, though it is possible to largely or completely replace an earlier revision, such as "delete all existing text, insert new text". In the simplest case, with no branching or undoing, each revision is based on its immediate predecessor alone, and they form a simple line, with a single latest version, the "HEAD" revision or ''tip''. In [[graph theory]] terms, drawing each revision as a point and each "derived revision" relationship as an arrow (conventionally pointing from older to newer, in the same direction as time), this is a [[linear graph]]. If there is branching, so multiple future revisions are based on a past revision, or undoing, so a revision can depend on a revision older than its immediate predecessor, then the resulting graph is instead a [[directed tree]] (each node can have more than one child), and has multiple tips, corresponding to the revisions without children ("latest revision on each branch").<ref group="note">The revision or repository "tree" should not be confused with the directory tree of files in a working copy.</ref> In principle the resulting tree need not have a preferred tip ("main" latest revision) – just various different revisions – but in practice one tip is generally identified as HEAD. When a new revision is based on HEAD, it is either identified as the new HEAD, or considered a new branch.<ref group="note">If a new branch is based on HEAD, then topologically HEAD is no longer a tip, since it has a child.</ref> The list of revisions from the start to HEAD (in graph theory terms, the unique path in the tree, which forms a linear graph as before) is the ''trunk'' or ''mainline.''<ref group="note">"Mainline" can also refer to the main path in a separate branch.</ref> Conversely, when a revision can be based on more than one previous revision (when a node can have more than one ''parent''), the resulting process is called a ''[[merge (revision control)|merge]],'' and is one of the most complex aspects of revision control. This most often occurs when changes occur in multiple branches (most often two, but more are possible), which are then merged into a single branch incorporating both changes. If these changes overlap, it may be difficult or impossible to merge, and require manual intervention or rewriting.

In the presence of merges, the resulting graph is no longer a tree, as nodes can have multiple parents, but is instead a rooted [[directed acyclic graph]] (DAG). The graph is acyclic since parents are always backwards in time, and rooted because there is an oldest version. Assuming there is a trunk, merges from branches can be considered as "external" to the tree – the changes in the branch are packaged up as a ''patch,'' which is applied to HEAD (of the trunk), creating a new revision without any explicit reference to the branch, and preserving the tree structure. Thus, while the actual relations between versions form a DAG, this can be considered a tree plus merges, and the trunk itself is a line.

In distributed revision control, in the presence of multiple repositories these may be based on a single original version (a root of the tree), but there need not be an original root - instead there can be a separate root (oldest revision) for each repository. This can happen, for example, if two people start working on a project separately. Similarly, in the presence of multiple data sets (multiple projects) that exchange data or merge, there is no single root, though for simplicity one may think of one project as primary and the other as secondary, merged into the first with or without its own revision history.