Contig
{{short description|Set of overlapping DNA segments that together represent a consensus region of DNA}}
{{about|'''contig''' in DNA sequencing|the '''contig''' defragmentation program|Contig (defragmentation utility)}}
[[File:PET contig scaffold.png|thumb|Overlapping reads from paired-end sequencing form contigs; contigs and gaps of known length form scaffolds.]]
A '''contig''' (from ''contiguous'') is a set of overlapping DNA segments that together represent a [[Consensus sequence|consensus region of DNA]].<ref name="contig assembly">Gregory, S. ''Contig Assembly''. Encyclopedia of Life Sciences, 2005.</ref> In [[Shotgun sequencing#Whole genome shotgun sequencing|bottom-up sequencing]] projects, a contig refers to overlapping sequence data ([[Read (biology)|reads]]);<ref name="textbook">{{cite book |last1=Gibson |first1=Greg |last2=Muse |first2=Spencer V. |title=A Primer of Genome Science |edition=3rd |page=84 |publisher=Sinauer Associates |year=2009 |isbn=978-0-878-93236-8}}</ref> in [[Shotgun sequencing#Hierarchical shotgun sequencing|top-down sequencing]] projects, contig refers to the overlapping clones that form a [[Gene mapping#Physical Mapping|physical map]] of the genome that is used to guide sequencing and [[Sequence assembly|assembly]].<ref name="genome map">Dear, P. H. ''Genome Mapping''. Encyclopedia of Life Sciences, 2005. {{doi|10.1038/npg.els.0005353}}.</ref> Contigs can thus refer both to overlapping DNA sequences and to overlapping physical segments (fragments) contained in clones depending on the context.

==Original definition of contig==
In 1980, Staden<ref>{{cite journal|last1=Staden|first1=R|title=A new computer method for the storage and manipulation of DNA gel reading data|journal=Nucleic Acids Research|date=1980|volume=8|issue=16|pages=3673–3694|doi=10.1093/nar/8.16.3673|pmc=324183|pmid=7433103}}</ref> wrote:

''In order to make it easier to talk about our data gained by the shotgun method of sequencing we have invented the word "contig".
A contig is a set of gel readings that are related to one another by overlap of their sequences. All gel readings belong to one and only one contig, and each contig contains at least one gel reading. The gel readings in a contig can be summed to form a contiguous consensus sequence and the length of this sequence is the length of the contig.''

==Sequence contigs==
A sequence contig is a continuous (not contiguous) sequence resulting from the reassembly of the small DNA fragments generated by [[Shotgun sequencing|bottom-up sequencing]] strategies. This meaning of contig is consistent with the original definition by [[Rodger Staden]] (1979).<ref>{{cite journal |author=Staden R |year=1979 |title=A strategy of DNA sequencing employing computer programs |journal=Nucleic Acids Research |volume= 6|issue=7 |pages=2601–2610 |pmc=327874 |pmid=461197 |doi=10.1093/nar/6.7.2601}}</ref> The bottom-up [[DNA sequencing]] strategy involves shearing genomic DNA into many small fragments ("bottom"), sequencing these fragments, and reassembling them into contigs and eventually the entire genome ("up"). Because current technology allows for the direct sequencing of only relatively short DNA fragments (300–1000 nucleotides), genomic DNA must be fragmented into small pieces prior to sequencing.<ref name="genome sequencing">Dunham, I. ''Genome Sequencing''. Encyclopedia of Life Sciences, 2005.</ref>

In bottom-up sequencing projects, [[Polymerase chain reaction|amplified]] DNA is sheared randomly into fragments appropriately sized for sequencing. The resulting sequence reads, which are the data that contain the sequences of the small fragments, are put into a database. The [[Genome project#Assembly software|assembly software]]<ref name="genome sequencing" /> then searches this database for pairs of overlapping reads. Merging the two reads of such a pair, keeping only one copy of the shared sequence, produces a longer contiguous read (contig) of sequenced DNA.
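The pairwise merge described above can be illustrated with a minimal Python sketch. It assumes error-free reads on the same strand and exact suffix-to-prefix matching; the function names <code>overlap_length</code> and <code>merge_reads</code> and the <code>min_overlap</code> parameter are hypothetical and are not taken from any assembly package. Real assemblers also tolerate sequencing errors and consider reverse complements, which this sketch does not attempt.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if no overlap of at least `min_overlap` exists."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first, then shorter ones.
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3):
    """Merge two overlapping reads into one longer sequence, keeping a
    single copy of the shared bases. Returns None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


# Toy reads (made up for illustration): they share the four bases "GTAC".
contig = merge_reads("ATGCGTAC", "GTACCTGA")
print(contig)
```

Here the two eight-base reads share four bases, so the merged sequence is twelve bases long rather than sixteen.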
By repeating this process many times, at first with the initial pairs of short reads and then with increasingly longer pairs that result from previous assembly, the DNA sequence of an entire chromosome can be determined.

Today, it is common to use [[DNA sequencing theory#Pairwise end-sequencing|paired-end sequencing]] technology where both ends of ''consistently sized'' longer DNA fragments are sequenced. Here, a contig still refers to any contiguous stretch of sequence data created by read overlap. Because the fragments are of known length, the distance between the two end reads from each fragment is known.<ref name="pet">{{cite journal |vauthors=Fullwood MJ, Wei C, Liu ET |title=Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses |journal=Genome Research |year=2009 |volume=19 |pages=521–532 |doi=10.1101/gr.074906.107 |pmid=19339662 |issue=4|display-authors=etal|pmc=3807531 }}</ref> This gives additional information about the orientation of contigs constructed from these reads and allows for their assembly into '''scaffolds''' in a process called [[Scaffolding (bioinformatics)|scaffolding]].

Scaffolds consist of contigs separated by gaps of known length. The new constraints placed on the orientation of the contigs allow for the placement of highly repeated sequences in the genome.
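How a known fragment length translates into a gap of known length can be shown with a small Python sketch. It is an illustration under stated assumptions, not the method of any particular scaffolding tool: both contigs are taken to lie in the same orientation, the fragment length is treated as exact, and one read of each pair maps to the upstream contig while its mate maps to the downstream contig. The names <code>gap_from_pair</code> and <code>estimate_gap</code>, and all coordinates, are hypothetical.

```python
from statistics import median


def gap_from_pair(fragment_len: int, upstream_len: int,
                  read_start: int, mate_end: int) -> int:
    """Gap implied by one read pair spanning two contigs.

    read_start: 0-based position in the upstream contig where the fragment begins.
    mate_end:   number of bases of the downstream contig covered by the
                fragment (end of the mate's alignment, exclusive).
    """
    bases_in_upstream = upstream_len - read_start
    # fragment_len = bases_in_upstream + gap + mate_end, solved for gap.
    return fragment_len - bases_in_upstream - mate_end


def estimate_gap(fragment_len: int, upstream_len: int, pairs) -> float:
    """Median gap over several (read_start, mate_end) pairs, which damps
    the effect of small variations in real fragment sizes."""
    if not pairs:
        raise ValueError("at least one spanning read pair is required")
    return median(gap_from_pair(fragment_len, upstream_len, start, end)
                  for start, end in pairs)


# Made-up example: 500 bp fragments, a 1000 bp upstream contig,
# and three read pairs that bridge the gap.
spanning_pairs = [(800, 100), (900, 210), (850, 140)]
gap = estimate_gap(500, 1000, spanning_pairs)
print(gap)
```

Taking the median rather than a single pair is a design choice for this sketch; in practice fragment sizes vary around a mean, so gap lengths are estimates rather than exact values.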
If one end read falls in a repetitive sequence, its placement is still known as long as its [[mate pair]] is located within a contig.<ref name="pet" /> The remaining gaps between the contigs in the scaffolds can then be sequenced by a variety of methods, including PCR amplification followed by sequencing (for smaller gaps) and [[Bacterial artificial chromosome|BAC]] cloning methods followed by sequencing (for larger gaps).<ref name="textbook" />

==BAC contigs==
Contig can also refer to the overlapping [[Bacterial artificial chromosome|clones]] that form a [[Gene mapping#Physical Mapping|physical map]] of a chromosome when the '''top-down''' or '''[[Shotgun sequencing#Hierarchical Shotgun sequencing|hierarchical]]''' sequencing strategy is used.<ref name="contig assembly" /> In this sequencing method, a low-resolution [[Gene mapping#Physical Mapping|map]] is made prior to sequencing in order to provide a framework to guide the later assembly of the sequence reads of the genome. This map identifies the relative positions and overlap of the clones used for sequencing. Sets of overlapping clones that form a contiguous stretch of DNA are called contigs; the minimal set of clones forming a contig that covers the entire chromosome makes up the tiling path that is used for sequencing. Once a tiling path has been selected, its component BACs are sheared into smaller fragments and sequenced. Contigs therefore provide the framework for hierarchical sequencing.<ref name="genome map" />

The assembly of a contig map involves several steps. First, DNA is sheared into larger (50–200 kb) pieces, which are cloned into [[Bacterial Artificial Chromosome|BACs]] or [[P1-derived artificial chromosome|PACs]] to form a BAC [[Library (biology)|library]]. Since these clones should cover the entire genome/chromosome, it is theoretically possible to assemble a contig of BACs that covers the entire chromosome.<ref name="contig assembly" /> Reality, however, is not always ideal.
Gaps often remain, and a scaffold, consisting of contigs and gaps, that covers the map region is often the first result.<ref name="contig assembly" /> The gaps between contigs can be closed by various methods outlined below.

===Construction of BAC contigs===
BAC contigs are constructed by aligning BAC regions of known overlap via a variety of methods. One common strategy is to use [[sequence-tagged site]] (STS) content mapping to detect unique DNA sites in common between BACs. The degree of overlap is roughly estimated by the number of STS markers in common between two clones, with more markers in common signifying a greater overlap.<ref name="textbook" /> Because this strategy provides only a very rough estimate of overlap, [[restriction digest]] fragment analysis, which provides a more precise measurement of clone overlap, is often used.<ref name="textbook" /> In this strategy, clones are treated with one or two [[restriction enzyme]]s and the resulting fragments separated by [[gel electrophoresis]]. If two clones overlap, they will likely have restriction sites in common, and will thus share several fragments.<ref name="genome map" /> Because the number of fragments in common and the lengths of these fragments are known (the length is judged by comparison to a size standard), the degree of overlap can be deduced to a high degree of precision.

===Gaps between contigs===
Gaps often remain after initial BAC contig construction. These gaps occur if the [[Bacterial Artificial Chromosome]] (BAC) library screened has low complexity, meaning it does not contain a high number of STS or restriction sites, or if certain regions were less stable in cloning hosts and thus underrepresented in the library.<ref name="contig assembly" /> If gaps between contigs remain after STS landmark mapping and restriction fingerprinting have been performed, the sequencing of contig ends can be used to close these gaps.
This end-sequencing strategy essentially creates a novel STS with which to screen the other contigs. Alternatively, the end sequence of a contig can be used as a primer to [[Primer walking|primer walk]] across the gap.<ref name="textbook" />

== See also ==
*[[Staden Package]]

==References==
<references/>

==External links==
{{Wiktionary}}
* [http://staden.sourceforge.net/contig.html Definition of the term and historical perspective]
* [http://staden.sourceforge.net/ Staden package of sequence assembly: Definitions and background information]

[[Category:Molecular biology]]
[[Category:Genomics]]