Chemical table file
{{Short description|Family of chemical file formats}}
'''Chemical table file''' (CT file) is a family of text-based [[chemical file format]]s that describe molecules and chemical reactions. One format, for example, lists each atom in a molecule, the x-y-z coordinates of that atom, and the bonds among the atoms.

== File formats ==
There are several file formats in the family.

The formats were created by [[MDL Information Systems]] (MDL), which was acquired by [[Symyx Technologies]]; Symyx later merged with [[Accelrys]] Corp., which is now called BIOVIA, a subsidiary of Dassault Systèmes of the [[Dassault Group]].<ref>{{Cite journal|last1=Dalby|first1=A.|last2=Nourse|first2=J. G.|last3=Hounshell|first3=W. D.|last4=Gushurst|first4=A. K. I.|last5=Grier|first5=D. L.|last6=Leland|first6=B. A.|last7=Laufer|first7=J.|year=1992|title=Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited|journal=Journal of Chemical Information and Modeling|volume=32|issue=3|pages=244|doi=10.1021/ci00007a012}}</ref>

The CT file is an [[open format]]. BIOVIA publishes its specification,<ref>{{cite web |url=https://discover.3ds.com/sites/default/files/2020-08/biovia_ctfileformats_2020.pdf |title=CT File Formats |author=<!--Not stated--> |publisher=Biovia |date=August 2020 |access-date=2021-02-19 |archive-url=https://web.archive.org/web/20210219065450/https://discover.3ds.com/sites/default/files/2020-08/biovia_ctfileformats_2020.pdf |archive-date=2021-02-19 |url-status=live}}</ref> but requires users to register before downloading it.<ref>{{cite web |url=https://discover.3ds.com/ctfile-documentation-request-form |title=Registration form |date=13 August 2020 |access-date=2021-02-19 |publisher=Biovia |archive-url=https://web.archive.org/web/20201001232143/https://discover.3ds.com/ctfile-documentation-request-form |archive-date=2020-10-01 |url-status=live }}</ref>

=== Molfile ===
{{Infobox file format
| name = ctab
| extension = {{mono|.mol}}
| mime = chemical/x-mdl-molfile
| owner =
| creatorcode =
| genre = [[chemical file format]]
| container for =
| contained by =
| extended from =
| extended to =
}}
An '''MDL Molfile''' is a file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule.

The molfile consists of some header information, the Connection Table (CT) containing atom information, then bond connections and types, followed by sections for more complex information.

The molfile is common enough that most, if not all, [[cheminformatics]] software can read the format, though not always to the same degree. It is also supported by some computational software such as [[Mathematica]].

The current ''de facto'' standard version is molfile V2000. More recently, the V3000 format has come into wide enough circulation to present a potential compatibility issue for applications that are not yet V3000-capable.
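The layout described above (three header lines, a counts line, then one line per atom, one line per bond, and property lines up to <code>M  END</code>) can be illustrated with a minimal Python sketch. It is not a complete or authoritative reader: the names <code>Molfile</code> and <code>parse_v2000</code> are hypothetical, the sketch assumes the fixed-width V2000 columns (three-character counts, ten-character coordinates), and the column spacing of the embedded L-alanine sample is reconstructed under that assumption.

```python
from dataclasses import dataclass

# L-alanine sample used in this article. The exact column spacing is an
# assumption, reconstructed from the fixed-width V2000 layout.
SAMPLE_MOLFILE = """\
L-Alanine
  ABCDEFGH09071717443D
Exported
  6  5  0  0  1  0              3 V2000
   -0.6622    0.5342    0.0000 C   0  0  2  0  0  0
    0.6622   -0.3000    0.0000 C   0  0  0  0  0  0
   -0.7207    2.0817    0.0000 C   1  0  0  0  0  0
   -1.8622   -0.3695    0.0000 N   0  3  0  0  0  0
    0.6220   -1.8037    0.0000 O   0  0  0  0  0  0
    1.9464    0.4244    0.0000 O   0  5  0
  1  2  1  0  0  0  0
  1  3  1  0  1  0  0
  1  4  1  0  0  0  0
  2  5  2  0  0  0  0
  2  6  1  0  0  0  0
M  CHG  2   4   1   6  -1
M  ISO  1   3  13
M  END
"""


@dataclass
class Molfile:  # hypothetical container, not part of any library
    title: str
    program_line: str
    comment: str
    atoms: list       # (symbol, x, y, z) tuples
    bonds: list       # (first atom, second atom, bond type) tuples
    properties: list  # raw property lines that precede "M  END"


def parse_v2000(text: str) -> Molfile:
    """Split a single V2000 molfile into its blocks (sketch only)."""
    lines = text.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("expected a V2000 counts line on line 4")
    counts = lines[3]
    # The first two 3-character fields are the atom and bond counts.
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_end = 4 + n_atoms
    bond_end = atom_end + n_bonds
    # Atom line: x, y, z in 10-character fields, then the element symbol.
    atoms = [
        (ln[31:34].strip(), float(ln[0:10]), float(ln[10:20]), float(ln[20:30]))
        for ln in lines[4:atom_end]
    ]
    # Bond line: first atom, second atom, bond type in 3-character fields.
    bonds = [(int(ln[0:3]), int(ln[3:6]), int(ln[6:9]))
             for ln in lines[atom_end:bond_end]]
    properties = []
    for ln in lines[bond_end:]:
        if ln.startswith("M  END"):
            break
        properties.append(ln)
    return Molfile(lines[0], lines[1], lines[2], atoms, bonds, properties)


mol = parse_v2000(SAMPLE_MOLFILE)
print(mol.title, len(mol.atoms), len(mol.bonds))
```

The sketch slices fixed columns rather than splitting on whitespace because adjacent fixed-width fields can run together when a count reaches three digits. It ignores most atom and bond fields and does not interpret the property lines.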
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|+ [[File:L-Alanine.svg|thumb|center|The contents of a Molfile of L-Alanine]]
|-
| L-Alanine
|'''Title line''' (can be blank but line must exist)
! rowspan="3" |'''Header Block''' (3 lines)
|-
|<pre>  ABCDEFGH09071717443D</pre>
|'''Program / file timestamp line''' (name of source program and a file timestamp)
|-
|<pre>Exported</pre>
|'''Comment line''' (can be blank but line must exist)
|-
|<pre>  6  5  0  0  1  0              3 V2000</pre>
|'''Counts line'''
! rowspan="4" |Connection table
|-
|<pre>
   -0.6622    0.5342    0.0000 C   0  0  2  0  0  0
    0.6622   -0.3000    0.0000 C   0  0  0  0  0  0
   -0.7207    2.0817    0.0000 C   1  0  0  0  0  0
   -1.8622   -0.3695    0.0000 N   0  3  0  0  0  0
    0.6220   -1.8037    0.0000 O   0  0  0  0  0  0
    1.9464    0.4244    0.0000 O   0  5  0
</pre>
|'''Atom block''' (1 line for each atom): x, y, z (in [[angstrom]]s), element, etc.
|-
|<pre>
  1  2  1  0  0  0  0
  1  3  1  0  1  0  0
  1  4  1  0  0  0  0
  2  5  2  0  0  0  0
  2  6  1  0  0  0  0
</pre>
|'''Bond block''' (1 line for each bond): 1st atom, 2nd atom, type, etc.
|-
|<pre>
M  CHG  2   4   1   6  -1
M  ISO  1   3  13
</pre>
|'''Properties block'''
|-
| M  END
|'''END line''' (note: some programs do not accept a blank line before M  END)
!'''END'''
|}

==== Counts line block specification ====
The values below are those of the counts line in the L-alanine example above. Fields that are left blank in that line are not shown.

{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
!Value
!6
!5
!0
!0
!1
!0
!3
!V2000
|-
|Description
|number of atoms
|number of bonds
|number of atom lists
|obsolete
|chiral flag: 1 = chiral, 0 = not chiral
|number of stext entries
|number of lines of additional properties
|molfile version
|-
|Type
|[Generic]
|[Generic]
|[Query]
|
|[Generic]
|[ISIS/Desktop]
|[Generic]
|
|}

==== Bond block specification ====
The [[Chemical bond|Bond]] Block is made up of bond lines, one line per bond, with the following format:

 111 222 ttt sss xxx rrr ccc

where the values are described in the following table:

{| class="wikitable"
!Field
!Meaning
!Values
|-
|111
|first atom number
|
|-
|222
|second atom number
|
|-
|ttt
|bond type
|1 = Single, 2 = Double, 3 = Triple, 4 = Aromatic, 5 = Single or Double, 6 = Single or Aromatic, 7 = Double or Aromatic, 8 = Any
|-
|sss
|bond stereo
|For single bonds: 0 = not stereo, 1 = up, 4 = either, 6 = down<br />For double bonds: 0 = use x-, y-, z-coordinates from the atom block to determine cis or trans, 3 = cis or trans (either) double bond
|-
|xxx
|not used
|
|-
|rrr
|bond topology
|0 = Either, 1 = Ring, 2 = Chain
|-
|ccc
|reacting center status
|0 = unmarked, 1 = a center, -1 = not a center<br />Additional: 2 = no change, 4 = bond made/broken, 8 = bond order changes, 12 = 4 + 8 (both made/broken and changes); 5 (4 + 1), 9 (8 + 1) and 13 (12 + 1) are also possible
|}

=== Extended Connection Table (V3000) ===
The extended (V3000) molfile consists of a regular molfile “no structure” followed by a single molfile appendix that contains the body of the connection table (Ctab). The following figure shows both an [[alanine]] structure and the extended molfile corresponding to it.
Note that the “no structure” is flagged with the “V3000” version stamp instead of “V2000”. There are two other changes to the header in addition to the version:
* The number of appendix lines is always written as 999, regardless of how many there actually are. (All current readers will disregard the count and stop at M  END.)
* The “dimensional code” is maintained more explicitly. Thus “3D” really means 3D, although “2D” will be interpreted as 3D if any non-zero Z-coordinates are found.

Unlike the V2000 molfile, the V3000 extended Rgroup molfile has the same header format as a non-Rgroup molfile.

{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|+ [[File:L-Alanine.svg|center|366x366px]]
|-
| L-Alanine
!Description
! rowspan="4" |Header block
|-
| GSMACCS-II07189510252D 1 0.00366 0.00000 0
!Header with timestamp
|-
| Figure 1, J. Chem. Inf. Comput. Sci., Vol 32, No. 3., 1992
!Comment line
|-
| 0 0 0 0 0 999 V3000
!V2000-compatibility line
|-
| M  V30 BEGIN CTAB
|
! rowspan="5" |Connection table
|-
| M  V30 COUNTS 6 5 0 0 1
!Counts line
|-
|<pre>
M  V30 BEGIN ATOM
M  V30 1 C -0.6622 0.5342 0 0 CFG=2
M  V30 2 C 0.6622 -0.3 0 0
M  V30 3 C -0.7207 2.0817 0 0 MASS=13
M  V30 4 N -1.8622 -0.3695 0 0 CHG=1
M  V30 5 O 0.622 -1.8037 0 0
M  V30 6 O 1.9464 0.4244 0 0 CHG=-1
M  V30 END ATOM
</pre>
!Atom block
|-
|<pre>
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 1 3 CFG=1
M  V30 3 1 1 4
M  V30 4 2 2 5
M  V30 5 1 2 6
M  V30 END BOND
</pre>
!Bond block
|-
| M  V30 END CTAB<br />M  END
|
|}

==== Counts line ====
A counts line is required and must come first in the connection table. It specifies the number of atoms, bonds, 3D objects, and Sgroups, and whether the CHIRAL flag is set. Optionally, the counts line can specify a molregno. This is only used when the regno exceeds 999999 (the limit of the format in the molfile header line).
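A V3000 counts line could be read along the following lines. This is a hedged Python sketch, not a reference implementation: it assumes whitespace-separated fields in the order <code>na nb nsg n3d chiral</code>, optionally followed by <code>REGNO=regno</code>, and the names <code>V3000Counts</code> and <code>parse_v3000_counts</code> are hypothetical.

```python
from typing import NamedTuple, Optional


class V3000Counts(NamedTuple):  # hypothetical container for this sketch
    n_atoms: int
    n_bonds: int
    n_sgroups: int
    n_3d: int
    chiral: bool
    regno: Optional[int]


def parse_v3000_counts(line: str) -> V3000Counts:
    """Parse a line such as 'M  V30 COUNTS 6 5 0 0 1' (sketch only)."""
    tokens = line.split()
    if tokens[:3] != ["M", "V30", "COUNTS"]:
        raise ValueError("not a V3000 counts line")
    fields = tokens[3:]
    if len(fields) < 5:
        raise ValueError("counts line needs na nb nsg n3d chiral")
    na, nb, nsg, n3d, chiral = (int(t) for t in fields[:5])
    regno = None
    # Assumption: the optional registry number appears as REGNO=<integer>.
    for token in fields[5:]:
        if token.upper().startswith("REGNO="):
            regno = int(token.split("=", 1)[1])
    return V3000Counts(na, nb, nsg, n3d, chiral == 1, regno)


counts = parse_v3000_counts("M  V30 COUNTS 6 5 0 0 1")
print(counts.n_atoms, counts.n_bonds, counts.chiral)
```

Splitting on whitespace is used here because the V3000 lines shown in this article are free-format rather than fixed-column.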
The format of the counts line is:

{|
|+ <pre>M  V30 COUNTS na nb nsg n3d chiral [REGNO=regno]</pre>
|- style="font-family:monospace;"
|M  V30 COUNTS
|na
|nb
|nsg
|n3d
|chiral
|[REGNO=regno]
|- style="font-family:monospace;"
|M  V30 COUNTS
|6
|5
|0
|0
|1
|
|-
|
| {{vert header|va=top|number of atoms}}
| {{vert header|va=top|number of bonds}}
| {{vert header|va=top|number of Sgroups}}
| {{vert header|va=top|number of 3D constraints}}
| {{vert header|va=top|1=if 1 = molecule is chiral}}
| {{vert header|va=top|molecule or model regno}}
|}

=== SDF ===
{{Infobox file format
| name = ctab
| extension = {{mono|.sd}}, {{mono|.sdf}}
| mime = chemical/x-mdl-sdfile
| owner =
| creatorcode =
| genre = [[chemical file format]]
| container for =
| contained by =
| extended from =
| extended to =
}}
SDF is one of a family of chemical-data file formats developed by MDL; it is intended especially for structural information. "SDF" stands for structure-data format, and SDF files wrap the molfile ([[#Molfile|MDL Molfile]]) format. Multiple records are [[delimiter|delimited]] by lines consisting of four dollar signs ($$$$). A key feature of this format is its ability to include associated data.

Associated data items are denoted as follows:
<syntaxhighlight lang="doscon">
> <Unique_ID>
XCA3464366

> <ClogP>
5.825

> <Vendor>
Sigma

> <Molecular Weight>
499.611
</syntaxhighlight>

Multiple-line data items are also supported. The MDL SDF-format specification requires that a hard-carriage-return character be inserted if a single line of any text field exceeds 200 characters. This requirement is frequently violated in practice, as many [[SMILES]] and [[InChI]] strings exceed that length.

=== Other formats of the family ===
There are other, less commonly used formats of the family:
* '''RXNFile''' - for representing a single chemical reaction;
* '''RDFile''' - for representing a list of records with associated data. Each record can contain chemical structures, reactions, textual and tabular data;
* '''RGFile''' - for representing [[Markush structure]]s (deprecated; Molfile V3000 can represent Markush structures);
* '''XDFile''' - for representing chemical information in [[XML]] format.

== See also ==
* [[Chemical file format#Converting between formats]]

== References ==
{{reflist|30em}}

== External links ==
* [https://www.collaborativedrug.com/cdd-visualization CDD Free Visualization], free software to visualize, process and analyse SD files (SDF).
* [https://adroitdi.com Adroit Repository], paid software to process SD files (SDF), from Adroit DI.
* [http://cactus.nci.nih.gov/SDF_toolkit/ SDF Toolkit], free software to process SD files (SDF).
* [http://cactus.nci.nih.gov/chemical/structure NCI/CADD Chemical Identifier Resolver], which generates SD files (SDF) from chemical names, CAS Registry Numbers, SMILES, InChI, InChIKey and other identifiers.
* [http://www.knime.org/ KNIME], free software for data manipulation and data mining that can also read and write SD files (SDF).
* [https://comptox.epa.gov/dashboard Comparative Toxicology Dashboard], a service provided by the Environmental Protection Agency (EPA) that generates SD files (SDF) from chemical names, CAS Registry Numbers, SMILES, InChI, InChIKey and other identifiers.

{{DEFAULTSORT:Chemical Table File}}
[[Category:Computational chemistry]]
[[Category:Chemical file formats]]