GIS file format
{{short description|Standard of encoding}}
A '''GIS file format''' or '''geospatial file format''' is a standard for encoding [[geographical information]] into a [[computer file]]. It is a specialized type of [[file format]] for use in [[geographic information system]]s (GIS), [[remote sensing]] [[image processing]] tools, and other geospatial applications. Since the 1970s, dozens of formats have been created based on various [[Data model (GIS)|data models]] for various purposes. They have been created by government mapping agencies (such as the [[USGS]] or [[National Geospatial-Intelligence Agency]]), GIS software vendors, standards bodies such as the [[Open Geospatial Consortium]], informal user communities, and even individual developers.

==History==
The first GIS installations of the 1960s, such as the [[Canada Geographic Information System]], were based on bespoke software and stored data in bespoke file structures designed for the needs of the particular project. As more of these appeared, they could be compared to find best practices and common structures.<ref name="tomlinson1976">{{cite book |last1=Tomlinson |first1=Roger F. |last2=Calkins |first2=Hugh W. |last3=Marble |first3=Duane F. |title=Computer handling of geographical data |date=1976 |publisher=UNESCO Press}}</ref> When general-purpose GIS software was developed in the 1970s and early 1980s, including programs from academic labs such as the [[Harvard Laboratory for Computer Graphics and Spatial Analysis]], government agencies (e.g., the [[Map Overlay and Statistical System]] (MOSS) developed by the U.S.
[[United States Fish and Wildlife Service|Fish & Wildlife Service]] and [[Bureau of Land Management]]), and new GIS software companies such as [[Esri]] and [[Intergraph]], each program was built around its own proprietary (and often secret) file format.<ref name="chrisman2006">{{cite book |last1=Chrisman |first1=Nick |title=Charting the Unknown: How Computer Mapping at Harvard Became GIS |date=2006 |publisher=Esri Press |isbn=978-1-58948-118-3}}</ref> Since each GIS installation was effectively isolated from all others, interchange between them was not a major consideration.

By the early 1990s, the proliferation of GIS worldwide and an increasing need for sharing data, soon accelerated by the emergence of the [[World Wide Web]] and [[spatial data infrastructure]]s, led to the need for interoperable data and standard formats. An early attempt at standardization was the U.S. [[Spatial Data Transfer Standard]], released in 1994 and designed to encode the wide variety of federal government data.<ref name="SDTS">{{cite web |title=Spatial Data Transfer Standard |url=https://www.usgs.gov/publications/spatial-data-transfer-standard-sdts |publisher=USGS |doi=10.3133/fs07799|access-date=6 January 2023}}</ref> Although this particular format failed to garner widespread support, it led to other standardization efforts, especially the [[Open Geospatial Consortium]] (OGC), which has developed or adopted several vendor-neutral standards, some of which have been adopted by the [[International Organization for Standardization]] (ISO).<ref name="OGC">{{cite web |title=OGC Standards |url=https://www.ogc.org/docs/is |website=Open Geospatial Consortium |publisher=OGC |access-date=6 January 2023}}</ref>

Another development in the 1990s was the public release of proprietary file formats by GIS software vendors, enabling them to be used by other software.
The most notable example of this was the publication of the Esri [[Shapefile]] format,<ref name="shapefile">{{cite web |title=ESRI Shapefile Technical Description |url=https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf |website=Esri Technical Library |publisher=Esri |access-date=6 January 2023 |date=July 1998}}</ref> which by the late 1990s had become the most popular ''de facto'' standard for data sharing by the entire geospatial industry.<ref name="lo-yeung2002">{{cite book |last1=Lo |first1=Chor Pang |last2=Yeung |first2=Albert K.W. |title=Concepts and Techniques of Geographic Information Systems |date=2002 |publisher=Prentice Hall |isbn=0-13-080427-4 |page=185}}</ref> When proprietary formats were not shared (for example, the Esri ARC/INFO coverage), software developers frequently reverse-engineered them to enable import and export in other software, further facilitating data exchange. One result of this was the emergence of [[free and open-source software]] [[Library (computer science)|libraries]], such as the [[GDAL|Geospatial Data Abstraction Library (GDAL)]], which have greatly facilitated the integration of spatial data in any format into a variety of software.<ref>{{cite web |title=Software using GDAL |url=https://gdal.org/software_using_gdal.html |website=Geographic Data Abstraction Library |publisher=OSGEO |access-date=6 January 2023}}</ref>

During the 2000s, the need for specialized spatial files was reduced somewhat by the emergence of [[spatial database]]s, which incorporated spatial data into general-purpose relational databases. However, new file formats have continued to appear, especially with the proliferation of web mapping; formats such as the [[Keyhole Markup Language]] (KML) and [[GeoJSON]] can be more easily integrated into web development languages than traditional GIS files.

==Format characteristics==
Over a hundred distinct formats have been created for the storage of spatial data, of which 20 to 30 are currently in common usage for different purposes. These can be distinguished in a number of ways:
* ''Open'' formats are developed collectively by a community and are available for anyone to implement and contribute improvements, while ''proprietary'' formats have been developed by a software company for use only in their own software and are generally maintained as a trade secret (although they are often reverse-engineered by others). A third category between these would include formats that are owned exclusively by one company or organization, but are published and available for implementation by anyone, such as the Esri [[Shapefile]].<ref name="shapefile" />
* Some file formats are ''[[text file]]s'' that can be read by humans (such as those based on [[XML]] or [[JSON]]), especially those intended for data exchange, while others are ''[[binary file]]s'', most commonly those designed for native use in GIS software.
* ''Inherently spatial'' formats were designed specifically for storing geographic data, while others are ''spatial extensions'' to formats designed for a more general use (e.g., [[GeoTIFF]], [[spatial database]]s).
* Many data formats incorporate some form of ''[[data compression]]'', especially raster files.
Generally, lossless compression methods are preferable to [[Lossy compression|lossy]] methods, because the original data values need to be retrieved.<ref name="bolstad">{{cite book |last1=Bolstad |first1=Paul |title=GIS Fundamentals: A First Text on Geographic Information Systems |date=2019 |publisher=XanEdu |location=Ann Arbor, MI |isbn=978-1-59399-552-2 |page=69}}</ref>

==Raster formats==
{{further|Raster graphics|Data model (GIS)#Raster data model}}
[[Image:geabios3d.jpg|framed|right|Digital elevation model, map (image), and vector data]]
Like any digital image, raster GIS data is based on a regular tessellation of space into a rectangular grid of rows and columns of ''cells'' (also known as [[pixel]]s), with each cell having a measured value stored. The major difference from a photograph is that the grid is [[Georeferencing|registered]] to geographic space rather than a field of view. The [[Spatial resolution|resolution]] of the raster data set is its cell width in ground units.

Because a grid is a sample of a continuous space, raster data is most commonly used to represent [[Field (geography)|geographic fields]], in which a property varies continuously or discretely over space. Common examples include [[remote sensing]] imagery, [[Digital elevation model|terrain/elevation]], [[population density]], [[Weather map|weather and climate]], [[Soil map|soil properties]], and many others. Raster data can be images with each pixel (or cell) containing a color value. The value recorded for each cell may be of any [[level of measurement]], including a discrete qualitative value, such as land use type, or a continuous quantitative value, such as temperature, or a [[Nullable type|null]] value if no data is available.
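
The registration of a grid to ground coordinates described above can be illustrated with a minimal sketch. It assumes a north-up grid with square cells; the <code>RasterGrid</code> class, its field names, and the sample coordinates are hypothetical and do not correspond to any particular file format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RasterGrid:
    """Hypothetical minimal raster: a north-up grid registered to ground coordinates."""
    x_min: float      # ground x of the grid's left (west) edge
    y_max: float      # ground y of the grid's top (north) edge
    cell_size: float  # resolution: cell width in ground units
    cells: List[List[Optional[float]]]  # rows ordered north to south; None = null (no data)

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        """Ground coordinates of the centre of the cell at (row, col)."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return x, y

    def value_at(self, x: float, y: float) -> Optional[float]:
        """Value of the cell containing ground point (x, y).

        Returns None both for null cells and for points outside the grid.
        """
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if 0 <= row < len(self.cells) and 0 <= col < len(self.cells[row]):
            return self.cells[row][col]
        return None


# A 2 x 3 elevation grid with 30-unit cells (assumed sample values).
grid = RasterGrid(
    x_min=500000.0,
    y_max=4200000.0,
    cell_size=30.0,
    cells=[
        [12.5, 13.0, None],
        [12.0, 12.8, 13.4],
    ],
)

print(grid.cell_center(0, 0))
print(grid.value_at(500040.0, 4199960.0))
```

Storing only the origin and the cell size is what lets a raster file keep cell values as a plain array: the ground position of any cell is computed, not stored.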
While a raster cell stores a single value, it can be extended by using raster bands to represent RGB (red, green, blue) colors, colormaps (a mapping between a thematic code and RGB value), or an extended attribute table with one row for each unique cell value. Raster data can also be used to represent discrete [[geographic feature]]s, but usually only in exigent circumstances.

Raster data is stored in various formats, from a standard file-based structure of TIFF, JPEG, etc. to [[binary large object]] (BLOB) data stored directly in a [[relational database management system]] (RDBMS) similar to other vector-based feature classes. Database storage, when properly indexed, typically allows for quicker retrieval of the raster data but can require storage of millions of significantly sized records.

===Raster format examples===
*ADRG – [[National Geospatial-Intelligence Agency]] (NGA)'s ARC Digitized Raster Graphics<ref>{{cite web |title=Arc Digitized Raster Graphic (ADRG) |publisher=[[Library of Congress]] |work=Digital Preservation |date=2011-09-25 |url=http://www.digitalpreservation.gov/formats/fdd/fdd000282.shtml |accessdate=2014-03-13}}</ref>
*[[Binary file]] – An unformatted file consisting of raster data written in one of several [[data type]]s, where multiple bands are stored in BSQ (band sequential), BIP (band interleaved by pixel) or BIL (band interleaved by line). Georeferencing and other metadata are stored in one or more [[sidecar file]]s.<ref>{{cite web | title=Various Supported GDAL Raster Formats |url=http://www.gdal.org/frmt_various.html }}</ref>
*[[Digital raster graphic]] (DRG) – digital scan of a paper [[USGS]] [[topographic map]]
*ECRG – [[National Geospatial-Intelligence Agency]] (NGA)'s Enhanced Compressed ARC Raster Graphics (better resolution than CADRG and no color loss)
*[[ECW (file format)|ECW]] – Enhanced Compressed Wavelet (from ERDAS). A compressed wavelet format, often lossy.
*[[Esri grid]] – proprietary [[binary data|binary]] raster format used by [[Esri]] since the mid-1980s
*[[GeoTIFF]] – [[TIFF]] variant enriched with GIS relevant metadata, especially [[georeferencing]]. An open format that has become one of the most common formats for data sharing.
*IMG – [[ERDAS IMAGINE]] image file format
*[[JPEG2000]] – Open-standard raster format. A compressed format, allows both lossy and lossless compression.
*[[MrSID]] – Multi-Resolution Seamless Image Database (by Lizardtech). A compressed wavelet format, allows both lossy and lossless compression.
*[[netCDF]]-CF – netCDF file format with [[Climate and Forecast Metadata Conventions|CF metadata conventions]] for earth science data. Binary storage in open format with optional compression. Allows for direct web-access of subsets/aggregations of maps through [[OPeNDAP]] protocol.
*RPF – Raster Product Format, military file format specified in [[United States Military Standard|MIL-STD-2411]]<ref>{{cite web |title=Raster Product Format |publisher=[[Library of Congress]] |work=Digital Preservation |date=2011-10-27 |url=http://www.digitalpreservation.gov/formats/fdd/fdd000298.shtml |accessdate=2014-03-13}}</ref>
**CADRG – Compressed ADRG, developed by [[National Geospatial-Intelligence Agency|NGA]], nominal compression of 55:1 over ADRG (type of Raster Product Format)
**[[Controlled Image Base|CIB]] – Controlled Image Base, developed by [[National Geospatial-Intelligence Agency|NGA]] (type of Raster Product Format)
*[[USGS DEM]] – The [[USGS]]' Digital Elevation Model
**[[GTOPO30]] – Large complete Earth elevation model at 30 arc seconds, delivered in the USGS DEM format
*[[DTED]] – [[National Geospatial-Intelligence Agency]] (NGA)'s Digital Terrain Elevation Data, the military standard for elevation data
*[[World file]] – [[Georeference|Georeferencing]] a raster image file (e.g.
JPEG, BMP)

==Vector formats{{Anchor|Vector}}==
{{further|Vector graphics|Data model (GIS)#Vector data model}}
[[Image:Simple vector map.svg|thumb|250px|right|A simple vector map, using each of the vector elements: points for wells, lines for rivers, and a polygon for the lake]]
A ''vector'' dataset (sometimes called a ''feature'' dataset) stores information about discrete objects, using an encoding of the [[Vector graphics|vector logical data model]] to represent the location or ''geometry'' of each object, and an encoding of its other properties that is usually based on [[relational database]] technology. Typically, a single dataset collects information about a set of closely related or similar objects, such as all of the roads in a city.

The vector data model uses [[coordinate geometry]] to represent each shape as one of several [[geometric primitive]]s, most commonly ''[[Point (geometry)|points]]'' (a single coordinate of zero [[dimension]]), ''[[Line (geometry)|lines]]'' (a one-dimensional ordered list of coordinates connected by straight lines), and ''[[polygon]]s'' (a self-closing boundary line enclosing a two-dimensional region). Many data structures have been developed to encode these primitives as digital data, but most modern vector file formats are based on the [[Open Geospatial Consortium]] (OGC) [[Simple Features]] specification, often directly incorporating its [[Well-known text representation of geometry|Well-known text]] (WKT) or Well-known binary (WKB) encodings.

In addition to the geometry of each object, a vector dataset must also be able to store its ''attributes''. For example, a database that describes lakes may contain each lake's depth, water quality, and pollution level. Since the 1970s, almost all vector file formats have adopted the [[relational database]] model, either in principle or directly incorporating [[relational database management system|RDBMS]] software.
Thus, the entire dataset is stored in a ''table'', with each ''row'' representing a single object that contains ''columns'' for each attribute.<ref name="longley2011" />{{rp|256}}

Two strategies have been used to integrate the geometry and attributes into a single vector file format structure:<ref name="chang2014">{{cite book |last1=Chang |first1=Kang-tsung |title=Introduction to Geographic Information Systems |date=2014 |publisher=McGraw-Hill |isbn=978-0-07-352290-6 |pages=50–57 |edition=7th}}</ref>
* A ''[[Georelational data model|georelational format]]'' stores them as two separate files, with the geometry and attributes of each object being linked by file ordering or a [[primary key]]. This was most common from the 1970s through the early 1990s, because GIS software developers had to invent their own geometry data structures, but incorporated existing relational database file formats for the attributes. For example, the [[Esri]] [[Shapefile]] format includes the .dbf file from the DOS [[dBase]] software.
* The ''object-based model'' stores them in a single structure, loosely or directly based on the objects in [[object-oriented programming]] languages. This is the basis of most modern file formats, including [[spatial database]]s that include a geometry column along with the other attributes in a single relational table. Other formats, such as [[GeoJSON]], use different structures for geometry and attributes, but combine them for each object in the same file.

[[Geospatial topology]] is often an important part of vector data, representing the inherent spatial relationships (especially adjacency) between objects. Topology has been managed in vector file formats in four ways.
In a ''topological data structure'', most notably Harvard's POLYVRT and its successor the [[ArcInfo|ARC/INFO]] coverage, topological connections between points, lines, and polygons are an inherent part of the encoding of those features.<ref name="bolstad" />{{rp|46–49}} Conversely, non-topological or ''spaghetti data'' (such as the Esri [[Shapefile]] and most [[spatial database]]s) includes no topology information, with each geometry being completely independent of all others. A ''topology dataset'' (often used in [[Transport network analysis|network analysis]]) augments spaghetti data with a separate file encoding the topological connections.<ref name="longley2011">{{cite book |last1=Longley |first1=Paul A. |last2=Goodchild |first2=Michael F. |last3=Maguire |first3=David J. |last4=Rhind |first4=David W. |title=Geographic Information Systems & Science |date=2011 |publisher=Wiley |edition=3rd}}</ref>{{rp|218}} A ''topology rulebase'' is a list of desired topology rules used to enforce spatial integrity in spaghetti data, such as "county polygons must not overlap" and "state polygons must share boundaries with county polygons."<ref name="chang2014" />

Vector datasets usually represent discrete [[geographical feature]]s, such as buildings, trees, and counties. However, they may also be used to represent [[Field (geography)|geographical fields]] by storing locations where the spatially continuous field has been sampled. Sample points (e.g., [[weather stations]] and [[sensor networks]]), [[contour line]]s and [[triangulated irregular network]]s (TIN) are used to represent elevation or other values that change continuously over space. TINs record values at point locations, which are connected by lines to form an irregular mesh of triangles. The faces of the triangles represent the terrain surface.
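
The geometric primitives and the object-based pairing of geometry with attributes described in this section can be sketched in a few lines of Python using only the standard library. The helper functions and the sample well and lake below are hypothetical illustrations, not part of any GIS library; the output follows the general shape of WKT strings and GeoJSON <code>Feature</code> objects and is not validated against either specification.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple

Coord = Tuple[float, float]


def _coords(points: Sequence[Coord]) -> str:
    """Format coordinates as 'x y, x y, ...' for a WKT string."""
    return ", ".join(f"{x} {y}" for x, y in points)


def point_wkt(p: Coord) -> str:
    """A point: a single coordinate."""
    return f"POINT ({p[0]} {p[1]})"


def linestring_wkt(points: Sequence[Coord]) -> str:
    """A line: an ordered list of coordinates joined by straight segments."""
    return f"LINESTRING ({_coords(points)})"


def polygon_wkt(ring: Sequence[Coord]) -> str:
    """A polygon: a boundary ring, closed by repeating the first coordinate."""
    closed: List[Coord] = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])
    return f"POLYGON (({_coords(closed)}))"


def feature(geometry_type: str, coordinates: Any,
            properties: Dict[str, Any]) -> Dict[str, Any]:
    """Object-based structure: geometry and attributes kept together per object."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# Assumed sample data: one well (point) and one lake (polygon).
well = feature("Point", [10.0, 20.0], {"name": "Well A", "depth_m": 35})
lake_ring = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
lake = feature(
    "Polygon",
    [[list(c) for c in lake_ring + [lake_ring[0]]]],  # one closed outer ring
    {"name": "Lake B", "max_depth_m": 12.5},
)

print(point_wkt((10.0, 20.0)))
print(polygon_wkt(lake_ring))
print(json.dumps({"type": "FeatureCollection", "features": [well, lake]}))
```

In a georelational format the <code>properties</code> of each object would instead live in a separate attribute table, linked to the geometry by record order or a key.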

===Example vector file formats===
{{see also|Comparison of GIS vector file formats}}
Formats commonly in current usage:
*[[Shapefile]] – a popular vector data GIS format, developed by [[Esri]]
*[[Geography Markup Language]] (GML) – XML based open standard (by [[OpenGIS]]) for GIS data exchange
*[[GeoJSON]] – a lightweight format based on [[JSON]], used by many open source GIS packages
*[[GeoMedia]] – [[Intergraph]]'s [[Microsoft Access]] based format for spatial vector storage
*[[Keyhole Markup Language]] (KML) – XML based open standard (by [[OpenGIS]]) for GIS data exchange
*[[MapInfo TAB format]] – [[MapInfo Corporation|MapInfo]]'s vector data format using TAB, DAT, ID and MAP files
*[[Measure Map Pro format]] – [[XML]] data format to store GIS data
*[[National Transfer Format]] (NTF) – mostly used by the UK Ordnance Survey
*[[Spatialite]] – a spatial extension to [[SQLite]], providing vector geodatabase functionality. It is similar to [[PostGIS]], [[Oracle Spatial]], and SQL Server with spatial extensions
*[[Simple Features]] – [[Open Geospatial Consortium]] specification for vector data
**[[Well-known text representation of geometry|Well-known text]] (WKT) – A text markup language for representing feature geometry, developed by [[Open Geospatial Consortium]]
**[[Well-known text representation of geometry|Well-known binary]] (WKB) – Binary version of well-known text, used in many [[spatial database]]s
*[[SOSI]] – a spatial data format used for all public exchange of spatial data in Norway
*[[AutoCAD DXF]] – data transfer format for [[AutoCAD]] data (by [[Autodesk]])
*[[Geographic Data Files]] (GDF) – An interchange file format for geographic data
Historical formats seldom used today:
*[[ArcInfo]] Coverage – topological data structure used in Arc/INFO from 1981 through 2000
*[[Esri TIN]] – proprietary [[binary data|binary]] format for [[triangulated irregular network]] data used by [[Esri]]
*[[Digital line graph]] (DLG) – a USGS format
for vector data
*[[TIGER]] – Topologically Integrated Geographic Encoding and Referencing
*[[Vector Product Format]] (VPF) – [[National Geospatial-Intelligence Agency]] (NGA)'s format of vectored data for large geographic databases
*[[Spatial Data File]] – [[Autodesk]]'s high-performance geodatabase format, native to [[MapGuide]]
*ISFC – [[Intergraph]]'s [[MicroStation]] based CAD solution attaching vector elements to a relational [[Microsoft Access]] database
*[[Dual Independent Map Encoding]] (DIME) – A historic GIS file format, developed in the 1960s

==Advantages and disadvantages==
There are some important advantages and disadvantages to using a raster or vector data model to represent reality:
* Raster datasets record a value for all points in the area covered, which may require more storage space than representing data in a vector format that can store data only where needed.
* Raster data is computationally less expensive to render than vector graphics.
* Combining values and writing custom formulas for combining values from different layers are much easier using raster data.
* There are transparency and aliasing problems when overlaying multiple stacked pieces of raster images.
* Vector data allows for visually smooth and easy implementation of overlay operations, especially in terms of graphics and shape-driven information like maps, routes and custom fonts, which are more difficult with raster data.
* Vector data can be displayed as [[vector graphics]] used on traditional maps, whereas raster data will appear as an [[image]] that may have a blocky appearance for object boundaries (depending on the resolution of the raster file).
* Vector data can be easier to register, scale, and re-project, which can simplify combining vector layers from different sources.
* Vector data is more compatible with relational database environments, where it can be part of a relational table as a normal column and processed using a multitude of operators.
* Vector file sizes are usually smaller than raster data, which can be tens, hundreds or more times larger than vector data (depending on resolution).
* Vector data is simpler to update and maintain, whereas a raster image will have to be completely reproduced (example: a new road is added).
* Vector data allows much more analysis capability, especially for "networks" such as roads, power, rail, telecommunications, etc. (examples: best route, largest port, airfields connected to two-lane highways). Raster data will not have all the characteristics of the features it displays.

==Integrated file formats==
Modern [[object–relational database]]s can now store a variety of complex data using the [[binary large object]] datatype, including both raster grids and vector geometries. This enables some [[spatial database]] systems to store data of both models in the same database.
*[[Esri]] File [[Geodatabase]] – A proprietary format for storing "feature" (vector) and raster data locally<ref name="geodatabase">{{cite web |title=The architecture of a geodatabase |url=https://pro.arcgis.com/en/pro-app/latest/help/data/geodatabases/overview/the-architecture-of-a-geodatabase.htm |website=ArcGIS Pro Documentation |publisher=Esri |access-date=8 January 2023}}</ref>
*[[Esri]] Enterprise [[Geodatabase]] – A proprietary model for storing a geodatabase structure in a variety of commercial and open-source [[relational database management system]]s<ref name="geodatabase"/>
*[[GeoPackage]] (GPKG) – A standards-based, open format based on the SQLite database format for both vector and raster data, adopted by the [[Open Geospatial Consortium]]<ref>{{cite web |title=OGC GeoPackage Encoding Standard |url=http://www.opengis.net/doc/IS/geopackage/1.3 |website=Open Geospatial Consortium Standards |publisher=OGC |access-date=8 January 2023}}</ref>

==See also==
{{Portal|Geography}}
*[[Datum (geodesy)]]
*[[GDAL|GDAL/OGR]], a library for reading and writing many formats
*[[FME (software)|Feature
Manipulation Engine]] (FME), a commercial program for converting data between a large number of formats

==References==
{{reflist}}

{{Markup languages}}

[[Category:GIS file formats| ]]