A Geodatabase is a proprietary GIS file format developed in the late 1990s by Esri (a GIS software vendor) to represent, store, and organize spatial datasets within a geographic information system.[1][2] A geodatabase is both a logical data model and the physical implementation of that logical model in several proprietary file formats released during the 2000s.[3] The geodatabase design is based on the spatial database model for storing spatial data in relational and object-relational databases.[4] Given the dominance of Esri in the GIS industry, the term "geodatabase" is used by some as a generic trademark for any spatial database, regardless of platform or design.

Geodatabase
Filename extension: .gdb (file), .geodatabase (mobile)
Developed by: Esri
Initial release: December 1999
Latest release: 11 (2022)
Type of format: database
Container for: spatial database including vector and raster data
Open format? no
Free format? no

History

The origin of the geodatabase was in the mid-1990s during the emergence of the first spatial databases. One early approach to integrating relational databases and GIS was the use of server middleware, a third-party program that stores the spatial data in database tables in a custom format, and translates it dynamically into a logical model that can be understood by the client software. In 1996, Esri purchased an early middleware product called Spatial DataBase Engine and rebranded it ArcSDE. Initially, ArcSDE stored and delivered simple vector datasets that looked very similar to shapefiles, but the need for a more robust data model emerged as Esri's Shapefile format became a de facto standard for vector spatial data, even as its shortcomings limited its use in enterprise applications. At the same time, the Arc/INFO coverage format was becoming obsolete after 20 years, unable to handle growing expectations of GIS users.[5] Another motivating factor was that even though several relational database vendors were introducing their own spatial extensions (with the notable exception of Esri's preferred Microsoft SQL Server), their structures and interfaces varied and Esri wanted its users to see all spatial data in the same apparent structure regardless of how it was stored internally.[6]: 240 

At the end of 1999, Esri introduced the Geodatabase model as the native format used in its new ArcGIS software (branded Version 8.0 to maintain continuity with Arc/INFO).[7] Initially, it could be implemented as a multiuser geodatabase in ArcSDE on a server or as a personal geodatabase stored locally.[8]: 12  Support for topology rules, linear referencing, and survey data was added in 2003 (with ArcGIS 8.3).[9][10][11] Network data was added to the geodatabase in 2005 (ArcGIS 9.1),[12] and vector terrain (TIN, LIDAR) in 2006 (ArcGIS 9.2).[13] Also at the 9.2 release, ArcSDE was subsumed into ArcGIS Server and the multiuser database format was rebranded the enterprise geodatabase.

Due to shortcomings in the personal geodatabase format (especially file size limitations in Microsoft Access), Esri developed a more robust custom file format, released in 2006 (ArcGIS 9.2) as the file geodatabase.[13] It also released a product called the workgroup geodatabase that included the free Microsoft SQL Server Express for smaller multi-user applications, which has since been discontinued.[14] Eventually, the middleware components for reading and writing the geodatabase spatial database structure were incorporated into ArcGIS desktop, eliminating the need for ArcSDE to be running on the server end. The most recent addition has been the mobile geodatabase format in 2020 (ArcGIS Pro 2.7), which uses SQLite as the backend to store the entire geodatabase as a single file. This replaces the personal geodatabase, which is no longer supported.[15]

Applications

Geodatabases, being a common format for GIS datasets, have applications anywhere GIS is widely employed. These applications are so basic that researchers often do not mention their use in studies. Their use is nonetheless extensively documented in several fields, including public health, crime analysis, and resource management.

Epidemiology

Since John Snow famously identified the source of a cholera outbreak, spatial data has been central to epidemiology and public health.[16][17] In recent years, information that is relevant to public health has increased exponentially.[18] Leveraged correctly, this data can allow for a rapid response to emerging diseases. To accomplish this, geodatabases are employed extensively to organize data and allow for the identification of space-time patterns.[17][18] Examples of the use of geodatabases to manage epidemiological data include linking environmental and health data to find patterns.[19] They were used extensively to organize data related to West Nile virus epidemics and the COVID-19 pandemic.[20][21] This use includes analyzing the misinformation and infodemic surrounding COVID-19.[22]

Resource management

Geospatial data for resource management is extremely complex. The forest, water, and mineral resources being managed are obvious factors; however, governance and socioeconomic factors also play a large role.[3][23] It is common practice to employ geodatabases to manage these diverse datasets.[23] They have also been used in organized Early Detection Rapid Response (EDRR) efforts to treat invasive plant species to protect environmental resources.[24]

TIGER files

In 1995, the United States Census Bureau made the Topologically Integrated Geographic Encoding and Referencing (TIGER) Mapping Service available to the public, facilitating desktop and Web GIS by hosting US boundary data.[25] This data availability, facilitated through the internet, quietly revolutionized cartography by providing the world with authoritative boundary files at no cost. Today, these files, which contain up-to-date boundaries for US states, counties, and more, are provided to the public in prepackaged geodatabases.[26]

Logical Model

To the user, a geodatabase looks like a collection of datasets, including some containing geographic data and some auxiliary elements that add functionality to the data. This user view is identical, regardless of how the geodatabase is stored (although enterprise geodatabases add some functions).

Datasets

Datasets contain geographic data. A geodatabase can contain spatially referenced data in vector or raster formats, or non-spatially referenced data in tabular format.[27][28] Each dataset contains information about any number of individual items, but typically all of the items in a dataset are of the same theme (e.g., temperature measurements, roads in a city) and have the same set of properties.

Table
A traditional relational database table without any spatial information. Called a "business table" in early versions of ArcGIS.[8]
Feature class
A dataset based on the vector data model, storing a list of objects with a geometric shape in one column and a set of attributes in additional columns. While this may seem similar to earlier vector file formats such as the shapefile, several enhancements have been added. In addition to the basic points, lines, and polygons, the shape data types can also include annotation (text), dimensions (graphical depictions of distance), multipoints (many points in a single shape), and multipatch three-dimensional objects.[29]: 142  Each vertex in a line can be three-dimensional (x,y,z) and can also store a measurement value (e.g. a highway milepost value). Lastly, line segments can be curved as circular arcs or Bézier curves in addition to the traditional straight lines.[6]: 244 
Raster dataset
A raster grid that uses the same data model as most raster GIS and image files.[30] Geodatabase rasters use internal tiling and pyramid structures to improve drawing and analysis performance, especially for very large grids.
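
As a rough illustration of the feature class structure described above, the following Python sketch models a feature as a shape column plus attribute columns, with each vertex optionally carrying z and m values. The class and field names (Vertex, Feature, FeatureClass) are hypothetical and do not correspond to any Esri API; the sketch also assumes that a feature class holds a single shape type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Vertex:
    """One vertex of a shape; z (elevation) and m (measure) are optional."""
    x: float
    y: float
    z: Optional[float] = None
    m: Optional[float] = None


@dataclass
class Feature:
    """One row of a feature class: a shape plus attribute columns."""
    shape_type: str                     # e.g. "point", "polyline", "polygon"
    vertices: List[Vertex]
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FeatureClass:
    """Hypothetical stand-in for a feature class holding one shape type."""
    name: str
    shape_type: str
    rows: List[Feature] = field(default_factory=list)

    def add(self, feature: Feature) -> None:
        # Reject rows whose shape type differs from the feature class.
        if feature.shape_type != self.shape_type:
            raise ValueError(
                f"{self.name} stores {self.shape_type} shapes, "
                f"not {feature.shape_type}"
            )
        self.rows.append(feature)


roads = FeatureClass(name="roads", shape_type="polyline")
roads.add(Feature(
    shape_type="polyline",
    vertices=[Vertex(0.0, 0.0, z=120.0, m=0.0), Vertex(3.0, 4.0, z=125.0, m=5.0)],
    attributes={"name": "Main St", "lanes": 2},
))
```

A plain table, in this view, would be the same structure without the shape column.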

Auxiliary elements

A number of elements can be included that are generally dependent on one or more datasets, adding functionality such as quality control. Some of these are called controller datasets.

Feature dataset
A collection of several feature classes within the geodatabase, roughly similar to a folder in a file system. All of the feature classes in a feature dataset share the same spatial reference system.[31]: 54 
Versioning
In an enterprise geodatabase on a database server (not a file geodatabase), a feature class or table can be versioned to facilitate multi-user editing.[14] As in other software version control systems, previous versions of the data are stored so that undesirable edits can be rolled back.
Attribute domain
A list of valid values for a quantitative or qualitative property (e.g. land use type codes).[32]: 28  These facilitate data entry (the user can pick a value from a list rather than trying to remember the exact text) and quality control (reducing invalid values).
Subtype
Partitions a table or feature class into subcategories, with common attributes and other behaviors.[32]: 32 
Attribute index
Stored sorting order for one or more columns in a table or feature class, increasing search and processing performance.[32]: 71 
Relationship class
A configuration of a relationship between two tables or feature classes in the relational database sense, including specifying primary keys and foreign keys. This facilitates table joins and cross-table selections.[33]
Topology
A list of rules for valid topological relationships between features in one or more feature classes (e.g., "county polygons cannot overlap," "state polygons must align with county polygons").[9] These can be used for checking data quality and correcting topological errors.[31]: 55–56 
Terrain dataset (formerly TIN dataset)
A triangulated irregular network constructed from feature classes containing three-dimensional points and lines.[34]
Network dataset, Utility network
Two different approaches to constructing a topologically-connected network from feature classes representing linear segments (and often point junctions).[29]: 257  The network dataset has functionality designed primarily for transport network analysis (pedestrian, road, rail), while the utility network is designed for other infrastructure, such as water, sewer, power, and telecommunications networks.
Parcel fabric
An integrated dataset for storing cadastral surveying data, which describe real property parcels using distance-direction measurements (traditionally called COGO for coordinate geometry). Feature classes are used to store monument points, boundary lines, and parcel polygons with topological bindings between them.[35]
Linear referencing system (LRS)
An augmentation of a line feature class that enables attributes to be referenced to points or segments within each line rather than attached to the line as a whole.[36] Examples include storing accident locations or real-time traffic counts along a road.
Mosaic dataset
A virtual composite raster grid composed of images that are stored as separate raster files. This is stored as a polygon feature class with a row for each image including image properties such as image filenames and georeferencing information, and the shape representing the desired display extent of the image, enabling seamless composition of overlapping images.[37]
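
The idea behind linear referencing, as described in the list above, can be sketched in a few lines of Python: given a line whose vertices carry measure values, an event such as an accident is located by interpolating between the two vertices that bracket its measure. This is a minimal sketch and not Esri's implementation; the function name locate_along and the (x, y, m) tuple layout are assumptions made for illustration, and measures are assumed to increase strictly along the line.

```python
from typing import List, Tuple

# Each vertex is (x, y, m), where m is the measure stored at that vertex
# (for example, a milepost value along a highway).
Vertex = Tuple[float, float, float]


def locate_along(line: List[Vertex], measure: float) -> Tuple[float, float]:
    """Return the (x, y) position at the given measure by linear interpolation.

    Assumes measures increase strictly from the first vertex to the last.
    """
    if len(line) < 2:
        raise ValueError("a line needs at least two vertices")
    if not line[0][2] <= measure <= line[-1][2]:
        raise ValueError("measure lies outside the range of the line")
    for (x0, y0, m0), (x1, y1, m1) in zip(line, line[1:]):
        if m0 <= measure <= m1:
            # Fraction of the way along this segment, in measure units.
            t = (measure - m0) / (m1 - m0)
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    raise ValueError("measures are not monotonically increasing")


# A road with two segments; measures run from 0 to 15.
road = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (10.0, 5.0, 15.0)]
accident = locate_along(road, 12.0)
```

Because the event is stored as a measure rather than as its own geometry, the road line does not have to be split wherever an attribute changes.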

Implementations

Since its first introduction in 1999, the geodatabase has been available on a number of platforms to meet various project needs.

Enterprise Geodatabase (formerly Multiuser Geodatabase)
The data is stored in a third-party relational database management system on a server. This allows for the implementation of the geodatabase within various distributed GIS architectures, including Internet and Web GIS through the use of ArcIMS (Arc Internet Mapping Service).[38] Originally, only commercial RDBMS software was supported: Microsoft SQL Server, Oracle, Informix, or IBM Db2. Eventually, support for PostgreSQL and SAP HANA was added and Informix was discontinued. The spatial data in a feature class can either be stored using the geometry datatype provided by the RDBMS native spatial extension (Oracle Spatial, PostGIS, etc.) or Esri's proprietary ST_GEOMETRY format, all of which are based on the Open Geospatial Consortium Simple Features specification, but with different encoding structures.[11]: 13  Currently, an ArcGIS Enterprise (formerly ArcGIS Server) license must be purchased to create an enterprise geodatabase, even if no server software is installed. Each dataset (feature class, table, raster, auxiliary element) is stored as a normal table, but the geodatabase adds several system tables to provide overall organization (in ArcGIS 8-9, there were many more system tables, which were streamlined in ArcGIS 10):[39]
  • GDB_Items: a "table of contents" for all of the elements of the geodatabase as the user will see them, pointing to the corresponding physical tables
  • GDB_ItemTypes: the type of dataset of each table (table, feature class, etc.)
  • GDB_ItemRelationships: information about groupings of tables, such as feature datasets
  • GDB_ItemRelationshipTypes: lookup table of types of item relationships
  • GDB_DBTune: general parameters for the geodatabase
  • GDB_SpatialRefs: a list of the spatial reference systems used in the datasets
  • GDB_SystemCatalog: a list of all tables, including data and system tables
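
To make the "table of contents" role of these system tables concrete, the following Python sketch builds a toy catalog in an in-memory SQLite database and joins GDB_Items to GDB_ItemTypes to list what a user would see. Only the two table names come from the list above; the column names, sample rows, and the list_contents helper are simplified assumptions and do not reproduce the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical columns; the real system tables hold more.
    CREATE TABLE GDB_ItemTypes (
        type_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE GDB_Items (
        item_id       INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        type_id       INTEGER NOT NULL REFERENCES GDB_ItemTypes(type_id),
        physical_name TEXT NOT NULL
    );
    INSERT INTO GDB_ItemTypes VALUES (1, 'Table'), (2, 'Feature Class');
    INSERT INTO GDB_Items VALUES
        (1, 'Owners',  1, 'main.owners'),
        (2, 'Parcels', 2, 'main.parcels');
""")


def list_contents(connection):
    """Return (item name, item type, physical table) rows, sorted by item name."""
    return connection.execute("""
        SELECT i.name, t.name, i.physical_name
        FROM GDB_Items AS i
        JOIN GDB_ItemTypes AS t ON t.type_id = i.type_id
        ORDER BY i.name
    """).fetchall()


contents = list_contents(conn)
```

Client software would resolve a dataset name through such a catalog before touching the physical table that holds the rows.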
Personal Geodatabase
One of the original options, storing data in a single Microsoft Access file (.mdb), intended for local storage for a single user without needing a server.[8]: 12  Although the format was relatively straightforward, using the same table structure as the multiuser geodatabase, there were severe limitations, including a 2 GB file size limit and the lack of 64-bit Access libraries.[6]: 240  This was gradually phased out in favor of the file and mobile geodatabases, and it is no longer supported in ArcGIS Pro.[15]
File Geodatabase
Introduced in 2006 to replace the personal geodatabase. This is a proprietary format owned by Esri, although other software developers have reverse-engineered it[40] and Esri provides a read/write library for use in other software. It is not a single file, but a collection of files (roughly one for each data or system table in the relational database geodatabase) collected in a folder with a .gdb extension. The following file types are included:[40]
  • a########.gdbtable: a table (system table, data table, feature class, raster) consisting of rows with geometry and/or attribute columns
  • a########.gdbtablx: a lookup list of the byte offset of each row in the data table
  • a########.gdbindexes: a list of all the indexes for a data table
  • a########.name.atx: an attribute index for a data table, listing the rows in the sorted order of the selected attribute column. A single data table can have multiple indices.
  • a########.spx: a spatial index for a feature class table to speed up shape access, using a gridded spatial index.
  • a########.cdf: a compressed version of one of the above files
  • a00000001.* - a00000008.*: system tables, as in the enterprise geodatabase (GDB_SystemCatalog, GDB_SpatialRefs, GDB_DBTune, etc.)
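
The naming pattern above means the contents of a .gdb folder can be summarized from file names alone. The Python sketch below groups files by their table identifier and labels each by the role given in the list; it reads no file contents, and the helper name group_by_table and the mock folder are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path
import tempfile

# Roles of the file types listed above, keyed by file name suffix.
ROLES = {
    ".gdbtable": "table data",
    ".gdbtablx": "row offset lookup",
    ".gdbindexes": "index list",
    ".atx": "attribute index",
    ".spx": "spatial index",
    ".cdf": "compressed file",
}


def group_by_table(gdb_folder):
    """Map each table id (e.g. 'a00000009') to the roles of its files."""
    tables = defaultdict(list)
    for path in sorted(Path(gdb_folder).iterdir()):
        role = ROLES.get(path.suffix.lower())
        if role is not None:
            # The table id is everything before the first dot.
            table_id = path.name.split(".", 1)[0]
            tables[table_id].append(role)
    return dict(tables)


# Demonstration on empty mock files; these are not valid geodatabase files.
with tempfile.TemporaryDirectory() as tmp:
    gdb = Path(tmp) / "example.gdb"
    gdb.mkdir()
    for name in ("a00000009.gdbtable", "a00000009.gdbtablx",
                 "a00000009.gdbindexes", "a00000009.name.atx",
                 "a00000009.spx"):
        (gdb / name).touch()
    summary = group_by_table(gdb)
```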
Mobile Geodatabase
The newest option, structured the same as the enterprise geodatabase but stored as a single file in the SQLite format, a de facto standard for sharing and storing data on mobile devices.[41] Initially released with limited functionality, it now supports all of the geodatabase elements. Despite sharing SQLite as the underlying database, it is very different from the GeoPackage model.
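
Because a mobile geodatabase is stored as a single SQLite file, generic SQLite tooling can at least identify the container. The Python sketch below checks the 16-byte header that begins a standard SQLite database file and lists its tables through the sqlite_master catalog. The stand-in file is an ordinary SQLite database created for the demonstration, not a real mobile geodatabase, and whether every geodatabase table can be fully interpreted this way is not covered by the sources cited here.

```python
import os
import sqlite3
import tempfile

# First 16 bytes of a standard SQLite database file.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Return True if the file starts with the SQLite database header."""
    with open(path, "rb") as handle:
        return handle.read(16) == SQLITE_MAGIC


def list_tables(path):
    """Return the names of the tables stored in an SQLite file."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Stand-in file: an ordinary SQLite database, not a real mobile geodatabase.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.geodatabase")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE parcels (objectid INTEGER PRIMARY KEY, shape BLOB)")
    conn.commit()
    conn.close()
    header_ok = is_sqlite_file(path)
    tables = list_tables(path)
```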

Application support

The ability to read and write the geodatabase format is not limited to Esri products. Other software that can work with the format includes ENVI,[42] ERDAS IMAGINE,[43] and QGIS (through GDAL).[44]

References

  1. "What is a geodatabase?". ArcGIS Pro Documentation. Esri. Retrieved 8 January 2023.
  2. Nasser, Hussein (June 2014). Learning ArcGIS Geodatabases. PACKT. ISBN 978-1-78398-864-8.
  3. Chesnaux, Romain; Lambert, Mélanie; Walter, Julien; Fillastre, Ugo; Hay, Murray; Rouleau, Alain; Daigneault, Réal; Moisan, Annie; Germaneau, Denis (November 2011). "Building a geodatabase for mapping hydrogeological features and 3D modeling of groundwater systems: Application to the Saguenay–Lac-St.-Jean region, Canada". Computers & Geosciences. 37 (11): 1870–1882. Bibcode:2011CG.....37.1870C. doi:10.1016/j.cageo.2011.04.013. Retrieved 31 January 2023.
  4. Arctur, David; Zeiler, Michael (2004). Designing Geodatabases: Case Studies in GIS Data Modeling (1 ed.). Redlands, CA: ESRI Press. ISBN 1-58948-021-X.
  5. "The Integration of ArcInfo and ArcSDE: A True Enterprise GIS Solution". ArcNews. 1 (Winter). 2000.
  6. Kennedy, Michael D. (2013). Introducing Geographic Information Systems with ArcGIS : A Workbook Approach to Learning GIS. Wiley. ISBN 9781118159804.
  7. "Object-Oriented Data Model: An Introduction". ArcNews. 1 (Summer). 1999.
  8. Zeiler, M. (1999). Modeling Our World: The ESRI Guide to Geodatabase Design. Redlands, CA: ESRI Press.
  9. "ArcGIS 8.3 Brings Topology to the Geodatabase". ArcNews Online (Summer 2002). 2002. Retrieved 24 January 2023.
  10. "Survey Analyst: A Dream Coming True". ArcNews Online (Spring 2003). 2003. Retrieved 24 January 2023.
  11. "Understanding ArcSDE" (PDF). Esri Technical Library. Esri. pp. 24–25. Retrieved 9 January 2023.
  12. "ArcGIS 9.1 Network Analyst Is Here". ArcNews Online (Spring 2005). 2005. Retrieved 24 January 2023.
  13. "Highlights of What's to Come in ArcGIS 9.2—Esri's Next Enhanced Release". ArcNews Online (Spring 2006). 2006. Retrieved 24 January 2023.
  14. "The Geodatabase: Modeling and Managing Spatial Data". ArcNews Online (Winter 2008/2009). 2008. Retrieved 24 January 2023.
  15. Evans, Elaine. "It's Not Personal, It's Mobile: A brief history of the geodatabase and why personal geodatabases are not in ArcGIS Pro". ArcGIS Blog. Esri. Retrieved 9 January 2023.
  16. Snow, John (1855). On the Mode of Communication of Cholera (2nd ed.). London: John Churchill.
  17. Cromley, Ellen K.; McLafferty, Sara L. (2002). GIS and Public Health (1 ed.). New York, New York: The Guilford Press. ISBN 1-57230-707-2.
  18. Openshaw, Stan (1994). "5". In Fotheringham, Stewart; Rogerson, Peter (eds.). Spatial Analysis and GIS: Two exploratory space-time-attribute pattern analyzers relevant to GIS (1 ed.). Basingstoke, Great Britain: Taylor & Francis. pp. 83–104. ISBN 0-7484-0103-2.
  19. Zhan, F. Benjamin; Brender, Jean D.; Han, Yaowen; Suarez, Lucina; Langlois, Peter H. (12 September 2006). "GIS-EpiLink: A Spatial Search Tool for Linking Environmental and Health Data". Journal of Medical Systems. 30 (5): 405–412. doi:10.1007/s10916-006-9027-y. PMID 17069004. S2CID 15818674.
  20. Lian, Min; Warner, Ronald D; Alexander, James L; Dixon, Kenneth R (21 September 2007). "Using geographic information systems and spatial and space-time scan statistics for a population-based risk analysis of the 2002 equine West Nile epidemic in six contiguous regions of Texas". International Journal of Health Geographics. 6: 42. doi:10.1186/1476-072X-6-42. ISSN 1476-072X. PMC 2098755. PMID 17888159.
  21. Afagbedzi, Seth Kwaku; Owusu, Alex Barimah; Kissiedu, Isaac Newton; Amoako-Coleman, Mary; Bandoh, Delia Akosua; Noora, Charles Lwanga; Aikins, Ben Emunah; Hinneh, Richmond Takyi; Calys-Tagoe, Benedict; Malm, Keziah Laurencia; Kenu, Ernest (September 19, 2021). "Design and deployment of relational geodatabase on mobile GIS platform for real-time COVID-19 contact tracing in Ghana". Ghana Journal of Geography. 13 (1): 126–146. doi:10.4314/gjg.v13i1.7. ISSN 2821-8892. S2CID 236392409. Retrieved 30 May 2023.
  22. Forati, Amir Masoud; Ghose, Rina (August 2021). "Geospatial analysis of misinformation in COVID-19 related tweets". Applied Geography. 133. Bibcode:2021AppGe.13302473F. doi:10.1016/j.apgeog.2021.102473. PMC 8176902. PMID 34103772.
  23. Plassin, Sophie; Koch, Jennifer; Paladino, Stephanie; Friedman, Jack R.; Spencer, Kyndra; Vaché, Kellie B. (6 March 2020). "A socio-environmental geodatabase for integrative research in the transboundary Rio Grande/Río Bravo basin". Scientific Data. 7 (1): 80. Bibcode:2020NatSD...7...80P. doi:10.1038/s41597-020-0410-1. PMC 7060182. PMID 32144267.
  24. Adams, Aaron (2021). "Treating Invasive Tamarisk as an Intern at San Andres National Wildlife Refuge" (PDF). The Geographical Bulletin. 62 (2): 101–103. Retrieved 11 July 2023.
  25. Fu, Pinde; Sun, Jiulin (2011). Web GIS: Principles and Applications. Redlands, Calif.: ESRI Press. ISBN 978-1-58948-245-6. OCLC 587219650.
  26. "TIGER/Line Geodatabases". United States Census Bureau. January 26, 2023. Retrieved 31 May 2023.
  27. DeMers, Michael (2009). Fundamentals of Geographic Information Systems (4 ed.). John Wiley and Sons Inc. ISBN 978-0-470-12906-7.
  28. Dempsey, Caitlin (October 11, 2022). "Types of GIS Data Explored: Vector and Raster". GIS Lounge. Retrieved 22 January 2023.
  29. Shellito, Bradley A. (2015). Discovering GIS and ArcGIS. W.H. Freeman. ISBN 978-1-4641-4520-9.
  30. DeMers, Michael (2002). GIS Modeling in Raster (1 ed.). John Wiley and Sons Inc. ISBN 0-471-31965-1.
  31. Chang, Kang-tsung (2014). Introduction to Geographic Information Systems (7th ed.). McGraw-Hill. ISBN 978-0-07-352290-6.
  32. Nasser, Hussein (2014). Learning ArcGIS Geodatabases. Birmingham/Mumbai: Packt. ISBN 978-1-78398-864-8.
  33. "Relationship class properties". ArcGIS Pro Documentation. Esri. Retrieved 9 January 2023.
  34. "What is a terrain dataset?". ArcGIS Pro Documentation. Esri. Retrieved 9 January 2023.
  35. "Parcels in the parcel fabric". ArcGIS Pro Documentation. Esri. Retrieved 24 January 2023.
  36. "Introduction to linear referencing". ArcGIS Pro Documentation. Esri. Retrieved 24 January 2023.
  37. "Mosaic layer". ArcGIS Pro Documentation. Esri. Retrieved 9 January 2023.
  38. Mathiyalagan, V.; Grunwald, S.; Reddy, K.R.; Bloom, S.A. (April 2005). "A WebGIS and geodatabase for Florida's wetlands". Computers and Electronics in Agriculture. 47 (1): 69–75. Bibcode:2005CEAgr..47...69M. doi:10.1016/j.compag.2004.08.003. Retrieved 31 January 2023.
  39. "The architecture of a geodatabase". ArcGIS Pro Documentation. Esri. Retrieved 9 January 2023.
  40. Rouault, Even. "FGDB Spec". Github. Retrieved 9 January 2023.
  41. Rees, Donald. "Look at Mobile Geodatabases go!". ArcGIS Blog. Esri. Retrieved 9 January 2023.
  42. Visual Information Solutions. Using ENVI and Geographic Information Systems (GIS): Whitepaper (PDF). Visual Information Solutions. Retrieved 1 June 2023.
  43. "How to open Geodatabase data in ERDAS IMAGINE". Hexagon. 23 September 2021. Retrieved 1 June 2023.
  44. UCLA Geospatial (1 October 2015). "WORKING WITH FILE GEODATABASES (.GDB) USING QGIS AND GDAL". UCLA Geospatial. Retrieved 1 June 2023.