U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This dataset is a compilation of data obtained from the Idaho Department of Water Quality, the Idaho Department of Water Resources, and the Water Quality Portal. The 'Samples' table stores information about individual groundwater samples, including what was being sampled, when it was sampled, the results of the sample, etc. This table is related to the 'MonitoringLocation' table (which contains information about the well being sampled).
WikiSQL consists of a corpus of 87,726 hand-annotated SQL query and natural language question pairs. These SQL queries are further split into training (61,297 examples), development (9,145 examples) and test sets (17,284 examples). It can be used for natural language inference tasks related to relational databases.
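Each WikiSQL example pairs a natural language question with a SQL query over a single table. The sketch below shows the flavour of such a pair; the table, question, and names are invented for illustration (not drawn from the corpus) and are executed against an in-memory SQLite table:

```python
import sqlite3

# Illustrative WikiSQL-style pair (hypothetical table, not from the corpus):
# Question: "Which player is from Norway?"
# SQL:      SELECT player FROM players WHERE country = 'Norway'
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player TEXT, country TEXT)")
conn.executemany("INSERT INTO players VALUES (?, ?)",
                 [("Erling Haaland", "Norway"), ("Harry Kane", "England")])

question = "Which player is from Norway?"
sql = "SELECT player FROM players WHERE country = ?"
answer = conn.execute(sql, ("Norway",)).fetchall()
print(answer)  # [('Erling Haaland',)]
```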
The Biological Sampling Database (BSD) is an Oracle relational database that is maintained at the NMFS Panama City Laboratory and NOAA NMFS Beaufort Laboratory. The data set includes port samples of reef fish species collected from commercial and recreational fishery landings in the U.S. South Atlantic (NC - FL Keys). The data set serves as an inventory of samples stored at the NMFS Beaufort Laboratory as well as final processed data. Information that may be included for each sample comprises trip-level information, species, size measurements, age, sex, and reproductive data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The FooDrugs database was developed by the Computational Biology Group at IMDEA Food Institute (Madrid, Spain) in the context of the Food Nutrition Security Cloud (FNS-Cloud) project. Food Nutrition Security Cloud (FNS-Cloud) has received funding from the European Union's Horizon 2020 Research and Innovation programme (H2020-EU.3.2.2.3. – A sustainable and competitive agri-food industry) under Grant Agreement No. 863059 – www.fns-cloud.eu (see more details about FNS-Cloud below).
FooDrugs stores information on food-drug interactions extracted from transcriptomics data and text documents, and is part of a demonstrator to be built within the FNS-Cloud project. The database was built using MySQL, an open source relational database management system. FooDrugs hosts information for a total of 161 transcriptomics GEO series with 585 conditions for food or bioactive compounds. Each condition is defined as a food/biocomponent per time point, per concentration, per cell line, primary culture or biopsy, per study. FooDrugs includes information about a bipartite network with 510 nodes and their similarity scores (tau score; https://clue.io/connectopedia/connectivity_scores) related to possible interactions with drugs assayed in the Connectivity Map (https://www.broadinstitute.org/connectivity-map-cmap). The information is stored in eight tables:
Table “study”: This table contains basic information about the study: identifiers from GEO, PubMed or the platform, study type, title and abstract.
Table “sample”: This table contains basic information about the different experiments in a study, like the identifier of the sample, treatment, origin type, time point or concentration.
Table “misc_study”: This table contains additional information about different attributes of the study.
Table “misc_sample”: This table contains additional information about different attributes of the sample.
Table “cmap”: This table contains information about 70,895 nodes, comprising drugs, foods or bioactives, and overexpressed and knocked-down genes (see section 3.4). The information includes cell line, compound and perturbation type.
Table “cmap_foodrugs”: This table contains information about the tau score (see section 3.4) that relates food with drugs or genes and the node identifier in the FooDrugs network.
Table “topTable”: This table contains information about 150 over and underexpressed genes from each GEO study condition, used to calculate the tau score (see section 3.4). The information stored is the logarithmic fold change, average expression, t-statistic, p-value, adjusted p-value and if the gene is up or downregulated.
Table “nodes”: This table stores the information about the identification of the sample and the node in the bipartite network connecting the tables “sample”, “cmap_foodrugs” and “topTable”.
In addition, FooDrugs database stores a total of 6422 food/drug interactions from 2849 text documents, obtained from three different sources: 2312 documents from PubMed, 285 from DrugBank, and 252 from drugs.com. These documents describe potential interactions between 1464 food/bioactive compounds and 3009 drugs. The information is stored in two tables:
Table “texts”: This table contains all the documents, with their identifiers, in which interactions have been identified using the strategy described in section 4.
Table “TM_interactions”: This table contains information about interaction identifiers, the food and drug entities, and the start and the end positions of the context for the interaction in the document.
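The relationship between the “study” and “sample” tables can be sketched as below, using SQLite in place of MySQL. The column names and values are assumptions inferred from the description above, not the actual FooDrugs DDL:

```python
import sqlite3

# Simplified sketch of two FooDrugs tables; columns are assumed from the
# prose description, not taken from the real MySQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study (
    study_id   TEXT PRIMARY KEY,  -- GEO series identifier
    title      TEXT,
    study_type TEXT
);
CREATE TABLE sample (
    sample_id     TEXT PRIMARY KEY,
    study_id      TEXT REFERENCES study(study_id),
    treatment     TEXT,            -- food or bioactive compound applied
    time_point    TEXT,
    concentration TEXT
);
""")
conn.execute("INSERT INTO study VALUES ('GSE0001', 'Example green-tea study', 'transcriptomics')")
conn.execute("INSERT INTO sample VALUES ('S1', 'GSE0001', 'EGCG', '24h', '10uM')")

# One condition = food/biocomponent x time point x concentration x cell system x study.
rows = conn.execute("""
    SELECT study.title, sample.treatment, sample.time_point
    FROM sample JOIN study USING (study_id)
""").fetchall()
print(rows)
```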
FNS-Cloud will overcome fragmentation problems by integrating existing FNS data, which is essential for high-end, pan-European FNS research addressing FNS, diet, health, and consumer behaviours as well as sustainable agriculture and the bio-economy. Current fragmented FNS resources not only result in knowledge gaps that inhibit public health and agricultural policy and the food industry from developing effective solutions, making production sustainable and consumption healthier, but also prevent exploitation of FNS knowledge for the benefit of European citizens. FNS-Cloud will, through three Demonstrators (Agri-Food, Nutrition & Lifestyle, and NCDs & the Microbiome), facilitate: (1) analyses of regional and country-specific differences in diet, including nutrition, (epi)genetics, microbiota, consumer behaviours, culture and lifestyle, and their effects on health (obesity, NCDs, ethnic and traditional foods), which are essential for public health and for agri-food and health policies; (2) improved understanding of agricultural differences within Europe and what these mean for creating sustainable, resilient food systems for healthy diets; and (3) clear definitions of boundaries and how these affect the composition of foods and consumer choices and, ultimately, personal and public health in the future. Long-term sustainability of the FNS-Cloud will be based on Services that have the capacity to link with new resources and enable cross-talk amongst them; access to FNS-Cloud data will be open, underpinned by FAIR principles (findable, accessible, interoperable and re-usable). FNS-Cloud will work closely with the proposed Food, Nutrition and Health Research Infrastructure (FNHRI) as well as METROFOOD-RI and other existing ESFRI RIs (e.g. ELIXIR, ECRIN) in which several FNS-Cloud Beneficiaries are involved directly. (https://cordis.europa.eu/project/id/863059)
***** Changes between versions FooDrugs_v2 and FooDrugs_v3 (31 January 2023):
Increased the number of text documents by 85,675 (from PubMed and ClinicalTrials.gov), and the number of text-mining interactions by 168,826.
Increased the number of transcriptomic studies by 32 GEO series.
Removed all rows in table cmap_foodrugs representing interactions with tau = 0.
Removed 43 GEO series that, after manual checking, did not correspond to food compounds.
Added a new column to the table texts: citation to hold the citation of the text.
Added these columns to the table study: contributor to contain the authors of the study, publication_date to store the date of publication of the study in GEO and pubmed_id to reference the publication associated with the study if any.
Added a new column to topTable to hold the top 150 up-regulated and 150 down-regulated genes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains the relational database describing all physical samples collected during the installation of the geothermal well on the TU Delft campus between June and November 2023. The full description of the data collection methods is provided in the End-of-well Science Programme Report of the campus geothermal doublet (link here).
This database is a simple and easy-to-use tool that facilitated the initial registration of samples collected by TU Delft staff at the drilling site. It has been created as a desktop application using Access, the database management system from Microsoft, to enable a graphical user interface customised for the geothermal well project. The full database structure and usage is described in the Database_User_Guide.pdf file.
Sample inventories are also provided as individual spreadsheets to facilitate sample requests. The sample request procedure is described at https://www.tudelft.nl/geothermalwell.
A web application based on this sample database structure has been developed with the support of the TU Delft Digital Competence Centre (DCC) to enable remote access and central data storage (DOI: https://doi.org/10.4121/09461663-32eb-4dda-aeba-28016fd7e7f6).
This dataset is a compilation of data obtained from the Idaho Department of Water Quality, the Idaho Department of Water Resources, and the Water Quality Portal. The 'MonitoringLocation' table stores attribute data for groundwater wells. This table is related to the 'SiteID' table (which lists the variety of names given to each well by different organizations) and the 'Samples' table (which shows each sample associated with a particular well).
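The three-table layout described above can be sketched as follows. The description does not give the actual schema, so every column name and value here is hypothetical, chosen only to illustrate how the tables relate:

```python
import sqlite3

# Hypothetical miniature of the dataset's three related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MonitoringLocation (WellID INTEGER PRIMARY KEY,
                                 Latitude REAL, Longitude REAL);
CREATE TABLE SiteID  (WellID INTEGER REFERENCES MonitoringLocation(WellID),
                      Organization TEXT, SiteName TEXT);
CREATE TABLE Samples (WellID INTEGER REFERENCES MonitoringLocation(WellID),
                      SampleDate TEXT, Analyte TEXT, Result REAL);
""")
conn.execute("INSERT INTO MonitoringLocation VALUES (1, 43.6, -116.2)")
conn.execute("INSERT INTO SiteID VALUES (1, 'IDWR', 'Well A')")
conn.execute("INSERT INTO Samples VALUES (1, '2020-05-01', 'Nitrate', 2.3)")

# All samples for a well, joined to the well's location attributes:
rows = conn.execute("""
    SELECT s.SampleDate, s.Analyte, s.Result, m.Latitude
    FROM Samples s JOIN MonitoringLocation m ON s.WellID = m.WellID
""").fetchall()
print(rows)
```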
This example demonstrates how to use PostGIS capabilities in CyberGIS-Jupyter notebook environment. Modified from notebook by Weiye Chen (weiyec2@illinois.edu)
PostGIS is an extension to the PostgreSQL object-relational database system which allows GIS (Geographic Information Systems) objects to be stored in the database. PostGIS includes support for GiST-based R-Tree spatial indices, and functions for analysis and processing of GIS objects.
Resources for PostGIS:
Manual: https://postgis.net/docs/
In this demo, we use PostGIS 3.0. Note that significant API changes have been made in PostGIS 3.0 compared to version 2.x. This demo assumes that you have basic knowledge of SQL.
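As a minimal sketch, the snippet below composes a typical PostGIS spatial query (a radius search around a lon/lat point). The table and column names are placeholders, and executing the query would require a live PostgreSQL/PostGIS connection; only the SQL text is constructed here:

```python
def dwithin_query(table: str, geom_col: str = "geom") -> str:
    """Build a PostGIS query finding rows within a radius (metres)
    of a lon/lat point. Table/column names here are placeholders."""
    return (
        f"SELECT name, ST_AsText({geom_col}) "
        f"FROM {table} "
        # Cast to geography so ST_DWithin measures in metres, not degrees.
        f"WHERE ST_DWithin({geom_col}::geography, "
        f"ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, %s)"
    )

sql = dwithin_query("poi")
# Execute with a driver such as psycopg2: cur.execute(sql, (lon, lat, radius_m))
print(sql)
```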
MAPDAT is a program for plotting spatial data held in the ORACLE relational database onto any map within the Australian region at any scale. MAPDAT also includes a system for defining geological structures, thus any geological structure can be stored in the database and plotted.
The program enables the plotting of sample locations along with information specific to each location. The information can be displayed beside each point or in a list to the side of the map. Symbols can be sized in proportion to the value of a table column or a SQL expression. Town locations, survey paths, gridlines, survey areas, coastlines and other geographical lines can be plotted.
The program does not compete with geographical information systems but fills a niche at a much lower level of complexity. As a result of its simplicity, minimal data setup is required, and the program is straightforward to use, with the user always aware of the database operations being performed.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of published cosmogenic Be-10 and Al-26 concentrations from modern river sediment and basin-averaged denudation rates inferred from these data. Ancillary spatial data includes: sample site location (point), basin outline (polygon), digital elevation model (raster), D8 flow direction and flow accumulation grids (raster), topographic gradient (raster), atmospheric pressure (raster), and cosmogenic nuclide production scaling factor and topographic shielding grids (raster). The vector spatial data uses the WGS84/Pseudo-Mercator (EPSG: 3857) projected coordinate reference system. The raster data uses the WGS84/UTM projected coordinate reference system, with UTM zones depending on the extent and location of each data package. Sample metadata is comprehensive and includes all necessary information and input files for the recalculation of denudation rates using the CAIRN model (https://github.com/LSDtopotools/LSDTopoTools_CRNBasinwide). All denudation rates were recalculated and harmonised using CAIRN. The extent of the data is global, excluding Australia.
Accompanying publication: Codilean, A. T., Munack, H., Saktura, W. M., Cohen, T. J., Jacobs, Z., Ulm, S., Hesse, P. P., Heyman, J., Peters, K. J., Williams, A. N., Saktura, R. B. K., Rui, X., Chishiro-Dennelly, K., and Panta, A.: OCTOPUS database (v.2), Earth Syst. Sci. Data, 14, 3695–3713, https://doi.org/10.5194/essd-14-3695-2022, 2022.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project WARLUX - Soldiers and their communities in WWII: The impact and legacy of war experiences in Luxembourg is a research project based at the Luxembourg Centre for Contemporary and Digital History (C²DH) (University of Luxembourg). The project focuses on the war experiences of male Luxembourgers born between 1920 and 1927 who were recruited and conscripted into Nazi German services (Reichsarbeitsdienst (RAD) and Wehrmacht) under the Nazi occupation in Luxembourg during the Second World War.
Data Sample
While over 12,000 men and women were affected by the conscription, Project WARLUX focuses on a case study of 304 recruits from Schifflange and their families. In total, the data sample includes around 1,200 persons: recruits and their family members.
Origin of the data
The dataset primarily consists of compiled archival documentation, including organizational and official documents, statistics, and standardized fiches and cards. These documents were obtained mainly from the Luxembourgish National Archives and other relevant repositories.
In addition to basic information such as name, birth date, and residence, the (internal) dataset also incorporates military records sourced from German archives. Furthermore, supplementary information related to captivity, repatriation, and compensation was collected in the post-war period. The surveys and statistics conducted by the Luxembourgish state provide valuable insights into the experiences and trajectories of the war-affected generation.
It is important to note that the dataset is a composite of multiple heterogeneous sources, reflecting its diverse origins.
Database
The researchers involved in the WARLUX project opted to use a relational database, nodegoat.
The WARLUX project adheres to an object-oriented approach, which is reflected in the core functionalities provided by nodegoat. Given the project's specific focus on the war experiences of recruited Luxembourgers within Nazi services such as the Wehrmacht and RAD, the included data model (warlux data model file) represents only a partial depiction of the comprehensive nodegoat environment employed in the WARLUX project. Within this data model, the interconnected objects and their respective sub-objects are presented, with particular emphasis placed on the individual profiles of recruits and their involvement in military service.
As the data cannot be published due to restrictions, the team provides a pseudonymized dataset as an example of the data structure.
The provided dataset shows the male recruits (and conscripts) of the Case Study Schifflange (born between 1920 and 1927). It includes
nodegoat ID
their birthdate
information on death if it occurred during the war
whereabouts after the war (unknown, missing, KIA, returned, etc.)
The dataset also includes references to their recruitment into the Wehrmacht and/or the RAD, as well as their subsequent activities such as
being captured as a Prisoner of War (POW)
serving for the Allied Forces
desertion, or
draft evasion (réfractaire).
Access to the WARLUX nodegoat database on recruits of Schifflange/Luxembourg is restricted because it contains sensitive data. For further questions, please contact warlux@uni.lu
The project is funded by the Fonds National de la Recherche Luxembourg (FNR).
The North Pacific Pelagic Seabird Database (NPPSD) was created in 2005 to consolidate data on the oceanic distribution of marine bird species in the North Pacific. Most of these data were collected using at-sea strip-transect surveys within defined areas and at known locations. The NPPSD also contains observations of other bird species and marine mammals. The original NPPSD combined data from 465 surveys conducted between 1973 and 2002, primarily in waters adjacent to Alaska. These surveys included 61,195 sample transects with location, environment, and metadata information, and the data were organized in a flat-file format. In revising the NPPSD (version 2.0), our goals were to add new datasets, to make significant improvements to database functionality, and to provide the database online. NPPSD 2.0 included data from a broadened geographic range within the North Pacific, including new observations made offshore of the Russian Federation, Japan, Korea, British Columbia (Canada), Oregon, and California. These data were imported into a relational database, proofed, and structured in a common format. NPPSD 2.0 contained 351,674 samples (transects) collected between 1973 and 2012, representing a total sampled area of 270,259 square kilometers, and extended the time series of samples in some areas (notably the Bering Sea) to four decades. It contained observations of 16,988,138 birds and 235,545 marine mammals. The third edition of the NPPSD corrects several data-duplication errors, updates the taxonomy to the current standard, and adds data collected since 2012. NPPSD 3.0 includes 460,298 samples, a 30% increase in the number of transects. It contains observations of 20,098,635 birds and 365,227 marine mammals. This updated version of the NPPSD is available on the USGS Alaska Science Center NPPSD web site.
Supplementary materials include an updated set of standardized taxonomic codes, reference maps that show the spatial and temporal distribution of the survey efforts and a downloadable query tool.
The Alaska Geochemical Database Version 4.0 (AGDB4) contains geochemical data compilations in which each geologic material sample has one best value determination for each analyzed species, greatly improving efficiency of use. The relational database includes historical geochemical data archived in the USGS National Geochemical Database (NGDB), the Atomic Energy Commission National Uranium Resource Evaluation (NURE) Hydrogeochemical and Stream Sediment Reconnaissance databases, and the Alaska Division of Geological and Geophysical Surveys (DGGS) Geochemistry database. Data from the U.S. Bureau of Mines and the U.S. Bureau of Land Management are included as well. The data tables describe historical and new quantitative and qualitative geochemical analyses. The analytical results were determined by 120 laboratory and field analytical methods performed on 416,333 rock, sediment, soil, mineral, heavy-mineral concentrate, and oxalic acid leachate samples. The samples were collected as part of various agency programs and projects from 1938 through 2021. Most samples were collected by agency personnel and analyzed in agency laboratories or under contracts in commercial analytical laboratories. Mineralogical data from 18,138 nonmagnetic heavy-mineral concentrate samples are also included in this database. The data in the AGDB4 supersede data in the AGDB, AGDB2, and AGDB3 databases but the background about the data in these earlier versions is needed to understand what has been done to amend, clean up, correct, and format these data. Data that were not included in previous versions because they predate the earliest agency geochemical databases or were excluded for programmatic reasons are included here in the AGDB4. The AGDB4 data are the most accurate and complete to date and should be useful for a wide variety of geochemical studies. 
They are provided as a Microsoft Access database, as comma-separated values (CSV), and as an Esri geodatabase consisting of point feature classes and related tables.
The Alaska Geochemical Database Version 2.0 (AGDB2) contains new geochemical data compilations in which each geologic material sample has one "best value" determination for each analyzed species, greatly improving speed and efficiency of use. Like the Alaska Geochemical Database (AGDB) before it, the AGDB2 was created and designed to compile and integrate geochemical data from Alaska in order to facilitate geologic mapping, petrologic studies, mineral resource assessments, definition of geochemical baseline values and statistics, environmental impact assessments, and studies in medical geology. This relational database, created from the Alaska Geochemical Database (AGDB) that was released in 2011, serves as a data archive in support of present and future Alaskan geologic and geochemical projects, and contains data tables in several different formats describing historical and new quantitative and qualitative geochemical analyses. The analytical results were determined by 85 laboratory and field analytical methods on 264,095 rock, sediment, soil, mineral and heavy-mineral concentrate samples. Most samples were collected by U.S. Geological Survey (USGS) personnel and analyzed in USGS laboratories or, under contracts, in commercial analytical laboratories. These data represent analyses of samples collected as part of various USGS programs and projects from 1962 through 2009. In addition, mineralogical data from 18,138 nonmagnetic heavy mineral concentrate samples are included in this database. The AGDB2 includes historical geochemical data originally archived in the USGS Rock Analysis Storage System (RASS) database, used from the mid-1960s through the late 1980s and the USGS PLUTO database used from the mid-1970s through the mid-1990s. All of these data are currently maintained in the National Geochemical Database (NGDB). Retrievals from the NGDB were used to generate most of the AGDB data set. 
These data were checked for accuracy regarding sample location, sample media type, and analytical methods used. This arduous process of reviewing, verifying and, where necessary, editing all USGS geochemical data resulted in a significantly improved Alaska geochemical dataset. USGS data that were not previously in the NGDB because the data predate the earliest USGS geochemical databases, or were once excluded for programmatic reasons, are included here in the AGDB2 and will be added to the NGDB. The AGDB2 data provided here are the most accurate and complete to date, and should be useful for a wide variety of geochemical studies. The AGDB2 data provided in the linked database may be updated or changed periodically.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset presented in the following manuscript: The Surface Water Chemistry (SWatCh) database: A standardized global database of water chemistry to facilitate large-sample hydrological research, which is currently under review at Earth System Science Data.
Openly accessible global scale surface water chemistry datasets are urgently needed to detect widespread trends and problems, to help identify their possible solutions, and to identify critical spatial data gaps where more monitoring is required. Existing datasets are limited in availability, sample size/sampling frequency, and geographic scope. These limitations inhibit the answering of emerging transboundary water chemistry questions, for example, the detection and understanding of delayed recovery from freshwater acidification. Here, we begin to address these limitations by compiling the global surface water chemistry (SWatCh) database. We collect, clean, standardize, and aggregate open access data provided by six national and international agencies to compile a database consisting of three relational datasets: sites, methods, and samples, and one GIS shapefile of site locations. We remove poor quality data (for example, values flagged as “suspect”), standardize variable naming conventions and units, and perform other data cleaning steps required for statistical analysis. The database contains water chemistry data across seven continents, 17 variables, 38,598 sites, and over 9 million samples collected between 1960 and 2019. We identify critical spatial data gaps in the equatorial and arid climate regions, highlighting the need for more data collection and sharing initiatives in these areas, especially considering freshwater ecosystems in these environs are predicted to be among the most heavily impacted by climate change. We identify the main challenges associated with compiling global databases – limited data availability, dissimilar sample collection and analysis methodology, and reporting ambiguity – and provide recommendations to address them. By addressing these challenges and consolidating data from various sources into one standardized, openly available, high quality, and trans-boundary database, SWatCh allows users to conduct powerful and robust statistical analyses of global surface water chemistry.
The Athabasca Arctic (AA) Watershed uses a data-management system called ACBIS (Aquatic Chemistry and Biology Information System) to store, track, verify, and distribute data to clients. The system consists of a relational database and a multi-user database application. The entire system has been fully implemented in the Prairie and Northern regions since 1998 without major revision. Its users include project managers, field personnel, and internal staff. As of August 2015, the database contains approximately 3.5 million records and encompasses over 1,550 variables for approximately 107,000 samples from both the AA and Hudson Bay (HB) Watersheds. The data span from the 1960s to the present, and sample types include water quality, sediment, semi-permeable membrane devices (SPMDs), and fish samples. The estimated value of ACBIS exceeds 100 million dollars.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
The Alaska Geochemical Database Version 3.0 (AGDB3) contains new geochemical data compilations in which each geologic material sample has one best value determination for each analyzed species, greatly improving speed and efficiency of use. Like the Alaska Geochemical Database Version 2.0 before it, the AGDB3 was created and designed to compile and integrate geochemical data from Alaska to facilitate geologic mapping, petrologic studies, mineral resource assessments, definition of geochemical baseline values and statistics, element concentrations and associations, environmental impact assessments, and studies in public health associated with geology. This relational database, created from databases and published datasets of the U.S. Geological Survey (USGS), Atomic Energy Commission National Uranium Resource Evaluation (NURE), Alaska Division of Geological & Geophysical Surveys (DGGS), U.S. Bureau of Mines, and U.S. Bureau of Land Management serves as a data archive in support ...
The Geochemical Database for Iron Oxide-Copper-Cobalt-Gold-Rare Earth Element Deposits of Southeast Missouri (IOCG-REE_GX) contains new geochemical data compilations for samples from IOCG-REE type deposits in which each rock sample has one "best value" determination for each analyzed species, greatly improving speed and efficiency of use. IOCG-REE_GX was created and designed to compile whole-rock and trace element data from southeast Missouri in order to facilitate petrologic studies, mineral resource assessments, and the definition and statistics of geochemical baseline values. This relational database serves as a data archive in support of present and future geologic and geochemical studies of IOCG-REE type deposits, and contains data tables in two different formats describing historical and new quantitative and qualitative geochemical analyses. The analytical results were determined by 24 laboratory analytical methods on 457 rock samples collected by the U.S. Geological Survey (USGS) during two phases from outcrop and drill core sites throughout the St. Francois Mountains terrane. In the first phase, the USGS collected and analyzed 315 samples from 1989 to 1995. During the second phase, from 2013 to 2015, 119 samples were collected and analyzed, and 23 samples from the first phase were reanalyzed using analytical methods of higher precision. The most precise analytical approach was used to report the best value for each element.
In order to facilitate examination of the geochemistry of the broad range of samples reported (i.e., regional samples, ore zone, or ore deposit alteration-related), a short sample description is given, and each sample is coded according to the type of rock suite defined by Kisvarsanyi (1981) (Jcode); whether the sample was collected as a non-ore-deposit-related representative of a given rock suite or as a deposit-related sample (Kcode); and, if the sample was related to a specific ore deposit, the zone within that deposit (Lcode). These coded data provide a robust tool for evaluating the regional geologic setting of the host terrane as well as assessing the character of hydrothermal alteration related to many of the contained mineral deposits. Data from the first phase are currently maintained in the USGS National Geochemical Database (NGDB), and data from the second phase will soon be added. The data of the IOCG-REE_GX were checked for accuracy regarding sample location, sample media type, and analytical methods used. Reference: Kisvarsanyi, E.B., 1981, Geology of the Precambrian St. Francois terrane, southeastern Missouri: Missouri Department of Natural Resources Report of Investigations 64, 58 p.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Groceries dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/heeraldedhia/groceries-dataset on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.
Association Rules are widely used to analyze retail basket or transaction data; they are intended to identify strong rules discovered in transaction data using measures of interestingness.
The dataset has 38,765 rows of purchase orders of people from grocery stores. These orders can be analysed, and association rules can be generated using Market Basket Analysis with algorithms like the Apriori algorithm.
Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
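The level-wise search described above can be sketched in a few lines. This is a minimal illustration (it omits the subset-pruning step of the full algorithm) on a tiny invented transaction list:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch: return frequent itemsets (frozensets)
    mapped to their support, for a list of transactions (sets of items)."""
    n = len(transactions)
    frequent = {}
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]   # candidate 1-itemsets
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Candidate (k+1)-itemsets: unions of frequent k-itemsets.
        prev = list(level)
        current = list({a | b for a, b in combinations(prev, 2) if len(a | b) == k + 1})
        k += 1
    return frequent

# Toy transactions, invented for illustration.
txns = [{"milk", "bread"}, {"milk", "butter"},
        {"milk", "bread", "butter"}, {"bread"}]
freq = apriori(txns, min_support=0.5)
# {milk}, {bread}, {butter}, {milk, bread}, {milk, butter} survive;
# {bread, butter} appears in only 1 of 4 transactions and is dropped.
```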
Assume there are 100 customers: 10 of them bought milk, 8 bought butter, and 6 bought both. For the rule bought milk => bought butter: support = P(Milk & Butter) = 6/100 = 0.06; confidence = support/P(Milk) = 0.06/0.10 = 0.6; lift = confidence/P(Butter) = 0.6/0.08 = 7.5.
Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.
Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.
Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.
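The three measures above can be checked directly against the milk-and-butter example. A minimal Python sketch (the counts are taken from the worked example, not from the dataset itself):

```python
# Worked example: 100 customers, 10 bought milk, 8 bought butter, 6 bought both.
n, milk, butter, both = 100, 10, 8, 6

# Support: proportion of all transactions containing both items.
support = both / n
# Confidence for milk -> butter: P(Butter | Milk).
confidence = support / (milk / n)
# Lift: confidence adjusted for how popular butter is overall.
lift = confidence / (butter / n)

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1 (here 7.5) indicates the two items co-occur far more often than independence would predict.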
--- Original source retains full ownership of the source dataset ---
[Note: Integrated as part of FoodData Central, April 2019.] The database consists of several sets of data: food descriptions, nutrients, weights and measures, footnotes, and sources of data. The Nutrient Data file contains mean nutrient values per 100 g of the edible portion of food, along with fields to further describe the mean value. Information is provided on household measures for food items. Weights are given for edible material without refuse. Footnotes are provided for a few items where information about food description, weights and measures, or nutrient values could not be accommodated in existing fields. Data have been compiled from published and unpublished sources. Published data sources include the scientific literature. Unpublished data include those obtained from the food industry, other government agencies, and research conducted under contracts initiated by USDA’s Agricultural Research Service (ARS). Updated data have been published electronically on the USDA Nutrient Data Laboratory (NDL) web site since 1992. Standard Reference (SR) 28 includes composition data for all the food groups and nutrients published in the 21 volumes of "Agriculture Handbook 8" (US Department of Agriculture 1976-92), and its four supplements (US Department of Agriculture 1990-93), which superseded the 1963 edition (Watt and Merrill, 1963). SR28 supersedes all previous releases, including the printed versions, in the event of any differences. Attribution for photos: Photo 1: k7246-9 Copyright free, public domain photo by Scott Bauer. Photo 2: k8234-2 Copyright free, public domain photo by Scott Bauer.
Resources in this dataset:
Resource Title: READ ME - Documentation and User Guide - Composition of Foods Raw, Processed, Prepared - USDA National Nutrient Database for Standard Reference, Release 28. File Name: sr28_doc.pdf. Software Recommended: Adobe Acrobat Reader, url: http://www.adobe.com/prodindex/acrobat/readstep.html
Resource Title: ASCII (6.0Mb; ISO/IEC 8859-1). File Name: sr28asc.zip. Description: Delimited file suitable for importing into many programs. The tables are organized in a relational format, and can be used with a relational database management system (RDBMS), which will allow you to form your own queries and generate custom reports.
Resource Title: ACCESS (25.2Mb). File Name: sr28db.zip. Description: This file contains the SR28 data imported into a Microsoft Access (2007 or later) database. It includes relationships between files and a few sample queries and reports.
Resource Title: ASCII (Abbreviated; 1.1Mb; ISO/IEC 8859-1). File Name: sr28abbr.zip. Description: Delimited file suitable for importing into many programs. This file contains data for all food items in SR28, but not all nutrient values--starch, fluoride, betaine, vitamin D2 and D3, added vitamin E, added vitamin B12, alcohol, caffeine, theobromine, phytosterols, individual amino acids, individual fatty acids, and individual sugars are not included. These data are presented per 100 grams, edible portion. Up to two household measures are also provided, allowing the user to calculate the values per household measure, if desired.
Resource Title: Excel (Abbreviated; 2.9Mb). File Name: sr28abxl.zip. Description: For use with Microsoft Excel (2007 or later), but can also be used by many other spreadsheet programs. This file contains data for all food items in SR28, but not all nutrient values (the same nutrients omitted from the abbreviated ASCII file). These data are presented per 100 grams, edible portion. Up to two household measures are also provided, allowing the user to calculate the values per household measure, if desired. Software Recommended: Microsoft Excel, url: https://www.microsoft.com/
Resource Title: ASCII (Update Files; 1.1Mb; ISO/IEC 8859-1). File Name: sr28upd.zip. Description: Update Files - Contains updates for those users who have loaded Release 27 into their own programs and wish to do their own updates. These files contain the updates between SR27 and SR28. Delimited file suitable for import into many programs.
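The delimited ASCII files can be read with any CSV-capable tool. A minimal Python sketch, using a hypothetical two-row extract in the style of the SR ASCII releases (the caret delimiter and tilde text qualifier are assumptions here; sr28_doc.pdf is the authoritative reference for the file layout):

```python
import csv
import io

# Hypothetical extract: caret-delimited fields wrapped in tilde text
# qualifiers (assumed format -- verify against sr28_doc.pdf).
sample = (
    "~01001~^~0100~^~Butter, salted~\n"
    "~01002~^~0100~^~Butter, whipped~\n"
)

# csv.reader handles the non-standard delimiter and quote character directly.
reader = csv.reader(io.StringIO(sample), delimiter="^", quotechar="~")
rows = list(reader)
```

Loading each file this way and joining on the shared key columns reproduces the relational structure described above.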
According to our latest research, the global Database Management System (DBMS) market size reached USD 79.3 billion in 2024 and is forecast to attain USD 236.8 billion by 2033, expanding at a robust CAGR of 13.2% from 2025 to 2033. The market’s rapid growth is primarily driven by the exponential increase in data generation across industries, the rising adoption of cloud-based solutions, and the growing need for real-time data analytics and security. As organizations increasingly recognize the strategic value of data, DBMS solutions are becoming indispensable for efficient data storage, access, and management.
A major growth factor propelling the Database Management System market is the surge in digital transformation initiatives across both public and private sectors. Industries such as BFSI, healthcare, retail, and manufacturing are generating vast volumes of structured and unstructured data, necessitating sophisticated DBMS platforms for effective data handling. The proliferation of IoT devices, social media, and e-commerce platforms has further amplified the need for scalable and secure database solutions that can process diverse data types in real time. Additionally, the integration of artificial intelligence and machine learning with DBMS is enabling organizations to derive actionable insights, automate routine processes, and improve decision-making, thereby fueling market demand.
Another key driver is the shift towards cloud-based database management systems, which offer unparalleled flexibility, scalability, and cost efficiency compared to traditional on-premises solutions. Cloud DBMS platforms are particularly attractive to small and medium enterprises (SMEs) that lack the resources for extensive IT infrastructure investments, allowing them to leverage enterprise-grade data management capabilities on a subscription basis. Furthermore, with the advent of hybrid and multi-cloud environments, organizations can now optimize their data architecture for performance, redundancy, and compliance, further accelerating the adoption of cloud DBMS solutions globally.
Regulatory compliance and data security concerns are also catalyzing the growth of the Database Management System market. Governments and industry bodies worldwide are introducing stringent regulations around data privacy, storage, and access, compelling organizations to upgrade their database infrastructure. Advanced DBMS solutions now incorporate robust encryption, granular access controls, and automated compliance monitoring, ensuring that sensitive data is protected and regulatory obligations are met. This heightened focus on data governance is prompting enterprises to invest in next-generation DBMS technologies, thereby expanding the market’s growth trajectory.
Regionally, North America continues to dominate the Database Management System market owing to its advanced IT infrastructure, strong presence of leading market players, and early adoption of emerging technologies. Europe follows closely, driven by stringent data protection regulations and increasing digitalization across industries. The Asia Pacific region is witnessing the fastest growth, fueled by rapid urbanization, burgeoning IT and telecom sectors, and a rising number of SMEs embracing cloud-based solutions. Latin America and the Middle East & Africa are also experiencing steady growth, supported by expanding internet penetration and government-led digital initiatives. This regional diversity ensures that the DBMS market remains dynamic and resilient to global economic fluctuations.
The Database Management System market is distinctly segmented by component into software and services, each playing a critical role in the overall ecosystem. The software segment, which encompasses both relational and non-relational DBMS platforms, forms the backbone of the market and accounts for the majority of revenue share. This dominance is attributed to the conti