30 datasets found
  1. Summer Camp Warehouse and Database

    • kaggle.com
    zip
    Updated Jul 25, 2023
    Cite
    Keaton Hibshman (2023). Summer Camp Warehouse and Database [Dataset]. https://www.kaggle.com/datasets/keatonhibshman/summer-camp-warehouse-and-database
    Explore at:
    Available download formats: zip (453037 bytes)
    Dataset updated
    Jul 25, 2023
    Authors
    Keaton Hibshman
    Description

    The following are documents that were used to build a mock database and data warehouse and sample analysis on the data warehouse. The mock company is a summer camp agency. The software that was used for this project was SQL, Excel, Visual Studio, and Power BI.

  2. Bike Store Relational Database | SQL

    • kaggle.com
    zip
    Updated Aug 21, 2023
    Cite
    Dillon Myrick (2023). Bike Store Relational Database | SQL [Dataset]. https://www.kaggle.com/datasets/dillonmyrick/bike-store-sample-database
    Explore at:
    Available download formats: zip (94412 bytes)
    Dataset updated
    Aug 21, 2023
    Authors
    Dillon Myrick
    Description

    This is the sample database from sqlservertutorial.net. This is a great dataset for learning SQL and practicing querying relational databases.

    Database Diagram:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4146319%2Fc5838eb006bab3938ad94de02f58c6c1%2FSQL-Server-Sample-Database.png?generation=1692609884383007&alt=media

    Terms of Use

    The sample database is copyrighted and cannot be used for commercial purposes. Prohibited uses include, but are not limited to, selling the database and including it in paid courses.

  3. Data Warehouse As A Service (Dwaas) Market Analysis North America, Europe,...

    • technavio.com
    pdf
    Updated Aug 15, 2024
    Cite
    Technavio (2024). Data Warehouse As A Service (Dwaas) Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, Germany, France, China, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/data-warehouse-as-a-service-market-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Description

    Data Warehouse As A Service Market Size 2024-2028

    The data warehouse as a service market size is forecast to increase by USD 12.32 billion at a CAGR of 24.49% between 2023 and 2028.

    The market is experiencing significant growth due to several key trends. One major trend is the shift from traditional on-premises data warehouses to cloud-based DWaaS solutions. Advanced storage technologies, such as columnar databases, in-memory storage, and cloud storage, are also driving market growth. 
    However, data privacy and security risks are challenges that need to be addressed, as organizations move their data to the cloud. DWaaS providers are responding by implementing data security and data encryption techniques to mitigate these risks. Overall, the DWaaS market is poised for continued growth as more businesses seek to leverage the benefits of cloud-based data warehousing solutions.
    

    What will be the Size of the Data Warehouse As A Service Market During the Forecast Period?

    The market represents a significant shift in how businesses manage their data environments. DWaaS offers flexibility and scalability, enabling organizations to focus on their core competencies while leveraging cloud computing for their data warehousing needs. This market is driven by the increasing demand for Business Intelligence (BI) that can handle large data volumes and provide advanced analytics capabilities. 
    Technological developments in cloud computing, software, computing, and storage have made DWaaS a viable alternative to traditional on-premises data warehouses. However, the adoption of DWaaS is not without challenges. Security issues and integration complexities are key concerns for businesses considering a move to the cloud.
    Restricted customization is another challenge, as some organizations require specific configurations for their data warehouses. Despite these challenges, the benefits of DWaaS, such as reduced IT infrastructure complexity and improved data accessibility, continue to drive market growth. The DWaaS market is expected to expand as more businesses seek to harness the power of their data for enterprise management, visualization, and data analytics.
    

    How is this Data Warehouse As A Service Industry segmented and which is the largest segment?

    The DWaaS industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user

      BFSI
      Government
      Healthcare
      E-commerce and retail
      Others

    Type

      Enterprise DWaaS
      Operational data storage

    Geography

      North America
        US
      Europe
        Germany
        France
      APAC
        China
        Japan
      Middle East and Africa
      South America

    By End-user Insights

    The BFSI segment is estimated to witness significant growth during the forecast period.
    

    The BFSI sector's reliance on managing and analyzing large financial data volumes has fueled the adoption of Data Warehouse as a Service (DWaaS) solutions. DWaaS offers flexibility and scalability, enabling BFSI companies to efficiently manage data from retail banking institutions, lending operations, credit underwriting procedures, and financial consulting firms. DWaaS solutions support cloud computing, business intelligence (BI), data analytics, enterprise management, and visualization. Technological developments, such as IoT and AI, further enhance DWaaS capabilities. However, challenges persist, including security issues, integration challenges, and restricted customization. Cloud solutions, including cloud data warehouses, offer a data environment that is software-, computing-, and storage-intensive.

    DWaaS companies address concerns with service disruptions, latency, data integration, and data access. Security measures, such as data encryption and data masking, ensure data privacy. Despite these challenges, DWaaS adoption continues to grow, offering decision support services, data categorization, and data assessment to mid-size businesses and large enterprises.

    The BFSI segment was valued at USD 665.10 million in 2018 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 35% to the growth of the global market during the forecast period.
    

    Technavio’s analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

    The North American market for Data Warehouse as a Service (DWaaS) is experiencing significant growth due to the region's early adoption of advanced techn

  4. Data Warehousing Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 6, 2025
    Cite
    Growth Market Reports (2025). Data Warehousing Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-warehousing-market-global-industry-analysis
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Oct 6, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Warehousing Market Outlook



    According to our latest research, the global Data Warehousing market size reached USD 32.7 billion in 2024, reflecting robust adoption across diverse industry verticals. The market is anticipated to expand at a CAGR of 8.6% from 2025 to 2033, driven by surging demand for advanced analytics, cloud integration, and real-time business intelligence. By 2033, the Data Warehousing market size is forecasted to reach USD 68.2 billion, underscoring the sector’s pivotal role in empowering organizations to harness data for strategic decision-making. This growth is underpinned by the ongoing digital transformation across sectors, the proliferation of big data, and the increasing adoption of cloud-based solutions.
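
The figures quoted above can be cross-checked with the standard compound-growth formula. The following is a minimal sketch, assuming the 2024 value is the base and that growth compounds annually over the nine years to 2033; the function names are illustrative and do not come from the report.

```python
def project(base: float, cagr: float, years: int) -> float:
    """Compound a base value forward at a constant annual growth rate."""
    return base * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted in the description (USD billion).
size_2024 = 32.7
size_2033 = 68.2
years = 2033 - 2024  # assumption: nine annual compounding periods

projected_2033 = project(size_2024, 0.086, years)
rate = implied_cagr(size_2024, size_2033, years)

print(f"Projected 2033 size at 8.6%: {projected_2033:.1f}")
print(f"CAGR implied by the two endpoints: {rate:.2%}")
```

Under these assumptions the stated rate and the stated endpoints agree to within about one billion dollars; the small gap may come from rounding or from a different base year in the report.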




    The rapid expansion of the Data Warehousing market is primarily fueled by the exponential increase in data volumes generated from various sources such as IoT devices, enterprise applications, and social media platforms. Organizations across industries are striving to convert raw data into actionable insights, leading to heightened investments in data warehousing infrastructure and solutions. The integration of artificial intelligence and machine learning algorithms within data warehouses is enabling advanced analytics, predictive modeling, and real-time reporting, which further accelerates market growth. Additionally, the push towards digital transformation initiatives is compelling enterprises to modernize their legacy data management systems and migrate to more agile and scalable data warehousing platforms.




    Another significant growth factor for the Data Warehousing market is the increasing adoption of cloud-based data warehousing solutions. Cloud deployment offers unparalleled scalability, flexibility, and cost efficiency, making it an attractive choice for both large enterprises and small and medium-sized businesses (SMEs). Cloud data warehouses eliminate the need for substantial upfront capital expenditure and reduce the complexities associated with on-premises infrastructure management. Furthermore, the integration of data warehousing with other cloud services, such as advanced analytics and AI-driven tools, enhances the overall value proposition for organizations seeking to optimize their data-driven decision-making processes.




    The proliferation of self-service business intelligence (BI) tools and the growing emphasis on data democratization are also catalyzing the growth of the Data Warehousing market. Enterprises are empowering business users with intuitive tools that enable them to access, analyze, and visualize data without heavy reliance on IT departments. This shift not only accelerates the pace of decision-making but also fosters a data-driven culture within organizations. As regulatory requirements around data privacy and security become more stringent, data warehousing solutions are evolving to incorporate advanced security features, compliance frameworks, and robust data governance capabilities, further boosting market adoption.




    Regionally, North America continues to dominate the Data Warehousing market due to the early adoption of advanced technologies, the presence of major cloud service providers, and a mature digital ecosystem. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitalization, increasing IT investments, and the proliferation of SMEs embracing cloud-based analytics. Europe is also witnessing steady growth, supported by stringent data protection regulations and a strong focus on digital innovation. The Middle East & Africa and Latin America are gradually catching up, with organizations in these regions increasingly recognizing the strategic value of data warehousing in driving business transformation.





    Component Analysis



    The Component segment of the Data Warehousing market comprises ETL Solutions, Data Warehouse Database, Data Warehouse Software, and Services. ETL (Extract, Transform, Load) solutions are foundational to the data warehousing process, enabling organizat

  5. Warehouse and Retail Sales

    • catalog.data.gov
    • data.montgomerycountymd.gov
    • +4 more
    Updated Nov 8, 2025
    + more versions
    Cite
    data.montgomerycountymd.gov (2025). Warehouse and Retail Sales [Dataset]. https://catalog.data.gov/dataset/warehouse-and-retail-sales
    Explore at:
    Dataset updated
    Nov 8, 2025
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    This dataset contains a list of sales and movement data by item and department appended monthly. Update Frequency : Monthly

  6. Virginia Springs/Groundwater Layers - 2023

    • data.virginia.gov
    • hub.arcgis.com
    • +3 more
    Updated Jul 29, 2025
    Cite
    Virginia Department of Environmental Quality (2025). Virginia Springs/Groundwater Layers - 2023 [Dataset]. https://data.virginia.gov/dataset/virginia-springs-groundwater-layers-2023
    Explore at:
    Available download formats: html, arcgis geoservices rest api
    Dataset updated
    Jul 29, 2025
    Dataset authored and provided by
    Virginia Department of Environmental Quality (https://deq.virginia.gov/)
    Area covered
    Hot Springs
    Description
    The VDEQ Spring SITES database contains data describing the geographic locations and site attributes of natural springs throughout the commonwealth. This data coverage continues to evolve and contains only spring locations known to exist with a reasonable degree of certainty on the date of publication. The dataset does not replace site specific inventorying or receptor surveys but can be used as a starting point. VDEQ's initial geospatial dataset of approximately 325 springs was formed in 2008 by digitizing historical spring information sheets created by State Water Control Board geologists in the 1970s through early 1990s. Additional data has been consolidated from the EPA STORET database, the U.S. Geological Survey's Ground Water Site Inventory (GWSI) and Geographic Names Inventory System (GNIS), the Virginia Department of Health SDWIS database, the Virginia DEQ Virginia Water Use Data Set (VWUDS), the Commonwealth of Virginia Division of Water Resources and Power Bulletin No. 1: "Springs of Virginia" by Collins et al., 1930 as well as several VDWR&P Surface Water Supply bulletins from the 1940's - 1950's. A 1992 Virginia Department of Game and Inland Fisheries / Virginia Tech sponsored study by Helfrich et al. titled "Evaluation of the Natural Springs of Virginia: Fisheries Management Implications", a 2004 Rockbridge County groundwater resources report written by Frits van der Leeden, and several smaller datasets from consultants and citizens were evaluated and added to the database when confidence in locational accuracy was high or could be verified with aerial or LIDAR imagery. 
Significant contributions have been made throughout the years by VDEQ Groundwater Characterization staff site visits as well as other geologists working in the region including: Matt Heller at Virginia Division of Geology and Mineral Resources (VDMME), Wil Orndorff at the Virginia Department of Conservation and Recreation Karst Program (VDCR), and David Nelms and Dan Doctor of the U.S. Geological Survey (USGS). Substantial effort has been made to improve locational accuracy and remove duplication present between data sources. Hundreds of spring locations that were originally obtained using topographic maps or unknown methods were updated to sub-meter locational accuracy using post-processed differential GPS (PPGPS) and through the use of several generations of aerial imagery (2002-2017) obtained from Virginia's Geographic Information Network (VGIN) and 1-meter LIDAR, where available. Scores of new spring locations were also obtained by systematic quadrangle-by-quadrangle analysis in areas of the Shenandoah Valley where 1-meter LIDAR datasets were obtained from the U.S. Geological Survey. Future improvements to the dataset will result when statewide 1-meter LIDAR datasets become available and through continued field work by DEQ staff and other contributors working in the region. Please do not hesitate to contact the author to correct mistakes or to contribute to the database.

    The VDEQ Spring FIELD MEASUREMENTS database contains data describing field derived physio-chemical properties of spring discharges measured throughout the Commonwealth of Virginia. Field visits compiled in this dataset were performed from 1928 to 2019 by geologists with the State Water Control Board, the Virginia Division of Water and Power, the Virginia Department of Environmental Quality, and the U.S. Geological Survey with contributions from other sources as noted. Values of -9999 indicate that measurements were not performed for the referenced parameter. Please do not hesitate to contact the author to add data to the database or correct errors.
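
Because -9999 marks a measurement that was not performed, it should be excluded before computing any statistic. The following is a minimal sketch of one way to do that; the column names and rows are hypothetical and are not the dataset's real schema.

```python
import csv
import io
from typing import Optional

NO_MEASUREMENT = -9999.0  # sentinel described in the dataset text


def parse_value(raw: str) -> Optional[float]:
    """Convert a raw field to float, mapping the -9999 sentinel to None."""
    value = float(raw)
    return None if value == NO_MEASUREMENT else value


# Hypothetical rows; the column names are illustrative only.
sample = io.StringIO(
    "site_id,temperature_c,ph\n"
    "A1,12.5,7.1\n"
    "A2,-9999,6.8\n"
    "A3,13.0,-9999\n"
)

rows = [
    {
        "site_id": r["site_id"],
        "temperature_c": parse_value(r["temperature_c"]),
        "ph": parse_value(r["ph"]),
    }
    for r in csv.DictReader(sample)
]

# Average only over rows where a measurement actually exists.
temps = [r["temperature_c"] for r in rows if r["temperature_c"] is not None]
mean_temp = sum(temps) / len(temps)
print(mean_temp)
```

Mapping the sentinel to `None` rather than leaving it numeric keeps it from silently skewing averages or minimum values.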


    The VDEQ_Spring_WQ database is a geodatabase containing groundwater sample information collected from springs throughout Virginia. Sample-specific information includes: location and site information, measured field parameters, and lab-verified quantifications of major ionic concentrations, trace element concentrations, nutrient concentrations, and radiological data. The VDEQ_Spring_WQ database is a subset of the VDEQ GWCHEM database, which is a flat-file geodatabase containing groundwater sample information from groundwater wells and springs throughout Virginia. Sample information has been correlated via DEQ Well # and projected using coordinates in the VDEQ_Spring_SITES database. The GWCHEM database is composed of historic groundwater sample data originally archived in the United States Geological Survey (USGS) National Water Information System (NWIS) and the Environmental Protection Agency (EPA) Storage and Retrieval (STORET) data warehouse. Archived STORET data originated as groundwater sample data collected and uploaded by Virginia State Water Control Board personnel. While groundwater sample data in the STORET data warehouse are static, new groundwater sample data are periodically uploaded to NWIS, and spring laboratory WQ data reflect NWIS downloaded on 9/30/2019. Recent groundwater sample data collected by Virginia Department of Environmental Quality (DEQ) personnel as part of the Ambient Groundwater Sampling Program are entered into the database as lab results are made available by the Division of Consolidated Laboratory Services (DCLS). When possible, charge balances were calculated for samples with reported values for major ions including (at a minimum) calcium, magnesium, potassium, sodium, bicarbonate, chloride, and sulfate. Reported values for Nitrate as N, carbonate, and fluoride were included in the charge balance calculation when available. Field-determined values for bicarbonate and carbonate were used in the charge balance calculation when available. For much of the legacy DEQ groundwater sample data, bicarbonate values were derived from lab-reported values of alkalinity (as mg/CaCO3) under the assumption that there was no contribution by carbonate to the reported alkalinity value. Charge balance values are reported in the "Charge Balance" column of the GWCHEM geodatabase. The closer the charge balance value is to unity (1), the lower the assumed charge balance error.

    In order to preserve the numerical capabilities of the database, non-numeric lab qualifiers were given the following numeric identifiers:

      - (minus sign) = less than the concentration specified to the right of the sign
      -11110 = estimated
      -22220 = presence verified but not quantified
      -33330 = radchem non-detect, below sslc
      -4440 = analyzed for but not detected
      -55550 = greater than the concentration to the right of the zero
      -66660 = sample held beyond normal holding time
      -77770 = quality control failure. Data not valid.
      -88880 = sample held beyond normal holding time. Sample analyzed for but not detected. Value stored is limit of detection for process in use.
      -11120 = Value reported is less than the criteria of detection.
      -9999 = no data (parameter not quantified)
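
The description does not give the formula behind the "Charge Balance" column. A value that approaches unity for a balanced sample is consistent with a ratio of total cation charge to total anion charge, so the following minimal sketch assumes that definition. It also assumes concentrations in mg/L and covers only the seven minimum ions named above; the sample values are hypothetical.

```python
# Equivalent weights in mg per milliequivalent (molar mass / ionic charge).
EQUIVALENT_WEIGHT = {
    "calcium": 40.078 / 2,
    "magnesium": 24.305 / 2,
    "sodium": 22.990,
    "potassium": 39.098,
    "bicarbonate": 61.017,
    "chloride": 35.453,
    "sulfate": 96.06 / 2,
}
CATIONS = ("calcium", "magnesium", "sodium", "potassium")
ANIONS = ("bicarbonate", "chloride", "sulfate")


def total_meq(sample_mg_per_l: dict, ions: tuple) -> float:
    """Sum the charge carried by the given ions, in meq/L."""
    return sum(sample_mg_per_l[ion] / EQUIVALENT_WEIGHT[ion] for ion in ions)


def charge_balance_ratio(sample_mg_per_l: dict) -> float:
    """Assumed definition: cation meq/L divided by anion meq/L."""
    return total_meq(sample_mg_per_l, CATIONS) / total_meq(sample_mg_per_l, ANIONS)


# Hypothetical spring sample, concentrations in mg/L.
sample = {
    "calcium": 40.0,
    "magnesium": 12.0,
    "sodium": 10.0,
    "potassium": 2.0,
    "bicarbonate": 150.0,
    "chloride": 15.0,
    "sulfate": 20.0,
}

ratio = charge_balance_ratio(sample)
print(f"Charge balance ratio: {ratio:.3f}")
```

A ratio above 1 would indicate excess measured cation charge and a ratio below 1 excess anion charge, under the assumed definition.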

    A more in-depth description and hydrogeologic analysis of the database can be found here
    An in-depth data fact sheet can be found here

  7. Adventure Works 2022 CSVs

    • kaggle.com
    zip
    Updated Nov 2, 2022
    Cite
    Algorismus (2022). Adventure Works 2022 CSVs [Dataset]. https://www.kaggle.com/datasets/algorismus/adventure-works-in-excel-tables
    Explore at:
    Available download formats: zip (567646 bytes)
    Dataset updated
    Nov 2, 2022
    Authors
    Algorismus
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Adventure Works 2022 dataset

    How was this dataset created?

    On the official website, the dataset is available via SQL Server (localhost) and as CSVs to be used with Power BI Desktop running in a virtual lab (virtual machine). The first two steps of importing data were executed in the virtual lab, and the resulting Power BI tables were then copied to CSVs. Records were added up to the year 2022 as required.

    How may this dataset help you?

    This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop while following the lab instructions in the training material on the official website. It is also useful if you want to work through the Power BI Desktop Sales Analysis example from the Microsoft PL-300 learning material.

    How to use this dataset?

    Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created by the first two steps of importing data, as described in the PL-300 Microsoft Power BI Data Analyst exam lab.

  8. NC SELDM simulation outputs processed (R scripts) [child item]: Application...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). NC SELDM simulation outputs processed (R scripts) [child item]: Application of the North Carolina Stochastic Empirical Loading and Dilution Model (SELDM) to Assess Potential Impacts of Highway Runoff [Dataset]. https://catalog.data.gov/dataset/nc-seldm-simulation-outputs-processed-r-scripts-child-item-application-of-the-north-caroli
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    North Carolina
    Description

    In 2013, the U.S. Geological Survey (USGS) in partnership with the U.S. Federal Highway Administration (FHWA) published a new national stormwater quality model called the Stochastic Empirical Loading Dilution Model (SELDM; Granato, 2013). The model is optimized for roadway projects but in theory can be applied to a broad range of development types. SELDM is a statistically-based empirical model pre-populated with much of the data required to successfully run the application (Granato, 2013). The model uses Monte Carlo methods (as opposed to deterministic methods) to generate a wide range of precipitation events and stormwater discharges coupled with water-quality constituent concentrations and loads from the upstream basin and highway site. SELDM is particularly useful for stormwater managers in its ability to provide the statistical probability of a water-quality standard exceedance that could occur downstream of a stormwater discharge location during the period of record simulated as part of a SELDM analysis. SELDM can be used to model a variety of Best Management Practices (BMPs), which allows the user to evaluate the subsequent instream water-quality benefit of different stormwater treatment devices. This functionality makes the model well suited for supporting BMP-specific cost/benefit analyses. In 2015, the North Carolina Department of Transportation (NCDOT) initiated a partnership with the USGS South Atlantic Water Science Center (Raleigh, North Carolina office) to enhance the national SELDM model with additional data specific to North Carolina (NC) to improve the model’s predictive performance across the State. Specific USGS data incorporated to enhance the NC SELDM model included selected North Carolina streamflow data as well as water-quality transport curves for selected constituents. 
SELDM streamflow statistics (based on data through the 2015 water year) were computed for 266 continuous-record streamgages and updated in the StreamStats database, which is accessible from the USGS StreamStats application for North Carolina (available online via https://streamstats.usgs.gov/ss/). Instantaneous streamflow data available at 30 selected continuous-record streamgages across North Carolina, with drainage areas ranging from 4.12 to 63.3 square miles, were used to develop site-specific recession ratio statistics. Water-quality data through the 2016 water year were used to develop water-quality transport curves for 27 streamgages for the following constituents: suspended sediment concentration, total nitrogen, total phosphorus, turbidity, copper, lead, and zinc. The NCDOT identified NC highway-runoff research reports containing water-quality and quantity data available from non-USGS sources. These data were reviewed by USGS and – where deemed acceptable – were uploaded into the FHWA Highway-Runoff Database, the data warehouse and preprocessor for SELDM (Granato and others, 2018; Granato and Cazenas, 2009; Smith and Granato, 2010). Based on the analysis techniques documented by Granato (2014) in a national BMP study and using available water-quality sample data from selected highway-runoff and BMP site pairs, performance data from the NC highway-runoff research reports were also analyzed and incorporated into the NC SELDM model for three BMP types. Results of analyses completed during development of the NC SELDM model are documented in Weaver and others (2019). In 2018, USGS and NCDOT initiated an additional “phase 2” study for the NC SELDM model to complete numerous model simulations to develop an NC_SELDM_Catalog (Microsoft Excel spreadsheet) of outputs for a wide range of highway catchment and upstream basin variables. 
A total of 74,880 SELDM simulations were completed across the Piedmont, Blue Ridge, and Coastal Plain regions (24,960 per region) in North Carolina. Within each region, the completed simulations represented 12,480 design scenarios (one each using the grass swale and bioretention BMP device for treatment of runoff). The overall purpose of the catalog is to provide a tool to NCDOT and others to use during the transportation design process to rapidly assess the potential level of BMP that may be needed for treatment of highway runoff.
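
The Monte Carlo idea behind an exceedance probability can be illustrated with a toy example. This is not SELDM and does not use its inputs: it is a minimal sketch that assumes lognormally distributed flows and concentrations with hypothetical parameters, and a simple flow-weighted mass balance for the mixed downstream concentration.

```python
import random


def exceedance_probability(standard, n_events=10_000, seed=42):
    """Fraction of simulated storm events whose mixed downstream
    concentration exceeds `standard` (same units as the concentrations)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    exceed = 0
    for _ in range(n_events):
        # Hypothetical lognormal parameters, chosen only for illustration.
        q_highway = rng.lognormvariate(0.0, 0.5)   # highway runoff flow
        q_upstream = rng.lognormvariate(2.0, 0.7)  # upstream flow
        c_highway = rng.lognormvariate(3.0, 0.8)   # runoff concentration
        c_upstream = rng.lognormvariate(1.0, 0.6)  # upstream concentration
        # Flow-weighted mass balance for the mixed downstream concentration.
        c_down = (q_highway * c_highway + q_upstream * c_upstream) / (
            q_highway + q_upstream
        )
        if c_down > standard:
            exceed += 1
    return exceed / n_events


p = exceedance_probability(standard=10.0)
print(f"Estimated exceedance probability: {p:.3f}")
```

Repeating the run with a treatment step that lowers `c_highway` would show how a BMP shifts the exceedance probability, which is the kind of comparison the catalog is meant to support.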

  9. VDEQ Springs WQ

    • hub.arcgis.com
    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    • +2 more
    Updated Aug 31, 2023
    Cite
    maddie.moore_VADEQ (2023). VDEQ Springs WQ [Dataset]. https://hub.arcgis.com/datasets/f3b910d2a65e4d2e93ff7b43ac5e542a
    Explore at:
    Dataset updated
    Aug 31, 2023
    Dataset provided by
    Virginia Department of Environmental Quality (https://deq.virginia.gov/)
    Authors
    maddie.moore_VADEQ
    Area covered
    Description

    The VDEQ_Spring_WQ database is a geodatabase containing groundwater sample information collected from springs throughout Virginia. Sample-specific information includes: location and site information, measured field parameters, and lab-verified quantifications of major ionic concentrations, trace element concentrations, nutrient concentrations, and radiological data. The VDEQ_Spring_WQ database is a subset of the VDEQ GWCHEM database, which is a flat-file geodatabase containing groundwater sample information from groundwater wells and springs throughout Virginia. Sample information has been correlated via DEQ Well # and projected using coordinates in the VDEQ_Spring_SITES database. The GWCHEM database is composed of historic groundwater sample data originally archived in the United States Geological Survey (USGS) National Water Information System (NWIS) and the Environmental Protection Agency (EPA) Storage and Retrieval (STORET) data warehouse. Archived STORET data originated as groundwater sample data collected and uploaded by Virginia State Water Control Board personnel. While groundwater sample data in the STORET data warehouse are static, new groundwater sample data are periodically uploaded to NWIS, and spring laboratory WQ data reflect NWIS downloaded on 9/30/2019. Recent groundwater sample data collected by Virginia Department of Environmental Quality (DEQ) personnel as part of the Ambient Groundwater Sampling Program are entered into the database as lab results are made available by the Division of Consolidated Laboratory Services (DCLS). When possible, charge balances were calculated for samples with reported values for major ions including (at a minimum) calcium, magnesium, potassium, sodium, bicarbonate, chloride, and sulfate. Reported values for Nitrate as N, carbonate, and fluoride were included in the charge balance calculation when available. Field-determined values for bicarbonate and carbonate were used in the charge balance calculation when available. For much of the legacy DEQ groundwater sample data, bicarbonate values were derived from lab-reported values of alkalinity (as mg/CaCO3) under the assumption that there was no contribution by carbonate to the reported alkalinity value. Charge balance values are reported in the "Charge Balance" column of the GWCHEM geodatabase. The closer the charge balance value is to unity (1), the lower the assumed charge balance error.

    In order to preserve the numerical capabilities of the database, non-numeric lab qualifiers were given the following numeric identifiers:

      - (minus sign) = less than the concentration specified to the right of the sign
      -11110 = estimated
      -22220 = presence verified but not quantified
      -33330 = radchem non-detect, below sslc
      -4440 = analyzed for but not detected
      -55550 = greater than the concentration to the right of the zero
      -66660 = sample held beyond normal holding time
      -77770 = quality control failure. Data not valid.
      -88880 = sample held beyond normal holding time. Sample analyzed for but not detected. Value stored is limit of detection for process in use.
      -11120 = Value reported is less than the criteria of detection.
      -9999 = no data (parameter not quantified)
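
The numeric identifiers listed in the description can be kept in a lookup table so that flagged values are explained rather than treated as measurements. The following is a minimal sketch, assuming a stored field that equals an identifier exactly carries that qualifier. How a concentration is combined with an identifier (for the "less than" and "greater than" cases) is not specified in the description and is deliberately not handled here.

```python
from typing import Optional

# Identifiers copied as written in the dataset description.
QUALIFIER_MEANINGS = {
    "-11110": "estimated",
    "-22220": "presence verified but not quantified",
    "-33330": "radchem non-detect, below sslc",
    "-4440": "analyzed for but not detected",
    "-66660": "sample held beyond normal holding time",
    "-77770": "quality control failure, data not valid",
    "-11120": "value reported is less than the criteria of detection",
    "-9999": "no data (parameter not quantified)",
}


def describe_qualifier(raw: str) -> Optional[str]:
    """Return the qualifier meaning for an exact identifier, else None.

    Assumption: the field is compared as text, so no numeric rounding
    or embedded concentration is interpreted.
    """
    return QUALIFIER_MEANINGS.get(raw.strip())


for field in ["-9999", " -11110 ", "7.4"]:
    meaning = describe_qualifier(field)
    print(field.strip(), "->", meaning if meaning else "ordinary value")
```

Keeping the identifiers as text avoids guessing at how the database encodes a concentration alongside a qualifier.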

  10. VDEQ Springs FIELD MEASUREMENTS

    • data.virginia.gov
    Updated Aug 31, 2023
    Cite
    Virginia Department of Environmental Quality (2023). VDEQ Springs FIELD MEASUREMENTS [Dataset]. https://data.virginia.gov/dataset/vdeq-springs-field-measurements
    Explore at:
    Available download formats: zip, arcgis geoservices rest api, csv, geojson, html, gpkg, gdb, txt, xlsx, kml
    Dataset updated
    Aug 31, 2023
    Dataset authored and provided by
    Virginia Department of Environmental Quality (https://deq.virginia.gov/)
    Description
    The VDEQ Spring SITES database contains data describing the geographic locations and site attributes of natural springs throughout the commonwealth. This data coverage continues to evolve and contains only spring locations known to exist with a reasonable degree of certainty on the date of publication. The dataset does not replace site specific inventorying or receptor surveys but can be used as a starting point. VDEQ's initial geospatial dataset of approximately 325 springs was formed in 2008 by digitizing historical spring information sheets created by State Water Control Board geologists in the 1970s through early 1990s. Additional data has been consolidated from the EPA STORET database, the U.S. Geological Survey's Ground Water Site Inventory (GWSI) and Geographic Names Inventory System (GNIS), the Virginia Department of Health SDWIS database, the Virginia DEQ Virginia Water Use Data Set (VWUDS), the Commonwealth of Virginia Division of Water Resources and Power Bulletin No. 1: "Springs of Virginia" by Collins et al., 1930 as well as several VDWR&P Surface Water Supply bulletins from the 1940's - 1950's. A 1992 Virginia Department of Game and Inland Fisheries / Virginia Tech sponsored study by Helfrich et al. titled "Evaluation of the Natural Springs of Virginia: Fisheries Management Implications", a 2004 Rockbridge County groundwater resources report written by Frits van der Leeden, and several smaller datasets from consultants and citizens were evaluated and added to the database when confidence in locational accuracy was high or could be verified with aerial or LIDAR imagery. 
    Significant contributions have been made throughout the years by VDEQ Groundwater Characterization staff site visits as well as other geologists working in the region including: Matt Heller at Virginia Division of Geology and Mineral Resources (VDMME), Wil Orndorff at the Virginia Department of Conservation and Recreation Karst Program (VDCR), and David Nelms and Dan Doctor of the U.S. Geological Survey (USGS). Substantial effort has been made to improve locational accuracy and remove duplication present between data sources. Hundreds of spring locations that were originally obtained using topographic maps or unknown methods were updated to sub-meter locational accuracy using post-processed differential GPS (PPGPS) and through the use of several generations of aerial imagery (2002-2017) obtained from Virginia's Geographic Information Network (VGIN) and 1-meter LIDAR, where available. Scores of new spring locations were also obtained by systematic quadrangle-by-quadrangle analysis in areas of the Shenandoah Valley where 1-meter LIDAR datasets were obtained from the U.S. Geological Survey. Future improvements to the dataset will result when statewide 1-meter LIDAR datasets become available and through continued field work by DEQ staff and other contributors working in the region. Please do not hesitate to contact the author to correct mistakes or to contribute to the database.

    The VDEQ Spring FIELD MEASUREMENTS database contains data describing field derived physio-chemical properties of spring discharges measured throughout the Commonwealth of Virginia. Field visits compiled in this dataset were performed from 1928 to 2019 by geologists with the State Water Control Board, the Virginia Division of Water and Power, the Virginia Department of Environmental Quality, and the U.S. Geological Survey with contributions from other sources as noted. Values of -9999 indicate that measurements were not performed for the referenced parameter. Please do not hesitate to contact the author to add data to the database or correct errors.


    The VDEQ_Spring_WQ database is a geodatabase containing groundwater sample information collected from springs throughout Virginia. Sample-specific information includes: location and site information, measured field parameters, and lab verified quantifications of major ionic concentrations, trace element concentrations, nutrient concentrations, and radiological data. The VDEQ_Spring_WQ database is a subset of the VDEQ GWCHEM database, which is a flat-file geodatabase containing groundwater sample information from groundwater wells and springs throughout Virginia. Sample information has been correlated via DEQ Well # and projected using coordinates in the VDEQ_Spring_SITES database. The GWCHEM database is comprised of historic groundwater sample data originally archived in the United States Geological Survey (USGS) National Water Information System (NWIS) and the Environmental Protection Agency (EPA) Storage and Retrieval (STORET) data warehouse. Archived STORET data originated as groundwater sample data collected and uploaded by Virginia State Water Control Board Personnel. While groundwater sample data in the STORET data warehouse are static, new groundwater sample data are periodically uploaded to NWIS and spring laboratory WQ data reflect NWIS downloaded on 9/30/2019. Recent groundwater sample data collected by Virginia Department of Environmental Quality (DEQ) personnel as part of the Ambient Groundwater Sampling Program are entered into the database as lab results are made available by the Division of Consolidated Laboratory Services (DCLS). When possible, charge balances were calculated for samples with reported values for major ions including (at a minimum) calcium, magnesium, potassium, sodium, bicarbonate, chloride, and sulfate. Reported values for Nitrate as N, carbonate, and fluoride were included in the charge balance calculation when available. Field determined values for bicarbonate and carbonate were used in the charge balance calculation when available. For much of the legacy DEQ groundwater sample data, bicarbonate values were derived from lab reported values of alkalinity (as mg/CaCO3) under the assumption that there was no contribution by carbonate to the reported alkalinity value. Charge balance values are reported in the "Charge Balance" column of the GWCHEM geodatabase. The closer the charge balance value is to unity (1), the lower the assumed charge balance error.

    In order to preserve the numerical capabilities of the database, non-numeric lab qualifiers were given the following numeric identifiers:

    - (minus sign) = less than the concentration specified to the right of the sign
    -11110 = estimated
    -22220 = presence verified but not quantified
    -33330 = radchem non-detect, below sslc
    -4440 = analyzed for but not detected
    -55550 = greater than the concentration to the right of the zero
    -66660 = sample held beyond normal holding time
    -77770 = quality control failure. Data not valid.
    -88880 = sample held beyond normal holding time. Sample analyzed for but not detected. Value stored is limit of detection for process in use.
    -11120 = Value reported is less than the criteria of detection.
    -9999 = no data (parameter not quantified)

    A more in-depth description and hydrogeologic analysis of the database can be found here
    An in-depth data fact sheet can be found here
  11. AdventureWorks 2022 Denormalized

    • kaggle.com
    Updated Nov 25, 2024
    Cite
    Bhavesh J (2024). AdventureWorks 2022 Denormalized [Dataset]. https://www.kaggle.com/datasets/bjaising/adventureworks-2022-denormalized
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bhavesh J
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adventure Works 2022 Denormalized dataset

    How was this dataset created?

    The CSV data was sourced from the existing Kaggle dataset titled "Adventure Works 2022" by Algorismus. This data was normalized and consisted of seven individual CSV files. The Sales table served as a fact table that connected to other dimensions. To consolidate all the data into a single table, it was loaded into a SQLite database and transformed accordingly. The final denormalized table was then exported as a single CSV file (delimited by | ), and the column names were updated to follow snake_case style.
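
    The consolidation step described above can be sketched in Python with the standard-library sqlite3 and csv modules. This is only an illustration under stated assumptions: the three tiny tables, their column names, and the sample rows are hypothetical stand-ins for the seven source CSV files, and the to_snake_case helper is one possible renaming rule, not necessarily the one the author used.

```python
import csv
import io
import re
import sqlite3


def to_snake_case(name: str) -> str:
    """Turn 'SalesOrderNumber' or 'Unit Price' into snake_case."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    name = re.sub(r"[\s\-]+", "_", name)
    return name.lower()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical miniature star schema: one fact table, two dimensions.
    CREATE TABLE Product (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE Reseller (ResellerKey INTEGER PRIMARY KEY, ResellerName TEXT);
    CREATE TABLE Sales (
        SalesOrderNumber TEXT, ProductKey INTEGER, ResellerKey INTEGER,
        Quantity INTEGER, UnitPrice REAL
    );
    INSERT INTO Product VALUES (1, 'Road Bike');
    INSERT INTO Reseller VALUES (7, 'Bike World');
    INSERT INTO Sales VALUES ('SO1', 1, 7, 2, 10.0);
    """
)

# Join the fact table to its dimensions to get one wide, denormalized row set.
cursor = conn.execute(
    """
    SELECT s.SalesOrderNumber        AS "SalesOrderNumber",
           s.Quantity                AS "Quantity",
           s.UnitPrice               AS "Unit Price",
           s.Quantity * s.UnitPrice  AS "Total Sales",
           p.ProductName             AS "ProductName",
           r.ResellerName            AS "ResellerName"
    FROM Sales AS s
    JOIN Product AS p ON p.ProductKey = s.ProductKey
    JOIN Reseller AS r ON r.ResellerKey = s.ResellerKey
    """
)
header = [to_snake_case(col[0]) for col in cursor.description]

# Export as a single pipe-delimited CSV (written to memory here).
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
writer.writerow(header)
writer.writerows(cursor.fetchall())
exported = buffer.getvalue()
conn.close()
```

    Reading the result back requires the same delimiter, for example csv.reader(f, delimiter="|").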

    DOI

    doi.org/10.6084/m9.figshare.27899706

    Data Dictionary

    Column Name: Description
    sales_order_number: Unique identifier for each sales order.
    sales_order_date: The date and time when the sales order was placed. (e.g., Friday, August 25, 2017)
    sales_order_date_day_of_week: The day of the week when the sales order was placed (e.g., Monday, Tuesday).
    sales_order_date_month: The month when the sales order was placed (e.g., January, February).
    sales_order_date_day: The day of the month when the sales order was placed (1-31).
    sales_order_date_year: The year when the sales order was placed (e.g., 2022).
    quantity: The number of units sold in the sales order.
    unit_price: The price per unit of the product sold.
    total_sales: The total sales amount for the sales order (quantity * unit price).
    cost: The total cost associated with the products sold in the sales order.
    product_key: Unique identifier for the product sold.
    product_name: The name of the product sold.
    reseller_key: Unique identifier for the reseller.
    reseller_name: The name of the reseller.
    reseller_business_type: The type of business of the reseller (e.g., Warehouse, Value Reseller, Specialty Bike Shop).
    reseller_city: The city where the reseller is located.
    reseller_state: The state where the reseller is located.
    reseller_country: The country where the reseller is located.
    employee_key: Unique identifier for the employee associated with the sales order.
    employee_id: The ID of the employee who processed the sales order.
    salesperson_fullname: The full name of the salesperson associated with the sales order.
    salesperson_title: The title of the salesperson (e.g., North American Sales Manager, Sales Representative).
    email_address: The email address of the salesperson.
    sales_territory_key: Unique identifier for the sales territory for the actual sale. (e.g., 3)
    assigned_sales_territory: Comma-separated list of sales_territory_key values assigned to the salesperson. (e.g., 3,4)
    sales_territory_region: The region of the sales territory. US territory is broken down into regions; international regions are listed by country name (e.g., Northeast, France).
    sales_territory_country: The country associated with the sales territory.
    sales_territory_group: The group classification of the sales territory. (e.g., Europe, North America, Pacific)
    target: The ...
  12. In-Database Machine Learning Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 4, 2025
    Cite
    Growth Market Reports (2025). In-Database Machine Learning Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/in-database-machine-learning-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Aug 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    In-Database Machine Learning Market Outlook



    According to our latest research, the global in-database machine learning market size in 2024 stands at USD 2.74 billion, reflecting the sector’s rapid adoption across diverse industries. The market is expected to grow at a robust CAGR of 28.6% from 2025 to 2033, reaching a projected value of USD 24.19 billion by the end of the forecast period. This exceptional growth is primarily driven by the increasing demand for advanced analytics, real-time data processing, and the seamless integration of machine learning capabilities directly within database environments, which are essential for accelerating business insights and operational efficiency.




    The primary growth factor propelling the in-database machine learning market is the exponential surge in data volumes generated by enterprises worldwide. As organizations transition to digital-first operations, the need to analyze vast datasets in real time has become paramount. Traditional machine learning workflows, which require data extraction and movement to external environments, are increasingly seen as inefficient and prone to latency and security issues. In-database machine learning eliminates these bottlenecks by enabling algorithms to run directly within the database, thus reducing data movement, minimizing latency, and ensuring higher data security. This approach not only streamlines the analytics pipeline but also empowers businesses to derive actionable insights faster, supporting critical functions such as fraud detection, predictive maintenance, and customer personalization.




    Another significant factor fueling market expansion is the growing adoption of cloud-based data platforms and the proliferation of hybrid IT infrastructures. Enterprises are leveraging cloud-native databases and data warehouses to centralize and scale their analytics capabilities. In-database machine learning solutions are designed to seamlessly integrate with these modern architectures, allowing organizations to harness the power of machine learning without the need for extensive data migration or IT overhead. This integration facilitates agile development, lowers total cost of ownership, and enables organizations to respond swiftly to market changes. Furthermore, the rise of open-source machine learning frameworks and APIs has democratized access to advanced analytics, making it easier for businesses of all sizes to implement and benefit from in-database ML capabilities.




    A third pivotal growth driver is the increasing emphasis on regulatory compliance, data privacy, and security in highly regulated industries such as BFSI and healthcare. In-database machine learning offers a compelling solution by keeping sensitive data within secure database environments, thereby reducing the risk of data breaches and ensuring compliance with stringent data protection regulations such as GDPR and HIPAA. This capability is particularly valuable for organizations operating in regions with complex regulatory landscapes, where data residency and sovereignty are critical concerns. As a result, the adoption of in-database ML is accelerating among enterprises that prioritize security, governance, and auditability in their analytics workflows.




    From a regional perspective, North America continues to dominate the in-database machine learning market, accounting for the largest revenue share in 2024, followed closely by Europe and Asia Pacific. The presence of leading technology vendors, early adoption of advanced analytics, and a mature digital infrastructure contribute to North America’s leadership. However, rapid economic development, digitization initiatives, and expanding IT ecosystems in Asia Pacific are positioning the region as a significant growth engine for the forecast period. Meanwhile, Europe’s focus on data privacy and innovation is driving substantial investments in secure and compliant in-database ML solutions, further fueling market growth across the continent.





    Component Analysis



    The in-database machine learning mark

  13. VDEQ Springs WQ

    • data.virginia.gov
    Updated Aug 31, 2023
    + more versions
    Cite
    Virginia Department of Environmental Quality (2023). VDEQ Springs WQ [Dataset]. https://data.virginia.gov/dataset/vdeq-springs-wq
    Explore at:
    Available download formats: arcgis geoservices rest api, html, kml, csv, zip, gpkg, gdb, xlsx, geojson, txt
    Dataset updated
    Aug 31, 2023
    Dataset authored and provided by
    Virginia Department of Environmental Quality (https://deq.virginia.gov/)
    Description
    The VDEQ Spring SITES database contains data describing the geographic locations and site attributes of natural springs throughout the commonwealth. This data coverage continues to evolve and contains only spring locations known to exist with a reasonable degree of certainty on the date of publication. The dataset does not replace site specific inventorying or receptor surveys but can be used as a starting point. VDEQ's initial geospatial dataset of approximately 325 springs was formed in 2008 by digitizing historical spring information sheets created by State Water Control Board geologists in the 1970s through early 1990s. Additional data has been consolidated from the EPA STORET database, the U.S. Geological Survey's Ground Water Site Inventory (GWSI) and Geographic Names Inventory System (GNIS), the Virginia Department of Health SDWIS database, the Virginia DEQ Virginia Water Use Data Set (VWUDS), the Commonwealth of Virginia Division of Water Resources and Power Bulletin No. 1: "Springs of Virginia" by Collins et al., 1930 as well as several VDWR&P Surface Water Supply bulletins from the 1940's - 1950's. A 1992 Virginia Department of Game and Inland Fisheries / Virginia Tech sponsored study by Helfrich et al. titled "Evaluation of the Natural Springs of Virginia: Fisheries Management Implications", a 2004 Rockbridge County groundwater resources report written by Frits van der Leeden, and several smaller datasets from consultants and citizens were evaluated and added to the database when confidence in locational accuracy was high or could be verified with aerial or LIDAR imagery. 
    Significant contributions have been made throughout the years by VDEQ Groundwater Characterization staff site visits as well as other geologists working in the region including: Matt Heller at Virginia Division of Geology and Mineral Resources (VDMME), Wil Orndorff at the Virginia Department of Conservation and Recreation Karst Program (VDCR), and David Nelms and Dan Doctor of the U.S. Geological Survey (USGS). Substantial effort has been made to improve locational accuracy and remove duplication present between data sources. Hundreds of spring locations that were originally obtained using topographic maps or unknown methods were updated to sub-meter locational accuracy using post-processed differential GPS (PPGPS) and through the use of several generations of aerial imagery (2002-2017) obtained from Virginia's Geographic Information Network (VGIN) and 1-meter LIDAR, where available. Scores of new spring locations were also obtained by systematic quadrangle-by-quadrangle analysis in areas of the Shenandoah Valley where 1-meter LIDAR datasets were obtained from the U.S. Geological Survey. Future improvements to the dataset will result when statewide 1-meter LIDAR datasets become available and through continued field work by DEQ staff and other contributors working in the region. Please do not hesitate to contact the author to correct mistakes or to contribute to the database.

    The VDEQ Spring FIELD MEASUREMENTS database contains data describing field derived physio-chemical properties of spring discharges measured throughout the Commonwealth of Virginia. Field visits compiled in this dataset were performed from 1928 to 2019 by geologists with the State Water Control Board, the Virginia Division of Water and Power, the Virginia Department of Environmental Quality, and the U.S. Geological Survey with contributions from other sources as noted. Values of -9999 indicate that measurements were not performed for the referenced parameter. Please do not hesitate to contact the author to add data to the database or correct errors.


    The VDEQ_Spring_WQ database is a geodatabase containing groundwater sample information collected from springs throughout Virginia. Sample-specific information includes: location and site information, measured field parameters, and lab verified quantifications of major ionic concentrations, trace element concentrations, nutrient concentrations, and radiological data. The VDEQ_Spring_WQ database is a subset of the VDEQ GWCHEM database, which is a flat-file geodatabase containing groundwater sample information from groundwater wells and springs throughout Virginia. Sample information has been correlated via DEQ Well # and projected using coordinates in the VDEQ_Spring_SITES database. The GWCHEM database is comprised of historic groundwater sample data originally archived in the United States Geological Survey (USGS) National Water Information System (NWIS) and the Environmental Protection Agency (EPA) Storage and Retrieval (STORET) data warehouse. Archived STORET data originated as groundwater sample data collected and uploaded by Virginia State Water Control Board Personnel. While groundwater sample data in the STORET data warehouse are static, new groundwater sample data are periodically uploaded to NWIS and spring laboratory WQ data reflect NWIS downloaded on 9/30/2019. Recent groundwater sample data collected by Virginia Department of Environmental Quality (DEQ) personnel as part of the Ambient Groundwater Sampling Program are entered into the database as lab results are made available by the Division of Consolidated Laboratory Services (DCLS). When possible, charge balances were calculated for samples with reported values for major ions including (at a minimum) calcium, magnesium, potassium, sodium, bicarbonate, chloride, and sulfate. Reported values for Nitrate as N, carbonate, and fluoride were included in the charge balance calculation when available. Field determined values for bicarbonate and carbonate were used in the charge balance calculation when available. For much of the legacy DEQ groundwater sample data, bicarbonate values were derived from lab reported values of alkalinity (as mg/CaCO3) under the assumption that there was no contribution by carbonate to the reported alkalinity value. Charge balance values are reported in the "Charge Balance" column of the GWCHEM geodatabase. The closer the charge balance value is to unity (1), the lower the assumed charge balance error.

    In order to preserve the numerical capabilities of the database, non-numeric lab qualifiers were given the following numeric identifiers:

    - (minus sign) = less than the concentration specified to the right of the sign
    -11110 = estimated
    -22220 = presence verified but not quantified
    -33330 = radchem non-detect, below sslc
    -4440 = analyzed for but not detected
    -55550 = greater than the concentration to the right of the zero
    -66660 = sample held beyond normal holding time
    -77770 = quality control failure. Data not valid.
    -88880 = sample held beyond normal holding time. Sample analyzed for but not detected. Value stored is limit of detection for process in use.
    -11120 = Value reported is less than the criteria of detection.
    -9999 = no data (parameter not quantified)

    A more in-depth description and hydrogeologic analysis of the database can be found here
    An in-depth data fact sheet can be found here
  14. Data_Sheet_1_MaizeMine: A Data Mining Warehouse for the Maize Genetics and...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Oct 22, 2020
    + more versions
    Cite
    Unni, Deepak R.; Andorf, Carson M.; Shamimuzzaman,; Nguyen, Hung N.; Gardiner, Jack M.; Le Tourneau, Justin J.; Portwood, John L.; Cannon, Ethalinda K. S.; Triant, Deborah A.; Tayal, Aditi; Walsh, Amy T.; Elsik, Christine G. (2020). Data_Sheet_1_MaizeMine: A Data Mining Warehouse for the Maize Genetics and Genomics Database.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000484613
    Explore at:
    Dataset updated
    Oct 22, 2020
    Authors
    Unni, Deepak R.; Andorf, Carson M.; Shamimuzzaman,; Nguyen, Hung N.; Gardiner, Jack M.; Le Tourneau, Justin J.; Portwood, John L.; Cannon, Ethalinda K. S.; Triant, Deborah A.; Tayal, Aditi; Walsh, Amy T.; Elsik, Christine G.
    Description

    MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis.

  15. TTB COLAs Demo

    • kaggle.com
    zip
    Updated Jun 23, 2024
    Cite
    Jay Sobel (2024). TTB COLAs Demo [Dataset]. https://www.kaggle.com/datasets/colacloud/ttb-colas-demo
    Explore at:
    Available download formats: zip (93934563 bytes)
    Dataset updated
    Jun 23, 2024
    Authors
    Jay Sobel
    Description

    The US TTB COLA Registry

    The United States regulates alcohol product labeling through an application process with the Alcohol and Tobacco Tax and Trade Bureau (TTB).

    Manufacturers submit their prospective product labels and supporting documents to the TTB to receive a Certificate of Label Approval (COLA).

    Application forms and label imagery are made publicly available in the TTB's Public COLA Registry. The registry contains over 2M applications dating back to the 1990s, and adds around 3,000 new application approvals every week.

    This database represents the largest public dataset of alcohol product information in the United States.

    Data Shape

    Each COLA represents an application for regulatory approval. An application can contain multiple label images; for example the front, back, and neck. Label images can contain multiple barcodes (and/or QR codes). The data model is as follows:

    • colas
    • cola_images
    • cola_image_barcodes

    A cola has multiple cola_images related via the ttb_id. A cola_image has multiple cola_image_barcodes related via the ttb_image_id.
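
    The two one-to-many relationships above can be sketched in plain Python. The table names and the ttb_id / ttb_image_id keys come from the description; every other field (brand_name, position, barcode_value) and all sample values are hypothetical, so treat this as an illustration of the shape rather than the actual schema.

```python
from collections import defaultdict

# Hypothetical sample rows; only ttb_id and ttb_image_id are documented keys.
colas = [
    {"ttb_id": "A1", "brand_name": "Example Lager"},
    {"ttb_id": "B2", "brand_name": "Example Cider"},
]
cola_images = [
    {"ttb_image_id": "img-1", "ttb_id": "A1", "position": "front"},
    {"ttb_image_id": "img-2", "ttb_id": "A1", "position": "back"},
    {"ttb_image_id": "img-3", "ttb_id": "B2", "position": "front"},
]
cola_image_barcodes = [
    {"ttb_image_id": "img-2", "barcode_value": "012345678905"},
    {"ttb_image_id": "img-2", "barcode_value": "https://example.com/qr"},
]


def nest_colas(colas, images, barcodes):
    """Group barcodes under images and images under colas."""
    barcodes_by_image = defaultdict(list)
    for barcode in barcodes:
        barcodes_by_image[barcode["ttb_image_id"]].append(barcode)

    images_by_cola = defaultdict(list)
    for image in images:
        enriched = {**image, "barcodes": barcodes_by_image[image["ttb_image_id"]]}
        images_by_cola[image["ttb_id"]].append(enriched)

    return {
        cola["ttb_id"]: {**cola, "images": images_by_cola[cola["ttb_id"]]}
        for cola in colas
    }


nested = nest_colas(colas, cola_images, cola_image_barcodes)
```

    In a warehouse the same shape would normally be expressed as two joins, colas to cola_images on ttb_id and cola_images to cola_image_barcodes on ttb_image_id.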

    External Documentation

    This Google Sheet contains column-level descriptors of the dataset.

    https://docs.google.com/spreadsheets/d/1H4nBdpqaN3f0_1In6wJnb-Bc6-4pw2MaId_2sn7LnKs/edit

    This Sample

    This dataset contains records approved or surrendered in 2018. The full dataset contains records from the mid-1990s through the present day.

    This free sample is also available as a listing on the Snowflake Data Marketplace.

    The full dataset offering is available by request. The full product also contains a column of raw text for each image which was too large to upload here.

    Scraped and Enriched

    COLA Cloud is a service operated by the author of this sample dataset. COLA Cloud scrapes, parses, and transforms public COLA records into an analytics-ready, cloud-native database, ready to load straight into your data warehouse. Processing includes image-barcode extraction, image-text extraction (full text is excluded in this sample), image-text feature extraction (ocr_abv and ocr_volume are included here). Image-text is extracted with Google's Cloud Vision API; a $6,000 value over the full set of 4M images.

    Full-resolution imagery is stored in AWS S3, keyed into the data model, and can be made accessible by request.

    More details about the full product: https://colacloud.us

    Snowflake Data Marketplace listing of this demo: https://app.snowflake.com/marketplace/listing/GZT1ZVOIUH/cola-cloud-us-ttb-cola-registry-alcohol-product-catalog-demo

  16. DBD - Slim Gene Ontology

    • neuinfo.org
    Updated Jan 29, 2022
    Cite
    (2022). DBD - Slim Gene Ontology [Dataset]. http://identifiers.org/RRID:SCR_005728
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Db for Dummies! is a small database that imports the Generic GO Slim. It allows data to be viewed in a tree. The Gene Ontology describes gene products in terms of their associated biological processes, cellular components and molecular functions. The Generic Slim Gene Ontology is a subset of the whole Gene Ontology. The slim version gives a broad overview and leaves out specific/fine grained terms. This example stores the slim version of the Gene Ontology (goslim_generic_obo) that can be downloaded from www.geneontology.org/GO.slims.shtml. Platform: Windows compatible

  17. Cleaned Contoso Dataset

    • kaggle.com
    zip
    Updated Aug 27, 2023
    Cite
    Bhanu (2023). Cleaned Contoso Dataset [Dataset]. https://www.kaggle.com/datasets/bhanuthakurr/cleaned-contoso-dataset
    Explore at:
    Available download formats: zip (487695063 bytes)
    Dataset updated
    Aug 27, 2023
    Authors
    Bhanu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data was imported from the BAK file found here into SQL Server, and then individual tables were exported as CSV. A Jupyter Notebook containing the code used to clean the data can be found here.

    Version 6 adds further cleaning and restructuring to address issues noticed after importing the data into Power BI. The changes were made by adding code to the Python notebook that exports the new cleaned dataset, for example adding MonthNumber so that months sort by month number, and similarly WeekDayNumber for weekdays.
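
    The reason for these helper columns is that month and weekday names sort alphabetically unless a numeric key is available. A minimal standard-library Python sketch follows; MonthNumber and WeekDayNumber are the column names mentioned above, while the Date, MonthName, and WeekDayName fields, the sample dates, and the Monday = 1 numbering are assumptions made for illustration and may differ from the author's notebook.

```python
import calendar
from datetime import date


def add_sort_keys(row: dict) -> dict:
    """Return a copy of the row with numeric month and weekday sort keys."""
    d = row["Date"]
    return {
        **row,
        "MonthName": calendar.month_name[d.month],
        "MonthNumber": d.month,            # 1 = January ... 12 = December
        "WeekDayName": calendar.day_name[d.weekday()],
        "WeekDayNumber": d.isoweekday(),   # assumption: 1 = Monday ... 7 = Sunday
    }


rows = [
    {"Date": date(2023, 4, 3)},
    {"Date": date(2023, 1, 15)},
    {"Date": date(2023, 8, 27)},
]
enriched = [add_sort_keys(r) for r in rows]

# Sorting by the numeric key gives calendar order rather than alphabetical.
calendar_order = sorted(enriched, key=lambda r: r["MonthNumber"])
```

    In Power BI the numeric column would then be chosen as the sort-by column for the corresponding name column.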

    Cleaning was done in Python, with SQL Server used to look things up quickly. Headers were added separately, ensuring no data loss. Data was cleaned for NaN, garbage values and other columns.

  18. biochem4j: Integrated and extensible biochemical knowledge through graph...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    txt
    Updated May 31, 2023
    Cite
    Neil Swainston; Riza Batista-Navarro; Pablo Carbonell; Paul D. Dobson; Mark Dunstan; Adrian J. Jervis; Maria Vinaixa; Alan R. Williams; Sophia Ananiadou; Jean-Loup Faulon; Pedro Mendes; Douglas B. Kell; Nigel S. Scrutton; Rainer Breitling (2023). biochem4j: Integrated and extensible biochemical knowledge through graph databases [Dataset]. http://doi.org/10.1371/journal.pone.0179130
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Neil Swainston; Riza Batista-Navarro; Pablo Carbonell; Paul D. Dobson; Mark Dunstan; Adrian J. Jervis; Maria Vinaixa; Alan R. Williams; Sophia Ananiadou; Jean-Loup Faulon; Pedro Mendes; Douglas B. Kell; Nigel S. Scrutton; Rainer Breitling
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and–crucially–the relationships between them. Such a resource should be extensible, such that newly discovered relationships–for example, those between novel, synthetic enzymes and non-natural products–can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists.

  19. Business Intelligence (BI) And Analytics Platforms Market Analysis, Size,...

    • technavio.com
    pdf
    Updated Jun 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Business Intelligence (BI) And Analytics Platforms Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, and UK), APAC (China, India, Japan, and South Korea), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/business-intelligence-and-analytics-platforms-market-industry-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    United States
    Description

    Business Intelligence (BI) And Analytics Platforms Market Size 2025-2029

    The business intelligence (BI) and analytics platforms market size is forecast to increase by USD 20.67 billion at a CAGR of 8.4% between 2024 and 2029.
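
    The two figures in the sentence above are related by the compound growth formula. The sketch below is a rough illustration under stated assumptions: that the 8.4% CAGR compounds annually over the five years from 2024 to 2029, and that the USD 20.67 billion is the difference between the 2029 and 2024 market sizes. The function name is hypothetical, and the implied values are not figures from the report.

```python
def implied_base(increase, cagr, years):
    """Base value B such that B * ((1 + cagr) ** years - 1) == increase."""
    growth = (1 + cagr) ** years - 1  # total fractional growth over the period
    return increase / growth


# Inputs from the sentence above, in USD billion; the 5-year span is an assumption.
increase = 20.67
base_2024 = implied_base(increase, cagr=0.084, years=5)
size_2029 = base_2024 + increase

print(f"Implied 2024 size: {base_2024:.2f}")
print(f"Implied 2029 size: {size_2029:.2f}")
```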

    The market is experiencing significant growth, driven by the increasing need to enhance business efficiency and productivity. This trend is particularly prominent in industries undergoing digital transformation, seeking to gain a competitive edge through data-driven insights. Furthermore, the burgeoning medical tourism industry worldwide presents a lucrative opportunity for BI and analytics platforms, as healthcare providers and insurers look to optimize patient care and manage costs. However, this market faces challenges as well.
    The BI and analytics platforms market is characterized by its potential to revolutionize business operations and improve decision-making, while also presenting challenges related to data security and privacy. Companies looking to capitalize on this market's opportunities must prioritize both innovation and robust security measures to meet the evolving needs of their clients. Ensuring data confidentiality and compliance with evolving regulations is crucial for companies to maintain trust with their clients and mitigate potential risks.
    

    What will be the Size of the Business Intelligence (BI) And Analytics Platforms Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

    In the dynamic market, data integration tools play a crucial role in seamlessly merging data from various sources. Statistical modeling and machine learning algorithms are employed for deriving insights from this integrated data. Data security tools ensure the protection of sensitive information, while decision automation streamlines processes based on data-driven insights. Data discovery tools enable users to explore and understand complex data sets, and deep learning frameworks facilitate advanced analytics capabilities. Semantic search and knowledge graphs enhance data accessibility, and dashboarding tools provide real-time insights through interactive visualizations. Metadata management tools and data cataloging help manage vast amounts of data, while data virtualization tools offer a unified view of data from multiple sources.
    Graph databases and federated analytics enable advanced data querying and analysis. AI-driven insights and augmented analytics offer more accurate predictions through predictive modeling and what-if analysis. Scenario planning and geospatial analytics provide valuable insights for strategic decision-making. Cloud data warehouses and streaming analytics facilitate real-time data ingestion and processing, and database administration tools ensure data quality and consistency. Edge analytics and cognitive analytics offer decentralized data processing and advanced contextual understanding, respectively. Data transformation techniques and location intelligence add value to raw data, making it more actionable for businesses. A data governance framework ensures data compliance and trustworthiness, while explainable AI (XAI) and automated reporting provide transparency and ease of use.
    

    How is this Business Intelligence (BI) and Analytics Platforms Industry segmented?

    The business intelligence (BI) and analytics platforms industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    End-user
      BFSI
      Healthcare
      ICT
      Government
      Others

    Deployment
      On-premises
      Cloud

    Business Segment
      Large enterprises
      SMEs

    Geography
      North America
        US
        Canada
        Mexico
      Europe
        France
        Germany
        UK
      APAC
        China
        India
        Japan
        South Korea
      Rest of World (ROW)

    By End-user Insights

    The BFSI segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth in the BFSI sector due to the complete digitization of core business processes and the adoption of customer-centric business models. With the emergence of new financial technologies such as cashless banking, phone banking, and e-wallets, an extensive amount of digital data is generated every day. Analyzing this data provides valuable insights into system performance, customer behavior and expectations, demographic trends, and future growth areas. Business intelligence dashboards, in-memory analytics, anomaly detection, decision support systems, and KPI dashboards are essential tools used in the BFSI sector for data analysis. ETL processes, data governance, mobile BI, and forecast accuracy are other critical components of BI and analytics

  20. Model Car - Mint Classics

    • kaggle.com
    zip
    Updated Apr 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gaston Saracusti (2024). Model Car - Mint Classics [Dataset]. https://www.kaggle.com/datasets/gastonsaracusti/model-car-mint-classics
    Explore at:
    zip(26650 bytes)Available download formats
    Dataset updated
    Apr 29, 2024
    Authors
    Gaston Saracusti
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Mint Classics Company, a retailer of classic model cars and other vehicles, is looking at closing one of their storage facilities.

    To support a data-based business decision, they are looking for suggestions and recommendations for reorganizing or reducing inventory, while still maintaining timely service to their customers. For example, they would like to be able to ship a product to a customer within 24 hours of the order being placed.

    As a data analyst, you have been asked to use MySQL Workbench to familiarize yourself with the general business by examining the current data. You will be provided with a data model and sample data tables to review. You will then need to isolate and identify those parts of the data that could be useful in deciding how to reduce inventory. You will write queries to answer questions like these:

    1) Where are items stored and if they were rearranged, could a warehouse be eliminated?

    2) How are inventory numbers related to sales figures? Do the inventory counts seem appropriate for each item?

    3) Are we storing items that are not moving? Are any items candidates for being dropped from the product line?

    The answers to questions like those should help you to formulate suggestions and recommendations for reducing inventory with the goal of closing one of the storage facilities.
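
    Questions 2 and 3 above amount to comparing stock on hand with units ordered per product. The sketch below shows one possible shape for such a query. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names and rows are hypothetical, not the actual Mint Classics schema or data.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (productCode TEXT PRIMARY KEY, productName TEXT,
                       warehouseCode TEXT, quantityInStock INTEGER);
CREATE TABLE orderdetails (productCode TEXT, quantityOrdered INTEGER);
INSERT INTO products VALUES
  ('P1', 'Model A', 'a', 500),
  ('P2', 'Model B', 'a', 9000),
  ('P3', 'Model C', 'b', 300);
INSERT INTO orderdetails VALUES ('P1', 400), ('P1', 50), ('P2', 100);
""")

# LEFT JOIN keeps products that were never ordered; COALESCE turns their
# missing totals into 0 so they sort to the top as "not moving".
query = """
SELECT p.productCode, p.warehouseCode, p.quantityInStock,
       COALESCE(SUM(o.quantityOrdered), 0) AS totalOrdered
FROM products AS p
LEFT JOIN orderdetails AS o ON o.productCode = p.productCode
GROUP BY p.productCode, p.warehouseCode, p.quantityInStock
ORDER BY totalOrdered ASC, p.productCode ASC
"""
rows = conn.execute(query).fetchall()

# Products with stock on hand but no recorded orders.
not_moving = [code for code, _wh, _stock, ordered in rows if ordered == 0]
```

    The same SELECT statement should carry over to MySQL Workbench largely unchanged, provided the table and column names are adjusted to the data model that is supplied with the project.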

    Project Objectives

    1. Explore products currently in inventory.

    2. Determine important factors that may influence inventory reorganization/reduction.

    3. Provide analytic insights and data-driven recommendations.

    Your Challenge

    Your challenge will be to conduct an exploratory data analysis to investigate if there are any patterns or themes that may influence the reduction or reorganization of inventory in the Mint Classics storage facilities. To do this, you will import the database and then analyze data. You will also pose questions, and seek to answer them meaningfully using SQL queries to retrieve data from the database provided.

    In this project, we'll use the fictional Mint Classics relational database and a relational data model. Both will be provided.

    After you perform your analysis, you will share your findings.
