Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
USPTO patent application no. 09407650 in the United States Patent and Trademark Office
Below is an explanation of the data along with some features that are available on this map (a description is also provided in the "Getting Started" widget of the application).

A variety of different colored circles appear throughout the map. They represent sites that are associated with the following programs:

1) Department of Toxic Substances Control (DTSC) sites:
a) Historical Inactive - Identifies sites from an older database that are non-active sites where, through a Preliminary Endangerment Assessment (PEA) or other evaluation, DTSC has determined that a removal or remedial action or further extensive investigation is required.
b) School Cleanup - Identifies proposed and existing school sites that are being evaluated by DTSC for possible hazardous materials contamination. School sites are further defined as "Cleanup", where remedial actions are occurring or have occurred.
c) School Evaluation - Identifies proposed and existing school sites that are being evaluated by DTSC for possible hazardous materials contamination. School sites are further defined as "Evaluation", where further investigation is needed.
d) Corrective Action - Investigation or cleanup activities at Resource Conservation and Recovery Act (RCRA) or state-only hazardous waste facilities (that were required to obtain a permit or have received a hazardous waste facility permit from DTSC or U.S. EPA).
e) State Response - Identifies confirmed release sites where DTSC is involved in remediation, either in a lead or oversight capacity. These confirmed release sites are generally high priority and high potential risk.
f) Evaluation - Identifies suspected, but unconfirmed, contaminated sites that need or have gone through a limited investigation and assessment process.
g) Tiered Permit - A corrective action cleanup project on a hazardous waste facility that either was eligible to treat or was permitted to treat waste under the Tiered Permitting system.

2) State Water Board or DTSC sites:
a) Leaking Underground Storage Tank (LUST) Cleanup - Includes all Underground Storage Tank (UST) sites that have had an unauthorized release (i.e. leak or spill) of a hazardous substance, usually fuel hydrocarbons, and are being (or have been) cleaned up. These sites are regulated under the State Water Board's UST Cleanup Program and/or similar programs conducted by each of the nine Regional Water Boards or Local Oversight Programs.
b) Cleanup Program - Includes all "non-federally owned" sites that are regulated under the State Water Board's Site Cleanup Program and/or similar programs conducted by each of the nine Regional Water Boards. Cleanup Program sites are also commonly referred to as "Site Cleanup Program sites".
c) Voluntary Cleanup - Identifies sites with either confirmed or unconfirmed releases where the project proponents have requested that the State Water Board or DTSC oversee evaluation, investigation, and/or cleanup activities and have agreed to provide coverage for the lead agency's costs.

3) Other:
a) Permitted Tanks - The "Permitted Tanks" data set includes facilities that are associated with permitted underground storage tanks from the California Environmental Reporting System (CERS) database. The CERS data consists of current and recently closed permitted underground storage tank (UST) facility information provided to CERS by Certified Unified Program Agencies (CUPAs).

*Note: Underground Storage Tank Cleanup and Cleanup Program project records are pulled from the State Water Board's GeoTracker database.
The Permitted Tanks information was obtained from California EPA’s California Environmental Reporting System (CERS) database. All other project records were obtained from DTSC's EnviroStor database. Program descriptions come from DTSC’s EnviroStor Glossary of Terms and the State Water Board’s GeoTracker Site/Facility Type Definitions. The information associated with these records was last updated in the application on 4/24/2023.
THIS DATA ASSET NO LONGER ACTIVE: This is metadata documentation for the National Priorities List (NPL) Publication Assistance Database (PAD), a Lotus Notes application that holds Region 7's universe of NPL site information, such as site description, threats and contaminants, cleanup approach, environmental process, community involvement, site repository, and regional contacts. This database used to be updated annually, at different times for different NPLs, but it is currently no longer being used. This work fell under objectives for EPA's 2003-2008 Strategic Plan (Goal 3) for Land Preservation & Restoration, which are to clean up and reuse contaminated land.
*The data for this dataset is updated daily. The date(s) displayed in the details section on our Open Data Portal is based on the last date the metadata was updated and not the refresh date of the data itself.*

The Cleanup Sites layer provides locations and document links for sites currently in the cleanup process and sites awaiting cleanup funding. Cleanup programs include: Brownfields, Petroleum, EPA Superfund (CERCLA), Drycleaning, Responsible Party Cleanup, State Funded Cleanup, State Owned Lands Cleanup, and Hazardous Waste Cleanup. Please reference the metadata for contact information.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset was obtained by web scraping a Wikipedia page; the code is linked below: https://www.kaggle.com/amruthayenikonda/simple-web-scraping-using-pandas
This dataset can be used to practice data cleaning and manipulation, for example dropping unwanted columns, handling null values, removing symbols, etc.
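As an illustrative sketch of that kind of practice (the filename and column names below are hypothetical placeholders, not fields from the scraped table):

# Hypothetical practice clean-up; substitute the actual columns of the scraped table.
import pandas as pd

df = pd.read_csv("scraped_wikipedia_table.csv")   # hypothetical filename
df = df.drop(columns=["Notes"], errors="ignore")  # drop an unwanted column
df = df.dropna()                                  # drop rows with null values
# strip stray symbols (e.g. footnote markers) from a text column
df["Rank"] = df["Rank"].astype(str).str.replace(r"[^\w\s.]", "", regex=True)
print(df.head())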
CLEAR has public record information and is also used for law enforcement and investigations, including personal identification and financial records, police reports, and credential verification services.
Data to create the List of Contaminated or Potentially Contaminated Sites - Remediation Division is from historical program information or from new program applications and filings. More information regarding the generation of this list can be found at: https://portal.ct.gov/DEEP/Remediation--Site-Clean-Up/List-of-Contaminated-or-Potentially-Contaminated-Sites-in-Connecticut

A separate dataset is published for the List of Contaminated or Potentially Contaminated Sites - SASU Case Management System, which provides a list of Leaking Underground Storage Tank sites. The two database systems are maintained by different Divisions within the agency. There may be sites in both databases due to an overlap in the responsibilities of the two Divisions. https://data.ct.gov/Environment-and-Natural-Resources/List-of-Contaminated-or-Potentially-Contaminated-S/77ya-7twa

The data is updated when documents are received from responsible parties conducting site remediation. For more information regarding the individual remedial programs visit: https://portal.ct.gov/DEEP/Remediation--Site-Clean-Up/Remediation-Site-Clean-Up Those seeking additional information about the information contained in this dataset may use the DEEP FOIA process: https://portal.ct.gov/DEEP/About/FOIA-Requests

Each row represents a remediation project (Property Transfer, Brownfield, Enforcement, Federal Remediation, State Remediation, Landfill Monitoring, RCRA Corrective Action, and Voluntary). Data to compile the list was gathered for each site from information provided to DEEP for requirements within each program. Sites may be in multiple remediation programs and therefore may be listed more than once. Some sites have been fully cleaned up, while others have limited information about the environmental conditions. The list includes only sites that have been reported to DEEP or EPA.

Additional information for sites within the Hazard Notification program can be found at: https://portal.ct.gov/DEEP/Remediation--Site-Clean-Up/Significant-Environmental-Hazard-Program/List-of-Significant-Environmental-Hazards Significant Environmental Hazard Sites GIS map: https://experience.arcgis.com/experience/9c100aa21fbe4ee180df9942d000f676

Details on columns which reference ELUR: Environmental Land Use Restrictions (ELUR) or Notice and Use Limitation (NAUL) are used to minimize the risk of human exposure to pollutants and hazards to the environment by preventing specific uses or activities at a property or a portion of a property. Link to GIS map of ELURs and restriction type: https://ctdeep.maps.arcgis.com/apps/webappviewer/index.html?id=d37eccb2a5c3491d8f0d389a96d9a912

There may be errors in the data, although we strive to minimize them. Examples of errors may include misspelled or incomplete addresses and/or missing data.
The Alaska Geochemical Database Version 4.0 (AGDB4) contains geochemical data compilations in which each geologic material sample has one best value determination for each analyzed species, greatly improving efficiency of use. The relational database includes historical geochemical data archived in the USGS National Geochemical Database (NGDB), the Atomic Energy Commission National Uranium Resource Evaluation (NURE) Hydrogeochemical and Stream Sediment Reconnaissance databases, and the Alaska Division of Geological and Geophysical Surveys (DGGS) Geochemistry database. Data from the U.S. Bureau of Mines and the U.S. Bureau of Land Management are included as well. The data tables describe historical and new quantitative and qualitative geochemical analyses. The analytical results were determined by 120 laboratory and field analytical methods performed on 416,333 rock, sediment, soil, mineral, heavy-mineral concentrate, and oxalic acid leachate samples. The samples were collected as part of various agency programs and projects from 1938 through 2021. Most samples were collected by agency personnel and analyzed in agency laboratories or under contracts in commercial analytical laboratories. Mineralogical data from 18,138 nonmagnetic heavy-mineral concentrate samples are also included in this database. The data in the AGDB4 supersede data in the AGDB, AGDB2, and AGDB3 databases but the background about the data in these earlier versions is needed to understand what has been done to amend, clean up, correct, and format these data. Data that were not included in previous versions because they predate the earliest agency geochemical databases or were excluded for programmatic reasons are included here in the AGDB4. The AGDB4 data are the most accurate and complete to date and should be useful for a wide variety of geochemical studies. They are provided as a Microsoft Access database, as comma-separated values (CSV), and as an Esri geodatabase consisting of point feature classes and related tables.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BOLD CO1 databases reformatted for use in NanoClass (https://github.com/ejongepier/NanoClass; version 0.3.0-beta or higher) and QIIME2. Three separate databases are included for use in combination with the primers mtD, LCO-HCO, and CI. The databases include reference sequences and reference taxonomies for use in NanoClass, as well as pre-trained classifiers for use in QIIME2. See usage instructions below.
For questions, please contact e.jongepier@uva.nl.
==========================================
Please note this version of a custom BOLD CO1 db comes with absolutely no warranties.
When using this db in NanoClass, mind that it has only been tested with the methods ["megablast","minimap","spingo"]. NanoClass cannot be run in combination with these BOLD CO1 databases using the methods ["mothur","centrifuge","kraken"]. Compatibility with ["blast","dcmegablast","qiime","rdp"] is untested. Just remove the tools you want to skip from NanoClass/config.yaml (see also the NanoClass documentation here: https://ejongepier.github.io/NanoClass/).
Never use this database in combination with the NanoClass snakemake -F parameter, or this BOLD CO1 database will be overwritten by the default 16S SILVA database.
==========================================
BOLD CO1 database (last) downloaded on 20210420 and reformatted for use in QIIME2 and NanoClass. To clean up the BOLD CO1 db, these steps were taken (steps 7 to 11 were repeated for each of the 3 primers; an illustrative sketch of a few of these steps follows the list):
- remove identical duplicates [3597874]
- drop seqs with non-IUPAC characters [3597839]
- remove leading and trailing ambiguous bases [3597839]
- remove low quality reads
- remove reads with homopolymer runs
- filter by length
- extract fragments between primer sequences [mtD: 112450; CI: 121391; LCO-HCO: 65307]
- dereplicate / cluster [mtD: 55075; CI: 46470; LCO-HCO: 24835]
- remove uninformative taxonomic labels [mtD: 55073; CI: 46466; LCO-HCO: 24832]
- reformat db for use in NanoClass
- train classifier based on fragments
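For illustration only, here is a minimal Python/Biopython sketch of a few of the steps above (exact-duplicate removal, dropping non-IUPAC sequences, trimming ambiguous bases, and length filtering); it is not the pipeline actually used, and the length bounds are placeholders:

# Illustrative only -- not the actual BOLD CO1 clean-up pipeline.
import re
from Bio import SeqIO
from Bio.Seq import Seq

IUPAC_DNA = re.compile(r"^[ACGTRYSWKMBDHVN]+$")  # IUPAC nucleotide codes

def clean_fasta(infile, outfile, minlen=200, maxlen=2000):  # placeholder bounds
    seen, kept = set(), []
    for rec in SeqIO.parse(infile, "fasta"):
        s = str(rec.seq).upper()
        if s in seen:                # remove identical duplicates
            continue
        if not IUPAC_DNA.match(s):   # drop seqs with non-IUPAC characters
            continue
        seen.add(s)
        s = s.strip("N")             # remove leading/trailing ambiguous bases
        if not (minlen <= len(s) <= maxlen):  # filter by length
            continue
        rec.seq = Seq(s)
        kept.append(rec)
    SeqIO.write(kept, outfile, "fasta")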
==========================================
Use in NanoClass:
Unzip the database and copy the reference taxonomy and (unzipped) reference sequences to the NanoClass/db/common directory, like so:
$ cp mtD/bold-v20210421-taxonomy-mtD.tsv /path/to/NanoClass/db/common/ref-taxonomy.txt
$ gzip -d -c mtD/bold-v20210421-frags-mtD.fa.gz > /path/to/NanoClass/db/common/ref-seqs.fna
Something similar can be done for the other two primers (CI or LCO-HCO). Only these three primers are supported at this point.
Next, create an (empty) ref-seqs.aln file just to prevent NanoClass from automatically downloading the default 16S SILVA database, which would overwrite the BOLD db you just copied into NanoClass/db/common.
$ touch /path/to/NanoClass/db/common/ref-seqs.aln
Finally, you need to make a change to the NanoClass/Snakefile (i.e., change the first line below into the second):
optrules.extend(["plots/precision.pdf"] if len(config["methods"]) > 2 else [])
optrules.extend(["plots/precision.pdf"] if len(config["methods"]) > 200 else [])
This will disable the computation of precision plots by NanoClass as this is not supported in combination with the custom BOLD CO1 databases.
Also mind that you need to change the nanofilt minlen and maxlen in the NanoClass/config.yaml to capture the appropriate fragment length for your primer. For the mtD primer I used minlen 600 and maxlen 900 for testing.
Use in QIIME2:
You can use the trained classifier directly in QIIME2, like so:
$ qiime feature-classifier classify-sklearn \
    --i-classifier mtD/bold-v20210421-classifier-mtD.qza \
    --i-reads .qza \
    --o-classification .qza \
    --verbose
Something similar can be done for the other two primers (CI or LCO-HCO). Only these three primers are supported at this point. The classifiers have only been tested with the sklearn algorithm.
The Michigan Department of Environment, Great Lakes, and Energy's (EGLE) Environmental Remediation Program manages and reduces risk at sites of environmental contamination. This is achieved through activities such as site evaluation, feasibility studies, operation and maintenance of systems, implementing land use and resource use restrictions, and monitoring. This data layer shows facilities that have been identified and mapped under Part 201, Environmental Remediation, of the Natural Resources and Environmental Protection Act, 1994 PA 451, as amended (NREPA): those areas, places, or parcels of property, or portions of a parcel of property, where a hazardous substance in excess of the concentrations that satisfy the cleanup criteria for unrestricted residential use has been released, deposited, disposed of, or otherwise comes to be located. This data layer does not include all of the facilities that are subject to regulation under Part 201 because owners are not required to inform EGLE about the facilities and can pursue cleanup independently. Facilities that are not known to EGLE are not on the Inventory, nor are locations with releases that resulted in low environmental impact. This data is regularly updated.

Field Name - Alias - Description
OBJECTID - N/A - N/A
SITENAME - Site Name - Name for the location assigned by RRD
ADDRESS - Address - Street address for the site
CITY - City - City associated with the street address
ZIPCODE - Zip Code - Zip code of the site
COUNTY - County - County where the site is located
LATITUDE - Latitude - Latitude (Y-coordinate) of the site
LONGITUDE - Longitude - Longitude (X-coordinate) of the site
SITEID - Site ID - Unique identifier for the site within RRD's RIDE database, which connects to the Environmental Mapper
BusinessType - Business Type - General classification of the type of business that is/was associated with the Part 201 site
HorizontalReferenceDatum - Horizontal Reference Datum - Horizontal reference datum
HorizontalCollectionMethod - Horizontal Reference Method of Collection - Describes the method used for identifying the site
HorizontalAccuracy - Horizontal Accuracy (m) - An estimated measure of the horizontal accuracy of the point in meters
ReferencePoint - Reference Point - Provides a description of the relationship between the point feature and the overall site
SourceMapScale - Source Map Scale - The representative fraction or scale at which the point feature was mapped
RiskCondition - Risk Condition - Risk condition classification applied to the site by EGLE's Remediation and Redevelopment Division, which is used by the division to identify sites that are a priority to address, to manage workloads, and to report metrics on the overall facility status consistently across programs
Contaminants - Contaminants - Chemical classification identified on the site
HasBeaOrNom - HasBeaOrNom - Indicates whether EGLE has knowledge of a baseline environmental assessment or a notice of migration for the site
ProjectManager - Project Manager - The RRD staff person assigned to manage that location
LastUpdated - Last Updated - The date the point was updated
Shape - N/A - N/A

For more information about this data, please contact Matt Warner at WarnerM1@Michigan.gov.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The European Union Visitor Visa Database contains statistics on short-stay visa issuing practices. It is based on official administrative data, cleaned up to provide a consistent time series from 2005 to 2022.
Background
As a non-immigrant visa, a visitor visa is typically valid for a visit of up to 3 months and grants access to the entire Schengen area (for the reporting states that are full members of Schengen). Short-stay visas are an important component of border control practices, providing a mechanism for screening visitors before they arrive at the physical borders.
The statistics in the original administrative data are reported at the per-consulate level. A reporting state will typically be a member of the European Union (EU) and the Schengen free travel area. The dataset does, however, include statistics reported by states in the process of becoming Schengen members. It also includes data from European countries that are part of the Schengen area but not members of the EU, such as Norway.
The dataset includes a column with the refusal rate, calculated as the share of visas not issued out of the total number of visas issued and not issued. Note that "visas not issued" includes both explicit refusals and lapsed or otherwise discontinued applications.
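A minimal sketch of that refusal-rate definition; the column names ("issued", "not_issued") are assumptions for illustration, not the dataset's actual fields:

# Refusal rate = not issued / (issued + not issued); column names are assumed.
import pandas as pd

visas = pd.DataFrame({
    "consulate": ["A", "B"],
    "issued": [9000, 4200],
    "not_issued": [1000, 800],  # explicit refusals plus lapsed/discontinued applications
})
visas["refusal_rate"] = visas["not_issued"] / (visas["issued"] + visas["not_issued"])
print(visas)  # refusal_rate: 0.10 and 0.16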
The dataset made available here includes application statistics. It should be noted that these cover only one, albeit important, component of the common visa policy. Other major elements include the common list of countries subject to a visa requirement in the first place, as well as consular cooperation on visa issuing.
The code for creating the dataset, as well as further details on sources, can be found in the Github repository.
Use cases
The dataset can be used to probe questions on the state and evolution of EU cooperation in the area of borders and migration control. Problems that can be investigated with the data include, for example:
- Patterns of liberal and restrictive border practices and their determinants.
- The degree of harmonization between EU states, i.e., convergent and divergent visa practices.
Acknowledgements
As detailed in the repository, the raw data is processed (cleaned up) as evidenced in the source code. The data for 2005-2012 were imported relying on earlier data clean-up done in connection with the construction of the European Visa Database (see the background section). The country classifications (income group and regions) are sourced from the World Bank's Country and Lending Groups country classification dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Scribe Database Collection includes 14 databases containing data from the Deepwater Horizon (DWH) Oil Spill Event Response Phase. These databases are the work of federal agencies, state environmental management agencies and BP and its contractors. The types of information include locations, descriptions, and analysis of water, sediment, oil, tar, dispersant, air and other environmental samples. The versions of the databases included in this collection are the result of the second phase of a clean-up effort by the database owners and contributors to resolve inconsistencies in the initial databases and to harmonize content across the databases in order for these data to be comparable for reliable evaluation and reporting. This effort was initiated in order to meet requirements supporting the Unified Area Command.
This downloadable data package consists of location and facility identification information from EPA's Facility Registry Service (FRS) for all sites that are available in the FRS individual feature layers. The layers comprise the FRS major program databases, including:
- Assessment Cleanup and Redevelopment Exchange System (ACRES): brownfields sites
- Air Facility System (AFS): stationary sources of air pollution
- ICIS-AIR (AIR): stationary sources of air pollution
- Bureau of Indian Affairs (BIA): schools data on Indian land
- Base Realignment and Closure (BRAC) facilities
- Clean Air Markets Division Business System (CAMDBS): market-based air pollution control programs
- Comprehensive Environmental Response, Superfund Enterprise Management System (SEMS): hazardous waste sites
- Integrated Compliance Information System (ICIS): integrated enforcement and compliance information
- National Compliance Database (NCDB): Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) and the Toxic Substances Control Act (TSCA)
- National Pollutant Discharge Elimination System (NPDES) module of ICIS: NPDES surface water permits
- Radiation Information Database (RADINFO): radiation and radioactivity facilities
- RACT/BACT/LAER Clearinghouse (RBLC): best available air pollution technology requirements
- Resource Conservation and Recovery Act Information System (RCRAInfo): tracks generators, transporters, treaters, storers, and disposers of hazardous waste
- Toxic Release Inventory (TRI): certain industries that use, manufacture, treat, or transport more than 650 toxic chemicals
- Emission Inventory System (EIS): inventory of large stationary sources and voluntarily-reported smaller sources of air point pollution emitters
- Spill Prevention, Control, and Countermeasure (SPCC) and facility response plan (FRP) subject facilities
- Electronic Greenhouse Gas Reporting Tool (E-GGRT): large greenhouse gas emitters
- Emissions and Generation Resource Integrated Database (EGRID): power plants

The Facility Registry Service (FRS) identifies and geospatially locates facilities, sites or places subject to environmental regulations or of environmental interest. Using vigorous verification and data management procedures, FRS integrates facility data from EPA's national program systems, other federal agencies, and State and tribal master facility records, and provides EPA with a centrally managed, single source of comprehensive and authoritative information on facilities. This data set contains the FRS facilities that link to the programs listed above once the program data has been integrated into the FRS database. Additional information on FRS is available at the EPA website https://www.epa.gov/enviro/facility-registry-service-frs. Included in this package are a file geodatabase, an Esri ArcMap map document, and an XML file of this metadata record. Full FGDC metadata records for each layer are contained in the database.
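A hedged sketch of opening one layer of the package's file geodatabase with geopandas; the .gdb path and layer name below are assumptions, so list the real layer names first and substitute your own:

# Assumed path and layer name -- discover the real ones with fiona.listlayers.
import fiona
import geopandas as gpd

gdb_path = "FRS_Download.gdb"      # hypothetical path to the package's geodatabase
print(fiona.listlayers(gdb_path))  # the actual FRS program layer names

sites = gpd.read_file(gdb_path, layer="ACRES")  # layer name is an assumption
print(sites.head())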
Downloadable Information on Waste Sites and Spills
EXXON Valdez Oil Spill (EVOS) data were generated by the National Marine Fisheries Service (NMFS). The EVOS area includes Prince William Sound and adjacent coastal areas. The data were put on a CD-ROM with the EVOS Geographic Information Systems (GIS) database, data dictionary, and bibliography. Data are related to oil spill clean-up, damage assessments, and restoration efforts. Data sets include physical features, biological features, cultural features, land status, boundaries, place names, human use, shoreline oiling, surface oiling, hydrocarbon analysis, EVOS research areas, and miscellaneous.
https://creativecommons.org/publicdomain/zero/1.0/
You run a targeted marketing campaign with a seemingly clean list and sharp messaging. Yet, the results are disappointing. Low open rates, hard bounces, and a few replies that go nowhere. So, you and your team try to find out what went wrong. You pull the list and start checking contacts manually. What do you find? The people you were targeting are no longer working in the company. Many job titles no longer exist. Phone numbers that ring to the wrong departments. Email addresses that were never valid.
This is not a targeting problem or a messaging problem. It is a data quality problem. And it is far more common than most B2B teams are willing to admit. This article explains why data quality is a critical obstacle in CRM systems and clearly argues that data enrichment services provide a targeted solution to the problem of bad data.
Bad data rarely announces itself. And in the world of B2B, most teams find it out after the damage is done. They discover they have a data problem after their efforts fail. It quietly eats your budget, lowers your deliverability scores, and makes your pipeline projections look worse than they should. According to Gartner research, poor data quality costs organizations an average of $12.9 million per year. That number sounds abstract until you map it to a real pipeline. Imagine this. Your team spends on outbound tools, SDR time, and campaign execution. Now, a meaningful chunk of that spend is going toward contacts who cannot be reached. The math gets worse when you account for deliverability. When your emails hard bounce at scale, inbox providers flag your domain. Your sender reputation drops. Even your valid, accurate contacts stop seeing your emails. One bad list can poison months of outbound effort. And there is a subtler cost that rarely gets discussed. When sales reps spend time calling wrong numbers or researching contacts who have moved on, they are not selling. That time cost adds up fast across a team of ten or twenty people.
There is an uncomfortable truth about CRM data that most B2B marketers overlook. It starts decaying the moment it enters the system. People change jobs. They get promoted. Companies get acquired. Departments get restructured. HubSpot research estimates that B2B data decays at a rate of about 22.5% per year. That means roughly one in five contacts in your CRM becomes inaccurate within twelve months of entry. Think about this. Your database has 50,000 contacts, and you last cleaned it eighteen months ago. Now do the math. You are potentially working with 15,000 to 20,000 contacts that are partially or fully wrong. Would you say that is a fringe problem? No! It is a structural issue that is costing your marketing and sales efforts. So, what are the sources of data decay across most B2B organizations? Here are a few that you should be aware of:
- Job changes and promotions that update titles and email formats
- Company rebranding or domain changes that break email addresses
- Mergers and acquisitions that restructure buying committees
- Role eliminations that remove decision-makers entirely
- Manual data entry errors that slip in from the start

None of these are exotic. They happen constantly. And without a systematic process to catch them, your CRM drifts further from reality with every passing quarter.
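To make that math concrete, here is a back-of-the-envelope sketch assuming the cited 22.5% annual decay rate compounds over the eighteen months since the last clean-up:

# Back-of-the-envelope CRM decay estimate under a compounding-decay assumption.
contacts = 50_000
annual_decay = 0.225
years = 1.5  # eighteen months since the database was last cleaned

still_accurate = contacts * (1 - annual_decay) ** years
stale = contacts - still_accurate
print(f"Estimated stale contacts after {years:g} years: {stale:,.0f}")
# ~15,900 stale contacts -- in the 15,000-20,000 range described above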
Data enrichment is not data cleansing, though the two often get confused. Cleansing removes what is wrong. Enrichment adds what is missing and updates what has changed. The distinction matters because a clean record is not the same as a complete or current one. Data cleansing and data enrichment are, in fact, two parts of the same process. First, you clean the data, and then you enrich it. A typical data enrichment services process works like this. You provide your existing contact or account records. The data enrichment provider runs them against verified, regularly updated data sources. What comes back is a record that has been checked for accuracy, filled in with missing fields, and updated to reflect current reality. In practical terms, this means:
- A contact who changed companies now has their current employer, title, and email
- A record missing a direct dial now has one appended from a verified source
- An account with outdated firmographics now reflects the current headcount and revenue range
- A contact with an invalid email has been flagged o...
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Here is a description of how the datasets for a training notebook used for the Telegram ML Contest solution were prepared.
The first part of the code samples was taken from a private version of this notebook.
Here are the statistics for the classes of programming languages from the Github Code Snippets database:
[Chart: class counts for the programming languages in the Github Code Snippets database]
From this database, 2 csv files were created with 50,000 code samples for each of the 20 included programming languages: one with equal class counts and one with stratified sampling. The files referenced here are sample_equal_prop_50000.csv and sample_stratified_50000.csv, respectively.
A second option for capturing additional examples was to run this notebook with a larger number of queries: 10,000.
The resulting file is dataset-10000.csv, included in the data card.
The statistics for the programming-language classes are shown on the next chart; there are 32 labeled classes:
[Chart: class counts across the 32 labeled programming-language classes]
To make the model more robust, code samples for 20 additional languages were collected, 10 to 15 samples each, covering more or less popular use cases. Also, for the class "OTHER" (natural-language examples, per the task of the competition), text examples from this dataset of prompts on Huggingface were added to the file. The resulting file is rare_languages.csv, also in the data card.
The statistics for the rare-language code snippets are as follows:
[Chart: class counts for the rare-language code snippets]
At this stage of dataset creation, the columns in sample_equal_prop_50000.csv and sample_stratified_50000.csv were cut down to just two, "snippet" and "language"; the version of the file with equal class counts is in the data card as sample_equal_prop_50000_clean.csv.
To prepare the BigQuery dataset file, the index column was cut out and the column "content" was renamed to "snippet". These changes were saved in dataset-10000-clean.csv.
After that, the files sample_equal_prop_50000_clean.csv and dataset-10000-clean.csv were combined and saved as github-combined-file.csv.
The prepared files took too much RAM to be read with the pandas library, which is why additional preprocessing was done: symbols such as quotes, commas, ampersands, newlines, and tab characters were cleaned out, as sketched below. After cleaning, the files were merged with the rare_languages.csv file and saved as github-combined-file-no-symbols-rare-clean.csv and sample_equal_prop_50000_-no-symbols-rare-clean.csv, respectively.
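An illustrative sketch of that clean-up and merge (not the author's exact script; the regex follows the list of symbols described above):

# Illustrative symbol clean-up and merge; not the author's exact script.
import re
import pandas as pd

SYMBOLS = re.compile(r"[\"',&\n\t]")  # quotes, commas, ampersands, newlines, tabs

def clean_snippets(path):
    df = pd.read_csv(path)
    df["snippet"] = df["snippet"].astype(str).str.replace(SYMBOLS, " ", regex=True)
    return df

combined = clean_snippets("github-combined-file.csv")
rare = pd.read_csv("rare_languages.csv")
merged = pd.concat([combined, rare], ignore_index=True)
merged.to_csv("github-combined-file-no-symbols-rare-clean.csv", index=False)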
The final distribution of classes turned out as follows:
[Chart: final class distribution]
To make the data suitable for the TF-DF format, each programming language was also assigned a label, as in the sketch below. The final labels are in the data card.
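A minimal sketch of assigning an integer label per language (the authoritative mapping is the one published in the data card):

# Assign integer labels per language for TF-DF; the real mapping is in the data card.
import pandas as pd

df = pd.read_csv("github-combined-file-no-symbols-rare-clean.csv")
df["label"], languages = pd.factorize(df["language"])
label_map = dict(enumerate(languages))  # e.g. {0: "PYTHON", 1: "JAVA", ...}
df.to_csv("labeled-dataset.csv", index=False)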
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Qiime2 formatted NCBI ITS database (fasta + taxonomy) for analysis of fungi ITS amplicon sequencing. All sequences that have not been identified at least to Phylum level were removed.Data download: search -db nuccore -query ""(internal transcribed spacer 1"[All Fields] AND "fungi"[Filter] AND (250[SLEN] : 10000[SLEN])) NOT "uncultured Neocallimastigales"[porgn] NOT "bacteria"[Filter] NOT "uncultured fungus"[Filter] NOT "Uncultured fungus"[Filter] NOT "fungal sp."[Filter]" | efetch -format fasta -mode text > ./NCBI_ITS1_DB_raw.fastaData processing (https://github.com/gzahn/tools/blob/master/make_qiime_database_from_fasta.sh)### Search for and remove any empty sequences ###gawk 'BEGIN {RS = ">" ; FS = " " ; ORS = ""} {if ($2) print ">"$0}' NCBI_ITS1_DB_raw.fasta > NCBI_ITS1_DB_raw.fasta.tidy# Obtain NCBI taxonomy lineages for your input fastapython2 /home/bioinf/bin/entrez_qiime.py -i NCBI_ITS1_DB_raw.fasta.tidy -o NCBI_Taxonomy.txt -r kingdom,phylum,class,order,family,genus,species -a /media/bioinf/Data/NCBI_tax2021/nucl_gb.accession2taxid -n /media/bioinf/Data/NCBI_tax2021### Validate and Tidy up files ###### Edit output file to include rank IDs (QIIME needs them for some scripts)cat NCBI_Taxonomy.txt | sed 's/\t/\tk_/' | sed 's/;/>p_/' | sed 's/;/>c_/' | sed 's/;/>o_/' | sed 's/;/>f_/' | sed 's/;/>g_/' | sed 's/;/>s_/' | sed 's/>/;/g' > NCBI_QIIME_Taxonomy.txt### Edit database to single-line fasta formatawk '/^>/ {printf(" %s ",$0);next; } { printf("%s",$0);} END {printf(" ");}' < NCBI_ITS1_DB_raw.fasta.tidy > NCBI_ITS1_DB_raw.fasta.tidy.oneline.fasta### Remove first blank linesed -i '/^$/d' NCBI_ITS1_DB_raw.fasta.tidy.oneline.fasta### Remove trailing descriptions after Accession No.sed -i 's/ .*//' NCBI_ITS1_DB_raw.fasta.tidy.oneline.fasta### compare read counts in fasta and txt filesgrep -c "^>" NCBI_ITS1_DB_raw.fasta.tidy.oneline.fastawc -l NCBI_QIIME_Taxonomy.txt#if numbers are different, there are duplicates introduced by entrez_qiime.py### if some duplicates may appear in fasta file (i.e., more reads than taxonomy IDs), get lists of Seq/Taxonomy IDs and remove duplicates from fasta filecut -f 1 NCBI_QIIME_Taxonomy.txt > Tax_Namesgrep "^>" NCBI_ITS1_DB_raw.fasta.tidy.oneline.fasta | cut -d " " -f 1 | sed 's/>//g' > DB_Namessort DB_Names | uniq -d > Duplicated_IDsgrep -A1 -f Duplicated_IDs NCBI_ITS1_DB_raw.fasta.tidy.oneline.fasta | sed '/^--/d' > Duplicated_fastasfor fn in Duplicated_fastas; do count=$(wc -l add_back; donegrep -v -f Duplicated_IDs NCBI_ITS1_DB_raw.fasta.tidy.oneline.fasta > tidy.no_reps.fastacat tidy.no_reps.fasta add_back > DB_raw.fasta### Sort fasta database to same order as taxonomy mapecho "Sorting Database...This will take some time."cut -f 1 NCBI_QIIME_Taxonomy.txt > IDs_in_order.txtwhile read ID ; do grep -m 1 -A 1 "^>$ID" DB_raw.fasta ; done < IDs_in_order.txt > DB.fasta #This will take quite a long time to runmv NCBI_QIIME_Taxonomy.txt Taxonomy.txtrm DB_Names DB_raw.fasta Duplicated_fastas Duplicated_IDs IDs_in_order.txt NCBI_Taxonomy.txt Tax_Names tidy.no_reps.fasta NCBI_ITS1_DB_raw.fasta.tidy.oneline.fasta NCBI_ITS1_DB_raw.fasta.tidy add_backcat NCBI_ITS1_DB_raw.fasta.loggrep "^>" DB.fasta | sed 's/>//' >good_acc_listecho "Cleaning Taxonomy to match Database...This may take some time."while read ID ; do grep -m 1 $ID Taxonomy.txt ; done < good_acc_list > Taxonomy_ordered.txt#mv $4/Taxonomy_ordered.txt $4/Taxonomy.txt#rm $4/good_acc_listgrep "k_NA;p_NA;c_NA;o_NA;f_NA;g_NA;s_NA|^:" Taxonomy_ordered.txt | cut -f1 > bad_acc_listsed -e 
'/k_NA;p_NA;c_NA;o_NA;f_NA;g_NA;s_NA/d' Taxonomy_ordered.txt > Taxonomy_clean1.txtsed -e '/^:/d' Taxonomy_clean1.txt > Taxonomy.txtecho "Final cleanup to remove bad accessions..."while read bad; do echo "Removing $bad" ; sed -i -e "/$bad/,+1d" DB.fasta ; done < bad_acc_listsed -i -e '/^>:/,+1d' DB.fastagrep "^>" DB.fasta | sed 's/>//' > DB_IDs_orderedwhile read ID; do grep $ID Taxonomy_ordered.txt ; done < DB_IDs_ordered > Taxonomy_final.txtrm Taxonomy_clean1.txt Taxonomy_ordered.txtmv bad_acc_list bad_acc_list.txtecho -e "Process complete. Final database is DB_ordered.fasta, and associated taxonomy is Taxonomy_ordered.txt Accessions that were removed are in bad_acc_list.txt"
Our Contact Validation and Append solution identifies and fixes errors in your existing customer database whilst appending missing information, including email addresses and telephone numbers. This comprehensive approach allows you to provide excellent customer service, obtain accurate billing information, and achieve high collection rates across all your communications.
What is it?
A combination of cleansing, validation, correction and appending solutions applied to your customer base, whether residential or commercial. The full process involves the following steps:
This multi-step approach ensures your contact database is not only clean and accurate, but also complete with the most up-to-date information available.
Use cases
- Deliver more messaging to the right customers: Ensure your communications reach their intended recipients by maintaining accurate contact details
- Less wastage for your messaging and marketing: Reduce bounce rates and failed delivery attempts, maximising your marketing budget efficiency
- Increase delivery success and engagement propensity: Clean, validated contact data leads to higher open rates, click-through rates, and overall campaign performance
- Improve customer service delivery: Reach customers through their preferred contact methods with confidence in data accuracy
- Enhance billing and collection processes: Accurate contact information supports successful payment reminders and collection activities
- Maintain GDPR compliance: Keep your contact database current and accurate in line with data protection requirements