100+ datasets found

O*NET Database
onetcenter.org
excel, mysql, oracle +2
Updated Dec 16, 2025
Cite
National Center for O*NET Development (2025). O*NET Database [Dataset]. https://www.onetcenter.org/database.html
Explore at:
Available download formats: oracle, sql server, text, mysql, excel
Dataset updated
Dec 16, 2025
Dataset provided by
Occupational Information Network
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Dataset funded by
US Department of Labor, Employment and Training Administration
Description
The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.
Data content areas include:
Worker Characteristics (e.g., Abilities, Interests, Work Styles)
Worker Requirements (e.g., Education, Knowledge, Skills)
Experience Requirements (e.g., On-the-Job Training, Work Experience)
Occupational Requirements (e.g., Detailed Work Activities, Work Context)
Occupation-Specific Information (e.g., Job Titles, Tasks, Technology Skills)
DWR Continuous Data Download Links
catalog.data.gov
data.ca.gov
+1 more
Updated Jan 23, 2026
+ more versions
Cite
California Department of Water Resources (2026). DWR Continuous Data Download Links [Dataset]. https://catalog.data.gov/dataset/dwr-continuous-data-download-links-90cc9
Explore at:
Dataset updated
Jan 23, 2026
Dataset provided by
California Department of Water Resources
Description
Stations and a table of download links for time-series data, from DWR's continuous environmental monitoring database. For more information, see DWR's Water Data Library, continuous data section: https://wdl.water.ca.gov/ContinuousData.aspx, where this data is also available.
E-commerce dataset by Olist (SQLite)
kaggle.com
zip
Updated Apr 28, 2024
Cite
Terenci Claramunt (2024). E-commerce dataset by Olist (SQLite) [Dataset]. https://www.kaggle.com/datasets/terencicp/e-commerce-dataset-by-olist-as-an-sqlite-database
Explore at:
Available download formats: zip (51085670 bytes)
Dataset updated
Apr 28, 2024
Authors
Terenci Claramunt
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
I imported the two Olist Kaggle datasets into an SQLite database. I modified the original table names to make them shorter and easier to understand. Here's the Entity-Relationship Diagram of the resulting SQLite database:

Database Schema: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2473556%2F23a7d4d8cd99e36e32e57303eb804fff%2Fdb-schema.png?generation=1714391550829633&alt=media

Data sources:

https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce

https://www.kaggle.com/datasets/olistbr/marketing-funnel-olist

I used this database as a data source for my notebook:

SQL Challenge: E-commerce data analysis
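
Because the author renamed the original tables, a first step with a database like this is usually to list what it contains. The following is a minimal sketch using Python's standard `sqlite3` module; the function name `list_tables` and the file name in the usage note are hypothetical and not part of the dataset.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of the user tables in an SQLite database file."""
    # Open read-only through a file: URI so a mistyped path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Skip SQLite's internal bookkeeping tables such as sqlite_sequence.
    return [name for (name,) in rows if not name.startswith("sqlite_")]
```

Calling `list_tables("olist.sqlite")` (assuming that is the name of the file extracted from the zip) would return the shortened table names, which can then be compared against the Entity-Relationship Diagram.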
classicmodels
kaggle.com
zip
Updated Dec 10, 2022
+ more versions
Cite
Marta Tavares (2022). classicmodels [Dataset]. https://www.kaggle.com/datasets/martatavares/classicmodels
Explore at:
Available download formats: zip (72431 bytes)
Dataset updated
Dec 10, 2022
Authors
Marta Tavares
Description
MySQL Classicmodels sample database

The MySQL sample database schema consists of the following tables:

Customers: stores customers' data.

Products: stores a list of scale model cars.

ProductLines: stores a list of product line categories.

Orders: stores sales orders placed by customers.

OrderDetails: stores sales order line items for each sales order.

Payments: stores payments made by customers based on their accounts.

Employees: stores all employee information as well as the organization structure such as who reports to whom.

Offices: stores sales office data.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F8652778%2Fefc56365be54c0e2591a1aefa5041f36%2FMySQL-Sample-Database-Schema.png?generation=1670498341027618&alt=media
Dr. Duke's Phytochemical and Ethnobotanical Databases
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Dec 2, 2025
+ more versions
Cite
Agricultural Research Service (2025). Dr. Duke's Phytochemical and Ethnobotanical Databases [Dataset]. https://catalog.data.gov/dataset/dr-dukes-phytochemical-and-ethnobotanical-databases-0849e
Explore at:
Dataset updated
Dec 2, 2025
Dataset provided by
Agricultural Research Service
Description
Of interest to pharmaceutical, nutritional, and biomedical researchers, as well as individuals and companies involved with alternative therapies and herbal products, this database is one of the world's leading repositories of ethnobotanical data. It evolved out of the extensive compilations by the former Chief of USDA's Economic Botany Laboratory in the Agricultural Research Service in Beltsville, Maryland, in particular his popular Handbook of phytochemical constituents of GRAS herbs and other economic plants (CRC Press, Boca Raton, FL, 1992). In addition to Duke's own publications, the database documents phytochemical information and quantitative data collected over many years through research results presented at meetings and symposia, and findings from the published scientific literature. The current Phytochemical and Ethnobotanical databases facilitate plant, chemical, bioactivity, and ethnobotany searches. A large number of plants and their chemical profiles are covered, and data are structured to support browsing and searching in several user-focused ways. For example, users can:
get a list of chemicals and activities for a specific plant of interest, using either its scientific or common name
download a list of chemicals and their known activities in PDF or spreadsheet form
find plants with chemicals known for a specific biological activity
display a list of chemicals with their LD toxicity data
find plants with potential cancer-preventing activity
display a list of plants for a given ethnobotanical use
find out which plants have the highest levels of a specific chemical
References to the supporting scientific publications are provided for each specific result.
Resources in this dataset:
Resource Title: Duke-Source-CSV.zip. File Name: Duke-Source-CSV.zip. Resource Description: Dr. Duke's Phytochemistry and Ethnobotany - raw database tables for archival purposes. Visit https://phytochem.nal.usda.gov/phytochem/search for the interactive web version of the database.
Resource Title: Data Dictionary (preliminary). File Name: DrDukesDatabaseDataDictionary-prelim.csv. Resource Description: This Data Dictionary describes the columns for each table. [Note that this is in progress and some variables are yet to be defined or are unused in the current implementation. Please send comments/suggestions to nal-adc-curator@ars.usda.gov ]
Chinook Database
kaggle.com
zip
Updated Nov 7, 2023
Cite
Rana Sabry (2023). Chinook Database [Dataset]. https://www.kaggle.com/datasets/ranasabrii/chinook
Explore at:
Available download formats: zip (448874 bytes)
Dataset updated
Nov 7, 2023
Authors
Rana Sabry
Description
The Chinook database was created as an alternative to the Northwind database. It represents a digital media store, including tables for artists, albums, media tracks, invoices and customers.

The Chinook database is available on GitHub. It’s available for various DBMSs including MySQL, SQL Server, SQL Server Compact, PostgreSQL, Oracle, DB2, and of course, SQLite.
Download CSV DB
maclookup.app
json
Updated Jan 30, 2026
Cite
(2026). Download CSV DB [Dataset]. https://maclookup.app/downloads/csv-database
Explore at:
Available download formats: json
Dataset updated
Jan 30, 2026
Description
Free, daily updated MAC prefix and vendor CSV database. Download now for accurate device identification.
🇺🇸 US Zip Codes Database (Oct 04 2024 update)
kaggle.com
zip
Updated Oct 10, 2024
Cite
BwandoWando (2024). 🇺🇸 US Zip Codes Database (Oct 04 2024 update) [Dataset]. https://www.kaggle.com/datasets/bwandowando/us-zip-codes-database-from-simplemaps-com
Explore at:
Available download formats: zip (4195930 bytes)
Dataset updated
Oct 10, 2024
Authors
BwandoWando
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2F4408fd0c0561e4a48a03776b784ed650%2Fzip2.jpeg?generation=1728526740859651&alt=media

US Zip Codes Database

We're proud to offer a simple, accurate and up-to-date database of US Zip Codes. It's been built from the ground up using authoritative sources including the U.S. Postal Service™, U.S. Census Bureau, National Weather Service, American Community Survey, and the IRS.

Up-to-date: Data updated as of October 8, 2024. Includes data from the most recent American Community Survey (2022)!

Comprehensive: 41,618 unique zip codes including ZCTA, unique, military, and PO box zips.

Useful fields: From latitude and longitude to household income.

Accurate: Aggregated from official sources and precisely geocoded to latitude and longitude.

Simple: A single CSV file, concise field names, only one entry per zip code.

From https://simplemaps.com/data/us-zips

Image

Generated with Bing Image Generator

Note

I just downloaded and uploaded it here. All credits to https://simplemaps.com/data/us-zips
GDB Databases
zenodo.org
application/gzip, bin
Updated Sep 1, 2022
Cite
Tobias Fink; Lorenz C. Blum; Lars Ruddigkeit; Ruud van Deursen; Jean-Louis Reymond (2022). GDB Databases [Dataset]. http://doi.org/10.5281/zenodo.5172018
Explore at:
Available download formats: bin, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.5172018
Dataset updated
Sep 1, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tobias Fink; Lorenz C. Blum; Lars Ruddigkeit; Ruud van Deursen; Jean-Louis Reymond
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
About

GDB-11 enumerates small organic molecules up to 11 atoms of C, N, O and F following simple chemical stability and synthetic feasibility rules.
GDB-13 enumerates small organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules. With 977 468 314 structures, GDB-13 is the largest publicly available small organic molecule database to date.

How to cite

To cite GDB-11, please reference:

Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physico-chemical properties, compound classes and drug discovery. Fink, T.; Reymond, J.-L. J. Chem. Inf. Model. 2007, 47, 342-353.

Virtual Exploration of the Small Molecule Chemical Universe below 160 Daltons. Fink, T.; Bruggesser, H.; Reymond, J.-L. Angew. Chem. Int. Ed. 2005, 44, 1504-1508.

To cite GDB-13, please reference:

970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13. Blum, L. C.; Reymond, J.-L. J. Am. Chem. Soc. 2009, 131, 8732-8733.

To cite GDB-17, please reference:

Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. J. Chem. Inf. Model. 2012, 52, 2864-2875.

Download

You can download the databases and subsets of them using the links provided. All the molecules are stored in dearomatized, canonized SMILES format and compressed as tar/gz archives (for Windows users: download 7-zip to open the archives).

GDB-17
GDB-17-Set (50 million) GDB17.50000000.smi.gz 314 MB
Lead-like Set (100-350 MW & 1-3 clogP)(11 million) GDB17.50000000LL.smi.gz 75 MB
Lead-like Set (100-350 MW & 1-3 clogP) without small rings (3-4 ring atoms)(0.8 million) GDB17.50000000LLnoSR.smi.gz 55 MB

GDB-13
Entire GDB-13 (including all C/N/O/Cl/S molecules) gdb13.tgz 2.6 GB
GDB-13 Subsets (The sum of all the subsets below corresponds to the entire GDB-13 above)
Graph subset (saturated hydrocarbons) gdb13.g.tgz 1.1 MB
Skeleton subset (unsaturated hydrocarbons) gdb13.sk.tgz 14 MB
Only carbon & nitrogen containing molecules gdb13.cn.tgz 443 MB
Only carbon & oxygen containing molecules gdb13.co.tgz 299 MB
Only carbon & nitrogen & oxygen containing molecules gdb13.cno.tgz 1.8 GB
Chlorine & sulphur containing molecules gdb13.cls.tgz 189 MB

GDB-13 Subsets (For details please refer to the Table 2 in J Comput Aided Mol Des 2011 25:637 to 647)
GDB-13 Subset AB (~635 Millions) AB.smi.gz 2.4 GB
GDB-13 Subset ABC (~441 Millions) ABC.smi.gz 1.7 GB
GDB-13 Subset ABCD (~277 Millions) ABCD.smi.gz 1.1 GB
GDB-13 Subset ABCDE (~140 Millions) ABCDE.smi.gz 565 MB
GDB-13 Subset ABCDEF (~43 Millions) ABCDEF.smi.gz 171 MB
GDB-13 Subset ABCDEFG (~13 Millions) ABCDEFG.smi.gz 50 MB
GDB-13 Subset ABCDEFGH (~1.4 Millions) ABCDEFGH.smi.gz 6.2 MB
GDB-13 Random Sample. Annotated with frequency and log-likelihood (Please refer to Exploring the GDB-13 chemical space using deep generative models)
GDB-13 Random Sample (1 Million) gdb13.1M.freq.ll.smi.gz 14.8 MB

FDB-17
FDB-17 FDB-17-fragmentset.smi.gz 62.2 MB

GDB4c
GDB4c (SMILES) GDB4c.smi.gz 6.2 MB
GDB4c3D (SMILES) GDB4c3D.smi.gz 161 MB
GDB4c3D (SDF) GDB4c3D.sdf.tar.gz 2 GB

Other
GDBMedChem (SMILES) GDBMedChem.smi 276 MB
GDBChEMBL (SMILES) GDBChEMBL.smi 353.6 MB
GDB-13 random selection (1 million) gdb13.rand1M.smi.gz 7.2 MB
Fragment-like subset (Rule of three) gdb13.frl.tgz 1.2 GB
Dark matter universe up to 9 heavy atoms dmu9.tgz 87 MB

GDB-11
Entire GDB-11 (including all C/N/O/F molecules) gdb11.tgz 122 MB
Fragrance Like Subsets: For details please refer to Ruddigkeit et al. Journal of Cheminformatics 2014, 6:27
FragranceDB (SuperScent + Flavornet) FragranceDB.smi 56 KB
TasteDB (SuperSweet + BitterDB) TasteDB.smi 44 KB
FragranceDB.FL (Fragrance-like subset of FragranceDB) FragranceDB.FL.smi 32 KB
ChEMBL.FL (Fragrance-like subset of ChEMBL) ChEMBL.FL.smi 452 KB
PubChem.FL Fragrance-like subset of PubChem PubChem.FL.smi 20 MB
ZINC.FL (Fragrance-like subset of ZINC) ZINC.FL.smi 1.3 MB
GDB-13.FL (Fragrance-like subset of GDB-13) GDB-13.FL.smi.gz 165 MB
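
The `.smi.gz` files listed above are large, so it can help to stream them instead of decompressing them to disk first. The sketch below assumes one molecule per line with the SMILES string in the first whitespace-separated column, which is a common `.smi` layout but is not confirmed by this listing. The function name `iter_smiles` is hypothetical, and the `.tgz` files are tar archives that would need Python's `tarfile` module instead.

```python
import gzip


def iter_smiles(path, limit=None):
    """Yield SMILES strings from a gzip-compressed .smi file, one per line.

    Assumes the SMILES string is the first whitespace-separated field on
    each line; blank lines are skipped.
    """
    count = 0
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            # Stop early when only a sample of the file is wanted.
            if limit is not None and count >= limit:
                break
            fields = line.split()
            if not fields:
                continue
            yield fields[0]
            count += 1
```

For example, `list(iter_smiles("GDB17.50000000.smi.gz", limit=5))` would read only the first few records of the 314 MB GDB-17 set without loading the rest into memory.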

Terms and conditions: The GDB databases may be downloaded free of charge. In published research involving GDB, cite the appropriate references mentioned above. GDB must not be used as part of or in patents. GDB and large portions thereof must not be redistributed without the express written permission of Jean-Louis Reymond.
Example Data Files
redivis.com
application/jsonl +7
Updated Jan 30, 2025
+ more versions
Cite
Redivis Demo Organization (2025). Example Data Files [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb
Explore at:
Available download formats: sas, csv, spss, avro, stata, arrow, application/jsonl, parquet
Dataset updated
Jan 30, 2025
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Description
Abstract

This is an example dataset demonstrating new non-tabular data file functionality on Redivis.

Methodology

Redivis now supports uploading arbitrary files to datasets. Alongside existing support for tabular data, this expands the breadth of data on Redivis and opens up novel research opportunities. Datasets can have millions of files, and each file can be up to 5 terabytes.

While you can upload literally any file type, this dataset demonstrates previews for some common file formats:

3D models

Audio files

CIF + PDB files (molecular + protein structures)

FITS files (common in astronomy)

DICOM (common in MRIs)

HDF5

Images

PDFs

Videos

Text/code

TIFFs

ZIPs

TEI

These previews are enabled by numerous contributions in the open source and academic community, including:

Three.js

Codemirror

DICOM Web viewer

H5Web

JS9

PDBE-molstar

GeoTIFF.js

zip.js

CETEIcean

Usage

This dataset primarily consists of two folders that can be used to train and evaluate image classification models (cats vs. dogs, of course). The files in the training images folder have already been classified, while those in the test images folder have not.

This dataset also contains another folder of example file types that have built-in previews on Redivis. You can upload any file type to Redivis, and download these files and work with them in your notebooks. However, we endeavor to provide interactive previews for common file types when it is feasible in a web browser environment. Contact us if you'd like to see a preview added for a new file format!

Beyond previewing and downloading files, many use cases will utilize the redivis-python and redivis-r client libraries to stream files to a computational environment (either within Redivis notebooks or elsewhere) for further analysis.

You can view an example image classification project using this dataset here.

This analysis is reproduced from its original publication: https://towardsdatascience.com/image-classifier-cats-vs-dogs-with-convolu
Northwind and Chinook DataBase
kaggle.com
zip
Updated Jun 19, 2024
Cite
RCURIOSO (2024). Northwind and Chinook DataBase [Dataset]. https://www.kaggle.com/datasets/rcurioso/northwind-and-chinook-database/code
Explore at:
Available download formats: zip (461230 bytes)
Dataset updated
Jun 19, 2024
Authors
RCURIOSO
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Northwind Database

The Northwind database is a sample database originally created by Microsoft and used for decades as the basis for its tutorials across a variety of database products. The Northwind database contains sales data for a fictitious company called "Northwind Traders", which imports and exports specialty foods from around the world. The Northwind database is an excellent tutorial schema for a small-business ERP, with customers, orders, inventory, purchasing, suppliers, shipping, employees, and single-entry accounting. The Northwind database has since been ported to a variety of non-Microsoft databases, including PostgreSQL.

The Northwind dataset includes sample data for the following.

Suppliers: Northwind's suppliers and vendors

Customers: customers who buy products from Northwind

Employees: details of the employees of Northwind Traders

Products: product information

Shippers: details of the shippers that ship products from the traders to the end customers

Orders and order details: sales order transactions taking place between the customers and the company

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13411583%2Fa52a5bbc3d8842abfdfcfe608b7a8d25%2FNorthwind_E-R_Diagram.png?generation=1718785485874540&alt=media

Chinook DataBase

Chinook is a sample database available for SQL Server, Oracle, MySQL, etc. It can be created by running a single SQL script. The Chinook database is an alternative to the Northwind database, and is ideal for demos and for testing ORM tools that target single or multiple database servers.

The Chinook data model represents a digital media store, including tables for artists, albums, media tracks, invoices and customers.

The media-related data was created using real data from an iTunes library. Customer and employee information was created manually using fictitious names, addresses that can be located on Google Maps, and other well-formatted data (phone, fax, email, etc.). Sales information is generated automatically using random data for a four-year period.

Why the name Chinook? The name of this sample database was based on the Northwind database. Chinooks are winds in the interior West of North America, where the Canadian Prairies and Great Plains meet various mountain ranges. Chinooks are most prevalent over southern Alberta in Canada. Chinook is a good name choice for a database that is intended to be an alternative to Northwind.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13411583%2Fd856e0358e3a572d50f1aba5e171c1c6%2FChinook%20DataBase.png?generation=1718785749657445&alt=media
OECD Regional database
catalog.data.gov
Updated Mar 30, 2021
Cite
U.S. Department of State (2021). OECD Regional database [Dataset]. https://catalog.data.gov/dataset/oecd-regional-database
Explore at:
Dataset updated
Mar 30, 2021
Dataset provided by
United States Department of State (http://state.gov/)
Description
The OECD regional database is delivered through the viewer OECD eXplorer, an interactive mapping tool designed to let users explore, download and visualize data with maps, histograms, scatterplots and other views. The database comprises a set of comparable statistics on about 2000 regions in the 33 OECD countries, on topics such as population, economic output, productivity, labor market, education and innovation, to highlight differences within countries.
Walmart Products Dataset – Free Product Data CSV
crawlfeeds.com
csv, zip
Updated Dec 2, 2025
Cite
Crawl Feeds (2025). Walmart Products Dataset – Free Product Data CSV [Dataset]. https://crawlfeeds.com/datasets/walmart-products-free-dataset
Explore at:
Available download formats: zip, csv
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.

Key Features

Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.

CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.

Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.

Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.

Who Benefits?

Data analysts & researchers exploring e-commerce trends or product catalog data.

Developers & data scientists building price-comparison tools, recommendation engines or ML models.

E-commerce strategists/marketers needing product metadata for competitive analysis or market research.

Students/hobbyists needing a free dataset for learning or demo projects.

Why Use This Dataset Instead of Manual Scraping?

Time-saving: No need to write scrapers or deal with rate limits.

Clean, structured data: All records are verified and already formatted in CSV, saving hours of cleaning.

Risk-free: Avoid Terms-of-Service issues or IP blocks that come with manual scraping.
Instant access: Free and immediately downloadable.
World Administrative Boundaries
geopostcodes.com
csv
Updated Apr 28, 2024
Cite
GeoPostcodes (2024). World Administrative Boundaries [Dataset]. https://www.geopostcodes.com/world-administrative-boundaries/
Explore at:
Available download formats: csv
Dataset updated
Apr 28, 2024
Dataset authored and provided by
GeoPostcodes
Area covered
World
Description
Our World Administrative Boundaries Database offers comprehensive postal code data for spatial analysis, including postal and administrative areas. This dataset contains accurate and up-to-date information on all administrative divisions, cities, and zip codes, making it an invaluable resource for various applications such as address capture and validation, map and visualization, reporting and business intelligence (BI), master data management, logistics and supply chain management, and sales and marketing. Our location data packages are available in various formats, including CSV, optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more. Product features include fully and accurately geocoded data, multi-language support with address names in local and foreign languages, comprehensive city definitions, and the option to combine map data with UNLOCODE and IATA codes, time zones, and daylight saving times. Companies choose our location databases for their enterprise-grade service, reduction in integration time and cost by 30%, and weekly updates to ensure the highest quality.
EPA Facility Registry Service (FRS): Facility Interests Dataset Download
catalog.data.gov
data.cnra.ca.gov
+3 more
Updated Feb 10, 2026
+ more versions
Cite
U.S. Environmental Protection Agency, Office of Environmental Information (Publisher) (2026). EPA Facility Registry Service (FRS): Facility Interests Dataset Download [Dataset]. https://catalog.data.gov/dataset/epa-facility-registry-service-frs-facility-interests-dataset-download9
Explore at:
Dataset updated
Feb 10, 2026
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
This downloadable data package consists of location and facility identification information from EPA's Facility Registry Service (FRS) for all sites that are available in the FRS individual feature layers. The layers comprise the FRS major program databases, including:
Assessment Cleanup and Redevelopment Exchange System (ACRES): brownfields sites
Air Facility System (AFS): stationary sources of air pollution
ICIS-AIR (AIR): stationary sources of air pollution
Bureau of Indian Affairs (BIA): schools data on Indian land
Base Realignment and Closure (BRAC) facilities
Clean Air Markets Division Business System (CAMDBS): market-based air pollution control programs
Comprehensive Environmental Response, Superfund Enterprise Management System (SEMS): hazardous waste sites
Integrated Compliance Information System (ICIS): integrated enforcement and compliance information
National Compliance Database (NCDB): Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) and the Toxic Substances Control Act (TSCA)
National Pollutant Discharge Elimination System (NPDES) module of ICIS: NPDES surface water permits
Radiation Information Database (RADINFO): radiation and radioactivity facilities
RACT/BACT/LAER Clearinghouse (RBLC): best available air pollution technology requirements
Resource Conservation and Recovery Act Information System (RCRAInfo): tracks generators, transporters, treaters, storers, and disposers of hazardous waste
Toxic Release Inventory (TRI): certain industries that use, manufacture, treat, or transport more than 650 toxic chemicals
Emission Inventory System (EIS): inventory of large stationary sources and voluntarily-reported smaller sources of air point pollution emitters
Spill Prevention, Control, and Countermeasure (SPCC) and Facility Response Plan (FRP) subject facilities
Electronic Greenhouse Gas Reporting Tool (E-GGRT): large greenhouse gas emitters
Emissions and Generation Resource Integrated Database (EGRID): power plants
The Facility Registry Service (FRS) identifies and geospatially locates facilities, sites or places subject to environmental regulations or of environmental interest. Using vigorous verification and data management procedures, FRS integrates facility data from EPA's national program systems, other federal agencies, and State and tribal master facility records and provides EPA with a centrally managed, single source of comprehensive and authoritative information on facilities. This data set contains the FRS facilities that link to the programs listed above once the program data has been integrated into the FRS database. Additional information on FRS is available at the EPA website https://www.epa.gov/enviro/facility-registry-service-frs. Included in this package are a file geodatabase, Esri ArcMap map document and an XML file of this metadata record. Full FGDC metadata records for each layer are contained in the database.
Gridded Soil Survey Geographic Database (gSSURGO)
agdatacommons.nal.usda.gov
bin
Updated Nov 22, 2025
+ more versions
Cite
USDA Natural Resources Conservation Service (2025). Gridded Soil Survey Geographic Database (gSSURGO) [Dataset]. http://doi.org/10.15482/USDA.ADC/1255234
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.15482/USDA.ADC/1255234
Dataset updated
Nov 22, 2025
Dataset provided by
United States Department of Agriculture (http://usda.gov/)
Natural Resources Conservation Service (http://www.nrcs.usda.gov/)
Authors
USDA Natural Resources Conservation Service
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset is called the Gridded SSURGO (gSSURGO) Database and is derived from the Soil Survey Geographic (SSURGO) Database. SSURGO is generally the most detailed level of soil geographic data developed by the National Cooperative Soil Survey (NCSS) in accordance with NCSS mapping standards. The tabular data represent the soil attributes, and are derived from properties and characteristics stored in the National Soil Information System (NASIS). The gSSURGO data were prepared by merging traditional SSURGO digital vector map and tabular data into State-wide extents, and adding a State-wide gridded map layer derived from the vector, plus a new value added look up (valu) table containing "ready to map" attributes. The gridded map layer is offered in an ArcGIS file geodatabase raster format. The raster and vector map data have a State-wide extent. The raster map data have a 10 meter cell size that approximates the vector polygons in an Albers Equal Area projection. Each cell (and polygon) is linked to a map unit identifier called the map unit key. A unique map unit key is used to link raster cells and polygons to attribute tables, including the new value added look up (valu) table that contains additional derived data. The value added look up (valu) table contains attribute data summarized to the map unit level using best practice generalization methods intended to meet the needs of most users. The generalization methods include map unit component weighted averages and percent of the map unit meeting a given criterion.
Resources in this dataset:
Resource Title: gSSURGO downloads Page. File Name: Web Page, url: https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053628#value Download gSSURGO Databases

Other resources include introduction to gSSURGO, User Guide (PDF; 4.22 MB), SSURGO/gSSURGO ArcTools, Valu1 (Value Added Look Up) Table, Metadata, Recommended Data Citations, Technical Information, Sample gSSURGO Map Themes
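
The "map unit component weighted average" mentioned in the description can be illustrated with a minimal sketch. It assumes each map unit key has several components, each with a percent share and an attribute value. The tuple layout, the function name, and the choice to normalize by the percentages of components that actually have a value are all assumptions made for illustration; they are not taken from the gSSURGO documentation.

```python
from collections import defaultdict


def weighted_average_by_map_unit(components):
    """Summarize component values to the map unit level.

    `components` is an iterable of (map_unit_key, component_percent, value)
    tuples. This layout is hypothetical and only serves the illustration.
    """
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for map_unit_key, percent, value in components:
        if value is None:
            # Assumption: components without a value are left out of the average.
            continue
        weighted_sums[map_unit_key] += percent * value
        weight_totals[map_unit_key] += percent
    return {
        key: weighted_sums[key] / weight_totals[key]
        for key in weighted_sums
        if weight_totals[key] > 0
    }


# Made-up example: map unit "100" has two components, "200" has one.
example = [("100", 60, 2.0), ("100", 40, 4.0), ("200", 100, 1.5)]
summary = weighted_average_by_map_unit(example)
```

A raster cell or polygon carrying map unit key "100" would then be mapped to the single summarized value stored under that key.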
Complete Antivirus Database
comodo.com
cav
Updated Apr 15, 2010
Cite
Comodo (2010). Complete Antivirus Database [Dataset]. https://www.comodo.com/home/internet-security/updates/vdp/database.php
Explore at:
cav (available download formats)
Dataset updated
Apr 15, 2010
Dataset provided by
Comodo Group (http://www.comodo.com/)
Authors
Comodo
License
https://www.comodo.com/home/internet-security/updates/vdp/database.php
Description
The complete Comodo Internet Security database is available for download...
Data from: Global Terrorism Database
catalog.data.gov
datasets.ai
Updated May 30, 2023
Cite
University of Maryland (UMD) (2023). Global Terrorism Database [Dataset]. https://catalog.data.gov/dataset/global-terrorism-database
Explore at:
Dataset updated
May 30, 2023
Dataset provided by
University of Maryland (UMD)
Description
The Global Terrorism Database™ (GTD) is an open-source database including information on terrorist events around the world from 1970 through 2020 (with annual updates planned for the future). Unlike many other event databases, the GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 200,000 cases.
Kraken2 Human database
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Feb 15, 2024
Cite
Michael B. Hall (2024). Kraken2 Human database [Dataset]. http://doi.org/10.5281/zenodo.8339700
Explore at:
application/gzip (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.8339700
Dataset updated
Feb 15, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michael B. Hall
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
A kraken2 database built from just the Human library on 29/06/2023. This archive contains just the three files required by kraken2, hash.k2d, opts.k2d, and taxo.k2d.

The commands used to download and build this database are:

k2 download-taxonomy --db db/
k2 download-library --db db/ --library human
k2 build --kmer-len 35 --minimizer-len 31 --minimizer-spaces 7 --threads 8 --db db/
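
Since the description names the three files kraken2 requires, a quick check after extracting the archive might look like the sketch below. The function name and the directory argument are hypothetical, and the check only looks at file presence, not file contents.

```python
from pathlib import Path

# File names taken from the dataset description above.
REQUIRED_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_kraken2_files(db_dir):
    """Return the required database files that are absent from `db_dir`."""
    db_path = Path(db_dir)
    return [name for name in REQUIRED_FILES if not (db_path / name).is_file()]
```

An empty list means all three files are present, for example after calling `missing_kraken2_files("db/")` on the extracted directory.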
YMDB - Yeast Metabolome Database
dknet.org
rrid.site
+2 more
Updated Aug 12, 2024
Cite
(2024). YMDB - Yeast Metabolome Database [Dataset]. http://identifiers.org/RRID:SCR_005890
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_005890
Dataset updated
Aug 12, 2024
Description
A manually curated database of small molecule metabolites found in or produced by Saccharomyces cerevisiae (also known as Baker's yeast and Brewer's yeast). This database covers metabolites described in textbooks, scientific journals, metabolic reconstructions and other electronic databases. YMDB contains metabolites arising from normal S. cerevisiae metabolism under defined laboratory conditions as well as metabolites generated by S. cerevisiae when used in baking and in the production of wines, beers and spirits. YMDB currently contains 2027 small molecules with 857 associated enzymes and 138 associated transporters. Each small molecule has 48 data fields describing the metabolite, its chemical properties and links to spectral and chemical databases. Each enzyme/transporter is linked to its associated metabolites and has 30 data fields describing both the gene and corresponding protein.

Users may search through the YMDB using a variety of database-specific tools. The simple text query supports general text queries of the textual component of the database. By selecting either metabolites or proteins in the "search for" field, it is possible to restrict the search and the returned results to only those data associated with metabolites or with proteins. Clicking on the Browse button generates a tabular synopsis of YMDB's content. This browser view allows users to casually scroll through the database or re-sort its contents. Clicking on a given MetaboCard button brings up the full data content for the corresponding metabolite. A complete explanation of all the YMDB fields and sources is available.

Under the Search link users will find a number of search options listed in a pull-down menu. The Chem Query option allows users to draw (using a MarvinSketch applet or a ChemSketch applet) or to type (SMILES string) a chemical compound and to search the YMDB for chemicals similar or identical to the query compound. The Advanced Search option supports a more sophisticated text search of the text portion of YMDB. The Sequence Search button allows users to conduct BLASTP (protein) sequence searches of all sequences contained in YMDB. Both single and multiple sequence (i.e. whole proteome) BLAST queries are supported. YMDB also supports a Data Extractor option that allows specific data fields or combinations of data fields to be searched and/or extracted. Spectral searches of YMDB's reference compound NMR and MS spectral data are also supported through its MS, MS/MS, GC/MS and NMR Spectra Search links. Users may download YMDB's complete textual data, chemical structures and sequence data by clicking on the Download button.
