100+ datasets found

Registry of Open Data on AWS
registry.opendata.aws
Updated Aug 13, 2021
Cite
Amazon Web Services (2021). Registry of Open Data on AWS [Dataset]. https://registry.opendata.aws/registry-open-data/
Dataset updated
Aug 13, 2021
Dataset provided by
Amazon Web Services (http://aws.amazon.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
The Registry of Open Data on AWS contains publicly available datasets that are available for access from AWS resources. Note that datasets in this registry are available via AWS resources, but they are not provided by AWS; these datasets are owned and maintained by a variety of government organizations, researchers, businesses, and individuals. This dataset contains derived forms of the data in https://github.com/awslabs/open-data-registry that have been transformed for ease of use with machine interfaces. Currently, only the ndjson form of the registry is populated here.
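Because the registry is published as ndjson (newline-delimited JSON, one JSON object per line), it can be read one line at a time with a standard JSON parser. The following is a minimal sketch only: the field names `Name` and `License` and the two sample records are hypothetical, since the record schema is not described in this listing.

```python
import json
from typing import Iterator


def iter_ndjson(text: str) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of an ndjson document."""
    for line in text.splitlines():
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Hypothetical sample; real registry records may use different fields.
sample = (
    '{"Name": "Example dataset A", "License": "CC BY 4.0"}\n'
    '{"Name": "Example dataset B", "License": "Apache-2.0"}\n'
)

records = list(iter_ndjson(sample))
names = [record["Name"] for record in records]
print(names)
```

Reading line by line keeps memory use flat, which matters if the registry file grows large.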
SpaceEye-T VVHR EO Open Data
registry.opendata.aws
Updated Sep 26, 2025
Cite
SI Imaging Services (2025). SpaceEye-T VVHR EO Open Data [Dataset]. https://registry.opendata.aws/st-open-data/
Dataset updated
Sep 26, 2025
Dataset provided by
SI Imaging Services Co., Ltd.
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The SpaceEye-T satellite collects optical imagery at 25 cm resolution, the highest among commercial satellites. The Open Data collection features satellite images from around the world so that end users can experience the power of VVHR optical data.
AWS Public Blockchain Data
registry.opendata.aws
Updated Sep 23, 2022
Cite
Amazon Web Services (2022). AWS Public Blockchain Data [Dataset]. https://registry.opendata.aws/aws-public-blockchain/
Dataset updated
Sep 23, 2022
Dataset provided by
Amazon Web Services (http://aws.amazon.com/)
Description
The AWS Public Blockchain Data initiative provides free access to blockchain datasets through collaboration with data providers. The data is optimized for analytics by being transformed into compressed Parquet files, partitioned by date for efficient querying.

Datasets
Blockchain dataset - Maintained by - Path:
- Bitcoin - AWS - s3://aws-public-blockchain/v1.0/btc/
- Ethereum - AWS - s3://aws-public-blockchain/v1.0/eth/
- Arbitrum - SonarX - s3://aws-public-blockchain/v1.1/sonarx/arbitrum/
- Aptos - SonarX - s3://aws-public-blockchain/v1.1/sonarx/aptos/
- Base - SonarX - s3://aws-public-blockchain/v1.1/sonarx/base/
- Provenance - SonarX - s3://aws-public-blockchain/v1.1/sonarx/provenance/
- XRP Ledger - SonarX - s3://aws-public-blockchain/v1.1/sonarx/xrp/
- Stellar (XDR files) - Stellar - s3://aws-public-blockchain/v1.1/stellar/
- The Open Network (TON) - TON - s3://aws-public-blockchain/v1.1/ton/
- Cronos - Cronos - s3://aws-public-blockchain/v1.1/cronos/
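Since the files are partitioned by date, a reader would typically narrow a query to a single day's prefix rather than scan a whole chain. The sketch below builds such a prefix from two of the base paths listed above. The table name `blocks` and the Hive-style `date=YYYY-MM-DD` directory naming are assumptions made for illustration and are not confirmed by this listing; check the bucket layout before relying on them.

```python
from datetime import date

# Base paths copied from the list above (subset).
BASE_PATHS = {
    "bitcoin": "s3://aws-public-blockchain/v1.0/btc/",
    "ethereum": "s3://aws-public-blockchain/v1.0/eth/",
}


def partition_prefix(chain: str, table: str, day: date) -> str:
    """Return the assumed S3 prefix for one table on one day.

    Assumes Hive-style partitioning: <base><table>/date=YYYY-MM-DD/
    """
    base = BASE_PATHS[chain]
    return f"{base}{table}/date={day.isoformat()}/"


prefix = partition_prefix("bitcoin", "blocks", date(2023, 1, 15))
print(prefix)
```

A prefix like this can then be handed to whatever Parquet reader or query engine is in use, so that only one day's files are listed and read.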

Become a Data Provider

We welcome additional blockchain data providers to join this initiative. If you're interested in contributing datasets to the AWS Public Blockchain Data program, please contact our team at aws-public-blockchain@amazon.com.
Data from: Sentinel-2
registry.opendata.aws
Updated Apr 19, 2018
Cite
Sinergise (2018). Sentinel-2 [Dataset]. https://registry.opendata.aws/sentinel-2/
Dataset updated
Apr 19, 2018
Dataset provided by
Sinergise (https://www.sinergise.com/)
Description
The Sentinel-2 mission is a land monitoring constellation of two satellites that provides high-resolution optical imagery and continuity for the current SPOT and Landsat missions. The mission provides global coverage of the Earth's land surface every 5 days, making the data of great use in ongoing studies. L1C data are available globally from June 2015. L2A data are available from November 2016 over the Europe region and globally since January 2017.
OpenStreetMap on AWS
registry.opendata.aws
Updated May 9, 2025
Cite
OpenStreetMap Foundation (OSMF) and Pacific Atlas (2025). OpenStreetMap on AWS [Dataset]. https://registry.opendata.aws/osm/
Dataset updated
May 9, 2025
Dataset provided by
OpenStreetMap (www.openstreetmap.org)
Description
OSM is a free, editable map of the world, created and maintained by volunteers. Regular OSM data archives are made available in Amazon S3 in both standard formats (OSM PBF, XML) and cloud-native formats optimized for analytics workloads.
NEXRAD on AWS
registry.opendata.aws
Updated Apr 19, 2018
Cite
Unidata (2018). NEXRAD on AWS [Dataset]. https://registry.opendata.aws/noaa-nexrad/
Dataset updated
Apr 19, 2018
Dataset provided by
Unidata (https://www.unidata.ucar.edu/)
Description
Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.
Update
The NEXRAD Level II archive data is moving to a new bucket: unidata-nexrad-level2 and SNS topic: arn:aws:sns:us-east-1:684042711724:NewNEXRADLevel2Archive. The old bucket and SNS topic are now deprecated and will no longer be available starting September 1, 2025.
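The SNS topic in the notice is given as an ARN, which follows the general `arn:partition:service:region:account-id:resource` layout. The sketch below splits it into its parts, for example to find the region a subscribing client should target. The `Arn` type and `parse_arn` helper are illustrative names, not part of any AWS SDK.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account:resource."""
    parts = arn.split(":", 5)  # the resource part may itself contain colons
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


topic = parse_arn("arn:aws:sns:us-east-1:684042711724:NewNEXRADLevel2Archive")
print(topic.service, topic.region, topic.resource)
```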
CMAS Data Warehouse
registry.opendata.aws
Updated Dec 31, 2022
Cite
CMAS CENTER (2022). CMAS Data Warehouse [Dataset]. https://registry.opendata.aws/cmas-data-warehouse/
Dataset updated
Dec 31, 2022
Dataset provided by
CMAS CENTER (https://cmascenter.org/)
Description
CMAS Data Warehouse on AWS collects and disseminates meteorology, emissions and air quality model input and output for Community Multiscale Air Quality (CMAQ) Model Applications. This dataset is available as part of the AWS Open Data Program, therefore egress fees are not charged to either the host or the person downloading the data. This S3 bucket is maintained as a public service by the University of North Carolina's CMAS Center, the US EPA’s Office of Research and Development, and the US EPA’s Office of Air and Radiation. Metadata and DOIs for datasets included in the CMAS Data Warehouse are available from the CMAS Dataverse site: https://dataverse.unc.edu/dataverse/cmascenter
Sentinel-1
registry.opendata.aws
Updated Apr 20, 2018
Cite
Sinergise (2018). Sentinel-1 [Dataset]. https://registry.opendata.aws/sentinel-1/
Dataset updated
Apr 20, 2018
Dataset provided by
Sinergise (https://www.sinergise.com/)
Description
Sentinel-1 is a pair of European radar imaging (SAR) satellites launched in 2014 and 2016. Its 6-day revisit cycle and ability to observe through clouds make it well suited to sea and land monitoring, emergency response to environmental disasters, and economic applications. This dataset represents the global Sentinel-1 GRD archive, from the beginning to the present, converted to cloud-optimized GeoTIFF format.
Oxford Nanopore Technologies Benchmark Datasets
registry.opendata.aws
Updated Sep 29, 2020
Cite
Oxford Nanopore Technologies (2020). Oxford Nanopore Technologies Benchmark Datasets [Dataset]. https://registry.opendata.aws/ont-open-data/
Dataset updated
Sep 29, 2020
Dataset provided by
Oxford Nanopore Technologies (http://nanoporetech.com/)
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
The ont-open-data registry provides reference sequencing data from Oxford Nanopore Technologies to support: 1) exploration of the characteristics of nanopore sequence data; 2) assessment and reproduction of performance benchmarks; 3) development of tools and methods. The deposited data showcase DNA sequences from a representative subset of sequencing chemistries. The datasets correspond to publicly available reference samples (e.g. Genome In A Bottle reference cell lines). Raw data are provided with metadata and scripts to describe sample and data provenance.
Data from: The Multilingual Amazon Reviews Corpus
registry.opendata.aws
Updated May 28, 2020
Cite
Amazon (2020). The Multilingual Amazon Reviews Corpus [Dataset]. https://registry.opendata.aws/amazon-reviews-ml/
Dataset updated
May 28, 2020
Dataset provided by
Amazon.com (http://amazon.com/)
Description
We present a collection of Amazon reviews specifically designed to aid research in multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID and the coarse-grained product category (e.g. 'books', 'appliances', etc.)
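The six per-record attributes described above map naturally onto a small typed record. The sketch below assumes each record arrives as one JSON object per line and uses hypothetical key names (`review_body`, `review_title`, `stars`, and so on); neither the encoding nor the key names are confirmed by this listing, and the sample record is made up.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    text: str
    title: str
    stars: int
    reviewer_id: str       # anonymized in the corpus
    product_id: str        # anonymized in the corpus
    product_category: str  # coarse-grained, e.g. "books"


def parse_review(line: str) -> Review:
    """Parse one JSON-encoded review record (assumed key names)."""
    raw = json.loads(line)
    return Review(
        text=raw["review_body"],
        title=raw["review_title"],
        stars=int(raw["stars"]),  # accept "4" as well as 4
        reviewer_id=raw["reviewer_id"],
        product_id=raw["product_id"],
        product_category=raw["product_category"],
    )


# Made-up record for illustration only; not taken from the corpus.
sample_line = json.dumps({
    "review_body": "Arrived on time and works as described.",
    "review_title": "Good value",
    "stars": "4",
    "reviewer_id": "reviewer_0000001",
    "product_id": "product_0000001",
    "product_category": "appliances",
})

review = parse_review(sample_line)
print(review.stars, review.product_category)
```

For text classification, `text` and `title` would serve as inputs and `stars` or `product_category` as the label.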
Global Database of Events, Language and Tone (GDELT)
registry.opendata.aws
Updated Apr 19, 2018
Cite
Unmanaged (2018). Global Database of Events, Language and Tone (GDELT) [Dataset]. https://registry.opendata.aws/gdelt/
Dataset updated
Apr 19, 2018
Dataset provided by
Unmanaged
Description
This project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, sources, emotions, quotes, images and events driving our global society every second of every day.
OpenAQ
registry.opendata.aws
Updated Apr 20, 2018
Cite
OpenAQ (2018). OpenAQ [Dataset]. https://registry.opendata.aws/openaq/
Dataset updated
Apr 20, 2018
Dataset provided by
OpenAQ (https://openaq.org)
Description
Global, aggregated physical air quality data from public data sources provided by government, research-grade and other sources. These awesome groups do the hard work of measuring these data and publicly sharing them, and our community makes them more universally-accessible to both humans and machines.
Allen Institute for Neural Dynamics - Mouse Neuroanatomy and Physiology Data...
registry.opendata.aws
Updated Feb 1, 2023
Cite
Allen Institute (2023). Allen Institute for Neural Dynamics - Mouse Neuroanatomy and Physiology Data [Dataset]. https://registry.opendata.aws/allen-nd-open-data/
Dataset updated
Feb 1, 2023
Dataset provided by
Allen Institute
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the raw and derived data we collect publicly with rich metadata, including preliminary data collected during methods development, as near to the time of collection as possible.
Global Biodiversity Information Facility (GBIF) Species Occurrences
registry.opendata.aws
Updated May 17, 2021
Cite
The Global Biodiversity Information Facility (GBIF) (2021). Global Biodiversity Information Facility (GBIF) Species Occurrences [Dataset]. https://registry.opendata.aws/gbif/
Dataset updated
May 17, 2021
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
Description
The Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species. GBIF currently integrates datasets documenting over 1.6 billion species occurrences, growing daily. The GBIF occurrence dataset combines data from a wide array of sources including specimen-related data from natural history museums, observations from citizen science networks and environment recording schemes. While these data are constantly changing at GBIF.org, periodic snapshots are taken and made available on AWS.
NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17, 18 &...
registry.opendata.aws
Updated Apr 4, 2025
Cite
NOAA (2025). NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17, 18 & 19 [Dataset]. https://registry.opendata.aws/noaa-goes/
Dataset updated
Apr 4, 2025
Dataset provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Description

NEW GOES-19 Data!! On April 4, 2025 at 1500 UTC, the GOES-19 satellite will be declared the Operational GOES-East satellite. All products and services, including NODD, for GOES-East will transition to GOES-19 data at that time. GOES-19 will operate out of the GOES-East location of 75.2°W starting on April 1, 2025 and through the operational transition. Until the transition time and during the final stretch of Post Launch Product Testing (PLPT), GOES-19 products are considered non-operational regardless of their validation maturity level. Shortly following the transition of GOES-19 to GOES-East, all data distribution from GOES-16 will be turned off. GOES-16 will drift to the storage location at 104.7°W. GOES-19 data should begin flowing again on April 4th once this maneuver is complete.

NEW GOES 16 Reprocess Data!! The reprocessed GOES-16 ABI L1b data mitigates systematic data issues (including data gaps and image artifacts) seen in the Operational products, and improves the stability of both the radiometric and geometric calibration over the course of the entire mission life. These data were produced by recomputing the L1b radiance products from input raw L0 data using improved calibration algorithms and look-up tables, derived from data analysis of the NIST-traceable, on-board sources. In addition, the reprocessed data products contain enhancements to the L1b file format, including limb pixels and pixel timestamps, while maintaining compatibility with the operational products. The datasets currently available span the operational life of GOES-16 ABI, from early 2018 through the end of 2024. The Reprocessed L1b dataset shows improvement over the Operational L1b products but may still contain data gaps or discrepancies. Please provide feedback to Dan Lindsey (dan.lindsey@noaa.gov) and Gary Lin (guoqing.lin-1@nasa.gov). More information can be found in the GOES-R ABI Reprocess User Guide.

NOTICE: As of January 10, 2023, GOES-18 has assumed the GOES-West position and all data files are deemed both operational and provisional, so no ‘preliminary, non-operational’ caveat is needed. GOES-17 is now offline and has been shifted to approximately 105 degrees West, where it will remain in on-orbit storage. GOES-17 data will no longer flow into the GOES-17 bucket. Operational GOES-West products can be found in the GOES-18 bucket.

GOES satellites (GOES-16, GOES-17, GOES-18 & GOES-19) provide continuous weather imagery and monitoring of meteorological and space environment data across North America. GOES satellites provide the kind of continuous monitoring necessary for intensive data analysis. They hover continuously over one position on the surface. The satellites orbit high enough to allow for a full-disc view of the Earth. Because they stay above a fixed spot on the surface, they provide a constant vigil for the atmospheric "triggers" for severe weather conditions such as tornadoes, flash floods, hailstorms, and hurricanes. When these conditions develop, the GOES satellites are able to monitor storm development and track their movements. SUVI products available in both NetCDF and FITS.
Amazon Bin Image Dataset
registry.opendata.aws
Updated Apr 20, 2018
Cite
Amazon (2018). Amazon Bin Image Dataset [Dataset]. https://registry.opendata.aws/amazon-bin-imagery/
Dataset updated
Apr 20, 2018
Dataset provided by
Amazon.com (http://amazon.com/)
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) (https://creativecommons.org/licenses/by-nc-sa/3.0/)
License information was derived automatically
Description
The Amazon Bin Image Dataset contains over 500,000 images and metadata from bins of a pod in an operating Amazon Fulfillment Center. The bin images in this dataset are captured as robot units carry pods as part of normal Amazon Fulfillment Center operations.
NIH NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and...
registry.opendata.aws
Updated Jul 4, 2021
Cite
National Library of Medicine (NLM) (2021). NIH NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS [Dataset]. https://registry.opendata.aws/ncbi-pmc/
Dataset updated
Jul 4, 2021
Dataset provided by
National Library of Medicine (NLM) (http://nlm.nih.gov/)
Description
PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal articles at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The PubMed Central (PMC) Article Datasets include full-text articles archived in PMC and made available under license terms that allow for text mining and other types of secondary analysis and reuse. The articles are organized on AWS based on general license type:

- The PMC Open Access (OA) Subset, which includes all articles in PMC with a machine-readable Creative Commons license
- The Author Manuscript Dataset, which includes all articles collected under a funder policy in PMC and made available in machine-readable formats for text mining

These datasets collectively span more than half of PMC’s total collection of full-text articles. PMC enables access to these datasets to expand the impact of open access and publicly-funded research; enable greater machine learning across the spectrum of scientific research; reach new audiences; and open new doors for discovery. The bucket in this registry contains individual articles in NISO Z39.96-2015 JATS XML format as well as in plain text as extracted from the XML. The bucket is updated daily with new and updated articles. Also included are file lists that include metadata for articles in each dataset.
World Bank - Light Every Night
registry.opendata.aws
Updated Jan 21, 2021
Cite
World Bank Group (2021). World Bank - Light Every Night [Dataset]. https://registry.opendata.aws/wb-light-every-night/
Dataset updated
Jan 21, 2021
Dataset provided by
World Bank Group (http://www.worldbank.org/)
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
Light Every Night - World Bank Nighttime Light Data – provides open access to all nightly imagery and data from the Visible Infrared Imaging Radiometer Suite Day-Night Band (VIIRS DNB) from 2012-2020 and the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) from 1992-2013. The underlying data are sourced from the NOAA National Centers for Environmental Information (NCEI) archive. Additional processing by the University of Michigan enables access in Cloud Optimized GeoTIFF format (COG) and search using the Spatial Temporal Asset Catalog (STAC) standard. The data is published and openly available under the terms of the World Bank’s open data license.
Open Targets - Data Lakehouse Ready
registry.opendata.aws
Updated Sep 15, 2020
Cite
Amazon Web Services (2020). Open Targets - Data Lakehouse Ready [Dataset]. https://registry.opendata.aws/opentargets/
Dataset updated
Sep 15, 2020
Dataset provided by
Amazon Web Services (http://aws.amazon.com/)
Description
This is a Parquet representation of the Open Targets Platform's latest export. The Open Targets Platform integrates evidence from genetics, genomics, transcriptomics, drugs, animal models and scientific literature to score and rank target-disease associations for drug target identification. The Open Targets Platform (https://www.targetvalidation.org) is a freely available resource for the integration of genetics, genomics, and chemical data to aid systematic drug target identification and prioritisation. This dataset is 'Lakehouse Ready', meaning you can query the data in place, straight out of the Registry of Open Data S3 bucket. Deploy this dataset's corresponding CloudFormation template to create the AWS Glue catalog entries in your account in about 30 seconds. That one step will enable you to write SQL with Amazon Athena, build dashboards and charts with Amazon QuickSight, perform HPC with Amazon EMR, or join the data into your Amazon Redshift clusters. More detail is available in the documentation: https://github.com/aws-samples/data-lake-as-code/blob/roda/README.md
COVID-19 Data Lake
registry.opendata.aws
Updated Apr 8, 2020
Cite
Amazon Web Services (2020). COVID-19 Data Lake [Dataset]. https://registry.opendata.aws/aws-covid19-lake/
Dataset updated
Apr 8, 2020
Dataset provided by
Amazon Web Services (http://aws.amazon.com/)
Description
A centralized repository of up-to-date and curated datasets on or related to the spread and characteristics of the novel coronavirus (SARS-CoV-2) and its associated illness, COVID-19. Globally, there are several efforts underway to gather this data, and we are working with partners to make this crucial data freely available and keep it up to date. We have seeded our curated data lake, hosted on the AWS cloud, with COVID-19 case tracking data from Johns Hopkins and The New York Times, hospital bed availability from Definitive Healthcare, and over 45,000 research articles about COVID-19 and related coronaviruses from the Allen Institute for AI.