100+ datasets found
  1. Registry of Open Data on AWS

    • registry.opendata.aws
    Updated Aug 13, 2021
    Citation: Amazon Web Services (2021). Registry of Open Data on AWS [Dataset]. https://registry.opendata.aws/registry-open-data/
    Dataset provided by: Amazon Web Services, http://aws.amazon.com/
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0 (license information was derived automatically)

    Description

    The Registry of Open Data on AWS contains publicly available datasets that can be accessed from AWS resources. Note that although these datasets are available via AWS resources, they are not provided by AWS; they are owned and maintained by a variety of government organizations, researchers, businesses, and individuals. This dataset contains derived forms of the data in https://github.com/awslabs/open-data-registry that have been transformed for ease of use with machine interfaces. Currently, only the ndjson form of the registry is populated here.
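    Because the registry is published here as ndjson (one JSON object per line), it can be read with only the standard library. The sketch below is a minimal example under stated assumptions: the sample records and their `Name` and `Tags` keys are hypothetical placeholders, not a confirmed schema for this dataset.

```python
import json
from typing import Iterator


def iter_ndjson(text: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line of an ndjson document."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not valid JSON") from exc


# Hypothetical sample; the "Name" and "Tags" keys are ASSUMPTIONS for illustration
sample = (
    '{"Name": "Example A", "Tags": ["satellite imagery"]}\n'
    '{"Name": "Example B", "Tags": ["genomics"]}\n'
)

records = list(iter_ndjson(sample))
imagery = [r["Name"] for r in records if "satellite imagery" in r.get("Tags", [])]
print(imagery)  # ['Example A']
```

    For a large file, the same per-line logic can be applied while iterating over an open file object instead of a string.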

  2. SpaceEye-T VVHR EO Open Data

    • registry.opendata.aws
    Updated Sep 26, 2025
    Citation: SI Imaging Services (2025). SpaceEye-T VVHR EO Open Data [Dataset]. https://registry.opendata.aws/st-open-data/
    Dataset provided by: SI Imaging Services Co., Ltd.
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    The SpaceEye-T satellite collects optical imagery at 25 cm resolution, the highest resolution among commercial satellites. The Open Data features a variety of satellite images from around the world so that end users can experience the power of VVHR optical data.

  3. AWS Public Blockchain Data

    • registry.opendata.aws
    Updated Sep 23, 2022
    Citation: Amazon Web Services (2022). AWS Public Blockchain Data [Dataset]. https://registry.opendata.aws/aws-public-blockchain/
    Dataset provided by: Amazon Web Services, http://aws.amazon.com/
    Description

    The AWS Public Blockchain Data initiative provides free access to blockchain datasets through collaboration with data providers. The data is optimized for analytics by being transformed into compressed Parquet files, partitioned by date for efficient querying.

    Datasets

    Each blockchain dataset is listed with its maintainer and S3 path:
    - Bitcoin (maintained by AWS): s3://aws-public-blockchain/v1.0/btc/
    - Ethereum (maintained by AWS): s3://aws-public-blockchain/v1.0/eth/
    - Arbitrum (maintained by SonarX): s3://aws-public-blockchain/v1.1/sonarx/arbitrum/
    - Aptos (maintained by SonarX): s3://aws-public-blockchain/v1.1/sonarx/aptos/
    - Base (maintained by SonarX): s3://aws-public-blockchain/v1.1/sonarx/base/
    - Provenance (maintained by SonarX): s3://aws-public-blockchain/v1.1/sonarx/provenance/
    - XRP Ledger (maintained by SonarX): s3://aws-public-blockchain/v1.1/sonarx/xrp/
    - Stellar, XDR files (maintained by Stellar): s3://aws-public-blockchain/v1.1/stellar/
    - The Open Network, TON (maintained by TON): s3://aws-public-blockchain/v1.1/ton/
    - Cronos (maintained by Cronos): s3://aws-public-blockchain/v1.1/cronos/
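    Since the Parquet files are partitioned by date, a query over a date range only needs to touch one prefix per day. The sketch below builds those prefixes from two of the base paths listed above; the `blocks` table name and the Hive-style `date=YYYY-MM-DD` partition naming are assumptions to verify against an actual bucket listing.

```python
from datetime import date, timedelta

# Base paths copied from the list above (subset)
BASE_PATHS = {
    "btc": "s3://aws-public-blockchain/v1.0/btc/",
    "eth": "s3://aws-public-blockchain/v1.0/eth/",
}


def partition_prefixes(chain: str, table: str, start: date, end: date) -> list[str]:
    """One S3 prefix per day in [start, end], ASSUMING Hive-style date=YYYY-MM-DD partitions."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    base = BASE_PATHS[chain]
    n_days = (end - start).days + 1
    return [
        f"{base}{table}/date={(start + timedelta(days=i)).isoformat()}/"
        for i in range(n_days)
    ]


# "blocks" is a hypothetical table name used for illustration
prefixes = partition_prefixes("btc", "blocks", date(2023, 1, 30), date(2023, 2, 1))
print(prefixes[-1])  # s3://aws-public-blockchain/v1.0/btc/blocks/date=2023-02-01/
```

    Restricting a scan to these prefixes, rather than listing the whole chain directory, is what makes date-bounded queries cheap on date-partitioned data.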

    Become a Data Provider

    We welcome additional blockchain data providers to join this initiative. If you're interested in contributing datasets to the AWS Public Blockchain Data program, please contact our team at aws-public-blockchain@amazon.com.

  4. Sentinel-2

    • registry.opendata.aws
    Updated Apr 19, 2018
    Citation: Sinergise (2018). Sentinel-2 [Dataset]. https://registry.opendata.aws/sentinel-2/
    Dataset provided by: Sinergise, https://www.sinergise.com/
    Description

    The Sentinel-2 mission is a land monitoring constellation of two satellites that provides high-resolution optical imagery and continuity for the current SPOT and Landsat missions. The mission provides global coverage of the Earth's land surface every 5 days, which makes the data valuable for ongoing studies. L1C data are available globally from June 2015. L2A data are available over the Europe region from November 2016 and globally since January 2017.
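    Sentinel-2 products are organized by Military Grid Reference System (MGRS) tile, so locating a scene usually starts with splitting a tile ID such as `33UUP` into its UTM zone, latitude band, and 100 km grid square. The sketch below does that and formats a key prefix; the `tiles/{zone}/{band}/{square}/{year}/{month}/{day}/{sequence}/` layout, with unpadded month and day, is an assumption about how the bucket is organized rather than something stated in this entry.

```python
import re
from datetime import date

# UTM zone digits, latitude band (C-X without I and O), 100 km square (two letters without I and O)
MGRS_TILE = re.compile(r"^(\d{1,2})([C-HJ-NP-X])([A-HJ-NP-Z]{2})$")


def split_tile_id(tile_id: str) -> tuple[int, str, str]:
    """Split an MGRS tile ID such as '33UUP' into (zone, band, square)."""
    match = MGRS_TILE.match(tile_id.upper())
    if match is None:
        raise ValueError(f"not an MGRS tile ID: {tile_id!r}")
    zone, band, square = match.groups()
    if not 1 <= int(zone) <= 60:
        raise ValueError(f"UTM zone out of range: {zone}")
    return int(zone), band, square


def tile_prefix(tile_id: str, acquired: date, sequence: int = 0) -> str:
    """Build a key prefix under the ASSUMED tiles/... layout described above."""
    zone, band, square = split_tile_id(tile_id)
    return (
        f"tiles/{zone}/{band}/{square}/"
        f"{acquired.year}/{acquired.month}/{acquired.day}/{sequence}/"
    )


print(tile_prefix("33UUP", date(2017, 1, 5)))  # tiles/33/U/UP/2017/1/5/0/
```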

  5. OpenStreetMap on AWS

    • registry.opendata.aws
    Updated May 9, 2025
    Citation: OpenStreetMap Foundation (OSMF) and Pacific Atlas (2025). OpenStreetMap on AWS [Dataset]. https://registry.opendata.aws/osm/
    Dataset provided by: OpenStreetMap, https://www.openstreetmap.org/
    Description

    OSM is a free, editable map of the world, created and maintained by volunteers. Regular OSM data archives are made available in Amazon S3 in both standard formats (OSM PBF, XML) and cloud-native formats optimized for analytics workloads.

  6. NEXRAD on AWS

    • registry.opendata.aws
    Updated Apr 19, 2018
    Citation: Unidata (2018). NEXRAD on AWS [Dataset]. https://registry.opendata.aws/noaa-nexrad/
    Dataset provided by: Unidata, https://www.unidata.ucar.edu/
    Description

    Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.

    Update

    The NEXRAD Level II archive data is moving to a new bucket: unidata-nexrad-level2 and SNS topic: arn:aws:sns:us-east-1:684042711724:NewNEXRADLevel2Archive. The old bucket and SNS topic are now deprecated and will no longer be available starting September 1, 2025.
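    Level II volume files are conventionally named with the four-letter radar site ID followed by the UTC scan time, in a form such as `KTLX20130520_201643_V06`. The sketch below parses a name of that form and builds a date/site key prefix; both the naming pattern and the `YYYY/MM/DD/SITE/` prefix layout are assumptions to check against the new `unidata-nexrad-level2` bucket, not details confirmed by the notice above.

```python
import re
from datetime import datetime, timezone

# ASSUMED naming pattern: 4-letter site ID, YYYYMMDD, underscore, HHMMSS, optional suffix
LEVEL2_NAME = re.compile(r"^([A-Z]{4})(\d{8})_(\d{6})")


def parse_level2_name(name: str) -> tuple[str, datetime]:
    """Return (site, scan time in UTC) parsed from a Level II file name."""
    match = LEVEL2_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized Level II file name: {name!r}")
    site, ymd, hms = match.groups()
    scan_time = datetime.strptime(ymd + hms, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return site, scan_time


def key_prefix(site: str, scan_time: datetime) -> str:
    """Build a YYYY/MM/DD/SITE/ prefix (ASSUMED bucket layout)."""
    return f"{scan_time:%Y/%m/%d}/{site}/"


site, scan_time = parse_level2_name("KTLX20130520_201643_V06")
print(key_prefix(site, scan_time))  # 2013/05/20/KTLX/
```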

  7. CMAS Data Warehouse

    • registry.opendata.aws
    Updated Dec 31, 2022
    Citation: CMAS CENTER (2022). CMAS Data Warehouse [Dataset]. https://registry.opendata.aws/cmas-data-warehouse/
    Dataset provided by: CMAS CENTER, https://cmascenter.org/
    Description

    CMAS Data Warehouse on AWS collects and disseminates meteorology, emissions, and air quality model input and output for Community Multiscale Air Quality (CMAQ) Model applications. This dataset is available as part of the AWS Open Data Program; therefore, egress fees are not charged to either the host or the person downloading the data. This S3 bucket is maintained as a public service by the University of North Carolina's CMAS Center, the US EPA’s Office of Research and Development, and the US EPA’s Office of Air and Radiation. Metadata and DOIs for datasets included in the CMAS Data Warehouse are available from the CMAS Dataverse site: https://dataverse.unc.edu/dataverse/cmascenter

  8. Sentinel-1

    • registry.opendata.aws
    Updated Apr 20, 2018
    Citation: Sinergise (2018). Sentinel-1 [Dataset]. https://registry.opendata.aws/sentinel-1/
    Dataset provided by: Sinergise, https://www.sinergise.com/
    Description

    Sentinel-1 is a pair of European radar imaging (SAR) satellites launched in 2014 and 2016. Its 6-day revisit cycle and its ability to observe through clouds make it well suited for sea and land monitoring, emergency response to environmental disasters, and economic applications. This dataset represents the global Sentinel-1 GRD archive, from the start of the mission to the present, converted to cloud-optimized GeoTIFF format.

  9. Oxford Nanopore Technologies Benchmark Datasets

    • registry.opendata.aws
    Updated Sep 29, 2020
    Citation: Oxford Nanopore Technologies (2020). Oxford Nanopore Technologies Benchmark Datasets [Dataset]. https://registry.opendata.aws/ont-open-data/
    Dataset provided by: Oxford Nanopore Technologies, http://nanoporetech.com/
    License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/ (license information was derived automatically)

    Description

    The ont-open-data registry provides reference sequencing data from Oxford Nanopore Technologies to support (1) exploration of the characteristics of nanopore sequence data, (2) assessment and reproduction of performance benchmarks, and (3) development of tools and methods. The deposited data showcase DNA sequences from a representative subset of sequencing chemistries. The datasets correspond to publicly available reference samples (e.g., Genome In A Bottle reference cell lines). Raw data are provided with metadata and scripts that describe sample and data provenance.

  10. The Multilingual Amazon Reviews Corpus

    • registry.opendata.aws
    Updated May 28, 2020
    Citation: Amazon (2020). The Multilingual Amazon Reviews Corpus [Dataset]. https://registry.opendata.aws/amazon-reviews-ml/
    Dataset provided by: Amazon.com, http://amazon.com/
    Description

    We present a collection of Amazon reviews specifically designed to aid research in multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese, and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID, and the coarse-grained product category (e.g., 'books', 'appliances').
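    Because each record carries a star rating and a coarse-grained product category, per-category statistics need only a small typed record. In this minimal sketch the JSON key names (`review_body`, `stars`, `product_category`, and so on) and the sample lines are assumptions for illustration; check them against the files you actually download.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One review record; field names are ASSUMED, not confirmed by the corpus docs."""
    review_body: str
    review_title: str
    stars: int
    reviewer_id: str
    product_id: str
    product_category: str


def parse_review(line: str) -> Review:
    raw = json.loads(line)
    return Review(
        review_body=raw["review_body"],
        review_title=raw["review_title"],
        stars=int(raw["stars"]),
        reviewer_id=raw["reviewer_id"],
        product_id=raw["product_id"],
        product_category=raw["product_category"],
    )


def mean_stars_by_category(reviews: list[Review]) -> dict[str, float]:
    """Average star rating for each product category."""
    totals = defaultdict(lambda: [0, 0])  # category -> [sum of stars, count]
    for review in reviews:
        totals[review.product_category][0] += review.stars
        totals[review.product_category][1] += 1
    return {cat: total / count for cat, (total, count) in totals.items()}


# Hypothetical sample lines
lines = [
    '{"review_body": "Great", "review_title": "Yes", "stars": 5, "reviewer_id": "r1", "product_id": "p1", "product_category": "books"}',
    '{"review_body": "Fine", "review_title": "Ok", "stars": 3, "reviewer_id": "r2", "product_id": "p2", "product_category": "books"}',
    '{"review_body": "Broke", "review_title": "No", "stars": 2, "reviewer_id": "r3", "product_id": "p3", "product_category": "appliances"}',
]
result = mean_stars_by_category([parse_review(line) for line in lines])
print(result)  # {'books': 4.0, 'appliances': 2.0}
```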

  11. Global Database of Events, Language and Tone (GDELT)

    • registry.opendata.aws
    Updated Apr 19, 2018
    Citation: Unmanaged (2018). Global Database of Events, Language and Tone (GDELT) [Dataset]. https://registry.opendata.aws/gdelt/
    Dataset provided by: Unmanaged
    Description

    This project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, sources, emotions, quotes, images and events driving our global society every second of every day.

  12. OpenAQ

    • registry.opendata.aws
    Updated Apr 20, 2018
    Citation: OpenAQ (2018). OpenAQ [Dataset]. https://registry.opendata.aws/openaq/
    Dataset provided by: OpenAQ, https://openaq.org
    Description

    Global, aggregated physical air quality data from public sources provided by government, research-grade, and other monitoring programs. These groups do the hard work of measuring the data and sharing them publicly, and our community makes them more universally accessible to both humans and machines.

  13. Allen Institute for Neural Dynamics - Mouse Neuroanatomy and Physiology Data...

    • registry.opendata.aws
    Updated Feb 1, 2023
    Citation: Allen Institute (2023). Allen Institute for Neural Dynamics - Mouse Neuroanatomy and Physiology Data [Dataset]. https://registry.opendata.aws/allen-nd-open-data/
    Dataset provided by: Allen Institute
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the raw and derived data we collect publicly with rich metadata, including preliminary data collected during methods development, as near to the time of collection as possible.

  14. Global Biodiversity Information Facility (GBIF) Species Occurrences

    • registry.opendata.aws
    Updated May 17, 2021
    Citation: The Global Biodiversity Information Facility (GBIF) (2021). Global Biodiversity Information Facility (GBIF) Species Occurrences [Dataset]. https://registry.opendata.aws/gbif/
    Dataset provided by: Global Biodiversity Information Facility, https://www.gbif.org/
    Description

    The Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure, funded by the world's governments, that provides global data documenting the occurrence of species. GBIF currently integrates datasets documenting over 1.6 billion species occurrences, and the number grows daily. The GBIF occurrence dataset combines data from a wide array of sources, including specimen-related data from natural history museums, observations from citizen science networks, and environmental recording schemes. While these data change constantly at GBIF.org, periodic snapshots are taken and made available on AWS.

  15. NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17, 18 &...

    • registry.opendata.aws
    Updated Apr 4, 2025
    Citation: NOAA (2025). NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17, 18 & 19 [Dataset]. https://registry.opendata.aws/noaa-goes/
    Dataset provided by: National Oceanic and Atmospheric Administration, http://www.noaa.gov/
    Description



    NEW GOES-19 data: On April 4, 2025 at 1500 UTC, the GOES-19 satellite will be declared the operational GOES-East satellite. All products and services for GOES-East, including NODD, will transition to GOES-19 data at that time. GOES-19 will operate from the GOES-East location of 75.2°W starting on April 1, 2025 and through the operational transition. Until the transition time, and during the final stretch of Post Launch Product Testing (PLPT), GOES-19 products are considered non-operational regardless of their validation maturity level. Shortly after GOES-19 transitions to GOES-East, all data distribution from GOES-16 will be turned off, and GOES-16 will drift to the storage location at 104.7°W. GOES-19 data should begin flowing again on April 4th once this maneuver is complete.

    NEW GOES-16 reprocessed data: The reprocessed GOES-16 ABI L1b data mitigate systematic data issues (including data gaps and image artifacts) seen in the operational products and improve the stability of both the radiometric and geometric calibration over the entire mission life. These data were produced by recomputing the L1b radiance products from input raw L0 data using improved calibration algorithms and look-up tables, derived from analysis of data from the NIST-traceable on-board sources. In addition, the reprocessed data products contain enhancements to the L1b file format, including limb pixels and pixel timestamps, while maintaining compatibility with the operational products. The datasets currently available span the operational life of GOES-16 ABI, from early 2018 through the end of 2024. The reprocessed L1b dataset improves on the operational L1b products but may still contain data gaps or discrepancies. Please provide feedback to Dan Lindsey (dan.lindsey@noaa.gov) and Gary Lin (guoqing.lin-1@nasa.gov). More information can be found in the GOES-R ABI Reprocess User Guide.


    NOTICE: As of January 10, 2023, GOES-18 has assumed the GOES-West position, and all of its data files are deemed both operational and provisional, so no ‘preliminary, non-operational’ caveat is needed. GOES-17 is now offline and has shifted to approximately 105 degrees West, where it will remain in on-orbit storage. GOES-17 data will no longer flow into the GOES-17 bucket. Operational GOES-West products can be found in the GOES-18 bucket.

    GOES satellites (GOES-16, GOES-17, GOES-18 & GOES-19) provide continuous weather imagery and monitoring of meteorological and space environment data across North America. GOES satellites provide the kind of continuous monitoring necessary for intensive data analysis. They hover continuously over one position on the surface, and they orbit high enough to allow a full-disc view of the Earth. Because they stay above a fixed spot on the surface, they provide a constant vigil for the atmospheric "triggers" of severe weather conditions such as tornadoes, flash floods, hailstorms, and hurricanes. When these conditions develop, the GOES satellites can monitor storm development and track storm movements. SUVI products are available in both NetCDF and FITS formats.
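    The GOES-East switch described in this entry happens at a single instant (April 4, 2025, 1500 UTC), so a download script can choose the satellite from the observation time. This sketch assumes per-satellite bucket names of the form `noaa-goes16` and `noaa-goes19` and a `<product>/<year>/<day-of-year>/<hour>/` key layout; neither is stated in the description, and the sketch ignores the brief gap while GOES-19 data resume. It expects timezone-aware UTC datetimes.

```python
from datetime import datetime, timezone

# Operational GOES-East handover time stated in the notice
GOES_EAST_SWITCH = datetime(2025, 4, 4, 15, 0, tzinfo=timezone.utc)


def goes_east_bucket(obs_time: datetime) -> str:
    """Return the ASSUMED bucket name holding GOES-East data for obs_time."""
    return "noaa-goes19" if obs_time >= GOES_EAST_SWITCH else "noaa-goes16"


def hourly_prefix(product: str, obs_time: datetime) -> str:
    """Build an ASSUMED <product>/<year>/<day-of-year>/<hour>/ key prefix."""
    day_of_year = obs_time.timetuple().tm_yday
    return f"{product}/{obs_time.year}/{day_of_year:03d}/{obs_time.hour:02d}/"


t = datetime(2025, 4, 4, 16, 0, tzinfo=timezone.utc)
print(goes_east_bucket(t), hourly_prefix("ABI-L1b-RadF", t))
# noaa-goes19 ABI-L1b-RadF/2025/094/16/
```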

  16. Amazon Bin Image Dataset

    • registry.opendata.aws
    Updated Apr 20, 2018
    Citation: Amazon (2018). Amazon Bin Image Dataset [Dataset]. https://registry.opendata.aws/amazon-bin-imagery/
    Dataset provided by: Amazon.com, http://amazon.com/
    License: Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0), https://creativecommons.org/licenses/by-nc-sa/3.0/ (license information was derived automatically)

    Description

    The Amazon Bin Image Dataset contains over 500,000 images and metadata from bins of a pod in an operating Amazon Fulfillment Center. The bin images in this dataset are captured as robot units carry pods as part of normal Amazon Fulfillment Center operations.

  17. NIH NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and...

    • registry.opendata.aws
    Updated Jul 4, 2021
    Citation: National Library of Medicine (NLM) (2021). NIH NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS [Dataset]. https://registry.opendata.aws/ncbi-pmc/
    Dataset provided by: National Library of Medicine (NLM), http://nlm.nih.gov/
    Description

    PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The PubMed Central (PMC) Article Datasets include full-text articles archived in PMC and made available under license terms that allow for text mining and other types of secondary analysis and reuse. The articles are organized on AWS based on general license type:

    The PMC Open Access (OA) Subset, which includes all articles in PMC with a machine-readable Creative Commons license

    The Author Manuscript Dataset, which includes all articles collected under a funder policy in PMC and made available in machine-readable formats for text mining

    These datasets collectively span more than half of PMC’s total collection of full-text articles. PMC enables access to these datasets to expand the impact of open access and publicly-funded research; enable greater machine learning across the spectrum of scientific research; reach new audiences; and open new doors for discovery. The bucket in this registry contains individual articles in NISO Z39.96-2015 JATS XML format as well as in plain text as extracted from the XML. The bucket is updated daily with new and updated articles. Also included are file lists that include metadata for articles in each dataset.
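    Since the bucket stores each article as JATS XML, basic metadata can be pulled out with the standard library's ElementTree. This sketch reads the article title and `article-id` values from the `front/article-meta` section defined by JATS; the sample document and its placeholder PMC ID are made up, and real files may reference DTD entities or embedded MathML that need extra handling.

```python
import xml.etree.ElementTree as ET


def article_summary(jats_xml: str) -> dict:
    """Extract the title and article IDs from a JATS <article> document."""
    root = ET.fromstring(jats_xml)
    meta = root.find("./front/article-meta")
    if meta is None:
        raise ValueError("document has no front/article-meta element")
    title_el = meta.find("./title-group/article-title")
    # itertext() joins text that inline markup such as <italic> splits apart
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""
    ids = {
        el.get("pub-id-type", "unknown"): (el.text or "").strip()
        for el in meta.findall("article-id")
    }
    return {"title": title, "ids": ids}


# Made-up minimal JATS document; the PMC ID is a placeholder
sample = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmcid">PMC0000000</article-id>
      <title-group>
        <article-title>An <italic>example</italic> title</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

summary = article_summary(sample)
print(summary)  # {'title': 'An example title', 'ids': {'pmcid': 'PMC0000000'}}
```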

  18. World Bank - Light Every Night

    • registry.opendata.aws
    Updated Jan 21, 2021
    Citation: World Bank Group (2021). World Bank - Light Every Night [Dataset]. https://registry.opendata.aws/wb-light-every-night/
    Dataset provided by: World Bank Group, http://www.worldbank.org/
    License: Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/ (license information was derived automatically)

    Description

    Light Every Night (World Bank Nighttime Light Data) provides open access to all nightly imagery and data from the Visible Infrared Imaging Radiometer Suite Day-Night Band (VIIRS DNB) from 2012 to 2020 and from the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) from 1992 to 2013. The underlying data are sourced from the NOAA National Centers for Environmental Information (NCEI) archive. Additional processing by the University of Michigan enables access in Cloud Optimized GeoTIFF (COG) format and search using the SpatioTemporal Asset Catalog (STAC) standard. The data are published and openly available under the terms of the World Bank’s open data license.
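    Because the imagery is indexed with STAC, each item carries its acquisition time in `properties.datetime`, which makes time-range filtering straightforward once items are loaded. This is a minimal sketch over already-fetched item dictionaries; the item IDs are hypothetical, and it does not call any STAC API endpoint for this dataset.

```python
from datetime import datetime, timezone


def parse_stac_datetime(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; fromisoformat() only accepts a trailing 'Z' from Python 3.11."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def items_between(items: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Keep items whose properties.datetime falls within [start, end]."""
    selected = []
    for item in items:
        stamp = item.get("properties", {}).get("datetime")
        if stamp is None:
            continue  # STAC permits a null datetime when start_datetime/end_datetime are used
        if start <= parse_stac_datetime(stamp) <= end:
            selected.append(item)
    return selected


# Hypothetical items with made-up IDs
items = [
    {"id": "night-a", "properties": {"datetime": "2012-06-01T00:00:00Z"}},
    {"id": "night-b", "properties": {"datetime": "2013-01-15T00:00:00Z"}},
    {"id": "night-c", "properties": {"datetime": None}},
]
window = (datetime(2013, 1, 1, tzinfo=timezone.utc), datetime(2013, 12, 31, tzinfo=timezone.utc))
print([item["id"] for item in items_between(items, *window)])  # ['night-b']
```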

  19. Open Targets - Data Lakehouse Ready

    • registry.opendata.aws
    Updated Sep 15, 2020
    Citation: Amazon Web Services (2020). Open Targets - Data Lakehouse Ready [Dataset]. https://registry.opendata.aws/opentargets/
    Dataset provided by: Amazon Web Services, http://aws.amazon.com/
    Description

    This is a Parquet representation of the Open Targets Platform's latest export. The Open Targets Platform integrates evidence from genetics, genomics, transcriptomics, drugs, animal models, and scientific literature to score and rank target-disease associations for drug target identification. The Open Targets Platform (https://www.targetvalidation.org) is a freely available resource for the integration of genetics, genomics, and chemical data to aid systematic drug target identification and prioritisation. This dataset is 'Lakehouse Ready', meaning you can query the data in place, straight out of the Registry of Open Data S3 bucket. Deploying this dataset's corresponding CloudFormation template creates the AWS Glue catalog entries in your account in about 30 seconds. That one step enables you to write SQL with Amazon Athena, build dashboards and charts with Amazon QuickSight, perform HPC with Amazon EMR, or join the data into your Amazon Redshift clusters. More detail is available in the documentation (https://github.com/aws-samples/data-lake-as-code/blob/roda/README.md).

  20. COVID-19 Data Lake

    • registry.opendata.aws
    Updated Apr 8, 2020
    Citation: Amazon Web Services (2020). COVID-19 Data Lake [Dataset]. https://registry.opendata.aws/aws-covid19-lake/
    Dataset provided by: Amazon Web Services, http://aws.amazon.com/
    Description

    A centralized repository of up-to-date, curated datasets on or related to the spread and characteristics of the novel coronavirus (SARS-CoV-2) and its associated illness, COVID-19. Globally, there are several efforts underway to gather this data, and we are working with partners to make this crucial data freely available and keep it up to date. We have seeded our curated data lake, hosted on the AWS cloud, with COVID-19 case tracking data from Johns Hopkins and The New York Times, hospital bed availability from Definitive Healthcare, and over 45,000 research articles about COVID-19 and related coronaviruses from the Allen Institute for AI.
