83 datasets found

Data for: COVID-19 Dataset: Worldwide Spread Log Including Countries First...
data.mendeley.com
Updated Jul 20, 2020
Cite
Hasmot Ali (2020). Data for: COVID-19 Dataset: Worldwide Spread Log Including Countries First Case And First Death [Dataset]. http://doi.org/10.17632/vw427wzzkk.4
Explore at:
Unique identifier
https://doi.org/10.17632/vw427wzzkk.4
Dataset updated
Jul 20, 2020
Authors
Hasmot Ali
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Contains data related to the COVID-19 pandemic, specifically the first case and first death in every country. The First Case information consists of the date of the first case(s), the number of confirmed case(s) on the first day, the age of the first patient(s), and the last visited country. The First Death information consists of the date of the first death and the age of the patient who died first, for every country, with the corresponding continent noted. The dataset also contains a binary matrix of the spread chain among countries and regions.
A dataset of Covid-related misinformation videos and their spread on social...
explore.openaire.eu
data.niaid.nih.gov
Updated Feb 23, 2021
Cite
Aleksi Knuutila (2021). A dataset of Covid-related misinformation videos and their spread on social media [Dataset]. http://doi.org/10.5281/zenodo.4557827
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.4557827
Dataset updated
Feb 23, 2021
Authors
Aleksi Knuutila
Description
This dataset contains metadata about all Covid-related YouTube videos which circulated on public social media, but which YouTube eventually removed because they contained false information. It describes 8,122 videos that were shared between November 2019 and June 2020. The dataset contains unique identifiers for the videos and for the social media accounts that shared them, statistics on social media engagement, and metadata such as video titles and view counts where they were recoverable. We publish the data alongside the code used to produce it on GitHub. The dataset has reuse potential for research studying narratives related to the coronavirus, the impact of social media on knowledge about health, and the politics of social media platforms.
Data from: WildfireSpreadTS: A dataset of multi-modal time series for...
data.niaid.nih.gov
zenodo.org
Updated Jul 11, 2024
Cite
Gerard, Sebastian (2024). WildfireSpreadTS: A dataset of multi-modal time series for wildfire spread prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8006176
Explore at:
Dataset updated
Jul 11, 2024
Dataset provided by
Zhao, Yu
Sullivan, Josephine
Gerard, Sebastian
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a multi-temporal, multi-modal remote-sensing dataset for predicting how active wildfires will spread at a resolution of 24 hours. The dataset consists of 13,607 images across 607 fire events in the United States from January 2018 to October 2021. For each fire event, the dataset contains a full time series of daily observations, containing detected active fires and variables related to fuel, topography and weather conditions.

Documentation

WildfireSpreadTS_Documentation.pdf includes further details about the dataset, following Gebru et al.'s "Datasheets for Datasets" framework. This documentation is similar to the supplementary material of the associated NeurIPS paper, excluding only information about experimental setup and results. For full details, please refer to the associated paper.

Code: Getting started

Get started working with the dataset at https://github.com/SebastianGer/WildfireSpreadTS. The code includes a PyTorch Dataset and Lightning DataModule to allow for easy access. We recommend converting the GeoTIFF files provided here to HDF5 files (bigger files, but much faster). The necessary code is also available in the repository.

This work is funded by Digital Futures in the project EO-AI4GlobalChange. The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at C3SE partially funded by the Swedish Research Council through grant agreement no. 2022-06725.
ECMWF ERA5t: 10 ensemble member surface level analysis parameter data
data-search.nerc.ac.uk
catalogue.ceda.ac.uk
Updated Dec 8, 2023
+ more versions
Cite
(2023). ECMWF ERA5t: 10 ensemble member surface level analysis parameter data [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=ensemble%20runs
Explore at:
Dataset updated
Dec 8, 2023
Description
This dataset contains ERA5 initial release (ERA5t) surface level analysis parameter data from 10 member ensemble runs. ERA5t is the initial release of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project, available up to 5 days behind the present date. CEDA will maintain a 6 month rolling archive of these data with overlap to the verified ERA5 data - see linked datasets on this record. Ensemble means and spreads were calculated from the ERA5t 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record. See linked datasets for ensemble member and spread data. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and thus was calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble mean and ensemble spread data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-interim re-analysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These will be subsequently reviewed and, if required, amended before the full ERA5 release. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
Corona Virus (COVID-19) Dataset (Korea, Busan)
kaggle.com
Updated Mar 7, 2020
Cite
JunHwanHuh (2020). Corona Virus (COVID-19) Dataset (Korea, Busan) [Dataset]. http://doi.org/10.34740/kaggle/dsv/993325
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/993325
Dataset updated
Mar 7, 2020
Dataset provided by
Kaggle
Authors
JunHwanHuh
Area covered
Korea, South Korea, Busan
Description
Context

The Wuhan coronavirus, which started in China, is spreading around the world, and Korea is currently suffering from its rapid spread. The new coronavirus is difficult to analyze because it spreads through a variety of pathways, including asymptomatic infections.

The infection propagation pattern also differs from region to region, which requires specialized analysis for each city. We focused on constructing and analyzing datasets to prevent local infections in Busan City.

The following web app was developed to visualize and analyze the local infection situation in Busan as a time series: http://corona-busan.co.kr

Content

In Korea, the virus is spreading rapidly due to certain religious activities. To analyze this unusual situation, we included religion as a type of infection, and we constructed a confirmatory line over time to quickly identify contaminated areas and minimize additional infections.

Structure Description

Busan_Patient_Path.json: the data has three large hierarchical categories per patient (patient_info, infection_info, route_info).

Patient_info
    patient_id
    age
    sex
    country
    religion
    residential_info
        city
        district
        latitude
        longtitude

Infection_Info
    Infection_Place
        name
        latitude
        longtitude
    Infection_type
    Infection_factor
    Isolation_place
        name
        latitude
        longtitude

Route {Array}
    name
    latitude
    longtitude
    start_time
    end_time
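As a rough illustration of how the structure listed above might be read, here is a minimal Python sketch. The key names are copied from the field list (including the spelling "longtitude"), but the exact nesting, the top-level array, and every value in the sample record are assumptions made for this example, not taken from Busan_Patient_Path.json itself.

```python
import json

# Hypothetical record: key names follow the structure listed above,
# but the nesting and all values are invented for illustration.
SAMPLE = """
[
  {
    "Patient_info": {
      "patient_id": 1, "age": 30, "sex": "F",
      "residential_info": {"city": "Busan", "district": "Example-gu"}
    },
    "Infection_Info": {"Infection_type": "contact"},
    "Route": [
      {"name": "Place A", "latitude": 35.10, "longtitude": 129.03,
       "start_time": "2020-02-20 10:00", "end_time": "2020-02-20 11:00"},
      {"name": "Place B", "latitude": 35.16, "longtitude": 129.06,
       "start_time": "2020-02-20 13:00", "end_time": "2020-02-20 14:30"}
    ]
  }
]
"""


def route_stops(patient):
    """Return (name, latitude, longitude) tuples for one patient's route."""
    return [
        (stop["name"], stop["latitude"], stop["longtitude"])
        for stop in patient.get("Route", [])
    ]


patients = json.loads(SAMPLE)
stops = route_stops(patients[0])
for name, lat, lon in stops:
    print(f"{name}: ({lat}, {lon})")
```

If the real file nests these categories differently, only the key lookups in `route_stops` would need to change.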

Acknowledgements

Pusan National University (PNU) CSE, Chang-Hong Lee, and Jun-Hwan Huh made the datasets and developed an analysis framework for the public good.

Source of data: Busan City, KCDC (Korea Centers for Disease Control & Prevention)

Thanks to the Korean Government and Busan city for making data available.

Inspiration

Virus spread caused by religious activity

Change in virus spread cases at the city level

Relationships between patients by infection type
Data from: Epilepsy-iEEG-Multicenter-Dataset
openneuro.org
Updated Dec 2, 2020
+ more versions
Cite
Adam Li; Sara Inati; Kareem Zaghloul; Nathan Crone; William Anderson; Emily Johnson; Iahn Cajigas; Damian Brusko; Jonathan Jagid; Angel Claudio; Andres Kanner; Jennifer Hopp; Stephanie Chen; Jennifer Haagensen; Sridevi Sarma (2020). Epilepsy-iEEG-Multicenter-Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds003029.v1.0.2
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds003029.v1.0.2
Dataset updated
Dec 2, 2020
Dataset provided by
OpenNeuro, https://openneuro.org/
Authors
Adam Li; Sara Inati; Kareem Zaghloul; Nathan Crone; William Anderson; Emily Johnson; Iahn Cajigas; Damian Brusko; Jonathan Jagid; Angel Claudio; Andres Kanner; Jennifer Hopp; Stephanie Chen; Jennifer Haagensen; Sridevi Sarma
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Fragility Multi-Center Retrospective Study

Our study organizes iEEG and EEG data from 5 centers, with a total of 100 subjects. We publish data from 4 centers here due to data sharing issues.

Acquisitions include ECoG and SEEG. Each run specifies a different snapshot of EEG data from that specific subject's session. For seizure sessions, this means that each run is an EEG snapshot around a different seizure event.

For additional clinical metadata about each subject, refer to the clinical Excel table in the publication.

Data Availability

NIH, JHH, UMMC, and UMF agreed to share. Cleveland Clinic did not, so access to its data requires an additional DUA.

All data except Cleveland Clinic's was approved by the respective centers to be de-identified and shared. No data in this dataset has PHI or other identifiers associated with patients. To access Cleveland Clinic data, please forward all requests to Amber Sours, SOURSA@ccf.org:

Amber Sours, MPH Research Supervisor | Epilepsy Center Cleveland Clinic | 9500 Euclid Ave. S3-399 | Cleveland, OH 44195 (216) 444-8638

You will need to sign a data use agreement (DUA).

Sourcedata

For each subject, there was a raw EDF file, which was converted into the BrainVision format with mne_bids. Each subject with SEEG implantation also has an Excel table, called electrode_layout.xlsx, which outlines where the clinicians marked each electrode anatomically. Note that there is no rigorous atlas applied, so the main points of interest are: WM, GM, VENTRICLE, CSF, and OUT, which represent white matter, gray matter, ventricle, cerebrospinal fluid and outside the brain. Channels marked WM, VENTRICLE, CSF and OUT were removed from further analysis. These were labeled in the corresponding BIDS channels.tsv sidecar file as status=bad. The dataset uploaded to openneuro.org does not contain the sourcedata since there was an extra anonymization step that occurred when fully converting to BIDS.
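The status=bad labeling described above can be used to drop excluded channels before analysis. The following is a minimal sketch using only the Python standard library; the channel names and the inline file content are invented for illustration, and real channels.tsv files in this dataset may contain additional columns.

```python
import csv
import io

# Hypothetical channels.tsv content; channel names and rows are invented.
CHANNELS_TSV = (
    "name\ttype\tunits\tstatus\n"
    "LA1\tSEEG\tuV\tgood\n"
    "LA2\tSEEG\tuV\tbad\n"
    "LB1\tSEEG\tuV\tgood\n"
)


def good_channels(tsv_text):
    """Return the names of channels whose status column is not 'bad'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["name"] for row in reader if row["status"] != "bad"]


kept = good_channels(CHANNELS_TSV)
print(kept)
```

To apply this to a file on disk, the text could be read with `open(path, newline="")` and passed to `csv.DictReader` in the same way.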

Derivatives

Derivatives include:

* fragility analysis
* frequency analysis
* graph metrics analysis
* figures

These can be computed by following this paper: Neural Fragility as an EEG Marker for the Seizure Onset Zone

Events and Descriptions

Within each EDF file, there are event markers annotated by clinicians, which may inform you of specific clinical events occurring in time, or of when they saw seizure onset and offset (clinical and electrographic).

During a seizure event, event markers may follow this time course:

* eeg onset, or clinical onset - the onset of a seizure that is either marked electrographically, or by clinical behavior. Note that the clinical onset may not always be present, since some seizures manifest without clinical behavioral changes.
* Marker/Mark On - these are usually annotations within some cases, where a health practitioner injects a chemical marker for use in ICTAL SPECT imaging after a seizure occurs. This is commonly done to see which portions of the brain are active metabolically.
* Marker/Mark Off - this is when the ICTAL SPECT stops imaging.
* eeg offset, or clinical offset - this is the offset of the seizure, as determined either electrographically, or by clinical symptoms.

Other events included may be beneficial for you to understand the time course of each seizure. Note that ICTAL SPECT occurs in all Cleveland Clinic data. Note that seizure markers are not consistent in their description naming, so one might encode some specific regular-expression rules to consistently capture seizure onset/offset markers across all datasets. In the case of UMMC data, all onset and offset markers were provided by the clinicians on an Excel sheet instead of via the EDF file, so we added the annotations manually to each EDF file.
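One way to encode such regular-expression rules is sketched below in Python. The two patterns are assumptions written for this example; they cover the marker names mentioned above plus a few plausible variants, and they would need to be checked against the actual annotation strings in each center's files.

```python
import re

# Assumed patterns: illustrative rules only, not taken from the dataset.
ONSET = re.compile(
    r"\b(eeg|clinical|sz|seizure)\b.*\b(onset|start)\b", re.IGNORECASE
)
OFFSET = re.compile(
    r"\b(eeg|clinical|sz|seizure)\b.*\b(offset|end)\b", re.IGNORECASE
)


def classify_marker(description):
    """Map a free-text event description to 'onset', 'offset', or None."""
    if ONSET.search(description):
        return "onset"
    if OFFSET.search(description):
        return "offset"
    return None


for text in ["EEG onset", "clinical offset", "Marker/Mark On", "Sz end"]:
    print(text, "->", classify_marker(text))
```

Markers that match neither rule, such as the ICTAL SPECT annotations, are returned as `None` so they can be reviewed separately rather than silently mislabeled.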

Seizure Electrographic and Clinical Onset Annotations

For various datasets, there are seizures present within the dataset. Generally there is only one seizure per EDF file. When seizures are present, they are marked electrographically (and clinically if present) via standard approaches in the epilepsy clinical workflow.

Clinical onsets are just manifestations of the seizures with clinical syndromes. Sometimes the marker may not be present.

Seizure Onset Zone Annotations

What is actually important in the evaluation of datasets is the clinical annotations of their localization hypotheses of the seizure onset zone.

These generally include:

* early onset: the earliest onset electrodes participating in the seizure that clinicians saw
* early/late spread (optional): the electrodes that showed epileptic spread activity after seizure onset. Not all seizures have spread contacts annotated.

Surgical Zone (Resection or Ablation) Annotations

For patients with a post-surgical MRI available, the segmentation process outlined above tells us which electrodes were within the surgically removed brain region.

Otherwise, clinicians give us their best estimate of which electrodes were resected/ablated based on their surgical notes.

For surgical patients whose postoperative medical records did not explicitly indicate specific resected or ablated contacts, manual visual inspection was performed to determine the approximate contacts that were located in later resected/ablated tissue. Postoperative T1 MRI scans were compared against post-SEEG implantation CT scans or CURRY coregistrations of preoperative MRI/post-SEEG CT scans. Contacts of interest in and around the area of the reported resection were selected individually and the corresponding slice was navigated to on the CT scan or CURRY coregistration. After identifying landmarks of that slice (e.g. skull shape, skull features, shape of prominent brain structures like the ventricles, central sulcus, superior temporal gyrus, etc.), the location of a given contact in relation to these landmarks, and the location of the slice along the axial plane, the corresponding slice in the postoperative MRI scan was navigated to. The resected tissue within the slice was then visually inspected and compared against the distinct landmarks identified in the CT scans. If brain tissue was not present in the corresponding location of the contact, then the contact was marked as resected/ablated. This process was repeated for each contact of interest.

References

Adam Li, Chester Huynh, Zachary Fitzgerald, Iahn Cajigas, Damian Brusko, Jonathan Jagid, Angel Claudio, Andres Kanner, Jennifer Hopp, Stephanie Chen, Jennifer Haagensen, Emily Johnson, William Anderson, Nathan Crone, Sara Inati, Kareem Zaghloul, Juan Bulacio, Jorge Gonzalez-Martinez, Sridevi V. Sarma. Neural Fragility as an EEG Marker of the Seizure Onset Zone. bioRxiv 862797; doi: https://doi.org/10.1101/862797

Appelhoff, S., Sanderson, M., Brooks, T., Vliet, M., Quentin, R., Holdgraf, C., Chaumon, M., Mikulan, E., Tavabi, K., Höchenberger, R., Welke, D., Brunner, C., Rockhill, A., Larson, E., Gramfort, A. and Jas, M. (2019). MNE-BIDS: Organizing electrophysiological data into the BIDS format and facilitating their analysis. Journal of Open Source Software 4: (1896). https://doi.org/10.21105/joss.01896

Holdgraf, C., Appelhoff, S., Bickel, S., Bouchard, K., D'Ambrosio, S., David, O., … Hermes, D. (2019). iEEG-BIDS, extending the Brain Imaging Data Structure specification to human intracranial electrophysiology. Scientific Data, 6, 102. https://doi.org/10.1038/s41597-019-0105-7

Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6, 103. https://doi.org/10.1038/s41597-019-0104-8
ECMWF ERA5.1: ensemble spreads of surface level analysis parameter data for...
catalogue.ceda.ac.uk
data-search.nerc.ac.uk
Updated Mar 12, 2021
+ more versions
Cite
European Centre for Medium-Range Weather Forecasts (ECMWF) (2021). ECMWF ERA5.1: ensemble spreads of surface level analysis parameter data for 2000-2006 [Dataset]. https://catalogue.ceda.ac.uk/uuid/fba43af08c49445cb9150d524d8a2072
Explore at:
Dataset updated
Mar 12, 2021
Dataset provided by
Centre for Environmental Data Analysis, http://www.ceda.ac.uk/
Authors
European Centre for Medium-Range Weather Forecasts (ECMWF)
License
https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
Time period covered
Jan 1, 2000 - Dec 31, 2006
Area covered
Earth
Variables measured
time, latitude, longitude, Skin temperature, Total cloud cover, 2 metre temperature, cloud_area_fraction, Sea ice area fraction, sea_ice_area_fraction, Mean sea level pressure, and 7 more
Description
This dataset contains spreads for the ERA5.1 surface level analysis parameter data ensemble means (see linked dataset) over the period 2000-2006. ERA5.1 is the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project re-run for 2000-2006 to improve upon the cold bias in the lower stratosphere seen in ERA5 (see technical memorandum 859 in the linked documentation section for further details). The ensemble means and spreads are calculated from the ERA5.1 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record.

Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and thus was calculated by dividing by 10 rather than 9 (N-1).
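The difference between the two conventions can be made concrete with a short Python sketch. The ten member values below are made up for illustration and do not come from the ERA5.1 data; the calculation simply contrasts dividing by N with dividing by N-1.

```python
import statistics

# Made-up values standing in for one variable at one grid point
# across the 10 ensemble members (not real ERA5.1 data).
members = [280.1, 280.4, 279.8, 280.0, 280.3, 279.9, 280.2, 280.5, 279.7, 280.1]

n = len(members)
mean = sum(members) / n
sum_sq = sum((x - mean) ** 2 for x in members)

# Ensemble spread as described above: divide by N = 10.
spread = (sum_sq / n) ** 0.5

# Sample standard deviation, for comparison: divide by N - 1 = 9.
sample_sd = (sum_sq / (n - 1)) ** 0.5

# The standard library exposes both conventions directly.
assert abs(spread - statistics.pstdev(members)) < 1e-9
assert abs(sample_sd - statistics.stdev(members)) < 1e-9

print(f"spread (divide by {n}): {spread:.4f}")
print(f"sample sd (divide by {n - 1}): {sample_sd:.4f}")
```

With only 10 members the two values differ by a factor of sqrt(10/9), roughly 5%, which matters when comparing this spread against statistics computed with the N-1 convention.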

The main ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-interim re-analysis projects. An initial release of ERA5 data, ERA5t, is also available up to 5 days behind the present. A limited selection of data from these runs is also available via CEDA, whilst full access is available via the Copernicus Data Store.
Pakistan Corona Virus Dataset
kaggle.com
Updated Jun 5, 2020
Cite
Zeeshan-ul-hassan Usmani (2020). Pakistan Corona Virus Dataset [Dataset]. https://www.kaggle.com/zusmani/pakistan-corona-virus-citywise-data/kernels
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 5, 2020
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Zeeshan-ul-hassan Usmani
Description
Context

Pakistan witnessed its first coronavirus patient on February 26, 2020. It has been a bumpy ride since then. The cases are increasing gradually and we haven't seen the worst yet. While there are a few government resources for cumulative updates, there is no place where you can find city-level patient data. It's also not possible to find a running chronological tally of patients as they test positive. We have decided to create our own dataset with such details for all the researchers out there, so we can model the infection spread and forecast the situation in the coming days. We hope that, by doing so, we will be able to inform policy makers on various intervention models and help healthcare professionals be ready for the influx of new patients. We certainly hope that this little contribution will go a long way toward saving lives in Pakistan.

Content

The dataset contains seven columns for date, number of cases, number of deaths, number of people recovered, travel history of those cases, and location of the cases (province and city).

The first version has data from the first case on February 26, 2020 to April 19, 2020. We intend to publish weekly updates.

Acknowledgements

Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Sana Rasheed, Pakistan Corona Virus Data, Kaggle Dataset Repository, April 19, 2020.”

Inspiration

Some ideas worth exploring:

Can we find the spread factor for the Corona virus in Pakistan?

How long does it take for a positive case to infect another in Pakistan?

How can we use this data to simulate lockdown scenarios and find their impact on the country's economy? Here is a good read to get started - http://zeeshanusmani.com/urdu/corona-economic-impact/

How does Pakistan's coronavirus spread compare against its neighbors and other developed countries?

What would be the impact of this infection spread on the country's economy and people living under poverty? Here are two briefs to get you started

http://zeeshanusmani.com/urdu/corona/ http://zeeshanusmani.com/urdu/corona-what-to-learn/

How do we visualize this dataset to inform policy makers? Here is one example https://zeeshanusmani.com/corona/

Can we predict the number of cases in the next 10 days and the next month?
Data from: Zika Virus Epidemic
kaggle.com
Updated Jul 16, 2016
Cite
Centers for Disease Control and Prevention (2016). Zika Virus Epidemic [Dataset]. https://www.kaggle.com/cdc/zika-virus-epidemic/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 16, 2016
Dataset provided by
Kaggle
Authors
Centers for Disease Control and Prevention
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
An outbreak of the Zika virus, an infection transmitted mostly by the Aedes species mosquito (Ae. aegypti and Ae. albopictus), has been sweeping across the Americas and the Pacific since mid-2015. Although first isolated in 1947 in Uganda, a lack of previous research has challenged the scientific community to quickly understand its devastating effects as the epidemic continues to spread.

All Countries & Territories with Active Zika Virus Transmission (map image: http://www.cdc.gov/zika/images/zikamain_071416_880.jpg)

The data

This dataset shares publicly available data related to the ongoing Zika epidemic. It is being provided as a resource to the scientific community engaged in the public health response. The data provided here is not official and should be considered provisional and non-exhaustive. The data in reports may change over time, reflecting delays in reporting or changes in classifications. And while accurate representation of the reported data is the objective in the machine readable files shared here, that accuracy is not guaranteed. Before using any of these data, it is advisable to review the original reports and sources, which are provided whenever possible along with further information on the CDC Zika epidemic GitHub repo.

The dataset includes the following fields:

report_date - The report date is the date that the report was published. The date should be specified in standard ISO format (YYYY-MM-DD).

location - A location is specified for each observation following the specific names specified in the country place name database. This may be any place with a 'location_type' as listed below, e.g. city, state, country, etc. It should be specified at up to three hierarchical levels in the following format: [country]-[state/province]-[county/municipality/city], always beginning with the country name. If the data is for a particular city, e.g. Salvador, it should be specified: Brazil-Bahia-Salvador.

location_type - A location code is included indicating: city, district, municipality, county, state, province, or country. If there is need for an additional 'location_type', open an Issue to create a new 'location_type'.

data_field - The data field is a short description of what data is represented in the row and is related to a specific definition defined by the report from which it comes.

data_field_code - This code is defined in the country data guide. It includes a two letter country code (ISO-3166 alpha-2, list), followed by a 4-digit number corresponding to a specific report type and data type.

time_period - Optional. If the data pertains to a specific period of time, for example an epidemiological week, that number should be indicated here and the type of time period in the 'time_period_type', otherwise it should be NA.

time_period_type - Required only if 'time_period' is specified. Types will also be specified in the country data guide. Otherwise should be NA.

value - The observation indicated for the specific 'report_date', 'location', 'data_field' and when appropriate, 'time_period'.

unit - The unit of measurement for the 'data_field'. This should conform to the 'data_field' unit options as described in the country-specific data guide.
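The field conventions above lend themselves to a small parsing sketch in Python. The helper names and the example code value "BR0011" are hypothetical, introduced only for illustration. The location parser assumes that hyphens appear only as level separators, so a place name that itself contains a hyphen would need special handling.

```python
import re
from datetime import date


def parse_location(location):
    """Split '[country]-[state/province]-[city]' into up to three levels.

    Assumes hyphens are used only as level separators; missing levels
    are returned as None.
    """
    parts = location.split("-", 2)
    parts += [None] * (3 - len(parts))
    return {"country": parts[0], "state": parts[1], "city": parts[2]}


# Two-letter country code followed by a 4-digit number, per the description.
DATA_FIELD_CODE = re.compile(r"^([A-Z]{2})(\d{4})$")


def parse_data_field_code(code):
    """Return (country_code, report_number) or raise ValueError."""
    match = DATA_FIELD_CODE.match(code)
    if match is None:
        raise ValueError(f"unexpected data_field_code: {code!r}")
    return match.group(1), match.group(2)


loc = parse_location("Brazil-Bahia-Salvador")
report_date = date.fromisoformat("2016-07-16")  # ISO format (YYYY-MM-DD)
country_code, number = parse_data_field_code("BR0011")  # hypothetical code
print(loc, report_date, country_code, number)
```

Keeping the report number as a string preserves leading zeros, which would be lost if it were converted to an integer.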

If you find the data useful, please support data sharing by referencing this dataset and the original data source. If you're interested in contributing to the Zika project from GitHub, you can read more here. The source for the Zika virus structure is available here.
ERA5 hourly data on single levels from 1940 to present
cds.climate.copernicus.eu
arcticdata.io
grib
Updated Jun 9, 2025
+ more versions
Cite
ECMWF (2025). ERA5 hourly data on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.adbb2d47
Explore at:
Available download formats: grib
Unique identifier
https://doi.org/10.24381/cds.adbb2d47
Dataset updated
Jun 9, 2025
Dataset provided by
European Centre for Medium-Range Weather Forecasts, http://ecmwf.int/
Authors
ECMWF
License
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
Time period covered
Jan 1, 1940 - Jun 3, 2025
Description
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), this data could differ from the final release 2 to 3 months later. If this occurs, users are notified.
The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 hourly data on single levels from 1940 to present".
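To get a feel for what the grid spacings above imply, the following Python sketch counts the points on a global regular latitude-longitude grid. It assumes that latitudes run from 90 to -90 inclusive of both poles and that longitudes cover 0 to 360 without repeating the wrap-around point; these are assumptions of the sketch, and the function name is hypothetical.

```python
def regular_grid_shape(step_deg):
    """Return (n_lat, n_lon) for a global regular lat-lon grid.

    Assumes latitudes 90..-90 inclusive of both poles and
    longitudes 0..360 exclusive of the repeated end point.
    """
    n_lat = round(180 / step_deg) + 1
    n_lon = round(360 / step_deg)
    return n_lat, n_lon


for step in (0.25, 0.5, 1.0):
    n_lat, n_lon = regular_grid_shape(step)
    print(f"{step} deg: {n_lat} x {n_lon} = {n_lat * n_lon} points per field")
```

Under these assumptions, halving the resolution from 0.25 to 0.5 degrees cuts the number of points per field by roughly a factor of four, which is one reason the uncertainty estimate is cheaper to store and download.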
Structural break threshold VARs for predicting US recessions using the...
jda-test.zbw.eu
journaldata.zbw.eu
.prg, csv, txt
Updated Nov 4, 2022
Cite
Ana Beatriz Galvão (2022). Structural break threshold VARs for predicting US recessions using the spread (replication data) [Dataset]. https://jda-test.zbw.eu/dataset/structural-break-threshold-vars-for-predicting-us-recessions-using-the-spread
Explore at:
Available download formats: .prg(16549), csv(91492), .prg(25000), csv(3896), txt(1812), .prg(13454), .prg(20301), csv(1015), .prg(2600)
Dataset updated
Nov 4, 2022
Dataset provided by
ZBW - Leibniz Informationszentrum Wirtschaft
Authors
Ana Beatriz Galvão
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper proposes a model to predict recessions that accounts for non-linearity and a structural break when the spread between long- and short-term interest rates is the leading indicator. Estimation and model selection procedures allow us to estimate and identify time-varying non-linearity in a VAR. The structural break threshold VAR (SBTVAR) predicts the timing of recessions better than models with a constant threshold or with only a break. Using real-time data, the SBTVAR with the spread as leading indicator correctly anticipates the timing of the 2001 recession.
Data from: A Data set for Information Spreading over the News
data.niaid.nih.gov
explore.openaire.eu
Updated Apr 11, 2021
Tomaz Erjavec (2021). A Data set for Information Spreading over the News [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3950064
Explore at:
Dataset updated
Apr 11, 2021
Dataset provided by
Abdul Sittar
Dunja Mladenic
Tomaz Erjavec
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

Analyzing the spread of information related to a specific event in the news has many potential applications. Consequently, various systems have been developed to facilitate the analysis of information spreading, such as the detection of disease propagation and the identification of fake news spreading through social media. There are several open challenges in the process of discerning information propagation, among them the lack of resources for training and evaluation. This paper describes the process of compiling a corpus from the EventRegistry global media monitoring system. We focus on information spreading in three domains: sports (i.e. the FIFA World Cup), natural disasters (i.e. earthquakes), and climate change (i.e. global warming). This corpus is a valuable addition to the currently available datasets for examining the spreading of information about various kinds of events.

Introduction:

Domain-specific gaps in information spreading are ubiquitous and may exist due to economic conditions, political factors, or linguistic, geographical, time-zone, cultural, and other barriers. These factors potentially contribute to obstructing the flow of local as well as international news. We believe that there is a lack of research studies that examine, identify, and uncover the reasons for barriers in information spreading. Additionally, there is limited availability of datasets containing news text and metadata including time, place, source, and other relevant information. When a piece of information starts spreading, it implicitly raises questions such as:

How far does the information in the form of news reach out to the public?

Does the content of the news remain the same, or does it change to a certain extent?

Do cultural values affect the information, especially when the same news is translated into other languages?

Statistics about datasets:

| # | Domain | Event Type | Articles Per Language | Total Articles |
|---|--------|------------|-----------------------|----------------|
| 1 | Sports | FIFA World Cup | 983-en, 762-sp, 711-de, 10-sl, 216-pt | 2679 |
| 2 | Natural Disaster | Earthquake | 941-en, 999-sp, 937-de, 19-sl, 251-pt | 3194 |
| 3 | Climate Changes | Global Warming | 996-en, 298-sp, 545-de, 8-sl, 97-pt | 1945 |
Field datasets- SINDRI
catalog.data.gov
search.dataone.org
Updated Jul 6, 2024
U.S. Geological Survey (2024). Field datasets- SINDRI [Dataset]. https://catalog.data.gov/dataset/field-datasets-sindri
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Description
A spreadsheet documenting the wetness conditions for various fields in the study site. Additionally, this sheet documents mean spectral responses for Landsat and Worldview 3 bands as well as tillage and wetness indices, and the estimated percentage of residue and raw water content.
Wilding Conifer Survey Points 1998-2003 - Dataset - data.govt.nz - discover...
catalogue.data.govt.nz
portal.zero.govt.nz
Wilding Conifer Survey Points 1998-2003 - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/wilding-conifer-survey-points-1998-20033
Explore at:
Description
Wilding Conifer Survey Data collected by Environment Canterbury and partners. This data is resurveyed every 10 years. View the report: Exotic wilding conifer spread within defined areas of Canterbury high country.
ECMWF ERA5: ensemble spreads of surface level analysis parameter data
data-search.nerc.ac.uk
catalogue.ceda.ac.uk
Updated Jul 28, 2021
(2021). ECMWF ERA5: ensemble spreads of surface level analysis parameter data [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?orgName=European%20Centre%20for%20Medium-Range%20Weather%20Forecasts%20(ECMWF)
Explore at:
Dataset updated
Jul 28, 2021
Description
This dataset contains ensemble spreads for the ERA5 surface level analysis parameter data ensemble means (see linked dataset). ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see linked documentation for further details. The ensemble means and spreads are calculated from the ERA5 10-member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables and has been converted to netCDF from the original GRIB files held on the ECMWF system. The data have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the linked Copernicus Data Store (CDS) data tool, linked to from this record.

Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and thus was calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data.

The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim re-analysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed ahead of being released by ECMWF as quality-assured data within 3 months. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record.

However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, so new runs to address this issue were performed, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.
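The distinction between dividing by N and by N-1 in the ensemble spread can be illustrated with a minimal sketch. The member values below are invented for illustration and are not taken from the dataset; the function name `ensemble_spread` is likewise hypothetical.

```python
import math
import statistics


def ensemble_spread(members):
    """Population standard deviation of ensemble members (divide by N, not N-1)."""
    n = len(members)
    mean = sum(members) / n
    return math.sqrt(sum((x - mean) ** 2 for x in members) / n)


# Hypothetical values for one grid point from a 10-member ensemble
# (illustrative numbers only, not real ERA5 data).
members = [280.1, 280.4, 279.8, 280.0, 280.6, 279.9, 280.3, 280.2, 279.7, 280.5]

spread = ensemble_spread(members)          # divides by 10
sample_sd = statistics.stdev(members)      # divides by 9 (N-1), for comparison

print(f"ensemble spread (N):   {spread:.4f}")
print(f"sample std dev (N-1):  {sample_sd:.4f}")
```

With 10 members the spread is smaller than the sample standard deviation by a factor of sqrt(9/10), so anyone recomputing the spread from the member files would need to use the population form to match the published values.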
Data from: Genetic diversity and spread dynamics of SARS-CoV-2 variants...
data.niaid.nih.gov
search.dataone.org
zip
Updated May 31, 2024
Desire Mtetwa (2024). Genetic diversity and spread dynamics of SARS-CoV-2 variants present in African populations [Dataset]. http://doi.org/10.5061/dryad.1c59zw42d
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.1c59zw42d
Dataset updated
May 31, 2024
Dataset provided by
Chinhoyi University of Technology
Authors
Desire Mtetwa
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Africa
Description
The dynamics of coronavirus disease-19 (COVID-19) have been extensively researched in many settings around the world, but little is known about these patterns in Africa. 7540 complete nucleotide genomes from 51 African nations were obtained from the National Center for Biotechnology Information (NCBI) and Global Initiative on Sharing Influenza Data (GISAID) databases and analysed to examine the genetic diversity and spread dynamics of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) lineages circulating in Africa. Utilising a variety of clade and lineage nomenclature schemes, we looked at their diversity, and used maximum parsimony inference methods to recreate their evolutionary divergence and history. According to this study, only 465 of the 2610 Pango lineages found to have existed in the world circulated in Africa after three years of the COVID-19 pandemic outbreak, with five different lineages dominating at various points during the outbreak. We identified South Africa, Kenya, and Nigeria as key sources of viral transmissions between Sub-Saharan African nations. These findings provide insight into the viral strains that are circulating in Africa and their evolutionary patterns.

Methods

Dataset mining and workflow

SARS-CoV-2 genome sequences collected from Africa were obtained from the NCBI database and the GISAID database on February 26, 2023. 24415 African sequences were retrieved from both databases to examine the number of lineages circulating within Africa. The two databases combined held only 8044 complete genome sequences from Africa; these sequences, excluding those flagged as low coverage by NextClade, were retrieved to determine spread dynamics. 5908 sequences from 23 African countries were available in the NCBI database and 2137 sequences from 41 African countries in the GISAID database. The sequences were aligned using the online version of the MAFFT multiple sequence alignment tool, with Wuhan-Hu-1 (MN908947.3) as the reference sequence, and sequences with more than 5.0% ambiguous letters were removed. Duplicates were removed using the goalign dedup software, and only high-quality complete African sequences remained (n=7540).

Phylogenetic reconstruction

Using IQ-TREE multicore software version v1.6.12 and NextClade, phylogeny reconstruction on the dataset was performed numerous times.

Lineage classification

PANGOLin, a web application, was used to classify sequences into their lineages. The objective was to determine the SARS-CoV-2 lineages circulating in Africa that are most important from an epidemiological perspective, as well as the lineage dynamics within and across the African continent, because this naming system integrates genetic and geographic data concerning SARS-CoV-2 dynamics.

Phylogeographic reconstruction

Variants of concern (VOC), variants of interest (VOI) and variants under monitoring (VUM) were designated based on the WHO framework as of 20 January 2022. We included one lineage, namely A.23.1, and labelled it as VOI for the purposes of this analysis. This lineage was included because it demonstrated the continued evolution of African lineages into potentially more transmissible variants. VOI, VOC, and VUM that emerged on the African continent were marked. These were A.23.1 (VOI), B.1.351 and B.1.1.529 (VOC), B.1.640, and B.1.525 (VUM). Genome sequences of these five lineages were extracted from the NCBI database for phylogeographic reconstruction. A similar approach to that described above (including alignment using online MAFFT) was employed. Phylogeographic reconstruction for all variants circulating in Africa and all VOI, VOC, and VUM was conducted using PASTML.
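The ambiguity filter described in the methods (dropping sequences with more than 5.0% ambiguous letters) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, and treating every character other than A, C, G or T (including gap characters) as ambiguous is an assumption the description does not confirm.

```python
# Assumption: only A, C, G and T count as unambiguous nucleotide letters.
UNAMBIGUOUS = set("ACGT")


def ambiguous_fraction(seq):
    """Return the fraction of characters in seq that are not A, C, G or T."""
    seq = seq.upper()
    if not seq:
        return 1.0  # treat an empty sequence as fully ambiguous
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return ambiguous / len(seq)


def filter_sequences(records, max_ambiguous=0.05):
    """Keep records whose ambiguous fraction does not exceed max_ambiguous."""
    return {
        name: seq
        for name, seq in records.items()
        if ambiguous_fraction(seq) <= max_ambiguous
    }


# Hypothetical toy records (real genomes are roughly 30,000 bases long).
records = {
    "clean": "ACGT" * 5,             # 0% ambiguous
    "borderline": "ACGT" * 4 + "ACGN",  # 1 of 20 = 5%, kept (not more than 5%)
    "noisy": "ACGTN" * 4,            # 4 of 20 = 20%, removed
}

kept = filter_sequences(records)
print(sorted(kept))
```

The threshold is inclusive here because the description removes sequences with more than 5.0% ambiguous letters, so a sequence at exactly 5% is retained.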
Synthetic seismic data for the point-spread-function deconvolution study
search.dataone.org
Updated Nov 9, 2023
Yang, Jidong (2023). Synthetic seismic data for the point-spread-function deconvolution study [Dataset]. http://doi.org/10.7910/DVN/GT0TBT
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/GT0TBT
Dataset updated
Nov 9, 2023
Dataset provided by
Harvard Dataverse
Authors
Yang, Jidong
Description
This dataset includes the models and synthetic seismic dataset for the point-spread-function deconvolution study.
Makerere University Beans Dataset
datasetninja.com
dataverse.harvard.edu
Updated Jul 20, 2022
Mugalu, Ben-Wycliff; Nakatumba-Nabende, Joyce; Katumba, Andrew (2022). Makerere University Beans Dataset [Dataset]. http://doi.org/10.7910/DVN/TCKVEW
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/TCKVEW
Dataset updated
Jul 20, 2022
Dataset provided by
Dataset Ninja
Authors
Mugalu, Ben-Wycliff; Nakatumba-Nabende, Joyce; Katumba, Andrew
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This Makerere University Beans Image Dataset was created to provide an open, accessible, well-labeled and sufficiently curated image dataset. It is intended to enable researchers to build machine learning experiments that aid innovations such as bean crop disease diagnosis and spatial analysis. The dataset consists of bean crop images collected across the different regions of Uganda. Data were collected by random sampling from the areas where bean crop farming is practiced; these areas were identified by experts. A few samples were collected from the identified areas to generate a dataset that represents overall bean farming in the country.
Replication Data for: Data stream dataset of SARS-CoV-2 genome Journal...
dataverse.harvard.edu
search.dataone.org
Updated Jul 5, 2020
Jeffrey Ulatan; Raquel De Melo Marcelo Augusto Costa o Costa Fernandes (2020). Replication Data for: Data stream dataset of SARS-CoV-2 genome Journal Pre-proof Data stream dataset of SARS-CoV-2 genome [Dataset]. http://doi.org/10.7910/DVN/TVPWB8
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/TVPWB8
Dataset updated
Jul 5, 2020
Dataset provided by
Harvard Dataverse
Authors
Jeffrey Ulatan; Raquel De Melo Marcelo Augusto Costa o Costa Fernandes
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
As of May 25, 2020, the novel coronavirus disease (called COVID-19) had spread to more than 185 countries/regions, with more than 348,000 deaths and more than 5,550,000 confirmed cases. In the bioinformatics area, one of the crucial points is the analysis of the virus nucleotide sequences using approaches such as data stream techniques and algorithms. However, to make this approach feasible, the nucleotide sequence strings must be transformed into a numerical stream representation. Thus, the dataset provides four kinds of data stream representation (DSR) of SARS-CoV-2 virus nucleotide sequences. The dataset provides the DSR of 1557 instances of SARS-CoV-2 virus, 11540 instances of other viruses from the Virus-Host DB dataset, and three instances of Riboviria viruses from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21).
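The general idea of turning a nucleotide string into a numerical stream can be sketched in a few lines. The description does not say what the dataset's four DSR kinds are, so the integer mapping below is purely an illustrative assumption and should not be read as one of them; the names `NUCLEOTIDE_CODES` and `to_numeric_stream` are hypothetical.

```python
# Hypothetical mapping chosen for illustration only; the dataset's actual
# representations are not specified in this listing.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def to_numeric_stream(sequence, unknown=0):
    """Yield one integer per nucleotide, so long genomes can be consumed lazily."""
    for base in sequence.upper():
        yield NUCLEOTIDE_CODES.get(base, unknown)


stream = list(to_numeric_stream("acgtn"))
print(stream)
```

A generator is used so that a stream-processing algorithm could consume values one at a time without holding the whole encoded genome in memory.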
Analyzing credit risk transmission to the nonfinancial sector in Europe: A...
journaldata.zbw.eu
jda-test.zbw.eu
csv, txt
Updated Feb 20, 2024
Christian Gross; Pierre L. Siklos (2024). Analyzing credit risk transmission to the nonfinancial sector in Europe: A network approach (replication data) [Dataset]. http://doi.org/10.15456/jae.2022327.0712748903
Explore at:
Available download formats: csv(5214190), txt(2050), csv(2977651)
Unique identifier
https://doi.org/10.15456/jae.2022327.0712748903
Dataset updated
Feb 20, 2024
Dataset provided by
ZBW - Leibniz Informationszentrum Wirtschaft
Authors
Christian Gross; Pierre L. Siklos
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Europe
Description
We use a factor model and elastic net shrinkage to model a high-dimensional network of European credit default swap (CDS) spreads. Our empirical approach allows us to assess the joint transmission of bank and sovereign risk to the nonfinancial corporate sector. Our findings identify a sectoral clustering in the CDS network, where financial institutions are in the center and nonfinancial entities as well as sovereigns are grouped around the financial center. The network has a geographical component reflected in different patterns of real-sector risk transmission across countries. Our framework also provides dynamic estimates of risk transmission, a useful tool for systemic risk monitoring.
