Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains informative data related to the COVID-19 pandemic, specifically first-case and first-death information for every country. The first-case information consists of the date of the first case(s), the number of confirmed cases on the first day, the age of the first patient(s), and the last country visited; the first-death information consists of the date of the first death and the age of the first patient who died, for every country with its corresponding continent. The dataset also contains a binary matrix of the spread chain among different countries and regions.
This dataset illustrates the fluid dynamics of human coughing and breathing by using schlieren imaging. This dataset was used to help inform the general public about the importance of face coverings during the COVID-19 global pandemic.
This dataset contains the spatiotemporal data used to train the spatiotemporal deep neural networks described in "Modeling the Spread of a Livestock Disease With Semi-Supervised Spatiotemporal Deep Neural Networks". The dataset consists of two sets of NumPy arrays: the first set, X_grid.npy and Y_grid.npy, was used to train the convolutional LSTM, while the second set, X_graph.npy, Y_graph.npy, and edge_index.npy, was used to train the graph convolutional LSTM. The data consist of spatiotemporally varying environmental and anthropogenic variables along with case reports of vesicular stomatitis. Resources in this dataset:
Resource Title: NumPy Arrays of Spatiotemporal Features and VS Cases. File Name: vs_data.zip
Resource Description: This is a ZIP archive containing five NumPy arrays of spatiotemporal features and geotagged VS cases.
Resource Software Recommended: NumPy, url: https://numpy.org/
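For orientation, a hedged Python sketch of loading the five arrays after extracting vs_data.zip; the array shapes and dtypes are not documented in this description, so the loop simply prints them for inspection:

import numpy as np

# Gridded tensors used for the convolutional LSTM.
X_grid = np.load("X_grid.npy")
Y_grid = np.load("Y_grid.npy")

# Graph tensors plus edge list used for the graph convolutional LSTM.
X_graph = np.load("X_graph.npy")
Y_graph = np.load("Y_graph.npy")
edge_index = np.load("edge_index.npy")

# First inspection step; adapt downstream code to the actual shapes.
for name, arr in [("X_grid", X_grid), ("Y_grid", Y_grid),
                  ("X_graph", X_graph), ("Y_graph", Y_graph),
                  ("edge_index", edge_index)]:
    print(name, arr.shape, arr.dtype)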
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Compositional data, which is data consisting of fractions or probabilities, is common in many fields including ecology, economics, physical science and political science. If these data would otherwise be normally distributed, their spread can be conveniently represented by a multivariate normal distribution truncated to the non-negative space under a unit simplex. Here this distribution is called the simplex-truncated multivariate normal distribution. For calculations on truncated distributions, it is often useful to obtain rapid estimates of their integral, mean and covariance; these quantities characterising the truncated distribution will generally possess different values to the corresponding non-truncated distribution.
In the paper Adams, Matthew (2022) Integral, mean and covariance of the simplex-truncated multivariate normal distribution. PLoS One, 17(7), Article number: e0272014. https://eprints.qut.edu.au/233964/, three different approaches that can estimate the integral, mean and covariance of any simplex-truncated multivariate normal distribution are described and compared. These three approaches are (1) naive rejection sampling, (2) a method described by Gessner et al. that unifies subset simulation and the Holmes-Diaconis-Ross algorithm with an analytical version of elliptical slice sampling, and (3) a semi-analytical method that expresses the integral, mean and covariance in terms of integrals of hyperrectangularly-truncated multivariate normal distributions, the latter of which are readily computed in modern mathematical and statistical packages. Strong agreement is demonstrated between all three approaches, but the most computationally efficient approach depends strongly both on implementation details and the dimension of the simplex-truncated multivariate normal distribution.
This dataset consists of all code and results for the associated article.
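As a minimal illustration of approach (1), here is a hedged NumPy sketch of naive rejection sampling for a 2-dimensional simplex-truncated multivariate normal; the mean vector and covariance matrix below are illustrative placeholders, not values from the paper:

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.3, 0.3])            # illustrative mean
Sigma = np.array([[0.05, 0.01],
                  [0.01, 0.05]])     # illustrative covariance

n = 1_000_000
draws = rng.multivariate_normal(mu, Sigma, size=n)

# Keep draws inside the unit simplex: all components non-negative, sum <= 1.
inside = (draws >= 0).all(axis=1) & (draws.sum(axis=1) <= 1)
accepted = draws[inside]

integral = inside.mean()              # acceptance rate estimates the truncated integral
mean = accepted.mean(axis=0)          # mean of the truncated distribution
cov = np.cov(accepted, rowvar=False)  # covariance of the truncated distribution
print(integral, mean, cov, sep="\n")

As the paper notes, this naive approach is simple but its efficiency degrades as the acceptance region shrinks, which is why the Gessner et al. sampler and the semi-analytical method are compared against it.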
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Concept: Difference between average cost of outstanding loans (ICC) and its average funding cost. Comprises both earmarked and nonearmarked operations. Source: Central Bank of Brazil – Statistics Department 27449-spread-of-the-icc---earmarked
This dataset contains ERA5 surface level analysis parameter data from the 10-member ensemble runs. ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see linked documentation for further details. The ensemble members were used to derive means and spread data (see linked datasets). Ensemble means and spreads were calculated from the ERA5t 10-member ensemble, run at a reduced resolution compared with the single high-resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and thus was calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed ahead of being released by ECMWF as quality-assured data within 3 months. CEDA holds a 6-month rolling copy of the latest ERA5t data. See related datasets linked to from this record. However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, and so new runs to address this issue were performed, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.
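To make the spread convention above concrete, a hedged NumPy sketch on toy data (not actual ERA5 files) showing the population standard deviation over 10 ensemble members (divide by N = 10, ddof=0) versus the sample standard deviation (divide by N-1 = 9, ddof=1):

import numpy as np

# Toy stand-in for 10 ensemble members on a small grid; real ERA5 fields
# would be read from the netCDF files in this dataset.
rng = np.random.default_rng(1)
members = rng.normal(280.0, 0.5, size=(10, 4, 4))  # (member, lat, lon)

ens_mean = members.mean(axis=0)
ens_spread = members.std(axis=0, ddof=0)  # divides by N = 10: the convention described above
sample_std = members.std(axis=0, ddof=1)  # divides by N - 1 = 9: NOT what these files contain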
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
This dataset provides information about the number of properties, residents, and average property values for Spread Oak Lane cross streets in Waterbury, CT.
The data included in this publication depict components of wildfire risk specifically for populated areas in the United States. These datasets represent where people live in the United States and the in situ risk from wildfire, i.e., the risk at the location where the adverse effects take place.

National wildfire hazard datasets of annual burn probability and fire intensity, generated by the USDA Forest Service, Rocky Mountain Research Station and Pyrologix LLC, form the foundation of the Wildfire Risk to Communities data. Vegetation and wildland fuels data from LANDFIRE 2020 (version 2.2.0) were used as input to two different but related geospatial fire simulation systems. Annual burn probability was produced with the USFS geospatial fire simulator (FSim) at a relatively coarse cell size of 270 meters (m). To bring the burn probability raster data down to a finer resolution more useful for assessing hazard and risk to communities, we upsampled them to the native 30 m resolution of the LANDFIRE fuel and vegetation data. In this upsampling process, we also spread values of modeled burn probability into developed areas represented in LANDFIRE fuels data as non-burnable. Burn probability rasters represent landscape conditions as of the end of 2020. Fire intensity characteristics were modeled at 30 m resolution using a process that performs a comprehensive set of FlamMap runs spanning the full range of weather-related characteristics that occur during a fire season, and then integrates those runs into a variety of results based on the likelihood of those weather types occurring. Before the fire intensity modeling, the LANDFIRE 2020 data were updated to reflect fuels disturbances occurring in 2021 and 2022. As such, the fire intensity datasets represent landscape conditions as of the end of 2022. The data products in this publication that represent where people live reflect 2021 estimates of housing unit and population counts from the U.S. Census Bureau, combined with building footprint data from Onegeo and USA Structures, both reflecting 2022 conditions.

The specific raster datasets included in this publication are:
- Building Count: a 30-m raster representing the count of buildings in the building footprint dataset located within each 30-m pixel.
- Building Density: a 30-m raster representing the density of buildings in the building footprint dataset (buildings per square kilometer [km²]).
- Building Coverage: a 30-m raster depicting the percentage of habitable land area covered by building footprints.
- Population Count (PopCount): a 30-m raster with pixel values representing residential population count (persons) in each pixel.
- Population Density (PopDen): a 30-m raster of residential population density (people/km²).
- Housing Unit Count (HUCount): a 30-m raster representing the number of housing units in each pixel.
- Housing Unit Density (HUDen): a 30-m raster of housing-unit density (housing units/km²).
- Housing Unit Exposure (HUExposure): a 30-m raster representing the expected number of housing units within a pixel potentially exposed to wildfire in a year. This is a long-term annual average and is not intended to represent the actual number of housing units exposed in any specific year.
- Housing Unit Impact (HUImpact): a 30-m raster representing the relative potential impact of fire to housing units at any pixel, if a fire were to occur. It is an index that incorporates the general consequences of fire on a home as a function of fire intensity, and uses flame length probabilities from wildfire modeling to capture the likely intensity of fire.
- Housing Unit Risk (HURisk): a 30-m raster that integrates all four primary elements of wildfire risk (likelihood, intensity, susceptibility, and exposure) on pixels where housing unit density is greater than zero.

Additional methodology documentation is provided with the data publication download. Note: pixel values in this image service have been altered from the original raster dataset due to data requirements in web services. The service is intended primarily for data visualization. Relative values and spatial patterns have been largely preserved in the service, but users are encouraged to download the source data for quantitative analysis.
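To make the per-pixel units concrete, a hedged Python sketch of how a Building Count pixel relates to the buildings/km² values in the Building Density layer: a 30 m x 30 m pixel covers 0.0009 km², so count divided by pixel area gives density. The rasterio package is assumed, and the GeoTIFF file name is hypothetical:

import rasterio

PIXEL_AREA_KM2 = 0.03 * 0.03  # one 30 m x 30 m pixel = 0.0009 km²

# "BuildingCount.tif" is a hypothetical name for the downloaded Building Count raster.
with rasterio.open("BuildingCount.tif") as src:
    count = src.read(1)  # first band: buildings per pixel

# Convert the per-pixel count to buildings per km², comparable to the Building Density layer.
density = count / PIXEL_AREA_KM2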
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Concept: -- to be defined -- Source: Central Bank of Brazil - Statistics Department 27697-spread-of-the-icc---non-revolving-operations---households
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Novel Coronavirus (COVID-19) daily data of confirmed cases for affected countries and provinces of China, reported between 31st December 2019 and 31st May 2020. The data were collected from the European Centre for Disease Prevention and Control (ECDC) and the Johns Hopkins University CSSE.
The monthly mean temperatures from February to May 2020 for the capital cities of the various nations.
This dataset contains ensemble spreads for the ERA5 surface level analysis parameter data ensemble means (see linked dataset). ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see linked documentation for further details. The ensemble means and spreads are calculated from the ERA5 10-member ensemble, run at a reduced resolution compared with the single high-resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and thus was calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed ahead of being released by ECMWF as quality-assured data within 3 months. CEDA holds a 6-month rolling copy of the latest ERA5t data. See related datasets linked to from this record. However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, and so new runs to address this issue were performed, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.
This dataset contains ensemble spreads for the ERA5 initial release (ERA5t) surface level analysis parameter data ensemble means (see linked dataset). ERA5t is the initial release of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project, available up to 5 days behind the present date. CEDA will maintain a 6-month rolling archive of these data with overlap to the verified ERA5 data - see linked datasets on this record. The ensemble means and spreads are calculated from the ERA5t 10-member ensemble, run at a reduced resolution compared with the single high-resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and thus was calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed and, if required, amended before the full ERA5 release. CEDA holds a 6-month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
This dataset contains ERA5 initial release (ERA5t) surface level analysis parameter data from the 10-member ensemble runs. ERA5t is the initial release of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project, available up to 5 days behind the present date. CEDA will maintain a 6-month rolling archive of these data with overlap to the verified ERA5 data - see linked datasets on this record. Ensemble means and spreads were calculated from the ERA5t 10-member ensemble, run at a reduced resolution compared with the single high-resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and thus was calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble mean and ensemble spread data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed and, if required, amended before the full ERA5 release. CEDA holds a 6-month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Species distribution models (SDMs) are a tool for predicting the eventual geographical range of an emerging pathogen. Most SDMs, however, rely on an assumption of equilibrium with the environment, which an emerging pathogen, by definition, has not reached. To determine if some SDM approaches work better than others for modelling the spread of emerging, non-equilibrium pathogens, we studied time-sensitive predictive performance of SDMs for Batrachochytrium dendrobatidis, a devastating infectious fungus of amphibians, using multiple methods trained on time-incremented subsets of the available data. We split our data into timeline-based training and testing sets, and evaluated models on each set using standard performance criteria, including AUC, kappa, false negative rate and the Boyce index. Of eight models examined, we found that boosted regression trees and random forests performed best, closely followed by MaxEnt. As expected, predictive performance generally improved with the length of time series used for model training. These results provide information on how quickly the potential extent of an emerging disease may be determined, and identify which modelling frameworks are likely to provide useful information during the early phases of pathogen expansion.
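A hedged sketch of this timeline-based evaluation scheme, using one of the study's methods (random forests) via scikit-learn; the file name, column names, covariates, and year range below are hypothetical placeholders, and the study additionally evaluated boosted regression trees, MaxEnt, kappa, false negative rate, and the Boyce index:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical file of presence/absence records with environmental covariates.
df = pd.read_csv("bd_occurrences.csv")
features = ["temperature", "precipitation", "elevation"]  # hypothetical covariates

for cutoff in range(2002, 2010):  # time-incremented training subsets
    train = df[df["year"] <= cutoff]   # records observed up to the cutoff
    test = df[df["year"] > cutoff]     # later records held out for testing
    model = RandomForestClassifier(n_estimators=500, random_state=0)
    model.fit(train[features], train["presence"])
    auc = roc_auc_score(test["presence"], model.predict_proba(test[features])[:, 1])
    print(cutoff, round(auc, 3))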
The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.
Dataset Overview:
This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.
2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:
- Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
- Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
- Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.
Sourced Directly from Reddit:
All data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.
Key Features:
Use Cases:
Data Quality and Reliability:
The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.
Integration and Usability:
The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.
User-Friendly Structure and Metadata:
The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.
Ideal For:
This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conducting acade...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Department actively seeks to expand the range of datasets it shares through the Government's Open Data portal. In this regard, it is the first government department to publish gender-based statistics. This data is drawn from the Department's HR database system and provides, on a headcount basis, a female-male breakdown at grade and grade-equivalent level. These statistics will be published annually, using the position at the end of December. For baseline and comparison purposes, and to assist the reader in forming a picture of how the Department's gender balance has evolved over the last 20+ years, we have also provided these reports as at December 2000, 2010, 2020, 2021, and 2022. The Department is unique in terms of the broad range of grades of its staff. At the end of 2022 there were 75 grades spread across three distinct grade streams: General Service, Professional & Technical, and Industrial. We have uploaded a table which provides a breakdown of these grades within their respective streams and which, in the case of the Professional & Technical grades, also shows their General Service grade equivalent. By way of illustration of the diversity of grades in the Department, our headcount at the end of December 2022 was 1,604, of which 905 were General Service staff serving across 18 different grades and representing 56.42% of our workforce. The Professional & Technical headcount was 533; these staff were spread across 44 grades and accounted for 33.23% of our staffing complement at that time. We had 166 Industrial staff at the end of December; these staff represented 10.35% of our workforce and were spread across 13 different grades.
The Cassini Ion and Neutral Mass Spectrometer (INMS) Packet data set contains all telemetry packets as received from the instrument. One standard product data type is defined for each INMS telemetry packet. In each standard data product, one record is produced for each packet. Each item in the packet is converted from data numbers to dimensional values. The data set contains all science packets for the entire Cassini mission, including telemetry data from the instrument checkout periods, SOI, and the entire Saturn tour. Each standard data product is organized as a spreadsheet with one row for each packet. Each column in the spreadsheet contains the contents of one item in the telemetry packet, converted from data number to dimensional quantities where appropriate.
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts, as we were filtering data collected for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering from March 11th to March 30th yielded over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to February 27th to provide extra longitudinal coverage.
The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (101,400,452 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (20,244,746 unique tweets). There are several practical reasons for us to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.
More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter
As always, the tweets distributed here are only tweet identifiers (with date and time added) due to the terms and conditions Twitter imposes on re-distributing Twitter data. They need to be hydrated to be used.
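For reference, a hedged Python sketch of preparing the identifiers for hydration; the column layout (ID in the first column, header row present) is an assumption about the .tsv files, and twarc is mentioned only as one commonly used hydration tool that requires Twitter API credentials:

import csv

with open("full_dataset-clean.tsv") as src, open("tweet_ids.txt", "w") as out:
    reader = csv.reader(src, delimiter="\t")
    next(reader)                  # assumed header row; drop this line if absent
    for row in reader:
        out.write(row[0] + "\n")  # assumes the tweet ID is the first column

# Then, for example, with twarc v1 installed and API credentials configured:
#   twarc hydrate tweet_ids.txt > tweets.jsonl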
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Concept: Difference between average cost of outstanding loans (ICC) and its average funding cost. Comprises both earmarked and nonearmarked operations. Source: Central Bank of Brazil – Statistics Department 27445-spread-of-the-icc---individuals