The COVID-19 Government Measures Dataset puts together all the measures implemented by governments worldwide in response to the coronavirus pandemic. Data collection includes a review of secondary data. The researched information falls into five categories:
- Social distancing
- Movement restrictions
- Public health measures
- Social and economic measures
- Lockdowns
Last updated: 10/12/2020. Each category is broken down into several types of measures.
Columns: ID, ISO, COUNTRY, REGION, ADMIN_LEVEL_NAME, PCODE, LOG_TYPE, CATEGORY, MEASURE_TYPE, TARGETED_POP_GROUP, COMMENTS, NON_COMPLIANCE, DATE_IMPLEMENTED, SOURCE, SOURCE_TYPE, LINK, ENTRY_DATE, ALTERNATIVE SOURCE
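Because the columns are plain CSV headers, a quick way to see how measures are spread across the five categories is to count rows per CATEGORY value. The following is a minimal sketch using only Python's standard library. It assumes a CSV export whose header row uses the column names listed above. The sample rows, country codes, and measure types are invented for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def count_measures_by_category(handle: TextIO) -> Dict[str, int]:
    """Count rows per CATEGORY value; empty categories are counted as "(missing)"."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        category = (row.get("CATEGORY") or "").strip() or "(missing)"
        counts[category] += 1
    return dict(counts)


# Invented sample using a subset of the columns, for illustration only.
sample = io.StringIO(
    "ID,ISO,COUNTRY,CATEGORY,MEASURE_TYPE\n"
    "1,AAA,Country A,Movement restrictions,Border closure\n"
    "2,AAA,Country A,Public health measures,Health screenings\n"
    "3,BBB,Country B,Movement restrictions,Border closure\n"
)
print(count_measures_by_category(sample))
# {'Movement restrictions': 2, 'Public health measures': 1}
```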
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Understanding how epidemics spread in a system is a crucial step in preventing and controlling outbreaks, with broad implications for the system's functioning, health, and associated costs. This can be achieved by identifying the elements at higher risk of infection and implementing targeted surveillance and control measures. One important ingredient to consider is the pattern of disease-transmission contacts among the elements. However, a lack of data or delays in providing updated records may hinder its use, especially for time-varying patterns. Here we explore to what extent past temporal data on a system's pattern of contacts can be used to predict the risk of infection of its elements during an emerging outbreak, in the absence of updated data. We focus on two real-world temporal systems: a trade network of livestock displacements among animal holdings, and a network of sexual encounters in high-end prostitution. We define a node's loyalty as a local measure of its tendency to maintain contacts with the same elements over time, and uncover important non-trivial correlations with the node's epidemic risk. We show that a risk assessment analysis incorporating this knowledge and based on past structural and temporal pattern properties provides accurate predictions for both systems. Its generalizability is tested by introducing a theoretical model for generating synthetic temporal networks. High accuracy of our predictions is recovered across different settings, while the number of possible predictions is system-specific. The proposed method can provide crucial information for setting up targeted intervention strategies.
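Loyalty is described above only as a node's tendency to keep the same contacts over time. The following is a minimal sketch of one natural way to compute it. It assumes loyalty is the Jaccard overlap between a node's contact sets in two consecutive snapshots, that edge direction is ignored, and that a node with no contacts in either snapshot gets 0.0. These are illustrative choices and are not taken from the abstract. The holding names are invented.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def neighbours(edges: Iterable[Edge], node: Hashable) -> Set[Hashable]:
    """Contacts of `node` in one snapshot, ignoring edge direction (an assumption)."""
    result = set()
    for u, v in edges:
        if u == node and v != node:
            result.add(v)
        elif v == node and u != node:
            result.add(u)
    return result


def loyalty(prev_edges: Iterable[Edge], curr_edges: Iterable[Edge], node: Hashable) -> float:
    """Jaccard overlap of the node's contacts in two consecutive snapshots (assumed definition)."""
    prev = neighbours(prev_edges, node)
    curr = neighbours(curr_edges, node)
    union = prev | curr
    if not union:
        return 0.0  # no contacts in either snapshot; a convention chosen for this sketch
    return len(prev & curr) / len(union)


# Hypothetical trade snapshots: holding "h1" trades with a, b, c, then with b, c, d.
week_1 = [("h1", "a"), ("h1", "b"), ("h1", "c")]
week_2 = [("h1", "b"), ("c", "h1"), ("h1", "d")]
print(loyalty(week_1, week_2, "h1"))  # 0.5
```

A node that keeps exactly the same partners scores 1.0, and a node that replaces all of them scores 0.0. This matches the intuition of "loyalty" described in the text.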
License: Open Government Licence - Canada 2.0, https://open.canada.ca/en/open-government-licence-canada (license information derived automatically)
The guidance identifies core personal and community-based public health measures to mitigate the transmission of coronavirus disease (COVID-19).
The TROPESS Chemical Reanalysis O3 Spread 6-Hourly 3-dimensional Product contains the ozone ensemble spread, a measure of data assimilation analysis uncertainty. The data are part of the Tropospheric Chemical Reanalysis v2 (TCR-2) for the period 2005-2021. TCR-2 uses JPL's Multi-mOdel Multi-cOnstituent Chemical (MOMO-Chem) data assimilation framework, which simultaneously optimizes both concentrations and emissions of multiple species from multiple satellite sensors. The data files are written in the netCDF version 4 file format. Each file contains one year of data at 6-hourly temporal resolution, with a spatial resolution of 1.125 x 1.125 degrees on 27 pressure levels between 1000 and 60 hPa. The principal investigator for the TCR-2 data is Miyazaki, Kazuyuki.
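The file description fixes most array dimensions, so the size of one yearly file can be estimated before downloading it. The following is a small sketch under stated assumptions. It assumes a global grid of 360/1.125 longitudes by 180/1.125 latitudes and a (time, level, lat, lon) dimension order. It also assumes 4-byte floats for the size estimate. All three assumptions should be checked against the actual netCDF headers. The monthly spread products have 12 time steps per file instead.

```python
import calendar
import math


def tcr2_file_shape(year: int, product: str = "6-hourly", n_levels: int = 27,
                    resolution_deg: float = 1.125) -> tuple:
    """Expected (time, level, lat, lon) sizes of one yearly TCR-2 spread file (assumed layout)."""
    if product == "6-hourly":
        days = 366 if calendar.isleap(year) else 365
        n_time = days * 4  # four 6-hourly analyses per day
    elif product == "monthly":
        n_time = 12
    else:
        raise ValueError(f"unknown product: {product}")
    n_lon = round(360 / resolution_deg)
    n_lat = round(180 / resolution_deg)
    return (n_time, n_levels, n_lat, n_lon)


shape = tcr2_file_shape(2005)
print(shape)  # (1460, 27, 160, 320)
# Rough uncompressed size if values are stored as 4-byte floats (an assumption).
print(f"{math.prod(shape) * 4 / 1e9:.1f} GB")
```

At roughly 2 billion values per 6-hourly file, it is usually preferable to read a single level or time slice rather than the whole array.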
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Statistical data on the number of violators of precautionary and preventive measures to limit the spread of the coronavirus in Qatar, categorized by nationality, gender, and type of crime.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
In this work we present results of all the major global models and normalise the model results by looking at changes over time relative to a common base year value.
We analyse the variability across the models, both before and after normalisation, to give insights into variance at the national and regional levels.
A dataset of harmonised results (based on means) and measures of dispersion is presented, providing a baseline dataset for consumption-based carbon accounting (CBCA) validation and analysis.
The dataset is intended as a go-to dataset for country and regional results of consumption- and production-based accounts. The normalised mean for each country/region is the principal result and can be used to assess the magnitude and trend of the emission accounts. However, another key element of the dataset is the set of measures of robustness and spread of the results across the source models. These metrics give insight into how much trust should be placed in the individual country/region results.
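The normalisation step can be sketched as follows. Each model's series is expressed relative to its own base-year value, and then the mean and spread across models are taken year by year. This sketch assumes normalisation means dividing by the base-year value, so the base year equals 1.0. It also uses the population standard deviation as the dispersion measure. The exact formulas used for the dataset are not given here. The model names and values are invented.

```python
import statistics
from typing import Dict, List


def harmonise(models: Dict[str, List[float]], base_index: int = 0) -> Dict[str, List[float]]:
    """Normalise each model to its base-year value, then summarise across models per year."""
    lengths = {len(series) for series in models.values()}
    if len(lengths) != 1:
        raise ValueError("all model series must cover the same years")
    normalised = []
    for name, series in models.items():
        base = series[base_index]
        if base == 0:
            raise ValueError(f"model {name!r} has a zero base-year value")
        normalised.append([value / base for value in series])
    per_year = list(zip(*normalised))
    return {
        "mean": [statistics.fmean(values) for values in per_year],
        "std": [statistics.pstdev(values) for values in per_year],
    }


# Invented emission series for one country from two models with different levels.
result = harmonise({"model_A": [10.0, 12.0, 11.0], "model_B": [20.0, 22.0, 24.0]})
print(result["mean"])
print(result["std"])
```

Normalising first removes the level differences between models (here a factor of two), so the remaining spread reflects disagreement about trends rather than about absolute values.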
The TROPESS Chemical Reanalysis O3 Spread Monthly 3-dimensional Product contains the ozone ensemble spread, a measure of data assimilation analysis uncertainty. The data are part of the Tropospheric Chemical Reanalysis v2 (TCR-2) for the period 2005-2021. TCR-2 uses JPL's Multi-mOdel Multi-cOnstituent Chemical (MOMO-Chem) data assimilation framework, which simultaneously optimizes both concentrations and emissions of multiple species from multiple satellite sensors. The data files are written in the netCDF version 4 file format. Each file contains one year of data at monthly temporal resolution, with a spatial resolution of 1.125 x 1.125 degrees on 27 pressure levels between 1000 and 60 hPa. The principal investigator for the TCR-2 data is Miyazaki, Kazuyuki.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/ (license information derived automatically)
The outbreak of the COVID-19 pandemic prompted the German government and the 16 German federal states to announce a variety of public health measures to suppress the spread of the coronavirus. These non-pharmaceutical measures are intended to curb transmission rates by increasing social distancing (i.e., reducing interpersonal contacts), which restricts a range of individual behaviors. The measures range from moderate recommendations, such as physical distancing, up to the closure of shops and bans on gatherings and demonstrations. The implementation of these measures is not only a research subject in itself but also has implications for behavioral research conducted during this time (e.g., in the form of potential confounding biases). Longitudinal data that represent the measures can therefore be a fruitful data source. The presented data set contains data on 14 governmental measures across the 16 German federal states. Compared with existing datasets, this data set is a fine-grained daily time series that tracks the effective calendar date of the introduction, extension, or phase-out of each measure. Based on self-regulation theory, each measure was coded according to whether it did not restrict, partially restricted, or fully restricted the respective behavioral pattern. The time frame spans March 8, 2020 to May 15, 2020. The project is an open-source, ongoing project with continued updates planned at regular (approximately monthly) intervals. New variables include restrictions on travel and gastronomy. The variable trvl (travel) comprises the following categories:
- Fully restricted (=2): a potential general ban on travel within Germany (except for sound reasons such as health or business).
- Partially restricted (=1): travel is allowed but may be restricted through a prohibition of accommodation or an entry ban for certain groups (e.g., people from risk areas).
- Free (=0): no travel or accommodation restrictions are in place.
The variable gastr (gastronomy) comprises the following categories:
- Fully restricted (=2): restaurants or bars are closed.
- Partially restricted (=1): only take-away or food delivery services are allowed.
- Free (=0): restaurants are allowed to open without restrictions.

In addition, the variables msk (recommendations to wear a mask) and zoo (restrictions on zoo visits) have been adjusted.
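A daily time series with 0/1/2 codes can be produced from a short list of dated changes. The following sketch expands (effective date, code) events into one code per calendar day over the stated time frame. It assumes that each event marks the day a new code takes effect and that days before the first recorded change are coded 0 (free). The `daily_series` helper and the event dates are hypothetical and are not taken from the dataset.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Ordinal coding used for trvl and gastr, as described in the text.
LEVELS = {0: "free", 1: "partially restricted", 2: "fully restricted"}


def daily_series(events: List[Tuple[date, int]], start: date, end: date) -> Dict[date, int]:
    """Expand (effective_date, code) change events into one code per calendar day."""
    for _, code in events:
        if code not in LEVELS:
            raise ValueError(f"invalid code: {code}")
    ordered = sorted(events)
    current = 0  # assumption: 'free' until the first recorded change
    for effective, code in ordered:
        if effective <= start:
            current = code  # changes before the window still set the starting code
    changes = dict(ordered)
    series = {}
    day = start
    while day <= end:
        current = changes.get(day, current)
        series[day] = current
        day += timedelta(days=1)
    return series


# Hypothetical gastr changes for one state (invented dates).
gastr = daily_series([(date(2020, 3, 22), 2), (date(2020, 5, 11), 1)],
                     start=date(2020, 3, 8), end=date(2020, 5, 15))
print(len(gastr), LEVELS[gastr[date(2020, 4, 1)]])  # 69 fully restricted
```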
The TROPESS Chemical Reanalysis CO Spread Monthly 3-dimensional Product contains the carbon monoxide ensemble spread, a measure of data assimilation analysis uncertainty. The data are part of the Tropospheric Chemical Reanalysis v2 (TCR-2) for the period 2005-2021. TCR-2 uses JPL's Multi-mOdel Multi-cOnstituent Chemical (MOMO-Chem) data assimilation framework, which simultaneously optimizes both concentrations and emissions of multiple species from multiple satellite sensors. The data files are written in the netCDF version 4 file format. Each file contains one year of data at monthly temporal resolution, with a spatial resolution of 1.125 x 1.125 degrees on 27 pressure levels between 1000 and 60 hPa. The principal investigator for the TCR-2 data is Miyazaki, Kazuyuki.
The TROPESS Chemical Reanalysis NO2 Spread Monthly 3-dimensional Product contains the nitrogen dioxide ensemble spread, a measure of data assimilation analysis uncertainty. The data are part of the Tropospheric Chemical Reanalysis v2 (TCR-2) for the period 2005-2021. TCR-2 uses JPL's Multi-mOdel Multi-cOnstituent Chemical (MOMO-Chem) data assimilation framework, which simultaneously optimizes both concentrations and emissions of multiple species from multiple satellite sensors. The data files are written in the netCDF version 4 file format. Each file contains one year of data at monthly temporal resolution, with a spatial resolution of 1.125 x 1.125 degrees on 27 pressure levels between 1000 and 60 hPa. The principal investigator for the TCR-2 data is Miyazaki, Kazuyuki.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/ (license information derived automatically)
The data included in this publication depict the 2024 version of components of wildfire risk for all lands in the United States that: 1) are landscape-wide (i.e., measurable at every pixel across the landscape); and 2) represent in situ risk - risk at the location where the adverse effects take place on the landscape.
National wildfire hazard datasets of annual burn probability and fire intensity, generated by the USDA Forest Service, Rocky Mountain Research Station and Pyrologix LLC, form the foundation of the Wildfire Risk to Communities data. Vegetation and wildland fuels data from LANDFIRE 2020 (version 2.2.0) were used as input to two different but related geospatial fire simulation systems. Annual burn probability was produced with the USFS geospatial fire simulator (FSim) at a relatively coarse cell size of 270 meters (m). To bring the burn probability raster data down to a finer resolution more useful for assessing hazard and risk to communities, we upsampled them to the native 30 m resolution of the LANDFIRE fuel and vegetation data. In this upsampling process, we also spread values of modeled burn probability into developed areas represented in LANDFIRE fuels data as non-burnable. Burn probability rasters represent landscape conditions as of the end of 2020. Fire intensity characteristics were modeled at 30 m resolution using a process that performs a comprehensive set of FlamMap runs spanning the full range of weather-related characteristics that occur during a fire season and then integrates those runs into a variety of results based on the likelihood of those weather types occurring. Before the fire intensity modeling, the LANDFIRE 2020 data were updated to reflect fuels disturbances occurring in 2021 and 2022. As such, the fire intensity datasets represent landscape conditions as of the end of 2022. Additional methodology documentation is provided in a methods document (\Supplements\WRC_V2_Methods_Landscape-wideRisk.pdf) packaged in the data download.
The specific raster datasets in this publication include:
Risk to Potential Structures (RPS): A measure that integrates wildfire likelihood and intensity with generalized consequences to a home on every pixel. For every place on the landscape, it poses the hypothetical question, "What would be the relative risk to a house if one existed here?" This allows comparison of wildfire risk in places where homes already exist to places where new construction may be proposed. This dataset is referred to as Risk to Homes in the Wildfire Risk to Communities web application.
Conditional Risk to Potential Structures (cRPS): The potential consequences of fire to a home at a given location, if a fire occurs there and if a home were located there. Referred to as Wildfire Consequence in the Wildfire Risk to Communities web application.
Exposure Type: Exposure is the spatial coincidence of wildfire likelihood and intensity with communities. This layer delineates where homes are directly exposed to wildfire from adjacent wildland vegetation, indirectly exposed to wildfire from indirect sources such as embers and home-to-home ignition, or not exposed to wildfire due to distance from direct and indirect ignition sources.
Burn Probability (BP): The annual probability of wildfire burning in a specific location. Referred to as Wildfire Likelihood in the Wildfire Risk to Communities web application.
Conditional Flame Length (CFL): The mean flame length for a fire burning in the direction of maximum spread (headfire) at a given location if a fire were to occur; an average measure of wildfire intensity.
Flame Length Exceedance Probability - 4 ft (FLEP4): The conditional probability that flame length at a pixel will exceed 4 feet if a fire occurs; indicates the potential for moderate to high wildfire intensity.
Flame Length Exceedance Probability - 8 ft (FLEP8): The conditional probability that flame length at a pixel will exceed 8 feet if a fire occurs; indicates the potential for high wildfire intensity.
Wildfire Hazard Potential (WHP): An index that quantifies the relative potential for wildfire that may be difficult to manage, used as a measure to help prioritize where fuel treatments may be needed.
The geospatial data products described and distributed here are part of the Wildfire Risk to Communities project. This project was directed by Congress in the 2018 Consolidated Appropriations Act (i.e., 2018 Omnibus Act, H.R. 1625, Section 210: Wildfire Hazard Severity Mapping) to help U.S. communities understand components of their relative wildfire risk profile, the nature and effects of wildfire risk, and actions communities can take to mitigate risk. The first edition of these data represented the first time wildfire risk to communities had been mapped nationally with consistent methodology. They provided foundational information for comparing the relative wildfire risk among populated communities in the United States. In this version, the 2nd edition, we use improved modeling and mapping methodology and updated input data to generate the current suite of products.
See the Wildfire Risk to Communities website at https://www.wildfirerisk.org for complete project information and an interactive web application for exploring some of the datasets published here. We deliver the data here as zip files by U.S. state (including AK and HI), and for the full extent of the continental U.S.
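FLEP4 and FLEP8 are conditional on a fire occurring, while BP is an annual probability. The two can therefore be combined per pixel to get the annual probability that a fire occurs and also exceeds the flame-length threshold: P(fire and FL > x) = BP × FLEP(x). The following sketch applies this product to small in-memory grids. It assumes the two rasters are already aligned on the same 30 m grid. It also ignores the different landscape dates (end of 2020 for BP, end of 2022 for intensity). This derived layer is not one of the published products.

```python
from typing import List

Grid = List[List[float]]


def annual_exceedance(bp: Grid, flep: Grid) -> Grid:
    """Per-pixel annual probability of a fire that also exceeds the FLEP flame-length threshold."""
    if len(bp) != len(flep) or any(len(a) != len(b) for a, b in zip(bp, flep)):
        raise ValueError("rasters must have the same shape")
    # P(fire and FL > x) = P(fire) * P(FL > x | fire)
    return [[p * q for p, q in zip(row_bp, row_fl)] for row_bp, row_fl in zip(bp, flep)]


# Invented 1 x 2 example: a 1% and a 2% annual burn probability.
print(annual_exceedance([[0.01, 0.02]], [[0.5, 0.25]]))
```

In this example, both pixels end up with the same 0.5% annual chance of a fire exceeding 4 ft. One pixel burns more often, and the other burns more intensely when it does.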
This data publication is a second edition and represents an update to any previous versions of Wildfire Risk to Communities risk datasets published by the USDA Forest Service. There are two companion data publications that are part of the WRC 2.0 data update: one that includes datasets of wildfire hazard and risk for populated areas of the nation, where housing units are currently present (Jaffe et al. 2024, https://doi.org/10.2737/RDS-2020-0060-2), and one that delineates wildfire risk reduction zones and provides tabular summaries of wildfire hazard and risk raster datasets (Dillon et al. 2024, https://doi.org/10.2737/RDS-2024-0030).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Network traffic datasets with novel extended IP flow called NetTiSA flow
The datasets were created for the paper "NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification" by Josef Koumar, Karel Hynek, Jaroslav Pešek, and Tomáš Čejka. The paper is published in Computer Networks (The International Journal of Computer and Telecommunications Networking), https://doi.org/10.1016/j.comnet.2023.110147.
Please cite our datasets as:
Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286
@article{KOUMAR2024110147,
  title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification},
  journal = {Computer Networks},
  volume = {240},
  pages = {110147},
  year = {2024},
  issn = {1389-1286},
  doi = {10.1016/j.comnet.2023.110147},
  url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923},
  author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka}
}
This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.
NetTiSA flow feature vector
The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups according to how they are computed. The first group is based on classical bidirectional flow information: the numbers of transferred bytes and packets. The second group contains statistical and time-based features calculated using time-series analysis of the packet sequences. The third group can be computed from the previous groups (i.e., on the flow collector) and improves classification performance without any impact on the telemetry bandwidth.
Flow features
The flow features are:
Packets is the number of packets in the direction from the source to the destination IP address.
Packets in reverse order is the number of packets in the direction from the destination to the source IP address.
Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.
Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.
Statistical and Time-based features
These features are exported in the extended part of the flow. All of them can be computed, exactly or approximately, in a stream-wise manner, which is necessary to keep memory requirements low. The second group contains the following features:
Mean is the mean of the payload lengths of the packets in a flow
Min is the minimum payload length over all packets in a flow
Max is the maximum payload length over all packets in a flow
Standard deviation is a measure of the variation of payload lengths from the mean payload length
Root mean square is the measure of the magnitude of payload lengths of packets
Average dispersion is the average absolute difference between each payload length of the packet and the mean value
Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution
Mean of relative times is the mean of the relative times, a sequence defined as \(st = \{t_1 - t_1, t_2 - t_1, \dots, t_n - t_1\}\)
Mean of time differences is the mean of the time differences, a sequence defined as \(dt = \{t_j - t_i \mid j = i + 1,\ i \in \{1, 2, \dots, n - 1\}\}\)
Min from time differences is the minimum of all time differences, i.e., the shortest gap between consecutive packets.
Max from time differences is the maximum of all time differences, i.e., the longest gap between consecutive packets.
Time distribution describes the deviation of the time differences between individual packets within the time series. The feature is computed by the following equation:
\[ tdist = \frac{\frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{dt_{n-1}} - dt_i \right|}{\frac{1}{2} \left( \max(dt_{n-1}) - \min(dt_{n-1}) \right)} \]
Switching ratio represents the ratio of value changes (switching) between payload lengths. The switching ratio is computed by the equation:
\[ sr = \frac{s_n}{\frac{1}{2} (n - 1)} \]
where \(s_n\) is the number of switches.
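The second-group features can be illustrated on a single flow given its packets' payload lengths and timestamps. The following sketch implements tdist and sr from the two equations above. For the other features it makes these assumptions where the text leaves a choice:
- Standard deviation is the population standard deviation.
- Kurtosis is excess kurtosis.
- A "switch" is any change of payload length between consecutive packets.

The text notes that these features can be computed stream-wise to keep memory low. For clarity, this sketch instead works on a complete packet list. The `nettisa_stats` name and the output keys are hypothetical.

```python
import math
from typing import Dict, Sequence


def nettisa_stats(lengths: Sequence[int], times: Sequence[float]) -> Dict[str, float]:
    """Batch computation of the statistical and time-based NetTiSA features (sketch)."""
    n = len(lengths)
    if n < 2 or len(times) != n:
        raise ValueError("need at least two packets with matching timestamps")
    mean = sum(lengths) / n
    var = sum((x - mean) ** 2 for x in lengths) / n  # population variance (assumption)
    # Excess kurtosis (assumption); defined as 0.0 when all lengths are equal.
    kurtosis = (sum((x - mean) ** 4 for x in lengths) / n) / var ** 2 - 3 if var > 0 else 0.0

    st = [t - times[0] for t in times]                    # relative times
    dt = [times[i + 1] - times[i] for i in range(n - 1)]  # time differences
    mu_dt = sum(dt) / (n - 1)
    dt_range = max(dt) - min(dt)
    # tdist: mean absolute deviation of dt divided by half its range (0.0 for constant gaps).
    tdist = (sum(abs(mu_dt - d) for d in dt) / (n - 1)) / (0.5 * dt_range) if dt_range > 0 else 0.0
    # Assumption: a "switch" is any change of payload length between consecutive packets.
    switches = sum(1 for a, b in zip(lengths, lengths[1:]) if a != b)

    return {
        "mean": mean,
        "min": min(lengths),
        "max": max(lengths),
        "std": math.sqrt(var),
        "rms": math.sqrt(sum(x * x for x in lengths) / n),
        "avg_dispersion": sum(abs(x - mean) for x in lengths) / n,
        "kurtosis": kurtosis,
        "mean_relative_times": sum(st) / n,
        "mean_time_differences": mu_dt,
        "min_time_differences": min(dt),
        "max_time_differences": max(dt),
        "time_distribution": tdist,
        "switching_ratio": switches / (0.5 * (n - 1)),
    }


features = nettisa_stats([100, 200, 100, 200], [0.0, 1.0, 3.0, 4.0])
print(features["mean"], features["std"], features["switching_ratio"])  # 150.0 50.0 2.0
```

Under the switch assumption, the \(\frac{1}{2}(n - 1)\) denominator lets sr reach 2.0 when every consecutive pair of packets differs in length, as in this alternating example.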
Features computed at the collector
The third group contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size, and their computation does not put additional load on resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:
Max minus min is the difference between the maximum and minimum payload lengths
Percent deviation is the dispersion of the average absolute difference to the mean value
Variance is the spread measure of the data from its mean
Burstiness is the degree of peakedness in the central part of the distribution
Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement
Directions is the ratio of packet directions, computed as \(\frac{d_1}{d_1 + d_0}\), where \(d_1\) is the number of packets in the direction from the source to the destination IP address and \(d_0\) is the number in the opposite direction. Both \(d_1\) and \(d_0\) are part of the classical bidirectional flow.
Duration is the duration of the flow
The NetTiSA flow is implemented in the IP flow exporter ipfixprobe.
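Because the third-group features are derived from values already present in an exported flow, a collector can add them with simple arithmetic. The following sketch shows this for a flow record represented as a dictionary. The input key names are hypothetical. Percent deviation is assumed to be avg_dispersion / mean × 100, and directions is returned as a fraction rather than a percentage. Burstiness is omitted because its exact formula is not given in the text.

```python
from typing import Dict


def enhanced_features(f: Dict[str, float]) -> Dict[str, float]:
    """Collector-side features derived from already exported values (hypothetical key names)."""
    total_packets = f["packets"] + f["packets_rev"]
    mean = f["mean"]
    return {
        "max_minus_min": f["max"] - f["min"],
        # Assumed definition: average absolute deviation relative to the mean, in percent.
        "percent_deviation": f["avg_dispersion"] / mean * 100 if mean else 0.0,
        "variance": f["std"] ** 2,
        "coefficient_of_variation": f["std"] / mean if mean else 0.0,
        "directions": f["packets"] / total_packets if total_packets else 0.0,  # d1 / (d1 + d0)
        "duration": f["time_last"] - f["time_first"],
    }


flow = {"packets": 3, "packets_rev": 1, "min": 100, "max": 200, "mean": 150.0,
        "std": 50.0, "avg_dispersion": 50.0, "time_first": 10.0, "time_last": 14.0}
print(enhanced_features(flow)["directions"])  # 0.75
```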
Description of dataset files
The following table describes each dataset file:
File name | Detection problem | Citation of the original raw dataset
botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022.
cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022.
dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of DoH tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020.
doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022.
dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning, 2022.
edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning, 2022.
https_brute_force.csv | Binary detection of HTTPS brute force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020.
ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
unsw_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015.
unsw_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015.
iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details at https://www.stratosphereips.org/datasets-iot23
ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021.
ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society, 72:102994, 2021.
License: https://cubig.ai/store/terms-of-service
1) Data Introduction
• The COVID-19 India Containment Zone Classification dataset categorizes Indian districts into Red, Orange, and Green Zones based on COVID-19 case metrics as of May 4. This classification aids in understanding the spread and control of COVID-19 across different regions.
2) Data Utilization
(1) The COVID-19 India Containment Zone data has the following characteristics:
• It includes detailed district-level information on the zone classification (Red, Orange, Green) based on COVID-19 metrics. This information is crucial for analyzing the spread of the virus and the effectiveness of containment measures, and for planning public health strategies.
(2) The COVID-19 India Containment Zone data can be used for:
• Public Health Management: assisting in resource allocation, planning containment measures, and implementing targeted lockdowns based on zone classification.
• Research and Analysis: supporting epidemiological studies, modeling the spread of the virus, and assessing the impact of containment measures in different zones.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/ (license information derived automatically)
Infections caused by various bacterial pathogens in both clinical and community settings represent a significant threat to public healthcare worldwide. The growing antimicrobial resistance acquired by bacterial species that cause healthcare-associated infections has already become a life-threatening danger recognized by the World Health Organization. Several groups or lineages of bacterial isolates, usually called "high-risk clones", often drive the spread of resistance within particular species. It is therefore vitally important to reveal and track the spread of such clones and the mechanisms by which they acquire antibiotic resistance and enhance their survival. Whole-genome sequence analysis of bacterial isolates of interest is increasingly used for these purposes, including epidemiological surveillance and the development of measures to prevent spread. However, the availability and uniformity of the data derived from genomic sequences often represent a bottleneck for such investigations. In this dataset, we present the results of a comprehensive genomic epidemiology analysis of 17,546 genomes of the dangerous bacterial pathogen Acinetobacter baumannii. Important typing information is presented, including multilocus sequence typing (MLST)-based sequence types (STs), intrinsic blaOXA-51-like gene variants, capsular (KL) and oligosaccharide (OCL) types, CRISPR-Cas systems, and cgMLST profiles, as well as the assignment of particular isolates to nine known international high-risk clones. The presence of antimicrobial resistance genes within the genomes is also reported. These data will be useful for researchers in the fields of A. baumannii genomic epidemiology, resistance analysis, and the development of prevention measures.
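Two of the typing layers mentioned above work on allele numbers. MLST assigns a sequence type by looking up the combination of alleles at a small fixed set of housekeeping loci (seven in the common schemes). cgMLST compares profiles over many core-genome loci, and the number of differing loci (the allelic distance) is commonly thresholded to group closely related isolates. The following sketch shows both operations. The ST table, allele numbers, and the "ST-X" label are invented. Treating missing loci (None) as uninformative is an assumption of this sketch.

```python
from typing import Dict, Optional, Sequence, Tuple


def sequence_type(profile: Sequence[int], st_table: Dict[Tuple[int, ...], str]) -> str:
    """Look up the ST for an MLST allele profile; 'unassigned' if the combination is not listed."""
    return st_table.get(tuple(profile), "unassigned")


def allelic_distance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> int:
    """Count cgMLST loci whose alleles differ, skipping loci missing (None) in either profile."""
    if len(a) != len(b):
        raise ValueError("profiles must list the same loci in the same order")
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


# Invented allele numbers and ST label, for illustration only.
ST_TABLE = {(10, 20, 30, 40, 50, 60, 70): "ST-X"}
print(sequence_type([10, 20, 30, 40, 50, 60, 70], ST_TABLE))  # ST-X
print(allelic_distance([1, 2, 3, None, 5], [1, 9, 3, 4, 6]))  # 2
```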
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
This dataset contains the stay-at-home index, the number of new infections, and the government’s measures against the spread of COVID-19 for the 47 prefectures of Japan, which were used in Watanabe and Yabu (2020).
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Social media is a vast pool of content, and of all the content available to users, news is accessed most frequently. News can be posted by politicians, news channels, newspaper websites, or ordinary civilians. These posts have to be checked for authenticity, since the spread of misinformation is a real concern today, and many firms are taking steps to make people aware of its consequences. The authenticity of news posted online cannot be measured definitively, and manual classification of news is tedious, time-consuming, and subject to bias. Published paper: http://www.ijirset.com/upload/2020/june/115_4_Source.PDF
The Getting Real about Fake News dataset has been preprocessed, and its skew has been eliminated.
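The text does not say how the skew was removed. One common option is random undersampling, which drops examples from the larger classes until every class matches the smallest one. The following sketch shows that technique under this assumption. It is not necessarily the procedure used here. The `undersample` helper and the label names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def undersample(texts: Sequence[str], labels: Sequence[str],
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly keep the same number of examples per class (the size of the smallest class)."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    if not by_label:
        return [], []
    n_min = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    out_texts: List[str] = []
    out_labels: List[str] = []
    for label, items in sorted(by_label.items()):
        for text in rng.sample(items, n_min):
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


texts = [f"article {i}" for i in range(7)]
labels = ["fake"] * 5 + ["real"] * 2
balanced_texts, balanced_labels = undersample(texts, labels)
print(balanced_labels)  # ['fake', 'fake', 'real', 'real']
```

Undersampling discards data. When the minority class is very small, oversampling or class weights are common alternatives.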
In an era when fake WhatsApp forwards and tweets can influence naive minds, tools and knowledge have to be put to practical use, not only to mitigate the spread of misinformation but also to inform people about the type of news they consume. Practical applications that help users gain insight into the articles they consume, such as fact-checking websites, built-in plugins, and article parsers, can be further refined, made easier to access, and, more importantly, used to create more awareness.
Getting Real about Fake News seemed the most promising dataset for preprocessing, feature extraction, and model classification, because all the other datasets lacked the sources where the article or statement text was produced and published. Citing the sources of article text is crucial for checking the trustworthiness of the news and further helps in labelling the data as fake or untrustworthy.
We are grateful for the dataset's comprehensiveness in citing the source information of each text, along with author names, publication dates, and labels.
The data included in this publication depict the 2024 version of components of wildfire risk for all lands in the United States that: 1) are landscape-wide (i.e., measurable at every pixel across the landscape); and 2) represent in situ risk - risk at the location where the adverse effects take place on the landscape.National wildfire hazard datasets of annual burn probability and fire intensity, generated by the USDA Forest Service, Rocky Mountain Research Station and Pyrologix LLC, form the foundation of the Wildfire Risk to Communities data. Vegetation and wildland fuels data from LANDFIRE 2020 (version 2.2.0) were used as input to two different but related geospatial fire simulation systems. Annual burn probability was produced with the USFS geospatial fire simulator (FSim) at a relatively coarse cell size of 270 meters (m). To bring the burn probability raster data down to a finer resolution more useful for assessing hazard and risk to communities, we upsampled them to the native 30 m resolution of the LANDFIRE fuel and vegetation data. In this upsampling process, we also spread values of modeled burn probability into developed areas represented in LANDFIRE fuels data as non-burnable. Burn probability rasters represent landscape conditions as of the end of 2020. Fire intensity characteristics were modeled at 30 m resolution using a process that performs a comprehensive set of FlamMap runs spanning the full range of weather-related characteristics that occur during a fire season and then integrates those runs into a variety of results based on the likelihood of those weather types occurring. Before the fire intensity modeling, the LANDFIRE 2020 data were updated to reflect fuels disturbances occurring in 2021 and 2022. As such, the fire intensity datasets represent landscape conditions as of the end of 2022. 
Additional methodology documentation is provided in a methods document (\Supplements\WRC_V2_Methods_Landscape-wideRisk.pdf) packaged in the data download. The specific raster datasets in this publication include:
- Risk to Potential Structures (RPS): A measure that integrates wildfire likelihood and intensity with generalized consequences to a home on every pixel. For every place on the landscape, it poses the hypothetical question, "What would be the relative risk to a house if one existed here?" This allows comparison of wildfire risk in places where homes already exist to places where new construction may be proposed. This dataset is referred to as Risk to Homes in the Wildfire Risk to Communities web application.
- Conditional Risk to Potential Structures (cRPS): The potential consequences of fire to a home at a given location, if a fire occurs there and if a home were located there. Referred to as Wildfire Consequence in the Wildfire Risk to Communities web application.
- Exposure Type: Exposure is the spatial coincidence of wildfire likelihood and intensity with communities. This layer delineates where homes are directly exposed to wildfire from adjacent wildland vegetation, indirectly exposed to wildfire from indirect sources such as embers and home-to-home ignition, or not exposed to wildfire due to distance from direct and indirect ignition sources.
- Burn Probability (BP): The annual probability of wildfire burning in a specific location. Referred to as Wildfire Likelihood in the Wildfire Risk to Communities web application.
- Conditional Flame Length (CFL): The mean flame length for a fire burning in the direction of maximum spread (headfire) at a given location if a fire were to occur; an average measure of wildfire intensity.
- Flame Length Exceedance Probability - 4 ft (FLEP4): The conditional probability that flame length at a pixel will exceed 4 feet if a fire occurs; indicates the potential for moderate to high wildfire intensity.
- Flame Length Exceedance Probability - 8 ft (FLEP8): The conditional probability that flame length at a pixel will exceed 8 feet if a fire occurs; indicates the potential for high wildfire intensity.
- Wildfire Hazard Potential (WHP): An index that quantifies the relative potential for wildfire that may be difficult to manage, used as a measure to help prioritize where fuel treatments may be needed.
Note: Pixel values in this image service have been altered from the original raster dataset due to data requirements in web services. The service is intended primarily for data visualization. Relative values and spatial patterns have been largely preserved in the service, but users are encouraged to download the source data for quantitative analysis.
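To make the CFL, FLEP, and RPS definitions concrete, the sketch below computes CFL, FLEP4, and FLEP8 for a single pixel from a list of flame-length outcomes, each weighted by the likelihood of the weather scenario that produced it, and combines burn probability with conditional risk. Both steps are simplifying assumptions for illustration: the actual FlamMap-based workflow described in the methods document is more involved, and the product RPS = BP x cRPS is the usual likelihood-times-consequence formulation rather than a formula quoted from this publication. All function names and values are hypothetical.

```python
def intensity_metrics(outcomes):
    """Return (CFL, FLEP4, FLEP8) for one pixel.

    outcomes: list of (flame_length_ft, weight) pairs, where each weight is the
    (hypothetical) relative likelihood of the weather scenario behind that run.
    """
    total = sum(weight for _, weight in outcomes)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Conditional flame length: likelihood-weighted mean flame length.
    cfl = sum(fl * weight for fl, weight in outcomes) / total
    # Exceedance probabilities: share of the weight whose flame length exceeds the threshold.
    flep4 = sum(weight for fl, weight in outcomes if fl > 4.0) / total
    flep8 = sum(weight for fl, weight in outcomes if fl > 8.0) / total
    return cfl, flep4, flep8


def risk_to_potential_structures(bp, crps):
    """Assumed RPS formulation: annual likelihood times conditional consequence."""
    return bp * crps


# Hypothetical outcomes for one pixel: (flame length in ft, scenario weight).
cfl, flep4, flep8 = intensity_metrics([(2.0, 0.5), (6.0, 0.3), (10.0, 0.2)])
print(round(cfl, 2), round(flep4, 2), round(flep8, 2))  # 4.8 0.5 0.2
```

Because every outcome above 8 ft is also above 4 ft, FLEP8 can never exceed FLEP4 under this construction.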
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Fire Weather Index (FWI) is a numeric rating of fire intensity that depends on weather conditions. It is a good indicator of fire danger because it combines a component of fuel availability (drought conditions) with a measure of ease of spread.
This is part of a larger dataset providing gridded field calculations from the Canadian Fire Weather Index System using weather forcings from the European Centre for Medium-range Weather Forecasts (ECMWF) ERA5 reanalysis dataset (Hersbach et al., 2019), and replaces the homonymous indices based on ERA-Interim (Vitolo et al., 2019). The dataset has been developed through a collaboration between the Joint Research Centre and ECMWF under the umbrella of the Global Wildfires Information System (GWIS), a joint initiative of the GEO and the Copernicus Work Programs.
The dataset consists of seven indices, each of which describes a different aspect of the effect that fuel moisture and wind have on fire ignition probability and its behavior, if started. The indices are called: Fine Fuel Moisture Code (FFMC), Duff Moisture Code (DMC), Drought Code (DC), Initial Spread Index (ISI), Build Up Index (BUI), Fire Weather Index (FWI) and Daily Severity Rating (DSR). For convenience, each index is archived separately on Zenodo.
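The indices build on one another: ISI combines FFMC with wind speed, BUI combines DMC and DC, FWI combines ISI and BUI, and DSR is derived from FWI. The sketch below implements these four steps with the published Canadian FWI System equations (Van Wagner, 1987), assuming FFMC, DMC, and DC are already available and wind speed is in km/h. It is an independent illustration, not the GEFF code used to produce this dataset, whose JRC settings and parameters may differ in detail.

```python
import math


def isi(ffmc, wind_kmh):
    """Initial Spread Index from the Fine Fuel Moisture Code and wind speed (km/h)."""
    if not 0.0 <= ffmc <= 101.0:
        raise ValueError("FFMC must lie in [0, 101]")
    m = 147.2 * (101.0 - ffmc) / (59.5 + ffmc)  # fine fuel moisture content (%)
    f_wind = math.exp(0.05039 * wind_kmh)
    f_fuel = 91.9 * math.exp(-0.1386 * m) * (1.0 + m ** 5.31 / 4.93e7)
    return 0.208 * f_wind * f_fuel


def bui(dmc, dc):
    """Build Up Index from the Duff Moisture Code and the Drought Code."""
    if dmc == 0 and dc == 0:
        return 0.0
    if dmc <= 0.4 * dc:
        u = 0.8 * dmc * dc / (dmc + 0.4 * dc)
    else:
        u = dmc - (1.0 - 0.8 * dc / (dmc + 0.4 * dc)) * (0.92 + (0.0114 * dmc) ** 1.7)
    return max(u, 0.0)


def fwi(isi_value, bui_value):
    """Fire Weather Index from ISI and BUI."""
    if bui_value <= 80.0:
        f_d = 0.626 * bui_value ** 0.809 + 2.0
    else:
        f_d = 1000.0 / (25.0 + 108.64 * math.exp(-0.023 * bui_value))
    b = 0.1 * isi_value * f_d  # intermediate (B-scale) value
    if b > 1.0:
        return math.exp(2.72 * (0.434 * math.log(b)) ** 0.647)
    return b


def dsr(fwi_value):
    """Daily Severity Rating derived from FWI."""
    return 0.0272 * fwi_value ** 1.77


# Example with illustrative (not observed) moisture codes and wind speed.
value = fwi(isi(90.0, 20.0), bui(30.0, 300.0))
print(round(value, 1), round(dsr(value), 1))
```

Gridded products such as this dataset apply the same per-cell arithmetic to every grid point and day, with FFMC, DMC, and DC themselves carried forward day by day from weather inputs.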
Data are generated using the open source software GEFF v3.0 (https://git.ecmwf.int/projects/CEMSF/repos/geff), which now uses settings and parameters provided by the JRC (see https://git.ecmwf.int/projects/CEMSF/repos/geff/browse/NEWS.md for details). The caliver R package (Vitolo et al. 2017, 2018) contains useful functions for processing this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Japanese beetle Popillia japonica was introduced on Terceira Island (Azores) in the early 1970s. Mild temperatures, high relative humidity, and heavy rain created ideal conditions for the beetle's establishment and rapid spread. Despite initial control efforts, the beetle quickly spread to the island's interior agricultural regions and threatened local plants and horticultural land. Since 1974, adult populations have been monitored on Terceira Island using pheromone and floral lure traps distributed across the island. The data revealed a distribution pattern of three circular zones with decreasing population densities, and a shift of the infestation's core toward the island's interior, where conditions are more favorable for the beetle's development. By 1989, 16 years after the first insects were discovered on the island, the pest had occupied all the available space. In 1985, a contingency plan was drawn up establishing protective measures to prevent the spread of Popillia japonica to Madeira and mainland Portugal (Decreto Legislativo Regional 11/85/A, de 23 de Agosto). It was later updated to comply with European Union (EU) legislation, with particular attention to the classification of this insect as a priority pest. Although these preventive measures were applied, the pest has spread to other islands over the years, and eight of the nine islands of the archipelago are currently infested. The Japanese beetle was detected on Faial in 1996, on São Miguel in 2003, on Pico in 2006, on Flores and São Jorge in 2007, on Corvo in 2013, and on Graciosa in 2017. Only Santa Maria has not recorded the pest's presence.
The Japanese beetle completes its life cycle in one year: adults begin emerging from the ground at the end of May and reach peak densities in early August, and the last beetles have been seen as late as the end of October. The first and second larval instars are typically brief, and by early October most of the population has reached the third instar. Third-instar grubs stop feeding and pupate at the beginning of May. The pupal stage lasts less than a month, and no pupae were seen after late July. Adults feed on the foliage, floral parts, and occasionally the fruits of various crops and ornamental plants, while the grubs feed on the roots of the pastures that cover most of the island. Adult beetles can damage around 414 host plants belonging to 94 families, which can cause severe crop damage and makes this a priority pest to keep under control. The data presented here concern Popillia japonica captures in the Azores from 2008 to 2023, resulting from the work of the operational services of the Secretaria Regional da Agricultura e Alimentação on each island. They compile the official records of the local authorities who monitored Popillia japonica in the field during these 16 years.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Infections caused by bacterial pathogens in both clinical and community settings represent a significant threat to public health worldwide. The growing antimicrobial resistance acquired by bacterial species that cause healthcare-associated infections has been recognized by the World Health Organization as a life-threatening danger. Within a given species, the spread of resistance is often driven by a few groups or lineages of isolates, usually called high-risk clones.
It is therefore vital to identify and track the spread of such clones, as well as the mechanisms by which they acquire antibiotic resistance and enhance their survival. Whole-genome sequence analysis of bacterial isolates of interest is increasingly used for these purposes, including epidemiological surveillance and the development of measures to prevent spread. However, the availability and uniformity of data derived from genomic sequences often represent a bottleneck for such investigations.
In this dataset, we present the results of a genomic epidemiology analysis of 61,857 genomes of the dangerous bacterial pathogen Klebsiella pneumoniae, obtained from the NCBI GenBank database. Key typing information is provided, including multilocus sequence typing (MLST)-based sequence types (STs), capsular (KL) and oligosaccharide (OL) types, CRISPR-Cas systems, and cgMLST profiles, together with the assignment of individual isolates to clonal groups (CGs). The presence of antimicrobial resistance and virulence genes in the genomes is also reported.
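As an illustration of how the ST and cgMLST fields relate to underlying allele calls, the sketch below assigns an MLST sequence type by looking up the seven-locus allele profile (gapA, infB, mdh, pgi, phoE, rpoB, tonB, the housekeeping loci of the K. pneumoniae MLST scheme) in a profile table, and computes a simple cgMLST allelic distance as the number of loci typed in both isolates whose allele numbers differ, ignoring missing calls. The profile table, ST number, locus names for the cgMLST part, and allele values are all invented for the example; real assignments use the curated scheme databases, and the dataset's own clonal-group definitions are not reproduced here.

```python
# The seven housekeeping loci of the K. pneumoniae MLST scheme.
MLST_LOCI = ("gapA", "infB", "mdh", "pgi", "phoE", "rpoB", "tonB")


def sequence_type(alleles, profiles):
    """Return the ST for a complete allele profile, or None if it is not in the table."""
    key = tuple(alleles.get(locus) for locus in MLST_LOCI)
    if None in key:
        return None  # incomplete profile: no ST can be assigned
    return profiles.get(key)


def cgmlst_distance(a, b):
    """Count loci typed in both profiles whose allele numbers differ."""
    shared = [locus for locus in a
              if a[locus] is not None and b.get(locus) is not None]
    return sum(1 for locus in shared if a[locus] != b[locus])


# Invented profile table and ST number, for illustration only.
profiles = {(1, 2, 3, 4, 5, 6, 7): 9999}
isolate = dict(zip(MLST_LOCI, (1, 2, 3, 4, 5, 6, 7)))
print(sequence_type(isolate, profiles))  # 9999

# Invented cgMLST profiles over four hypothetical loci (None = missing call).
x = {"locus1": 10, "locus2": 4, "locus3": None, "locus4": 7}
y = {"locus1": 10, "locus2": 5, "locus3": 2, "locus4": 8}
print(cgmlst_distance(x, y))  # 2
```

Ignoring loci that are missing in either genome keeps incomplete assemblies from inflating distances; pairwise distances of this kind are what cgMLST-based clustering typically operates on.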
These data will be useful for researchers working on K. pneumoniae genomic epidemiology, antimicrobial resistance analysis, and the development of prevention measures.