The COVID-19 Government Measures Dataset puts together all the measures implemented by governments worldwide in response to the Coronavirus pandemic. Data collection includes secondary data review. Last updated 10/12/2020. The researched information available falls into five categories:
- Social distancing
- Movement restrictions
- Public health measures
- Social and economic measures
- Lockdowns
Each category is broken down into several types of measures.
The dataset contains the following fields: ID, ISO, COUNTRY, REGION, ADMIN_LEVEL_NAME, PCODE, LOG_TYPE, CATEGORY, MEASURE_TYPE, TARGETED_POP_GROUP, COMMENTS, NON_COMPLIANCE, DATE_IMPLEMENTED, SOURCE, SOURCE_TYPE, LINK, ENTRY_DATE, ALTERNATIVE SOURCE.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this work we present results of all the major global models and normalise the model results by looking at changes over time relative to a common base year value.
We analyse the variability across the models, both before and after normalisation, to give insights into variance at the national and regional levels.
A dataset of harmonised results (based on means) and measures of dispersion is presented, providing a baseline dataset for CBCA validation and analysis.
The dataset is intended as a go-to dataset for country and regional results of consumption- and production-based accounts. The normalised mean for each country/region is the principal result that can be used to assess the magnitude and trend in the emission accounts. An additional key element of the dataset, however, is the set of measures of robustness and spread of the results across the source models. These metrics give insight into how much trust should be placed in the individual country/region results.
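As an illustration of the normalisation described above, here is a minimal Python sketch; the model names and emission values are invented, not taken from the dataset, and this is not the authors' actual pipeline:

```python
import numpy as np

# Hypothetical CBCA results: emissions per model, per year, for one country.
years = np.array([2005, 2010, 2015])
models = {
    "model_A": np.array([410.0, 455.0, 470.0]),
    "model_B": np.array([390.0, 430.0, 452.0]),
    "model_C": np.array([425.0, 460.0, 488.0]),
}
base_idx = int(np.where(years == 2005)[0][0])   # common base year

# Normalise each model to its own base-year value, so models are compared on
# relative change over time rather than absolute level.
normalised = np.vstack([v / v[base_idx] for v in models.values()])

harmonised_mean = normalised.mean(axis=0)       # harmonised (mean) result per year
dispersion = normalised.std(axis=0, ddof=1)     # spread across source models

for y, m, s in zip(years, harmonised_mean, dispersion):
    print(f"{y}: mean = {m:.3f}, sd = {s:.3f}")
```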
https://dataverse.no/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18710/FVHTFM
Dataset description
This dataset contains background data and supplementary material for Sönning (forthcoming), a study that looks at the behavior of dispersion measures when applied to text-level frequency data. For the literature survey reported in that study, which examines how dispersion measures are used in corpus-based work, it includes tabular files listing the 730 research articles that were examined, as well as annotations for those studies that measured dispersion in the corpus-linguistic (and lexicographic) sense. As for the corpus data that were used to train the statistical model parameters underlying the simulation study reported in that paper, the dataset contains a term-document matrix for the 49,604 unique word forms (after conversion to lower-case) that occur in the Brown Corpus. Further, R scripts are included that document in detail how the Brown Corpus XML files, which are available from the Natural Language Toolkit (Bird et al. 2009; https://www.nltk.org/), were processed to produce this data arrangement.

Abstract: Related publication
This paper offers a survey of recent corpus-based work, which shows that dispersion is typically measured across the text files in a corpus. Systematic insights into the behavior of measures in such distributional settings are currently lacking, however. After a thorough discussion of six prominent indices, we investigate their behavior on relevant frequency distributions, which are designed to mimic actual corpus data. Our evaluation considers different distributional settings, i.e. various combinations of frequency and dispersion values. The primary focus is on the response of measures to relatively high and low sub-frequencies, i.e. texts in which the item or structure of interest is over- or underrepresented (if not absent). We develop a simple method for constructing sensitivity profiles, which allow us to draw instructive comparisons among measures. We observe that these profiles vary considerably across distributional settings. While D and DP appear to show the most balanced response contours, our findings suggest that much work remains to be done to understand the performance of measures on items with normalized frequencies below 100 per million words.
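For readers who want to build a comparable term-document arrangement in Python rather than with the published R scripts, here is a minimal sketch using the NLTK distribution of the Brown Corpus; the word form "measure" and the exact tokenisation are illustrative, and vocabulary counts will only approximate the figure reported above:

```python
import nltk
from collections import Counter

nltk.download("brown")                      # the Brown Corpus as distributed with NLTK
from nltk.corpus import brown

# Per-file counts of lower-cased word forms (the rows of a term-document matrix).
fileids = brown.fileids()                   # the 500 Brown text files
counts = {fid: Counter(w.lower() for w in brown.words(fileids=fid)) for fid in fileids}

vocab = sorted({w for c in counts.values() for w in c})
print(len(vocab), "unique lower-cased word forms")   # close to the 49,604 reported,
                                                     # depending on tokenisation details

# One term-document row, e.g. for the word form "measure":
row = [counts[fid].get("measure", 0) for fid in fileids]
print(sum(row), "occurrences across", sum(1 for v in row if v > 0), "files")
```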
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Understanding how epidemics spread in a system is a crucial step to prevent and control outbreaks, with broad implications for the system’s functioning, health, and associated costs. This can be achieved by identifying the elements at higher risk of infection and implementing targeted surveillance and control measures. One important ingredient to consider is the pattern of disease-transmission contacts among the elements; however, a lack of data or delays in providing updated records may hinder its use, especially for time-varying patterns. Here we explore to what extent it is possible to use past temporal data of a system’s pattern of contacts to predict the risk of infection of its elements during an emerging outbreak, in the absence of updated data. We focus on two real-world temporal systems: a trade network of livestock displacements among animal holdings, and a network of sexual encounters in high-end prostitution. We define a node’s loyalty as a local measure of its tendency to maintain contacts with the same elements over time, and uncover important non-trivial correlations with the node’s epidemic risk. We show that a risk assessment analysis incorporating this knowledge and based on past structural and temporal pattern properties provides accurate predictions for both systems. Its generalizability is tested by introducing a theoretical model for generating synthetic temporal networks. High accuracy of our predictions is recovered across different settings, while the amount of possible predictions is system-specific. The proposed method can provide crucial information for the setup of targeted intervention strategies.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The guidance identifies core personal and community-based public health measures to mitigate the transmission of coronavirus disease (COVID-19).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistical data on the number of violators of precautionary and preventive measures to limit the spread of the coronavirus in Qatar, categorized by nationality, gender, and type of crime.
https://dataverse.no/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18710/ATCQZW
This dataset contains frequencies for a set of 150 word forms in the BNC. The set of items was compiled by Biber et al. (2016) for the purpose of analyzing the behavior of dispersion measures in different distributional settings. It was therefore assembled to cover a broad range of frequency and dispersion levels. For each form, the dataset lists (i) the number of occurrences in each of the 4049 text files in the BNC, including zero counts; and (ii) the length of each text file, i.e. the number of word tokens it contains.
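To illustrate how per-file counts and file lengths of this kind feed into a dispersion measure, here is a minimal sketch of Gries's DP (deviation of proportions) in Python; the toy numbers are invented, not taken from the BNC:

```python
def gries_dp(counts, part_sizes):
    """Gries's DP: 0.5 * sum over parts of |observed share - expected share|."""
    f = sum(counts)          # total frequency of the item across all parts
    total = sum(part_sizes)  # corpus size in tokens
    return 0.5 * sum(abs(v / f - s / total) for v, s in zip(counts, part_sizes))

# Toy example: an item occurring 10 times, in three equally sized texts.
print(gries_dp([10, 0, 0], [1000, 1000, 1000]))   # ~0.67 (very uneven distribution)
print(gries_dp([3, 3, 4],  [1000, 1000, 1000]))   # ~0.07 (nearly even distribution)
```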
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The outbreak of the COVID-19 pandemic prompted the German government and the 16 German federal states to announce a variety of public health measures in order to suppress the spread of the coronavirus. These non-pharmaceutical measures were intended to curb transmission rates by increasing social distancing (i.e., diminishing interpersonal contacts), which restricts a range of individual behaviors. The measures span moderate recommendations such as physical distancing up to the closure of shops and bans on gatherings and demonstrations. The implementation of these measures is not only a research goal in itself but also has implications for behavioral research conducted in this period (e.g., in the form of potential confounder biases). Hence, longitudinal data that represent the measures can be a fruitful data source. The presented data set contains data on 14 governmental measures across the 16 German federal states. In comparison to existing datasets, the data set at hand is a fine-grained daily time series tracking the effective calendar date, introduction, extension, or phase-out of each respective measure. Based on self-regulation theory, measures were coded according to whether they did not restrict, partially restricted, or fully restricted the respective behavioral pattern. The time frame comprises March 08, 2020 until May 15, 2020. The project is an open-source, ongoing project with continued updates planned at regular (approximately monthly) intervals. New variables include restrictions on travel and gastronomy. The variable trvl (travel) comprises the following categories: fully restricted (=2), reflecting a general ban on travel within Germany (except for sound reasons such as health or business); partially restricted (=1), travel is allowed but may be restricted through prohibition of accommodation or entry bans for certain groups (e.g., people from risk areas); free (=0), no travel or accommodation restrictions in place. The variable gastr (gastronomy) comprises: fully restricted (=2), closure of restaurants or bars; partially restricted (=1), only take-away or food delivery services are allowed; free (=0), restaurants are allowed to open without restrictions. Further, the variables msk (recommendations to wear a mask) and zoo (restrictions of zoo visits) have been adjusted.
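A minimal sketch of how such a daily, state-level panel of coded restriction levels could be represented in Python; the state names, dates, and values below are illustrative and not taken from the dataset:

```python
import pandas as pd

# Coding convention described above: 0 = free, 1 = partially restricted, 2 = fully restricted.
records = [
    # state, date, measure, level
    ("Bayern", "2020-03-22", "gastr", 2),   # restaurants and bars closed
    ("Bayern", "2020-05-18", "gastr", 1),   # take-away / delivery only
    ("Berlin", "2020-04-01", "trvl",  1),   # accommodation bans for certain groups
]
panel = pd.DataFrame(records, columns=["state", "date", "measure", "level"])
panel["date"] = pd.to_datetime(panel["date"])

# Wide daily time series: one column per measure, one row per state and day.
wide = panel.pivot_table(index=["state", "date"], columns="measure", values="level")
print(wide)
```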
Our target was to predict gender, age, and emotion from audio. We found labeled audio datasets on Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted, and after adding the labels these datasets were formed. Audio files were collected from "Mozilla Common Voice" and the “Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)”.
The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency describes the dispersion of the spectrum relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle value of the sorted list of frequencies.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set; about 25 percent of the values lie below Q1 and about 75 percent above it.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the point below which about 75 percent of the values lie, midway between the median and the maximum.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles (upper and lower quartiles).
7) skew - The skewness is the degree of distortion from the normal distribution; it measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure of how much the tails of a distribution differ from the tails of a normal distribution; it reflects the presence of outliers in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity, computed from the normalized spectral power distribution.
10) sfm - The spectral flatness (or tonality coefficient, also known as Wiener entropy) is a measure used in digital signal processing to characterize an audio spectrum; it is usually expressed in decibels and indicates how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in the spectrum.
12) centroid - The spectral centroid describes where the center of mass of the spectrum is located.
13) meanfun - The average fundamental frequency measured across the acoustic signal.
14) minfun - The minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maximum fundamental frequency measured across the acoustic signal.
16) meandom - The average dominant frequency measured across the acoustic signal.
17) mindom - The minimum dominant frequency measured across the acoustic signal.
18) maxdom - The maximum dominant frequency measured across the acoustic signal.
19) dfrange - The range of dominant frequency measured across the acoustic signal.
20) modindx - The modulation index, which quantifies the degree of frequency modulation, expressed numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
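The original extraction was done in R; as a rough illustration, here is a minimal Python sketch of how a few of these spectral features (mean frequency, spread, quartiles, entropy, flatness) could be computed from a mono WAV file. The file name is a placeholder and the exact formulas may differ from the R implementation used for the dataset:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import periodogram

rate, samples = wavfile.read("voice_clip.wav")   # hypothetical input file
if samples.ndim > 1:
    samples = samples[:, 0]                      # use one channel if stereo
freqs, power = periodogram(samples.astype(float), fs=rate)

freqs_khz = freqs / 1000.0
p = power / power.sum()                          # normalised power distribution

meanfreq = float(np.sum(freqs_khz * p))                               # mean frequency (kHz)
sd = float(np.sqrt(np.sum(p * (freqs_khz - meanfreq) ** 2)))          # spectral standard deviation
cdf = np.cumsum(p)
median = freqs_khz[np.searchsorted(cdf, 0.50)]                        # median frequency
q25 = freqs_khz[np.searchsorted(cdf, 0.25)]                           # first quartile
q75 = freqs_khz[np.searchsorted(cdf, 0.75)]                           # third quartile
iqr = q75 - q25                                                       # interquartile range
sp_ent = float(-np.sum(p * np.log2(p + 1e-12)) / np.log2(len(p)))     # normalised spectral entropy
sfm = float(np.exp(np.mean(np.log(power + 1e-12))) / np.mean(power))  # spectral flatness

print(meanfreq, sd, median, q25, q75, iqr, sp_ent, sfm)
```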
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Research data for the purpose of reproducing the results presented in the journal publication titled "Single-shot capable surface acoustic wave dispersion measurement of a layered plate"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
There are several works based on Natural Language Processing on newspaper reports. Mining opinions from headlines [1] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on small and large datasets. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types; the purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. in [3] implemented LDA, a topic modeling approach, to study bias present in online news media.
However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NKK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, contributing to awareness of the virus.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We have named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named it “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data collectors (age group 23-, with university degrees). This approach was preferable to automation to ensure the news was highly relevant to the subject: the newspapers' online sites had dynamic content with advertisements in no particular order, so automated scrapers had a high chance of collecting inaccurate news reports. One challenge while collecting the data was the requirement of a subscription; each newspaper required $1 per subscription. Some criteria for collecting the news reports, provided as guidelines to the human data collectors, were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic; political, social, and economic genres are to be prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above mentioned newspapers.
To collect these data we used a Google Form for the USA and BD entries. Two human editors went through each entry to check for spam or troll entries.
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since altering sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check that the above-mentioned criteria were met.
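A minimal sketch of such a pre-processing script in Python; the original script is not reproduced here, and the NLTK resources and regular expressions below are assumptions:

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

STOP = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)        # remove hyperlinks
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)                # keep English alphanumerics
    tokens = nltk.word_tokenize(text.lower())
    tokens = [t for t in tokens if t not in STOP]              # remove stop words
    return " ".join(LEMMATIZER.lemmatize(t) for t in tokens)   # lemmatize

print(preprocess("Officials said cases were rising; see https://example.com"))
```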
The primary data statistics of the two datasets are shown in Tables 1 and 2.
Table 1: Covid-News-USA-NNK data statistics
No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100
Table 2: Covid-News-BD-NNK data statistics
No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
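A minimal sketch of the kind of CSV-to-JSON regeneration script described above; the file names are illustrative and the repository's actual script may differ:

```python
import csv
import json

def csv_to_json(csv_path: str, json_path: str) -> None:
    # Read every row of the CSV as a dict and dump the list of rows as JSON.
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

csv_to_json("Covid-News-USA-NNK.csv", "Covid-News-USA-NNK.json")
```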
3 Literature Review
Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.
Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and the machine translation used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most trending ones are BERT [14], which uses a bidirectional Transformer encoder architecture and can perform near-perfect classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are all pre-trained models, since training them carries a huge computational cost. Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could mean retrieving vital feature spaces or targeted portions of an image; information extraction from speech could mean retrieving information about names, places, etc. [16]. Information extraction from text could mean identifying named entities, locations, or other essential data. Topic modeling is a sub-task of NLP and also a process of information extraction: it clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling technique is Latent Dirichlet Allocation (LDA) [17].
Keyword extraction is an information-extraction sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that builds a graph over the words, calculates a weight for each word, and picks the words with the highest weights.
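A minimal sketch of a TextRank-style keyword ranking (a co-occurrence graph scored with PageRank via networkx); the window size and example sentence are illustrative:

```python
import networkx as nx

def textrank_keywords(tokens, window=3, top_k=5):
    # Build a co-occurrence graph: words appearing within `window` of each other share an edge.
    graph = nx.Graph()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph.add_edge(word, other)
    scores = nx.pagerank(graph)                      # TextRank uses PageRank-style scoring
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

tokens = "government imposed lockdown measures as virus cases surged across states".split()
print(textrank_keywords(tokens))
```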
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our Experiments and Result Analysis
We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month, from February to May. From Figures 1, 2, and 3, we can note a few observations (a minimal word-cloud sketch follows this list):
In February, both newspapers talked about China and the source of the outbreak.
StarTribune emphasized Minnesota as the most concerned state; in April, this concern appeared to grow.
Both newspapers talked about the virus impacting the economy, e.g., banks, elections, administrations, and markets.
Washington Post discussed global issues more than StarTribune.
StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both newspapers mentioned the outbreak in China in February, the spread within the United States is highlighted more heavily from March through May, displaying the critical impact caused by the virus.
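A minimal sketch of generating one such monthly word cloud with the wordcloud library; the input and output file names are illustrative:

```python
from wordcloud import WordCloud

# Concatenated, pre-processed article text for one month (illustrative file name).
with open("covid_news_usa_february.txt", encoding="utf-8") as f:
    text = f.read()

cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
cloud.to_file("wordcloud_february.png")   # the clustered words give a quick view of the month's coverage
```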
We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction, we can observe that April was the peak month for COVID cases, rising gradually from February. Both newspapers clearly show that the rise in cases from February to March was slower than the rise from March to April, an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack. We used VADER sentiment analysis to extract the sentiment of the headlines and the bodies. On average, the sentiments ranged from -0.5 to -0.9 on the VADER scale, which runs from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic. We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important, relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
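A minimal sketch of the VADER scoring step using the vaderSentiment package; the example headline and body text are invented:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

headline = "State reports record daily deaths as hospitals near capacity"
body = "Officials say new restrictions appear to be slowing the spread of infections."

# The compound score runs from -1 (highly negative) to 1 (highly positive).
print("headline:", analyzer.polarity_scores(headline)["compound"])
print("body:    ", analyzer.polarity_scores(body)["compound"])
```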
Coronavirus disease 2019 (COVID-19) is a contagious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first known case was identified in Wuhan, China, in December 2019. The disease has since spread worldwide, leading to an ongoing pandemic.
Symptoms of COVID-19 are variable, but often include fever, cough, headache, fatigue, breathing difficulties, and loss of smell and taste. Symptoms may begin one to fourteen days after exposure to the virus. At least a third of people who are infected do not develop noticeable symptoms. Of those people who develop symptoms noticeable enough to be classed as patients, most (81%) develop mild to moderate symptoms (up to mild pneumonia), while 14% develop severe symptoms (dyspnea, hypoxia, or more than 50% lung involvement on imaging), and 5% suffer critical symptoms (respiratory failure, shock, or multiorgan dysfunction). Older people are at a higher risk of developing severe symptoms. Some people continue to experience a range of effects (long COVID) for months after recovery, and damage to organs has been observed. Multi-year studies are underway to further investigate the long-term effects of the disease.
COVID-19 transmits when people breathe in air contaminated by droplets and small airborne particles containing the virus. The risk of breathing these in is highest when people are in close proximity, but they can be inhaled over longer distances, particularly indoors. Transmission can also occur if splashed or sprayed with contaminated fluids in the eyes, nose, or mouth, and, rarely, via contaminated surfaces. People remain contagious for up to 20 days and can spread the virus even if they do not develop symptoms.
Several testing methods have been developed to diagnose the disease. The standard diagnostic method is by detection of the virus' nucleic acid by real-time reverse transcription-polymerase chain reaction (rRT-PCR), transcription-mediated amplification (TMA), or by reverse transcription loop-mediated isothermal amplification (RT-LAMP) from a nasopharyngeal swab.
Preventive measures include physical or social distancing, quarantining, ventilation of indoor spaces, covering coughs and sneezes, hand washing, and keeping unwashed hands away from the face. The use of face masks or coverings has been recommended in public settings to minimize the risk of transmissions.
While work is underway to develop drugs that inhibit the virus (and several vaccines for it have been approved and distributed in various countries, which have since initiated mass vaccination campaigns), the primary treatment is symptomatic. Management involves the treatment of symptoms, supportive care, isolation, and experimental measures.
Source - https://en.wikipedia.org/wiki/COVID-19
This Dataset is a collection of records for COVID-19 (World and Continent wise).
This Dataset is created from: https://www.worldometers.info/. If you want to learn more, you can visit the Website.
Cover Photo by Hakan Nural on Unsplash
https://cubig.ai/store/terms-of-service
1) Data Introduction • The COVID-19 India Containment Zone Classification dataset categorizes Indian districts into Red, Orange, and Green Zones based on COVID-19 case metrics as of May 4. This classification aids in understanding the spread and control of COVID-19 across different regions.
2) Data Utilization (1) COVID-19 India Containment Zone data has characteristics that: • It includes detailed district-level information on the zone classification (Red, Orange, Green) based on COVID-19 metrics. This information is crucial for analyzing the spread of the virus, the effectiveness of containment measures, and for planning public health strategies. (2) COVID-19 India Containment Zone data can be used to: • Public Health Management: Assists in resource allocation, planning containment measures, and implementing targeted lockdowns based on zone classification. • Research and Analysis: Supports epidemiological studies, modeling the spread of the virus, and assessing the impact of containment measures in different zones.
As the spread of the novel coronavirus (COVID-19) continues across countries, it is important for us to keep records of all information on it. Therefore, this dataset is built primarily to cover the updates from Africa.
It contains information on the dates cases were recorded across Africa, detailing the deaths, confirmed cases, and recoveries in each country.
Acknowledgements: Ethical AI Club, Johns Hopkins University, Runmila Institute, WHO, CDC, Ghana Health Service.
We should be able to see contributors answering questions about how Africa should prepare and put the right measures in place to contain the spread, and to gain a better understanding from the data scientists.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the 54 Dynamic Spectrum files (.FITS format) behind the paper "A Scintillation Arc Survey of 22 Pulsars with Low to Moderate Dispersion Measures," submitted to the AAS Journals on 2022 April 06. We also include the Tables, Figures, and Text from the work, as well as some explanatory README files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PigDetect: AI-Powered Pig Skin Disease Detection & Monitoring
PigDetect is an innovative platform designed to assist farmers in the early detection and management of pig skin diseases. By utilizing cutting-edge image processing and machine learning algorithms, the system allows farmers to capture images of their pigs and instantly receive a diagnosis. PigDetect helps farmers make informed decisions about the health of their livestock, reducing potential outbreaks and increasing farm productivity.
Key Features:
- Real-Time Skin Disease Detection: Upload or capture images of pigs, and PigDetect will analyze them for signs of common skin diseases such as mange, ringworm, and erysipelas.
- Geolocation: PigDetect maps the location of reported cases, helping farmers track disease spread and take preventive measures.
- Notifications: The system sends real-time notifications to farmers when a skin disease outbreak is detected within a certain radius of their farm.
- Admin Controls: Farm administrators can manage blog posts, monitor disease data, and interact with PigDetect's community of users.
PigDetect provides farmers with a powerful tool to improve the health and welfare of their pigs, ultimately boosting productivity and preventing potential losses.
Synthetic and real dispersion measurements for paths across the Pacific, consisting of two datasets:

SS3DPacific_new - A data set of surface-wave dispersion measurements. The dispersion is measured between a synthetic reference seismogram (computed with normal-mode summation using the MINEOS software in the radial model stw105 from Kustowski et al., 2008) and a real observed seismogram. This data set is used by Latallerie et al. (2024) to build a Vs model of the Pacific upper mantle with full 3D resolution and uncertainty, using SOLA inversion (Zaroli 2016) and finite-frequency theory (Zhou 2009). Data are provided for a set of source-receiver pairs for frequencies ranging from 6 to 21 mHz, every 1 mHz. The measurement algorithm uses the multi-taper technique (Thompson 1982) with the first 5 Slepians (Slepian 1978). A datum is the average of the measurements over these tapers, and the uncertainty is their standard deviation.

SS3DPacificSyn_new - A data set of surface-wave dispersion measurements. The dispersion is measured between a synthetic reference seismogram (computed with normal-mode summation using the MINEOS software in the radial model stw105 from Kustowski et al., 2008) and a synthetic seismogram computed using the spectral element method software Specfem in the 3D model S362ANI from Kustowski et al. (2018). This data set is used by Latallerie et al. (2024) in a synthetic tomography study to retrieve the Vs structure of the input 3D model S362ANI in the Pacific upper mantle with full 3D resolution and uncertainty, using SOLA inversion (Zaroli 2016) and finite-frequency theory (Zhou 2009). Measurements are provided for source-receiver pairs for frequencies ranging from 6 to 21 mHz, every 1 mHz. The measurement algorithm uses the multi-taper technique (Thompson 1982) with the first 5 Slepians (Slepian 1978). A datum is the average of the measurements over these tapers, and the uncertainty is their standard deviation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Japanese beetle Popillia japonica was introduced on Terceira Island (Azores) early in the 1970s. Mild temperatures, high relative humidity, and heavy rain created the perfect conditions for the beetle's establishment and rapid spread. Despite initial control efforts, the beetle quickly spread to the island's interior agricultural regions and threatened local plants and horticultural land. Since 1974, adult populations have been monitored on Terceira Island using pheromone and floral lure traps distributed across the island. The data revealed a distribution pattern across three circular zones with decreasing population densities, and a movement of the infestation's central core towards the island's interior, to zones more conducive to the beetle's development. In 1989, 16 years after the first insects were discovered on the island, the pest had taken over all the available space.

In 1985, a contingency plan was drawn up to establish protective measures to prevent the spread of Popillia japonica to Madeira and mainland Portugal (Decreto Legislativo Regional 11/85/A, de 23 de Agosto). It was later updated to comply with European Union (EU) legislation, paying particular attention to the categorization of this insect as a priority pest. Although these preventive measures were applied, the pest has spread to other islands over the years, and currently eight of the nine islands of the Archipelago are infested. In 1996, the Japanese beetle was detected on Faial; in 2003, on the island of São Miguel; in 2006, on the island of Pico; in 2007, on Flores and São Jorge islands; in 2013, on Corvo; and in 2017, on Graciosa. Only Santa Maria has not recorded the pest's presence.

The Japanese beetle completes its life cycle in a year, with individuals starting to emerge from the ground at the end of May and reaching their peak densities in early August. The last beetles were seen as late as the end of October. The first and second larval instars typically have a brief lifespan, and by early October most of the population has reached the third instar. The third-instar grubs stop feeding and pupate at the beginning of May. The pupal stage lasts less than a month, and no pupae were seen after late July. Adults eat the foliage, floral parts, and occasionally the fruits of various agricultural plants and ornamentals, while the grubs live off the roots of the pastures that make up most of the island. The adult beetle can damage around 414 host plants belonging to 94 families, which may cause elevated crop damage and makes this a priority pest to keep under control. The data presented here relate to Popillia japonica captured in the Azores from 2008 to 2023, resulting from the work of the operational services on each island of the Secretaria Regional da Agricultura e Alimentação. It is a compilation of the official records from the local authorities who contributed these data from their fieldwork monitoring of Popillia japonica during these 16 years.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a spectral dataset of natural objects and daylights collected in Japan.
We collected 359 natural objects and measured the reflectance of all objects and the transmittance of 75 leaves. We also measured daylight spectra from dawn till dusk on four different days using a white plate placed (i) under the direct sun and (ii) under a cast shadow (in total 359 measurements). We also separately measured daylight spectra at five different locations (including a sports ground, a space between tall buildings, and a forest) with minimal time intervals, to reveal the influence of the surrounding environment on the spectral composition of the daylight reaching the ground (in total 118 measurements).
If you use this dataset in your research, please cite the following publication.
The dataset contains the following Excel spreadsheets and CSV files:
(A) Surface properties of natural objects
(A-1) Reflectance_ver1-2.xlsx and .csv
(A-2) Transmittance_FrontSideUp_ver1-2.xlsx and .csv
(A-3) Transmittance_BackSideUp_ver1-2.xlsx and .csv
(B) Daylight measurements
(B-1) Daylight_TimeLapse_v1-2.xlsx and .csv
(B-2) Daylight_DifferentLocations_v1-2.xlsx and .csv
Data description
(A) Surface properties
(A-1) Reflectance_ver1-2.xlsx and .csv
This file contains surface spectral reflectance data (380 - 780 nm, 5 nm step) of 359 natural objects, including 200 flowers, 113 leaves, 23 fruits, 6 vegetables, 8 barks, and 9 stones measured by a spectrophotometer (SR-2A, Topcon, Tokyo, Japan). Photos of all samples are included in the .xlsx file.
For the analysis presented in the paper, we identified reflectance pairs that have a Pearson’s correlation coefficient across the 401 spectral channels of more than 0.999 and removed one of the reflectances from each pair. The column 'Used in analysis' indicates whether or not each sample is used for the analysis (TRUE indicates used and FALSE indicates not used).
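A minimal sketch of this de-duplication step in Python; the CSV layout (samples as rows, spectral channels as columns) and the file name are assumptions, not the dataset's actual structure:

```python
import numpy as np
import pandas as pd

spectra = pd.read_csv("Reflectance_ver1-2.csv", index_col=0)   # rows: samples, columns: channels
corr = np.corrcoef(spectra.values)                             # Pearson correlation between all sample pairs

keep = np.ones(len(spectra), dtype=bool)
for i in range(len(spectra)):
    if not keep[i]:
        continue
    for j in range(i + 1, len(spectra)):
        if keep[j] and corr[i, j] > 0.999:    # near-identical spectral shape
            keep[j] = False                   # drop one reflectance from each such pair

used_in_analysis = spectra[keep]
```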
At the time of collection, we noted the scientific names of flowers, leaves, and barks from name boards provided by the Tokyo Institute of Technology, where the samples were collected. If these were not available, we used a smartphone application that automatically identifies the scientific name from an input image (PictureThis - Plant Identifier, developed by Glority Global Group Ltd.). The names of 2 flowers and 9 stones that could not be identified through either method were left blank.
(A-2) Transmittance_FrontSideUp_v1-2.xlsx and .csv
This file contains surface spectral transmittance data (380 - 780 nm, 5 nm step) for 75 leaves measured by a spectrophotometer (SR-2A, Topcon, Tokyo, Japan). Photos of all samples are included in the .xlsx file.
For this data, the transmittance was measured with the front-side of leaves up (the light was transmitted from the back side of the leaves). This is the data presented in the associated article.
(A-3) Transmittance_BackSideUp_v1-2.xlsx and .csv
Spectral transmittance data of the same leaves presented in (A-2).
For this data, the transmittance was measured with the back-side of leaves up (the light was transmitted from the front side of the leaves).
(B) Daylight measurements
(B-1) Daylight_TimeLapse_ver1-2.xlsx and .csv
This file contains daylight spectra from sunrise to sunset on four different days (2013/11/20, 2013/12/24, 2014/07/03 and 2014/10/27), measured by a spectrophotometer (SR-LEDW, Topcon, Tokyo, Japan) over a wavelength range from 380 nm to 780 nm in 1 nm steps. We measured the light reflected from a white calibration plate placed either under direct sunlight or under a cast shadow.
The column 'Cloud cover' provides a visual estimate of the percentage of cloud cover across the sky at the time of each measurement. The column 'Red lamp' indicates whether an aircraft warning lamp at the measurement site was on (circle) or off (blank).
(B-2) Daylight_DifferentLocations_ver1-2.xlsx and .csv
This file includes daylight spectra measured at five different sites within the Suzukakedai Campus of Tokyo Institute of Technology, with minimal time gaps, on 2014/07/08, using a spectroradiometer (IM-1000, Topcon) from 380 nm to 780 nm in 1 nm steps. The instrument was oriented either towards the sun or towards the zenith sky. When the instrument was oriented towards the sun, we measured spectra in two ways: (i) with a black cylinder covering the photodetector and (ii) without the cylinder.
The column 'Cylinder' indicates whether the black cylinder was used (circle) or not (cross). The column 'Cloud cover' shows a visual estimate of the percentage of cloud cover at the time of each measurement. The column 'Sun hidden in clouds' denotes whether the measurement was taken when the sun was covered by clouds (circle) or not (blank).
Lucror Analytics: Proprietary Risk Modelling Data for Credit Quality & Bond Valuation
At Lucror Analytics, we provide cutting-edge corporate data solutions tailored to fixed income professionals and organizations implementing quant-driven strategies. Our risk modelling data encompasses issuer and issue-level credit quality, bond fair value metrics, and proprietary scores designed to offer nuanced, actionable insights into global bond markets that help you stay ahead of the curve. Covering over 3,300 global issuers and over 80,000 bonds, we empower our clients with robust risk modelling data to make data-driven decisions with confidence and precision.
By leveraging our proprietary C-Score, V-Score, and V-Score I models, which utilize CDS and OAS data, we provide unparalleled granularity in credit analysis and valuation. Whether you are a portfolio manager, credit analyst, or institutional investor, Lucror’s risk modelling data solutions deliver actionable insights to enhance strategies, identify mispricing opportunities, and assess credit risk.
What Makes Lucror’s Risk Modelling Data Unique?
Proprietary Credit and Valuation Models Developed for Risk Modelling Our proprietary C-Score, V-Score, and V-Score I are designed to provide a deeper understanding of credit quality and bond valuation:
C-Score: A composite score (0-100) reflecting an issuer's credit quality based on market pricing signals such as CDS spreads. Responsive to near-real-time market changes, the C-Score offers granular differentiation within and across credit rating categories, helping investors identify mispricing opportunities.
V-Score: Measures the deviation of an issue’s option-adjusted spread (OAS) from the market fair value, indicating whether a bond is overvalued or undervalued relative to the market.
V-Score I: Similar to the V-Score but benchmarked against industry-specific fair value OAS, offering insights into relative valuation within an industry context.
These models provide foundational risk modelling data for fixed income strategies aimed at outperforming benchmarks.
Comprehensive Global Coverage Our risk modelling data covers over 3,300 issuers and 80,000 bonds across global markets, ensuring 90%+ overlap with prominent IG and HY benchmark indices. This extensive coverage provides valuable insights into issuers across sectors and geographies, enabling users to analyze issuer credit risk and market dynamics comprehensively.
Risk Modelling Data Customization and Flexibility We recognize that different users have unique requirements. Lucror Analytics offers tailored datasets delivered in customizable formats, frequencies, and levels of granularity, ensuring that our risk modelling data integrates seamlessly into your workflows.
High-Frequency, High-Quality Risk Modeling Data Our C-Score, V-Score, and V-Score I models and metrics are updated daily using end-of-day (EOD) data from S&P. This ensures that users have access to current and accurate risk modelling data, empowering timely and informed decision-making.
How Is the Risk Modelling Data Sourced? Lucror Analytics employs a rigorous methodology to source, structure, transform, and process data, ensuring reliability and actionable insights:
Proprietary Quantitative Risk Models: Our scores are derived from proprietary quant algorithms based on CDS spreads, OAS, and other issuer and bond data.
Global Data Partnerships: Our collaborations with S&P and other reputable data providers ensure comprehensive and accurate risk modelling datasets.
Cleaning and Structuring of Risk Modelling Data: Advanced processes ensure data integrity, transforming raw inputs into actionable credit risk insights.
Primary Use Cases
Quant-driven Portfolio Construction & Rebalancing Lucror’s C-Score provides a granular view of issuer credit quality, allowing portfolio managers to evaluate risks and identify mispricing opportunities. With CDS-driven insights and daily updates, clients can incorporate near-real-time issuer/bond movements into their credit assessments using risk modelling data.
Portfolio Optimization The V-Score and V-Score I allow portfolio managers to identify undervalued or overvalued bonds, supporting strategies that optimize returns relative to credit risk. By benchmarking valuations against market and industry standards, users can uncover potential mean-reversion opportunities and enhance portfolio performance with risk modelling data.
Strategic Decision-Making Our comprehensive risk modelling data enables financial institutions to make informed strategic decisions. Whether it’s assessing the fair value of bonds, analyzing industry-specific credit risk, or underst...