8 datasets found

Data from: Article dataset: 'How stable are visions for protected area...
recerca.uoc.edu
data.niaid.nih.gov
Updated 2021
Cite
Lo, Veronica B.P.G.; López-Rodríguez, María D.; Metzger, Marc J.; Oteros-Rozas, E.; Cebrián Piqueres, Miguel A.; Ruiz-Mallén, Isabel; March, H.; Raymond, Christopher M. (2021). Article dataset: 'How stable are visions for protected area management? Stakeholder perspectives before and during a pandemic' [Dataset]. https://recerca.uoc.edu/documentos/668fc459b9e7c03b01bdaaf6
Dataset updated
2021
Authors
Lo, Veronica B.P.G.; López-Rodríguez, María D.; Metzger, Marc J.; Oteros-Rozas, E.; Cebrián Piqueres, Miguel A.; Ruiz-Mallén, Isabel; March, H.; Raymond, Christopher M.
Description
This dataset includes the raw data of a survey of 38 stakeholders in the region of the Sierra de Guadarrama National Park, Spain (2019). The "Dataset" sheet presents respondents' Likert-scale answers (ranging from 1 to 5) of agreement regarding values, perceived changes and perceived drivers of change of the national park landscapes. The complete methodology is described in: Lo, V.B., López-Rodríguez, M.D., Metzger, M., Oteros-Rozas, E., Cebrián-Piqueras, M. A., Ruiz-Mallén, I., March, H., Raymond, C.M. (in press) ‘How stable are visions for protected area management? Stakeholder perspectives before and during a pandemic.’ People and Nature. Accompanying supplementary data (interview script, tiles and canvasses, coding, follow-up survey questions) are available as supplementary information to this article available for open access on the People and Nature website.
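As a rough illustration of how 1-to-5 Likert answers like these might be summarised, here is a minimal sketch. The item names and answer values are made up for the example and are not taken from the "Dataset" sheet; the "share agreeing" threshold of 4 or above is likewise an assumption.

```python
from statistics import mean, median

# Hypothetical responses: item name -> answers on the 1-5 agreement scale.
# Neither the item names nor the values come from the real survey.
responses = {
    "value_biodiversity": [5, 4, 4, 5, 3],
    "driver_tourism": [2, 3, 4, 2, 3],
}

def summarize(answers):
    """Return (mean, median, share of answers that are 4 or 5) for one item."""
    if any(a not in (1, 2, 3, 4, 5) for a in answers):
        raise ValueError("Likert answers must be integers from 1 to 5")
    agree_share = sum(1 for a in answers if a >= 4) / len(answers)
    return mean(answers), median(answers), agree_share

summary = {item: summarize(vals) for item, vals in responses.items()}
```

Because Likert answers are ordinal, the median and the share agreeing are often reported alongside (or instead of) the mean.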
Ministry of Justice Synthetic Data First Datasets, 2011-2023
datacatalogue.ukdataservice.ac.uk
Updated Jun 18, 2025
Cite
Ministry of Justice (2025). Ministry of Justice Synthetic Data First Datasets, 2011-2023 [Dataset]. http://doi.org/10.5255/UKDA-SN-9399-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-9399-1
Dataset updated
Jun 18, 2025
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Ministry of Justice
Time period covered
Jan 1, 2011 - Mar 30, 2023
Area covered
England and Wales
Description
The Ministry of Justice (MoJ) Data First Synthetic Data Project aims to improve engagement with Data First datasets by making synthetic versions of content available, enabling more rapid development of research proposals and thereby enhancing the potential for linked administrative data to improve understanding and outcomes across justice systems. The project has led the development of two components: a dataset generation platform and an initial release of low-fidelity synthetic data tables.

This study includes a synthetically-generated version of the Ministry of Justice Data First Family Court datasets. Synthetic versions of all 43 tables in the MoJ Data First data ecosystem have been created. These versions can be used / joined in the same way as the real datasets. As well as underpinning training, synthetic datasets should enable researchers to explore research questions and to design research proposals prior to submitting these for approval. The code created during this exploration and design process should then enable initial results to be obtained as soon as data access is granted.

The Ministry of Justice Data First family court dataset provides data on cases heard by the family court in England and Wales, and the people involved, from 2011, and has been extracted from the FamilyMan management information system, used by His Majesty's Courts and Tribunals Service (HMCTS) to manage cases within the family courts (County Courts).

Information is included on individual divorce and Family Law Act, adoption, private and public law cases, the people involved as parties to the case (their role and characteristics), key case dates, processes and outcomes.

There are three tables, for cases, people and events, which can be joined together. A case will usually have multiple people involved (for example an applicant and respondent) and may have many events (for example hearings, applications and orders made by the court), each of which is included as a separate record. These depend on the type of case and its progress.

As part of Data First, records have been de-identified and deduplicated, using our probabilistic record linkage package, Splink, so that a unique identifier is assigned to all records believed to relate to the same person, allowing for longitudinal analysis and investigation of repeat appearances. This opens up the potential to address questions on, for example, common transitions between family law case types and patterns associated with repeat use of the family court system.
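The join between the cases, people and events tables, and the use of the person identifier to spot repeat appearances, can be sketched roughly as follows. This is only an illustration: the column names (`case_id`, `person_id`, `role`, `event`) and all rows are hypothetical, since the description does not give the real schema.

```python
from collections import defaultdict

# Hypothetical rows; real column names are not given in the description.
cases = [
    {"case_id": "C1", "case_type": "divorce"},
    {"case_id": "C2", "case_type": "adoption"},
]
people = [
    {"case_id": "C1", "person_id": "P1", "role": "applicant"},
    {"case_id": "C1", "person_id": "P2", "role": "respondent"},
    {"case_id": "C2", "person_id": "P1", "role": "applicant"},
]
events = [
    {"case_id": "C1", "event": "hearing"},
    {"case_id": "C1", "event": "order"},
    {"case_id": "C2", "event": "application"},
]

def group_by_case(rows):
    """Index a table's rows by their (assumed) case_id key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["case_id"]].append(row)
    return grouped

people_by_case = group_by_case(people)
events_by_case = group_by_case(events)

# One record per case, with its people and events attached.
joined = [
    {
        **case,
        "people": people_by_case.get(case["case_id"], []),
        "events": events_by_case.get(case["case_id"], []),
    }
    for case in cases
]

# People who appear in more than one case (repeat appearances).
cases_per_person = defaultdict(set)
for person in people:
    cases_per_person[person["person_id"]].add(person["case_id"])
repeat_users = sorted(p for p, c in cases_per_person.items() if len(c) > 1)
```

With the synthetic tables, code of this shape could be written in advance and rerun once access to the real data is granted.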

The Ministry of Justice Data First linking dataset can be used in combination with this and other Data First datasets to join up administrative records about people from across justice services to increase understanding around users’ interactions, pathways and outcomes.
Lower layer Super Output Area population estimates (supporting information)
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Nov 25, 2024
Cite
Office for National Statistics (2024). Lower layer Super Output Area population estimates (supporting information) [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/lowersuperoutputareamidyearpopulationestimates
Available download formats: xlsx
Dataset updated
Nov 25, 2024
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Mid-year (30 June) estimates of the usual resident population for Lower layer Super Output Areas (LSOAs) in England and Wales by single year of age and sex.
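Estimates by single year of age are often rolled up into broader age bands per area. A minimal sketch of that aggregation follows; the row layout, the population figures and the three age bands are assumptions made for illustration, not the structure of the published xlsx file.

```python
from collections import Counter

# Hypothetical rows: (LSOA code, sex, age in single years, estimated population).
rows = [
    ("E01000001", "F", 0, 5),
    ("E01000001", "F", 17, 7),
    ("E01000001", "M", 34, 12),
    ("E01000001", "M", 70, 9),
    ("E01000002", "F", 65, 4),
]

def age_band(age):
    """Map a single year of age to an illustrative three-way banding."""
    if age < 16:
        return "0-15"
    if age < 65:
        return "16-64"
    return "65+"

# Sum population over sex and single years within each (LSOA, band) pair.
totals = Counter()
for lsoa, sex, age, population in rows:
    totals[(lsoa, age_band(age))] += population
```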
All the Earthquakes Dataset : from 1990-2023
kaggle.com
zip
Updated Aug 7, 2023
Cite
Alessandro Lo Bello (2023). All the Earthquakes Dataset : from 1990-2023 [Dataset]. https://www.kaggle.com/datasets/alessandrolobello/the-ultimate-earthquake-dataset-from-1990-2023/code
Available download formats: zip (121537542 bytes)
Dataset updated
Aug 7, 2023
Authors
Alessandro Lo Bello
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Description of Earthquakes Dataset (1990-2023)

The earthquakes dataset is an extensive collection of data containing information about all the earthquakes recorded worldwide from 1990 to 2023. The dataset comprises approximately three million rows, with each row representing a specific earthquake event. Each entry in the dataset contains a set of relevant attributes related to the earthquake, such as the date and time of the event, the geographical location (latitude and longitude), the magnitude of the earthquake, the depth of the epicenter, the type of magnitude used for measurement, the affected region, and other pertinent information.

Features:
- time (in milliseconds)
- place
- status
- tsunami (boolean value)
- significance
- data_type
- magnitudo
- state
- longitude
- latitude
- depth
- date
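The time feature is given in milliseconds. Assuming these values are Unix epoch milliseconds (an assumption; the description does not state the reference point), they can be converted to timezone-aware datetimes like this:

```python
from datetime import datetime, timezone

def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# 631152000000 ms after the Unix epoch is the start of 1990 in UTC.
start_of_1990 = ms_to_utc(631_152_000_000)
```

Passing `tz=timezone.utc` avoids silently interpreting the timestamp in the local timezone of the machine running the code.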

Importance and Utility of the Dataset:

Earthquake Analysis and Prediction: The dataset provides a valuable data source for scientists and researchers interested in analyzing spatial and temporal distribution patterns of earthquakes. By studying historical data, trends, and patterns, it becomes possible to identify high-risk seismic zones and develop predictive models to forecast future seismic events more accurately.

Safety and Prevention: Understanding factors contributing to earthquake frequency and severity can assist authorities and safety experts in implementing preventive measures at both local and global levels. These data can enhance the design and construction of earthquake-resistant infrastructures, reducing material damage and safeguarding human lives.

Seismological Science: The dataset offers a critical resource for seismologists and geologists studying the dynamics of the Earth's crust and various geological faults. Analyzing details of recorded earthquakes allows for a deeper comprehension of geological processes leading to seismic activity.

Study of Tectonic Movements: The dataset can be utilized to analyze patterns of tectonic movements in specific areas over the years. This may help identify seasonal or long-term seismic activity, providing additional insights into plate tectonic behavior.

Public Information and Awareness: Earthquake data can be made accessible to the public through portals and applications, enabling individuals to monitor seismic activity in their regions of interest and promoting awareness and preparedness for earthquakes.

In summary, the earthquakes dataset represents a fundamental information source for scientific research, public safety, and community awareness. By analyzing historical data and building predictive models, this dataset can significantly contribute to mitigating seismic risks and protecting people and infrastructure from the consequences of earthquakes.
Special Eurobarometer 472: Sport and physical activity
data.europa.eu
zip
Updated Mar 21, 2018
Cite
Directorate-General for Communication (2018). Special Eurobarometer 472: Sport and physical activity [Dataset]. https://data.europa.eu/data/datasets/s2164_88_4_472_eng?locale=en
Available download formats: zip
Dataset updated
Mar 21, 2018
Dataset authored and provided by
Directorate-General for Communication
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The latest Eurobarometer on sport and physical activity follows three previous surveys conducted in 2002, 2009 and 2013. It was carried out in the 28 EU Member States in December 2017 and 28,031 EU citizens from different social and demographic categories were interviewed. The survey looked at frequency and levels of engagement in sport and other physical activity, for example the amount of time people spend doing vigorous and moderate physical activity, as well as walking and sitting down. It also took into consideration activities such as cycling, dancing or gardening. The survey also focused on where EU citizens engage in sport and other physical activity, whether in a club or in informal settings such as outdoors or on the way to/from work. Finally, it looked at the reasons why people engage in sport and other physical activity, as well as the barriers to practising sport more regularly and what kind of opportunities or support from local authorities they could get in their area. A final chapter is then dedicated to volunteering in sport.

The results by volumes are distributed as follows:

Volume A: Countries

Volume AA: Groups of countries

Volume A' (AP): Trends

Volume AA' (AAP): Trends of groups of countries

Volume B: EU/socio-demographics

Volume B' (BP): Trends of EU/socio-demographics

Volume C: Country/socio-demographics

Researchers may also contact GESIS - Leibniz Institute for the Social Sciences: https://www.gesis.org/eurobarometer
Agri-food CO2 emission dataset - Forecasting ML
kaggle.com
zip
Updated Jul 17, 2023
Cite
Alessandro Lo Bello (2023). Agri-food CO2 emission dataset - Forecasting ML [Dataset]. https://www.kaggle.com/datasets/alessandrolobello/agri-food-co2-emission-dataset-forecasting-ml/code
Available download formats: zip (722843 bytes)
Dataset updated
Jul 17, 2023
Authors
Alessandro Lo Bello
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The agricultural CO2 emission dataset has been constructed by merging and reprocessing approximately a dozen individual datasets from the Food and Agriculture Organization (FAO) and data from the IPCC. These datasets were cleaned, preprocessed and merged to create a comprehensive and cohesive dataset for analysis and forecasting purposes.

The dataset, as demonstrated in the notebook, describes CO2 emissions related to agri-food, which amount to approximately 62% of the global annual emissions.

Indeed, the emissions from the agri-food sector are significant when studying climate change. As the dataset shows, these emissions contribute to a substantial portion of the global annual emissions. Understanding and addressing the environmental impact of the agri-food industry is crucial for mitigating climate change and developing sustainable practices within this sector.

For a better understanding of the dataset, I have written a notebook in which I analyse the relationship between emissions, climate change and geographic area. Additionally, I provide an example of regression to predict the percentage variations in temperatures.

Dataset Features:

Savanna fires: Emissions from fires in savanna ecosystems.

Forest fires: Emissions from fires in forested areas.

Crop Residues: Emissions from burning or decomposing leftover plant material after crop harvesting.

Rice Cultivation: Emissions from methane released during rice cultivation.

Drained organic soils (CO2): Emissions from carbon dioxide released when draining organic soils.

Pesticides Manufacturing: Emissions from the production of pesticides.

Food Transport: Emissions from transporting food products.

Forestland: Land covered by forests.

Net Forest conversion: Change in forest area due to deforestation and afforestation.

Food Household Consumption: Emissions from food consumption at the household level.

Food Retail: Emissions from the operation of retail establishments selling food.

On-farm Electricity Use: Electricity consumption on farms.

Food Packaging: Emissions from the production and disposal of food packaging materials.

Agrifood Systems Waste Disposal: Emissions from waste disposal in the agrifood system.

Food Processing: Emissions from processing food products.

Fertilizers Manufacturing: Emissions from the production of fertilizers.

IPPU: Emissions from industrial processes and product use.

Manure applied to Soils: Emissions from applying animal manure to agricultural soils.

Manure left on Pasture: Emissions from animal manure on pasture or grazing land.

Manure Management: Emissions from managing and treating animal manure.

Fires in organic soils: Emissions from fires in organic soils.

Fires in humid tropical forests: Emissions from fires in humid tropical forests.

On-farm energy use: Energy consumption on farms.

Rural population: Number of people living in rural areas.

Urban population: Number of people living in urban areas.

Total Population - Male: Total number of male individuals in the population.

Total Population - Female: Total number of female individuals in the population.

total_emission: Total greenhouse gas emissions from various sources.

Average Temperature °C: The average yearly increase in temperature, in degrees Celsius.

Importance and Context:

As I demonstrate in my notebook, the agricultural sector contributes approximately 62% of total global CO2 emissions, making it a significant contributor to climate change. This dataset plays a crucial role in understanding and monitoring the impact of agricultural activities on CO2 emissions. By leveraging machine learning techniques, it enables the forecasting of future emissions, allowing policymakers and researchers to develop targeted strategies and interventions for sustainable agricultural practices. This dataset serves as a valuable resource for climate scientists, environmental researchers, and policymakers striving to mitigate the environmental impact of the agricultural sector.

Author note:

CO2 is recorded in kilotonnes (kt): 1 kt represents 1,000 tonnes (1,000,000 kg) of CO2.
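For reference, a small unit-conversion sketch, taking the standard metric meaning of the prefix (1 kilotonne = 1,000 tonnes, and 1 tonne = 1,000 kg). The function names are illustrative and are not part of the dataset or its notebook.

```python
KG_PER_TONNE = 1_000
TONNES_PER_KT = 1_000

def kt_to_kg(kt):
    """Convert kilotonnes to kilograms (1 kt = 1,000,000 kg)."""
    return kt * TONNES_PER_KT * KG_PER_TONNE

def kt_to_megatonnes(kt):
    """Convert kilotonnes to megatonnes (1 Mt = 1,000 kt)."""
    return kt / 1_000
```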

The feature "Average Temperature °C", which can be used as the target for machine learning models, represents the average yearly temperature increase. For example, a value of 0.12 means that the temperature in that specific location increased by 0.12 degrees Celsius.

Forestland is the only feature that exhibits negative emissions due to its role as a carbon sink. Through photosynthesis, forests absorb and store carbon dioxide, effectively removing it from the atmosphere. Sustainable forest management, along with afforestation and reforestation efforts, further contribute to negative emission...
Balanced Augmented Covid CXR Dataset
kaggle.com
zip
Updated Sep 7, 2022
Cite
Mrinal Tyagi (2022). Balanced Augmented Covid CXR Dataset [Dataset]. https://www.kaggle.com/tr1gg3rtrash/balanced-augmented-covid-cxr-dataset
Available download formats: zip (1215244749 bytes)
Dataset updated
Sep 7, 2022
Authors
Mrinal Tyagi
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus.

Most people infected with the virus will experience mild to moderate respiratory illness and recover without requiring special treatment. However, some will become seriously ill and require medical attention. Older people and those with underlying medical conditions like cardiovascular disease, diabetes, chronic respiratory disease, or cancer are more likely to develop serious illness. Anyone can get sick with COVID-19 and become seriously ill or die at any age.

Making of the dataset

We used a readily available chest X-ray dataset that was highly imbalanced. It had four classes, Covid, Normal, Lung Opacity (LO), and Viral Pneumonia (VP), with 3616, 10,192, 6012, and 1345 images respectively. We applied under-sampling (RUS) and over-sampling (data augmentation through image processing techniques) to balance the dataset. The entire scheme is shown in the following figure.

Illustration image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4636937%2F22792aa085df2031a5f2b90347ac51a9%2FCapture.PNG?generation=1662563867428011&alt=media

First, a novel SVD-based image processing technique is applied to the minority classes VP and Covid. It produces images with slightly different luminance and contrast. In addition, CLAHE 0.5 is applied across the dataset, since it enhances the features in the CXR images. We empirically chose CLAHE 0.5, and CLAHE 1.0 for VP, to avoid excessive contrast enhancement. To clarify, we chose 8769, 7662, 8192, and 5410 images for the Covid, LO, Normal, and VP classes respectively, for training purposes only. The testing images are provided in a separate folder and are not pre-processed. In our augmented CXR dataset, the number of images per class is not exactly the same. We observed that the image statistics within the Covid class are very dissimilar; that is, the intra-class variance of the Covid class is considerably higher than that of the other classes. The number of Covid images should therefore be somewhat higher for better convergence of the CNN model. As justification, we used a correlation coefficient to compute the mean intra-class variance of each class and found that the ratio of images per class in this augmented dataset closely follows the ratio of their intra-class variances. Hence, we believe that this augmented dataset is more balanced than the original dataset.
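Taking the class counts quoted above at face value, a short sketch can show which classes end up over-sampled or under-sampled and how far the imbalance shrinks. This is plain arithmetic on the reported numbers, not the authors' pipeline; the max/min "imbalance ratio" is an illustrative measure chosen for the example.

```python
# Image counts as quoted in the description.
original = {"Covid": 3616, "Normal": 10192, "LO": 6012, "VP": 1345}
training = {"Covid": 8769, "LO": 7662, "Normal": 8192, "VP": 5410}

# For each class: whether it grows or shrinks, and by what factor.
plan = {}
for name, before in original.items():
    factor = training[name] / before
    action = "over-sample" if factor > 1 else "under-sample"
    plan[name] = (action, round(factor, 2))

def imbalance_ratio(counts):
    """Largest class size divided by smallest class size."""
    return max(counts.values()) / min(counts.values())

before_ratio = imbalance_ratio(original)
after_ratio = imbalance_ratio(training)
```

On these numbers only the Normal class shrinks; the other three grow, with VP growing the most.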

Acknowledgement

Cover Photo by Umanoide from Unsplash

Citations

If you want to use our dataset, please cite our paper -> https://www.sciencedirect.com/science/article/pii/S0010482522008009?via%3Dihub
Parkinson's Telemonitoring
kaggle.com
zip
Updated Jul 4, 2023
Cite
Porinita Hoque (2023). Parkinson's Telemonitoring [Dataset]. https://www.kaggle.com/datasets/porinitahoque/parkinsons-telemonitoring
Available download formats: zip (295344 bytes)
Dataset updated
Jul 4, 2023
Authors
Porinita Hoque
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
Oxford Parkinson's Disease Telemonitoring Dataset

Summary

Source:

The dataset was created by Athanasios Tsanas (tsanasthanasis@gmail.com) and Max Little (littlem@physics.ox.ac.uk) of the University of Oxford, in collaboration with 10 medical centers in the US and Intel Corporation who developed the telemonitoring device to record the speech signals. The original study used a range of linear and nonlinear regression methods to predict the clinician's Parkinson's disease symptom score on the UPDRS scale.

Data Set Information:

This dataset is composed of a range of biomedical voice measurements from 42 people with early-stage Parkinson's disease recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. The recordings were automatically captured in the patients' homes.

Columns in the table contain subject number, subject age, subject gender, time interval from baseline recruitment date, motor UPDRS, total UPDRS, and 16 biomedical voice measures. Each row corresponds to one of 5,875 voice recordings from these individuals. The main aim of the data is to predict the motor and total UPDRS scores ('motor_UPDRS' and 'total_UPDRS') from the 16 voice measures.

The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around 200 recordings per patient; the subject number of the patient is given in the first column. For further information or to pass on comments, please contact Athanasios Tsanas (tsanasthanasis@gmail.com) or Max Little (littlem@physics.ox.ac.uk).

Further details are contained in the following reference; if you use this dataset, please cite: Athanasios Tsanas, Max A. Little, Patrick E. McSharry, Lorraine O. Ramig (2009), 'Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests', IEEE Transactions on Biomedical Engineering (to appear). Further details about the biomedical voice measures can be found in: Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2009), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering, 56(4):1015-1022

Attribute Information:

subject# - Integer that uniquely identifies each subject
age - Subject age
sex - Subject gender: '0' - male, '1' - female
test_time - Time since recruitment into the trial. The integer part is the number of days since recruitment.
motor_UPDRS - Clinician's motor UPDRS score, linearly interpolated
total_UPDRS - Clinician's total UPDRS score, linearly interpolated
Jitter(%), Jitter(Abs), Jitter:RAP, Jitter:PPQ5, Jitter:DDP - Several measures of variation in fundamental frequency
Shimmer, Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, Shimmer:APQ11, Shimmer:DDA - Several measures of variation in amplitude
NHR, HNR - Two measures of ratio of noise to tonal components in the voice
RPDE - A nonlinear dynamical complexity measure
DFA - Signal fractal scaling exponent
PPE - A nonlinear measure of fundamental frequency variation
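Given the attribute list above, one way to separate the voice measures (inputs) from the two UPDRS scores (prediction targets) when reading the CSV might look like the sketch below. The sample header is shortened to a few of the voice-measure columns and the single data row is made up for illustration; neither is copied from the real file.

```python
import csv
import io

TARGETS = ("motor_UPDRS", "total_UPDRS")
META = ("subject#", "age", "sex", "test_time")

# Shortened header and a made-up row, standing in for the real CSV file.
sample = io.StringIO(
    "subject#,age,sex,test_time,motor_UPDRS,total_UPDRS,"
    "Jitter(%),Shimmer,NHR,HNR,RPDE,DFA,PPE\n"
    "1,72,0,5.6,28.2,34.4,0.0066,0.0257,0.0143,21.6,0.42,0.55,0.16\n"
)

def split_row(row):
    """Split one CSV row into (subject id, voice features, UPDRS targets)."""
    features = {
        name: float(value)
        for name, value in row.items()
        if name not in META + TARGETS
    }
    targets = {name: float(row[name]) for name in TARGETS}
    return int(row["subject#"]), features, targets

rows = [split_row(r) for r in csv.DictReader(sample)]
```

Keeping the subject id alongside each row makes it possible to group recordings by patient later, for instance when splitting data for evaluation.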

Relevant Papers:

Little MA, McSharry PE, Hunter EJ, Ramig LO (2009), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering, 56(4):1015-1022

Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM. 'Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection', BioMedical Engineering OnLine 2007, 6:23 (26 June 2007)

Citation Request:

If you use this dataset, please cite the following paper: A Tsanas, MA Little, PE McSharry, LO Ramig (2009), 'Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests', IEEE Transactions on Biomedical Engineering (to appear).

Source: http://archive.ics.uci.edu/ml/datasets/Parkinsons+Telemonitoring