This dataset includes the raw data of a survey of 38 stakeholders in the region of the Sierra de Guadarrama National Park, Spain (2019). The "Dataset" sheet presents respondents' Likert-scale agreement ratings (from 1 to 5) regarding values, perceived changes and perceived drivers of change of the national park landscapes. The complete methodology is described in: Lo, V.B., López-Rodríguez, M.D., Metzger, M., Oteros-Rozas, E., Cebrián-Piqueras, M. A., Ruiz-Mallén, I., March, H., Raymond, C.M. (in press) ‘How stable are visions for protected area management? Stakeholder perspectives before and during a pandemic.’ People and Nature. Accompanying supplementary data (interview script, tiles and canvasses, coding, follow-up survey questions) are available as open-access supplementary information to this article on the People and Nature website.
This study includes a synthetically generated version of the Ministry of Justice Data First Family Court datasets. Synthetic versions of all 43 tables in the MoJ Data First data ecosystem have been created. These versions can be used and joined in the same way as the real datasets. As well as underpinning training, synthetic datasets should enable researchers to explore research questions and to design research proposals prior to submitting these for approval. The code created during this exploration and design process should then enable initial results to be obtained as soon as data access is granted.
The Ministry of Justice Data First family court dataset provides data on cases heard by the family court in England and Wales, and the people involved, from 2011, and has been extracted from the FamilyMan management information system, used by His Majesty's Courts and Tribunals Service (HMCTS) to manage cases within the family courts (County Courts).
Information is included on individual divorce and Family Law Act, adoption, private and public law cases, the people involved as parties to the case (their role and characteristics), key case dates, processes and outcomes.
There are three tables, for cases, people and events, which can be joined together. A case will usually have multiple people involved (for example, an applicant and a respondent) and may have many events (for example, hearings, applications and orders made by the court), each of which is included as a separate record. The people and events recorded depend on the type of case and its progress.
As part of Data First, records have been de-identified and deduplicated, using our probabilistic record linkage package, Splink, so that a unique identifier is assigned to all records believed to relate to the same person, allowing for longitudinal analysis and investigation of repeat appearances. This opens up the potential to address questions on, for example, common transitions between family law case types and patterns associated with repeat use of the family court system.
The Ministry of Justice Data First linking dataset can be used in combination with this and other Data First datasets to join up administrative records about people from across justice services to increase understanding around users’ interactions, pathways and outcomes.
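The join between the cases, people and events tables, and the use of a person-level identifier to find repeat appearances, can be sketched as follows. This is only an illustration on made-up rows: the column names (`case_id`, `person_id`, `role`, `event`) are assumptions, since the description does not give the real field names.

```python
from collections import defaultdict

# Made-up rows; column names are hypothetical, not the real Data First schema.
cases = [
    {"case_id": "C1", "case_type": "divorce"},
    {"case_id": "C2", "case_type": "adoption"},
]
people = [
    {"case_id": "C1", "person_id": "P1", "role": "applicant"},
    {"case_id": "C1", "person_id": "P2", "role": "respondent"},
    {"case_id": "C2", "person_id": "P1", "role": "applicant"},
]
events = [
    {"case_id": "C1", "event": "application"},
    {"case_id": "C1", "event": "hearing"},
    {"case_id": "C2", "event": "order"},
]


def group_by(rows, key):
    """Index a list of row dicts by the value of one column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def join_case_tables(cases, people, events):
    """Attach every person and event record to its case (one-to-many joins)."""
    people_by_case = group_by(people, "case_id")
    events_by_case = group_by(events, "case_id")
    return {
        case["case_id"]: {
            "case": case,
            "people": people_by_case.get(case["case_id"], []),
            "events": events_by_case.get(case["case_id"], []),
        }
        for case in cases
    }


joined = join_case_tables(cases, people, events)

# People whose identifier appears in more than one case (repeat appearances).
cases_per_person = defaultdict(set)
for person in people:
    cases_per_person[person["person_id"]].add(person["case_id"])
repeat_users = sorted(
    pid for pid, case_ids in cases_per_person.items() if len(case_ids) > 1
)
```

Because the synthetic tables are described as joinable in the same way as the real ones, code of this shape could in principle be written against the synthetic data first and rerun once access is granted.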
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Mid-year (30 June) estimates of the usual resident population for Lower layer Super Output Areas (LSOAs) in England and Wales by single year of age and sex.
https://creativecommons.org/publicdomain/zero/1.0/
Description of Earthquakes Dataset (1990-2023)
The earthquakes dataset is an extensive collection of data on earthquakes recorded worldwide from 1990 to 2023. It comprises approximately three million rows, each representing a specific earthquake event. Each entry contains a set of attributes describing the earthquake, such as the date and time of the event, the geographical location (latitude and longitude), the magnitude, the depth of the hypocenter, the type of magnitude used for measurement, the affected region, and other pertinent information.
Features
- time in milliseconds
- place
- status
- tsunami (boolean value)
- significance
- data_type
- magnitudo
- state
- longitude
- latitude
- depth
- date
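As a small illustration of working with the `time` feature, the sketch below converts a millisecond timestamp into a UTC datetime. It assumes the values are Unix epoch milliseconds, which the description does not state explicitly, and the sample row is made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch (assumed) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# Made-up row using feature names from the list above.
row = {"time": 631152000000, "magnitudo": 4.5, "tsunami": 0}
event_time = epoch_ms_to_utc(row["time"])
is_tsunami = bool(row["tsunami"])
```

Adding a `timedelta` to a fixed epoch, rather than calling `datetime.fromtimestamp`, avoids platform-specific limits on the timestamps that can be converted.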
Importance and Utility of the Dataset:
Earthquake Analysis and Prediction: The dataset provides a valuable data source for scientists and researchers interested in analyzing spatial and temporal distribution patterns of earthquakes. By studying historical data, trends, and patterns, it becomes possible to identify high-risk seismic zones and develop predictive models to forecast future seismic events more accurately.
Safety and Prevention: Understanding factors contributing to earthquake frequency and severity can assist authorities and safety experts in implementing preventive measures at both local and global levels. These data can enhance the design and construction of earthquake-resistant infrastructures, reducing material damage and safeguarding human lives.
Seismological Science: The dataset offers a critical resource for seismologists and geologists studying the dynamics of the Earth's crust and various geological faults. Analyzing details of recorded earthquakes allows for a deeper comprehension of geological processes leading to seismic activity.
Study of Tectonic Movements: The dataset can be utilized to analyze patterns of tectonic movements in specific areas over the years. This may help identify seasonal or long-term seismic activity, providing additional insights into plate tectonic behavior.
Public Information and Awareness: Earthquake data can be made accessible to the public through portals and applications, enabling individuals to monitor seismic activity in their regions of interest and promoting awareness and preparedness for earthquakes.
In summary, the earthquakes dataset represents a fundamental information source for scientific research, public safety, and community awareness. By analyzing historical data and building predictive models, this dataset can significantly contribute to mitigating seismic risks and protecting people and infrastructure from the consequences of earthquakes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The latest Eurobarometer on sport and physical activity follows three previous surveys conducted in 2002, 2009 and 2013. It was carried out in the 28 EU Member States in December 2017 and 28,031 EU citizens from different social and demographic categories were interviewed. The survey looked at frequency and levels of engagement in sport and other physical activity, for example the amount of time people spend doing vigorous and moderate physical activity, as well as walking and sitting down. It also took into consideration activities such as cycling, dancing or gardening. The survey also focused on where EU citizens engage in sport and other physical activity, whether in a club or in informal settings such as outdoors or on the way to/from work. Finally, it looked at the reasons why people engage in sport and other physical activity, as well as the barriers to practising sport more regularly and what kind of opportunities or support from local authorities they could get in their area. A final chapter is then dedicated to volunteering in sport.
https://creativecommons.org/publicdomain/zero/1.0/
The agricultural CO2 emission dataset has been constructed by merging and reprocessing approximately a dozen individual datasets from the Food and Agriculture Organization (FAO) and data from the IPCC. These datasets were cleaned, preprocessed and merged to create a comprehensive and cohesive dataset for analysis and forecasting purposes.
The dataset, as demonstrated in the notebook, describes CO2 emissions related to agri-food, which amount to approximately 62% of the global annual emissions.
Indeed, the emissions from the agri-food sector are significant when studying climate change. As the dataset shows, these emissions contribute to a substantial portion of the global annual emissions. Understanding and addressing the environmental impact of the agri-food industry is crucial for mitigating climate change and developing sustainable practices within this sector.
For a better understanding of the dataset, I have written a notebook in which I analyze the relationship between emissions, climate change and geographic area. I also provide an example of a regression that predicts the percentage variation in temperature.
As I demonstrate in my notebook, the agricultural sector contributes approximately 62% of total global CO2 emissions, making it a significant contributor to climate change. This dataset plays a crucial role in understanding and monitoring the impact of agricultural activities on CO2 emissions. Combined with machine learning techniques, it enables the forecasting of future emissions, allowing policymakers and researchers to develop targeted strategies and interventions for sustainable agricultural practices. This dataset serves as a valuable resource for climate scientists, environmental researchers, and policymakers striving to mitigate the environmental impact of the agricultural sector.
https://creativecommons.org/publicdomain/zero/1.0/
Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus.
Most people infected with the virus will experience mild to moderate respiratory illness and recover without requiring special treatment. However, some will become seriously ill and require medical attention. Older people and those with underlying medical conditions like cardiovascular disease, diabetes, chronic respiratory disease, or cancer are more likely to develop serious illness. Anyone can get sick with COVID-19 and become seriously ill or die at any age.
We used a readily available Chest X-Ray dataset that was heavily imbalanced. It has four classes, Covid, Normal, Lung Opacity (LO), and Viral Pneumonia (VP), with 3616, 10,192, 6012, and 1345 images respectively. We employed random under-sampling (RUS) and over-sampling (data augmentation) by image processing techniques to balance the dataset. The entire scheme is shown in the following figure.
Illustration image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4636937%2F22792aa085df2031a5f2b90347ac51a9%2FCapture.PNG?generation=1662563867428011&alt=media
First, a novel SVD-based image processing technique is applied to the minority classes, VP and Covid. It produces images with slightly different luminance and contrast. In addition, CLAHE 0.5 is applied across the whole dataset, since it enhances the features in the CXR images. We empirically chose CLAHE 0.5, and CLAHE 1.0 for VP, to avoid excessive contrast enhancement. To clarify, the training set contains 8769, 7662, 8192, and 5410 images for the Covid, LO, Normal, and VP classes respectively. The testing images are provided in a separate folder and are not pre-processed.
In our Augmented CXR dataset, the number of images per class is not exactly the same. We observed that the image statistics within the Covid class are very dissimilar; that is, the intra-class variance of the Covid class is considerably higher than that of the other classes. The Covid class should therefore contain somewhat more images for better convergence of the CNN model. To justify this, we used a correlation coefficient to compute the mean intra-class variance of each class and found that the ratio of images per class in the augmented dataset closely follows the classes' intra-class variance. Hence, we believe that this augmented dataset is more balanced than the original dataset.
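The class counts quoted above imply how much each class has to shrink or grow. The sketch below works out that arithmetic and shows plain random under-sampling on a list of placeholder file indices. It is not the authors' pipeline: it assumes, for illustration only, that the training counts are derived from the full original counts (the description does not say how many images were held out for testing), and the function names are hypothetical.

```python
import random

# Counts taken from the description above.
original_counts = {"Covid": 3616, "Normal": 10192, "LO": 6012, "VP": 1345}
training_counts = {"Covid": 8769, "LO": 7662, "Normal": 8192, "VP": 5410}


def plan_rebalancing(original, target):
    """Return, per class, whether to under-sample or augment and by how many images."""
    plan = {}
    for name, count in original.items():
        delta = target[name] - count
        if delta > 0:
            plan[name] = ("augment", delta)
        elif delta < 0:
            plan[name] = ("undersample", -delta)
        else:
            plan[name] = ("keep", 0)
    return plan


def random_undersample(items, n_keep, seed=0):
    """Random under-sampling: keep n_keep items chosen without replacement."""
    rng = random.Random(seed)
    return rng.sample(items, n_keep)


plan = plan_rebalancing(original_counts, training_counts)

# Placeholder indices standing in for the Normal-class image files.
normal_images = list(range(original_counts["Normal"]))
kept_normal = random_undersample(normal_images, training_counts["Normal"])
```

Under that assumption, only the Normal class is reduced, while Covid, LO and VP are enlarged through augmentation.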
Cover Photo by Umanoide from Unsplash
If you want to use our dataset, please cite our paper: https://www.sciencedirect.com/science/article/pii/S0010482522008009?via%3Dihub
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Oxford Parkinson's Disease Telemonitoring Dataset
The dataset was created by Athanasios Tsanas (tsanasthanasis@gmail.com) and Max Little (littlem@physics.ox.ac.uk) of the University of Oxford, in collaboration with 10 medical centers in the US and Intel Corporation, which developed the telemonitoring device used to record the speech signals. The original study used a range of linear and nonlinear regression methods to predict the clinician's Parkinson's disease symptom score on the UPDRS scale.
This dataset is composed of a range of biomedical voice measurements from 42 people with early-stage Parkinson's disease recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. The recordings were automatically captured in the patients' homes.
Columns in the table contain subject number, subject age, subject gender, time interval from baseline recruitment date, motor UPDRS, total UPDRS, and 16 biomedical voice measures. Each row corresponds to one of 5,875 voice recordings from these individuals. The main aim of the data is to predict the motor and total UPDRS scores ('motor_UPDRS' and 'total_UPDRS') from the 16 voice measures.
The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around 200 recordings per patient; the subject number of the patient is given in the first column. For further information or to pass on comments, please contact Athanasios Tsanas (tsanasthanasis@gmail.com) or Max Little (littlem@physics.ox.ac.uk).
Further details are contained in the following reference; if you use this dataset, please cite it: Athanasios Tsanas, Max A. Little, Patrick E. McSharry, Lorraine O. Ramig (2009), 'Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests', IEEE Transactions on Biomedical Engineering (to appear). Further details about the biomedical voice measures can be found in: Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2009), "Suitability of dysphonia measurements for telemonitoring of Parkinson's disease", IEEE Transactions on Biomedical Engineering, 56(4):1015-1022.
- subject#: Integer that uniquely identifies each subject
- age: Subject age
- sex: Subject gender ('0' - male, '1' - female)
- test_time: Time since recruitment into the trial. The integer part is the number of days since recruitment.
- motor_UPDRS: Clinician's motor UPDRS score, linearly interpolated
- total_UPDRS: Clinician's total UPDRS score, linearly interpolated
- Jitter(%), Jitter(Abs), Jitter:RAP, Jitter:PPQ5, Jitter:DDP: Several measures of variation in fundamental frequency
- Shimmer, Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, Shimmer:APQ11, Shimmer:DDA: Several measures of variation in amplitude
- NHR, HNR: Two measures of ratio of noise to tonal components in the voice
- RPDE: A nonlinear dynamical complexity measure
- DFA: Signal fractal scaling exponent
- PPE: A nonlinear measure of fundamental frequency variation
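Given the column layout above, separating the 16 voice measures (inputs) from the two UPDRS scores (targets) might look like the following sketch. The column names are copied from the attribute descriptions and assumed to match the CSV header exactly; the three sample rows are made up, and `load_features_and_targets` is a hypothetical helper rather than part of the dataset.

```python
import csv
import io

VOICE_MEASURES = [
    "Jitter(%)", "Jitter(Abs)", "Jitter:RAP", "Jitter:PPQ5", "Jitter:DDP",
    "Shimmer", "Shimmer(dB)", "Shimmer:APQ3", "Shimmer:APQ5",
    "Shimmer:APQ11", "Shimmer:DDA", "NHR", "HNR", "RPDE", "DFA", "PPE",
]
TARGETS = ["motor_UPDRS", "total_UPDRS"]
HEADER = ["subject#", "age", "sex", "test_time"] + TARGETS + VOICE_MEASURES


def load_features_and_targets(handle):
    """Read recordings from a CSV file object into feature, target and subject lists."""
    features, targets, subjects = [], [], []
    for row in csv.DictReader(handle):
        features.append([float(row[name]) for name in VOICE_MEASURES])
        targets.append([float(row[name]) for name in TARGETS])
        subjects.append(int(row["subject#"]))
    return features, targets, subjects


# Build a tiny in-memory CSV with made-up values in place of the real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerow([1, 72, 0, 5.6, 28.2, 34.4] + [0.01] * 16)
writer.writerow([1, 72, 0, 12.7, 28.4, 34.9] + [0.02] * 16)
writer.writerow([2, 58, 1, 3.1, 19.0, 25.5] + [0.03] * 16)
buffer.seek(0)

X, y, subjects = load_features_and_targets(buffer)
```

Keeping the subject numbers alongside the features makes it possible to group recordings by patient, for example when choosing which patients go into a training or evaluation split.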
If you use this dataset, please cite the following paper: A Tsanas, MA Little, PE McSharry, LO Ramig (2009), 'Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests', IEEE Transactions on Biomedical Engineering (to appear).