Metadata for the OpenFEMA API data set fields. It contains descriptions, data types, and other attributes for each field.

If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please contact the OpenFEMA team via email at OpenFEMA@fema.dhs.gov.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by Shreshth Vashisht
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The BUTTER Empirical Deep Learning Dataset represents an empirical study of deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels each of L1 and L2 regularization. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) were recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
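The shuffled 80% / 20% train-test split mentioned above can be sketched as follows; the sample count and array here are illustrative stand-ins, not the dataset's actual contents.

```python
import numpy as np

# Illustrative stand-in for one experiment's examples; each run splits
# its dataset this way before training.
rng = np.random.default_rng(0)
n_samples = 500
indices = rng.permutation(n_samples)   # shuffle all sample indices
cut = int(0.8 * n_samples)             # 80% train / 20% test
train_idx, test_idx = indices[:cut], indices[cut:]
print(len(train_idx), len(test_idx))   # 400 100
```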
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d'État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of "dissident coup" had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include: • Reconciling missing event data • Removing events with irreconcilable event dates • Removing events with insufficient sourcing (each event needs at least two sources) • Removing events that were inaccurately coded as coup events • Removing variables that fell below the threshold of inter-coder reliability required by the project • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries • Extending the period covered from 1945-2005 to 1945-2019 • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Items in this Dataset 1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024 2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024 3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024 4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024
Citation Guidelines 1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7 2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
V2 is out!!! V2
Simple "Reflection" method dataset inspired by mattshumer
This is the prompt and response version. Find ShareGPT version here
This dataset was synthetically generated using Glaive AI.
Community Data License Agreement - Sharing, Version 1.0 (https://cdla.io/sharing-1-0/)
Comprehensive Mental Health Insights: A Diverse Dataset of 1000 Individuals Across Professions, Countries, and Lifestyles
This dataset provides a rich collection of anonymized mental health data for 1000 individuals, representing a wide range of ages, genders, occupations, and countries. It aims to shed light on the various factors affecting mental health, offering valuable insights into stress levels, sleep patterns, work-life balance, and physical activity.
Key Features: Demographics: The dataset includes individuals from various countries such as the USA, India, the UK, Canada, and Australia. Each entry captures key demographic information such as age, gender, and occupation (e.g., IT, Healthcare, Education, Engineering).
Mental Health Conditions: The dataset contains data on whether the individuals have reported any mental health issues (Yes/No), along with the severity of these conditions categorized into Low, Medium, or High.
Consultation History: For individuals with mental health conditions, the dataset notes whether they have consulted a mental health professional.
Stress Levels: Each individual’s stress level is classified as Low, Medium, or High, providing insights into how different factors such as work hours or sleep may correlate with mental well-being.
Lifestyle Factors: The dataset includes information on sleep duration, work hours per week, and weekly physical activity hours, offering a detailed picture of how lifestyle factors contribute to mental health.
This dataset can be used for research, analysis, or machine learning models to predict mental health trends, uncover correlations between work-life balance and mental well-being, and explore the impact of stress and physical activity on mental health.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and the data were standardized across features. The small number of samples prevented a full and robust statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
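The decision-tree settings above can be approximated outside Orange; this is a minimal scikit-learn sketch using a stand-in dataset, not the study's assay data. Note scikit-learn offers no gain-ratio criterion, so "entropy" (plain information gain) is used here as the closest available stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the 36-sample assay data

# Orange's reported settings mapped onto scikit-learn parameters.
clf = DecisionTreeClassifier(
    criterion="entropy",   # Orange used gain ratio; entropy is the nearest option
    min_samples_leaf=2,    # minimum number of samples in leaves
    min_samples_split=5,   # minimum samples to split an internal node
    random_state=0,
)

# Stratified cross-validation, as described for the supervised model.
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5))
print(scores.mean())
```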
This data set contains a number of variables collected on children and their parents who took part in the SMILE trial at assessment and follow-up. It does not include data on age and gender, as we want to be certain that no child or parent can be identified through the data. Researchers can apply to access a fuller data set (https://data.bris.ac.uk/data/dataset/1myzti8qnv48g2sxtx6h5nice7) containing age and gender through application to the University of Bristol's Data Access Committee; please refer to the data access request form (http://bit.ly/data-bris-request) for details on how to apply for access. Complete download (zip, 1.5 MiB)
This database, compiled by Matthews and Fung (1987), provides information on the distribution and environmental characteristics of natural wetlands. The database was developed to evaluate the role of wetlands in the annual emission of methane from terrestrial sources. The original data consists of five global 1-degree latitude by 1-degree longitude arrays. This subset, for the study area of the Large Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) in South America, retains all five arrays at the 1-degree resolution but only for the area of interest (i.e., longitude 85 deg to 30 deg W, latitude 25 deg S to 10 deg N). The arrays are (1) wetland data source, (2) wetland type, (3) fractional inundation, (4) vegetation type, and (5) soil type. The data subsets are in both ASCII GRID and binary image file formats.

The database is the result of the integration of three independent digital sources: (1) vegetation classified according to the United Nations Educational, Scientific and Cultural Organization (UNESCO) system (Matthews, 1983), (2) soil properties from the Food and Agriculture Organization (FAO) soil maps (Zobler, 1986), and (3) fractional inundation in each 1-degree cell compiled from a global map survey of Operational Navigation Charts (ONC). With vegetation, soil, and inundation characteristics of each wetland site identified, the database has been used for a coherent and systematic estimate of methane emissions from wetlands and for an analysis of the causes for uncertainties in the emission estimate.

The complete global database is available from NASA/GISS [http://www.giss.nasa.gov] and NCAR data set ds765.5 [http://www.ncar.ucar.edu]; the global vegetation types data are available from ORNL DAAC [http://www.daac.ornl.gov].
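The ASCII GRID files mentioned above follow the standard Arc/Info layout (six header lines, then cell values) and can be read with NumPy. The tiny inline grid below is a fabricated stand-in so the snippet is self-contained; the real arrays span 55 columns by 35 rows at 1-degree resolution for the 85-30 deg W, 25 deg S-10 deg N window.

```python
import io
import numpy as np

# Minimal Arc/Info-style ASCII GRID for a 2x3 cell window (illustrative values).
ascii_grid = """ncols 3
nrows 2
xllcorner -85.0
yllcorner -25.0
cellsize 1.0
NODATA_value -9999
1 2 3
4 5 6
"""

# Standard layout: skip the six header lines, then read the cell values.
grid = np.loadtxt(io.StringIO(ascii_grid), skiprows=6)
print(grid.shape)  # (2, 3)
```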
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains a preprocessed version of the publicly available MNIST handwritten digit dataset, formatted for use in the research paper "A fast dictionary-learning-based classification scheme using undercomplete dictionaries". The data has been converted into vector form and sorted into .mat files by class label, ranging from 0 to 9. The files are split into training and testing sets: X_train holds the vectorized images and Y_train the corresponding labels, while X_test and Y_test hold the images and labels for the testing set.

**Contents:**
- X_train_vector_sort_MNIST
- Y_train_MNIST
- X_test_vector_MNIST
- Y_test_MNIST

**Usage:**
The dataset is intended for direct use with the code available at: https://github.com/saeedmohseni97/fast-udl-classification
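A .mat file like those above can be read with SciPy. The snippet writes a synthetic stand-in first so it is self-contained; the variable names follow the dataset description, but the orientation (images as columns of a 784-row matrix) is an assumption, so check the shapes of the real files.

```python
import numpy as np
from scipy.io import loadmat, savemat

# Synthetic stand-in (the real files ship these variables already populated).
X_train = np.random.rand(784, 100)      # vectorized 28x28 images, one per column
Y_train = np.repeat(np.arange(10), 10)  # class labels 0-9
savemat("mnist_subset.mat", {"X_train": X_train, "Y_train": Y_train})

# Loading works the same way on the dataset's .mat files.
data = loadmat("mnist_subset.mat")
print(data["X_train"].shape)  # (784, 100)
```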
https://www.nist.gov/open/license
Atomic Bose-Einstein condensates (BECs) are widely investigated systems that exhibit quantum phenomena on a macroscopic scale. For example, they can be manipulated to contain solitonic excitations including conventional solitons, vortices, and many more. Broadly speaking, solitonic excitations are solitary waves that retain their size and shape and often propagate at a constant speed. They are present in many systems, at scales ranging from microscopic, to terrestrial and even astronomical. However, unlike naturally occurring physical systems, the parameters governing BECs are under strict experimental control. The enlarged Dark Solitons in BECs Dataset v.2.0 was created to enable the implementation of machine learning (ML) techniques to automate the analysis of data coming from cold atom experiments. It includes quantitative quality estimates for all longitudinal solitons as well as new fine-grained solitonic excitation categories for all detected excitations. It is freely available, giving the whole ML and physics community the opportunity to develop novel ML techniques for cold atom systems and to further explore the intersection of ML and quantum physics.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
## Overview
Smaller Data Set Auto Labelled is a dataset for object detection tasks - it contains Trees annotations for 472 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
First, credit to Raviiloveyou, who created the original Taxi trip fare predictor data set.
This version modifies the taxi set to include taxi fares from Philadelphia, PA.
The following cost calculations have been updated in the dataset to include all taxi fares:
First 1/10 mile (flag drop) or fraction thereof: $2.70
Each additional 1/10 mile or fraction thereof: $0.25
Each 37.6 seconds of wait time: $0.25
The speed of the taxis in KPH (kilometers per hour) has also been included.
Columns are the following: Trip Duration in second (part of the original data set)
Trip Duration in minutes
Trip Duration in Hours
Distance Traveled in Kilometers (part of the original data set)
KPH speed of the taxis in Kilometers per Hour
Wait Time Cost: $0.25 for each 37.6 seconds of wait time (taxi time spent getting the passenger to the location)
Distance Cost: Each additional 1/10 mile (.1 mile = 0.160934 KM) or fraction thereof: $0.25
Fare w Flag: the $2.70 starting (flag drop) cost plus Wait Time Cost and Distance Cost
TIP: how much money the taxi driver got for the trip (part of the original data set)
Miscellaneous fees: part of the original data set
Total Fare New: the total cost of the trip
Num of passengers: the number of passengers. Note there is no additional cost per passenger for Philadelphia, PA taxis.
surge applied: (part of the original data set)
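The fare columns above combine as Fare w Flag = flag drop + Distance Cost + Wait Time Cost. A minimal sketch of that calculation follows; treating "or fraction thereof" as rounding distance increments up, and billing only completed 37.6-second wait blocks, are assumptions about how the dataset was built.

```python
import math

FLAG_DROP = 2.70        # first 1/10 mile (flag drop) or fraction thereof
PER_TENTH_MILE = 0.25   # each additional 1/10 mile or fraction thereof
PER_WAIT_BLOCK = 0.25   # each 37.6 seconds of wait time
KM_PER_MILE = 1.609344

def fare_with_flag(distance_km: float, wait_seconds: float) -> float:
    """Fare w Flag = flag drop + Distance Cost + Wait Time Cost."""
    miles = distance_km / KM_PER_MILE
    # Tenths of a mile beyond the first, any fraction rounded up.
    extra_tenths = max(0, math.ceil(round((miles - 0.1) / 0.1, 9)))
    # Completed 37.6-second wait blocks (assumption: fractions not billed).
    wait_blocks = int(wait_seconds // 37.6)
    return FLAG_DROP + extra_tenths * PER_TENTH_MILE + wait_blocks * PER_WAIT_BLOCK

# A 1-mile (1.609 km) trip with 75.2 s of waiting:
print(round(fare_with_flag(1.609344, 75.2), 2))  # 5.45
```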
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The Global Retail Sales Data provided here is a self-generated synthetic dataset created using Random Sampling techniques provided by the Numpy Package. The dataset emulates information regarding merchandise sales through a retail website set up by a popular fictional influencer based in the US between the '23-'24 period. The influencer would sell clothing, ornaments and other products at variable rates through the retail website to all of their followers across the world. Imagine that the influencer executes high levels of promotions for the materials they sell, prompting more ratings and reviews from their followers, pushing more user engagement.
This dataset is intended to help with practicing Sentiment Analysis and/or Time Series Analysis of sales, etc., as these are very important topics for Data Analyst prospects. The column descriptions are given as follows:
Order ID: Serves as an identifier for each order made.
Order Date: The date when the order was made.
Product ID: Serves as an identifier for the product that was ordered.
Product Category: Category of Product sold (Clothing, Ornaments, Other).
Buyer Gender: Genders of people that have ordered from the website (Male, Female).
Buyer Age: Ages of the buyers.
Order Location: The city where the order was made from.
International Shipping: Whether the product was shipped internationally or not. (Yes/No)
Sales Price: Price tag for the product.
Shipping Charges: Extra charges for international shipments.
Sales per Unit: Sales cost while including international shipping charges.
Quantity: Quantity of the product bought.
Total Sales: Total sales made through the purchase.
Rating: User rating given for the order.
Review: User review given for the order.
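For the time-series practice suggested above, aggregating Total Sales by month is a typical starting point. The frame below is a tiny fabricated stand-in using the column names from the description.

```python
import pandas as pd

# Tiny illustrative frame with the dataset's column names.
df = pd.DataFrame({
    "Order Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03"]),
    "Product Category": ["Clothing", "Ornaments", "Clothing"],
    "Total Sales": [120.0, 80.0, 60.0],
})

# Monthly sales totals ("MS" = month-start bins).
monthly = df.set_index("Order Date")["Total Sales"].resample("MS").sum()
print(monthly.tolist())  # [200.0, 60.0]
```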
anzhiyu-c/image-data-set dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: - We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
https://brightdata.com/license
The Booking Hotel Listings Dataset provides a structured and in-depth view of accommodations worldwide, offering essential data for travel industry professionals, market analysts, and businesses. This dataset includes key details such as hotel names, locations, star ratings, pricing, availability, room configurations, amenities, guest reviews, sustainability features, and cancellation policies.
With this dataset, users can:
Analyze market trends to understand booking behaviors, pricing dynamics, and seasonal demand.
Enhance travel recommendations by identifying top-rated hotels based on reviews, location, and amenities.
Optimize pricing and revenue strategies by benchmarking property performance and availability patterns.
Assess guest satisfaction through sentiment analysis of ratings and reviews.
Evaluate sustainability efforts by examining eco-friendly features and certifications.
Designed for hospitality businesses, travel platforms, AI-powered recommendation engines, and pricing strategists, this dataset enables data-driven decision-making to improve customer experience and business performance.
Use Cases
Booking Hotel Listings in Greece
Gain insights into Greece’s diverse hospitality landscape, from luxury resorts in Santorini to boutique hotels in Athens. Analyze review scores, availability trends, and traveler preferences to refine booking strategies.
Booking Hotel Listings in Croatia
Explore hotel data across Croatia’s coastal and inland destinations, ideal for travel planners targeting visitors to Dubrovnik, Split, and Plitvice Lakes. This dataset includes review scores, pricing, and sustainability features.
Booking Hotel Listings with Review Scores Greater Than 9
A curated selection of high-rated hotels worldwide, ideal for luxury travel planners and market researchers focused on premium accommodations that consistently exceed guest expectations.
Booking Hotel Listings in France with More Than 1000 Reviews
Analyze well-established and highly reviewed hotels across France, ensuring reliable guest feedback for market insights and customer satisfaction benchmarking.
This dataset serves as an indispensable resource for travel analysts, hospitality businesses, and data-driven decision-makers, providing the intelligence needed to stay competitive in the ever-evolving travel industry.
This data set contains the Coordinated Energy and Water Cycle Observation Project (CEOP) Enhanced Observing Periods 3 and 4 (EOP-3 and EOP-4) CEOP Asia-Australia Monsoon Project (CAMP) Northeast Thailand Hourly Surface Data Set. This dataset contains the complete EOP-3 and EOP-4 time periods (i.e., 1 October 2002 through 31 December 2004). This data set contains both ASCII data and netCDF data. The ASCII data file covers the entire time period for all stations. The netCDF data covers the entire time period with one netCDF file for each station.
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains three healthcare datasets in Hindi and Punjabi, translated from English. The datasets cover medical diagnoses, disease names, and related healthcare information. The data has been carefully cleaned and formatted to ensure accuracy and usability for various applications, including machine learning, NLP, and healthcare analysis.
Diagnosis: Description of the medical condition or disease.
Symptoms: List of symptoms associated with the diagnosis.
Treatment: Common treatments or recommended procedures.
Severity: Severity level of the disease (e.g., mild, moderate, severe).
Risk Factors: Known risk factors associated with the condition.
Language: Specifies the language of the dataset (Hindi, Punjabi, or English).
The purpose of these datasets is to facilitate research and development in regional language processing, especially in the healthcare sector.
Column Descriptions:
Original Data Columns:
patient_id – Unique identifier for each patient.
age – Age of the patient.
gender – Gender of the patient (e.g., Male/Female/Other).
Diagnosis – The diagnosed medical condition or disease.
Remarks – Additional notes or comments from the doctor.
doctor_id – Unique identifier for the doctor treating the patient.
Patient History – Medical history of the patient, including previous conditions.
age_group – Categorized age group (e.g., Child, Adult, Senior).
gender_numeric – Numeric encoding for gender (e.g., 0 = Female, 1 = Male).
symptoms – List of symptoms reported by the patient.
treatment – Recommended treatment or medication.
timespan – Duration of the illness or treatment period.
Diagnosis Category – General category of the diagnosis (e.g., Cardiovascular, Neurological).
Pseudonymized Data Columns: These columns replace personally identifiable information with anonymized versions for privacy compliance:
Pseudonymized_patient_id – An anonymized patient identifier.
Pseudonymized_age – Anonymized age value.
Pseudonymized_gender – Anonymized gender field.
Pseudonymized_Diagnosis – Diagnosis field with anonymized identifiers.
Pseudonymized_Remarks – Anonymized doctor notes.
Pseudonymized_doctor_id – Anonymized doctor identifier.
Pseudonymized_Patient History – Anonymized version of patient history.
Pseudonymized_age_group – Anonymized version of age groups.
Pseudonymized_gender_numeric – Anonymized numeric encoding of gender.
Pseudonymized_symptoms – Anonymized symptom descriptions.
Pseudonymized_treatment – Anonymized treatment descriptions.
Pseudonymized_timespan – Anonymized illness/treatment duration.
Pseudonymized_Diagnosis Category – Anonymized category of diagnosis.
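Pseudonymization of identifiers like patient_id and doctor_id is often done with a keyed hash, so the mapping is consistent across records but not reversible without the key. This is a generic illustrative sketch, not the method actually used to build this dataset; the key and truncation length are assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-private-key"  # hypothetical key, never shipped with the data

def pseudonymize(value: str) -> str:
    """Deterministic keyed hash: same input -> same pseudonym."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

record = {"patient_id": "P-1042", "doctor_id": "D-07"}
pseudo = {f"Pseudonymized_{k}": pseudonymize(v) for k, v in record.items()}
print(sorted(pseudo))  # ['Pseudonymized_doctor_id', 'Pseudonymized_patient_id']
```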
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
This dataset complements the following study: Code problem similarity detection using code clones and pretrained models (SCSE22-0384). This study explores a new approach to detecting similar algorithmic-style code problems from websites such as LeetCode and Codeforces by comparing the similarity of the solution source codes, an application of type IV code clone detection. It is based on 107,000 submissions in 3 different languages (Python, C++ and Java) from 3,000 problems on Codeforces between 2020 and 2023. Experiments were carried out using 3 different pre-trained models on this dataset (C4-CodeBERT, GraphCodeBERT, UniXcoder). UniXcoder performed the best with an F1 score of 0.905. As such, UniXcoder was used as the backbone of the code problem similarity checker (CPSC), which identifies the problems (out of all the problems in the dataset) most similar to an input source code. Based on the tests conducted in this project, this approach achieves state-of-the-art results when it comes to detecting similarity between various code problems. More research can be done in domains where type IV code clone detection can be useful.