Metadata for the OpenFEMA API data set fields. It contains descriptions, data types, and other attributes for each field.

If you have media inquiries about this dataset please email the FEMA News Desk FEMA-News-Desk@dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open government program please contact the OpenFEMA team via email OpenFEMA@fema.dhs.gov.

https://creativecommons.org/publicdomain/zero/1.0/
All publishing, licensing, etc. credit goes to the CDC. Thank you CDC for maintaining public health datasets.
The directory contains over 2,000 CSV files that are publicly available as of 1/28/2025.
The datasets were released by the CDC. You can find the original datasets at data.cdc.gov.
Files downloaded from archive.org.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of OR tables between the interaction of rs7522462 and rs11945978 in the WTCCC data with the shared controls (left) and the interaction of the proxy SNPs, rs296533 and rs2089509 in the IBDGC data (right). The legend to this table is the same as that of Table 3.

Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset, titled "Anabolic Steroids", provides a meticulously curated compilation of nearly 50 steroids. It includes detailed information on their original names, common names, medicinal applications, abuse potential, side effects, historical context, and relative molecular mass (RMM). The dataset aims to serve as a resource for exploring the dual nature of anabolic steroids—both their therapeutic benefits and their misuse in sports and bodybuilding.
Anabolic steroids are synthetic derivatives of testosterone that have been used for decades in medicine to treat conditions like anemia, muscle-wasting diseases, and hormone deficiencies. However, they are also widely abused for performance enhancement and aesthetic purposes. This dataset captures a comprehensive view of these compounds, making it valuable for researchers, educators, and data enthusiasts.
While this dataset is relatively small (approx 50 entries), it offers rich opportunities for exploratory analysis and domain-specific insights. Potential applications include:
Exploratory Data Analysis (EDA):
Domain-Specific Insights:
Educational Use:
This dataset has been ethically compiled from publicly available sources such as scientific journals, chemical databases, and educational websites. No proprietary or confidential information has been included. The data was aggregated to ensure accuracy and relevance while respecting intellectual property rights.
The following sources were instrumental in compiling this dataset: 1. PubChem Database – For verifying chemical properties and molecular mass values. 2. Wikipedia – For historical context and general information on anabolic steroids. 3. NIST Chemistry WebBook – For accurate molecular mass values and chemical details. 4. Scientific Journals – Referenced for medicinal uses, side effects documentation, and abuse patterns. 5. DALL·E 3 by OpenAI – Used to generate illustrative images related to anabolic steroids to complement dataset visualizations.
The misuse of anabolic steroids poses significant health risks and ethical concerns. While anabolic steroids have legitimate medical applications, their abuse for performance enhancement or aesthetic purposes can lead to severe physical and psychological side effects. Common adverse effects include liver damage, cardiovascular strain, hormonal imbalances, infertility, aggression, and mental health issues such as depression. Prolonged misuse can also result in irreversible damage to vital organs and an increased risk of life-threatening conditions like heart attacks or strokes. Beyond individual health risks, steroid abuse undermines the integrity of sports and creates unfair advantages in competitive environments. It is crucial to prioritize natural methods of achieving fitness goals and seek professional guidance for any medical conditions requiring treatment.
This dataset is not intended for machine learning due to its small size but serves as an excellent resource for exploratory data analysis (EDA), visualization projects, and domain-specific research into anabolic steroids' pharmacology and societal impact.

https://choosealicense.com/licenses/cdla-permissive-2.0/
This dataset is Version 2.0 of microsoft/FStarDataSet.
Primary-Objective
This dataset's primary objective is to train and evaluate Proof-oriented Programming with AI (PoPAI, in short). Given a specification of a program and proof in F*, the objective of an AI model is to synthesize the implementation (see below for details about the usage of this dataset, including the input and output).
Data Format
Each example in this dataset is organized as a dictionary… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/FStarDataSet-V2.

Dataset Card for "AI-Generated-vs-Real-Images-Datasets"
More Information needed

the Department of Energy’s Enterprise Project Management Organization (EPMO), providing leadership and assistance in developing and implementing DOE-wide policies, procedures, programs, and management systems pertaining to project management, and independently monitors, assesses, and reports on project execution performance. The office validates project performance baselines (scope, cost, and schedule) of the Department’s largest construction and environmental clean-up projects prior to budget request to Congress, an active project portfolio totaling over $30 billion. The office also serves as Executive Secretariat for the Department’s Energy Systems Acquisition Advisory Board (ESAAB) and the Project Management Risk Committee (PMRC). In these capacities, the Director is accountable to the Deputy Secretary.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this data set, six objects (two targets and four non-targets) lie on a sandy sea bottom. The transmitted signal is a Wide-Band Linear Frequency Modulated (WLFM) pulse covering the frequency range 5-110 kHz. The targets lying on the bottom are rotated through 180 degrees with 1-degree accuracy by an electric motor. Backscattered echoes are accumulated at ranges of up to 10 meters from the target. A high-quality dataset plays a key role in sonar target classification. Given the large volume of raw data obtained from the previous stage, a heavy computational load is to be expected. To reduce the computational burden of feature extraction and classification, it is essential to detect the targets within the total received data. The intensity of the received signal is used for this purpose. Because the region is shallow, multi-path propagation, secondary reflections, and reverberation must be taken into account. A matched filter is used to suppress these artifacts after the detection stage and before feature extraction.
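As a rough illustration of the matched-filter step, the sketch below cross-correlates a noisy recording with a linear FM pulse sweeping 5-110 kHz and takes the strongest response as the echo delay. It is not the processing chain used for this dataset: the sample rate, pulse length, noise level, and echo delay are hypothetical values chosen for a quick demonstration.

```python
import math
import random


def lfm_chirp(f0, f1, duration, fs):
    """Real-valued linear FM pulse sweeping f0 -> f1 Hz over `duration` seconds."""
    n = round(duration * fs)
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return [math.cos(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def matched_filter(received, template):
    """Cross-correlate the received samples with the known transmitted pulse."""
    m = len(template)
    return [sum(received[lag + j] * template[j] for j in range(m))
            for lag in range(len(received) - m + 1)]


# Hypothetical demo parameters, not taken from the experiment.
FS = 400_000                                    # sample rate in Hz (> 2 x 110 kHz)
pulse = lfm_chirp(5_000, 110_000, 0.0005, FS)   # 0.5 ms pulse, 200 samples

rng = random.Random(0)
TRUE_DELAY = 300                                # echo onset in samples (made up)
received = [rng.gauss(0.0, 0.5) for _ in range(1000)]   # background noise
for j, s in enumerate(pulse):
    received[TRUE_DELAY + j] += s               # bury a single echo in the noise

output = matched_filter(received, pulse)
estimated_delay = max(range(len(output)), key=lambda i: abs(output[i]))
print("estimated echo onset (samples):", estimated_delay)
```

The filter output peaks where the recording best matches the transmitted pulse, which is why delayed copies from multipath or reverberation show up as separate, weaker peaks that can be gated out.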

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The current version of the TOD (Task Oriented Dialogues) fused dataset contains samples from MultiWOZ2.2 (Zang et al., 2020), SpokenWOZ (Si et al., 2023), FRAMES (Asri et al., 2017), DSTC3 (Henderson et al., 2014a) and SGD (Rastogi et al., 2020) datasets. These datasets have been selected due to them all being high quality, with significant human validation and data cleaning. Additionally, this selection of datasets provides coverage across unique attributes, such as utterance-level audio files (Si et al., 2023).
The fused dataset covers several domains, as necessitated by the scope of the ELOQUENCE project (https://eloquenceai.eu) and the individual pilots. These datasets are stored using the ‘.arrow’ file extension so that the speed and efficiency of data loading are optimised, and so that they are compatible with the popular HuggingFace dataset library (HuggingFace, 2024). The dataset is also available at https://huggingface.co/datasets/Brunel-AI/ELOQUENCE. Currently, several datasets have been implemented within this fused dataset. However, due to the flexibility with which the schema has been defined, there is scope for additional datasets to be implemented across later iterations as further needs are identified. The JSON schema, as well as further explanation for attributes across all domains, is provided within Appendix 10.1 in ELOQUENCE deliverable 1.1.

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Global Retail Sales Data provided here is a self-generated synthetic dataset created using random sampling techniques provided by the NumPy package. The dataset emulates merchandise sales through a retail website set up by a popular fictional influencer based in the US during the 2023-2024 period. The influencer sells clothing, ornaments, and other products at variable rates through the retail website to followers across the world. Imagine that the influencer promotes these products heavily, prompting more ratings and reviews from followers and driving more user engagement.
This dataset is intended for practicing Sentiment Analysis and/or Time Series Analysis of sales, which are important topics for prospective Data Analysts. The column descriptions are as follows:
Order ID: Serves as an identifier for each order made.
Order Date: The date when the order was made.
Product ID: Serves as an identifier for the product that was ordered.
Product Category: Category of the product sold (Clothing, Ornaments, Other).
Buyer Gender: Genders of people that have ordered from the website (Male, Female).
Buyer Age: Ages of the buyers.
Order Location: The city where the order was made from.
International Shipping: Whether the product was shipped internationally or not. (Yes/No)
Sales Price: Price tag for the product.
Shipping Charges: Extra charges for international shipments.
Sales per Unit: Sales cost while including international shipping charges.
Quantity: Quantity of the product bought.
Total Sales: Total sales made through the purchase.
Rating: User rating given for the order.
Review: User review given for the order.
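The column list above can be sketched as a small row generator. This is only an illustration of how such a table might be produced by random sampling: it uses Python's standard `random` module rather than NumPy, and the city list, price and fee ranges, 1-5 rating scale, ID format, and review snippets are all hypothetical. It also assumes that Sales per Unit is Sales Price plus Shipping Charges and that Total Sales is Sales per Unit times Quantity, which is one reading of the descriptions.

```python
import random
from datetime import date, timedelta

rng = random.Random(42)  # fixed seed so repeated runs give the same rows

CATEGORIES = ["Clothing", "Ornaments", "Other"]
GENDERS = ["Male", "Female"]
# Hypothetical cities; True marks an order shipped outside the US.
CITIES = {"New York": False, "Chicago": False, "London": True, "Mumbai": True}
# Hypothetical review snippets keyed by rating band.
REVIEWS = {"low": "Not what I expected.", "high": "Love it!"}


def make_order(order_id):
    """Build one synthetic order using the column names from the list above."""
    city = rng.choice(sorted(CITIES))
    international = CITIES[city]
    price = rng.randint(10, 200)                                  # assumed range
    shipping = rng.choice([25, 40, 50]) if international else 0   # assumed fees
    quantity = rng.randint(1, 5)                                  # assumed range
    rating = rng.randint(1, 5)                                    # assumed scale
    per_unit = price + shipping                                   # assumed formula
    return {
        "Order ID": order_id,
        "Order Date": date(2023, 1, 1) + timedelta(days=rng.randrange(731)),
        "Product ID": f"PID{rng.randint(1000, 9999)}",            # made-up format
        "Product Category": rng.choice(CATEGORIES),
        "Buyer Gender": rng.choice(GENDERS),
        "Buyer Age": rng.randint(18, 65),
        "Order Location": city,
        "International Shipping": "Yes" if international else "No",
        "Sales Price": price,
        "Shipping Charges": shipping,
        "Sales per Unit": per_unit,
        "Quantity": quantity,
        "Total Sales": per_unit * quantity,                       # assumed formula
        "Rating": rating,
        "Review": REVIEWS["high" if rating >= 4 else "low"],
    }


orders = [make_order(i) for i in range(1, 6)]
for order in orders:
    print(order["Order ID"], order["Order Location"], order["Total Sales"])
```

Keeping the derived columns as explicit formulas makes it easy to check a downloaded copy of the data against the same relationships.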

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Smaller Data Set Auto Labelled is a dataset for object detection tasks - it contains Trees annotations for 472 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.
Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy.
Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022.
Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup.
Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024
2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024
3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024
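Because coup_id values such as 00201062021 begin with zeros, they are safest handled as text when matching events against the source document. The sketch below shows one way to do that with Python's `csv` module. The in-memory sample stands in for the real CSV file: only the `coup_id` column name and the ID 00201062021 come from the description above, while the second ID and the `example_field` column are made up.

```python
import csv
import io

# Made-up stand-in for the coup data CSV. Only the coup_id column name and the
# ID 00201062021 come from the text; everything else here is hypothetical.
sample_csv = io.StringIO(
    "coup_id,example_field\n"
    "00201062021,a\n"
    "12345678901,b\n"
)

# csv.DictReader returns every value as a string, so leading zeros survive.
rows = list(csv.DictReader(sample_csv))
by_id = {row["coup_id"]: row for row in rows}

print("00201062021" in by_id)   # True: the string key keeps its leading zeros
print(int("00201062021"))       # 201062021: numeric parsing drops them
```

Tools that infer column types may convert such IDs to integers by default, so it is worth checking how the column was read before joining on it.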
Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7

anzhiyu-c/image-data-set dataset hosted on Hugging Face and contributed by the HF Datasets community

The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data retrieval needs: facility data from the Integrated Compliance Information System for Clean Water Act permitted dischargers, under the National Pollutant Discharge Elimination System (NPDES).

https://cdla.io/sharing-1-0/
Comprehensive Mental Health Insights: A Diverse Dataset of 1000 Individuals Across Professions, Countries, and Lifestyles
This dataset provides a rich collection of anonymized mental health data for 1000 individuals, representing a wide range of ages, genders, occupations, and countries. It aims to shed light on the various factors affecting mental health, offering valuable insights into stress levels, sleep patterns, work-life balance, and physical activity.
Key Features:
Demographics: The dataset includes individuals from various countries such as the USA, India, the UK, Canada, and Australia. Each entry captures key demographic information such as age, gender, and occupation (e.g., IT, Healthcare, Education, Engineering).
Mental Health Conditions: The dataset contains data on whether the individuals have reported any mental health issues (Yes/No), along with the severity of these conditions categorized into Low, Medium, or High.
Consultation History: For individuals with mental health conditions, the dataset notes whether they have consulted a mental health professional.
Stress Levels: Each individual’s stress level is classified as Low, Medium, or High, providing insights into how different factors such as work hours or sleep may correlate with mental well-being.
Lifestyle Factors: The dataset includes information on sleep duration, work hours per week, and weekly physical activity hours, offering a detailed picture of how lifestyle factors contribute to mental health.
This dataset can be used for research, analysis, or machine learning models to predict mental health trends, uncover correlations between work-life balance and mental well-being, and explore the impact of stress and physical activity on mental health.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM for 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
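To make the gain ratio criterion concrete, the following sketch computes it for a discrete split over the 9-class, 4-samples-per-class target described above. It is a plain textbook formulation (information gain divided by the entropy of the split itself), not Orange's implementation, and the two example splits are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_bins, labels):
    """Information gain of a discrete split divided by the split's own entropy."""
    n = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_bins)
    return info_gain / split_info if split_info > 0 else 0.0


# The 9 classes with 4 samples each, as described in the text.
classes = (["Control"]
           + [f"6-OHDA {c} uM" for c in (6.25, 12.5, 25, 50)]
           + [f"rotenone {c} uM" for c in (0.03, 0.06, 0.125, 0.25)])
labels = [c for c in classes for _ in range(4)]

# Hypothetical split that isolates the controls from all treated samples.
control_split = ["low" if y == "Control" else "high" for y in labels]
# Hypothetical split that carries no class information (2 of each class per bin).
uninformative_split = ["a", "a", "b", "b"] * 9

print(round(gain_ratio(control_split, labels), 6))
print(round(gain_ratio(uninformative_split, labels), 6))
```

A split that is fully determined by the class label scores the maximum ratio, while one that divides every class evenly scores zero; dividing by the split entropy is what keeps many-valued splits from being favoured automatically.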

This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: - We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:

The table concept_relationship is part of the dataset MedAlign, available at https://stanford.redivis.com/datasets/48nr-frxd97exb. It contains 58831134 rows across 8 variables.

The National Flood Hazard Layer (NFHL) data incorporates all Digital Flood Insurance Rate Map (DFIRM) databases published by FEMA, and any Letters of Map Revision (LOMRs) that have been issued against those databases since their publication date. The DFIRM Database is the digital, geospatial version of the flood hazard information shown on the published paper Flood Insurance Rate Maps (FIRMs). The primary risk classifications used are the 1-percent-annual-chance flood event, the 0.2-percent-annual-chance flood event, and areas of minimal flood risk. The NFHL data are derived from Flood Insurance Studies (FISs), previously published Flood Insurance Rate Maps (FIRMs), flood hazard analyses performed in support of the FISs and FIRMs, and new mapping data where available. The FISs and FIRMs are published by the Federal Emergency Management Agency (FEMA). The specifications for the horizontal control of DFIRM data are consistent with those required for mapping at a scale of 1:12,000. The NFHL data contain layers in the Standard DFIRM datasets except for S_Label_Pt and S_Label_Ld. The NFHL is available as State or US Territory data sets. Each State or Territory data set consists of all DFIRMs and corresponding LOMRs available on the publication date of the data set.

The Cassini Radio Science Enceladus Gravity Science Experiment (ENGR8) Raw Data Archive is a time-ordered collection of radio science raw data acquired on May 1, 2, and 3, 2012, during the Cassini Extended Mission.