100+ datasets found
  1. Comparison of OR tables between two datasets for one CD interaction.

    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Yang Liu; Haiming Xu; Suchao Chen; Xianfeng Chen; Zhenguo Zhang; Zhihong Zhu; Xueying Qin; Landian Hu; Jun Zhu; Guo-Ping Zhao; Xiangyin Kong (2023). Comparison of OR tables between two datasets for one CD interaction. [Dataset]. http://doi.org/10.1371/journal.pgen.1001338.t005
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS Genetics
    Authors
    Yang Liu; Haiming Xu; Suchao Chen; Xianfeng Chen; Zhenguo Zhang; Zhihong Zhu; Xueying Qin; Landian Hu; Jun Zhu; Guo-Ping Zhao; Xiangyin Kong
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of odds ratio (OR) tables between the interaction of rs7522462 and rs11945978 in the WTCCC data with the shared controls (left) and the interaction of the proxy SNPs rs296533 and rs2089509 in the IBDGC data (right). The legend to this table is the same as that of Table 3.
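
    In a case-control setting, an OR table for a two-SNP interaction usually gives, for each combination of genotypes at the two loci, the odds of being a case relative to a reference combination. The legend of Table 3 is not reproduced here, so the following is only a minimal sketch under assumed conventions: genotypes are coded as 0, 1 or 2 copies of the minor allele, the reference cell is (0, 0), and the 3x3 case and control count tables passed to the hypothetical `odds_ratio_table` function are inputs you supply.

```python
from typing import List, Optional, Tuple

Counts = List[List[int]]


def odds_ratio_table(case_counts: Counts, control_counts: Counts,
                     ref: Tuple[int, int] = (0, 0)) -> List[List[Optional[float]]]:
    """Odds ratio for each 3x3 genotype combination relative to cell `ref`.

    case_counts[i][j] and control_counts[i][j] are the numbers of cases and
    controls carrying i copies of the minor allele at SNP 1 and j copies at
    SNP 2 (coding assumed, not taken from the source table).
    """
    ref_cases = case_counts[ref[0]][ref[1]]
    ref_controls = control_counts[ref[0]][ref[1]]
    table: List[List[Optional[float]]] = []
    for i in range(3):
        row: List[Optional[float]] = []
        for j in range(3):
            cases, controls = case_counts[i][j], control_counts[i][j]
            if 0 in (cases, controls, ref_cases, ref_controls):
                row.append(None)  # undefined without a continuity correction
            else:
                # (cases / controls) / (ref_cases / ref_controls)
                row.append((cases * ref_controls) / (controls * ref_cases))
        table.append(row)
    return table
```

    Comparing the left (WTCCC) and right (IBDGC) tables then amounts to comparing corresponding cells. Zero cells are returned as None rather than patched with a continuity correction.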

  2. Anabolic Steroids Dataset

    • kaggle.com
    zip
    Updated Dec 23, 2024
    Cite
    Kanchana1990 (2024). Anabolic Steroids Dataset [Dataset]. https://www.kaggle.com/datasets/kanchana1990/anabolic-steroids-dataset
    Available download formats: zip (2,487 bytes)
    Dataset updated
    Dec 23, 2024
    Authors
    Kanchana1990
    License

    Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Dataset Overview

    This dataset, titled "Anabolic Steroids", provides a meticulously curated compilation of nearly 50 steroids. It includes detailed information on their original names, common names, medicinal applications, abuse potential, side effects, historical context, and relative molecular mass (RMM). The dataset aims to serve as a resource for exploring the dual nature of anabolic steroids: both their therapeutic benefits and their misuse in sports and bodybuilding.

    Anabolic steroids are synthetic derivatives of testosterone that have been used for decades in medicine to treat conditions like anemia, muscle-wasting diseases, and hormone deficiencies. However, they are also widely abused for performance enhancement and aesthetic purposes. This dataset captures a comprehensive view of these compounds, making it valuable for researchers, educators, and data enthusiasts.

    Data Science Applications

    While this dataset is relatively small (approximately 50 entries), it offers rich opportunities for exploratory analysis and domain-specific insights. Potential applications include:

    • Exploratory Data Analysis (EDA):

      • Analyze trends in medicinal vs. non-medicinal use.
      • Study correlations between molecular mass and reported side effects.
      • Visualize the historical development of anabolic steroids over time.
    • Domain-Specific Insights:

      • Examine the evolution of steroid formulations from the 1930s to the present.
      • Investigate patterns in therapeutic uses versus abuse potential.
    • Educational Use:

      • Serve as a teaching tool for understanding data cleaning, visualization, and analysis.
      • Provide insights into the pharmacological and chemical properties of anabolic steroids.

    Column Descriptors

    1. Original Name: The scientific or chemical name of the steroid compound (e.g., Testosterone).
    2. Common Name: The popular or brand name under which the steroid is marketed (e.g., Testoviron).
    3. Medicinal Use: Approved therapeutic applications of the steroid (e.g., treating anemia or hormone replacement therapy).
    4. Abused For: Non-medical uses often associated with performance enhancement or bodybuilding (e.g., bulking cycles, lean muscle retention).
    5. Side Effects: Documented adverse effects resulting from steroid use or abuse (e.g., liver toxicity, gynecomastia).
    6. History: A brief historical context about the steroid's development or usage (e.g., year introduced, medical approval status).
    7. Relative Molecular Mass (g/mol): The molar mass of the steroid compound, useful for chemical analysis.
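
    As an illustration of the correlation analysis suggested above, the sketch below pairs each compound's molecular mass with the number of side effects listed for it and computes a Pearson correlation using only the Python standard library. The header names come from the column descriptors, but their exact spelling in the downloaded file, and the assumption that side effects are comma-separated within one cell, are unverified. Counting listed effects is also only a crude proxy for severity.

```python
import csv
import io
import math
from typing import List, Tuple

# Header names taken from the column descriptors; exact spelling in the file is assumed.
MASS_COL = "Relative Molecular Mass (g/mol)"
EFFECTS_COL = "Side Effects"


def mass_and_effect_counts(csv_text: str) -> List[Tuple[float, int]]:
    """Pair each compound's molecular mass with its number of listed side effects.

    Assumes side effects are comma-separated within one cell; rows with a
    missing or non-numeric mass are skipped.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            mass = float(row[MASS_COL])
        except (KeyError, TypeError, ValueError):
            continue
        effects = [e for e in (row.get(EFFECTS_COL) or "").split(",") if e.strip()]
        pairs.append((mass, len(effects)))
    return pairs


def pearson(pairs: List[Tuple[float, int]]) -> float:
    """Pearson correlation coefficient between the two members of each pair."""
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

    With roughly 50 rows, treat any coefficient as exploratory. To use it on the real file, read the CSV into a string and pass it to `mass_and_effect_counts`, then pass the result to `pearson`.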

    Ethically Mined Data

    This dataset has been ethically compiled from publicly available sources such as scientific journals, chemical databases, and educational websites. No proprietary or confidential information has been included. The data was aggregated to ensure accuracy and relevance while respecting intellectual property rights.

    Acknowledgements

    The following sources were instrumental in compiling this dataset:

    1. PubChem Database – For verifying chemical properties and molecular mass values.
    2. Wikipedia – For historical context and general information on anabolic steroids.
    3. NIST Chemistry WebBook – For accurate molecular mass values and chemical details.
    4. Scientific Journals – Referenced for medicinal uses, side effects documentation, and abuse patterns.
    5. DALL·E 3 by OpenAI – Used to generate illustrative images related to anabolic steroids to complement dataset visualizations.

    Discouraging Steroid Usage and Highlighting Harms

    The misuse of anabolic steroids poses significant health risks and ethical concerns. While anabolic steroids have legitimate medical applications, their abuse for performance enhancement or aesthetic purposes can lead to severe physical and psychological side effects. Common adverse effects include liver damage, cardiovascular strain, hormonal imbalances, infertility, aggression, and mental health issues such as depression. Prolonged misuse can also result in irreversible damage to vital organs and an increased risk of life-threatening conditions like heart attacks or strokes. Beyond individual health risks, steroid abuse undermines the integrity of sports and creates unfair advantages in competitive environments. It is crucial to prioritize natural methods of achieving fitness goals and seek professional guidance for any medical conditions requiring treatment.

    Notes for Kaggle Users

    This dataset is not intended for machine learning due to its small size but serves as an excellent resource for exploratory data analysis (EDA), visualization projects, and domain-specific research into anabolic steroids' pharmacology and societal impact.

  3. Hydrographic and Impairment Statistics Database: THRB

    • catalog.data.gov
    • datasets.ai
    Updated Nov 25, 2025
    Cite
    National Park Service (2025). Hydrographic and Impairment Statistics Database: THRB [Dataset]. https://catalog.data.gov/dataset/hydrographic-and-impairment-statistics-database-thrb
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    National Park Service (http://www.nps.gov/)
    Description

    Hydrographic and Impairment Statistics (HIS) is a National Park Service (NPS) Water Resources Division (WRD) project established to track certain goals created in response to the Government Performance and Results Act of 1993 (GPRA). One water resources management goal established by the Department of the Interior (DOI) under GPRA requires NPS to track the percent of its managed surface waters that are meeting Clean Water Act (CWA) water quality standards. This goal requires an accurate inventory that spatially quantifies the surface water hydrography that each bureau manages and a procedure to determine and track which waterbodies are or are not meeting water quality standards as outlined by Section 303(d) of the CWA. This project helps meet this DOI GPRA goal by inventorying and monitoring in a geographic information system for the NPS: (1) CWA 303(d) quality impaired waters and causes; and (2) hydrographic statistics based on the United States Geological Survey (USGS) National Hydrography Dataset (NHD). Hydrographic and 303(d) impairment statistics were evaluated based on a combination of 1:24,000 (NHD) and finer scale data (frequently provided by state GIS layers).
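
    The goal tracked here reduces to a simple ratio: the share of managed surface water, by length for streams or by area for water bodies, that is not listed as impaired. A minimal sketch follows. The `Waterbody` record and the rule that "not on the 303(d) list" means "meeting standards" are simplifying assumptions, not the HIS methodology, which may treat unassessed waters differently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Waterbody:
    """Hypothetical record for one NPS-managed hydrography feature."""
    name: str
    size: float     # stream length (e.g. km) or water-body area (e.g. km^2)
    impaired: bool  # True if listed under CWA Section 303(d)


def percent_meeting_standards(features: Iterable[Waterbody]) -> float:
    """Percent of total managed size that is not listed as impaired."""
    items = list(features)
    total = sum(f.size for f in items)
    if total <= 0:
        raise ValueError("no managed surface water to evaluate")
    meeting = sum(f.size for f in items if not f.impaired)
    return 100.0 * meeting / total
```

    Stream lengths and water-body areas should be summarized in separate calls, since the two units cannot be added together.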

  4. Mental Health Dataset

    • kaggle.com
    zip
    Updated Oct 22, 2024
    Cite
    Bhadra Mohit (2024). Mental Health Dataset [Dataset]. https://www.kaggle.com/datasets/bhadramohit/mental-health-dataset
    Available download formats: zip (13,276 bytes)
    Dataset updated
    Oct 22, 2024
    Authors
    Bhadra Mohit
    License

    CDLA-Sharing-1.0, https://cdla.io/sharing-1-0/

    Description

    Comprehensive Mental Health Insights: A Diverse Dataset of 1000 Individuals Across Professions, Countries, and Lifestyles

    This dataset provides a rich collection of anonymized mental health data for 1000 individuals, representing a wide range of ages, genders, occupations, and countries. It aims to shed light on the various factors affecting mental health, offering valuable insights into stress levels, sleep patterns, work-life balance, and physical activity.

    Key Features:

    Demographics: The dataset includes individuals from various countries such as the USA, India, the UK, Canada, and Australia. Each entry captures key demographic information such as age, gender, and occupation (e.g., IT, Healthcare, Education, Engineering).

    Mental Health Conditions: The dataset contains data on whether the individuals have reported any mental health issues (Yes/No), along with the severity of these conditions categorized into Low, Medium, or High.

    Consultation History: For individuals with mental health conditions, the dataset notes whether they have consulted a mental health professional.

    Stress Levels: Each individual’s stress level is classified as Low, Medium, or High, providing insights into how different factors such as work hours or sleep may correlate with mental well-being.

    Lifestyle Factors: The dataset includes information on sleep duration, work hours per week, and weekly physical activity hours, offering a detailed picture of how lifestyle factors contribute to mental health.

    This dataset can be used for research, analysis, or machine learning models to predict mental health trends, uncover correlations between work-life balance and mental well-being, and explore the impact of stress and physical activity on mental health.
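
    A typical first step toward the correlations mentioned above is a grouped summary, for example the average sleep duration per stress level. The column names `Stress_Level` and `Sleep_Hours` below are hypothetical placeholders; check them against the header row of the actual CSV before use.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

# Hypothetical column names; confirm against the CSV header row.
STRESS_COL = "Stress_Level"
SLEEP_COL = "Sleep_Hours"


def mean_sleep_by_stress(csv_text: str) -> Dict[str, float]:
    """Average sleep duration (hours) for each stress level."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        level = (row.get(STRESS_COL) or "").strip()
        raw_hours = (row.get(SLEEP_COL) or "").strip()
        if not level:
            continue  # skip rows without a stress level
        try:
            hours = float(raw_hours)
        except ValueError:
            continue  # skip missing or malformed sleep values
        totals[level] += hours
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}
```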

  5. Multilingual Healthcare Text Dataset (Hi, En, Pu)

    • kaggle.com
    zip
    Updated Feb 13, 2025
    Cite
    Kajol Bagga (2025). Multilingual Healthcare Text Dataset (Hi, En, Pu) [Dataset]. https://www.kaggle.com/datasets/kajolagga/multilingual-healthcare-text-dataset-hi-en-pu
    Available download formats: zip (421,647 bytes)
    Dataset updated
    Feb 13, 2025
    Authors
    Kajol Bagga
    License

    CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains three healthcare datasets in Hindi and Punjabi, translated from English. The datasets cover medical diagnoses, disease names, and related healthcare information. The data has been carefully cleaned and formatted to ensure accuracy and usability for various applications, including machine learning, NLP, and healthcare analysis.

    • Diagnosis: Description of the medical condition or disease.
    • Symptoms: List of symptoms associated with the diagnosis.
    • Treatment: Common treatments or recommended procedures.
    • Severity: Severity level of the disease (e.g., mild, moderate, severe).
    • Risk Factors: Known risk factors associated with the condition.
    • Language: Specifies the language of the dataset (Hindi, Punjabi, or English).

    The purpose of these datasets is to facilitate research and development in regional language processing, especially in the healthcare sector.

    Column Descriptions

    Original Data Columns:

    • patient_id – Unique identifier for each patient.
    • age – Age of the patient.
    • gender – Gender of the patient (e.g., Male/Female/Other).
    • Diagnosis – The diagnosed medical condition or disease.
    • Remarks – Additional notes or comments from the doctor.
    • doctor_id – Unique identifier for the doctor treating the patient.
    • Patient History – Medical history of the patient, including previous conditions.
    • age_group – Categorized age group (e.g., Child, Adult, Senior).
    • gender_numeric – Numeric encoding for gender (e.g., 0 = Female, 1 = Male).
    • symptoms – List of symptoms reported by the patient.
    • treatment – Recommended treatment or medication.
    • timespan – Duration of the illness or treatment period.
    • Diagnosis Category – General category of the diagnosis (e.g., Cardiovascular, Neurological).

    Pseudonymized Data Columns: These columns replace personally identifiable information with anonymized versions for privacy compliance:

    • Pseudonymized_patient_id – An anonymized patient identifier.
    • Pseudonymized_age – Anonymized age value.
    • Pseudonymized_gender – Anonymized gender field.
    • Pseudonymized_Diagnosis – Diagnosis field with anonymized identifiers.
    • Pseudonymized_Remarks – Anonymized doctor notes.
    • Pseudonymized_doctor_id – Anonymized doctor identifier.
    • Pseudonymized_Patient History – Anonymized version of patient history.
    • Pseudonymized_age_group – Anonymized version of age groups.
    • Pseudonymized_gender_numeric – Anonymized numeric encoding of gender.
    • Pseudonymized_symptoms – Anonymized symptom descriptions.
    • Pseudonymized_treatment – Anonymized treatment descriptions.
    • Pseudonymized_timespan – Anonymized illness/treatment duration.
    • Pseudonymized_Diagnosis Category – Anonymized category of diagnosis.
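
    Because each pseudonymized column is named by prefixing an original column with `Pseudonymized_`, the two sets can be paired programmatically, for instance to build a column list that drops the identifiable originals before sharing. This sketch assumes both sets of columns appear in the same file with exactly the names listed above.

```python
from typing import Dict, List

PREFIX = "Pseudonymized_"


def pseudonymized_pairs(columns: List[str]) -> Dict[str, str]:
    """Map each original column to its pseudonymized counterpart, if present."""
    available = set(columns)
    return {col: PREFIX + col for col in columns
            if not col.startswith(PREFIX) and PREFIX + col in available}


def privacy_safe_columns(columns: List[str]) -> List[str]:
    """Drop originals that have a pseudonymized counterpart; keep everything else."""
    replaced = pseudonymized_pairs(columns)
    return [col for col in columns if col not in replaced]
```

    Columns without a pseudonymized counterpart, such as a language field, are kept unchanged and should be reviewed by hand.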

  6. AI-Generated-vs-Real-Images-Datasets

    • huggingface.co
    Updated Aug 19, 2025
    Cite
    Hem Bahadur Gurung (2025). AI-Generated-vs-Real-Images-Datasets [Dataset]. https://huggingface.co/datasets/Hemg/AI-Generated-vs-Real-Images-Datasets
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 19, 2025
    Authors
    Hem Bahadur Gurung
    Description

    Dataset Card for "AI-Generated-vs-Real-Images-Datasets"

    More Information needed

  7. LBA Regional Wetlands Data Set, 1-Degree (Matthews and Fung) - Dataset -...

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). LBA Regional Wetlands Data Set, 1-Degree (Matthews and Fung) - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/lba-regional-wetlands-data-set-1-degree-matthews-and-fung-204ef
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This database, compiled by Matthews and Fung (1987), provides information on the distribution and environmental characteristics of natural wetlands. The database was developed to evaluate the role of wetlands in the annual emission of methane from terrestrial sources. The original data consist of five global 1-degree latitude by 1-degree longitude arrays. This subset, for the study area of the Large Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) in South America, retains all five arrays at the 1-degree resolution but only for the area of interest (i.e., longitude 85 deg W to 30 deg W, latitude 25 deg S to 10 deg N). The arrays are (1) wetland data source, (2) wetland type, (3) fractional inundation, (4) vegetation type, and (5) soil type. The data subsets are in both ASCII GRID and binary image file formats.

    The database is the result of the integration of three independent digital sources: (1) vegetation classified according to the United Nations Educational, Scientific and Cultural Organization (UNESCO) system (Matthews, 1983), (2) soil properties from the Food and Agriculture Organization (FAO) soil maps (Zobler, 1986), and (3) fractional inundation in each 1-degree cell compiled from a global map survey of Operational Navigation Charts (ONC). With the vegetation, soil, and inundation characteristics of each wetland site identified, the database has been used for a coherent and systematic estimate of methane emissions from wetlands and for an analysis of the causes of uncertainty in the emission estimate.

    The complete global database is available from NASA/GISS [http://www.giss.nasa.gov] and NCAR data set ds765.5 [http://www.ncar.ucar.edu]; the global vegetation types data are available from ORNL DAAC [http://www.daac.ornl.gov].
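
    "ASCII GRID" most likely refers to the ESRI-style ASCII raster, which has a short header (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, optional `NODATA_value`) followed by rows of values listed from north to south. That format is an assumption here, as is one grid row per text line. The sketch below parses such a file and looks up the 1-degree cell containing a given longitude and latitude (west and south negative).

```python
import math
from typing import Dict, List, Optional, Tuple

HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner",
               "xllcenter", "yllcenter", "cellsize", "nodata_value"}


def parse_ascii_grid(text: str) -> Tuple[Dict[str, float], List[List[float]]]:
    """Split an ESRI-style ASCII grid into its header and value rows.

    rows[0] is the northernmost row; one grid row per text line is assumed.
    """
    header: Dict[str, float] = {}
    rows: List[List[float]] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        key = parts[0].lower()
        if key in HEADER_KEYS:
            header[key] = float(parts[1])
        else:
            rows.append([float(v) for v in parts])
    return header, rows


def cell_value(header: Dict[str, float], rows: List[List[float]],
               lon: float, lat: float) -> Optional[float]:
    """Value of the cell containing (lon, lat); None for NODATA cells.

    Assumes a corner-registered header (xllcorner / yllcorner).
    """
    size = header["cellsize"]
    col = math.floor((lon - header["xllcorner"]) / size)
    row_from_bottom = math.floor((lat - header["yllcorner"]) / size)
    nrows, ncols = int(header["nrows"]), int(header["ncols"])
    if not (0 <= col < ncols and 0 <= row_from_bottom < nrows):
        raise ValueError("point lies outside the grid")
    value = rows[nrows - 1 - row_from_bottom][col]
    return None if value == header.get("nodata_value") else value
```

    If the subset's header places the lower-left corner at 85 deg W, 25 deg S with a 1-degree cell size, the grid should be 55 columns by 35 rows, matching the longitude and latitude ranges given above.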

  8. Covid-19 variants survival data

    • kaggle.com
    zip
    Updated Jan 2, 2025
    Cite
    Massock Batalong Maurice Blaise (2025). Covid-19 variants survival data [Dataset]. https://www.kaggle.com/datasets/lumierebatalong/covid-19-variants-survival-data
    Available download formats: zip (216,589 bytes)
    Dataset updated
    Jan 2, 2025
    Authors
    Massock Batalong Maurice Blaise
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview:

    This dataset provides a unique resource for researchers and data scientists interested in the global dynamics of the COVID-19 pandemic. It focuses on the impact of different SARS-CoV-2 variants and mutations on the duration of local epidemics. By combining variant information with epidemiological data, this dataset allows for a comprehensive analysis of factors influencing the trajectory of the pandemic.

    Key Features:

    • Global Coverage: Includes data from multiple countries.
    • Variant-Specific Information: Detailed records for various SARS-CoV-2 variants.
    • Epidemic Duration: Data on the duration of local epidemics, accounting for right-censoring.
    • Epidemiological Variables: Includes mortality rates, a proxy for R0, transmission proxies, and other pertinent variables.
    • Geographical Characteristics: Includes a continent variable for exploring geographical patterns.
    • Time-Varying Variables: Includes the number of waves and the number of variants in each country for more in-depth exploration.

    Data Source: The data combines information from the Johns Hopkins University COVID-19 dataset (confirmed_cases.csv and deaths_cases.csv) and the covariants.org dataset (variants.csv).

    Questions to Inspire Users:

    This dataset is designed for a diverse set of analytical questions. Here are some ideas to inspire the Kaggle community:

    Survival Analysis:

    1. How do different SARS-CoV-2 variants influence the duration of local epidemics?
    2. Which factors (mortality, R0, etc.) are most strongly associated with shorter or longer epidemic durations?
    3. Does the type of variant/mutation (mutation, S, Omicron, Delta, Other) have a significant impact on epidemic duration?
    4. Is there a geographical pattern to the duration of epidemics?
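
    Since epidemic durations are right-censored (some epidemics had not ended when observation stopped), a Kaplan-Meier estimator is a standard starting point for the survival questions above. The sketch below is a plain-Python implementation that takes durations and an event flag as lists; the dataset's actual column names are not given here, and this is not presented as the author's method.

```python
from collections import Counter
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate for right-censored durations.

    durations[i] is the epidemic duration for unit i; observed[i] is True if
    the epidemic was seen to end and False if it was right-censored.
    Returns (time, S(time)) at each time where at least one epidemic ended.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    ended = Counter(t for t, e in zip(durations, observed) if e)
    leaving = Counter(durations)  # units leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = ended.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # censored units count as at risk at their own time
    return curve
```

    Fitting one curve per variant group and comparing the curves (for example with a log-rank test) is one way to approach question 1.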

    Epidemiological Analysis:

    1. How do local transmission rates (represented by our proxy of R0) affect the duration of an epidemic?
    2. Do countries with higher mortality rates have different patterns of epidemic progression?
    3. How can we predict the duration of an epidemic based on its initial characteristics?
    4. How does the number of epidemic waves impact the duration of an epidemic?
    5. Does the number of variants in a country affect the duration of an epidemic?

    Data Science/Machine Learning:

    1. Can we develop a machine learning model to predict the duration of an epidemic?
    2. What features have the best predictive power?
    3. Can we identify clusters of variants/regions with similar epidemic patterns?
    4. Are there interactions between variables that can explain the non-linearities that we have identified?

  9. Original Vector Datasets for Hawaii StreamStats

    • catalog.data.gov
    • search.dataone.org
    Updated Oct 22, 2025
    Cite
    U.S. Geological Survey (2025). Original Vector Datasets for Hawaii StreamStats [Dataset]. https://catalog.data.gov/dataset/original-vector-datasets-for-hawaii-streamstats
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Hawaii
    Description

    These datasets each consist of a folder containing a personal geodatabase of the NHD, and shapefiles used in the HydroDEM process. These files are provided as a means to document exactly which lines were used to develop the HydroDEMs. Each folder contains a line shapefile named for the 8-digit HUC code, containing the NHD flowlines that comprise the coastline for that island. The “hydrolines.shp” shapefile contains the lines that were burned into the DEM. These lines were selected from the NHD flowlines, with some minor editing in places. The “wbpolys.shp” shapefile contains the water-body polygons that were selected from the NHD and used in the bathymetric gradient process. The folders for HUCs 20010000 (Hawaii) and 20020000 (Maui) also contain a “walls.shp” shapefile, which contains the lines that were superimposed on the surface as “walls.”
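
    Burning lines into a DEM generally means lowering the elevation of the cells the mapped channels cross, so that flow-direction algorithms follow the NHD network, while walls typically raise cells to enforce drainage divides. The sketch below shows only that idea on a small grid. The burn depth and wall height are hypothetical, rasterizing the shapefiles into cells is not shown, and the actual HydroDEM parameters are not stated in this description.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]  # (row, col)


def burn_and_wall(dem: Grid, stream_cells: Set[Cell], wall_cells: Set[Cell],
                  burn_depth: float = 10.0, wall_height: float = 100.0) -> Grid:
    """Return a copy of `dem` with walls raised and stream cells lowered.

    The cell sets are assumed to come from rasterizing line shapefiles;
    burn_depth and wall_height are hypothetical values in DEM units.
    """
    out = [row[:] for row in dem]  # leave the input DEM untouched
    for r, c in wall_cells:
        out[r][c] += wall_height
    for r, c in stream_cells:
        out[r][c] -= burn_depth  # applied after walls, so streams cut through them
    return out
```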

  10. MHS Dashboard Children and Youth Demographic Datasets

    • data.ca.gov
    • data.chhs.ca.gov
    csv, zip
    Updated Nov 7, 2025
    Cite
    California Department of Health Care Services (2025). MHS Dashboard Children and Youth Demographic Datasets [Dataset]. https://data.ca.gov/dataset/mhs-dashboard-children-and-youth-demographic-datasets
    Available download formats: csv, zip
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    California Department of Health Care Services (http://www.dhcs.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The following datasets are based on the children and youth (under age 21) beneficiary population and consist of aggregate Mental Health Service data derived from Medi-Cal claims, encounter, and eligibility systems. These datasets were developed in accordance with California Welfare and Institutions Code (WIC) § 14707.5 (added as part of Assembly Bill 470 on 10/7/17). Please contact BHData@dhcs.ca.gov for any questions or to request previous years’ versions of these datasets. Note: The Performance Dashboard AB 470 Report Application Excel tool development has been discontinued. Please see the Behavioral Health reporting data hub at https://behavioralhealth-data.dhcs.ca.gov/ for access to dashboards utilizing these datasets and other behavioral health data.

  11. Georeferenced Population Datasets of Mexico (GEO-MEX): Urban Place GIS...

    • data.nasa.gov
    • cmr.earthdata.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Georeferenced Population Datasets of Mexico (GEO-MEX): Urban Place GIS Coverage of Mexico [Dataset]. https://data.nasa.gov/dataset/georeferenced-population-datasets-of-mexico-geo-mex-urban-place-gis-coverage-of-mexico
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    Mexico
    Description

    The Urban Place GIS Coverage of Mexico is a vector-based point Geographic Information System (GIS) coverage of 696 urban places in Mexico. Each urban place is geographically referenced down to one tenth of a minute. The attribute data include time-series population and selected census/geographic data items for Mexican urban places from 1921 to 1990. The cartographic data include urban place point locations on a state boundary file of Mexico. This data set is produced by the Columbia University Center for International Earth Science Information Network (CIESIN) in collaboration with the Instituto Nacional de Estadistica Geografia e Informatica (INEGI) and the Environmental Research Institute (ERI) of Michigan.
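
    A resolution of one tenth of a minute corresponds to 1/600 of a degree, roughly 185 m of latitude. The sketch below converts degrees and decimal minutes to decimal degrees and estimates the ground size of a 0.1-minute step, assuming a spherical Earth with about 111.32 km per degree; it is an approximation, not part of the GEO-MEX documentation.

```python
import math
from typing import Tuple

METERS_PER_DEGREE = 111_320.0  # spherical-Earth approximation


def dm_to_decimal(degrees: int, minutes: float, negative: bool = False) -> float:
    """Degrees plus decimal minutes to decimal degrees (negative for S or W)."""
    value = abs(degrees) + minutes / 60.0
    return -value if negative else value


def tenth_minute_size_m(lat_deg: float) -> Tuple[float, float]:
    """Approximate (north-south, east-west) ground size of 0.1 arc-minute."""
    step_deg = 0.1 / 60.0
    north_south = step_deg * METERS_PER_DEGREE
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west
```

    At Mexican latitudes (roughly 15 to 32 degrees N), the east-west size of the step shrinks with the cosine of the latitude, to about 157 to 179 m.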

  12. FIRM Panels

    • gstore.unm.edu
    • datasets.ai
    Cite
    Federal Emergency Management Agency, FIRM Panels [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/e2b82ae8-5d62-4bdf-8f4a-04a03107294e/metadata/ISO-19115:2003.html
    Dataset provided by
    Federal Emergency Management Agency (http://www.fema.gov/)
    Time period covered
    Jan 16, 2009
    Area covered
    West bound: -108.312507806935
    East bound: -103.0000000149
    North bound: 36.2500001138827
    South bound: 31.9999999822822
    Description

    The National Flood Hazard Layer (NFHL) data incorporates all Digital Flood Insurance Rate Map (DFIRM) databases published by FEMA, and any Letters Of Map Revision (LOMRs) that have been issued against those databases since their publication date. The DFIRM Database is the digital, geospatial version of the flood hazard information shown on the published paper Flood Insurance Rate Maps (FIRMs). The primary risk classifications used are the 1-percent-annual-chance flood event, the 0.2-percent-annual-chance flood event, and areas of minimal flood risk. The NFHL data are derived from Flood Insurance Studies (FISs), previously published FIRMs, flood hazard analyses performed in support of the FISs and FIRMs, and new mapping data where available. The FISs and FIRMs are published by the Federal Emergency Management Agency (FEMA). The specifications for the horizontal control of DFIRM data are consistent with those required for mapping at a scale of 1:12,000. The NFHL data contain all layers in the Standard DFIRM datasets except for S_Label_Pt and S_Label_Ld. The NFHL is available as State or US Territory data sets. Each State or Territory data set consists of all DFIRMs and corresponding LOMRs available on the publication date of the data set.
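
    The 1-percent-annual-chance flood has a 1% probability of being equaled or exceeded in any given year, which is not the same as occurring once per century. Assuming independent years, the chance of at least one such flood in n years is 1 - (1 - p)^n; the sketch below simply evaluates that expression.

```python
def prob_at_least_one(annual_chance: float, years: int) -> float:
    """P(at least one exceedance in `years`), assuming independent years."""
    if not 0.0 <= annual_chance <= 1.0:
        raise ValueError("annual_chance must be between 0 and 1")
    return 1.0 - (1.0 - annual_chance) ** years
```

    For p = 0.01 over 30 years this gives about 26%, and for the 0.2-percent-annual-chance event about 5.8%.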

  13. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Available download formats: xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full, robust statistical analysis of the results; nevertheless, it allowed the identification of relevant hidden patterns and trends.
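
    Standardizing across features is commonly done with a z-score per column: subtract the column mean and divide by its standard deviation. Whether Orange used the sample or population standard deviation here is not stated, so the sketch below assumes the sample version and leaves constant columns at zero.

```python
import statistics
from typing import List


def standardize_columns(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each feature column: (x - mean) / sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # needs at least 2 rows
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]
```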

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) to instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped in both rows and columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure, and area under the ROC curve (AUC).
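
    The gain ratio criterion named in the decision tree settings is usually defined, following C4.5, as the information gain of a split divided by the entropy of the split itself, which penalizes splits into many small branches. The sketch below computes it for a single candidate split; Orange's own implementation may differ in details such as how thresholds on numeric features are chosen.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a sequence of labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def gain_ratio(labels: Sequence[Hashable], branches: Sequence[Hashable]) -> float:
    """Information gain of a split divided by its split information.

    labels[i] is the class of sample i; branches[i] is the branch (e.g. the
    left or right side of a threshold) that sample i is sent to.
    """
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for y, b in zip(labels, branches):
        groups.setdefault(b, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(branches)
    return info_gain / split_info if split_info > 0 else 0.0
```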

  14. smollm-corpus

    • huggingface.co
    Updated Jul 16, 2024
    Cite
    Hugging Face Smol Models Research (2024). smollm-corpus [Dataset]. https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face Smol Models Research
    License

    Open Data Commons Attribution License (ODC-By), https://choosealicense.com/licenses/odc-by/

    Description

    SmolLM-Corpus

    This dataset is a curated collection of high-quality educational and synthetic data designed for training small language models. You can find more details about the models trained on this dataset in our SmolLM blog post.

    Dataset subsets

    Cosmopedia v2

    Cosmopedia v2 is an enhanced version of Cosmopedia, the largest synthetic dataset for pre-training, consisting of over 39 million textbooks, blog posts, and stories generated by… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus.

  15. Biodiversity by County - Distribution of Animals, Plants and Natural...

    • catalog.data.gov
    • datasets.ai
    Updated Jul 12, 2025
    Cite
    State of New York (2025). Biodiversity by County - Distribution of Animals, Plants and Natural Communities [Dataset]. https://catalog.data.gov/dataset/biodiversity-by-county-distribution-of-animals-plants-and-natural-communities
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    State of New York
    Description

    The NYS Department of Environmental Conservation (DEC) collects and maintains several datasets on the locations, distribution, and status of plant and animal species. County-level distribution information was extracted from the following three databases and compiled into this dataset:

    • New York Natural Heritage Program biodiversity database: rare animals, rare plants, and significant natural communities. Significant natural communities are rare or high-quality wetlands, forests, grasslands, ponds, streams, and other types of habitats.
    • 2nd NYS Breeding Bird Atlas Project database: birds documented as breeding during the atlas project from 2000-2005.
    • DEC’s NYS Reptile and Amphibian Database: reptiles and amphibians; most records are from the NYS Amphibian & Reptile Atlas Project (Herp Atlas) from 1990-1999.

  16.

    Dataset for Kiawah Island, SC Census Bureau Demographics and Population...

    • neilsberg.com
    Updated Jul 24, 2024
    Neilsberg Research (2024). Dataset for Kiawah Island, SC Census Bureau Demographics and Population Distribution Across Age // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b79be6a5-5460-11ee-804b-3860777c1fe6/
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Carolina, Kiawah Island
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    This dataset tabulates the population of Kiawah Island by age and can be used to understand the age distribution and demographics of Kiawah Island.

    Content

    The dataset comprises the following three datasets:

    • Kiawah Island, SC Age Group Population Dataset: A complete breakdown of Kiawah Island age demographics from 0 to 85 years, distributed across 18 age groups
    • Kiawah Island, SC Age Cohorts Dataset: Children, Working Adults, and Seniors in Kiawah Island - Population and Percentage Analysis
    • Kiawah Island, SC Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis

    Good to know

    Margin of Error

    The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most of Neilsberg Research's aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

  17.

    AgentTrek

    • huggingface.co
    Updated Feb 20, 2025
    XLang NLP Lab (2025). AgentTrek [Dataset]. https://huggingface.co/datasets/xlangai/AgentTrek
    Dataset updated
    Feb 20, 2025
    Dataset authored and provided by
    XLang NLP Lab
    Description

    AgentTrek Data Collection

    The AgentTrek dataset is the training dataset for the web agent AgentTrek-1.0-32B. It consists of 52,594 dialogue turns designed to train a language model to perform web-based tasks such as browsing and web shopping. The dialogues simulate interactions in which the agent assists users with tasks like searching for information, comparing products, making purchasing decisions, and navigating websites.

      Dataset… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/AgentTrek.
    
  18.

    Fake_or_Real_Competition_Dataset

    • huggingface.co
    • kaggle.com
    Updated Aug 28, 2023
    GenON (2023). Fake_or_Real_Competition_Dataset [Dataset]. https://huggingface.co/datasets/mncai/Fake_or_Real_Competition_Dataset
    Dataset updated
    Aug 28, 2023
    Dataset authored and provided by
    GenON
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    2023 Fake or Real: AI-generated Image Discrimination Competition dataset is now available on Hugging Face!

    Hello 🖐️ We are excited to announce the release of the dataset for the 2023 Fake or Real: AI-generated Image Discrimination Competition. The competition was held on AI CONNECT (https://aiconnect.kr/) from June 26 to July 6, 2023, with 768 participants. If you're interested in evaluating the performance of your model on the test dataset, we encourage you to visit the… See the full description on the dataset page: https://huggingface.co/datasets/mncai/Fake_or_Real_Competition_Dataset.

  19.

    Enterprise Dataset Inventory - Retired Datasets

    • hub.arcgis.com
    • opendata.dc.gov
    • +1 more
    Updated May 21, 2024
    City of Washington, DC (2024). Enterprise Dataset Inventory - Retired Datasets [Dataset]. https://hub.arcgis.com/datasets/DCGIS::enterprise-dataset-inventory-retired-datasets/about
    Dataset updated
    May 21, 2024
    Dataset authored and provided by
    City of Washington, DC
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mayor's Order 2017-115 establishes a comprehensive data policy for the District government. The data created and managed by the District government are valuable assets and are independent of the information systems in which the data reside. As such, the District government shall: maintain an inventory of its enterprise datasets; classify enterprise datasets by level of sensitivity; regularly publish the inventory, including the classifications, as an open dataset; and strategically plan and manage its investment in data. The greatest value from the District’s investment in data can only be realized when enterprise datasets are freely shared among District agencies, with federal and regional governments, and with the public to the fullest extent consistent with safety, privacy, and security. For more information, please visit https://opendata.dc.gov/pages/edi-overview. Previous years of EDI can be found on Open Data.

  20.

    Massachusetts and Rhode Island 2016 BIRDS (Birds Polygons)

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Oct 31, 2024
    (Point of Contact, Custodian) (2024). Massachusetts and Rhode Island 2016 BIRDS (Birds Polygons) [Dataset]. https://catalog.data.gov/dataset/massachusetts-and-rhode-island-2016-birds-birds-polygons1
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    (Point of Contact, Custodian)
    Area covered
    Rhode Island, Massachusetts
    Description

    This data set contains sensitive biological resource data for wading birds, shorebirds, waterfowl, raptors, diving birds, seabirds, passerine birds, and gulls and terns in Massachusetts and Rhode Island. Vector polygons in this data set represent bird nesting, migratory staging, and wintering sites. Species-specific abundance, seasonality, status, life history, and source information are stored in associated data tables (described below) designed to be used in conjunction with this spatial data layer. This data set is a portion of the Environmental Sensitivity Index (ESI) data for Massachusetts and Rhode Island. As a whole, the ESI data characterize the marine and coastal environments and wildlife by their sensitivity to spilled oil and include information on three main components: shoreline habitats, sensitive biological resources, and human-use resources. See also the BIRDSPT (Bird Points) data layer for additional bird information.
