100+ datasets found
  1. c-sharp-coding-dataset

    • huggingface.co
    Updated Dec 17, 2024
    + more versions
    Cite
    David Meldrum (2024). c-sharp-coding-dataset [Dataset]. https://huggingface.co/datasets/dmeldrum6/c-sharp-coding-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 17, 2024
    Authors
    David Meldrum
    Description

    Dataset Card for c-sharp-coding-dataset

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dmeldrum6/c-sharp-coding-dataset/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dmeldrum6/c-sharp-coding-dataset.
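    For quick experimentation, the records can also be pulled straight from the Hub with the Hugging Face datasets library. A minimal sketch (the split and column names depend on the pipeline configuration, so inspect them after loading):

    import datasets

    # Load the dataset from the Hugging Face Hub.
    ds = datasets.load_dataset("dmeldrum6/c-sharp-coding-dataset", split="train")

    # Column names depend on the distilabel pipeline; inspect before use.
    print(ds.column_names)
    print(ds[0])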

  2. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated May 30, 2023
    Cite
    Borbala Mifsud; Inigo Martincorena; Elodie Darbo; Robert Sugar; Stefan Schoenfelder; Peter Fraser; Nicholas M. Luscombe (2023). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data [Dataset]. http://doi.org/10.1371/journal.pone.0174744
    Explore at:
    pdf (available download formats)
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Borbala Mifsud; Inigo Martincorena; Elodie Darbo; Robert Sugar; Stefan Schoenfelder; Peter Fraser; Nicholas M. Luscombe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a BioConductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
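    As an illustration of the binomial idea (not the GOTHiC implementation itself, which lives in the BioConductor package), a per-interaction p-value can be computed from the observed read-pair count, the total number of read pairs, and an expected contact probability derived from fragment coverage:

    from scipy.stats import binom

    def interaction_pvalue(observed: int, total_pairs: int, p_expected: float) -> float:
        """P(X >= observed) under X ~ Binomial(total_pairs, p_expected).

        Illustrative sketch for a single fragment pair; GOTHiC's actual bias
        model and multiple-testing correction are more involved.
        """
        # The survival function at observed - 1 gives P(X >= observed).
        return binom.sf(observed - 1, total_pairs, p_expected)

    # Hypothetical numbers: 25 read pairs between two fragments out of 1e6 total,
    # with an expected contact probability of 1e-5 from relative coverage.
    print(interaction_pvalue(25, 1_000_000, 1e-5))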

  3. C-V2X Interoperability Testing Datasets

    • catalog.data.gov
    • gimi9.com
    • +2more
    Updated Mar 14, 2025
    Cite
    National Institute of Standards and Technology (2025). C-V2X Interoperability Testing Datasets [Dataset]. https://catalog.data.gov/dataset/c-v2x-interoperability-testing-datasets
    Explore at:
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    These datasets contain C-V2X network communication and interoperability testing packet data collected using a network sniffer (Wireshark) in the Packet Capture (PCAP) format and converted into the Packet Description Markup Language (PDML) format. These datasets include three test cases: C-V2I, C-V2V, and C-V2X. These datasets can be used to display, analyze, and assess C-V2X compatibility and interoperability among commercial on-board units (OBUs) and road-side units (RSUs) based on IEEE 1609.2, IEEE 1609.3, and SAE J2735 standards.
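    Since PDML is plain XML (one <packet> element per captured frame, with nested <proto> and <field> elements), the converted captures can be inspected with standard tooling. A minimal sketch, assuming a hypothetical file name cv2x_testcase.pdml:

    import xml.etree.ElementTree as ET
    from collections import Counter

    # Parse a Wireshark PDML export (file name is hypothetical).
    tree = ET.parse("cv2x_testcase.pdml")

    # Count how often each protocol layer appears across all packets,
    # e.g. to check for the IEEE 1609.2/1609.3 and SAE J2735 layers.
    protocols = Counter(
        proto.get("name")
        for packet in tree.getroot().iter("packet")
        for proto in packet.iter("proto")
    )
    print(protocols.most_common(10))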

  4. EDDEN

    • openneuro.org
    Updated Aug 9, 2023
    Cite
    Jose Pedro Manzano Patron; Steen Moeller; Jesper L.R. Andersson; Essa Yacoub; Stamatios N. Sotiropoulos (2023). EDDEN [Dataset]. http://doi.org/10.18112/openneuro.ds004666.v1.0.0
    Explore at:
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Jose Pedro Manzano Patron; Steen Moeller; Jesper L.R. Andersson; Essa Yacoub; Stamatios N. Sotiropoulos
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    EDDEN stands for *E*valuation of *D*MRI *DEN*oising approaches. The data correspond to the publication: Manzano Patron, J.P., Moeller, S., Andersson, J.L.R., Yacoub, E., Sotiropoulos, S.N. Denoising Diffusion MRI: Considerations and implications for analysis. doi: https://doi.org/10.1101/2023.07.24.550348. Please cite it if you use this dataset.

    • Description of the dataset: RAW complex data (magnitude and phase) is acquired for a single subject at different SNR/resolution regimes, under ~/EDDEN/sub-01/ses-XXX/dwi/:

      • Dataset A (2mm)

        • This dataset represents a relatively medium-to-high SNR regime.
        • 6 repeats of a 2mm isotropic multi-shell dataset each implementing the UK Biobank protocol (Miller et al., 2016)
        • TR=3s, TE=92ms, MB=3, no in-plane acceleration, scan time ∼6 minutes per repeat.
        • For each repeat, 116 volumes were acquired: 105 volumes with AP phase encoding direction (5 b = 0 s/mm2 volumes and 100 diffusion encoding orientations, of which 50 at b = 1000 s/mm2 and 50 at b = 2000 s/mm2), plus 4 b = 0 s/mm2 volumes with reversed phase encoding direction (PA) for susceptibility-induced distortion correction (Andersson and Skare, 2002).
        • NOTES: Only 1 PA set of volumes was acquired for all the runs.
      • Dataset B (1p5mm):

        • This is a low-to-medium SNR dataset, with relatively high resolution.
        • 5 repeats of a 1.5 mm isotropic multi-shell dataset, each implementing an HCP-like protocol in terms of q-space sampling (Sotiropoulos et al., 2013a).
        • TR=3.23 s, TE=89.2 ms, MB=4, no in-plane acceleration, scan time ∼16 minutes per repeat.
        • For each repeat, 300 volumes were acquired: 297 volumes with AP phase encoding direction (27 b = 0 s/mm2 volumes and 270 diffusion encoding orientations, of which 90 at b = 1000 s/mm2, 90 at b = 2000 s/mm2, and 90 at b = 3000 s/mm2), plus 3 b = 0 s/mm2 volumes with PA phase encoding for susceptibility-induced distortion correction.
      • Dataset C (0p9mm):

        • This is a very low SNR dataset, representing extremely noisy data that, without denoising, are expected to be borderline unusable (particularly for the higher b-values).
        • 4 repeats of an ultra-high-resolution multi-shell dataset with 0.9mm isotropic resolution.
        • TR=6.569 s, TE=91 ms, MB=3, in-plane GRAPPA=2, scan time ∼22 minutes per repeat.
        • For each repeat, 202 volumes were acquired with orientations as in (Harms et al., 2018): 199 volumes with AP phase encoding direction (14 b = 0 s/mm2 volumes and 185 diffusion encoding orientations, of which 93 at b = 1000 s/mm2 and 92 at b = 2000 s/mm2), plus 3 b = 0 s/mm2 volumes with PA phase encoding for susceptibility-induced distortion correction.
        • NOTES: The phase of the PAs is not available, and the same PA is used for runs 3 and 4.

    Each dataset contains its own T1w-MPRAGE under ~/EDDEN/sub-01/ses-XXX/anat/. Each dataset was acquired on a different day, to minimise fatigue, but all repeats within a dataset were acquired in the same session. All acquisitions were obtained parallel to the anterior and posterior commissure line, covering the entire cerebrum.

    DERIVATIVES These are the different denoised versions of the raw data for each dataset, the pre-processed data for the raw, denoised and averaged versions, and the FA, MD and V1 outputs from the DTI model fitting (see the Data pre-processing section below).

      • Denoised data:
        • NLM (NLM): Non-Local Means denoising applied to magnitude raw data.
        • MPPCA (|MPPCA|): Marchenko-Pastur PCA denoising applied to magnitude raw data.
        • MPPCA_complex (MPPCA*): Marchenko-Pastur PCA denoising applied to complex raw data.
        • NORDIC (NORDIC): NORDIC applied to complex raw data.
        • AVG_mag (|AVG|): the average of the multiple repeats in magnitude.
        • AVG_complex (AVG*): the average in the complex space of the multiple repeats.
      • Masks: under ~/EDDEN/derivatives/ses-XXX/masks we can find different masks for each dataset:
        • GM_mask: Gray Matter mask.
        • WM_mask: White Matter mask.
        • CC_mask: Corpus Callosum mask.
        • CS_mask: Centrum Semiovale mask.
        • ventricles_mask: CSF ventricles mask.
        • nodif_brain_mask: Eroded brain mask.

    • Data pre-processing: Both magnitude and phase data were retained for each acquisition to allow evaluations of denoising in both magnitude and complex domains. To allow distortion correction and processing for complex data and avoid phase incoherence artifacts, the raw complex-valued diffusion data were rotated to the real axis using the phase information. A spatially varying phase-field was estimated and the complex vectors were multiplied with the conjugate of the phase. The phase-field was estimated uniquely for each slice and volume by firstly removing the phase variations from k-space sampling and coil sensitivity combination, and secondly by removing an estimate of a smooth residual phase-field. The smooth residual phase-field was estimated using a low-pass filter with a narrowed tapered cosine filter (a Tukey filter with an FWHM of 58%). Hence, the final signal was rotated approximately along the real axis, subject to the smoothness constraints.
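    A minimal numpy sketch of this kind of phase rotation (illustrative only: the published pipeline's exact filter design, and the mapping between the quoted 58% FWHM and the Tukey taper parameter used here, are assumptions):

    import numpy as np
    from scipy.signal.windows import tukey

    def rotate_to_real_axis(slice_complex: np.ndarray, alpha: float = 0.58) -> np.ndarray:
        """Rotate a complex 2D slice approximately onto the real axis.

        A smooth residual phase field is estimated with a low-pass,
        Tukey-tapered k-space filter and removed by multiplying the data
        with the conjugate phase.
        """
        ny, nx = slice_complex.shape
        # A separable Tukey window centred in k-space acts as the low-pass filter.
        win = np.outer(tukey(ny, alpha), tukey(nx, alpha))
        kspace = np.fft.fftshift(np.fft.fft2(slice_complex))
        smooth = np.fft.ifft2(np.fft.ifftshift(kspace * win))
        phase_field = np.angle(smooth)
        # Removing the smooth phase leaves the signal ~aligned with the real axis.
        return slice_complex * np.exp(-1j * phase_field)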

    Having the magnitude and complex data for each dataset, denoising was applied using different approaches prior to any pre-processing to minimise potential changes in statistical properties of the raw data due to interpolations (Veraart et al., 2016b). For denoising, we used the following four algorithms:

    - **Denoising in the magnitude domain**: i) The Non-Local Means (**NLM**) (Buades et al., 2005) was applied as an exemplar of a simple non-linear filtering method adapted from traditional signal pre-processing. We used the default implementation in DIPY (Garyfallidis et al., 2014), where each dMRI volume is denoised independently. ii) The Marchenko-Pastur PCA (MPPCA) (denoted as **|MPPCA|** throughout the text) (Cordero-Grande et al., 2019; Veraart et al., 2016b), reflecting a commonly used approach that performs PCA over image patches and uses the MP theorem to identify noise components from the eigenspectrum. We used the default MrTrix3 implementation (Tournier et al., 2019).
    
    - **Denoising in the complex domain**: i) MPPCA applied to complex data (rotated along the real axis), denoted as **MPPCA***. We applied the MrTrix3 implementation of the magnitude MPPCA to the complex data rotated to the real axis (we found that this approach was more stable in terms of handling phase images and achieved better denoising, compared to the MrTrix3 complex MPPCA implementation). ii) The **NORDIC** algorithm (Moeller et al., 2021a), which also relies on the MP theorem, but performs variance spatial normalisation prior to noise component identification and filtering, to ensure noise stationarity assumptions are fulfilled.
    

    All data, both raw and their four denoised versions, underwent the same pre-processing steps for distortion and motion correction (Sotiropoulos et al., 2013b) using an in-house pipeline (Mohammadi-Nejad et al., 2019). To avoid confounds from potential misalignment in the distortion-corrected diffusion native space obtained from each approach, we chose to compute a single susceptibility-induced off-resonance fieldmap using the raw data for each of the Datasets A, B and C, and then use the corresponding fieldmap for all denoising approaches in each dataset, so that the reference native space stays the same for each of A, B and C. Note that differences between fieldmaps before and after denoising are small anyway, as the relatively high-SNR b = 0 s/mm2 images are used to estimate them; but these small differences can cause noticeable misalignments between methods and confounds when attempting quantitative comparisons, which we avoid here using our approach.

    Hence, for each of the Datasets A, B and C, the raw blip-reversed b = 0 s/mm2 volumes were used in FSL’s topup to generate a fieldmap (Andersson and Skare, 2002). This was then used in individual runs of FSL’s eddy for each approach (Andersson and Sotiropoulos, 2016), which applied the common fieldmap and performed corrections for eddy currents and subject motion in a single interpolation step. FSL’s eddyqc (Bastiani et al., 2019) was used to generate quality control (QC) metrics, including SNR and angular CNR for each b value.

    The same T1w image was used within each dataset. A linear transformation from the corrected native diffusion space to the T1w space was estimated using boundary-based registration (Greve and Fischl, 2009). The T1w image was skull-stripped and non-linearly registered to the MNI standard space, allowing further analysis. Masks of white and grey matter were obtained from the T1w image using FSL’s FAST (Jenkinson et al., 2012) and aligned to diffusion space.

  5. Physical Gene Regulatory Networks in C.elegans

    • kaggle.com
    zip
    Updated Feb 10, 2023
    Cite
    The Devastator (2023). Physical Gene Regulatory Networks in C.elegans [Dataset]. https://www.kaggle.com/datasets/thedevastator/physical-gene-regulatory-networks-in-c-elegans
    Explore at:
    zip (543510 bytes); available download formats
    Dataset updated
    Feb 10, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Physical Gene Regulatory Networks in C.elegans

    239,001 Regulatory Interactions from 289 Wild-type Young Adult Datasets

    By [source]

    About this dataset

    This dataset provides highly complex physical gene regulatory networks in young adult wild-type (WT) C.elegans worms. With a total of 239,001 regulatory interactions collected from 289 datasets, it is a rich resource for studying gene regulation and how gene activity contributes to organism function under varying bio-environmental conditions. The collection covers 126 genes and 495 transcription factors, along with functional knockdown data used to validate the physical gene regulatory networks in young adult C.elegans worms. Researchers and biologists can use these data to examine how genotype, age, and strain are associated with different perturbations in biological features, and to uncover new relationships among these genes. This comprehensive dataset will be useful for research on topics such as developmental processes and age-related diseases, further enriching our understanding of life!


    How to use the dataset

    This guide will help you understand how to use this dataset of physical gene regulatory networks to research and analyze young adult C.elegans worms.

    • Understand the columns in the dataset: there are 239,001 regulatory interactions from 289 datasets, covering 126 genes and 495 transcription factors, each registered with genotype, age, strain, perturbation type, data type, data source, and source used. Columns for comments and the regulator provide more information about each interaction.

    • Know your research goal: decide what you want to discover from this dataset before you start, so you can work efficiently when sorting or filtering. Clear goals make it easier to judge which columns are likely to provide valuable insights for your objectives.

    • Analyze specific types of data: once your goals are established, focus on the data types relevant to them (for example, transcription factor levels). Viewed together, these components offer insight into how regulation changes within a cell’s environment and which pathways may be activated or deactivated under different conditions.

    • Keep logs and documents up to date: after sorting or filtering columns, keep your logs and documents in sync with any changes made during analysis, so usage does not get mixed up across documents or sessions over the project’s lifespan. An organized record-keeping system helps ensure accuracy when dealing with large volumes of information (so nothing gets overlooked accidentally!).

    We hope these tips help you get started exploring Physical Gene Regulatory Networks in C.elegans! If you have any questions, feel free to reach out via message; we would love to hear how things go once you put them into practice!
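    As a concrete starting point, a hedged pandas sketch of the filtering described above (the file name and exact column labels are assumptions based on the column list in this guide, so check df.columns after loading):

    import pandas as pd

    # File and column names are assumptions; inspect df.columns first.
    df = pd.read_csv("physical_gene_regulatory_networks.csv")
    print(df.columns.tolist())

    # Example filter: wild-type young adult records only.
    subset = df[(df["genotype"] == "WT") & (df["age"] == "young adult")]

    # Count interactions per regulator (transcription factor).
    print(subset["regulator"].value_counts().head(10))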

    Research Ideas

    • Training machine-learning algorithms to develop automated approaches in predicting gene expression levels of individual regulatory networks.
    • Using this dataset alongside data from RNA-seq experiments to investigate how genetic mutations, environmental changes, and other factors can affect gene regulation across C.elegans populations.
    • Exploring the correlation between transcription factor binding sites and gene expression levels to predict potential target genes for a given transcription factor

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, m...

  6. Supplementary data 2. Hi-C datasets used in the study

    • auckland.figshare.com
    • figshare.com
    xlsx
    Updated Mar 24, 2021
    Cite
    Sreemol Gokuladhas; William Schierding; Evgeniia Golovina; Tayaza Fadason; Justin O'Sullivan (2021). Supplementary data 2. Hi-C datasets used in the study [Dataset]. http://doi.org/10.17608/k6.auckland.14273630.v2
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Mar 24, 2021
    Dataset provided by
    The University of Auckland
    Authors
    Sreemol Gokuladhas; William Schierding; Evgeniia Golovina; Tayaza Fadason; Justin O'Sullivan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Details of the Hi-C datasets from 70 cell lines/tissues used for the analysis are provided in the table. Hi-C contact data were used to find the target genes of the autoimmune disease-associated SNPs.

  7. Land use forcing and LPJ-GUESS C pool projections

    • data.europa.eu
    • researchdata.se
    unknown
    Updated Sep 1, 2017
    Cite
    Lunds universitet (2017). Land use forcing and LPJ-GUESS C pool projections [Dataset]. https://data.europa.eu/data/datasets/https-doi-org-10-18161-lpj-guess_plum-201708?locale=mt
    Explore at:
    unknown (available download formats)
    Dataset updated
    Sep 1, 2017
    Dataset authored and provided by
    Lunds universitet
    Description

    - Land use data for 2001-2100 from PLUM1.3 (Parsimonious Land Use Model version 1.3) coupled with a global energy-economics model, downscaled to 0.5*0.5 degree gridcells, for the five scenarios SSP1-SSP5 (reference and mitigation strategies), as described in detail in Engström et al. (2017).
    - Total terrestrial biosphere carbon (kg/m2) for 2001-2100 from simulations using the vegetation model LPJ-GUESS at 0.5*0.5 degree resolution for the 10 SSP-SPA scenarios using 1-3 RCPs for four different climate models, as described in Engström et al. (2017).

    Ref.: Engström K., Lindeskog M., Olin S., Hassler J., and Smith B. (2017) Impacts of climate mitigation strategies in the energy sector on global land use and carbon balance.

  8. Musical Scale Classification Dataset using Chroma

    • kaggle.com
    zip
    Updated Apr 8, 2025
    Cite
    Om Avashia (2025). Musical Scale Classification Dataset using Chroma [Dataset]. https://www.kaggle.com/datasets/omavashia/synthetic-scale-chromagraph-tensor-dataset
    Explore at:
    zip (392580911 bytes); available download formats
    Dataset updated
    Apr 8, 2025
    Authors
    Om Avashia
    License

    CDLA Sharing 1.0: https://cdla.io/sharing-1-0/

    Description

    Dataset Description

    Musical Scale Dataset: 1900+ Chroma Tensors Labeled by Scale

    This dataset contains 1900+ unique synthetic musical audio samples generated from melodies in each of the 24 Western scales (12 major and 12 minor). Each sample has been converted into a chroma tensor, a 12-dimensional pitch class representation commonly used in music information retrieval (MIR) and deep learning tasks.

    What’s Inside

    • chroma_tensor: a JSON-safe serialization of a PyTorch tensor with shape [1, 12, T], where:
      • 12 = the 12 pitch classes (C, C#, D, ... B)
      • T = time steps
    • scale_index: An integer label from 0–23 identifying the scale the sample belongs to

    Use Cases

    This dataset is ideal for:
    • Training deep learning models (CNNs, MLPs) to classify musical scales
    • Exploring pitch-class distributions in Western tonal music
    • Prototyping models for music key detection, chord prediction, or tonal analysis
    • Teaching or demonstrating chromagram-based ML workflows

    Labels

    Index | Scale
    ------|---------
    0     | C major
    1     | C# major
    ...   | ...
    11    | B major
    12    | C minor
    ...   | ...
    23    | B minor
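    The full index-to-scale mapping can be reconstructed in code (the chromatic ordering of the intermediate entries, with sharps for the black keys, is an assumption consistent with the endpoints shown above):

    # Pitch classes in chromatic order; sharp spellings are an assumption.
    NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    SCALES = [f"{n} major" for n in NOTES] + [f"{n} minor" for n in NOTES]
    assert SCALES[0] == "C major" and SCALES[11] == "B major"
    assert SCALES[12] == "C minor" and SCALES[23] == "B minor"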

    Quick Load Example (PyTorch)

    Chroma tensors are of shape [1, 12, T], where:
    • 1 is the channel dimension (for CNN input)
    • 12 represents the 12 pitch classes (C through B)
    • T is the number of time frames

    import torch
    import pandas as pd
    from tqdm import tqdm
    
    df = pd.read_csv("/content/scale_dataset.csv")
    
    # Reconstruct chroma tensors
    X = [torch.tensor(eval(row)).reshape(1, 12, -1) for row in tqdm(df['chroma_tensor'])]
    y = df['scale_index'].tolist()
    

    Alternatively, you could directly load the chroma tensors and target scale indices using the .pt file.

    import torch
    import pandas as pd
    
    data = torch.load("chroma_tensors.pt")
    X_pt = data['X'] # list of [1, 12, 302] tensors
    y_pt = data['y'] # list of scale indices
    

    How It Was Built

    • Notes generated from random melodies using music21
    • MIDI converted to WAV via FluidSynth
    • Chromagrams extracted with librosa.feature.chroma_stft
    • Tensors flattened and saved alongside scale index labels
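    A condensed sketch of that generation pipeline for a single melody (the soundfont path, sample rate, and note choices are assumptions; FluidSynth is invoked as an external command):

    import subprocess

    import librosa
    import torch
    from music21 import note, stream

    # 1. Build a toy melody with music21 and write it to MIDI.
    melody = stream.Stream()
    for pitch in ["C4", "D4", "E4", "G4"]:
        melody.append(note.Note(pitch, quarterLength=1))
    melody.write("midi", fp="melody.mid")

    # 2. Render the MIDI to WAV with FluidSynth (soundfont path is hypothetical).
    subprocess.run(
        ["fluidsynth", "-ni", "soundfont.sf2", "melody.mid", "-F", "melody.wav", "-r", "22050"],
        check=True,
    )

    # 3. Extract a chromagram and shape it into a [1, 12, T] tensor.
    y, sr = librosa.load("melody.wav", sr=22050)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)   # shape (12, T)
    chroma_tensor = torch.tensor(chroma).unsqueeze(0)  # shape (1, 12, T)
    print(chroma_tensor.shape)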

    File Format

    Column        | Type | Description
    --------------|------|-------------------------------------
    chroma_tensor | str  | Flattened 1D chroma tensor [1×12×T]
    scale_index   | int  | Label from 0 to 23

    Notes

    • Data is synthetic but musically valid and well-balanced
    • Each of the 24 scales appears 300 times
    • All tensors have fixed length (T) for easy batching
  9. Dataset C 0510 Dataset

    • universe.roboflow.com
    zip
    Updated Jun 4, 2025
    Cite
    HMobility (2025). Dataset C 0510 Dataset [Dataset]. https://universe.roboflow.com/hmobility-esmgj/dataset-c-0510/dataset/4
    Explore at:
    zip (available download formats)
    Dataset updated
    Jun 4, 2025
    Dataset authored and provided by
    HMobility
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Crosswalk Polygons
    Description

    DATASET C 0510

    ## Overview
    
    DATASET C 0510 is a dataset for instance segmentation tasks - it contains Crosswalk annotations for 676 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
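    To download the dataset programmatically, a minimal sketch using the roboflow Python package (the API key is a placeholder; the workspace/project slugs and version follow the dataset URL above, and the export format is an assumption):

    from roboflow import Roboflow

    # Authenticate with your own Roboflow API key (placeholder below).
    rf = Roboflow(api_key="YOUR_API_KEY")

    # Workspace, project, and version taken from the dataset URL.
    project = rf.workspace("hmobility-esmgj").project("dataset-c-0510")
    dataset = project.version(4).download("coco")  # export format is an assumption
    print(dataset.location)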
    
  10. Data from: C Project Dataset

    • universe.roboflow.com
    zip
    Updated Sep 29, 2022
    Cite
    seoyh rb (2022). C Project Dataset [Dataset]. https://universe.roboflow.com/seoyh-rb-gdiwi/c-project/dataset/2
    Explore at:
    zip (available download formats)
    Dataset updated
    Sep 29, 2022
    Dataset authored and provided by
    seoyh rb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Tooth Bounding Boxes
    Description

    C Project

    ## Overview
    
    C Project is a dataset for object detection tasks - it contains Tooth annotations for 713 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  11. Functional Use Database (FUse)

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Functional Use Database (FUse) [Dataset]. https://catalog.data.gov/dataset/functional-use-database-fuse
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    There are five different files for this dataset:
    1. A dataset listing the reported functional uses of chemicals (FUse)
    2. All 729 ToxPrint descriptors obtained from ChemoTyper for chemicals in FUse
    3. All EPI Suite properties obtained for chemicals in FUse
    4. The confusion matrix values, similarity thresholds, and bioactivity index for each model
    5. The functional use prediction, bioactivity index, and prediction classification (poor prediction, functional substitute, candidate alternative) for each Tox21 chemical

    This dataset is associated with the following publication: Phillips, K., J. Wambaugh, C. Grulke, K. Dionisio, and K. Isaacs. High-throughput screening of chemicals as functional substitutes using structure-based classification models. GREEN CHEMISTRY. Royal Society of Chemistry, Cambridge, UK, 19: 1063-1074, (2017).

  12. Coastal Change Analysis Program (C-CAP) Regional Land Cover Data and Change Data

    • catalog.data.gov
    • datasets.ai
    • +3more
    Updated Apr 15, 2025
    + more versions
    Cite
    NOAA Office for Coastal Management (Point of Contact, Custodian) (2025). Coastal Change Analysis Program (C-CAP) Regional Land Cover Data and Change Data [Dataset]. https://catalog.data.gov/dataset/coastal-change-analysis-program-c-cap-regional-land-cover-data-and-change-data2
    Explore at:
    Dataset updated
    Apr 15, 2025
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    The NOAA Coastal Change Analysis Program (C-CAP) produces national standardized land cover and change products for the coastal regions of the U.S. C-CAP products inventory coastal intertidal areas, wetlands, and adjacent uplands with the goal of monitoring changes in these habitats, on a one-to-five year repeat cycle. The timeframe for this metadata is reported as 1985 - 2010-Era, but the actual dates of the Landsat imagery used to create the land cover may have been acquired a few years before or after each era. These maps are developed utilizing Landsat Thematic Mapper imagery, and can be used to track changes in the landscape through time. This trend information gives important feedback to managers on the success or failure of management policies and programs and aids in developing a scientific understanding of the Earth system and its response to natural and human-induced changes. This understanding allows for the prediction of impacts due to these changes and the assessment of their cumulative effects, helping coastal resource managers make more informed regional decisions. NOAA C-CAP is a contributing member to the Multi-Resolution Land Characteristics consortium and C-CAP products are included as the coastal expression of land cover within the National Land Cover Database.

  13. Data from: A large EEG database with users' profile information for motor imagery Brain-Computer Interface research

    • data.europa.eu
    • zenodo.org
    unknown
    Updated Jan 8, 2023
    + more versions
    Cite
    Zenodo (2023). A large EEG database with users' profile information for motor imagery Brain-Computer Interface research [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-7554429?locale=en
    Explore at:
    unknown (available download formats)
    Dataset updated
    Jan 8, 2023
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context: We share a large database containing electroencephalographic signals from 87 human participants, with more than 20,800 trials in total, representing about 70 hours of recording. It was collected during brain-computer interface (BCI) experiments and organized into 3 datasets (A, B, and C) that were all recorded following the same protocol: right- and left-hand motor imagery (MI) tasks during one single-day session. It includes the performance of the associated BCI users, detailed information about the demographics, personality, and cognitive profile of the users, and the experimental instructions and codes (executed in the open-source platform OpenViBE). Such a database could prove useful for various studies, including but not limited to: 1) studying the relationships between BCI users' profiles and their BCI performances, 2) studying how EEG signal properties vary for different users' profiles and MI tasks, 3) using the large number of participants to design cross-user BCI machine learning algorithms, or 4) incorporating users' profile information into the design of EEG signal classification algorithms.

    Sixty participants (Dataset A) performed the first experiment, designed to investigate the impact of experimenters' and users' gender on MI-BCI user training outcomes, i.e., users' performance and experience (Pillette et al.). Twenty-one participants (Dataset B) performed the second one, designed to examine the relationship between users' online performance (i.e., classification accuracy) and the characteristics of the chosen user-specific Most Discriminant Frequency Band (MDFB) (Benaroch et al.). The only difference between the two experiments lies in the algorithm used to select the MDFB. Dataset C contains 6 additional participants who completed one of the two experiments described above. Physiological signals were measured using a g.USBAmp (g.tec, Austria), sampled at 512 Hz, and processed online using OpenViBE 2.1.0 (Dataset A) and OpenViBE 2.2.0 (Dataset B). For Dataset C, participants C83 and C85 were recorded with OpenViBE 2.1.0 and the remaining 4 participants with OpenViBE 2.2.0. Experiments were recorded at Inria Bordeaux Sud-Ouest, France.

    Duration: Each participant's folder comprises approximately 48 minutes of EEG recording: six 7-minute runs and a 6-minute baseline.

    Documents:
    • Instructions: checklist read by experimenters during the experiments.
    • Questionnaires: the Mental Rotation test used, and the translation of 4 questionnaires, notably the Demographic and Social information, the Pre- and Post-session questionnaires, and the Index of Learning Style (English and French versions).
    • Performance: the online OpenViBE BCI classification performances obtained by each participant, provided for each run, as well as answers to all questionnaires.
    • Scenarios/scripts: set of OpenViBE scenarios used to perform each step of the MI-BCI protocol, e.g., acquire training data, calibrate the classifier, or run the online MI-BCI.

    Database (raw signals):
    • Dataset A: N=60 participants
    • Dataset B: N=21 participants
    • Dataset C: N=6 participants

  14. Capstone C Final 1 Dataset

    • universe.roboflow.com
    zip
    Updated Oct 30, 2023
    Cite
    Capstone C (2023). Capstone C Final 1 Dataset [Dataset]. https://universe.roboflow.com/capstone-c/capstone-c-final-dataset-1-vgfa5
    Explore at:
    zip (available download formats)
    Dataset updated
    Oct 30, 2023
    Dataset authored and provided by
    Capstone C
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cars Bounding Boxes
    Description

    Capstone C Final Dataset 1

    ## Overview
    
    Capstone C Final Dataset 1 is a dataset for object detection tasks - it contains Cars annotations for 4,564 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  15. Preliminary Coastal Grain Size Portal (C-GRASP) dataset. Version 1, January 2022

    • data.niaid.nih.gov
    Updated Feb 16, 2022
    Cite
    Buscombe, Daniel; Speiser, William; Goldstein, Evan (2022). Preliminary Coastal Grain Size Portal (C-GRASP) dataset. Version 1, January 2022 [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5874230
    Explore at:
    Dataset updated
    Feb 16, 2022
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    University of North Carolina at Greensboro
    Authors
    Buscombe, Daniel; Speiser, William; Goldstein, Evan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Provisional database: The data you have secured from the U.S. Geological Survey (USGS) database identified as Preliminary Coastal Grain Size Portal (C-GRASP) dataset. Version 1, January 2022 have not received USGS approval and as such are provisional and subject to revision. The data are released on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from its authorized or unauthorized use.

    Version 1 (January 2022) of the Coastal Grain Size Portal (C-GRASP) database. This is a preliminary internal deliverable for the National Oceanography Partnership Program (NOPP) Task 1 / USGS Gesch team and project partners only.

    The primary purpose of this Provisional data release is to provide National Oceanography Partnership Program (NOPP) project partners with programmatic access to this preliminary version of the Coastal Grain Size Portal (C-GRASP) database for internal project use. These data are preliminary or provisional and are subject to revision. They are being provided to meet the need for timely best science. The data have not received final approval by the U.S. Geological Survey (USGS) and are provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the data.

    This preliminary data release contains various files that list grain size information collated from secondary data already in the public domain, in the form of public datasets, or in published literature.

    Where possible, we have indicated the source, location, and sampling methods used to obtain these data. Where not possible to establish these facts, those fields have been left empty.

    More information on our methods, data sources, and data processing and analysis codes are found on our github page

    The dataset consists of one zipped file, Source_Files.zip, and 4 comma-separated value (CSV) files:

    dataset_10kmcoast.csv - all data found to be within 10 km of the Natural Earth coastline polyline

    Data_EstimatedOnshore.csv - all data from dataset_10kmcoast.csv that lies within the Natural Earth United States polygon

    Data_VerifiedOnshore.csv - all data that could be verified as onshore from sampling method, note, or location-type data

    Data_Post2012_VerifiedOnshore.csv - all data from Data_VerifiedOnshore.csv dated after 2012

    Each file has the following fields (missing data is left blank):

    'ID': row ID integer

    'Sample_ID': identifier to raw data source

    'Sample_Type_Code': code of sample id

    'Project': raw datasource project identifier

    'dataset': raw dataset major identifier

    'Date': date, where specified, and to whatever precision that is specified

    'Location_Type': where specified, code indicating type of location information

    'latitude': latitude in decimal degrees

    'longitude': longitude in decimal degrees

    'Contact': where specified, raw data originator

    'num_orig_dists': number of unique grain size distributions

    'Measured_Distributions': number of measured grain size distributions

    'Grainsize': grain size is sometimes reported without specification

    'Mean': mean grain size in mm

    'Median': median grain size in mm

    'Wentworth': Wentworth name (one of ['Clay', 'CoarseSand', 'CoarseSilt', 'Cobble', 'FineSand', 'FineSilt', 'Granule', 'MediumSand', 'MediumSilt', 'Pebble', 'VeryCoarseSand', 'VeryFineSand', 'VeryFineSilt'])

    'Kurtosis': kurtosis value (non-dim)

    'Kurtosis_Class': kurtosis category

    'Skewness': skewness value (non-dim)

    'Skewness_Class': skewness category

    'Std': standard deviation of grain sizes

    'Sorting': sorting category

    'd5': grain size distribution 5th percentile

    'd10': grain size distribution 10th percentile

    'd16': grain size distribution 16th percentile

    'd25': grain size distribution 25th percentile

    'd30': grain size distribution 30th percentile

    'd50': grain size distribution 50th percentile

    'd65': grain size distribution 65th percentile

    'd75': grain size distribution 75th percentile

    'd84': grain size distribution 84th percentile

    'd90': grain size distribution 90th percentile

    'd95': grain size distribution 95th percentile

    'Notes': notes - these can be informative and substantial, do not disregard

    Source_Files.zip contains 11 comma-separated value files, namely bicms.csv, boem.csv, clark.csv, dbseabed.csv, ecstdb.csv, mass.csv, mcfall.csv, rossi.csv, sandsnap.csv, sbell.csv, and ussb.csv, which contain the raw datasets collated and extracted from their native formats into CSV.
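    A hedged pandas sketch for a first look at the largest table, using the field names documented above:

    import pandas as pd

    df = pd.read_csv("dataset_10kmcoast.csv")

    # Median grain size (mm) per Wentworth class; blanks load as NaN.
    print(df.groupby("Wentworth")["Median"].describe())

    # Example filter: medium sand samples with d50 between 0.2 and 0.5 mm.
    sand = df[(df["Wentworth"] == "MediumSand") & (df["d50"].between(0.2, 0.5))]
    print(len(sand))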

  16. Live tables on National Land Use Database of previously-developed Brownfield Land

    • gov.uk
    Updated Nov 10, 2012
    Cite
    Ministry of Housing, Communities & Local Government (2018 to 2021) (2012). Live tables on National Land Use Database of previously-developed Brownfield Land [Dataset]. https://www.gov.uk/government/statistical-data-sets/live-tables-on-national-land-use-database-of-previously-developed-brownfield-land
    Explore at:
    Dataset updated
    Nov 10, 2012
    Dataset provided by
    GOV.UK
    Authors
    Ministry of Housing, Communities & Local Government (2018 to 2021)
    Description

    Table P301 - Previously-developed land by land type (MS Excel Spreadsheet, 16.5 KB): https://assets.publishing.service.gov.uk/media/5a79ba7040f0b642860da4a6/8651421.xls

    This file may not be suitable for users of assistive technology. If you use assistive technology (such as a screen reader) and need a version of this document in a more accessible format, please email alternativeformats@communities.gov.uk, telling us what format you need and what assistive technology you use.

    Table P302 - Previously-developed land by whether vacant/derelict or in use (MS Excel Spreadsheet, 18 KB): https://assets.publishing.service.gov.uk/media/5a79064140f0b679c0a07eed/8651471.xls

    The same accessible-format request process applies to this file.
  17. CPP Dataset

    • kaggle.com
    zip
    Updated Apr 21, 2025
    Cite
    Mujtaba Ahmed (2025). CPP Dataset [Dataset]. https://www.kaggle.com/datasets/rajamujtabaahmed/cpp-dataset
    Explore at:
    zip (82876 bytes); available download formats
    Dataset updated
    Apr 21, 2025
    Authors
    Mujtaba Ahmed
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains 10,000 unique C++ programming prompts along with their corresponding code responses, designed specifically for training and evaluating natural language generation models such as Transformers. Each row in the CSV contains:

    id: A unique identifier for each record.

    prompt: A C++ programming instruction or task, phrased in natural language.

    response: The corresponding C++ source code fulfilling the prompt.

    The prompts include a wide range of programming concepts, such as:

    Basic arithmetic operations

    Loops and conditionals

    Class and object creation

    Recursion and algorithm design

    Template functions and data structures

    This dataset is ideal for:

    Fine-tuning code generation models (e.g., GPT-style models)

    Creating educational tools or auto-code assistants

    Exploring zero-shot/few-shot learning in code generation

    The following code can be used to complete all #TODO programs in the dataset:

    import pandas as pd
    import torch
    from tqdm import tqdm
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load your dataset
    df = pd.read_csv("/Path/CPP_Dataset_MujtabaAhmed.csv")

    # Load the model and tokenizer (CodeGen 350M - specialized for programming)
    model_name = "Salesforce/codegen-350M-mono"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).cuda()  # use .cpu() if no GPU

    # Function to complete C++ code with TODO
    def complete_code(prompt):
        input_text = prompt.strip() + " "
        inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
        output = model.generate(
            **inputs,
            max_length=512,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            top_p=0.95,
            pad_token_id=tokenizer.eos_token_id,
        )
        decoded = tokenizer.decode(output[0], skip_special_tokens=True)
        return decoded.replace(prompt.strip(), "").strip()

    # Iterate and fill TODOs
    completed_responses = []
    for i, row in tqdm(df.iterrows(), total=len(df), desc="Processing"):
        prompt, response = row["prompt"], row["response"]
        if "TODO" in response:
            generated = complete_code(prompt + " " + response.split("TODO")[0])
            response_filled = response.replace("TODO", generated)
        else:
            response_filled = response
        completed_responses.append(response_filled)

    # Update DataFrame and save
    df["response"] = completed_responses
    df.to_csv("CPP_Dataset_Completed.csv", index=False)
    print("✅ Completed CSV saved as 'CPP_Dataset_Completed.csv'")

  18. August 2024 data-update for "Updated science-wide author databases of standardized citation indicators"

    • elsevier.digitalcommonsdata.com
    Updated Sep 16, 2024
    + more versions
    Cite
    John P.A. Ioannidis (2024). August 2024 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.7
    Explore at:
    Dataset updated
    Sep 16, 2024
    Authors
    John P.A. Ioannidis
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added in the most recent iteration. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2023 and single recent year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field.

    This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to end of citation year 2023. This work uses Scopus data; calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list; it does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases.

    The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a

  19. AutoNaVIT-C : Vision-Based Path and Obstacle Segmentation Dataset for Autonomous Driving - XML Compatible

    • data.mendeley.com
    Updated Apr 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jeevan S (2025). AutoNaVIT-C : Vision-Based Path and Obstacle Segmentation Dataset for Autonomous Driving - XML Compatible [Dataset]. http://doi.org/10.17632/8zhhjhyt35.1
    Explore at:
    Dataset updated
    Apr 14, 2025
    Authors
    Jeevan S
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    AutoNaVIT is a meticulously developed dataset designed to accelerate research in autonomous navigation, semantic scene understanding, and object segmentation through deep learning. This release includes only the annotation labels in XML format, aligned with high-resolution frames extracted from a controlled driving sequence at Vellore Institute of Technology – Chennai Campus (VIT-C). The corresponding images will be included in Version 2 of the dataset.

    Class Annotations

    The dataset features carefully annotated bounding boxes for the following three essential classes relevant to real-time navigation and path planning in autonomous vehicles:

    Kerb – 1,377 instances

    Obstacle – 258 instances

    Path – 532 instances

    All annotations were produced using Roboflow with human-verified precision, ensuring consistent, high-quality data that supports robust model development for urban and semi-urban scenarios.

    Data Capture Specifications

    The source video was captured using a Sony IMX890 sensor, under stable daylight lighting. Below are the capture parameters:

    Sensor Size: 1/1.56", 50 MP

    Lens: 6P optical configuration

    Aperture: ƒ/1.8

    Focal Length: 24mm equivalent

    Pixel Size: 1.0 µm

    Features: Optical Image Stabilization (OIS), PDAF autofocus

    Video Duration: 4 minutes 11 seconds

    Frame Rate: 2 FPS

    Total Annotated Frames: 504

    Format Compatibility and Model Support

    AutoNaVIT annotations are provided in Pascal VOC-compatible XML format, making them directly usable with models that support the Pascal VOC standard. The dataset is immediately compatible with:

    Pascal VOC

    As XML is a structured, extensible format, these annotations can be easily adapted for use with additional object detection frameworks that support XML-based label schemas.
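    For example, the classes and bounding boxes can be recovered with a standard XML parser, since the labels follow the Pascal VOC schema (a minimal sketch; the annotation file name is hypothetical):

    import xml.etree.ElementTree as ET

    # Parse one Pascal VOC annotation file (file name is hypothetical).
    root = ET.parse("frame_0001.xml").getroot()

    for obj in root.iter("object"):
        name = obj.findtext("name")  # e.g. Kerb, Obstacle, or Path
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        print(name, (xmin, ymin, xmax, ymax))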

    Benchmark Results

    To assess dataset utility, a YOLOv8 segmentation model was trained on the full dataset (including images). The model achieved the following results:

    Mean Average Precision (mAP): 96.5%

    Precision: 92.2%

    Recall: 94.4%

    These metrics demonstrate the dataset’s effectiveness in training models for autonomous vehicle perception and obstacle detection.

    Disclaimer and Attribution Requirement

    By downloading or using this dataset, users agree to the terms outlined in the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0):

    This dataset is available solely for academic and non-commercial research purposes.

    Proper attribution must be provided as follows: “Dataset courtesy of Vellore Institute of Technology – Chennai Campus.” This citation must appear in all research papers, presentations, or any work derived from this dataset.

    Redistribution, public hosting, commercial use, or modification is prohibited without prior written permission from VIT-C.

    Use of this dataset implies acceptance of these terms. All rights not explicitly granted are retained by VIT-C.

  20. Dataset associated with ORD-025118: Using a Gene Expression Biomarker to Identify DNA Damage-Inducing Agents in Microarray Profiles

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Dataset associated with ORD-025118: Using a Gene Expression Biomarker to Identify DNA Damage-Inducing Agents in Microarray Profiles [Dataset]. https://catalog.data.gov/dataset/dataset-associated-with-ord-025118-using-a-gene-expression-biomarker-to-identify-dna-damag
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Datasets used in ORD-025118: Using a Gene Expression Biomarker to Identify DNA Damage-Inducing Agents in Microarray Profiles. This dataset is associated with the following publication: Corton, C., A. Williams, and C. Yauk. Using a Gene Expression Biomarker to Identify DNA Damage-Inducing Agents in Microarray Profiles. ENVIRONMENTAL AND MOLECULAR MUTAGENESIS. John Wiley & Sons, Inc, Hoboken, NJ, USA, 59(9): 772-784, (2018).
