Dataset Card for c-sharp-coding-dataset
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel, using the distilabel CLI:

`distilabel pipeline run --config "https://huggingface.co/datasets/dmeldrum6/c-sharp-coding-dataset/raw/main/pipeline.yaml"`

or explore the configuration:

`distilabel pipeline info --config…` See the full description on the dataset page: https://huggingface.co/datasets/dmeldrum6/c-sharp-coding-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a BioConductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
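The core of the approach can be sketched in a few lines. The following is an illustrative simplification, not the GOTHiC code: the coverage-product null model and all names here are our assumptions.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the CDF."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

def interaction_pvalue(n_obs, n_total, rel_cov_i, rel_cov_j):
    """P-value that bin pair (i, j) has more read pairs than expected by chance.

    Under the null, each of the n_total read pairs falls on (i, j) with a
    probability proportional to the product of the bins' relative coverages,
    which absorbs multiplicative biases of known and unknown origin.
    """
    p_expected = rel_cov_i * rel_cov_j
    return binom_sf(n_obs, n_total, p_expected)

# 25 observed read pairs where ~10 are expected: a candidate true interaction.
p = interaction_pvalue(n_obs=25, n_total=1000, rel_cov_i=0.1, rel_cov_j=0.1)
```

Thresholding such p-values (after multiple-testing correction) is what turns raw contact counts into a set of significant interactions.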
These datasets contain C-V2X network communication and interoperability testing packet data collected using a network sniffer (Wireshark) in the Packet Capture (PCAP) format and converted into the Packet Description Markup Language (PDML) format. These datasets include three testcases: C-V2I, C-V2V, and C-V2X. These datasets can be used to display, analyze, and assess C-V2X compatibility and interoperability among commercial on-board units (OBUs) and road-side units (RSUs) based on IEEE 1609.2, IEEE 1609.3, and SAE J2735 standards.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
EDDEN stands for *E*valuation of *D*MRI *DEN*oising approaches. The data correspond to the publication: Manzano Patron, J.P., Moeller, S., Andersson, J.L.R., Yacoub, E., Sotiropoulos, S.N. Denoising Diffusion MRI: Considerations and implications for analysis. doi: https://doi.org/10.1101/2023.07.24.550348. Please cite it if you use this dataset.
Description of the dataset

RAW: complex data (magnitude and phase) were acquired for a single subject at different SNR/resolution regimes, under ~/EDDEN/sub-01/ses-XXX/dwi/:
- Dataset A (2mm)
- Dataset B (1p5mm)
- Dataset C (0p9mm)

Each dataset contains its own T1w-MPRAGE under ~/EDDEN/sub-01/ses-XXX/anat/. Each dataset was acquired on a different day, to minimise fatigue, but all repeats within a dataset were acquired in the same session. All acquisitions were obtained parallel to the anterior and posterior commissure line, covering the entire cerebrum.
DERIVATIVES

Here are the different denoised versions of the raw data for the different datasets, the pre-processed data for the raw, denoised and averaged versions, and the FA, MD and V1 outputs from the DTI model fitting (see the *Data pre-processing* section below).
- Denoised data:
  - NLM (NLM): Non-Local Means denoising applied to magnitude raw data.
  - MPPCA (|MPPCA|): Marchenko-Pastur PCA denoising applied to magnitude raw data.
  - MPPCA_complex (MPPCA*): Marchenko-Pastur PCA denoising applied to complex raw data.
  - NORDIC (NORDIC): NORDIC applied to complex raw data.
  - AVG_mag (|AVG|): the average of the multiple repeats in magnitude.
  - AVG_complex (AVG*): the average in the complex space of the multiple repeats.
- Masks: under ~/EDDEN/derivatives/ses-XXX/masks we can find different masks for each dataset:
  - GM_mask: Gray Matter mask.
  - WM_mask: White Matter mask.
  - CC_mask: Corpus Callosum mask.
  - CS_mask: Centrum Semiovale mask.
  - ventricles_mask: CSF ventricles mask.
  - nodif_brain_mask: Eroded brain mask.
Having the magnitude and complex data for each dataset, denoising was applied using different approaches prior to any pre-processing to minimise potential changes in statistical properties of the raw data due to interpolations (Veraart et al., 2016b). For denoising, we used the following four algorithms:
- **Denoising in the magnitude domain**: i) The Non-Local Means (**NLM**) (Buades et al., 2005) was applied as an exemplar of a simple non-linear filtering method adapted from traditional signal pre-processing. We used the default implementation in DIPY (Garyfallidis et al., 2014), where each dMRI volume is denoised independently. ii) The Marchenko-Pastur PCA (MPPCA) (denoted as **|MPPCA|** throughout the text) (Cordero-Grande et al., 2019; Veraart et al., 2016b), reflecting a commonly used approach that performs PCA over image patches and uses the MP theorem to identify noise components from the eigenspectrum. We used the default MrTrix3 implementation (Tournier et al., 2019).
- **Denoising in the complex domain**: i) MPPCA applied to complex data (rotated along the real axis), denoted as **MPPCA***. We applied the MrTrix3 implementation of the magnitude MPPCA to the complex data rotated to the real axis (we found that this approach was more stable in terms of handling phase images and achieved better denoising, compared to the MrTrix3 complex MPPCA implementation). ii) The **NORDIC** algorithm (Moeller et al., 2021a), which also relies on the MP theorem, but performs variance spatial normalisation prior to noise component identification and filtering, to ensure noise stationarity assumptions are fulfilled.
All data, both raw and their four denoised versions, underwent the same pre-processing steps for distortion and motion correction (Sotiropoulos et al., 2013b) using an in-house pipeline (Mohammadi-Nejad et al., 2019). To avoid confounds from potential misalignment in the distortion-corrected diffusion native space obtained from each approach, we chose to compute a single susceptibility-induced off-resonance fieldmap using the raw data for each of the Datasets A, B and C, and then use the corresponding fieldmap for all denoising approaches in each dataset, so that the reference native space stays the same for each of A, B and C. Note that differences between fieldmaps before and after denoising are small anyway, as the relatively high-SNR b = 0 s/mm2 images are used to estimate them. But these small differences can cause noticeable misalignments between methods and confounds when attempting quantitative comparisons, which we avoid here using our approach. Hence, for each of the Datasets A, B and C, the raw blip-reversed b = 0 s/mm2 images were used in FSL's topup to generate a fieldmap (Andersson and Skare, 2002). This was then used in individual runs of FSL's eddy for each approach (Andersson and Sotiropoulos, 2016), which applied the common fieldmap and performed corrections for eddy currents and subject motion in a single interpolation step. FSL's eddyqc (Bastiani et al., 2019) was used to generate quality control (QC) metrics, including SNR and angular CNR for each b-value. The same T1w image was used within each dataset. A linear transformation from the corrected native diffusion space to the T1w space was estimated using boundary-based registration (Greve and Fischl, 2009). The T1w image was skull-stripped and non-linearly registered to the MNI standard space, allowing further analysis. Masks of white and grey matter were obtained from the T1w image using FSL's FAST (Jenkinson et al., 2012) and were aligned to diffusion space.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides highly complex physical gene regulatory networks in young adult wild-type (WT) C.elegans worms. With a total of 239,001 regulatory interactions collected from 289 datasets, this dataset is a great resource for studying gene regulation and exploring how this gene activity contributes to organism function under varying bio-environmental conditions. Our collection of datasets contains 126 genes and 495 transcription factors, along with functional knockdown data that has been used to validate the physical gene regulatory networks present in the young adult C.elegans worms. Moreover, researchers and biologists can leverage this data to gain valuable insights on how various genotypes, ages and strains are associated with different perturbations in their biological features and ultimately uncover new discoveries about the network of relationships that exist between these genes inside animals. This comprehensive dataset will be essential for conducting research related to such topics as life development processes or age-related diseases - further enriching our understanding of life!
This guide will help you understand how to use this dataset of physical gene regulatory networks to research and analyze young adult C.elegans worms.
1. Understand the columns in the dataset: there are 239,001 regulatory interactions from 289 datasets, covering 126 genes and 495 transcription factors, each recorded with genotype, age, strain, perturbation type, data type, data source and source used. Comments and the regulator are also included in the columns for more information about each interaction.
2. Know your research goal: determine what you wish to discover when working with this dataset so that you can sort and explore the data efficiently. Knowing your goals for the analysis will help you decide which columns may provide valuable insights when filtering or sorting.
3. Analyse specific types of data: once your goals are established, focus on the data types relevant to achieving them (for example, transcription factor levels). Looking at these components together can offer insight into how regulation changes within a cell's environment, and which pathways may be activated or deactivated under different conditions.
4. Keep logs and documents up to date: after sorting or filtering on certain columns, make sure your logs and documents stay up to date and match any changes made during analysis, so that usage is not mixed up across documents or sessions over the project's lifespan. An organised record-keeping system helps ensure accuracy when dealing with large volumes of information over time.
We hope these tips help get you started exploring physical gene regulatory networks in C. elegans! If you have any questions, feel free to reach out via message – we would love to hear how things go after you put them into practice!
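As a concrete starting point for the steps above, here is a minimal pandas sketch. The column names and values are invented for illustration and may not match the dataset's exact schema:

```python
import pandas as pd

# Toy rows shaped like the described columns (regulator, gene, strain, data type).
interactions = pd.DataFrame({
    "regulator": ["tf-1", "tf-1", "tf-2"],
    "gene": ["gene-a", "gene-b", "gene-a"],
    "strain": ["N2", "N2", "CB4856"],
    "data_type": ["ChIP", "Y1H", "ChIP"],
})

# Focus on one data type and count unique targets per transcription factor.
chip = interactions[interactions["data_type"] == "ChIP"]
targets_per_tf = chip.groupby("regulator")["gene"].nunique()
```

The same filter-then-group pattern extends to genotype, age, or perturbation-type columns.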
- Training machine-learning algorithms to develop automated approaches in predicting gene expression levels of individual regulatory networks.
- Using this dataset alongside data from RNA-seq experiments to investigate how genetic mutations, environmental changes, and other factors can affect gene regulation across C.elegans populations.
- Exploring the correlation between transcription factor binding sites and gene expression levels to predict potential target genes for a given transcription factor.
If you use this dataset in your research, please credit the original authors.

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, No Copyright. You can copy, m...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Details of the Hi-C datasets from 70 cell lines/tissues used for the analysis are provided in the table. Hi-C contact data were used to find the target genes of the autoimmune disease-associated SNPs.
- Land use data for 2001-2100 from PLUM1.3 (Parsimonious Land Use Model version 1.3) coupled with a global energy-economics model, downscaled to 0.5×0.5 degree gridcells, for the five scenarios SSP1-SSP5 (reference and mitigation strategies) as described in detail in Engström et al. (2017).
- Total terrestrial biosphere carbon (kg/m2) for 2001-2100 from simulations using the vegetation model LPJ-GUESS at 0.5×0.5 degree resolution for the 10 SSP-SPA scenarios using 1-3 RCPs for four different climate models as described in Engström et al. (2017).

Ref.: Engström K., Lindeskog M., Olin S., Hassler J., and Smith B. (2017) Impacts of climate mitigation strategies in the energy sector on global land use and carbon balance.
Community Data License Agreement (CDLA-Sharing-1.0): https://cdla.io/sharing-1-0/
Musical Scale Dataset: 1900+ Chroma Tensors Labeled by Scale
This dataset contains 1900+ unique synthetic musical audio samples generated from melodies in each of the 24 Western scales (12 major and 12 minor). Each sample has been converted into a chroma tensor, a 12-dimensional pitch class representation commonly used in music information retrieval (MIR) and deep learning tasks.
- chroma_tensor: a JSON-safe string representation of a PyTorch tensor with shape [1, 12, T], where:
  - 12 = the 12 pitch classes (C, C#, D, ... B)
  - T = time steps
- scale_index: an integer label from 0–23 identifying the scale the sample belongs to

This dataset is ideal for:
- Training deep learning models (CNNs, MLPs) to classify musical scales
- Exploring pitch-class distributions in Western tonal music
- Prototyping models for music key detection, chord prediction, or tonal analysis
- Teaching or demonstrating chromagram-based ML workflows
| Index | Scale |
|---|---|
| 0 | C major |
| 1 | C# major |
| ... | ... |
| 11 | B major |
| 12 | C minor |
| ... | ... |
| 23 | B minor |
Chroma tensors are of shape [1, 12, T], where:
- 1 is the channel dimension (for CNN input)
- 12 represents the 12 pitch classes (C through B)
- T is the number of time frames
```python
import torch
import pandas as pd
from tqdm import tqdm

df = pd.read_csv("/content/scale_dataset.csv")

# Reconstruct chroma tensors from their string representation.
# Note: eval() works here because each cell stores a Python-style list;
# only run this on data you trust.
X = [torch.tensor(eval(row)).reshape(1, 12, -1) for row in tqdm(df["chroma_tensor"])]
y = df["scale_index"].tolist()
```
Alternatively, you can load the chroma tensors and target scale indices directly from the .pt file.
```python
import torch

data = torch.load("chroma_tensors.pt")
X_pt = data["X"]  # list of [1, 12, 302] tensors
y_pt = data["y"]  # list of scale indices
```
The samples were generated with music21 and FluidSynth, and chroma features were extracted with librosa.feature.chroma_stft.

| Column | Type | Description |
|---|---|---|
| chroma_tensor | str | Flattened 1D chroma tensor [1×12×T] |
| scale_index | int | Label from 0 to 23 |

Tensors share a fixed time dimension (T) for easy batching.
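For a sense of what the 0–23 labels encode, here is a stdlib-only sketch that classifies a chroma array by matching its time-averaged pitch-class profile against binary scale templates. The natural-minor templates and the index order are assumptions based on the table above; this is not how the dataset labels were produced:

```python
# Binary pitch-class templates for the 24 scales, in the dataset's assumed
# index order: 0-11 = C..B major, 12-23 = C..B minor (natural minor assumed).
MAJOR = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
MINOR = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

def _roll(template, k):
    """Rotate a 12-element template so its root moves up k semitones."""
    return template[-k:] + template[:-k] if k else template[:]

TEMPLATES = [_roll(MAJOR, k) for k in range(12)] + [_roll(MINOR, k) for k in range(12)]

def classify_scale(chroma):
    """Map a [1, 12, T] nested-list chroma to a 0-23 index by template match."""
    rows = chroma[0]                                   # 12 lists of T frames
    profile = [sum(r) / len(r) for r in rows]          # time-averaged energy per pitch class
    scores = [sum(t * p for t, p in zip(tpl, profile)) for tpl in TEMPLATES]
    return scores.index(max(scores))

# Example: a chroma whose active pitch classes are exactly the G major scale.
g_major = {7, 9, 11, 0, 2, 4, 6}
chroma = [[[1.0] * 4 if pc in g_major else [0.0] * 4 for pc in range(12)]]
idx = classify_scale(chroma)
```

A learned model (CNN or MLP over the same [1, 12, T] input) would replace the fixed templates with trained weights.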
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
DATASET C 0510 is a dataset for instance segmentation tasks - it contains Crosswalk annotations for 676 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
C Project is a dataset for object detection tasks - it contains Tooth annotations for 713 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
There are five different files for this dataset:
1. A dataset listing the reported functional uses of chemicals (FUse)
2. All 729 ToxPrint descriptors obtained from ChemoTyper for chemicals in FUse
3. All EPI Suite properties obtained for chemicals in FUse
4. The confusion matrix values, similarity thresholds, and bioactivity index for each model
5. The functional use prediction, bioactivity index, and prediction classification (poor prediction, functional substitute, candidate alternative) for each Tox21 chemical

This dataset is associated with the following publication: Phillips, K., J. Wambaugh, C. Grulke, K. Dionisio, and K. Isaacs. High-throughput screening of chemicals as functional substitutes using structure-based classification models. Green Chemistry, Royal Society of Chemistry, Cambridge, UK, 19: 1063-1074, (2017).
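The confusion matrix values in file 4 can be reduced to familiar summary statistics. A minimal sketch follows; the metric set is our choice for illustration, not necessarily what the publication reports:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# Hypothetical counts for one functional-use model.
m = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```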
The NOAA Coastal Change Analysis Program (C-CAP) produces national standardized land cover and change products for the coastal regions of the U.S. C-CAP products inventory coastal intertidal areas, wetlands, and adjacent uplands with the goal of monitoring changes in these habitats, on a one-to-five year repeat cycle. The timeframe for this metadata is reported as 1985 - 2010-Era, but the actual dates of the Landsat imagery used to create the land cover may have been acquired a few years before or after each era. These maps are developed utilizing Landsat Thematic Mapper imagery, and can be used to track changes in the landscape through time. This trend information gives important feedback to managers on the success or failure of management policies and programs, and aids in developing a scientific understanding of the Earth system and its response to natural and human-induced changes. This understanding allows for the prediction of impacts due to these changes and the assessment of their cumulative effects, helping coastal resource managers make more informed regional decisions. NOAA C-CAP is a contributing member of the Multi-Resolution Land Characteristics consortium, and C-CAP products are included as the coastal expression of land cover within the National Land Cover Database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context: we share a large database containing electroencephalographic signals from 87 human participants, with more than 20,800 trials in total, representing about 70 hours of recording. It was collected during brain-computer interface (BCI) experiments and organized into 3 datasets (A, B, and C) that were all recorded following the same protocol: right- and left-hand motor imagery (MI) tasks during a single-day session. It includes the performance of the associated BCI users, detailed information about the users' demographics, personality and cognitive profile, and the experimental instructions and codes (executed in the open-source platform OpenViBE). Such a database could prove useful for various studies, including but not limited to: 1) studying the relationships between BCI users' profiles and their BCI performances, 2) studying how EEG signal properties vary for different users' profiles and MI tasks, 3) using the large number of participants to design cross-user BCI machine learning algorithms, or 4) incorporating users' profile information into the design of EEG signal classification algorithms. Sixty participants (Dataset A) performed the first experiment, designed to investigate the impact of experimenters' and users' gender on MI-BCI user training outcomes, i.e., users' performance and experience (Pillette et al.). Twenty-one participants (Dataset B) performed the second one, designed to examine the relationship between users' online performance (i.e., classification accuracy) and the characteristics of the chosen user-specific Most Discriminant Frequency Band (MDFB) (Benaroch et al.). The only difference between the two experiments lies in the algorithm used to select the MDFB. Dataset C contains 6 additional participants who completed one of the two experiments described above.
Physiological signals were measured using a g.USBAmp (g.tec, Austria), sampled at 512 Hz, and processed online using OpenViBE 2.1.0 (Dataset A) and OpenViBE 2.2.0 (Dataset B). For Dataset C, participants C83 and C85 were recorded with OpenViBE 2.1.0 and the remaining 4 participants with OpenViBE 2.2.0. Experiments were recorded at Inria Bordeaux Sud-Ouest, France.

Duration: each participant's folder contains approximately 48 minutes of EEG recording, i.e., six 7-minute runs and a 6-minute baseline.

Documents:
- Instructions: checklist read by experimenters during the experiments.
- Questionnaires: the Mental Rotation test used, and the translation of 4 questionnaires, notably the Demographic and Social information, the Pre- and Post-session questionnaires, and the Index of Learning Styles (English and French versions).
- Performance: the online OpenViBE BCI classification performances obtained by each participant for each run, as well as answers to all questionnaires.
- Scenarios/scripts: set of OpenViBE scenarios used to perform each step of the MI-BCI protocol, e.g., acquire training data, calibrate the classifier, or run the online MI-BCI.

Database (raw signals):
- Dataset A: N=60 participants
- Dataset B: N=21 participants
- Dataset C: N=6 participants
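As an example of use case 2 (how EEG signal properties vary across users and MI tasks), a common first feature is band power at the datasets' 512 Hz sampling rate. The sketch below uses a plain FFT estimator on synthetic data; the function and names are our own, not part of the released scripts:

```python
import numpy as np

FS = 512  # sampling rate used in these datasets (Hz)

def band_power(signal, low, high, fs=FS):
    """Mean spectral power of `signal` within the [low, high] Hz band."""
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    band = (freqs >= low) & (freqs <= high)
    return psd[band].mean()

# Mu-band (8-12 Hz) power, a classic motor-imagery feature, on a 10 Hz tone.
t = np.arange(FS * 2) / FS
x = np.sin(2 * np.pi * 10 * t)
mu = band_power(x, 8, 12)
```

In practice one would use a windowed estimator (e.g. Welch's method) on electrodes over the motor cortex.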
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Capstone C Final Dataset 1 is a dataset for object detection tasks - it contains Cars annotations for 4,564 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Provisional database: the data you have secured from the U.S. Geological Survey (USGS) database identified as the preliminary Coastal Grain Size Portal (C-GRASP) dataset, Version 1, January 2022, have not received USGS approval and as such are provisional and subject to revision. The data are released on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from their authorized or unauthorized use.
Version 1 (January 2022) of the Coastal Grain Size Portal (C-GRASP) database. This is a preliminary internal deliverable for the National Oceanography Partnership Program (NOPP) Task 1 / USGS Gesch team and project partners only.
The primary purpose of this Provisional data release is to provide National Oceanography Partnership Program (NOPP) project partners with programmatic access to this preliminary version of the Coastal Grain Size Portal (C-GRASP) database for internal project use. These data are preliminary or provisional and are subject to revision. They are being provided to meet the need for timely best science. The data have not received final approval by the U.S. Geological Survey (USGS) and are provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the data.
This preliminary data release contains various files that list grain size information collated from secondary data already in the public domain, in the form of public datasets, or in published literature.
Where possible, we have indicated the source, location, and sampling methods used to obtain these data. Where not possible to establish these facts, those fields have been left empty.
More information on our methods, data sources, and data processing and analysis codes can be found on our GitHub page.
The dataset consists of one zipped file, Source_Files.zip, and four comma-separated value (CSV) files:
- dataset_10kmcoast.csv: all data found to be within 10 km of the Natural Earth coastline polyline.
- Data_EstimatedOnshore.csv: all the data from dataset_10kmcoast.csv that lie within the Natural Earth United States polygon.
- Data_VerifiedOnshore.csv: all data that could be verified onshore from sampling method, note, or location type data.
- Data_Post2012_VerifiedOnshore.csv: all the data from Data_VerifiedOnshore.csv dated after 2012.
The files each have the following fields (fields with no data are left blank):
- 'ID': row ID (integer)
- 'Sample_ID': identifier linking to the raw data source
- 'Sample_Type_Code': code of the sample ID
- 'Project': raw data source project identifier
- 'dataset': raw dataset major identifier
- 'Date': date, where specified, to whatever precision is specified
- 'Location_Type': where specified, code indicating the type of location information
- 'latitude': latitude in decimal degrees
- 'longitude': longitude in decimal degrees
- 'Contact': where specified, raw data originator
- 'num_orig_dists': number of unique grain size distributions
- 'Measured_Distributions': number of measured grain size distributions
- 'Grainsize': grain size, where reported without further specification
- 'Mean': mean grain size in mm
- 'Median': median grain size in mm
- 'Wentworth': Wentworth name (one of ['Clay', 'CoarseSand', 'CoarseSilt', 'Cobble', 'FineSand', 'FineSilt', 'Granule', 'MediumSand', 'MediumSilt', 'Pebble', 'VeryCoarseSand', 'VeryFineSand', 'VeryFineSilt'])
- 'Kurtosis': kurtosis value (non-dimensional)
- 'Kurtosis_Class': kurtosis category
- 'Skewness': skewness value (non-dimensional)
- 'Skewness_Class': skewness category
- 'Std': standard deviation of grain sizes
- 'Sorting': sorting category
- 'd5': grain size distribution 5th percentile
- 'd10': grain size distribution 10th percentile
- 'd16': grain size distribution 16th percentile
- 'd25': grain size distribution 25th percentile
- 'd30': grain size distribution 30th percentile
- 'd50': grain size distribution 50th percentile
- 'd65': grain size distribution 65th percentile
- 'd75': grain size distribution 75th percentile
- 'd84': grain size distribution 84th percentile
- 'd90': grain size distribution 90th percentile
- 'd95': grain size distribution 95th percentile
- 'Notes': notes; these can be informative and substantial, do not disregard them
Source_Files.zip contains 11 comma-separated value files (bicms.csv, boem.csv, clark.csv, dbseabed.csv, ecstdb.csv, mass.csv, mcfall.csv, rossi.csv, sandsnap.csv, sbell.csv, ussb.csv), which contain the raw datasets collated and extracted from their native formats into CSV format.
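To illustrate working with these fields, a small pandas sketch on synthetic rows (the values are invented; only the field names follow the list above):

```python
import pandas as pd

# Synthetic rows shaped like the C-GRASP fields described above.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Wentworth": ["FineSand", "FineSand", "MediumSand", "Pebble"],
    "Median": [0.18, 0.21, 0.35, 8.0],  # median grain size in mm
    "Date": ["2013-05-01", "2011-07-15", "2015-02-20", "2014-09-09"],
})

# Median grain size per Wentworth class, and a post-2012 subset analogous
# to the Data_Post2012_VerifiedOnshore.csv filtering step.
per_class = df.groupby("Wentworth")["Median"].median()
post_2012 = df[pd.to_datetime(df["Date"]).dt.year > 2012]
```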
MS Excel Spreadsheet, 16.5 KB

MS Excel Spreadsheet, 18 KB
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains 10,000 unique C++ programming prompts along with their corresponding code responses, designed specifically for training and evaluating natural language generation models such as Transformers. **Each row in the CSV contains:**
id: A unique identifier for each record.
prompt: A C++ programming instruction or task, phrased in natural language.
response: The corresponding C++ source code fulfilling the prompt.
The prompts include a wide range of programming concepts, such as:
Basic arithmetic operations
Loops and conditionals
Class and object creation
Recursion and algorithm design
Template functions and data structures
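A representative record might look like the following (an invented example; actual rows will differ):

```python
# Hypothetical row mirroring the id / prompt / response schema described above.
record = {
    "id": 42,
    "prompt": "Write a C++ function that returns the factorial of n.",
    "response": (
        "#include <cstdint>\n"
        "uint64_t factorial(unsigned n) {\n"
        "    return n <= 1 ? 1 : n * factorial(n - 1);\n"
        "}\n"
    ),
}
```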
This dataset is ideal for:
Fine-tuning code generation models (e.g., GPT-style models)
Creating educational tools or auto-code assistants
Exploring zero-shot/few-shot learning in code generation
The following code can be used to complete all #TODO programs in the dataset:
```python
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from tqdm import tqdm

df = pd.read_csv("/Path/CPP_Dataset_MujtabaAhmed.csv")

model_name = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).cuda()  # use .cpu() if no GPU

def complete_code(prompt):
    input_text = prompt.strip() + " "
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_length=512,
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    decoded = tokenizer.decode(output[0], skip_special_tokens=True)
    return decoded.replace(prompt.strip(), "").strip()

completed_responses = []
for i, row in tqdm(df.iterrows(), total=len(df), desc="Processing"):
    prompt, response = row["prompt"], row["response"]
    if "TODO" in response:
        generated = complete_code(prompt + " " + response.split("TODO")[0])
        response_filled = response.replace("TODO", generated)
    else:
        response_filled = response
    completed_responses.append(response_filled)

df["response"] = completed_responses
df.to_csv("CPP_Dataset_Completed.csv", index=False)
print("✅ Completed CSV saved as 'CPP_Dataset_Completed.csv'")
```
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added in the most recent iteration. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2023 and single recent year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to the end of citation year 2023. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US.
They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a
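To make two of the metrics named above concrete, here is a minimal sketch of the h-index and the co-authorship-adjusted hm-index (in the sense of fractional paper counts, where each paper contributes 1/(number of authors) to the rank). The citation counts below are hypothetical illustration data, not values from the database.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def hm_index(papers):
    """papers: list of (citations, n_authors) tuples.

    Each paper contributes 1/n_authors to an 'effective rank'; hm is the
    largest effective rank still covered by that paper's citation count.
    """
    hm = 0.0
    effective_rank = 0.0
    for citations, n_authors in sorted(papers, key=lambda p: p[0], reverse=True):
        effective_rank += 1.0 / n_authors
        if citations >= effective_rank:
            hm = effective_rank
        else:
            break
    return hm

# Hypothetical author with five papers: (citations, number of authors)
papers = [(25, 1), (18, 3), (12, 2), (7, 4), (3, 2)]
print(h_index([c for c, _ in papers]))        # -> 4
print(round(hm_index(papers), 2))             # -> 2.58
```

Fractional counting is what makes hm lower than h for heavily co-authored output, which is the point of including it alongside the plain h-index.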
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
AutoNaVIT is a meticulously developed dataset designed to accelerate research in autonomous navigation, semantic scene understanding, and object segmentation through deep learning. This release includes only the annotation labels in XML format, aligned with high-resolution frames extracted from a controlled driving sequence at Vellore Institute of Technology – Chennai Campus (VIT-C). The corresponding images will be included in Version 2 of the dataset.
Class Annotations
The dataset features carefully annotated bounding boxes for the following three classes, each essential to real-time navigation and path planning in autonomous vehicles:
Kerb – 1,377 instances
Obstacle – 258 instances
Path – 532 instances
All annotations were produced using Roboflow with human-verified precision, ensuring consistent, high-quality data that supports robust model development for urban and semi-urban scenarios.
Data Capture Specifications
The source video was captured using a Sony IMX890 sensor under stable daylight lighting. Capture parameters:
Sensor Size: 1/1.56", 50 MP
Lens: 6P optical configuration
Aperture: ƒ/1.8
Focal Length: 24mm equivalent
Pixel Size: 1.0 µm
Features: Optical Image Stabilization (OIS), PDAF autofocus
Video Duration: 4 minutes 11 seconds
Frame Rate: 2 FPS
Total Annotated Frames: 504
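A quick consistency check of the specifications above: a 4 min 11 s clip sampled at 2 FPS yields about 502 frames, in line with the 504 annotated frames reported (the small difference is plausibly due to inclusive endpoints or rounding during frame extraction).

```python
# Expected frame count from the stated duration and sampling rate
duration_s = 4 * 60 + 11   # 4 min 11 s = 251 seconds
fps = 2                    # extraction rate, frames per second
print(duration_s * fps)    # -> 502, close to the 504 annotated frames
```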
Format Compatibility and Model Support
AutoNaVIT annotations are provided in Pascal VOC-compatible XML format, making them directly usable with any model or framework that supports the Pascal VOC standard.
As XML is a structured, extensible format, these annotations can be easily adapted for use with additional object detection frameworks that support XML-based label schemas.
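As an illustration of that adaptability, the sketch below reads one Pascal VOC-style XML annotation using only the Python standard library. The tag layout follows the VOC convention; the embedded XML (file name, box coordinates) is illustrative, not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation for a single frame (hypothetical values)
VOC_XML = """<annotation>
  <filename>frame_000123.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>Kerb</name>
    <bndbox><xmin>34</xmin><ymin>540</ymin><xmax>410</xmax><ymax>720</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return a list of {label, xmin, ymin, xmax, ymax} dicts from VOC XML."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(bb.findtext("xmin")),
            "ymin": int(bb.findtext("ymin")),
            "xmax": int(bb.findtext("xmax")),
            "ymax": int(bb.findtext("ymax")),
        })
    return boxes

print(parse_voc(VOC_XML))
```

From this intermediate representation, converting to other box formats (e.g. normalized center/width/height) is a few lines of arithmetic.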
Benchmark Results
To assess dataset utility, a YOLOv8 segmentation model was trained on the full dataset (including images). The model achieved the following results:
Mean Average Precision (mAP): 96.5%
Precision: 92.2%
Recall: 94.4%
These metrics demonstrate the dataset’s effectiveness in training models for autonomous vehicle perception and obstacle detection.
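For readers comparing against these numbers, the sketch below shows how precision and recall relate to raw detection counts (true positives, false positives, missed objects) at a fixed IoU threshold. The counts are illustrative, chosen only so the ratios land near the reported percentages; they are not the actual evaluation tallies.

```python
def precision_recall(tp, fp, fn):
    """Detection metrics from matched-box counts at a given IoU threshold.

    precision = TP / (TP + FP): fraction of predicted boxes that are correct.
    recall    = TP / (TP + FN): fraction of ground-truth boxes that are found.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Illustrative counts: 944 correct detections, 80 spurious boxes, 56 misses
p, r = precision_recall(tp=944, fp=80, fn=56)
print(f"precision={p:.3f} recall={r:.3f}")  # close to the reported 92.2% / 94.4%
```

mAP is not a single ratio like these: it averages precision over recall levels (the area under the precision-recall curve) and then over classes, which is why it can exceed both single-threshold values.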
Disclaimer and Attribution Requirement
By downloading or using this dataset, users agree to the terms outlined in the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0):
This dataset is available solely for academic and non-commercial research purposes.
Proper attribution must be provided as follows: “Dataset courtesy of Vellore Institute of Technology – Chennai Campus.” This citation must appear in all research papers, presentations, or any work derived from this dataset.
Redistribution, public hosting, commercial use, or modification is prohibited without prior written permission from VIT-C.
Use of this dataset implies acceptance of these terms. All rights not explicitly granted are retained by VIT-C.
Datasets used in ORD-025118: Using a Gene Expression Biomarker to Identify DNA Damage-Inducing Agents in Microarray Profiles. This dataset is associated with the following publication: Corton, C., A. Williams, and C. Yauk. Using a Gene Expression Biomarker to Identify DNA Damage-Inducing Agents in Microarray Profiles. ENVIRONMENTAL AND MOLECULAR MUTAGENESIS. John Wiley & Sons, Inc, Hoboken, NJ, USA, 59(9): 772-784, (2018).
Dataset Card for c-sharp-coding-dataset
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it with the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/dmeldrum6/c-sharp-coding-dataset/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/dmeldrum6/c-sharp-coding-dataset.