67 datasets found
  1. Data from: Time-Split Cross-Validation as a Method for Estimating the...

    • figshare.com
    • acs.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Robert P. Sheridan (2023). Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. [Dataset]. http://doi.org/10.1021/ci400084k.s001
    Explore at:
    txt (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    ACS Publications
    Authors
    Robert P. Sheridan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted on that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R2) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending how compounds in the test set are selected. Here, we show that time-split selection gives an R2 that is more like that of true prospective prediction than the R2 from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.
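
    The distinction can be made concrete with a small sketch (synthetic assay dates, not the paper's data): a random split samples test compounds uniformly, while a time-split holds out the most recently assayed compounds.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    assay_year = rng.integers(2000, 2012, size=n)  # synthetic assay date per compound

    n_test = n // 5
    test_random = rng.choice(n, size=n_test, replace=False)  # random selection of 20%
    order = np.argsort(assay_year)
    test_time = order[-n_test:]   # time-split: newest 20% held out as the test set
    train_time = order[:-n_test]  # model is trained only on older compounds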

  2. MUSIC-AVQA-R Dataset

    • paperswithcode.com
    Updated Apr 16, 2025
    Cite
    Jie Ma; Min Hu; Pinghui Wang; Wangchun Sun; Lingyun Song; Hongbin Pei; Jun Liu; Youtian Du (2025). MUSIC-AVQA-R Dataset [Dataset]. https://paperswithcode.com/dataset/music-avqa-r
    Explore at:
    Dataset updated
    Apr 16, 2025
    Authors
    Jie Ma; Min Hu; Pinghui Wang; Wangchun Sun; Lingyun Song; Hongbin Pei; Jun Liu; Youtian Du
    Description

    We introduce the first dataset, MUSIC-AVQA-R, to evaluate the robustness of AVQA models. The construction of this dataset involves two key processes: rephrasing and splitting. The former involves rephrasing the questions in the test split of MUSIC-AVQA; the latter categorizes the questions into frequent (head) and rare (tail) subsets.

    We followed previous work in partitioning the dataset into "head" and "tail" categories. Based on the number of answers in the dataset, answers with a count greater than $1.2$ times the mean count, denoted $\mu(a)$, were categorized as "head", while those with counts less than $1.2\mu(a)$ were categorized as "tail".
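
    A minimal sketch of one plausible reading of this rule (mean taken over distinct answers; the answer list is illustrative, not from the dataset):

    from collections import Counter

    # Illustrative answers; in MUSIC-AVQA-R these would be the test-split answers.
    answers = ["violin", "piano", "violin", "yes", "no", "violin", "piano", "yes"]
    counts = Counter(answers)
    mu = sum(counts.values()) / len(counts)  # mean answer count, mu(a)

    head = {a for a, c in counts.items() if c > 1.2 * mu}   # frequent answers
    tail = {a for a, c in counts.items() if c <= 1.2 * mu}  # rare answers
    print("head:", head, "tail:", tail)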

  3. DataSheet_1_Automated data preparation for in vivo tumor characterization...

    • frontiersin.figshare.com
    docx
    Updated Jun 13, 2023
    Cite
    Denis Krajnc; Clemens P. Spielvogel; Marko Grahovac; Boglarka Ecsedi; Sazan Rasul; Nina Poetsch; Tatjana Traub-Weidinger; Alexander R. Haug; Zsombor Ritter; Hussain Alizadeh; Marcus Hacker; Thomas Beyer; Laszlo Papp (2023). DataSheet_1_Automated data preparation for in vivo tumor characterization with machine learning.docx [Dataset]. http://doi.org/10.3389/fonc.2022.1017911.s001
    Explore at:
    docx (available download formats)
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Frontiers
    Authors
    Denis Krajnc; Clemens P. Spielvogel; Marko Grahovac; Boglarka Ecsedi; Sazan Rasul; Nina Poetsch; Tatjana Traub-Weidinger; Alexander R. Haug; Zsombor Ritter; Hussain Alizadeh; Marcus Hacker; Thomas Beyer; Laszlo Papp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: This study proposes machine learning-driven data preparation (MLDP) for optimal data preparation (DP) prior to building prediction models for cancer cohorts.

    Methods: A collection of well-established DP methods was incorporated for building the DP pipelines for various clinical cohorts prior to machine learning. Evolutionary algorithm principles combined with hyperparameter optimization were employed to iteratively select the best-fitting subset of data preparation algorithms for the given dataset. The proposed method was validated for glioma and prostate single-center cohorts by a 100-fold Monte Carlo (MC) cross-validation scheme with an 80-20% training-validation split ratio. In addition, a dual-center diffuse large B-cell lymphoma (DLBCL) cohort was utilized, with Center 1 as the training and Center 2 as the independent validation dataset, to predict cohort-specific clinical endpoints. Five machine learning (ML) classifiers were employed for building prediction models across all analyzed cohorts. Predictive performance was estimated by confusion matrix analytics over the validation sets of each cohort. The performance of each model with and without MLDP, as well as with manually defined DP, was compared in each of the four cohorts.

    Results: Sixteen of twenty established predictive models demonstrated an area under the receiver operating characteristic curve (AUC) performance increase utilizing the MLDP. The MLDP resulted in the highest performance increase for the random forest (RF) (+0.16 AUC) and support vector machine (SVM) (+0.13 AUC) model schemes for predicting 36-month survival in the glioma cohort. Single-center cohorts resulted in complex (6-7 DP steps) DP pipelines, with a high occurrence of outlier detection, feature selection and synthetic minority oversampling technique (SMOTE). In contrast, the optimal DP pipeline for the dual-center DLBCL cohort only included outlier detection and SMOTE DP steps.

    Conclusions: This study demonstrates that data preparation prior to ML prediction model building in cancer cohorts shall be ML-driven itself, yielding optimal prediction models in both single- and multi-centric settings.
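
    A minimal sketch of the validation scheme described above (100-fold Monte Carlo cross-validation with an 80-20% split), using placeholder data and a single RF classifier rather than the study's actual pipeline:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import ShuffleSplit

    X = np.random.rand(200, 10)       # placeholder feature matrix
    y = np.random.randint(0, 2, 200)  # placeholder binary clinical endpoint

    mc = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)  # 100 MC folds, 80-20 split
    aucs = []
    for train_idx, val_idx in mc.split(X):
        model = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        aucs.append(roc_auc_score(y[val_idx], model.predict_proba(X[val_idx])[:, 1]))
    print(f"mean validation AUC over 100 MC folds: {np.mean(aucs):.3f}")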

  4. Val split & vocab file

    • kaggle.com
    Updated Jul 6, 2024
    Cite
    Devi Hemamalini R (2024). Val split & vocab file [Dataset]. https://www.kaggle.com/datasets/devihemamalinir/val-split-and-vocab-file/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Devi Hemamalini R
    Description

    Dataset

    This dataset was created by Devi Hemamalini R

    Contents

  5. GR712RC LEON3 Power Model Data Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1more
    Updated May 25, 2021
    Cite
    (2021). GR712RC LEON3 Power Model Data Dataset [Dataset]. https://paperswithcode.com/dataset/leon3-sample-data
    Explore at:
    Dataset updated
    May 25, 2021
    Description

    Dataset Files

    The official dataset files are hosted at https://dx.doi.org/10.21227/1y7r-am78.

    Generating the models from the LEON3 sample data

    The data for this paper was generated using a custom open-source methodology called REPPS. To replicate the results, first follow all the steps in https://github.com/TSL-UOB/TP-REPPS to install and configure the scripts and supporting programs. Afterwards, you can execute the following commands to generate the various models from the LEON3 data.

    DISCLAIMER - If you have any issues, please don't hesitate to contact us via email.

    Generate models trained on BEEBS and validated on the use_case_core application

    ASIC Only Model:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_use_case_finegrain.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_use_case_split.data -p 6 -e 4 -d 2 -o 2 -s 20210421_leon3_beebs_ucc_pwr_fngr_nocyc_nocth_asicdata_avgrelerr_nfolds_ools.data

    Bottom-Up Search:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_use_case_finegrain.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_use_case_split.data -p 6 -l 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 -m 1 -n 16 -c 1 -g -i 50 -d 2 -o 2 -s 20210425_leon3_beebs_ucc_pwr_fngr_allev_nocyc_nocth_botup_avgrelerr_nfolds_ools.data

    Top-Down Search:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_use_case_finegrain.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_use_case_split.data -p 6 -l 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 -m 2 -n 1 -c 1 -g -i 50 -d 2 -o 2 -s 20210425_leon3_beebs_ucc_pwr_fngr_allev_nocyc_nocth_topdown_avgrelerr_nfolds_ools

    Validate the previous models on BEEBS as well (no need to redo all the event selection, just use the same events)

    ASIC Only Model:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_BEEBS_split.data -p 6 -e 4 -d 2 -o 2 -s 20210421_leon3_beebs_beebs_pwr_fngr_nocyc_nocth_asicdata_avgrelerr_nfolds_ools.data

    Bottom-Up Search:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_BEEBS_split.data -p 6 -e 24 -d 2 -o 2 -s 20210421_leon3_beebs_beebs_pwr_fngr_allev_nocyc_nocth_botup_avgrelerr_nfolds_ools.data

    Top-Down Search:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_BEEBS_split.data -p 6 -e 9,10,12,13,14,15,16,18,19,20,22,23 -d 2 -o 2 -s 20210421_leon3_beebs_beebs_pwr_fngr_allev_nocyc_nocth_topdown_avgrelerr_nfolds_ools.data

    Visualise the data

    Generate model per-sample breakdown files for the 1st run of the use_case_opt application

    ASIC Only Model:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_use_case_finegrain_1run.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_onlyusecaseopt_split.data -p 6 -e 4 -d 2 -o 6 -s /PATH/TO/ESL_paper_data/20210421_leon3_beebs_uco_pwr_fngr_nocyc_nocth_asicdata_avgrelerr_nfolds_ools_1r.data

    Bottom-Up Search:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_use_case_finegrain_1run.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_onlyusecaseopt_split.data -p 6 -e 24 -d 2 -o 6 -s /PATH/TO/ESL_paper_data/20210427_leon3_beebs_uco_pwr_fngr_allev_nocyc_nocth_botup_avgrelerr_nfolds_ools_1r.data

    Top-Down Search:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_use_case_finegrain_1run.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_onlyusecaseopt_split.data -p 6 -e 9,10,12,13,14,15,16,18,19,20,22,23 -d 2 -o 6 -s /PATH/TO/ESL_paper_data/20210427_leon3_beebs_uco_pwr_fngr_allev_nocyc_nocth_topdown_avgrelerr_nfolds_ools_1r.data

    Generate model per-sample breakdown files for the 1st run of the BEEBS benchmarks

    ASIC Only Model:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain_1run.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_BEEBS_split.data -p 6 -e 4 -d 2 -o 6 -s /PATH/TO/ESL_paper_data/20210423_leon3_beebs_beebs_pwr_fngr_nocyc_nocth_asicdata_avgrelerr_nfolds_ools_1r.data

    Bottom-Up Search:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain_1run.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_BEEBS_split.data -p 6 -e 24 -d 2 -o 6 -s /PATH/TO/ESL_paper_data/20210427_leon3_beebs_beebs_pwr_fngr_allev_nocyc_nocth_botup_avgrelerr_nfolds_ools_1r.data

    Top-Down Search:
    ./octave_makemodel.sh -r /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain.data -t /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain_1run.data -b /PATH/TO/ESL_paper_data/split/LEON3_BEEBS_BEEBS_split.data -p 6 -e 9,10,12,13,14,15,16,18,19,20,22,23 -d 2 -o 6 -s /PATH/TO/ESL_paper_data/20210427_leon3_beebs_beebs_pwr_fngr_allev_nocyc_nocth_topdown_avgrelerr_nfolds_ools_1r.data

    Plot the model per-sample breakdown data using MODELDATA_plot.py

    Plot the use_case_opt 1st run per-sample physical measurements and model errors:
    ./MODELDATA_plot.py -p 1 -x "Samples[#]" -t 10 -y "Power[W]" -b /PATH/TO/ESL_paper_data/data/LEON3_use_case_opt_finegrain_1run.data -l "Sensor Data" -i /PATH/TO/ESL_paper_data/20210421_leon3_beebs_uco_pwr_fngr_nocyc_nocth_asicdata_avgrelerr_nfolds_ools_1r.data -a 'ASIC Data Only' -i /PATH/TO/ESL_paper_data/20210427_leon3_beebs_uco_pwr_fngr_allev_nocyc_nocth_botup_avgrelerr_nfolds_ools_1r.data -a "Bottom-Up Search" -i /PATH/TO/ESL_paper_data/20210427_leon3_beebs_uco_pwr_fngr_allev_nocyc_nocth_topdown_avgrelerr_nfolds_ools_1r.data -a "Top-Down Search"

    Plot the BEEBS 1st run per-sample physical measurements and model errors:
    ./MODELDATA_plot.py -p 1 -x "Samples[#]" -t 10 -y "Power[W]" -b /PATH/TO/ESL_paper_data/data/LEON3_BEEBS_finegrain_1run_physicaldata.data -l "Sensor Data" -i /PATH/TO/ESL_paper_data/20210423_leon3_beebs_beebs_pwr_fngr_nocyc_nocth_asicdata_avgrelerr_nfolds_ools_1r.data -a 'ASIC Data Only' -i /PATH/TO/ESL_paper_data/20210427_leon3_beebs_beebs_pwr_fngr_allev_nocyc_nocth_botup_avgrelerr_nfolds_ools_1r.data -a "Bottom-Up Search" -i /PATH/TO/ESL_paper_data/20210427_leon3_beebs_beebs_pwr_fngr_allev_nocyc_nocth_topdown_avgrelerr_nfolds_ools_1r.data -a "Top-Down Search"

  6. A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image...

    • zenodo.org
    bin, zip
    Updated Feb 2, 2024
    + more versions
    Cite
    Roser Viñals; Roser Viñals; Jean-Philippe Thiran; Jean-Philippe Thiran (2024). A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning: Dataset (2/6) [Dataset]. http://doi.org/10.5281/zenodo.10608742
    Explore at:
    bin, zip (available download formats)
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Roser Viñals; Roser Viñals; Jean-Philippe Thiran; Jean-Philippe Thiran
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a collection of ultrafast ultrasound acquisitions from nine volunteers and the CIRS 054G phantom. For a comprehensive understanding of the dataset, please refer to the paper: Viñals, R.; Thiran, J.-P. A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning. J. Imaging 2023, 9, 256. https://doi.org/10.3390/jimaging9120256. Please cite the original paper when using this dataset.

    Due to data size restrictions, the dataset has been divided into six subdatasets, each published as a separate entry on Zenodo. This repository contains subdataset 2.

    Structure

    In Vivo Data

    • Number of Acquisitions: 20,000

    • Volunteers: Nine volunteers

    • File Structure: Each volunteer's data is compressed in a separate zip file.

      • Note: For volunteer 1, due to a higher number of acquisitions, data for this volunteer is distributed across multiple zip files, each containing acquisitions from different body regions.
    • Regions :

      • Abdomen: 6599 acquisitions
      • Neck: 3294 acquisitions
      • Breast: 3291 acquisitions
      • Lower limbs: 2616 acquisitions
      • Upper limbs: 2110 acquisitions
      • Back: 2090 acquisitions
    • File Naming Convention: Incremental IDs from acquisition_00000 to acquisition_19999.

    In Vitro Data

    • Number of Acquisitions: 32 from CIRS model 054G phantom
    • File Structure: The in vitro data is compressed in the cirs-phantom.zip file.
    • File Naming Convention: Incremental IDs from invitro_00000 to invitro_00031.

    CSV Files

    Two CSV files are provided:

    • invivo_dataset.csv :

      • Contains a list of all in vivo acquisitions.
      • Columns: id, path, volunteer id, body region.
    • invitro_dataset.csv :

      • Contains a list of all in vitro acquisitions.
      • Columns: id, path

    Zenodo dataset splits and files

    The dataset has been divided into six subdatasets, each one published in a separate entry on Zenodo. The following table indicates, for each file or compressed folder, the Zenodo dataset split where it has been uploaded along with its size. Each dataset split is named "A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning: Dataset (ii/6)", where ii represents the split number. This repository contains the 2nd split.

    File name | Size | Zenodo subdataset number
    invivo_dataset.csv | 995.9 kB | 1
    invitro_dataset.csv | 1.1 kB | 1
    cirs-phantom.zip | 418.2 MB | 1
    volunteer-1-lowerLimbs.zip | 29.7 GB | 1
    volunteer-1-carotids.zip | 8.8 GB | 1
    volunteer-1-back.zip | 7.1 GB | 1
    volunteer-1-abdomen.zip | 34.0 GB | 2
    volunteer-1-breast.zip | 15.7 GB | 2
    volunteer-1-upperLimbs.zip | 25.0 GB | 3
    volunteer-2.zip | 26.5 GB | 4
    volunteer-3.zip | 20.3 GB | 3
    volunteer-4.zip | 24.1 GB | 5
    volunteer-5.zip | 6.5 GB | 5
    volunteer-6.zip | 11.5 GB | 5
    volunteer-7.zip | 11.1 GB | 6
    volunteer-8.zip | 21.2 GB | 6
    volunteer-9.zip | 23.2 GB | 4

    Normalized RF Images

    • Beamforming:

      • Depth from 1 mm to 55 mm

      • Width spanning the probe aperture

      • Grid: 𝜆/8 × 𝜆/8

      • Resulting images shape: 1483 × 1189

      • Two beamformed RF images from each acquisition:

        • Input image: single unfocused acquisition obtained from a single plane wave (PW) steered at 0° (acquisition-xxxx-1PW)
        • Target image: coherently compounded image from 87 PW acquisitions steered at different angles (acquisition-xxxx-87PWs)
    • Normalization:

      • The two RF images have been normalized
    • To display the images (a code sketch follows this list):

      • Perform the envelope detection (to obtain the IQ images)
      • Log-compress (to obtain the B-mode images)
    • File Format: Saved in npy format, loadable using Python and numpy.load(file).
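
    A minimal sketch of this display pipeline (the file name and the use of a Hilbert transform along the depth axis are assumptions on my part; only the npy format and the envelope/log-compression steps are stated above):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import hilbert

    rf = np.load("acquisition_12397-1PW.npy")  # a beamformed RF image, shape ~(1483, 1189)
    envelope = np.abs(hilbert(rf, axis=0))     # envelope detection along depth -> IQ magnitude
    bmode = 20 * np.log10(envelope / envelope.max() + 1e-12)  # log compression to dB

    plt.imshow(bmode, cmap="gray", vmin=-60, vmax=0, aspect="auto")  # 60 dB dynamic range
    plt.colorbar(label="dB")
    plt.show()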

    Training and Validation Split in the paper

    For the volunteer-based split used in the paper:

    • Training set: volunteers 1, 2, 3, 6, 7, 9
    • Validation set: volunteer 4
    • Test set: volunteers 5, 8
    • Images analyzed in the paper
      • Carotid acquisition (from volunteer 5): acquisition_12397
      • Back acquisition (from volunteer 8): acquisition_19764
      • In vitro acquisition: invitro-00030

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

    Please cite the original paper when using this dataset:

    Viñals, R.; Thiran, J.-P. A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning. J. Imaging 2023, 9, 256. DOI: 10.3390/jimaging9120256

    Contact

    For inquiries or issues related to this dataset, please contact:

    • Name: Roser Viñals
    • Email: roser.vinalsterres@epfl.ch
  7. Downsized camera trap images for automated classification

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Dec 1, 2022
    Cite
    Danielle L Norman; Danielle L Norman; Oliver R Wearne; Oliver R Wearne; Philip M Chapman; Sui P Heon; Robert M Ewers; Philip M Chapman; Sui P Heon; Robert M Ewers (2022). Downsized camera trap images for automated classification [Dataset]. http://doi.org/10.5281/zenodo.6627707
    Explore at:
    bin, zip (available download formats)
    Dataset updated
    Dec 1, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Danielle L Norman; Danielle L Norman; Oliver R Wearne; Oliver R Wearne; Philip M Chapman; Sui P Heon; Robert M Ewers; Philip M Chapman; Sui P Heon; Robert M Ewers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description:

    Downsized (256x256) camera trap images used for the analyses in "Can CNN-based species classification generalise across variation in habitat within a camera trap survey?", and the dataset composition for each analysis. Note that images tagged as 'human' have been removed from this dataset. Full-size images for the BorneoCam dataset will be made available at LILA.science. The full SAFE camera trap dataset metadata is available at DOI: 10.5281/zenodo.6627707.

    Project: This dataset was collected as part of the following SAFE research project: Machine learning and image recognition to monitor spatio-temporal changes in the behaviour and dynamics of species interactions

    Funding: These data were collected as part of research funded by:

    This dataset is released under the CC-BY 4.0 licence, requiring that you cite the dataset in any outputs, but has the additional condition that you acknowledge the contribution of these funders in any outputs.

    XML metadata: GEMINI compliant metadata for this dataset is available here

    Files: This dataset consists of 3 files: CT_image_data_info2.xlsx, DN_256x256_image_files.zip, DN_generalisability_code.zip

    CT_image_data_info2.xlsx

    This file contains dataset metadata and 1 data table:

    1. Dataset Images (described in worksheet Dataset_images)

      Description: This worksheet details the composition of each dataset used in the analyses

      Number of fields: 69

      Number of data rows: 270287

      Fields:

      • filename: Root ID (Field type: id)
      • camera_trap_site: Site ID for the camera trap location (Field type: location)
      • taxon: Taxon recorded by camera trap (Field type: taxa)
      • dist_level: Level of disturbance at site (Field type: ordered categorical)
      • baseline: Label as to whether image is included in the baseline training, validation (val) or test set, or not included (NA) (Field type: categorical)
      • increased_cap: Label as to whether image is included in the 'increased cap' training, validation (val) or test set, or not included (NA) (Field type: categorical)
      • dist_individ_event_level: Label as to whether image is included in the 'individual disturbance level datasets split at event level' training, validation (val) or test set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_1: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance level 1' training or test set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_2: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance level 2' training or test set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_3: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance level 3' training or test set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_4: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance level 4' training or test set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance level 5' training or test set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_1_2: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1 and 2 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_1_3: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1 and 3 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_1_4: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1 and 4 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_1_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1 and 5 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_2_3: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 2 and 3 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_2_4: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 2 and 4 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_2_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 2 and 5 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_3_4: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 3 and 4 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_3_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 3 and 5 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_pair_4_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 4 and 5 (pair)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_1_2_3: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 2 and 3 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_1_2_4: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 2 and 4 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_1_2_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 2 and 5 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_1_3_4: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 3 and 4 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_1_3_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 3 and 5 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_1_4_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 4 and 5 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_2_3_4: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 2, 3 and 4 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_2_3_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 2, 3 and 5 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_2_4_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 2, 4 and 5 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_triple_3_4_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 3, 4 and 5 (triple)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_quad_1_2_3_4: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 2, 3 and 4 (quad)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_quad_1_2_3_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 2, 3 and 5 (quad)' training set, or not included (NA) (Field type: categorical)
      • dist_combined_event_level_quad_1_2_4_5: Label as to whether image is included in the 'disturbance level combination analysis split at event level: disturbance levels 1, 2, 4 and 5 (quad)' training set, or not included (NA) (Field type:

  8. Data from: depression-detection

    • huggingface.co
    Updated May 7, 2025
    Cite
    Cristian B (2025). depression-detection [Dataset]. https://huggingface.co/datasets/thePixel42/depression-detection
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 7, 2025
    Authors
    Cristian B
    Description

    This dataset contains a collection of posts from Reddit. The posts have been collected from 3 subreddits: r/teenagers, r/SuicideWatch, and r/depression. There are 140,000 labeled posts for training and 60,000 labeled posts for testing. Both training and testing datasets have an equal split of labels. This dataset is not mine. The original dataset is on Kaggle: https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch/versions/13
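
    A minimal sketch for loading the dataset with the Hugging Face datasets library (assuming the standard train/test split layout described above):

    from datasets import load_dataset

    ds = load_dataset("thePixel42/depression-detection")
    print(ds)              # expected: train (~140,000 posts) and test (~60,000 posts) splits
    print(ds["train"][0])  # first labeled post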

  9. Divide and Recombine Approaches for Fitting Smoothing Spline Models with...

    • tandf.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Danqing Xu; Yuedong Wang (2023). Divide and Recombine Approaches for Fitting Smoothing Spline Models with Large Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.5635045.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Danqing Xu; Yuedong Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spline smoothing is a widely used nonparametric method that allows the data to speak for themselves. Due to its complexity and flexibility, fitting smoothing spline models is usually computationally intensive and may become prohibitive with large datasets. To overcome memory and CPU limitations, we propose four divide and recombine (D&R) approaches for fitting cubic splines with large datasets. We consider two approaches to dividing the data, random and sequential, and for each approach of division, two approaches to recombining. These D&R approaches are implemented in parallel without communication. Extensive simulations show that these D&R approaches are scalable and have performance comparable to the method that uses the whole data. The sequential D&R approaches are spatially adaptive, which leads to better performance than the method that uses the whole data when the underlying function is spatially inhomogeneous.
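
    A rough sketch of the random-division variant as I read the abstract (not the authors' code): fit a cubic smoothing spline on each random subset independently, then recombine by averaging the fitted curves on a common grid.

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0, 1, 10_000))
    y = np.sin(8 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

    grid = np.linspace(0, 1, 200)
    subsets = np.array_split(rng.permutation(x.size), 10)  # random division into 10 parts
    fits = [UnivariateSpline(x[np.sort(s)], y[np.sort(s)], k=3)(grid) for s in subsets]
    recombined = np.mean(fits, axis=0)  # recombine: average the subset fits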

  10. MERGE Dataset

    • zenodo.org
    zip
    Updated Feb 7, 2025
    + more versions
    Cite
    Pedro Lima Louro; Pedro Lima Louro; Hugo Redinho; Hugo Redinho; Ricardo Santos; Ricardo Santos; Ricardo Malheiro; Ricardo Malheiro; Renato Panda; Renato Panda; Rui Pedro Paiva; Rui Pedro Paiva (2025). MERGE Dataset [Dataset]. http://doi.org/10.5281/zenodo.13939205
    Explore at:
    zip (available download formats)
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pedro Lima Louro; Pedro Lima Louro; Hugo Redinho; Hugo Redinho; Ricardo Santos; Ricardo Santos; Ricardo Malheiro; Ricardo Malheiro; Renato Panda; Renato Panda; Rui Pedro Paiva; Rui Pedro Paiva
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The MERGE dataset is a collection of audio, lyrics, and bimodal datasets for conducting research on Music Emotion Recognition. A complete version is provided for each modality. The audio datasets provide 30-second excerpts for each sample, while full lyrics are provided in the relevant datasets. The number of available samples in each dataset is as follows:

    • MERGE Audio Complete: 3554
    • MERGE Audio Balanced: 3232
    • MERGE Lyrics Complete: 2568
    • MERGE Lyrics Balanced: 2400
    • MERGE Bimodal Complete: 2216
    • MERGE Bimodal Balanced: 2000

    Additional Contents

    Each dataset contains the following additional files:

    • av_values: File containing the arousal and valence values for each sample sorted by their identifier;
    • tvt_dataframes: Train, validate, and test splits for each dataset. Both a 70-15-15 and a 40-30-30 split are provided.

    Metadata

    A metadata spreadsheet is provided for each dataset with the following information for each sample, if available:

    • Song (Audio and Lyrics datasets) - Song identifiers. Identifiers starting with MT were extracted from the AllMusic platform, while those starting with A or L were collected from private collections;
    • Quadrant - Label corresponding to one of the four quadrants from Russell's Circumplex Model;
    • AllMusic Id - For samples starting with A or L, the matching AllMusic identifier is also provided. This was used to complement the available information for the samples originally obtained from the platform;
    • Artist - First performing artist or band;
    • Title - Song title;
    • Relevance - AllMusic metric representing the relevance of the song in relation to the query used;
    • Duration - Song length in seconds;
    • Moods - User-generated mood tags extracted from the AllMusic platform and available in Warriner's affective dictionary;
    • MoodsAll - User-generated mood tags extracted from the AllMusic platform;
    • Genres - User-generated genre tags extracted from the AllMusic platform;
    • Themes - User-generated theme tags extracted from the AllMusic platform;
    • Styles - User-generated style tags extracted from the AllMusic platform;
    • AppearancesTrackIDs - All AllMusic identifiers related with a sample;
    • Sample - Availability of the sample in the AllMusic platform;
    • SampleURL - URL to the 30-second excerpt in AllMusic;
    • ActualYear - Year of song release.

    Citation

    If you use some part of the MERGE dataset in your research, please cite the following article:

    Louro, P. L., Redinho, H., Santos, R., Malheiro, R., Panda, R., and Paiva, R. P. (2024). MERGE - A Bimodal Dataset For Static Music Emotion Recognition. arXiv. URL: https://arxiv.org/abs/2407.06060.

    BibTeX:

    @misc{louro2024mergebimodaldataset,
      title={MERGE -- A Bimodal Dataset for Static Music Emotion Recognition},
      author={Pedro Lima Louro and Hugo Redinho and Ricardo Santos and Ricardo Malheiro and Renato Panda and Rui Pedro Paiva},
      year={2024},
      eprint={2407.06060},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2407.06060},
    }

    Acknowledgements

    This work is funded by FCT - Foundation for Science and Technology, I.P., within the scope of the projects: MERGE - DOI: 10.54499/PTDC/CCI-COM/3171/2021 financed with national funds (PIDDAC) via the Portuguese State Budget; and project CISUC - UID/CEC/00326/2020 with funds from the European Social Fund, through the Regional Operational Program Centro 2020.

    Renato Panda was supported by Ci2 - FCT UIDP/05567/2020.

  11. details_CohereLabs_c4ai-command-r-plus-08-2024_private

    • huggingface.co
    Updated May 23, 2025
    + more versions
    Cite
    Sasha Luccioni (2025). details_CohereLabs_c4ai-command-r-plus-08-2024_private [Dataset]. https://huggingface.co/datasets/sasha/details_CohereLabs_c4ai-command-r-plus-08-2024_private
    Explore at:
    Dataset updated
    May 23, 2025
    Authors
    Sasha Luccioni
    Description

    Dataset Card for Evaluation run of CohereLabs/c4ai-command-r-plus-08-2024

    Dataset automatically created during the evaluation run of model CohereLabs/c4ai-command-r-plus-08-2024. The dataset is composed of 3 configurations, each corresponding to one of the evaluated tasks. The dataset has been created from 3 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the… See the full description on the dataset page: https://huggingface.co/datasets/sasha/details_CohereLabs_c4ai-command-r-plus-08-2024_private.

  12. opencritic-split-code-prompt-truth

    • huggingface.co
    Updated Apr 18, 2025
    Cite
    Marcus Cedric R. Idia (2025). opencritic-split-code-prompt-truth [Dataset]. https://huggingface.co/datasets/marcuscedricridia/opencritic-split-code-prompt-truth
    Explore at:
    Dataset updated
    Apr 18, 2025
    Authors
    Marcus Cedric R. Idia
    Description

    marcuscedricridia/opencritic-split-code-prompt-truth dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. imagenet_r

    • tensorflow.org
    Updated Jun 1, 2024
    + more versions
    Cite
    (2024). imagenet_r [Dataset]. https://www.tensorflow.org/datasets/catalog/imagenet_r
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    ImageNet-R is a set of images labelled with ImageNet labels that were obtained by collecting art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes. ImageNet-R has renditions of 200 ImageNet classes, resulting in 30,000 images. For more details please refer to the paper.

    The label space is the same as that of ImageNet2012. Each example is represented as a dictionary with the following keys:

    • 'image': The image, a (H, W, 3)-tensor.
    • 'label': An integer in the range [0, 1000).
    • 'file_name': A unique string identifying the example within the dataset.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('imagenet_r', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_r-0.2.0.png

  14. Replication data for: "Split Decisions: Household Finance When a Policy...

    • dataverse.iza.org
    • dataverse.harvard.edu
    Updated Jul 11, 2024
    Cite
    Michael A. Clemens; Michael A. Clemens; Erwin R. Tiongson; Erwin R. Tiongson (2024). Replication data for: "Split Decisions: Household Finance When a Policy Discontinuity Allocates Overseas Work" [Dataset]. http://doi.org/10.7910/DVN/2DO8QP
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Research Data Center of IZA (IDSC)
    Authors
    Michael A. Clemens; Michael A. Clemens; Erwin R. Tiongson; Erwin R. Tiongson
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Clemens, Michael A., and Tiongson, Erwin R. (2017) "Split Decisions: Household Finance When a Policy Discontinuity Allocates Overseas Work." Review of Economics and Statistics 99:3, 531-543.

  15. Equal Divide Esrgan Corn Dataset

    • universe.roboflow.com
    zip
    Updated Nov 14, 2022
    Cite
    equal divide Esrgan corn (2022). Equal Divide Esrgan Corn Dataset [Dataset]. https://universe.roboflow.com/equal-divide-esrgan-corn/equal-divide-esrgan-corn/model/2
    Explore at:
    zip (available download formats)
    Dataset updated
    Nov 14, 2022
    Dataset authored and provided by
    equal divide Esrgan corn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    R Bounding Boxes
    Description

    Equal Divide ESRGAN Corn

    ## Overview
    
    Equal Divide ESRGAN Corn is a dataset for object detection tasks - it contains R annotations for 5,874 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  16. Dataset and R code: Genetic diversity of lion populations in Kenya:...

    • search.dataone.org
    • datadryad.org
    Updated Mar 15, 2024
    Cite
    Mumbi Chege (2024). Dataset and R code: Genetic diversity of lion populations in Kenya: evaluating past management practices and recommendations for future conservation actions by Chege M et al. [Dataset]. http://doi.org/10.5061/dryad.s4mw6m9d8
    Explore at:
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Mumbi Chege
    Area covered
    Kenya
    Description

    The decline of lions (Panthera leo) in Kenya has raised conservation concerns on their overall population health and long-term survival. This study aimed to assess the genetic structure, differentiation, and diversity of lion populations in the country, while considering the influence of past management practices. Using a lion-specific Single Nucleotide Polymorphism (SNP) panel, we genotyped 171 individuals from 12 populations representative of areas with permanent lion presence. Our results revealed a distinct genetic pattern with pronounced population structure, confirmed a north-south split, and found no indication of inbreeding in any of the tested populations. Differentiation seems to be primarily driven by geographical barriers, human presence, and climatic factors, but management practices may have also affected the observed patterns. Notably, the Tsavo population displayed evidence of admixture, perhaps attributable to its geographic location as a suture zone, vast size, or to p... This dataset was obtained from 12 Kenyan lion populations. After DNA extraction, SNP genotyping was performed using an allele-specific KASP technique. The attached datasets include the .txt and .str versions of the autosomal SNPs to aid in reproducing the results. # Dataset and R code associated with the publication entitled "Genetic diversity of lion populations in Kenya: evaluating past management practices and recommendations for future conservation actions" by Chege M et al.

    https://doi.org/10.5061/dryad.s4mw6m9d8

    We provide the following description of the dataset and scripts for analysis carried out in R. We have split the data and scripts for ease of reference, i.e.:

     1.) Script 1: titled 'Calc_He_Ho_Ar_Fis'. For calculating the genetic diversity indices, i.e. allelic richness (AR), private alleles (AP), inbreeding coefficients (FIS), and expected (HE) and observed (HO) heterozygosity. This script uses:

    • "data_HoHeAr.txt" dataset. This dataset has information on individual samples, including their geographical area (population) of origin and the corresponding 335 autosomal single nucleotide polymorphism (SNP) reads.

    • "shompole2.txt" - this bears the dataset from the Shompol...

  17. Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8...

    • s.cnmilf.com
    • data.usgs.gov
    • +2more
    Updated Feb 22, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8 Analysis Ready Dataset Raster Images from 2013-2023 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/water-temperature-of-lakes-in-the-conterminous-u-s-using-the-landsat-8-analysis-ready-2013
    Explore at:
    Dataset updated
    Feb 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Contiguous United States
    Description

    This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files.

    Limitations of this dataset include:

    - All biases inherent to the Landsat Surface Temperature product are retained in this dataset, which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases with partial cloud coverage over a waterbody.
    - Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported - one for each data tile. The deepest point values will be extracted and reported for the tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = "yes" column of site_id_tile_hv_crosswalk.csv).
    - Temperature data were not extracted from satellite images with more than 90% cloud cover.
    - Temperature data represent skin temperature at the water surface and may differ from temperature observations from below the water surface.

    Potential methods for addressing these limitations:

    - Identifying and removing unrealistic temperature estimates:
      - Calculate the total percentage of cloud pixels over a given waterbody as percent_cloud_pixels = wb_dswe9_pixels / (wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage.
      - Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10).
      - Filter waterbodies where the deepest point is identified as water (dp_dswe = 1).
    - Handling waterbodies split between multiple tiles:
      - These waterbodies can be identified using the site_id_tile_hv_crosswalk.csv file (column multiple_tiles = "yes"). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates.

    Files included in this release:

    - "year_byscene=XXXX.zip" - includes temperature summary statistics for individual waterbodies and the deepest points (the furthest point from land within a waterbody) within each waterbody by the scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files in the byscene datasets may only include one dummy row of data (identified by tile_hv="000-000"). This happens when no tabular data is extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible reasons. An example file path for this dataset: year_byscene=2023/tile_hv=002-001/part-0.parquet
    - "year=XXXX.zip" - includes the summary statistics for individual waterbodies and the deepest points within each waterbody by year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX data are used as input for generating these summary tables that aggregate temperature data by year, month, and year-month. Aggregated data are not available for the following tiles: 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land, and no output data were generated. An example file path for this dataset: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet
    - "example_script_for_using_parquet.R" - includes code to download zip files directly from ScienceBase, identify HUC04 basins within a desired Landsat ARD grid tile, download NHDPlus High Resolution data for visualizing, use the R arrow package to compile .parquet files in nested directories, and create example static and interactive maps.
    - "nhd_HUC04s_ingrid.csv" - a cross-walk file identifying the HUC04 watersheds within each Landsat ARD tile grid.
    - "site_id_tile_hv_crosswalk.csv" - a cross-walk file identifying the site_id (nhdhr{permanent_identifier}) within each Landsat ARD tile grid. It also includes a column (multiple_tiles) to identify site_ids that fall within multiple Landsat ARD tile grids.
    - "lst_grid.png" - a map of the Landsat grid tiles labelled by the horizontal-vertical ID.
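
    A Python sketch of what the release's example_script_for_using_parquet.R does with the R arrow package: open the nested hive-partitioned .parquet directories and apply the cloud and water-pixel filters suggested above. Paths and column names follow the description; the 50% cloud threshold is an assumption.

    import pyarrow.dataset as ds

    # Open the annual lake statistics; tile_hv subdirectories are hive partitions.
    lakes = ds.dataset("year=2023/dataset=lakes_annual", format="parquet", partitioning="hive")
    df = lakes.to_table().to_pandas()

    # Fraction of cloud pixels per waterbody, then filter clouds and sparse lakes.
    df["percent_cloud_pixels"] = df["wb_dswe9_pixels"] / (df["wb_dswe9_pixels"] + df["wb_dswe1_pixels"])
    clean = df[(df["percent_cloud_pixels"] < 0.5) & (df["wb_dswe1_pixels"] >= 10)]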

  18. Dataset for Publication "Probing the Temperature of Quantum Dots in a...

    • radar4culture.radar-service.eu
    • search.nfdi4chem.de
    • +2more
    tar
    Updated Dec 2, 2024
    Cite
    Knut R. Asmis; Björn Bastian; Sophia C. Leippe (2024). Dataset for Publication "Probing the Temperature of Quantum Dots in a Cryogenic Ion Trap by Fluorescence Spectroscopy" [Dataset]. http://doi.org/10.22000/44bv95gm8ajgc9xf
    Explore at:
    tar (87116800 bytes; available download formats)
    Dataset updated
    Dec 2, 2024
    Dataset provided by
    Leipzig University
    Authors
    Knut R. Asmis; Björn Bastian; Sophia C. Leippe
    Dataset funded by
    Universität Leipzig
    Deutsche Forschungsgemeinschaft
    Hans Böckler Stiftung
    Description

    Dataset with emission spectra of CdSe/CdS core/shell quantum dots (QDs) in a single nanoparticle mass spectrometer at trap temperatures from 50 to 300 K. Additional meta information and analysis scripts are included. From the gas phase emission spectra the temperature of the respective QDs is determined, using calibration data of QDs on a surface. A model for the QD temperature in the gas phase is refined based on the experimental results.

  19. MegaWeeds dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 24, 2025
    Cite
    Sophie Wildeboer; Sophie Wildeboer (2025). MegaWeeds dataset [Dataset]. http://doi.org/10.5281/zenodo.8077195
    Explore at:
    zip (available download formats)
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sophie Wildeboer; Sophie Wildeboer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MegaWeeds dataset consists of seven existing datasets:

    - WeedCrop dataset; Sudars, K., Jasko, J., Namatevs, I., Ozola, L., & Badaukis, N. (2020). Dataset of annotated food crops and weed images for robotic computer vision control. Data in Brief, 31, 105833. https://doi.org/10.1016/j.dib.2020.105833

    - Chicory dataset; Gallo, I., Rehman, A. U., Dehkord, R. H., Landro, N., La Grassa, R., & Boschetti, M. (2022). Weed detection by UAV 416a Image Dataset. https://universe.roboflow.com/chicory-crop-weeds-5m7vo/weed-detection-by-uav-416a/dataset/1

    - Sesame dataset; Utsav, P., Raviraj, P., & Rayja, M. (2020). crop and weed detection data with bounding boxes. https://www.kaggle.com/datasets/ravirajsinh45/crop-and-weed-detection-data-with-bounding-boxes

    - Sugar beet dataset; Wangyongkun. (2020). sugarbeetsAndweeds. https://www.kaggle.com/datasets/wangyongkun/sugarbeetsandweeds

    - Weed-Detection-v2; Tandon, K. (2021, June). Weed_Detection_v2. https://www.kaggle.com/datasets/kushagratandon12/weed-detection-v2

    - Maize dataset; Correa, J. M. L., D. Andújar, M. Todeschini, J. Karouta, JM Begochea, & Ribeiro A. (2021). WeedMaize. Zenodo. https://doi.org/10.5281/ZENODO.5106795

    - CottonWeedDet12 dataset; Dang, F., Chen, D., Lu, Y., & Li, Z. (2023). YOLOWeeds: A novel benchmark of YOLO object detectors for multi-class weed detection in cotton production systems. Computers and Electronics in Agriculture, 205, 107655. https://doi.org/10.1016/j.compag.2023.107655

    All the datasets contain open-field images of crops and weeds with annotations. The annotation files were converted to text files so they can be used with the YOLO model. All the datasets were combined into one big dataset with 19,317 images in total. The dataset is split into a training and a validation set.
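
    For illustration, a YOLO-format label file holds one line per object, "class_id x_center y_center width height", with coordinates normalized to [0, 1]; a simple random split might look like the following sketch (the directory layout and the 80/20 ratio are assumptions, not stated by the author):

    import random
    from pathlib import Path

    label_line = "1 0.43 0.55 0.12 0.20"  # e.g. a weed box centered at (0.43, 0.55)

    images = sorted(Path("megaweeds/images").glob("*.jpg"))  # hypothetical layout
    random.Random(0).shuffle(images)
    cut = int(0.8 * len(images))
    train, val = images[:cut], images[cut:]  # 80% training, 20% validation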

  20. haoranxu_ALMA-13B-R-details

    • huggingface.co
    Updated Mar 10, 2025
    + more versions
    Cite
    Open LLM Leaderboard (2025). haoranxu_ALMA-13B-R-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/haoranxu_ALMA-13B-R-details
    Explore at:
    Dataset updated
    Mar 10, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of haoranxu/ALMA-13B-R

    Dataset automatically created during the evaluation run of model haoranxu/ALMA-13B-R. The dataset is composed of 38 configurations, each corresponding to one of the evaluated tasks. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results. An additional… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/haoranxu_ALMA-13B-R-details.
