8 datasets found
  1. mnist_splitter

    • kaggle.com
    zip
    Updated Mar 7, 2021
    Cite
    satya (2021). mnist_splitter [Dataset]. https://www.kaggle.com/satyapr/mnist-splitter
    Available download formats: zip (23336435 bytes)
    Dataset updated
    Mar 7, 2021
    Authors
    satya
    Description

    Dataset

    This dataset was created by satya

    Contents

    It contains the following files:

  2. Pytorch - Dataloading tutorial dataset

    • kaggle.com
    Updated Dec 7, 2018
    Cite
    Maxime Faye (2018). Pytorch - Dataloading tutorial dataset [Dataset]. https://www.kaggle.com/maxoumask/pytorch-dataloading-tutorial-dataset/kernels
    Dataset updated
    Dec 7, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Maxime Faye
    Description

    Context

    The PyTorch website hosts a number of tutorials. This dataset is the one used in the data loading tutorial.

    Content

    It contains face images and a CSV file with their respective landmark points.

    Acknowledgements

    Thanks to PyTorch for providing comprehensive material for getting to grips with their framework. https://pytorch.org/tutorials/beginner/data_loading_tutorial.html

    Inspiration

    Play with PyTorch, or check how much simpler it is to accomplish this kind of task with a different framework.

  3. MGSV-EC

    • huggingface.co
    Updated Mar 12, 2025
    Cite
    Zijie Xin (2025). MGSV-EC [Dataset]. https://huggingface.co/datasets/xxayt/MGSV-EC
    Dataset updated
    Mar 12, 2025
    Authors
    Zijie Xin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Music Grounding by Short Video E-commerce (MGSV-EC) Dataset

    📝 Dataset Summary

    MGSV-EC is a large-scale dataset for the new task of Music Grounding by Short Video (MGSV), which aims to localize a specific music segment that best serves as the background music (BGM) for a given query short video. Unlike traditional video-to-music retrieval (V2MR), MGSV requires both… See the full description on the dataset page: https://huggingface.co/datasets/xxayt/MGSV-EC.

  4. UniTOBrain

    • ieee-dataport.org
    • zenodo.org
    Updated Jul 22, 2021
    Cite
    Umberto Gava (2021). UniTOBrain [Dataset]. http://doi.org/10.21227/x8ea-vh16
    Dataset updated
    Jul 22, 2021
    Dataset provided by
    IEEE Dataport
    Authors
    Umberto Gava
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The University of Turin (UniTO) released the open-access dataset Stoke, collected for the homonymous Use Case 3 in the DeepHealth project (https://deephealth-project.eu/). UniTOBrain is a dataset of Computed Tomography (CT) perfusion images (CTP). The dataset includes 258 consecutive patients; a subsample of 100 training subjects and 15 testing subjects was used in a submitted publication for training and testing a Convolutional Neural Network (CNN; for details see https://arxiv.org/abs/2101.05992, https://paperswithcode.com/paper/neural-network-derived-perfusion-maps-a-model, https://www.medrxiv.org/content/10.1101/2021.01.13.21249757v1). The UniTO team released this dataset publicly.

    CTP data were retrospectively obtained from the hospital PACS of Città della Salute e della Scienza di Torino (Molinette). CTP acquisition parameters were as follows: Scanner GE, 64 slices, 80 kV, 150 mAs, 44.5 sec duration, 89 volumes (40 mm axial coverage), injection of 40 ml of Iodine contrast agent (300 mg/ml) at 4 ml/s speed.

    Along with the dataset, we provide some utility files:

    • dicomtonpy.py: Converts the DICOM files in the dataset to numpy arrays. These are 3D arrays in which CT slices at the same height are stacked over the temporal acquisition.
    • dataloader_pytorch.py: Dataloader for the PyTorch deep learning framework. It converts the numpy arrays into normalized tensors, which can be provided as input to standard deep learning models.
    • dataloader_pyeddl.py: Dataloader for the pyeddl deep learning framework. It converts the numpy arrays into normalized tensors, which can be provided as input to standard deep learning models using the European library EDDL.

    Visit https://github.com/EIDOSlab/UC3-UNITOBrain for the full companion code, in which a U-Net model is trained on the dataset.

  5. InfantMarmosetsVox

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 15, 2023
    Cite
    Magimai Doss, Mathew (2023). InfantMarmosetsVox [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10130104
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Magimai Doss, Mathew
    Sarkar, Eklavya
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    InfantMarmosetsVox is a dataset for multi-class call-type and caller identification. It contains audio recordings of different individual marmosets and their call-types. The dataset contains a total of 350 files of precisely labelled 10-minute audio recordings across all caller classes. The audio was recorded from five pairs of infant marmoset twins, each recorded individually in two separate sound-proofed recording rooms at a sampling rate of 44.1 kHz. The start and end time, call-type, and marmoset identity of each vocalization are provided, labeled by an experienced researcher.
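
    As a rough illustration of how the per-vocalization start and end times relate to the 44.1 kHz audio, the sketch below converts a labelled time span into sample indices. It assumes the times are given in seconds, which the description does not state; the function name `segment_to_samples` is hypothetical and is not part of the dataset's own tooling.

```python
SAMPLE_RATE_HZ = 44_100  # sampling rate stated in the dataset description


def segment_to_samples(start_s, end_s, sample_rate=SAMPLE_RATE_HZ):
    """Convert a labelled time span to (first, last-exclusive) sample indices.

    Assumption: start_s and end_s are in seconds.
    """
    if end_s < start_s:
        raise ValueError("segment end precedes its start")
    return round(start_s * sample_rate), round(end_s * sample_rate)


# A vocalization labelled from 1.5 s to 2.0 s (made-up values for illustration).
first, last = segment_to_samples(1.5, 2.0)
print(first, last)  # 66150 88200
```

    Under the same assumption, a full 10-minute recording corresponds to 600 × 44,100 = 26,460,000 samples.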

    References

    This dataset was collected and partially used for the paper "Automatic detection and classification of marmoset vocalizations using deep and recurrent neural networks" by Zhang et al. It is also used for the experiments in the paper "Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?" by E. Sarkar and M. Magimai-Doss. The source code of a PyTorch DataLoader reading this data is available at https://github.com/idiap/ssl-caller-detection.

    Citation

    Any publication (e.g. conference paper, journal article, technical report, book chapter) resulting from the use of InfantMarmosetsVox must cite the following publication:

    Sarkar, E., Magimai.-Doss, M. (2023) Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? Proc. INTERSPEECH 2023, 1189-1193, doi: 10.21437/Interspeech.2023-1968

    BibTeX:

    @inproceedings{sarkar23_interspeech,
    author={Eklavya Sarkar and Mathew Magimai.-Doss},
    title={{Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?}},
    year=2023,
    booktitle={Proc. INTERSPEECH 2023},
    pages={1189--1193},
    doi={10.21437/Interspeech.2023-1968}
    }

  6. HALOC Dataset | WiFi CSI-based Long-Range Person Localization Using...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 27, 2024
    Cite
    Julian Strohmayer; Martin Kampel (2024). HALOC Dataset | WiFi CSI-based Long-Range Person Localization Using Directional Antennas [Dataset]. http://doi.org/10.5281/zenodo.10715595
    Available download formats: zip
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julian Strohmayer; Martin Kampel
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 5, 2024
    Description

    WiFi CSI-based Long-Range Person Localization Using Directional Antennas

    This repository contains the HAllway LOCalization (HALOC) dataset and WiFi system CAD files as proposed in [1].

    PyTorch Dataloader

    A minimal PyTorch dataloader for the HALOC dataset is provided at: https://github.com/StrohmayerJ/HALOC

    Dataset Description

    The HALOC dataset comprises six sequences (in .csv format) of synchronized WiFi Channel State Information (CSI) and 3D position labels. Each row in a given .csv file represents a single WiFi packet captured via ESP-IDF, with CSI and 3D coordinates stored in the "data" and ("x", "y", "z") fields, respectively.

    The sequences are divided into training, validation, and test subsets as follows:

    Subset      Sequences
    Training    0.csv, 1.csv, 2.csv and 3.csv
    Validation  4.csv
    Test        5.csv
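
    The row layout and split described above can be read with the Python standard library alone. The following is a minimal sketch, not the authors' dataloader (that one is at the GitHub link above). The helper names `read_sequence` and `load_subset` are hypothetical, and because the encoding of the "data" field is not described here, the sketch keeps it as a raw string.

```python
import csv
import io
import os

# File names per subset, taken from the split table above.
SPLITS = {
    "train": ["0.csv", "1.csv", "2.csv", "3.csv"],
    "val": ["4.csv"],
    "test": ["5.csv"],
}


def read_sequence(handle):
    """Yield (csi_raw, (x, y, z)) for each packet row of an open CSV file."""
    for row in csv.DictReader(handle):
        position = (float(row["x"]), float(row["y"]), float(row["z"]))
        # Assumption: the "data" field is left undecoded (raw string).
        yield row["data"], position


def load_subset(root, subset):
    """Read every sequence file of one subset from the directory `root`."""
    samples = []
    for name in SPLITS[subset]:
        with open(os.path.join(root, name), newline="") as handle:
            samples.extend(read_sequence(handle))
    return samples


# Tiny in-memory stand-in for a sequence file (made-up values).
demo = io.StringIO('data,x,y,z\n"[1, 2, 3]",0.5,1.0,1.5\n"[4, 5, 6]",0.6,1.1,1.6\n')
rows = list(read_sequence(demo))
print(rows[0])
```

    With the real files, `load_subset("path/to/HALOC", "train")` would return the packets of sequences 0 to 3 in file order; the directory path is a placeholder.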

    WiFi System CAD files

    We provide CAD files for the 3D printable parts of the proposed WiFi system consisting of the main housing (housing.stl), the lid (lid.stl), and the carrier board (carrier.stl) featuring mounting points for the Nvidia Jetson Orin Nano and the ESP32-S3-DevKitC-1 module.

    Download and Use
    This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].

    [1] Strohmayer, J., and Kampel, M. (2024). “WiFi CSI-based Long-Range Person Localization Using Directional Antennas”, The Second Tiny Papers Track at ICLR 2024, May 2024, Vienna, Austria. https://openreview.net/forum?id=AOJFcEh5Eb

    BibTeX citation:

    @inproceedings{
    strohmayer2024wifi,
    title={WiFi {CSI}-based Long-Range Person Localization Using Directional Antennas},
    author={Julian Strohmayer and Martin Kampel},
    booktitle={The Second Tiny Papers Track at ICLR 2024},
    year={2024},
    url={https://openreview.net/forum?id=AOJFcEh5Eb}
    }

  7. Overall accuracy and macro-averaged class aggregated assessment metrics for...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Dec 5, 2024
    Cite
    Aaron E. Maxwell; Sarah Farhadpour; Srinjoy Das; Yalin Yang (2024). Overall accuracy and macro-averaged class aggregated assessment metrics for landcover.ai [53] classification using ten replicates and different 3,000 chip random data partitions. [Dataset]. http://doi.org/10.1371/journal.pone.0315127.t009
    Available download formats: xls
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Aaron E. Maxwell; Sarah Farhadpour; Srinjoy Das; Yalin Yang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overall accuracy and macro-averaged class aggregated assessment metrics for landcover.ai [53] classification using ten replicates and different 3,000 chip random data partitions.

  8. FinSen Dataset

    • paperswithcode.com
    Updated Aug 1, 2024
    Cite
    (2024). FinSen Dataset [Dataset]. https://paperswithcode.com/dataset/finsen
    Dataset updated
    Aug 1, 2024
    Description

    Enhancing Financial Market Predictions: Causality-Driven Feature Selection

    This paper introduces the FinSen dataset, which revolutionizes financial market analysis by integrating economic and financial news articles from 197 countries with stock market data. The dataset's extensive coverage spans 15 years from 2007 to 2023 with temporal information, offering a rich, global perspective with 160,000 records of financial market news. Our study leverages causally validated sentiment scores and LSTM models to enhance market forecast accuracy and reliability.

    Our FinSen Dataset

    This repository contains the dataset for Enhancing Financial Market Predictions: Causality-Driven Feature Selection, which has been accepted at ADMA 2024.

    If the dataset or the paper has been useful in your research, please add a citation to our work:

    @article{liang2024enhancing,
    title={Enhancing Financial Market Predictions: Causality-Driven Feature Selection},
    author={Liang, Wenhao and Li, Zhengyang and Chen, Weitong},
    journal={arXiv e-prints},
    pages={arXiv--2408},
    year={2024}
    }

    The FinSen dataset can be downloaded manually from the repository as a CSV file. The sentiment labels and scores are generated by the FinBERT model from the Hugging Face Transformers library under the identifier "ProsusAI/finbert". (Araci, Dogu. "Finbert: Financial sentiment analysis with pre-trained language models." arXiv preprint arXiv:1908.10063 (2019).)

    We provide only the US data for research use; please contact w.liang@adelaide.edu.au for other countries (197 in total) if necessary.

    We also provide other NLP datasets for text classification tasks here; please cite them accordingly if you use them in your research.

    20Newsgroups. Joachims, T., et al.: A probabilistic analysis of the rocchio algorithm with tfidf for text categorization. In: ICML. vol. 97, pp. 143–151. Citeseer (1997)
    AG News. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. Advances in neural information processing systems 28 (2015)
    Financial PhraseBank. Malo, P., Sinha, A., Korhonen, P., Wallenius, J., Takala, P.: Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65(4), 782–796 (2014)

    Dataloader for FinSen

    We provide the preprocessing file finsen.py for our FinSen dataset under the dataloaders directory for more convenient usage.

    Models - Text Classification

    DAN-3.

    Global Pooling CNN.

    Models - Regression Prediction

    LSTM

    Using Sentiment Score from FinSen Predict Result on S&P500

    Dependencies

    The code is based on PyTorch and builds on the code framework of https://github.com/torrvision/focal_calibration; please cite their work if you find it useful.

    ☺ Happy Research!

mnist_splitter

Randomly split MNIST dataset in the form of a PyTorch dataloader
