This dataset was created by satya
It contains the following files:
On the PyTorch website you'll find a number of tutorials; this dataset is the one used in the data loading tutorial.
It contains face images and a CSV file with their respective landmark points.
Thanks to PyTorch for providing comprehensive material for getting a grasp of the framework: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
Play with PyTorch, or check how much simpler it is to accomplish this kind of task with a different framework.
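As a rough illustration of how this data is consumed in the linked tutorial, below is a minimal PyTorch Dataset sketch. It assumes the tutorial's layout, a folder of face images plus a face_landmarks.csv whose first column is the image filename and whose remaining columns are flattened (x, y) landmark coordinates; paths and transforms are placeholders to adapt to your setup.

```python
# Minimal sketch of a Dataset over the face images + landmarks CSV.
# Assumed layout: faces/<image files> and faces/face_landmarks.csv with
# the image filename in column 0 and flattened (x, y) landmarks after it.
import os

import pandas as pd
from skimage import io
import torch
from torch.utils.data import Dataset, DataLoader


class FaceLandmarksDataset(Dataset):
    def __init__(self, csv_file, root_dir):
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)  # H x W x C uint8 array
        landmarks = (
            self.landmarks_frame.iloc[idx, 1:]
            .to_numpy(dtype="float32")
            .reshape(-1, 2)  # one (x, y) pair per landmark
        )
        return {"image": torch.from_numpy(image), "landmarks": torch.from_numpy(landmarks)}


# Example usage (paths are placeholders for wherever the dataset is unpacked).
# Note: the images vary in size, so a rescale/crop transform is needed before
# batching samples with a DataLoader, as done in the tutorial.
# dataset = FaceLandmarksDataset("faces/face_landmarks.csv", "faces/")
# loader = DataLoader(dataset, batch_size=4, shuffle=True)
```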
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Music Grounding by Short Video E-commerce (MGSV-EC) Dataset
📄 [Paper] 📦 Feature File 🔧 [PyTorch Dataloader]
📝 Dataset Summary
MGSV-EC is a large-scale dataset for the new task of Music Grounding by Short Video (MGSV), which aims to localize a specific music segment that best serves as the background music (BGM) for a given query short video. Unlike traditional video-to-music retrieval (V2MR), MGSV requires both… See the full description on the dataset page: https://huggingface.co/datasets/xxayt/MGSV-EC.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The University of Turin (UniTO) released the open-access dataset Stroke, collected for the homonymous Use Case 3 in the DeepHealth project (https://deephealth-project.eu/). UniToBrain is a dataset of Computed Tomography Perfusion (CTP) images. The dataset includes 258 consecutive patients; a subsample of 100 training subjects and 15 testing subjects was used in a submitted publication for training and testing a Convolutional Neural Network (CNN; see for details: https://arxiv.org/abs/2101.05992, https://paperswithcode.com/paper/neural-network-derived-perfusion-maps-a-model, https://www.medrxiv.org/content/10.1101/2021.01.13.21249757v1). The UniTO team released this dataset publicly.
CTP data were retrospectively obtained from the hospital PACS of Città della Salute e della Scienza di Torino (Molinette). CTP acquisition parameters were as follows: GE scanner, 64 slices, 80 kV, 150 mAs, 44.5 s duration, 89 volumes (40 mm axial coverage), injection of 40 ml of iodine contrast agent (300 mg/ml) at 4 ml/s.
Along with the dataset, we provide some utility files:
dicomtonpy.py: converts the DICOM files in the dataset to numpy arrays. These are 3D arrays in which CT slices at the same height are stacked over the temporal acquisition.
dataloader_pytorch.py: dataloader for the PyTorch deep learning framework. It converts the numpy arrays into normalized tensors, which can be provided as input to standard deep learning models.
dataloader_pyeddl.py: dataloader for the pyeddl deep learning framework. It converts the numpy arrays into normalized tensors, which can be provided as input to standard deep learning models using the European library EDDL.
Visit https://github.com/EIDOSlab/UC3-UNITOBrain for full companion code in which a U-Net model is trained on the dataset.
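As a rough sketch of the kind of conversion dataloader_pytorch.py performs, the snippet below loads per-patient numpy arrays and turns them into normalized tensors. The one-.npy-file-per-patient layout and the min-max normalization are assumptions made for illustration; the released dataloader in the companion repository is the reference implementation.

```python
# Illustrative sketch: numpy volumes (e.g. produced by dicomtonpy.py) -> normalized tensors.
# Assumes one .npy file per patient containing a 3D array (time x H x W); the
# official dataloader may organize files and normalize differently.
import glob
import os

import numpy as np
import torch
from torch.utils.data import Dataset


class CTPVolumeDataset(Dataset):
    def __init__(self, root_dir):
        self.files = sorted(glob.glob(os.path.join(root_dir, "*.npy")))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        volume = np.load(self.files[idx]).astype("float32")
        # Min-max normalization to [0, 1]; an assumption, not necessarily the
        # scheme used by the official dataloader.
        volume = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
        return torch.from_numpy(volume)
```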
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
InfantMarmosetsVox is a dataset for multi-class call-type and caller identification. It contains audio recordings of different individual marmosets and their call-types. The dataset contains a total of 350 files of precisely labelled 10-minute audio recordings across all caller classes. The audio was recorded from five pairs of infant marmoset twins, each recorded individually in two separate sound-proofed recording rooms at a sampling rate of 44.1 kHz. The start and end time, call-type, and marmoset identity of each vocalization are provided, labelled by an experienced researcher.
References
This dataset was collected and partially used for the paper "Automatic detection and classification of marmoset vocalizations using deep and recurrent neural networks" by Zhang et al. It is also used for the experiments in the paper "Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?" by E. Sarkar and M. Magimai-Doss. The source code of a PyTorch DataLoader reading this data is available at https://github.com/idiap/ssl-caller-detection.
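For illustration only (the official DataLoader is the one linked above), slicing one labelled vocalization out of a recording could look like the sketch below; the label-table column names used here (file, start, end, call_type, caller) are assumptions and may not match the released annotation files.

```python
# Illustrative sketch: extract one labelled vocalization from a 10-minute recording.
# Assumed label table columns: file, start, end (seconds), call_type, caller.
import pandas as pd
import torchaudio
from torch.utils.data import Dataset

SAMPLE_RATE = 44_100  # recordings are sampled at 44.1 kHz


class MarmosetCallDataset(Dataset):
    def __init__(self, label_csv, audio_dir):
        self.labels = pd.read_csv(label_csv)
        self.audio_dir = audio_dir

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        row = self.labels.iloc[idx]
        offset = int(row["start"] * SAMPLE_RATE)
        num_frames = int((row["end"] - row["start"]) * SAMPLE_RATE)
        # Load only the labelled segment instead of the full recording.
        waveform, _ = torchaudio.load(
            f"{self.audio_dir}/{row['file']}",
            frame_offset=offset,
            num_frames=num_frames,
        )
        return waveform, row["call_type"], row["caller"]
```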
Citation
Any publication (e.g., conference paper, journal article, technical report, book chapter, etc.) resulting from the usage of InfantMarmosetsVox must cite the following publication: Sarkar, E., Magimai.-Doss, M. (2023) Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? Proc. INTERSPEECH 2023, 1189-1193, doi: 10.21437/Interspeech.2023-1968
BibTeX:
@inproceedings{sarkar23_interspeech,
  author={Eklavya Sarkar and Mathew Magimai.-Doss},
  title={{Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
  pages={1189--1193},
  doi={10.21437/Interspeech.2023-1968}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WiFi CSI-based Long-Range Person Localization Using Directional Antennas
This repository contains the HAllway LOCalization (HALOC) dataset and WiFi system CAD files as proposed in [1].
PyTorch Dataloader
A minimal PyTorch dataloader for the HALOC dataset is provided at: https://github.com/StrohmayerJ/HALOC
Dataset Description
The HALOC dataset comprises six sequences (in .csv format) of synchronized WiFi Channel State Information (CSI) and 3D position labels. Each row in a given .csv file represents a single WiFi packet captured via ESP-IDF, with CSI and 3D coordinates stored in the "data" and ("x", "y", "z") fields, respectively.
The sequences are divided into training, validation, and test subsets as follows:
| Subset | Sequences |
| --- | --- |
| Training | 0.csv, 1.csv, 2.csv, 3.csv |
| Validation | 4.csv |
| Test | 5.csv |
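Given the per-packet format described above, a minimal sketch of reading one sequence might look like the following. It assumes the "data" field stores each packet's CSI values as a bracketed, comma-separated list; the exact encoding should be checked against the reference dataloader linked above.

```python
# Illustrative sketch: one HALOC sequence (.csv) as a PyTorch Dataset.
# Assumption: the "data" column holds CSI values as "[v1, v2, ...]";
# 3D position labels are in the "x", "y", "z" columns.
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class HALOCSequence(Dataset):
    def __init__(self, csv_path):
        self.frame = pd.read_csv(csv_path)

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        # Parse the CSI string into a float vector.
        csi = np.array(row["data"].strip("[]").split(","), dtype=np.float32)
        position = torch.tensor([row["x"], row["y"], row["z"]], dtype=torch.float32)
        return torch.from_numpy(csi), position


# Example: train_seq = HALOCSequence("0.csv")
```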
WiFi System CAD files
We provide CAD files for the 3D-printable parts of the proposed WiFi system, consisting of the main housing (housing.stl), the lid (lid.stl), and the carrier board (carrier.stl), which features mounting points for the Nvidia Jetson Orin Nano and the ESP32-S3-DevKitC-1 module.
Download and Use
This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].
[1] Strohmayer, J., and Kampel, M. (2024). “WiFi CSI-based Long-Range Person Localization Using Directional Antennas”, The Second Tiny Papers Track at ICLR 2024, May 2024, Vienna, Austria. https://openreview.net/forum?id=AOJFcEh5Eb
BibTeX citation:
@inproceedings{
strohmayer2024wifi,
title={WiFi {CSI}-based Long-Range Person Localization Using Directional Antennas},
author={Julian Strohmayer and Martin Kampel},
booktitle={The Second Tiny Papers Track at ICLR 2024},
year={2024},
url={https://openreview.net/forum?id=AOJFcEh5Eb}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overall accuracy and macro-averaged, class-aggregated assessment metrics for landcover.ai [53] classification, using ten replicates and different random data partitions of 3,000 chips.
Enhancing Financial Market Predictions: Causality-Driven Feature Selection
This paper introduces the FinSen dataset, which revolutionizes financial market analysis by integrating economic and financial news articles from 197 countries with stock market data. The dataset's extensive coverage spans 15 years, from 2007 to 2023, with temporal information, offering a rich global perspective through 160,000 records of financial market news. Our study leverages causally validated sentiment scores and LSTM models to enhance market forecast accuracy and reliability.
Our FinSen Dataset
This repository contains the dataset for "Enhancing Financial Market Predictions: Causality-Driven Feature Selection", which has been accepted at ADMA 2024.
If the dataset or the paper has been useful in your research, please add a citation to our work:
@article{liang2024enhancing,
  title={Enhancing Financial Market Predictions: Causality-Driven Feature Selection},
  author={Liang, Wenhao and Li, Zhengyang and Chen, Weitong},
  journal={arXiv e-prints},
  pages={arXiv--2408},
  year={2024}
}
The [FinSen] dataset can be downloaded manually from the repository as a CSV file. Sentiment labels and their scores are generated by the FinBERT model from the Hugging Face Transformers library under the identifier "ProsusAI/finbert" (Araci, Dogu. "FinBERT: Financial sentiment analysis with pre-trained language models." arXiv preprint arXiv:1908.10063 (2019)).
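For illustration, sentiment labels and scores of this kind can be reproduced with the ProsusAI/finbert model via the Transformers pipeline API; the authors' exact preprocessing (see finsen.py under the dataloaders directory) may differ.

```python
# Illustrative sketch: scoring a headline with FinBERT via Hugging Face Transformers.
from transformers import pipeline

# Load the FinBERT financial sentiment classifier.
finbert = pipeline("text-classification", model="ProsusAI/finbert")

headline = "Global markets rally as central banks signal rate cuts."
result = finbert(headline)[0]
print(result["label"], result["score"])  # e.g. a positive/negative/neutral label with a confidence score
```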
We only provide the US data for research purposes; please contact w.liang@adelaide.edu.au for other countries (197 in total) if necessary.
We also provide other NLP datasets for text classification tasks here; please cite them accordingly if you use them in your research.
20Newsgroups. Joachims, T.: A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. In: ICML, vol. 97, pp. 143–151. Citeseer (1997)
AG News. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems 28 (2015)
Financial PhraseBank. Malo, P., Sinha, A., Korhonen, P., Wallenius, J., Takala, P.: Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65(4), 782–796 (2014)
Dataloader for FinSen
We provide the preprocessing file finsen.py for our FinSen dataset under the dataloaders directory for more convenient usage.
Models - Text Classification
DAN-3.
Global Pooling CNN.
Models - Regression Prediction
LSTM
Using Sentiment Scores from FinSen to Predict Results on the S&P 500
Dependencies
The code is based on PyTorch and built on the code framework of https://github.com/torrvision/focal_calibration; please cite their work if you find it useful.
:smiley: Happy Research!