49 datasets found

Dataset: Gold standard dataset for explainability need detection in app...
zenodo.org
zip
Updated May 20, 2025
Cite
Martin Obaidi (2025). Dataset: Gold standard dataset for explainability need detection in app reviews. [Dataset]. http://doi.org/10.5281/zenodo.13273192
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.13273192
Dataset updated
May 20, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Martin Obaidi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We crawled 90,000 app reviews from both Google Play Store and Apple App Store, including reviews from both free and paid apps. These reviews were filtered for explainability needs, and after this process, 4,495 reviews remained. Among them, 2,185 reviews indicated an explanation need, while 2,310 did not. This resulting gold standard dataset was used to train and evaluate several machine learning models and rule-based approaches for detecting explanation needs in app reviews.

The dataset includes both balanced and unbalanced evaluation sets, as well as the original crawled data from October 2023. In addition to machine learning approaches, rule-based methods optimized for F1 score, precision, and recall are also included.
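As a reminder of how the three metrics named above relate, here is a minimal sketch that computes precision, recall, and F1 from confusion counts. The function name and the counts are illustrative assumptions, not values or code from the dataset.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only; not results from the dataset.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

A rule set tuned for precision will typically accept fewer reviews as "explanation need" than one tuned for recall; F1 is the harmonic mean of the two.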

We provide several pre-trained machine learning models (including BERT, SetFit, AdaBoost, K-Nearest Neighbor, Logistic Regression, Naive Bayes, Random Forest, and SVM) along with training scripts and evaluation notebooks. These models can be applied directly or retrained using the included datasets.

For further details on the structure and usage of the dataset, please refer to the README.md file within the provided ZIP archive.
Gold Price Prediction using Machine Learning
kaggle.com
Updated Sep 11, 2024
Cite
Subho117 (2024). Gold Price Prediction using Machine Learning [Dataset]. https://www.kaggle.com/datasets/subho117/gold-price-prediction-using-machine-learning/suggestions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 11, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Subho117
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Subho117

Released under MIT

GeoEDdA: A Gold Standard Dataset for Named Entity Recognition and Span...
data.niaid.nih.gov
zenodo.org
Updated Mar 20, 2024
Cite
Moncla, Ludovic (2024). GeoEDdA: A Gold Standard Dataset for Named Entity Recognition and Span Categorization Annotations of Diderot & d'Alembert's Encyclopédie [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10530177
Explore at:
Dataset updated
Mar 20, 2024
Dataset provided by
Vigier, Denis
Moncla, Ludovic
McDonough, Katherine
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This repository contains a gold standard dataset for named entity recognition and span categorization annotations from Diderot & d’Alembert’s Encyclopédie entries.

The dataset is available in the following formats:

JSONL format provided by Prodigy

binary spaCy format (ready to use with the spaCy train pipeline)

The Gold Standard dataset is composed of 2,200 paragraphs drawn from 2,001 randomly selected Encyclopédie entries. All paragraphs were written in 18th-century French.

The spans/entities were labeled by the project team, with pre-labelling by early machine learning models used to speed up the labelling process. A train/val/test split was used. Validation and test sets are composed of 200 paragraphs each: 100 classified under 'Géographie' and 100 from another knowledge domain. The datasets have the following breakdown of tokens and spans/entities.

Tagset

NC-Spatial: a common noun that identifies a spatial entity (nominal spatial entity) including natural features, e.g. ville, la rivière, royaume.

NP-Spatial: a proper noun identifying the name of a place (spatial named entities), e.g. France, Paris, la Chine.

ENE-Spatial: nested spatial entity, e.g. ville de France, royaume de Naples, la mer Baltique.

Relation: spatial relation, e.g. dans, sur, à 10 lieues de.

Latlong: geographic coordinates, e.g. Long. 19. 49. lat. 43. 55. 44.

NC-Person: a common noun that identifies a person (nominal person entity), e.g. roi, l'empereur, les auteurs.

NP-Person: a proper noun identifying the name of a person (person named entities), e.g. Louis XIV, Pline.

ENE-Person: nested people entity, e.g. le czar Pierre, roi de Macédoine.

NP-Misc: a proper noun identifying entities not classified as spatial or person, e.g. l'Eglise, 1702, Pélasgique.

ENE-Misc: nested named entity not classified as spatial or person, e.g. l'ordre de S. Jacques, la déclaration du 21 Mars 1671.

Head: entry name

Domain-Mark: words indicating the knowledge domain (usually after the head and in parentheses), e.g. Géographie, Geog., en Anatomie.
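To make the tagset concrete, the sketch below counts span labels in Prodigy-style JSONL records. The field names (`text`, `spans`, `start`, `end`, `label`) follow the usual Prodigy span layout and are an assumption about this dataset's files; the two sample records are invented for illustration and are not taken from GeoEDdA.

```python
import json
from collections import Counter

# Two made-up records in an assumed Prodigy-like layout.
SAMPLE_JSONL = """\
{"text": "Paris, ville de France.", "spans": [{"start": 0, "end": 5, "label": "NP-Spatial"}, {"start": 7, "end": 22, "label": "ENE-Spatial"}]}
{"text": "Louis XIV, roi.", "spans": [{"start": 0, "end": 9, "label": "NP-Person"}, {"start": 11, "end": 14, "label": "NC-Person"}]}
"""


def iter_spans(jsonl_text: str):
    """Yield (surface_text, label) for every span in every JSONL record."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for span in record.get("spans", []):
            yield record["text"][span["start"]:span["end"]], span["label"]


spans = list(iter_spans(SAMPLE_JSONL))
label_counts = Counter(label for _, label in spans)
print(spans)
print(label_counts)
```

Because spans may be nested (the ENE-* tags), a span categorization layout like this allows overlapping character ranges, unlike flat token-level NER tags.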

HuggingFace

The GeoEDdA dataset is available on the HuggingFace Hub: https://huggingface.co/datasets/GEODE/GeoEDdA

spaCy Custom Spancat trained on Diderot & d’Alembert’s Encyclopédie entries

This dataset was used to train and evaluate a custom spancat model for French using spaCy. The model is available on HuggingFace's model hub: https://huggingface.co/GEODE/fr_spacy_custom_spancat_edda.

Acknowledgement

The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR). Data courtesy the ARTFL Encyclopédie Project, University of Chicago.
Machine Learning Models for Gold Price Prediction (Forecast)
kappasignal.com
Updated Dec 19, 2023
Cite
KappaSignal (2023). Machine Learning Models for Gold Price Prediction (Forecast) [Dataset]. https://www.kappasignal.com/2023/12/machine-learning-models-for-gold-price.html
Explore at:
Dataset updated
Dec 19, 2023
Dataset authored and provided by
KappaSignal
License
https://www.kappasignal.com/p/legal-disclaimer.html
Description
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

Machine Learning Models for Gold Price Prediction

Financial data:

Historical daily stock prices (open, high, low, close, volume)

Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)
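As an illustration of the simplest indicator in the list above, here is a minimal sketch of a simple moving average over closing prices. The function name and the price series are made up for the example; the source does not specify how its indicators were computed.

```python
def simple_moving_average(values: list[float], window: int) -> list[float]:
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical closing prices, not taken from the dataset.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma3 = simple_moving_average(closes, window=3)
print(sma3)
```

The output has `len(values) - window + 1` entries because the first `window - 1` prices do not yet have a full window behind them.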

Machine learning features:

Feature engineering based on financial data and technical indicators

Sentiment analysis data from social media and news articles

Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

Potential Applications:

Stock price prediction

Portfolio optimization

Algorithmic trading

Market sentiment analysis

Risk management

Use Cases:

Researchers investigating the effectiveness of machine learning in stock market prediction

Analysts developing quantitative trading Buy/Sell strategies

Individuals interested in building their own stock market prediction models

Students learning about machine learning and financial applications

Additional Notes:

The dataset may include different levels of granularity (e.g., daily, hourly)

Data cleaning and preprocessing are essential before model training

Regular updates are recommended to maintain the accuracy and relevance of the data
Fruits-360 dataset
kaggle.com
paperswithcode.com
+1 more
Updated Jun 7, 2025
Cite
Mihai Oltean (2025). Fruits-360 dataset [Dataset]. https://www.kaggle.com/datasets/moltean/fruits
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 7, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mihai Oltean
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds

Version: 2025.06.07.0

Content

The following fruits, vegetables and nuts are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Forest, Pecan), Onion (Red, White), Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Forelle, Kaiser, Monster, Red, Stone, Williams), Pepper (Red, Green, Orange, Yellow), Physalis (normal, with Husk), Pineapple (normal, Mini), Pistachio, Pitahaya Red, Plum (different varieties), Pomegranate, Pomelo Sweetie, Potato (Red, Sweet, White), Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red, Yellow, not ripened, Heart), Walnut, Watermelon, Zucchini (green and dark).

Branches

The dataset has 5 major branches:

-The 100x100 branch, where all images have 100x100 pixels. See _fruits-360_100x100_ folder.

-The original-size branch, where all images are at their original (captured) size. See _fruits-360_original-size_ folder.

-The meta branch, which contains additional information about the objects in the Fruits-360 dataset. See _fruits-360_dataset_meta_ folder.

-The multi branch, which contains images with multiple fruits, vegetables, nuts and seeds. These images are not labeled. See _fruits-360_multi_ folder.

-The _3_body_problem_ branch where the Training and Test folders contain different (varieties of) the 3 fruits and vegetables (Apples, Cherries and Tomatoes). See _fruits-360_3-body-problem_ folder.

How to cite

Mihai Oltean, Fruits-360 dataset, 2017-

Dataset properties

For the 100x100 branch

Total number of images: 138704.

Training set size: 103993 images.

Test set size: 34711 images.

Number of classes: 206 (fruits, vegetables, nuts and seeds).

Image size: 100x100 pixels.

For the original-size branch

Total number of images: 58363.

Training set size: 29222 images.

Validation set size: 14614 images.

Test set size: 14527 images.

Number of classes: 90 (fruits, vegetables, nuts and seeds).

Image size: various (original, captured, size) pixels.

For the 3-body-problem branch

Total number of images: 47033.

Training set size: 34800 images.

Test set size: 12233 images.

Number of classes: 3 (Apples, Cherries, Tomatoes).

Number of varieties: Apples = 29; Cherries = 12; Tomatoes = 19.

Image size: 100x100 pixels.

For the meta branch

Number of classes: 26 (fruits, vegetables, nuts and seeds).

For the multi branch

Number of images: 150.
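The split sizes listed above can be sanity-checked with a few lines of arithmetic. The sketch below only restates the published counts; the dictionary layout and function name are illustrative choices, not part of the dataset.

```python
# Image counts copied from the dataset properties listed above.
BRANCHES = {
    "100x100": {"total": 138704, "splits": {"train": 103993, "test": 34711}},
    "original-size": {
        "total": 58363,
        "splits": {"train": 29222, "validation": 14614, "test": 14527},
    },
    "3-body-problem": {"total": 47033, "splits": {"train": 34800, "test": 12233}},
}


def split_fractions(branch: dict) -> dict[str, float]:
    """Return each split's share of the branch total."""
    return {name: size / branch["total"] for name, size in branch["splits"].items()}


for name, branch in BRANCHES.items():
    # Every branch's splits should add up to its stated total.
    consistent = sum(branch["splits"].values()) == branch["total"]
    shares = {k: round(v, 3) for k, v in split_fractions(branch).items()}
    print(name, "consistent:", consistent, shares)
```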

Filename format:

For the 100x100 branch

image_index_100.jpg (e.g. 31_100.jpg) or

r_image_index_100.jpg (e.g. r_31_100.jpg) or

r?_image_index_100.jpg (e.g. r2_31_100.jpg)

where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis. "100" comes from image size (100x100 pixels).

Different varieties of the same fruit (apple, for instance) are stored as belonging to different classes.

For the original-size branch

r?_image_index.jpg (e.g. r2_31.jpg)

where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis.

The name of the image files in the new version does NOT contain the "_100" suffix anymore. This will help you to make the distinction between the original-size branch and the 100x100 branch.

For the multi branch

The file's name is the concatenation of the names of the fruits inside that picture.
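The naming scheme for the 100x100 and original-size branches can be parsed with a small regular expression. This is a sketch based only on the patterns described above; the names `FILENAME_RE`, `ParsedName`, and `parse_name` are hypothetical, and multi-branch filenames (concatenated fruit names) are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# Optional rotation prefix ("r" or "r" plus digits), image index, optional "_100" suffix.
FILENAME_RE = re.compile(r"^(?:(?P<rot>r\d*)_)?(?P<index>\d+)(?P<suffix>_100)?\.jpg$")


class ParsedName(NamedTuple):
    rotation: Optional[str]  # None, "r", "r2", ...
    index: int               # image index within the class folder
    is_100x100: bool         # True when the "_100" suffix is present


def parse_name(filename: str) -> ParsedName:
    """Split a Fruits-360 image filename into its documented parts."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized filename: {filename!r}")
    return ParsedName(
        rotation=match.group("rot"),
        index=int(match.group("index")),
        is_100x100=match.group("suffix") is not None,
    )


for example in ["31_100.jpg", "r_31_100.jpg", "r2_31_100.jpg", "r2_31.jpg"]:
    print(example, parse_name(example))
```

The presence of the `_100` suffix is what separates a 100x100 file from an original-size one, which matches the note about the suffix being dropped in the original-size branch.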

Alternate download

The Fruits-360 dataset can be downloaded from:

Kaggle https://www.kaggle.com/moltean/fruits

GitHub https://github.com/fruits-360

How fruits were filmed

Fruits and vegetables were planted in the shaft of a low-speed motor (3 rpm) and a short movie of 20 seconds was recorded.

A Logitech C920 camera was used for filming the fruits. This is one of the best webcams available.

Behind the fruits, we placed a white sheet of paper as a background.

Here i...
Data from: Mineral prospectivity mapping using machine learning techniques...
hub.arcgis.com
Updated Sep 21, 2023
Cite
MetalEarth (2023). Mineral prospectivity mapping using machine learning techniques for gold exploration in the Larder Lake area, Ontario, Canada [Dataset]. https://hub.arcgis.com/documents/1be05051de7c498c97e7cb267076b435
Explore at:
Dataset updated
Sep 21, 2023
Dataset authored and provided by
MetalEarth
Area covered
Canada, Ontario, Larder Lake
Description
This study produced a mineral prospectivity map (MPM) for gold mineralization in the Larder Lake region of Northern Ontario, Canada. We applied the Random Forest (RF) algorithm to 32 predictor maps that integrate geophysical, geochemical, and geological datasets from various sources and represent vectors to gold mineralization. The efficiency of classification curves indicates that the generated MPMs are robust, and RF classified the known gold mineralization locations with an accuracy of 95.63%. The unsupervised algorithms K-means and principal component analysis (PCA) were used to investigate and visualize clustering in the large geochemical and geophysical datasets. We used RQ-mode PCA to compute variable and object loadings simultaneously, which allows observations and variables to be displayed at the same scale. PCA biplots of the Larder Lake geochemical data show that Au is strongly correlated with W, S, Pb and K, but inversely correlated with Fe, Mn, Co, Mg, Ca, and Ni. Furthermore, a partial least squares-discriminant analysis (PLS-DA) model that combines 3D geophysical clusters and geochemical compositions indicates that Au-rich areas are characterized by low to mid resistivity and low susceptibility. We conclude that the Larder Lake-Cadillac deformation zone (LLCDZ) is more fertile than the Lincoln-Nipissing shear zone (LNSZ) with respect to gold mineralization because of its deeper penetrating faults. The intersections of the LLCDZ with a network of high-angle NE-trending cross faults act as key conduits for gold endowment in the Larder Lake area. By combining multivariate geological, geochemical, and geophysical datasets via machine learning algorithms, the study improves the identification of geochemical anomalies and the interpretation of spatial features associated with gold mineralization.
Relationship and Entity Extraction Evaluation Dataset (Entities)
data.europa.eu
data.wu.ac.at
json
Updated Oct 30, 2021
Cite
Defence Science and Technology Laboratory (2021). Relationship and Entity Extraction Evaluation Dataset (Entities) [Dataset]. https://data.europa.eu/data/datasets/relationship-and-entity-extraction-evaluation-dataset-entities
Explore at:
Available download formats: json
Dataset updated
Oct 30, 2021
Dataset authored and provided by
Defence Science and Technology Laboratory
Description
This entities dataset is the output of a project that aimed to create a 'gold standard' dataset for training and validating machine learning approaches to natural language processing (NLP). The project was carried out by Aleph Insights and Committed Software on behalf of the Defence Science and Technology Laboratory (Dstl). The dataset focuses specifically on entity and relationship extraction relevant to somebody operating in the role of a defence and security intelligence analyst. It was therefore constructed using documents and structured schemas relevant to the defence and security analysis domain. A number of data subsets were produced (this is the BBC Online data subset). Further information about this data subset (BBC Online) and the others produced (together with licence conditions, attribution and schemas) may be found at the main project GitHub repository webpage (https://github.com/dstl/re3d). Note that the 'entities.json' file is to be used together with the 'documents.json' and 'relations.json' files (also found on this data.gov.uk webpage, with their structures and relationships described on the given GitHub webpage).
Gold VTKEL 300 documents dataset
figshare.com
txt
Updated Sep 9, 2020
Cite
Shahi Dost (2020). Gold VTKEL 300 documents dataset [Dataset]. http://doi.org/10.6084/m9.figshare.9815387.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.9815387.v2
Dataset updated
Sep 9, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Shahi Dost
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
(Updated version after fixing some bugs from the previous version.) Manually corrected gold standard dataset for 300 documents of VTKEL.
daily-historical-stock-price-data-for-alamos-gold-inc-20032025
huggingface.co
Updated Mar 20, 2025
Cite
Khaled Ben Ali (2025). daily-historical-stock-price-data-for-alamos-gold-inc-20032025 [Dataset]. https://huggingface.co/datasets/khaledxbenali/daily-historical-stock-price-data-for-alamos-gold-inc-20032025
Explore at:
Dataset updated
Mar 20, 2025
Authors
Khaled Ben Ali
Description
📈 Daily Historical Stock Price Data for Alamos Gold Inc. (2003–2025)

A clean, ready-to-use dataset containing daily stock prices for Alamos Gold Inc. from 2003-05-02 to 2025-05-28. This dataset is ideal for use in financial analysis, algorithmic trading, machine learning, and academic research.

🗂️ Dataset Overview

Company: Alamos Gold Inc.
Ticker Symbol: AGI
Date Range: 2003-05-02 to 2025-05-28
Frequency: Daily
Total Records: 5554 rows (one per trading day)…

See the full description on the dataset page: https://huggingface.co/datasets/khaledxbenali/daily-historical-stock-price-data-for-alamos-gold-inc-20032025.
BUTTER-E - Energy Consumption Data for the BUTTER Empirical Deep Learning...
osti.gov
data.openei.org
+1 more
Updated Dec 30, 2022
Cite
DOE Open Energy Data Initiative (OEDI) (2022). BUTTER-E - Energy Consumption Data for the BUTTER Empirical Deep Learning Dataset [Dataset]. http://doi.org/10.25984/2329316
Explore at:
Unique identifier
https://doi.org/10.25984/2329316
Dataset updated
Dec 30, 2022
Dataset provided by
United States Department of Energy (http://energy.gov/)
Office of Science (http://www.er.doe.gov/)
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
DOE Open Energy Data Initiative (OEDI)
Description
The BUTTER-E - Energy Consumption Data for the BUTTER Empirical Deep Learning Dataset adds node-level energy consumption data from watt-meters to the primary sweep of the BUTTER - Empirical Deep Learning Dataset. This dataset contains energy consumption and performance data from 63,527 individual experimental runs spanning 30,582 distinct configurations: 13 datasets, 20 sizes (number of trainable parameters), 8 network "shapes", and 14 depths on both CPU and GPU hardware collected using node-level watt-meters. This dataset reveals the complex relationship between dataset size, network structure, and energy use, and highlights the impact of cache effects. BUTTER-E is intended to be joined with the BUTTER dataset (see "BUTTER - Empirical Deep Learning Dataset on OEDI" resource below) which characterizes the performance of 483k distinct fully connected neural networks but does not include energy measurements.
ABC Gold standard dataset for 300 documents
figshare.com
txt
Updated Sep 27, 2019
Cite
Abc Abc (2019). ABC Gold standard dataset for 300 documents [Dataset]. http://doi.org/10.6084/m9.figshare.9913886.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.9913886.v1
Dataset updated
Sep 27, 2019
Dataset provided by
figshare
Authors
Abc Abc
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Manually corrected gold standard dataset for 300 documents of ABC.
Data from: Dataset of Au atomic structures for training Machine Learning...
zenodo.org
repository.uantwerpen.be
bz2
Updated May 12, 2025
Cite
Vlahovic Jovana; Cem Sevik; Milorad Milosevic (2025). Dataset of Au atomic structures for training Machine Learning Interatomic Potentials [Dataset]. http://doi.org/10.5281/zenodo.15366677
Explore at:
Available download formats: bz2
Unique identifier
https://doi.org/10.5281/zenodo.15366677
Dataset updated
May 12, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Vlahovic Jovana; Cem Sevik; Milorad Milosevic
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains atomic structures of gold (Au) generated using density functional theory (DFT) calculations performed with the VASP package [1, 2]. The calculations were carried out using the projector-augmented wave (PAW) [3, 4] method and Perdew–Burke–Ernzerhof (PBE) pseudopotentials for gold, within the generalised gradient approximation (GGA) [5] for the exchange-correlation functional.

Molecular dynamics simulations were performed for at least 500 steps per structure. For bulk systems, the temperature range spans from 100 K to 1500 K, while for nanoparticles and slab structures, it extends from 100 K to 1000 K. For training our machine learning model (GAP [6]), the first 200 steps of each molecular dynamics trajectory were discarded to allow the thermostat to equilibrate the system to the target temperature. Using this dataset, we trained a GAP model for Au nanoparticles, whose applicability extends beyond the nanoparticle sizes included in the training set.
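The equilibration cut described above amounts to dropping a fixed prefix of each trajectory before training. The sketch below shows that step on a stand-in list of frames; the function name and the integer placeholders are assumptions for illustration, and the authors' actual preprocessing code is not shown in this description.

```python
EQUILIBRATION_STEPS = 200  # number of initial steps discarded, per the description


def production_frames(trajectory: list, skip: int = EQUILIBRATION_STEPS) -> list:
    """Return the frames that remain after dropping the first `skip` steps."""
    return trajectory[skip:]


# Stand-in for a 500-step trajectory; real frames would hold coordinates,
# total energies, and per-atom forces.
trajectory = list(range(500))
kept = production_frames(trajectory)
print(len(kept), kept[0])
```

With the minimum trajectory length of 500 steps stated above, at least 300 frames per structure remain after the cut.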

The dataset includes these starting configurations:

Small, low-energy Au nanoparticles (3 to 55 atoms)

Bulk Au in fcc, bcc, hcp, and simple cubic (sc) crystal structures

Slab models of fcc surfaces

The initial low-energy nanoparticle structures were adopted from a dataset reported in the literature [7].

For each structure, we provide atomic coordinates along with corresponding total energies and per-atom forces. This dataset is suitable for training and validating machine learning interatomic potentials for gold.

References

[1] Kresse and Furthmüller. "Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set." Physical Review B 54.16 (1996): 11169.
[2] https://www.vasp.at/
[3] Blöchl. "Projector augmented-wave method." Physical Review B 50.24 (1994): 17953.
[4] Kresse and Joubert. "From ultrasoft pseudopotentials to the projector augmented-wave method." Physical Review B 59.3 (1999): 1758.
[5] Perdew, John P., Kieron Burke, and Matthias Ernzerhof. "Generalized gradient approximation made simple." Physical Review Letters 77.18 (1996): 3865.
[6] Bartók, Payne, et al. "Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons." Physical Review Letters 104.13 (2010): 136403.
[7] Manna, Sukriti, et al. "A database of low-energy atomically precise nanoclusters." Scientific Data 10.1 (2023): 308.
‘Sentiment Analysis of Commodity News (Gold)’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 27, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Sentiment Analysis of Commodity News (Gold)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sentiment-analysis-of-commodity-news-gold-732f/e3232de2/?iid=002-045&v=presentation
Explore at:
Dataset updated
Sep 27, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Sentiment Analysis of Commodity News (Gold)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ankurzing/sentiment-analysis-in-commodity-market-gold on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Context

This is a news dataset for the commodity market where we have manually annotated 11,412 news headlines across multiple dimensions into various classes. The dataset has been sampled from a period of 20+ years (2000-2021).

Content

The dataset has been collected from various news sources and annotated by three human annotators who were subject experts. Each news headline was evaluated on various dimensions, for instance: if a headline is price-related news, the direction of the price movement it describes; whether the headline talks about the past or the future; whether the news item makes an asset comparison; etc.

Acknowledgements

Sinha, Ankur, and Tanmay Khandait. "Impact of News on the Commodity Market: Dataset and Results." In Future of Information and Communication Conference, pp. 589-601. Springer, Cham, 2021.

https://arxiv.org/abs/2009.04202 Sinha, Ankur, and Tanmay Khandait. "Impact of News on the Commodity Market: Dataset and Results." arXiv preprint arXiv:2009.04202 (2020)

We would like to acknowledge the financial support provided by the India Gold Policy Centre (IGPC).

Inspiration

Commodity prices are known to be quite volatile. Machine learning models that understand commodity news well will be able to provide an additional input to short-term and long-term price forecasting models. The dataset will also be useful in creating news-based indicators for commodities.

Apart from researchers and practitioners working in the area of news analytics for commodities, the dataset will also be useful for researchers looking to evaluate their models on classification problems in the context of text-analytics. Some of the classes in the dataset are highly imbalanced and may pose challenges to the machine learning algorithms.
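One common way to handle the class imbalance mentioned above is to weight each class inversely to its frequency. The sketch below uses the heuristic `n_samples / (n_classes * class_count)`; the function name and the label values are hypothetical and do not come from the dataset, and other remedies (resampling, threshold tuning) may suit a given task better.

```python
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, deliberately imbalanced labels for illustration only.
labels = ["up"] * 8 + ["down"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

Rare classes receive weights above 1 and frequent classes below 1, so a loss that multiplies each example by its class weight gives both classes the same total influence.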

--- Original source retains full ownership of the source dataset ---
Dataset of automatically extracted sizes and morphologies of AuNPs from...
figshare.com
txt
Updated Nov 18, 2021
Cite
Akshay Subramanian; Kevin Cruse; Amalie Trewartha; Xingzhi Wang; Paul Alivisatos; Gerbrand Ceder (2021). Dataset of automatically extracted sizes and morphologies of AuNPs from literature-mined SEM/TEM images [Dataset]. http://doi.org/10.6084/m9.figshare.17019836.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.17019836.v2
Dataset updated
Nov 18, 2021
Dataset provided by
figshare
Authors
Akshay Subramanian; Kevin Cruse; Amalie Trewartha; Xingzhi Wang; Paul Alivisatos; Gerbrand Ceder
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a dataset of automatically extracted sizes and morphologies of Gold nanoparticles, which have been obtained from the analysis of 4365 literature-mined SEM/TEM images. The dataset contains 4365 records, each of which contains extracted size/morphology information and metadata corresponding to a single microscopy image.
Gold Standard Snapshot Serengeti Bounding Box Coordinates
borealisdata.ca
Updated Jan 7, 2025
Cite
Stefan Schneider; Stefan Kremer; Graham Taylor (2025). Gold Standard Snapshot Serengeti Bounding Box Coordinates [Dataset]. http://doi.org/10.5683/SP/TPB5ID
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP/TPB5ID
Dataset updated
Jan 7, 2025
Dataset provided by
Borealis
Authors
Stefan Schneider; Stefan Kremer; Graham Taylor
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2010 - 2016
Area covered
Serengeti, Africa
Description
This dataset contributes to the terrific work done by the Snapshot Serengeti community by providing bounding box coordinates for the Gold Standard Snapshot Serengeti dataset, for the purpose of training deep learning object detectors to detect, localize, and classify species from camera trap images.
Additional file 3 of OffsampleAI: artificial intelligence approach to...
springernature.figshare.com
figshare.com
zip
Updated May 30, 2023
Cite
Katja Ovchinnikova; Vitaly Kovalev; Lachlan Stuart; Theodore Alexandrov (2023). Additional file 3 of OffsampleAI: artificial intelligence approach to recognize off-sample mass spectrometry images [Dataset]. http://doi.org/10.6084/m9.figshare.12082305.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12082305.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Katja Ovchinnikova; Vitaly Kovalev; Lachlan Stuart; Theodore Alexandrov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 3: Supplementary Data D1-D5.

D1: “Supplementary methods and results.pdf”.

D2: “Interactive tagging of ion images using web app.mov”, video of a tagger using the TagOff web app.

D3: “Gold standard datasets.csv”, metadata of 87 public datasets from METASPACE selected for the gold standard.

D4: “DHB matrix clusters frequencies.csv”, results of annotation of 31 gold standard datasets acquired using the MALDI DHB matrix and positive ion mode and off-sample recognition for DHB matrix clusters generated according to a combinatorial model.

D5: “DESI offsample ions frequencies.csv”, a file showing for each molecular formula the number of DESI imaging datasets from the gold standard where ions with such molecular formula were classified as off-sample.
daily-historical-stock-price-data-for-americas-gold-and-silver-corporation-20032025...
huggingface.co
Updated Mar 20, 2025
Cite
Khaled Ben Ali (2025). daily-historical-stock-price-data-for-americas-gold-and-silver-corporation-20032025 [Dataset]. https://huggingface.co/datasets/khaledxbenali/daily-historical-stock-price-data-for-americas-gold-and-silver-corporation-20032025
Dataset updated
Mar 20, 2025
Authors
Khaled Ben Ali
Description
📈 Daily Historical Stock Price Data for Americas Gold and Silver Corporation (2003–2025)

A clean, ready-to-use dataset containing daily stock prices for Americas Gold and Silver Corporation from 2003-10-27 to 2025-05-28. This dataset is ideal for use in financial analysis, algorithmic trading, machine learning, and academic research.

🗂️ Dataset Overview

Company: Americas Gold and Silver Corporation
Ticker Symbol: USAS
Date Range: 2003-10-27 to 2025-05-28
Frequency:…
See the full description on the dataset page: https://huggingface.co/datasets/khaledxbenali/daily-historical-stock-price-data-for-americas-gold-and-silver-corporation-20032025
Data from: DECM Machine Learning Training Corpus
figshare.com
produccioncientifica.ucm.es
+1 more
bin
Updated May 30, 2023
Cite
Patricia Murrieta-Flores; Mariana Favila-Vázquez; Raquel Liceras-Garrido (2023). DECM Machine Learning Training Corpus [Dataset]. http://doi.org/10.6084/m9.figshare.12366734.v3
Available download formats
bin
Unique identifier
https://doi.org/10.6084/m9.figshare.12366734.v3
Dataset updated
May 30, 2023
Dataset provided by
Figshare: http://figshare.com/
Authors
Patricia Murrieta-Flores; Mariana Favila-Vázquez; Raquel Liceras-Garrido
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The DECM Corpus is a digital corpus of the texts of the Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) in several versions, including a machine-ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This version contains a sample of the RGs manually annotated by multiple researchers with the software of our industry partner, Tagtog. This corpus has been used to carry out the NLP and ML experiments, and the files are available in JSON and TSV format. These files are composed of texts and annotations. The corpus is accompanied by the DECM ontology, which explains the entities and labels produced. This corpus can be used for further experimentation with Artificial Intelligence methods.
Data from: Global Wheat Head Detection (GWHD) Dataset: A Large and Diverse...
ckan.grassroots.tools
pdf
Updated Sep 15, 2022
Cite
Rothamsted Research (2022). Global Wheat Head Detection (GWHD) Dataset: A Large and Diverse Dataset of High-Resolution RGB-Labelled Images to Develop and Benchmark Wheat Head Detection Methods [Dataset]. https://ckan.grassroots.tools/ar/dataset/fc628bb2-24cb-46ca-8a04-466c605c72d4
Available download formats
pdf
Dataset updated
Sep 15, 2022
Dataset provided by
Rothamsted Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The detection of wheat heads in plant images is an important task for estimating pertinent wheat traits including head population density and head characteristics such as health, size, maturity stage, and the presence of awns. Several studies have developed methods for wheat head detection from high-resolution RGB imagery based on machine learning algorithms. However, these methods have generally been calibrated and validated on limited datasets. High variability in observational conditions, genotypic differences, development stages, and head orientation makes wheat head detection a challenge for computer vision. Further, possible blurring due to motion or wind and overlap between heads for dense populations make this task even more complex. Through a joint international collaborative effort, we have built a large, diverse, and well-labelled dataset of wheat images, called the Global Wheat Head Detection (GWHD) dataset. It contains 4700 high-resolution RGB images and 190000 labelled wheat heads collected from several countries around the world at different growth stages with a wide range of genotypes. Guidelines for image acquisition, associating minimum metadata to respect FAIR principles, and consistent head labelling methods are proposed when developing new head detection datasets. The GWHD dataset is publicly available at
Data from: CS4984/CS5984: Big Data Text Summarization Team 17 ETDs
explore.openaire.eu
Updated Dec 15, 2018
Cite
Farnaz Khaghani; Ashin Marin Thomas; Chinmaya Patnayak; Dhruv Sharma; John Aromando (2018). CS4984/CS5984: Big Data Text Summarization Team 17 ETDs [Dataset]. https://explore.openaire.eu/search/dataset?pid=10919%2F86420
Dataset updated
Dec 15, 2018
Authors
Farnaz Khaghani; Ashin Marin Thomas; Chinmaya Patnayak; Dhruv Sharma; John Aromando
Description
Given the current explosion of information over various media such as electronic and physical texts, concise and relevant data has become key to the understanding of things. Summarization, which essentially is the process of reducing the text to convey only the salient aspects, has emerged as a challenging task in the field of Natural Language Processing. In a scientific construct, academia has been generating voluminous amounts of data in the form of theses and dissertations. Obtaining the chapter-wise summary of an electronic thesis or dissertation can be a computationally expensive task, particularly because of its length and the subject to which it pertains. Through this course, research and development of various summarization techniques, primarily extractive and abstractive summarization, were analyzed. There have been various developments in the field of deep learning to tackle problems related to summarization and produce coherent and meaningful summaries for news articles. In this project, tools that could be used to generate coherent and concise summaries of long electronic theses and dissertations (ETDs) were developed as well. The major concern initially was to get the text from a PDF file of an ETD. GROBID and Scienceparse were used as pre-processing tools to carry out this task and to present the text from a PDF in a structured format such as an XML or JSON file. The outputs from each of the tools were compared qualitatively as well as quantitatively. After this, a transfer learning approach was adopted, wherein a pre-trained model was tweaked to fit the task of summarizing each ETD. Making the model learn the nuances of an ETD proved to be a challenge. An iterative approach was used to explore various networks, each trying to improve on the shortcomings of the previous one in its own way.
Existing deep learning models, including Sequence-2-Sequence, Pointer Generator Networks, and A Hybrid Extractive-Abstractive Reinforce-Selecting Sentence Rewriting Network, were used to generate and test summaries. Further tweaks were made to these deep neural networks to account for datasets much longer and more varied than those they were designed for, in this case ETDs. A thorough evaluation of these generated summaries was also done with respect to golden standards for five dissertations and theses created during the span of the course. ROUGE-1, ROUGE-2, and ROUGE-SU4 were used to compare the generated summaries with the golden standards. The average ROUGE scores were 0.1387, 0.1224, and 0.0480 respectively. These low ROUGE scores could be attributed to the varying summary length, and also to the complexity of the task of summarizing an ETD. The scope of improvements and the underlying reasons for the performance have also been analyzed. The conclusion that can be drawn from the project is that any machine learning task is highly biased by the patterns inherently present in the data on which it is trained. In the context of summarization, an article can be summarized from different perspectives, and thus the quantitative evaluation measures can vary drastically even when the summary is a coherent one.
NSF: IIS-1619028
The submission contains multiple files:
- CS5984_Final_Presentation.pdf: The PDF version of the presentation.
- CS5984_Final_Presentation.ppt: The PowerPoint for the presentation.
- CS5984_Final_Report.pdf: The PDF version of the report.
- CS5984_Final_Report.zip: The LaTeX source code for the report.
- ArXiv finished file: processed and tokenized arXiv data for Pointer Generator Network
- text-summarization-tensorflow: seq2seq model code in TensorFlow modified to adapt with arXiv dataset
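The ROUGE-1 and ROUGE-2 measures named in the description above count how many n-grams a generated summary shares with a reference ("golden standard") summary. The following is only a minimal sketch of that idea, not the project's evaluation code: the function names `ngram_counts` and `rouge_n` are hypothetical, tokenization is a plain lowercase whitespace split (an assumption), and ROUGE-SU4 is not covered. The description does not say whether the reported averages are recall, precision, or F1, so the sketch returns all three.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall, and F1 for two plain-text summaries.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    print(rouge_n(generated, gold, n=1))
    print(rouge_n(generated, gold, n=2))
```

Because the score depends only on surface overlap, two coherent summaries written from different perspectives can share few n-grams, which is consistent with the description's remark that quantitative measures can vary drastically.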


Dataset: Gold standard dataset for explainability need detection in app reviews.


4 scholarly articles cite this dataset (View in Google Scholar)
