66 datasets found

similar-loss
kaggle.com
zip
Updated Jan 20, 2022
KOOK HEEJIN (2022). similar-loss [Dataset]. https://www.kaggle.com/datasets/kookheejin/similarloss
Available download formats: zip (1,798,645,210 bytes)
Dataset updated
Jan 20, 2022
Authors
KOOK HEEJIN
Description
Dataset

This dataset was created by KOOK HEEJIN

Contents
Critical Habitats Data
kaggle.com
Updated May 13, 2023
Utkarsh Singh (2023). Critical Habitats Data [Dataset]. https://www.kaggle.com/datasets/utkarshx27/critical-habitats-data
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 13, 2023
Dataset provided by
Kaggle
Authors
Utkarsh Singh
License
https://www.usa.gov/government-works/
Description
Note: There are 5 files

Description

Connecticut Critical Habitats is a polygon feature-based layer with a resolution of +/- 10 meters that represents significant natural community types occurring in Connecticut. This layer is a subset of habitat-related vegetation associations, described in Connecticut's Natural Vegetation Classification, that were designated as key habitats for species of Greatest Conservation Need in the Comprehensive Wildlife Conservation Strategy. These habitats are known to host a number of rare species including highly specialized invertebrates with very specific habitat associations. Some key habitats are broken into subtypes based on natural variations in plant species dominance and/or vegetation structure. These differences are apparent in the subtype names. Connecticut Critical Habitats can serve to highlight ecologically significant areas and to target areas of species diversity.

This layer can be used to perform various spatial analyses that pertain to Critical Habitats, to aid in determining site management and conservation priorities, to prioritize field surveys, and to further document the distribution and abundance of State-listed and/or rare vertebrate and invertebrate species within the significant habitats. Use this layer only with data of similar resolution. It is not intended for maps printed at scales larger (more detailed) than 1:2,000.

Purpose

Connecticut Critical Habitats provides the identification and distribution of a subset of important wildlife habitats identified in the Connecticut Comprehensive Wildlife Conservation Strategy. Connecticut Critical Habitats can be used in conjunction with other environmental and natural resource information to provide a more thorough understanding of the physical characteristics of each habitat. The spatial relationships between these areas and data such as land ownership and past, present and projected land use can be analyzed. The Connecticut Critical Habitats can serve to highlight ecologically significant areas and to target areas of species diversity for land conservation and protection. Biologists may use this data to target further research on associated plant and animal species.

Use Limitations

Connecticut Critical Habitats is not a comprehensive map of all critical habitat types in Connecticut. It represents a subset of the key habitats of greatest conservation need identified in Connecticut's Comprehensive Wildlife Conservation Strategy. Sites were mapped according to their known distribution. For some habitats the distribution may be incomplete, since no statewide exhaustive surveys have been conducted. Most critical habitat sites were not field-visited; instead, publicly available oblique imagery, such as the Bing Maps web mapping service, was used as a surrogate for field investigation. Caution is advised when using this information without field-verifying the accuracy of the habitat delineation and characterization. Since many of these areas occur on private property, visiting these sites requires permission from the landowner. The recommended scale for viewing Critical Habitats is 1:2,000 to 1:12,000. Displaying Connecticut Critical Habitats at map scales larger (more detailed) than 1:2,000 may result in minor locational differences and inaccuracies.
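Because a map scale of 1:N is larger (more detailed) when N is smaller, the limits above are easy to misread. The following minimal sketch classifies a display scale, given as its denominator N, against the recommended 1:2,000 to 1:12,000 range; the function name and return strings are illustrative and not part of the dataset.

```python
def classify_display_scale(denominator: float) -> str:
    """Classify a map scale 1:denominator against the recommended
    1:2,000 to 1:12,000 viewing range for Connecticut Critical Habitats."""
    if denominator <= 0:
        raise ValueError("scale denominator must be positive")
    # A smaller denominator means a larger, more detailed scale.
    if denominator < 2000:
        return "too detailed"  # larger than 1:2,000; locational errors may appear
    if denominator > 12000:
        return "coarser than recommended"
    return "recommended"


print(classify_display_scale(1000))   # too detailed
print(classify_display_scale(5000))   # recommended
print(classify_display_scale(24000))  # coarser than recommended
```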
FSDKaggle2018
zenodo.org
opendatalab.com
zip
Updated Jan 24, 2020
Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Manoj Plakal; Daniel P. W. Ellis; Xavier Serra (2020). FSDKaggle2018 [Dataset]. http://doi.org/10.5281/zenodo.2552860
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.2552860
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Manoj Plakal; Daniel P. W. Ellis; Xavier Serra
Description
FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2, which was run as a Kaggle competition titled Freesound General-Purpose Audio Tagging Challenge.

Citation

If you use the FSDKaggle2018 dataset or part of it, please cite our DCASE 2018 paper:

Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra. "General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline". Proceedings of the DCASE 2018 Workshop (2018)

You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2018.

Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

Contact

If you have any questions, you are welcome to contact Eduardo Fonseca at eduardo.fonseca@upf.edu.

About this dataset

Freesound Dataset Kaggle 2018 (or FSDKaggle2018 for short) is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology [1]. FSDKaggle2018 has been used for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2018. Please visit the DCASE2018 Challenge Task 2 website for more information. This task was hosted on the Kaggle platform as a competition titled Freesound General-Purpose Audio Tagging Challenge. It was organized by researchers from the Music Technology Group of Universitat Pompeu Fabra, and from Google Research’s Machine Perception Team.

The goal of this competition was to build an audio tagging system that can categorize an audio clip as belonging to one of a set of 41 diverse categories drawn from the AudioSet Ontology.

All audio samples in this dataset are gathered from Freesound [2] and are provided here as uncompressed 16-bit PCM, 44.1 kHz, mono audio files. Note that because Freesound content is collaboratively contributed, recording quality and techniques can vary widely.
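To confirm that a downloaded clip matches the stated format, and to read its duration, Python's standard-library wave module is enough. This is a minimal sketch with an illustrative function name; note that wave only reads uncompressed PCM WAV files, which is the format described here.

```python
import wave


def check_fsd_clip(path: str) -> tuple[bool, float]:
    """Return (matches_expected_format, duration_in_seconds) for a WAV file.

    Expected format, per the dataset description: 16-bit PCM, 44.1 kHz, mono.
    """
    with wave.open(path, "rb") as clip:
        matches = (
            clip.getnchannels() == 1          # mono
            and clip.getsampwidth() == 2      # 2 bytes per sample = 16 bit
            and clip.getframerate() == 44100  # 44.1 kHz
        )
        duration = clip.getnframes() / clip.getframerate()
    return matches, duration
```

Calling it on any file under FSDKaggle2018.audio_train/ should, if the description holds, return True together with a duration between roughly 0.3 and 30 seconds.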

The ground truth data provided in this dataset was obtained through a data labeling process, described below in the Data labeling process section. FSDKaggle2018 clips are unequally distributed across the following 41 categories of the AudioSet Ontology:

"Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".

Some other relevant characteristics of FSDKaggle2018:

The dataset is split into a train set and a test set.

The train set is meant for system development and includes ~9.5k samples unequally distributed among 41 categories. The minimum number of audio samples per category in the train set is 94, and the maximum is 300. The duration of the audio samples ranges from 300 ms to 30 s due to the diversity of the sound categories and the preferences of Freesound users when recording sounds. The total duration of the train set is roughly 18 h.

Out of the ~9.5k samples from the train set, ~3.7k have manually-verified ground truth annotations and ~5.8k have non-verified annotations. The non-verified annotations of the train set have a quality estimate of at least 65-70% in each category. Check out the Data labeling process section below for more information about this aspect.

Non-verified annotations in the train set are properly flagged in train.csv so that participants can opt to use this information during the development of their systems.

The test set is composed of 1.6k samples with manually-verified annotations and with a category distribution similar to that of the train set. The total duration of the test set is roughly 2 h.

All audio samples in this dataset have a single label (i.e., each is annotated with only one label). Check out the Data labeling process section below for more information about this aspect. A single label should be predicted for each file in the test set.
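Because non-verified annotations are flagged in train.csv and every clip carries exactly one of the 41 labels, a common first step is to separate verified from non-verified rows and map each label to an integer class index. The sketch below assumes train.csv has the columns fname, label and manually_verified (with values 1/0); this matches the competition release as far as we know, but check it against the header of your copy. Using the order of the label list above as the class-index order is an illustrative choice, not an official encoding.

```python
import csv
from typing import IO

# The 41 FSDKaggle2018 labels, in the order listed above. Using this order
# as the class-index order is an illustrative choice, not an official one.
FSD_LABELS = [
    "Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation",
    "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell",
    "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart",
    "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong",
    "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock",
    "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors",
    "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone",
    "Trumpet", "Violin_or_fiddle", "Writing",
]
LABEL_TO_INDEX = {label: i for i, label in enumerate(FSD_LABELS)}


def load_train_split(csv_file: IO[str]):
    """Return (verified, non_verified) lists of (fname, class_index) pairs.

    Assumes columns fname, label and manually_verified, where
    manually_verified is "1" for verified rows and "0" otherwise.
    """
    verified, non_verified = [], []
    for row in csv.DictReader(csv_file):
        # Raises KeyError if a label is not one of the 41 categories.
        item = (row["fname"], LABEL_TO_INDEX[row["label"]])
        if row["manually_verified"].strip() == "1":
            verified.append(item)
        else:
            non_verified.append(item)
    return verified, non_verified
```

Usage: `with open("train.csv", newline="") as f: verified, non_verified = load_train_split(f)`. Per the description, expect roughly 3.7k verified and 5.8k non-verified rows.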

Data labeling process

The data labeling process started from a manual mapping between Freesound tags and AudioSet Ontology categories (or labels), which was carried out by researchers at the Music Technology Group, Universitat Pompeu Fabra, Barcelona. Using this mapping, a number of Freesound audio samples were automatically annotated with labels from the AudioSet Ontology. These annotations can be understood as weak labels since they express the presence of a sound category in an audio sample.

Then, a data validation process was carried out in which a number of participants listened to the annotated sounds and manually assessed the presence or absence of each automatically assigned sound category, according to the AudioSet category description.

Audio samples in FSDKaggle2018 are only annotated with a single ground truth label (see train.csv). A total of 3,710 annotations included in the train set of FSDKaggle2018 are annotations that have been manually validated as present and predominant (some with inter-annotator agreement but not all of them). This means that in most cases there is no additional acoustic material other than the labeled category. In a few cases there may be some additional sound events, but these additional events won't belong to any of the 41 categories of FSDKaggle2018.

The rest of the annotations have not been manually validated and therefore some of them could be inaccurate. Nonetheless, we have estimated that at least 65-70% of the non-verified annotations per category in the train set are indeed correct. It can happen that some of these non-verified audio samples present several sound sources even though only one label is provided as ground truth. These additional sources are typically out of the set of the 41 categories, but in a few cases they could be within.

More details about the data labeling process can be found in [3].

License

FSDKaggle2018 has licenses at two different levels, as explained next.

All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. For attribution purposes and to facilitate attribution of these files to third parties, we include a list of the audio clips included in FSDKaggle2018 and their corresponding licenses. The licenses are specified in the files train_post_competition.csv and test_post_competition_scoring_clips.csv.

In addition, FSDKaggle2018 as a whole is the result of a curation process and it has an additional license. FSDKaggle2018 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2018.doc zip file.

Files

FSDKaggle2018 can be downloaded as a series of zip files with the following directory structure:

root
│
└───FSDKaggle2018.audio_train/                    Audio clips in the train set
│
└───FSDKaggle2018.audio_test/                     Audio clips in the test set
│
└───FSDKaggle2018.meta/                           Files for evaluation setup
│   │
│   └───train_post_competition.csv                Data split and ground truth for the train set
│   │
│   └───test_post_competition_scoring_clips.csv   Ground truth for the test set
│
└───FSDKaggle2018.doc/
    │
    └───README.md                                 The dataset description file you are reading
    │
    └───LICENSE-DATASET
Alternative predictor variables.
plos.figshare.com
xls
Updated May 21, 2024
Rivalani Hlongwane; Kutlwano K. K. M. Ramaboa; Wilson Mongwe (2024). Alternative predictor variables. [Dataset]. http://doi.org/10.1371/journal.pone.0303566.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303566.t001
Dataset updated
May 21, 2024
Dataset provided by
PLOS ONE
Authors
Rivalani Hlongwane; Kutlwano K. K. M. Ramaboa; Wilson Mongwe
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study explores the potential of utilizing alternative data sources to enhance the accuracy of credit scoring models, compared to relying solely on traditional data sources, such as credit bureau data. A comprehensive dataset from the Home Credit Group’s home loan portfolio is analysed. The research examines the impact of incorporating alternative predictors that are typically overlooked, such as an applicant’s social network default status, regional economic ratings, and local population characteristics. The modelling approach applies the model-X knockoffs framework for systematic variable selection. By including these alternative data sources, the credit scoring models demonstrate improved predictive performance, achieving an area under the curve (AUC) of 0.79360 on the Kaggle Home Credit default risk competition dataset and outperforming models that relied solely on traditional data sources. The findings highlight the significance of leveraging diverse, non-traditional data sources to augment credit risk assessment capabilities and overall model accuracy.
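Assuming the reported AUC is the area under the ROC curve used to score the Kaggle Home Credit competition, it equals the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The sketch below computes it directly from that pairwise definition; it illustrates the metric only and is not the authors' code.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half. O(P * N), so only suitable for small inputs."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this reading, an AUC of 0.79360 means the model ranks about 79% of defaulter/non-defaulter pairs in the correct order.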
Dollar street 10 - 64x64x3
data.niaid.nih.gov
zenodo.org
Updated Apr 14, 2024
van der burg, Sven (2024). Dollar street 10 - 64x64x3 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10837089
Dataset updated
Apr 14, 2024
Dataset authored and provided by
van der burg, Sven
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The MLCommons Dollar Street Dataset is a collection of images of everyday household items from homes around the world that visually captures socioeconomic diversity of traditionally underrepresented populations. It consists of public domain data, licensed for academic, commercial and non-commercial usage, under CC-BY and CC-BY-SA 4.0. The dataset was developed because similar datasets lack socioeconomic metadata and are not representative of global diversity.

This is a subset of the original dataset that can be used for multiclass classification with 10 categories. It is designed to be used in teaching, similar to the widely used, but unlicensed CIFAR-10 dataset.

These are the preprocessing steps that were performed:

Only take examples with one imagenet_synonym label

Use only examples with the 10 most frequently occurring labels

Downscale images to 64 x 64 pixels

Split data in train and test

Store as numpy array

This is the label mapping:

Category           Label
day bed            0
dishrag            1
plate              2
running shoe       3
soap dispenser     4
street sign        5
table lamp         6
tile roof          7
toilet seat        8
washing machine    9
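The first two preprocessing steps and the label ids can be sketched as follows. The mapping above lists the ten categories in alphabetical order, so assigning ids by sorting the selected label names reproduces it; whether the author derived the ids this way is an assumption. The input format (a list of (example_id, labels) pairs) and the function name are hypothetical, and downscaling, the train/test split and NumPy storage are omitted.

```python
from collections import Counter


def select_top_k_single_label(records, k=10):
    """Keep single-label examples whose label is among the k most frequent,
    and assign integer ids to those labels in alphabetical order.

    records: iterable of (example_id, labels) pairs, where labels is a list
    of imagenet_synonym strings (a hypothetical input format).
    Ties at the k-th place are resolved by first occurrence, which is how
    Counter.most_common orders equal counts.
    """
    # Step 1: only take examples with exactly one label.
    single = [(eid, labels[0]) for eid, labels in records if len(labels) == 1]
    # Step 2: keep only the k most frequent labels among those examples.
    counts = Counter(label for _, label in single)
    top = {label for label, _ in counts.most_common(k)}
    label_to_id = {label: i for i, label in enumerate(sorted(top))}
    kept = [(eid, label_to_id[label]) for eid, label in single if label in top]
    return kept, label_to_id
```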

Check out this notebook to see how the subset was created.

The original dataset was downloaded from https://www.kaggle.com/datasets/mlcommons/the-dollar-street-dataset. See https://mlcommons.org/datasets/dollar-street/ for more information.
NYS Alternative Fuel Stations in New York
kaggle.com
zip
Updated Dec 28, 2020
State of New York (2020). NYS Alternative Fuel Stations in New York [Dataset]. https://www.kaggle.com/new-york-state/nys-alternative-fuel-stations-in-new-york
Available download formats: zip (552,277 bytes)
Dataset updated
Dec 28, 2020
Dataset authored and provided by
State of New York
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
New York, New York
Description
Content

Go to https://afdc.energy.gov/stations/#/find/nearest to access the full database of alternative fuel station locations nationwide, collected and maintained by the U.S. Department of Energy's National Renewable Energy Laboratory. A station appears as one point in the data and on the map, regardless of the number of fuel dispensers or charging outlets at that location. For EV charging stations, for example, the data includes the number of charging ports available at the specific station.

How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.

Context

This is a dataset hosted by the State of New York. The state has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore New York State using Kaggle and all of the data sources available through the State of New York organization page!

Update Frequency: This dataset is updated annually.

Acknowledgements

This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
‘Boston House Prices-Advanced Regression Techniques’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Boston House Prices-Advanced Regression Techniques’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-boston-house-prices-advanced-regression-techniques-bae0/fd606ebf/?iid=003-689&v=presentation
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Boston
Description
Analysis of ‘Boston House Prices-Advanced Regression Techniques’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fedesoriano/the-boston-houseprice-data on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Similar Datasets

Gender Pay Gap Dataset: LINK

California Housing Prices Data (5 new features!): LINK

Company Bankruptcy Prediction: LINK

Context

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.

Attribute Information

Input features in order:
1) CRIM: per capita crime rate by town
2) ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3) INDUS: proportion of non-retail business acres per town
4) CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
5) NOX: nitric oxides concentration (parts per 10 million) [parts/10M]
6) RM: average number of rooms per dwelling
7) AGE: proportion of owner-occupied units built prior to 1940
8) DIS: weighted distances to five Boston employment centres
9) RAD: index of accessibility to radial highways
10) TAX: full-value property-tax rate per $10,000 [$/10k]
11) PTRATIO: pupil-teacher ratio by town
12) B: The result of the equation B=1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13) LSTAT: % lower status of the population

Output variable: 1) MEDV: Median value of owner-occupied homes in $1000's [k$]

Source

StatLib - Carnegie Mellon University

Relevant Papers

Harrison, David & Rubinfeld, Daniel. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 5. 81-102. 10.1016/0095-0696(78)90006-2. LINK

Belsley, David A. & Kuh, Edwin. & Welsch, Roy E. (1980). Regression diagnostics: identifying influential data and sources of collinearity. New York: Wiley LINK

--- Original source retains full ownership of the source dataset ---
‘COVID-19's Impact on Educational Stress’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19's Impact on Educational Stress’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-s-impact-on-educational-stress-49b5/4f12e21a/?iid=019-227&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘COVID-19's Impact on Educational Stress’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/bsoyka3/educational-stress-due-to-the-coronavirus-pandemic on 28 January 2022.

--- Dataset description provided by original source is as follows ---

The survey collecting this information is still open for responses here.

Context

I just made this public survey because I want someone to be able to do something fun or insightful with the data that's been gathered. You can fill it out too!

Content

Each row represents a response to the survey. A few things have been done to sanitize the raw responses:
- Column names and options have been renamed to make them easier to work with without much loss of meaning.
- Responses from non-students have been removed.
- Responses with ages greater than or equal to 22 have been removed.

Take a look at the column description for each column to see what exactly it represents.

Acknowledgements

This dataset wouldn't exist without the help of others. I'd like to thank the following people for their contributions:
- Every student who responded to the survey with valid responses
- @radcliff on GitHub for providing the list of countries and abbreviations used in the survey and dataset
- Giovanna de Vincenzo for providing the list of US states used in the survey and dataset
- Simon Migaj for providing the image used for the survey and this dataset

--- Original source retains full ownership of the source dataset ---
‘WHO national life expectancy’ analyzed by Analyst-2
analyst-2.ai
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘WHO national life expectancy’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-who-national-life-expectancy-c4c7/d31e495e/?iid=008-857&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘WHO national life expectancy’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mmattson/who-national-life-expectancy on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

I am developing my data science skills in areas outside of my previous work. An interesting problem for me was to identify which factors influence life expectancy on a national level. There is an existing Kaggle data set that explored this, but that information was corrupted. Part of the problem solving process is to step back periodically and ask "does this make sense?" Without reasonable data, it is harder to notice mistakes in my analysis code (as opposed to unusual behavior due to the data itself). I wanted to make a similar data set, but with reliable information.

This is my first time exploring life expectancy, so I had to guess which features might be of interest when making the data set. Some were included for comparison with the other Kaggle data set. A number of potentially interesting features (like air pollution) were left off due to limited year or country coverage. Since the data was collected from more than one server, some features are present more than once, to explore the differences.

Content

A goal of the World Health Organization (WHO) is to ensure that a billion more people are protected from health emergencies, and provided better health and well-being. They provide public data collected from many sources to identify and monitor factors that are important to reach this goal. This set was primarily made using GHO (Global Health Observatory) and UNESCO (United Nations Educational, Scientific and Cultural Organization) information. The set covers the years 2000-2016 for 183 countries, in a single CSV file. Missing data is left in place for the user to decide how to handle it.
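Since missing values are left in place, a quick first check is how many cells are empty in each column. The sketch below assumes missing values appear as empty cells in the CSV (an assumption about this file, not something the description states) and uses only the standard library; the function name is illustrative.

```python
import csv
from typing import IO


def count_missing(csv_file: IO[str]):
    """Return (row_count, {column: number_of_missing_cells}) for a CSV file.

    A cell counts as missing if it is empty or whitespace, or absent
    because the row is shorter than the header.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)  # None when the row is too short
            if value is None or not value.strip():
                missing[name] += 1
    return rows, missing
```

Usage: open the CSV with `newline=""` and pass the file object; dividing each count by the row count gives the share of missing values per feature, which helps decide between dropping a feature, dropping rows, or imputing.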

Three notebooks are provided for my cursory analysis, a comparison with the other Kaggle set, and a template for creating this data set.

Inspiration

There is a lot to explore, if the user is interested. The GHO server alone has over 2000 "indicators".
- How are the GHO and UNESCO life expectancies calculated, and what is causing the difference? That could also be asked for Gross National Income (GNI) and mortality features.
- How does the life expectancy after age 60 compare to the life expectancy at birth? Is the relationship with the features in this data set different for those two targets?
- What other indicators on the servers might be interesting to use? Some of the GHO indicators are different studies with different coverage. Can they be combined to make a more useful and robust data feature?
- Unraveling the correlations between the features would take significant work.

--- Original source retains full ownership of the source dataset ---
CT-FAN-21 corpus: A dataset for Fake News Detection
zenodo.org
Updated Oct 23, 2022
Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
Unique identifier
https://doi.org/10.5281/zenodo.4714517
Dataset updated
Oct 23, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
Description
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.

Citation

Please cite our work as

@article{shahi2021overview,
  title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal={Working Notes of CLEF},
  year={2021}
}

Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.

Subtask 3A: Multi-class fake news detection of news articles (English)

Subtask 3A is designed as a four-class fake news classification problem. The training data will be released in batches, totaling roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:

False - The main claim made in an article is untrue.

Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

True - This rating indicates that the primary elements of the main claim are demonstrably true.

Other - An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

Subtask 3B: Topical Domain Classification of News Articles (English)

Fact-checkers require background expertise to identify the truthfulness of an article. The categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine the topical domain of the article (English). This is a classification problem: the task is to categorise fake news articles into six topical categories, such as health, election, crime, climate, and education. This task will be offered for a subset of the data of Subtask 3A.

Input Data

The data will be provided with the columns ID, title, text, rating, and domain; the columns are described as follows:

Task 3a

ID - Unique identifier of the news article

Title - Title of the news article

text - Text of the news article

our rating - Class of the news article: false, partially false, true, or other

Task 3b

public_id - Unique identifier of the news article

Title - Title of the news article

text - Text of the news article

domain - Domain of the given news article (applicable only for Task 3b)

Output data format

Task 3a

public_id - Unique identifier of the news article

predicted_rating - Predicted class

Sample File

public_id, predicted_rating
1, false
2, true

Task 3b

public_id - Unique identifier of the news article

predicted_domain - Predicted domain

Sample file

public_id, predicted_domain
1, health
2, crime
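Following the output format above, a submission file for either subtask could be written with Python's csv module. This is a sketch, not the organizers' tooling: it writes plain comma-separated values without the spaces shown in the samples, which is an assumption about what the scorer accepts, and the function name is hypothetical.

```python
import csv
from typing import IO, Iterable, Tuple

ALLOWED_COLUMNS = {"predicted_rating", "predicted_domain"}


def write_submission(predictions: Iterable[Tuple[object, str]],
                     column: str, out: IO[str]) -> None:
    """Write (public_id, prediction) pairs as CSV with a public_id header.

    column is "predicted_rating" for Task 3a or "predicted_domain" for Task 3b.
    """
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column name: {column}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["public_id", column])
    for public_id, prediction in predictions:
        writer.writerow([public_id, prediction])
```

For example, writing `[(1, "false"), (2, "true")]` with column `"predicted_rating"` produces the Task 3a sample rows, minus the spaces after the commas. When writing to a real file, open it with `newline=""`.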

Additional data for Training

To train your model, participants can use additional data with a similar format; some such datasets are available on the web. We don't provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:

Fakenews Classification Datasets

Fake News Detection Challenge KDD 2020

FakeNewsNet

IMPORTANT!

The fake news articles used for Task 3b are a subset of those used for Task 3a.

We used data from 2010 to 2021, and the fake news content spans several topics, such as elections and COVID-19.

Evaluation Metrics

This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
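The organizers' exact scorer is not given here. Assuming the standard definition of F1-macro (per-class F1 scores averaged with equal weight, so rare classes such as "other" count as much as frequent ones), a minimal sketch of the metric is:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, over every class that
    appears in either the gold labels or the predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the denominator is non-zero because
        # c occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)
```

A scorer may instead average only over the classes present in the gold labels; the two choices differ when a system predicts a class that never occurs in the test set.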

Submission Link: https://competitions.codalab.org/competitions/31238

Related Work

Shahi, G. K. (2020). AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. https://arxiv.org/pdf/2010.00502.pdf

Shahi, G. K., & Nandini, D. (2020). FakeCovid – A Multilingual Cross-domain Fact Check News Dataset for COVID-19. In Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media. http://workshop-proceedings.icwsm.org/abstract?id=2020_14

Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
‘Customer Segmentation Classification’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Customer Segmentation Classification’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-customer-segmentation-classification-4965/7267b2f5/?iid=015-403&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Customer Segmentation Classification’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kaushiksuresh147/customer-segmentation on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

An automobile company plans to enter new markets with its existing products (P1, P2, P3, P4, and P5). After intensive market research, they’ve deduced that the behavior of the new market is similar to that of their existing market.

In their existing market, the sales team has classified all customers into 4 segments (A, B, C, D). They then performed segmented outreach and communication for each customer segment. This strategy has worked exceptionally well for them. They plan to use the same strategy for the new markets and have identified 2627 new potential customers.

You are required to help the manager predict the right segment for each new customer.

Content

|Variable|Definition|
|--|--|
|ID|Unique ID|
|Gender|Gender of the customer|
|Ever_Married|Marital status of the customer|
|Age|Age of the customer|
|Graduated|Is the customer a graduate?|
|Profession|Profession of the customer|
|Work_Experience|Work Experience in years|
|Spending_Score|Spending score of the customer|
|Family_Size|Number of family members for the customer (including the customer)|
|Var_1|Anonymised Category for the customer|
|Segmentation|(target) Customer Segment of the customer|

Acknowledgements

This dataset was acquired from the Analytics Vidhya hackathon.

--- Original source retains full ownership of the source dataset ---
options-3
kaggle.com
zip
Updated Oct 21, 2020
Cite
_godmode_ (2020). options-3 [Dataset]. https://www.kaggle.com/kaustubh243/options3
Available download formats: zip (14923790 bytes)
Dataset updated
Oct 21, 2020
Authors
_godmode_
Description
Dataset

This dataset was created by godmode

Contents
‘Cardano Data’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 13, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Cardano Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-cardano-data-6b5c/e7fad47b/?iid=003-943&v=presentation
Dataset updated
Nov 13, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Cardano Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varpit94/cardano-data on 13 November 2021.

--- Dataset description provided by original source is as follows ---

What is Cardano?

Cardano is a public blockchain platform. It is open-source and decentralized, with consensus achieved using proof of stake. It can facilitate peer-to-peer transactions with its internal cryptocurrency, Ada. Cardano was founded in 2015 by Ethereum co-founder Charles Hoskinson. The development of the project is overseen and supervised by the Cardano Foundation based in Zug, Switzerland. It is also the largest cryptocurrency to use a proof-of-stake blockchain, which is seen as a greener alternative to proof-of-work protocols.

Data Description

This dataset provides the history of daily prices of Cardano. The data starts from 01-Oct-2017. All the column descriptions are provided. Currency is USD.

--- Original source retains full ownership of the source dataset ---
1000-options-day-data
kaggle.com
zip
Updated Jul 31, 2023
Cite
ChengHB (2023). 1000-options-day-data [Dataset]. https://www.kaggle.com/datasets/chenghb/1000-options-day-data/suggestions?status=pending&yourSuggestions=true
Available download formats: zip (3283565 bytes)
Dataset updated
Jul 31, 2023
Authors
ChengHB
Description
Dataset

This dataset was created by ChengHB

Contents
‘last.fm Music Artist Scrobbles’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘last.fm Music Artist Scrobbles’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-last-fm-music-artist-scrobbles-b1d2/0776ba62/?iid=000-706&v=presentation
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘last.fm Music Artist Scrobbles’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/pcbreviglieri/lastfm-music-artist-scrobbles on 14 February 2022.

--- Dataset description provided by original source is as follows ---

This dataset is a summarized, sanitized subset of the one released at The 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011), currently hosted at the GroupLens website.

Sanitization included: (a) artist name misspelling correction and standardization; (b) reassignment of artists referenced with two or more artist IDs; (c) removal of artists listed as 'unknown' or through their website addresses.

The original dataset contains a larger number of files, including tag-related information, in addition to users, artists and scrobble counts. The author contacted last.fm to request a more recent version of this content in a similar format, but had received no response as of June 15th, 2020.

--- Original source retains full ownership of the source dataset ---
Caucasian People - Liveness Detection Dataset
kaggle.com
Updated Apr 16, 2024
Cite
Training Data (2024). Caucasian People - Liveness Detection Dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/caucasian-people-liveness-detection-dataset
Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Apr 16, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Biometric Attack Dataset, Caucasian People

A similar dataset that includes all ethnicities: Anti Spoofing Real Dataset

The dataset for face anti-spoofing and face recognition includes images and videos of Caucasian people. It helps enhance model performance by providing a wider range of data for a specific ethnic group.

The videos were gathered by capturing the faces of genuine individuals presenting spoofs, using facial presentations. Our dataset proposes a novel approach that learns and detects spoofing techniques, extracting features from the genuine facial images to prevent such information from being captured by fake users.

The dataset contains images and videos of real humans with various resolutions, views, and colors, making it a comprehensive resource for researchers working on anti-spoofing technologies.

People in the dataset

Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2F09524087833ccb985350545376670f7d%2FFrame%20102.png?generation=1712318079960855&alt=media

Types of files in the dataset:

photo - selfie of the person

video - real video of the person

Our dataset also explores the use of neural architectures, such as deep neural networks, to facilitate the identification of distinguishing patterns and textures in different regions of the face, increasing the accuracy and generalizability of the anti-spoofing models.

💴 For Commercial Usage: The full version of the dataset includes 19,000 files; leave a request on TrainingData to buy the dataset.

Metadata for the full dataset:

assignment_id - unique identifier of the media file

worker_id - unique identifier of the person

age - age of the person

true_gender - gender of the person

country - country of the person

ethnicity - ethnicity of the person

video_extension - video extensions in the dataset

video_resolution - video resolution in the dataset

video_duration - video duration in the dataset

video_fps - frames per second for video in the dataset

photo_extension - photo extensions in the dataset

photo_resolution - photo resolution in the dataset

Statistics for the dataset

Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2F0b17f6b68aea01fda89c4608db97a94f%2FFrame%20101.png?generation=1712314613427348&alt=media

💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to learn about the price and buy the dataset

Content

The dataset consists of:
- files: 10 folders, one per person, each including 1 image and 1 video
- .csv file: information about the files and people in the dataset

File with the extension .csv

id: id of the person,

selfie_link: link to access the photo,

video_link: link to access the video,

age: age of the person,

country: country of the person,

gender: gender of the person,

video_extension: video extension,

video_resolution: video resolution,

video_duration: video duration,

video_fps: frames per second for video,

photo_extension: photo extension,

photo_resolution: photo resolution
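A minimal sketch of loading the metadata .csv with Python's standard `csv` module and failing early if any of the columns listed above is missing. The exact header spelling in the real file is an assumption taken from that list, `load_metadata` is a hypothetical helper, and the one-row sample in the usage example is invented for illustration only.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names as listed in the dataset description (assumed to match the real header).
EXPECTED_COLUMNS = [
    "id", "selfie_link", "video_link", "age", "country", "gender",
    "video_extension", "video_resolution", "video_duration", "video_fps",
    "photo_extension", "photo_resolution",
]


def load_metadata(f: TextIO) -> List[Dict[str, str]]:
    """Read all rows as dicts keyed by column name, checking the header first."""
    reader = csv.DictReader(f)
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Usage with a hypothetical one-row file held in memory.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,selfie_1.jpg,video_1.mp4,30,UK,female,mp4,1920x1080,10,30,jpg,1080x1920\n"
)
rows = load_metadata(sample)
print(rows[0]["gender"], rows[0]["video_fps"])
```

All values come back as strings; numeric fields such as `age` or `video_fps` need explicit conversion before analysis.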


keywords: liveness detection systems, liveness detection dataset, biometric dataset, biometric data dataset, biometric system attacks, anti-spoofing dataset, face liveness detection, deep learning dataset, face spoofing database, face anti-spoofing, ibeta dataset, face anti spoofing, large-scale face anti spoofing, rich annotations anti spoofing dataset
‘Argentina provincial data’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Argentina provincial data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-argentina-provincial-data-4425/b5eea614/?iid=001-550&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Argentina
Description
Analysis of ‘Argentina provincial data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kingabzpro/argentina-provincial-data on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

With almost 40 million inhabitants and a diverse geography that encompasses the Andes mountains, glacial lakes, and the Pampas grasslands, Argentina is the second largest country (by area) and has one of the largest economies in South America. It is politically organized as a federation of 23 provinces and an autonomous city, Buenos Aires.

Content

We will analyze ten economic and social indicators collected for each province. Because these indicators are highly correlated, we will use principal component analysis (PCA) to reduce redundancies and highlight patterns that are not apparent in the raw data. After visualizing the patterns, we will use k-means clustering to partition the provinces into groups with similar development levels.

These results can be used to plan public policy by helping allocate resources to develop infrastructure, education, and welfare programs.

Acknowledgements

DataCamp

--- Original source retains full ownership of the source dataset ---
Paimon Dataset YOLO Detection Dataset
paperswithcode.com
gts.ai
Updated Mar 18, 2025
Cite
(2025). Paimon Dataset YOLO Detection Dataset [Dataset]. https://paperswithcode.com/dataset/paimon-dataset-yolo-detection
Dataset updated
Mar 18, 2025
Description
This dataset consists of a diverse collection of images featuring Paimon, a popular character from the game Genshin Impact. The images have been sourced from in-game gameplay footage and capture Paimon from various angles and in different sizes (scales), making the dataset suitable for training YOLO object detection models.

The dataset provides a comprehensive view of Paimon in different lighting conditions, game environments, and positions, which should help the model generalize to similar characters or object detection tasks. While most annotations are accurately labeled, a small number may include minor inaccuracies due to manual labeling errors. The dataset is well suited to researchers and developers working on character recognition, object detection in gaming environments, or other AI vision tasks.

Dataset Features:

Image Format: .jpg files in 640×320 resolution.

Annotation Format: .txt files in YOLO format, containing bounding box data with:

class_id

x_center

y_center

width

height
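To make the annotation format concrete, here is a minimal Python sketch that converts one YOLO label line into pixel corner coordinates. It assumes the usual YOLO convention that `x_center`, `y_center`, `width`, and `height` are normalized to [0, 1] relative to the image size, and it reads the stated 640×320 resolution as width 640, height 320; both are assumptions, and `yolo_to_pixels` is a hypothetical helper.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def yolo_to_pixels(line: str, img_w: int = 640, img_h: int = 320) -> PixelBox:
    """Convert 'class_id x_center y_center width height' to pixel corners.

    Assumes normalized [0, 1] coordinates and a 640 (wide) x 320 (high) image.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_c, y_c, w, h = (float(v) for v in parts[1:])
    # Offset the centre by half the box size, then scale to pixels.
    return PixelBox(
        class_id,
        (x_c - w / 2) * img_w,
        (y_c - h / 2) * img_h,
        (x_c + w / 2) * img_w,
        (y_c + h / 2) * img_h,
    )


box = yolo_to_pixels("0 0.5 0.5 0.25 0.5")
print(box)
```

Storing centre and size rather than corners keeps the labels independent of image resolution, which is why the conversion needs the image dimensions as inputs.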

Use Cases:

Character Detection in Games: Train YOLO models to detect and identify in-game characters or NPCs.

Gaming Analytics: Improve recognition of specific game elements for AI-powered game analytics tools.

Research: Contribute to academic research focused on object detection or computer vision in animated and gaming environments.

Data Structure:

Images: High-quality .jpg images captured from multiple perspectives, ensuring robust model training across various orientations and lighting scenarios.

Annotations: Each image has an associated .txt file that follows the YOLO format. The annotations are structured to include class identification, object location (center coordinates), and bounding box dimensions.

Key Advantages:

Varied Angles and Scales: The dataset includes Paimon from multiple perspectives, aiding in creating more versatile and adaptable object detection models.

Real-World Scenario: Extracted from actual gameplay footage, the dataset simulates real-world detection challenges such as varying backgrounds, motion blur, and changing character scales.

Training Ready: Suitable for training YOLO models and other deep learning frameworks that require object detection capabilities.

This dataset is sourced from Kaggle.
finance-alpaca
huggingface.co
Updated Apr 7, 2023
Cite
finance-alpaca [Dataset]. https://huggingface.co/datasets/gbharti/finance-alpaca
Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57967/hf/2557
Dataset updated
Apr 7, 2023
Authors
Gaurang Bharti
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset combines Stanford's Alpaca (https://github.com/tatsu-lab/stanford_alpaca) and FiQA (https://sites.google.com/view/fiqa/) with another 1.3k pairs custom-generated using GPT-3.5. Script for tuning with Kaggle's (https://www.kaggle.com) free resources using PEFT/LoRA: https://www.kaggle.com/code/gbhacker23/wealth-alpaca-lora. GitHub repo with performance analyses, training and data generation scripts, and inference notebooks: https://github.com/gaurangbharti1/wealth-alpaca… See the full description on the dataset page: https://huggingface.co/datasets/gbharti/finance-alpaca.
Numenta Anomaly Benchmark (NAB)
kaggle.com
Updated Aug 19, 2016
Cite
BoltzmannBrain (2016). Numenta Anomaly Benchmark (NAB) [Dataset]. https://www.kaggle.com/datasets/boltzmannbrain/nab/discussion?sortBy=hot&group=upvoted
Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 19, 2016
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
BoltzmannBrain
Description
The Numenta Anomaly Benchmark (NAB) is a novel benchmark for evaluating algorithms for anomaly detection in streaming, online applications. It comprises over 50 labeled real-world and artificial timeseries data files, plus a novel scoring mechanism designed for real-time applications. All of the data and code is fully open-source, with extensive documentation and a scoreboard of anomaly detection algorithms: github.com/numenta/NAB. The full dataset is included here, but please see the repo for details on how to evaluate anomaly detection algorithms on NAB.

NAB Data Corpus

The NAB corpus of 58 timeseries data files is designed to provide data for research in streaming anomaly detection. It comprises both real-world and artificial timeseries data containing labeled anomalous periods of behavior. Data are ordered, timestamped, single-valued metrics. All data files contain anomalies, unless otherwise noted.

The majority of the data is real-world from a variety of sources such as AWS server metrics, Twitter volume, advertisement clicking metrics, traffic data, and more. All data is included in the repository, with more details in the data readme. We are in the process of adding more data, and actively searching for more data. Please contact us at nab@numenta.org if you have similar data (ideally with known anomalies) that you would like to see incorporated into NAB.

The NAB version will be updated whenever new data (and corresponding labels) is added to the corpus; NAB is currently in v1.0.

Real data

realAWSCloudwatch/

AWS server metrics as collected by the Amazon CloudWatch service. Example metrics include CPU Utilization, Network Bytes In, and Disk Read Bytes.

realAdExchange/

Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM). One of the files is normal, without anomalies.
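For reference, the two advertising metrics follow their standard industry definitions: CPC is total cost divided by the number of clicks, and CPM is total cost per thousand impressions. The sketch below illustrates only those definitions; how the NAB files were derived from raw ad data is not described here, and the numbers in the usage lines are illustrative.

```python
def cost_per_click(cost: float, clicks: int) -> float:
    """CPC: total spend divided by the number of clicks."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    return cost / clicks


def cost_per_mille(cost: float, impressions: int) -> float:
    """CPM: total spend per 1,000 impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return cost * 1000 / impressions


# Illustrative numbers: 12.50 spent, 50 clicks, 5,000 impressions.
print(cost_per_click(12.5, 50))    # 0.25
print(cost_per_mille(12.5, 5000))  # 2.5
```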

realKnownCause/

This is data for which we know the causes of the anomalies, so the labels did not require hand labeling.

ambient_temperature_system_failure.csv: The ambient temperature in an office setting.

cpu_utilization_asg_misconfiguration.csv: From Amazon Web Services (AWS) monitoring CPU usage – i.e. average CPU usage across a given cluster. When usage is high, AWS spins up a new machine, and uses fewer machines when usage is low.

ec2_request_latency_system_failure.csv: CPU usage data from a server in Amazon's East Coast datacenter. The dataset ends with complete system failure resulting from a documented failure of AWS API servers. There is an interesting story behind this data on the Numenta blog (http://numenta.com/blog/anomaly-of-the-week.html).

machine_temperature_system_failure.csv: Temperature sensor data of an internal component of a large, industrial machine. The first anomaly is a planned shutdown of the machine. The second anomaly is difficult to detect and directly led to the third anomaly, a catastrophic failure of the machine.

nyc_taxi.csv: Number of NYC taxi passengers, where the five anomalies occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snowstorm. The raw data is from the NYC Taxi and Limousine Commission. The data file included here aggregates the total number of taxi passengers into 30-minute buckets.

rogue_agent_key_hold.csv: Timing the key holds for several users of a computer, where the anomalies represent a change in the user.

rogue_agent_key_updown.csv: Timing the key strokes for several users of a computer, where the anomalies represent a change in the user.
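The 30-minute aggregation described for nyc_taxi.csv can be sketched as flooring each timestamp to the start of its bucket and summing counts per bucket. This assumes the raw input is a stream of (timestamp, passenger count) pairs; NAB's actual preprocessing script is not described here, and `bucket_totals` and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def bucket_totals(events: Iterable[Tuple[datetime, int]], minutes: int = 30) -> Dict[datetime, int]:
    """Sum counts into fixed-width buckets aligned to midnight of each event's day.

    Buckets line up cleanly across days only if `minutes` divides 24 hours evenly.
    """
    width = timedelta(minutes=minutes)
    totals: Dict[datetime, int] = defaultdict(int)
    for ts, count in events:
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        # timedelta // timedelta gives the whole number of buckets elapsed since midnight.
        start = midnight + ((ts - midnight) // width) * width
        totals[start] += count
    return dict(sorted(totals.items()))


# Hypothetical pickups: two fall in the 00:00 bucket, one in the 00:30 bucket.
events = [
    (datetime(2014, 7, 1, 0, 5), 3),
    (datetime(2014, 7, 1, 0, 29), 2),
    (datetime(2014, 7, 1, 0, 31), 4),
]
for start, total in bucket_totals(events).items():
    print(start.isoformat(), total)
```

Buckets with no events are simply absent from the result; a streaming benchmark would typically fill them with zeros so the series stays evenly spaced.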

realTraffic/

Real-time traffic data from the Twin Cities Metro area in Minnesota, collected by the Minnesota Department of Transportation. Metrics include occupancy, speed, and travel time from specific sensors.

realTweets/

A collection of Twitter mentions of large publicly-traded companies such as Google and IBM. The metric value represents the number of mentions for a given ticker symbol every 5 minutes.

Artificial data

artificialNoAnomaly/

Artificially-generated data without any anomalies.

artificialWithAnomaly/

Artificially-generated data with varying types of anomalies.

Acknowledgments

We encourage you to publish your results on running NAB, and share them with us at nab@numenta.org. Please cite the following publication when referring to NAB:

Lavin, Alexander and Ahmad, Subutai. "Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark", Fourteenth International Conference on Machine Learning and Applications, December 2015.

