This dataset was created by CS21M1005
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Trương Quang Sang
Released under Apache 2.0
This dataset was created by Zacchaeus
It contains the following files:
This dataset was created by Extraction
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Chad Mottershead
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘WHO national life expectancy’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mmattson/who-national-life-expectancy on 28 January 2022.
--- Dataset description provided by original source is as follows ---
I am developing my data science skills in areas outside of my previous work. An interesting problem for me was to identify which factors influence life expectancy on a national level. There is an existing Kaggle data set that explored this, but that information was corrupted. Part of the problem solving process is to step back periodically and ask "does this make sense?" Without reasonable data, it is harder to notice mistakes in my analysis code (as opposed to unusual behavior due to the data itself). I wanted to make a similar data set, but with reliable information.
This is my first time exploring life expectancy, so I had to guess which features might be of interest when making the data set. Some were included for comparison with the other Kaggle data set. A number of potentially interesting features (like air pollution) were left off due to limited year or country coverage. Since the data was collected from more than one server, some features are present more than once, to explore the differences.
A goal of the World Health Organization (WHO) is to ensure that a billion more people are protected from health emergencies and provided with better health and well-being. The WHO publishes data collected from many sources to identify and monitor factors that are important for reaching this goal. This set was primarily made using GHO (Global Health Observatory) and UNESCO (United Nations Educational, Scientific and Cultural Organization) information. The set covers the years 2000-2016 for 183 countries, in a single CSV file. Missing data is left in place, for the user to decide how to deal with it.
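Because missing values are left in place, a reasonable first step is to count them per column before choosing how to handle them. The following is a minimal sketch using only the Python standard library. The sample text and its column names (`country`, `year`, `life_expect`, `gni_capita`) are hypothetical placeholders, not the actual headers of the CSV file.

```python
import csv
import io

# Hypothetical excerpt: these column names and values are placeholders,
# not the real contents of the data set.
SAMPLE = """country,year,life_expect,gni_capita
A,2000,70.1,
A,2001,70.4,1200
B,2000,,900
B,2001,65.2,
"""

def missing_counts(csv_text):
    """Count empty cells per column in a CSV document given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # A short row yields None; an empty cell yields "".
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts

counts = missing_counts(SAMPLE)
print(counts)
```

For the real file, the same function could be applied to the text read from disk. Columns with many gaps are candidates for dropping or imputation, depending on the analysis.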
Three notebooks are provided for my cursory analysis, a comparison with the other Kaggle set, and a template for creating this data set.
There is a lot to explore, if the user is interested. The GHO server alone has over 2000 "indicators".
- How are the GHO and UNESCO life expectancies calculated, and what is causing the difference? That could also be asked for Gross National Income (GNI) and mortality features.
- How does the life expectancy after age 60 compare to the life expectancy at birth? Is the relationship with the features in this data set different for those two targets?
- What other indicators on the servers might be interesting to use? Some of the GHO indicators are different studies with different coverage. Can they be combined to make a more useful and robust data feature?
- Unraveling the correlations between the features would take significant work.
--- Original source retains full ownership of the source dataset ---
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Dataset Name
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
Dataset Origin: BBC News Summary
Data Source: https://www.kaggle.com/datasets/pariza/bbc-news-summary/data
Language(s) (NLP): English
License: [More Information Needed]
Uses
Used for summarization with a language model like T5, to produce concise and clean summaries to… See the full description on the dataset page: https://huggingface.co/datasets/SurAyush/News_Summary_Dataset.
https://creativecommons.org/publicdomain/zero/1.0/
Dataset Card for Dataset Name
Dataset Summary
This dataset is a clean version, with all NaN values removed, of the Kaggle dataset https://www.kaggle.com/datasets/devicharith/language-translation-englishfrench (I'm not the person who first posted it on Kaggle).
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information… See the full description on the dataset page: https://huggingface.co/datasets/PaulineSanchez/Translation_words_and_sentences_english_french.
This dataset was created by Azib Hasan
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Phishing website Detector’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/eswarchandt/phishing-website-detector on 12 November 2021.
--- Dataset description provided by original source is as follows ---
The data set is provided as both a text file and a CSV file, with the following resources that can be used as inputs for model building:
A collection of website URLs for 11000+ websites. Each sample has 30 website parameters and a class label identifying it as a phishing website or not (1 or -1).
The code template containing these code blocks: a. Import modules (Part 1) b. Load data function + input/output field descriptions
The data set also serves as an input for project scoping and tries to specify the functional and non-functional requirements for it.
You are expected to write the code for a binary classification model (phishing website or not) using Python Scikit-Learn that trains on the data and calculates the accuracy score on the test data. You have to use one or more of the classification algorithms to train a model on the phishing website data set.
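The task described above could be sketched as follows. This is an illustration under stated assumptions, not the code template shipped with the data set: the data here is synthetic (1000 samples, 30 integer features, labels 1 or -1), and logistic regression is just one of the scikit-learn classifiers that could be used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 30 features per sample and a
# label of 1 or -1. The feature values {-1, 0, 1} are an assumption.
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(1000, 30))
hidden_weights = rng.normal(size=30)
y = np.where(X @ hidden_weights > 0, 1, -1)

# Hold out 20% of the samples for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Train one classifier and report the accuracy score on the test data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```

With the actual file, `X` and `y` would come from the 30 parameter columns and the class label column. Other classifiers, such as decision trees or random forests, can be swapped in for comparison.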
--- Original source retains full ownership of the source dataset ---
Dataset Card for Dataset Name
Dataset Details
This is the Hugging Face version of https://www.kaggle.com/datasets/ambityga/imagenet100. All credit goes to the original author; please cite the original author.
Acknowledgements
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy… See the full description on the dataset page: https://huggingface.co/datasets/ilee0022/ImageNet100.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Indonesian Abusive and Hate Speech Twitter Text’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ilhamfp31/indonesian-abusive-and-hate-speech-twitter-text on 14 February 2022.
--- Dataset description provided by original source is as follows ---
The original author's GitHub: https://github.com/okkyibrohim/id-multi-label-hate-speech-and-abusive-language-detection I uploaded it to Kaggle because I'm using it for my undergraduate project. All credit to the original author.
The original author preprocessed the data in 5 steps. Here's a kernel I made to try to replicate the preprocessing steps done by the original author: https://www.kaggle.com/ilhamfp31/preprocessing-the-indonesian-hate-abusive-text/data
Cite the original author if you use the data:
Muhammad Okky Ibrohim and Indra Budi. 2019. Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter. In ALW3: 3rd Workshop on Abusive Language Online, 46-57. (Citation style may differ between paper templates. LaTeX users can see citation.bib.)
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Equiareal Shape from Template Deformations Dataset is a comprehensive repository of real RGB-D videos recorded with a Kinect V2, covering the various experiments that appear in the paper "Equiareal shape from template". Part of the code associated with each experiment is contained in its folder.
Developed through a collaborative effort between the University of Alcalá, University of Clermont-Auvergne and EnCoV, the database is organized into folders, each corresponding to a specific experiment analyzed in the study. Inside each folder, users will find RGB images and MATLAB files with the associated tracking and ground truth.
This resource is intended to support researchers, educators, and students working on 3D deformable reconstruction and related fields, offering a practical tool for experimentation and analysis.
https://choosealicense.com/licenses/openrail/
Dataset Card for Dataset Name
Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/harpomaxx/example-dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
PhishingURLDataset
This dataset was created for training neural networks on phishing website detection.
Dataset Details
This dataset contains phishing websites, which are labeled with "1" and are called "malignant", and benign websites, which are labeled with "0".
Dataset Sources
Kaggle Dataset on Phishing URLs: https://www.kaggle.com/datasets/siddharthkumar25/malicious-and-benign-urls USOM Phishing… See the full description on the dataset page: https://huggingface.co/datasets/semihGuner2002/PhishingURLsDataset.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Dataset Name
Dataset Details
last_update: 2024-03-19
Dataset Description
Dataset Sources [optional]
Repository: [More Information Needed] Paper [optional]: [More Information Needed] Demo [optional]: [More Information Needed]
Kaggle arXiv dataset filtered by Computer Science
Uses
Direct Use
[More… See the full description on the dataset page: https://huggingface.co/datasets/rjac/arxiv-cs.
A toy dataset for running linear regression! The dataset consists of inputs and targets. Inputs are of shape (1000, 10), where there are 1000 examples and 10 input features. Targets are of shape (1000,), one target per example. Submit learned weights and biases at https://forms.gle/R4gRgrSYcMTPXZUy9 to get a score! Template notebook to get started: https://www.kaggle.com/code/daviddragon/toy-lr-template/notebook
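A minimal sketch of fitting such a model with ordinary least squares in NumPy is shown below. It is not the linked template notebook: the inputs and targets are synthetic stand-ins with the shapes given above, and names such as `true_w` and `design` are introduced only for illustration.

```python
import numpy as np

# Synthetic stand-in with the stated shapes; the real inputs and targets
# would be loaded from the dataset files instead.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
true_b = 0.5
targets = inputs @ true_w + true_b + 0.01 * rng.normal(size=1000)

# Append a column of ones so the bias is learned together with the weights.
design = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
solution, *_ = np.linalg.lstsq(design, targets, rcond=None)
weights, bias = solution[:-1], solution[-1]

# Mean squared error of the fitted model on the training data.
mse = np.mean((inputs @ weights + bias - targets) ** 2)
print(weights.shape, float(bias), float(mse))
```

The learned `weights` (shape `(10,)`) and `bias` are the kind of values the submission form asks for. Gradient descent would be an alternative to the closed-form solve.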
This dataset was created by Odunayo Ogundepo
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Dataset Name
Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset.
This dataset was created by CS21M1005