7 datasets found

cats_vs_dogs
huggingface.co
tensorflow.org
+1 more
Updated May 23, 2024
Cite
Microsoft (2024). cats_vs_dogs [Dataset]. https://huggingface.co/datasets/microsoft/cats_vs_dogs
Dataset updated
May 23, 2024
Dataset authored and provided by
Microsoft (http://microsoft.com/)
License
https://choosealicense.com/licenses/unknown/
Description
Dataset Card for Cats Vs. Dogs

Dataset Summary

A large set of images of cats and dogs. There are 1738 corrupted images that are dropped. This dataset is part of a now-closed Kaggle competition and represents a subset of the so-called Asirra dataset. From the competition page:

The Asirra data set

Web services are often protected with a challenge that's supposed to be easy for people to solve, but difficult for computers. Such a challenge is often called a CAPTCHA… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/cats_vs_dogs.
Animal_Image_Classification_Dataset
huggingface.co
kaggle.com
Updated Apr 4, 2024
Cite
Álvaro García Vásquez (2024). Animal_Image_Classification_Dataset [Dataset]. https://huggingface.co/datasets/AlvaroVasquezAI/Animal_Image_Classification_Dataset
Dataset updated
Apr 4, 2024
Authors
Álvaro García Vásquez
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Summary: The Animal Image Classification Dataset is a comprehensive collection of images tailored for the development and evaluation of machine learning models in the field of computer vision. It contains 3,000 JPG images, carefully segmented into three classes representing common pets and wildlife: cats, dogs, and snakes.
Dataset Contents:
cats/: A set of 1,000 JPG images of cats, showcasing a wide array of breeds, environments, and postures.
dogs/: A diverse compilation of 1,000 dog… See the full description on the dataset page: https://huggingface.co/datasets/AlvaroVasquezAI/Animal_Image_Classification_Dataset.
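The one-folder-per-class layout described above could be turned into labeled examples roughly as follows. This is a minimal sketch, not the dataset author's loader: the folder names cats/ and dogs/ come from the description, while snakes/ is assumed by analogy because the description is truncated, and the function name `list_labeled_images` is hypothetical.

```python
import tempfile
from pathlib import Path

# "cats" and "dogs" appear in the description; "snakes" is an assumed folder name.
CLASSES = ("cats", "dogs", "snakes")


def list_labeled_images(root):
    """Return (path, label) pairs for JPG files found under root/<class>/."""
    pairs = []
    for label in CLASSES:
        class_dir = Path(root) / label
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in {".jpg", ".jpeg"}:
                pairs.append((path, label))
    return pairs


# Usage on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as root:
    for label in CLASSES:
        (Path(root) / label).mkdir()
        (Path(root) / label / "0001.jpg").touch()
    (Path(root) / "cats" / "notes.txt").touch()  # non-image files are ignored
    labels = [label for _, label in list_labeled_images(root)]
```

Deriving the label from the parent folder keeps the sketch independent of any particular image library; decoding the files is left to whichever framework is used downstream.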
pet-health-symptoms-dataset
huggingface.co
Updated Apr 27, 2025
Cite
Karen Wong (2025). pet-health-symptoms-dataset [Dataset]. https://huggingface.co/datasets/karenwky/pet-health-symptoms-dataset
Dataset updated
Apr 27, 2025
Authors
Karen Wong
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Pet Health Symptoms Dataset

Overview

This dataset contains 2,000 LLM-generated pet health symptoms text samples covering 5 common pet health condition categories, designed to train ML models for automated pet health classification. Each entry is labeled with:

Pet health condition (1 of 5 distinct classes)
Record type (Owner Observation or Clinical Notes)

Owner observations are expressed in everyday language (e.g., "My cat scratches constantly"), whereas clinical… See the full description on the dataset page: https://huggingface.co/datasets/karenwky/pet-health-symptoms-dataset.
Illinois DOC labeled faces dataset
kaggle.com
Updated Dec 6, 2019
Cite
David J. Fisher (2019). Illinois DOC labeled faces dataset [Dataset]. https://www.kaggle.com/davidjfisher/illinois-doc-labeled-faces-dataset/discussion
Dataset updated
Dec 6, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
David J. Fisher
License
https://www.usa.gov/government-works/
Description
This is a dataset of prisoner mugshots and associated data (height, weight, etc.). The copyright status is public domain, since it's produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts isn't copyrightable.

The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have shoddy data.

It's useful for neural network training, since it has pictures from both front and side, and they're (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, race, and some various goodies such as sentence duration and whether they're sex offenders.

Here is the readme file:

---BEGIN README---
Scraped from the Illinois DOC.

https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=

paste <(cat ids.txt | sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=|g') <(cat ids.txt| sed 's|^| out=|g' | sed 's|$|.jpg|g') -d ' ' > showside.txt
paste <(cat ids.txt | sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=|g') <(cat ids.txt| sed 's|^| out=|g' | sed 's|$|.jpg|g') -d ' ' > showfront.txt
paste <(cat ids.txt | sed 's|^|http://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=|g') <(cat ids.txt| sed 's|^| out=|g' | sed 's|$|.html|g') -d ' ' > inmates_print.txt

aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt

Then use htmltocsv.py to get the csv. Note that the script is very poorly written and may have errors. It also doesn't do anything with the warrant-related info, although there are some commented-out lines which may be relevant.
Also note that it assumes all the HTML files are located in the inmates directory, and overwrites any csv files in csv if there are any.

front.7z contains mugshots from the front
side.7z contains mugshots from the side
inmates.7z contains all the html files
csv contains the html files converted to CSV

The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.

All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.

There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.

The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
Some inmates were marked "Not Available"; this has been replaced with "N/A".
Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".

The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.

The "weight" column is often rounded to the nearest 5 lbs.

Statistics for hair:
43305 Black
17371 Brown
2887 Blonde or Strawberry
2539 Gray or Partially Gray
740 Red or Auburn
624 Bald
396 Not Available
209 Salt and Pepper
70 White
7 Sandy
1 Unknown

Statistics for sex:
63409 Male
4740 Female

Statistics for race:
37991 Black
20992 White
8637 Hispanic
235 Asian
104 Amer Indian
94 Unknown
92 Bi-Racial
4

Statistics for eyes:
51714 Brown
7808 Blue
4259 Hazel
2469 Green
1382 Black
420 Not Available
87 Gray
9 Maroon
1 Unknown
---END README---
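
The README's note on delimiters can be illustrated with a short Python sketch. It relies only on what the README states: fields are separated by semicolons, each row ends with a trailing semicolon, and missing values show up as "N/A", "Not Available", "Unknown", or an empty string. The sample rows and the column names `id`, `height`, and `weight` are made up for illustration; the real headers in person.csv may differ, and the helper name `read_semicolon_csv` is hypothetical.

```python
import csv
import io

# Markers the README lists for missing values.
MISSING = {"", "N/A", "Not Available", "Unknown"}


def read_semicolon_csv(text):
    """Parse semicolon-delimited CSV text whose rows end with a trailing semicolon."""
    rows = []
    header = None
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:
            continue  # skip blank lines
        # The trailing semicolon produces one empty field at the end of each row.
        if fields[-1] == "":
            fields = fields[:-1]
        if header is None:
            header = fields  # first non-blank row holds the column names
            continue
        # Map missing-value markers to None so they are easy to filter later.
        values = [None if f.strip() in MISSING else f for f in fields]
        rows.append(dict(zip(header, values)))
    return rows


# Hypothetical sample rows; not taken from the dataset.
sample = "id;height;weight;\nA00001;70;185;\nA00002;N/A;160;\n"
records = read_semicolon_csv(sample)
```

Dropping the final empty field by hand avoids editing the arr2csvR function mentioned in the README, at the cost of assuming every row really does end with a semicolon.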

Here is a formal summary:

---BEGIN SUMMARY---
Documentation:

Title: Illinois DOC dataset

Source Information
-- Creators: Illinois DOC
-- Illinois Department of Corrections
1301 Concordia Court
P.O. Box 19277
Springfield, IL 62794-9277
(217) 558-2200 x 2008
-- Donor: Anonymous
-- Date: 2019

Past Usage:
-- None

Relevant Information:
-- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
-- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv...
PopularCatClassification
huggingface.co
Updated Jun 1, 2025
Cite
Moclauq (2025). PopularCatClassification [Dataset]. https://huggingface.co/datasets/moclauq/PopularCatClassification
Dataset updated
Jun 1, 2025
Authors
Moclauq
Description
moclauq/PopularCatClassification dataset hosted on Hugging Face and contributed by the HF Datasets community
INSPIRE Hunting management areas of Catalonia
catalegs.ide.cat
data.europa.eu
Updated Sep 4, 2023
Cite
(2023). INSPIRE Hunting management areas of Catalonia [Dataset]. https://catalegs.ide.cat/geonetwork/inspire/search?topicCat=planningCadastre
Dataset updated
Sep 4, 2023
Area covered
Catalonia
Description
Limits and surface areas of the different hunting management figures of Catalonia, derived from Law 1/1970 and classified as special regime, and of the terrains classified as common use. This data set conforms to the technical specifications of INSPIRE Area management/restriction/regulation zones and reporting units.
Replication Package - How Do Requirements Evolve During Elicitation? An...
zenodo.org
bin, zip
Updated Apr 21, 2022
Cite
Alessio Ferrari; Paola Spoletini; Sourav Debnath (2022). Replication Package - How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis [Dataset]. http://doi.org/10.5281/zenodo.6472498
Available download formats: bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.6472498
Dataset updated
Apr 21, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alessio Ferrari; Paola Spoletini; Sourav Debnath
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.

The package contains the following folders and files.

/R-analysis

This folder contains all the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and Tables in the paper is as follows:

- RQ1-1-analyse-story-rates.R: Table 1, user story rates

- RQ1-1-analyse-role-rates.R: Table 1, role rates

- RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates

- RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates

- RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2

- RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2

- RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.

The .csv files used for statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.

- RQ1-1-story-rates.csv: Figure 4

- RQ1-1-role-rates.csv: Figure 5

- RQ1-2-categories-phase-1.csv: Figure 8

- RQ1-2-role-category-phase-1.csv: Figure 9

- RQ2-1-user-story-and-roles-phase-2.csv: Figure 13

- RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14

- RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17

- IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15

- IMG-only-RQ2.2-frequent-roles.csv: Figure 18

NOTE: The last two .csv files do not have associated statistical tests, but are used solely to produce boxplots.

/Data-Analysis

This folder contains all the data used to answer the research questions.

RQ1.xlsx: includes all the data associated with the RQ1 subquestions, two tabs for each subquestion (one for user stories and one for roles). The names of the tabs describe their content.

RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:

- Data Source-US-category: for each category of user story, and for each analyst, there are two lines. The first one reports the number of user stories in that category for phase 1, and the second one reports the number of user stories in that category for phase 2, considering the specific analyst.

- Data Source-role: for each category of role, and for each analyst, there are two lines. The first one reports the number of user stories in that role for phase 1, and the second one reports the number of user stories in that role for phase 2, considering the specific analyst.

- RQ2.1 rates: reports the final rates for RQ2.1.

NOTE: The other tabs are used to support the computation of the final rates.

RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:

- Data Source-US-category: same as RQ2.1.xlsx

- Data Source-role: same as RQ2.1.xlsx

- RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14

- RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17

- RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18

NOTE: the other tabs are used to support the computation of the values reported in the tabs above.

RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated to single categories of user stories. A separate tab is used given the complexity of the computations.

- Data Source-US-category: same as RQ2.1.xlsx

- Totals: total number of user stories for each analyst in phase 1 and phase 2

- Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file "img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15

- Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.

NOTE: the other tabs are used to support the computation of the values reported in the tabs above.

RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:

- Data Source-US-category: same as RQ2.1.xlsx

- Data Source-role: same as RQ2.1.xlsx

- RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19

- RQ2-3-most-frequent-categories: most frequent novel categories

/Raw-Data-Phase-I

The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:

- Evaluation: includes the annotation of the user stories as existing user story in the original categories (annotated with "E"), novel user story in a certain category (refinement, annotated with "N"), and novel user story in a novel category (name of the category in column "New Feature"). **NOTE 1:** In the paper, the "refinement" case is annotated with "R" (instead of "N", as in the files) to make the paper clearer and easier to read.

- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.

/Raw-Data-Phaes-II

The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:

- Analysis: includes the annotation of the user stories as belonging to existing original category (X), or to categories introduced after interviews, or to categories introduced after app store inspired elicitation (name of category in "Cat. Created in PH1"), or to entirely novel categories (name of category in "New Category").

- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.

/Figures

This folder includes the figures reported in the paper. The boxplots are generated from the data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are produced with Excel, and are also reported in the Excel files listed above.
20 scholarly articles cite the cats_vs_dogs dataset (View in Google Scholar)
