7 datasets found
  1. cats_vs_dogs

    • huggingface.co
    • tensorflow.org
    Updated May 23, 2024
    Cite
    Microsoft (2024). cats_vs_dogs [Dataset]. https://huggingface.co/datasets/microsoft/cats_vs_dogs
    Dataset updated
    May 23, 2024
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for Cats Vs. Dogs

      Dataset Summary
    

    A large set of images of cats and dogs. There are 1738 corrupted images that are dropped. This dataset is part of a now-closed Kaggle competition and represents a subset of the so-called Asirra dataset. From the competition page:

    The Asirra data set Web services are often protected with a challenge that's supposed to be easy for people to solve, but difficult for computers. Such a challenge is often called a CAPTCHA… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/cats_vs_dogs.

  2. Animal_Image_Classification_Dataset

    • huggingface.co
    • kaggle.com
    Updated Apr 4, 2024
    Cite
    Álvaro García Vásquez (2024). Animal_Image_Classification_Dataset [Dataset]. https://huggingface.co/datasets/AlvaroVasquezAI/Animal_Image_Classification_Dataset
    Dataset updated
    Apr 4, 2024
    Authors
    Álvaro García Vásquez
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Summary: The Animal Image Classification Dataset is a comprehensive collection of images tailored for the development and evaluation of machine learning models in the field of computer vision. It contains 3,000 JPG images, carefully segmented into three classes representing common pets and wildlife: cats, dogs, and snakes. Dataset Contents: cats/: A set of 1,000 JPG images of cats, showcasing a wide array of breeds, environments, and postures. dogs/: A diverse compilation of 1,000 dog… See the full description on the dataset page: https://huggingface.co/datasets/AlvaroVasquezAI/Animal_Image_Classification_Dataset.
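
A quick way to check the stated balance of roughly 1,000 images per class is to count the files in each class folder after downloading. The sketch below is illustrative only: it assumes the images sit in one subfolder per class (such as the cats/ and dogs/ folders named above) under a local root directory, and the function name and the accepted file extensions are assumptions, not part of the dataset.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(root: Path) -> Counter:
    """Count JPG files in each immediate subdirectory of `root`.

    Assumes a one-folder-per-class layout; files directly under
    `root` and non-JPG files are ignored.
    """
    counts = Counter()
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in {".jpg", ".jpeg"}
        )
    return counts
```

Calling `count_images_per_class(Path("path/to/dataset"))` returns a mapping from folder name to image count, which can be compared against the figures in the dataset card.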

  3. pet-health-symptoms-dataset

    • huggingface.co
    Updated Apr 27, 2025
    Cite
    Karen Wong (2025). pet-health-symptoms-dataset [Dataset]. https://huggingface.co/datasets/karenwky/pet-health-symptoms-dataset
    Dataset updated
    Apr 27, 2025
    Authors
    Karen Wong
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Pet Health Symptoms Dataset

      Overview
    

    This dataset contains 2,000 LLM-generated pet health symptoms text samples covering 5 common pet health condition categories, designed to train ML models for automated pet health classification. Each entry is labeled with:

    Pet health condition (1 of 5 distinct classes)
    Record type (Owner Observation or Clinical Notes)

    Owner observations are expressed in everyday language (e.g., "My cat scratches constantly"), whereas clinical… See the full description on the dataset page: https://huggingface.co/datasets/karenwky/pet-health-symptoms-dataset.

  4. Illinois DOC labeled faces dataset

    • kaggle.com
    Updated Dec 6, 2019
    Cite
    David J. Fisher (2019). Illinois DOC labeled faces dataset [Dataset]. https://www.kaggle.com/davidjfisher/illinois-doc-labeled-faces-dataset/discussion
    Dataset updated
    Dec 6, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    David J. Fisher
    License

    https://www.usa.gov/government-works/

    Description

    This is a dataset of prisoner mugshots and associated data (height, weight, etc.). The copyright status is public domain, since it's produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts isn't copyrightable.

    The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have shoddy data.

    It's useful for neural network training, since it has pictures from both front and side, and they're (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, race, and some various goodies such as sentence duration and whether they're sex offenders.

    Here is the readme file:

    ---BEGIN README---
    Scraped from the Illinois DOC.

    https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=
    https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=
    https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=

    paste <(cat ids.txt | sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=|g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.jpg/g') -d ' ' > showside.txt
    paste <(cat ids.txt | sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=|g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.jpg/g') -d ' ' > showfront.txt
    paste <(cat ids.txt | sed 's|^|http://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=|g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.html/g') -d ' ' > inmates_print.txt

    aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt

    Then use htmltocsv.py to get the csv. Note that the script is very poorly written and may have errors. It also doesn't do anything with the warrant-related info, although there are some commented-out lines which may be relevant.
    Also note that it assumes all the HTML files are located in the inmates directory, and overwrites any csv files in csv if there are any.

    front.7z contains mugshots from the front
    side.7z contains mugshots from the side
    inmates.7z contains all the html files
    csv contains the html files converted to CSV

    The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.

    All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
    Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.

    There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.

    The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
    Some inmates were marked "Not Available"; this has been replaced with "N/A".
    Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".

    The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.

    The "weight" column is often rounded to the nearest 5 lbs.

    Statistics for hair:
    43305 Black
    17371 Brown
    2887 Blonde or Strawberry
    2539 Gray or Partially Gray
    740 Red or Auburn
    624 Bald
    396 Not Available
    209 Salt and Pepper
    70 White
    7 Sandy
    1 Unknown

    Statistics for sex:
    63409 Male
    4740 Female

    Statistics for race:
    37991 Black
    20992 White
    8637 Hispanic
    235 Asian
    104 Amer Indian
    94 Unknown
    92 Bi-Racial
    4

    Statistics for eyes:
    51714 Brown
    7808 Blue
    4259 Hazel
    2469 Green
    1382 Black
    420 Not Available
    87 Gray
    9 Maroon
    1 Unknown
    ---END README---
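
The CSV conventions described in the README (semicolon delimiters, a trailing semicolon on every line, "N/A" or "Not Available" for missing values, and MM/DD/YYYY dates) can be handled with Python's standard csv module. The following is a minimal sketch, not the dataset's own tooling: the function names are hypothetical, and the exact header spellings used in the usage note ("id", "height", "weight", "date of birth") are assumptions based on the column names the README mentions.

```python
import csv
import io
from datetime import date, datetime
from typing import Optional

# Markers the README reports for missing values.
MISSING = {"", "N/A", "Not Available", "Unknown"}


def read_semicolon_csv(text: str) -> list[dict[str, str]]:
    """Parse semicolon-delimited text whose lines end with a trailing ';'."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # The trailing semicolon produces one extra empty field; drop it.
        if row[-1] == "":
            row = row[:-1]
        rows.append(row)
    header, *records = rows
    return [dict(zip(header, rec)) for rec in records]


def to_int(value: str) -> Optional[int]:
    """Convert a numeric field (e.g. height in inches), or None if missing."""
    v = value.strip()
    return None if v in MISSING else int(v)


def to_date(value: str) -> Optional[date]:
    """Convert an MM/DD/YYYY field, or None if missing."""
    v = value.strip()
    return None if v in MISSING else datetime.strptime(v, "%m/%d/%Y").date()
```

For example, given a file whose header line is `id;height;weight;date of birth;`, each parsed row is a dictionary keyed by those column names, and `to_int(row["height"])` yields the height in inches or `None` when the field is marked as unavailable.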

    Here is a formal summary:

    ---BEGIN SUMMARY---
    Documentation:

    1. Title: Illinois DOC dataset

    2. Source Information
      -- Creators: Illinois DOC
      -- Illinois Department of Corrections
      1301 Concordia Court
      P.O. Box 19277
      Springfield, IL 62794-9277
      (217) 558-2200 x 2008
      -- Donor: Anonymous
      -- Date: 2019

    3. Past Usage:
      -- None

    4. Relevant Information:
      -- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
      -- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv...

  5. PopularCatClassification

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Moclauq (2025). PopularCatClassification [Dataset]. https://huggingface.co/datasets/moclauq/PopularCatClassification
    Dataset updated
    Jun 1, 2025
    Authors
    Moclauq
    Description

    moclauq/PopularCatClassification dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. INSPIRE Hunting management areas of Catalonia

    • catalegs.ide.cat
    • data.europa.eu
    Updated Sep 4, 2023
    Cite
    (2023). INSPIRE Hunting management areas of Catalonia [Dataset]. https://catalegs.ide.cat/geonetwork/inspire/search?topicCat=planningCadastre
    Dataset updated
    Sep 4, 2023
    Area covered
    Catalonia
    Description

    Limits and surface areas of the different hunting management designations of Catalonia, derived from Law 1/1970 and classified as special regime, and of the land classified as common use. This data set conforms to the technical specifications of INSPIRE Area management/restriction/regulation zones and reporting units.

  7. Replication Package - How Do Requirements Evolve During Elicitation? An...

    • zenodo.org
    bin, zip
    Updated Apr 21, 2022
    Cite
    Alessio Ferrari; Paola Spoletini; Sourav Debnath (2022). Replication Package - How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis [Dataset]. http://doi.org/10.5281/zenodo.6472498
    Available download formats: bin, zip
    Dataset updated
    Apr 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alessio Ferrari; Paola Spoletini; Sourav Debnath
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.

    The package contains the following folders and files.

    /R-analysis

    This is a folder containing all the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and Tables in the paper is as follows:

    - RQ1-1-analyse-story-rates.R: Table 1, user story rates

    - RQ1-1-analyse-role-rates.R: Table 1, role rates

    - RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates

    - RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates

    - RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2

    - RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2

    - RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.

    The .csv files used for statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.

    - RQ1-1-story-rates.csv: Figure 4

    - RQ1-1-role-rates.csv: Figure 5

    - RQ1-2-categories-phase-1.csv: Figure 8

    - RQ1-2-role-category-phase-1.csv: Figure 9

    - RQ2-1-user-story-and-roles-phase-2.csv: Figure 13

    - RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14

    - RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17

    - IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15

    - IMG-only-RQ2.2-frequent-roles.csv: Figure 18

    NOTE: The last two .csv files do not have associated statistical tests, but are used solely to produce boxplots.

    /Data-Analysis

    This folder contains all the data used to answer the research questions.

    RQ1.xlsx: includes all the data associated to RQ1 subquestions, two tabs for each subquestion (one for user stories and one for roles). The names of the tabs are self-explanatory of their content.

    RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: for each category of user story, and for each analyst, there are two lines. The first one reports the number of user stories in that category for phase 1, and the second one reports the number of user stories in that category for phase 2, considering the specific analyst.

    - Data Source-role: for each category of role, and for each analyst, there are two lines. The first one reports the number of user stories in that role for phase 1, and the second one reports the number of user stories in that role for phase 2, considering the specific analyst.

    - RQ2.1 rates: reports the final rates for RQ2.1.

    NOTE: The other tabs are used to support the computation of the final rates.

    RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: same as RQ2.1.xlsx

    - Data Source-role: same as RQ2.1.xlsx

    - RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14

    - RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17

    - RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18

    NOTE: the other tabs are used to support the computation of the values reported in the tabs above.

    RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated to single categories of user stories.

    A separate tab is used given the complexity of the computations.

    - Data Source-US-category: same as RQ2.1.xlsx

    - Totals: total number of user stories for each analyst in phase 1 and phase 2

    - Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file "img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15

    - Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.

    NOTE: the other tabs are used to support the computation of the values reported in the tabs above.

    RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: same as RQ2.1.xlsx

    - Data Source-role: same as RQ2.1.xlsx

    - RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19

    - RQ2-3-most-frequent-categories: most frequent novel categories

    /Raw-Data-Phase-I

    The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:

    - Evaluation: includes the annotation of the user stories as existing user story in the original categories (annotated with "E"), novel user story in a certain category (refinement, annotated with "N"), and novel user story in novel category (Name of the category in column "New Feature"). **NOTE 1:** It should be noticed that in the paper the case "refinement" is said to be annotated with "R" (instead of "N", as in the files) to make the paper clearer and easy to read.

    - Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
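
The annotation scheme of the Evaluation tab can be tallied with a few lines of code once the rows are exported. This is a rough sketch under stated assumptions: it takes each row as a pair of strings (the annotation code and the content of the "New Feature" column), treats any non-empty "New Feature" value as a novel category, and uses hypothetical function and label names. The actual spreadsheet layout may differ.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify(annotation: str, new_feature: str) -> str:
    """Map one Evaluation row to a label, following the scheme described above."""
    if new_feature.strip():
        return "novel-category"  # category name given in "New Feature"
    code = annotation.strip().upper()
    if code == "E":
        return "existing"        # existing user story in an original category
    if code == "N":
        return "refinement"      # novel user story in an existing category
    return "unlabeled"


def tally(rows: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many rows fall under each label."""
    return Counter(classify(annotation, feature) for annotation, feature in rows)
```

Keeping the classification in a single function makes it easy to adjust if, for instance, the files turn out to combine an annotation code with a "New Feature" entry on the same row.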

    /Raw-Data-Phaes-II

    The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:

    - Analysis: includes the annotation of the user stories as belonging to existing original category (X), or to categories introduced after interviews, or to categories introduced after app store inspired elicitation (name of category in "Cat. Created in PH1"), or to entirely novel categories (name of category in "New Category").

    - Roles: roles used in the user stories, and count of the user stories belonging to a certain role.

    /Figures

    This folder includes the figures reported in the paper. The boxplots are generated from the data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are produced with Excel, and are also reported in the Excel files listed above.



cats_vs_dogs

20 scholarly articles cite this dataset (View in Google Scholar)