https://choosealicense.com/licenses/unknown/
Dataset Card for Cats Vs. Dogs
Dataset Summary
A large set of images of cats and dogs. There are 1738 corrupted images that are dropped. This dataset is part of a now-closed Kaggle competition and represents a subset of the so-called Asirra dataset. From the competition page:
The Asirra data set: Web services are often protected with a challenge that's supposed to be easy for people to solve, but difficult for computers. Such a challenge is often called a CAPTCHA… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/cats_vs_dogs.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Summary: The Animal Image Classification Dataset is a comprehensive collection of images tailored for the development and evaluation of machine learning models in the field of computer vision. It contains 3,000 JPG images, carefully segmented into three classes representing common pets and wildlife: cats, dogs, and snakes.
Dataset Contents:
cats/: A set of 1,000 JPG images of cats, showcasing a wide array of breeds, environments, and postures.
dogs/: A diverse compilation of 1,000 dog… See the full description on the dataset page: https://huggingface.co/datasets/AlvaroVasquezAI/Animal_Image_Classification_Dataset.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Pet Health Symptoms Dataset
Overview
This dataset contains 2,000 LLM-generated pet health symptoms text samples covering 5 common pet health condition categories, designed to train ML models for automated pet health classification. Each entry is labeled with:
Pet health condition (1 of 5 distinct classes)
Record type (Owner Observation or Clinical Notes)
Owner observations are expressed in everyday language (e.g., "My cat scratches constantly"), whereas clinical… See the full description on the dataset page: https://huggingface.co/datasets/karenwky/pet-health-symptoms-dataset.
https://www.usa.gov/government-works/
This is a dataset of prisoner mugshots and associated data (height, weight, etc.). The copyright status is public domain, since it's produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts isn't copyrightable.
The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have shoddy data.
It's useful for neural network training, since it has pictures from both front and side, and they're (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, race, and some various goodies such as sentence duration and whether they're sex offenders.
Here is the readme file:
---BEGIN README---
Scraped from the Illinois DOC.
https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=|' ids.txt) <(sed -e 's|^| out=|' -e 's|$|.jpg|' ids.txt) > showside.txt
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=|' ids.txt) <(sed -e 's|^| out=|' -e 's|$|.jpg|' ids.txt) > showfront.txt
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=|' ids.txt) <(sed -e 's|^| out=|' -e 's|$|.html|' ids.txt) > inmates_print.txt
aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt
Then use htmltocsv.py to get the csv. Note that the script is very poorly written and may have errors. It also doesn't do anything with the warrant-related info, although there are some commented-out lines which may be relevant.
Also note that it assumes all the HTML files are located in the inmates directory, and overwrites any CSV files in the csv directory if there are any.
front.7z contains mugshots from the front
side.7z contains mugshots from the side
inmates.7z contains all the html files
csv contains the html files converted to CSV
The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.
All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.
There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.
The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
Some inmates were marked "Not Available"; this has been replaced with "N/A".
Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".
The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.
The "weight" column is often rounded to the nearest 5 lbs.
Statistics for hair:
43305 Black
17371 Brown
2887 Blonde or Strawberry
2539 Gray or Partially Gray
740 Red or Auburn
624 Bald
396 Not Available
209 Salt and Pepper
70 White
7 Sandy
1 Unknown
Statistics for sex:
63409 Male
4740 Female
Statistics for race:
37991 Black
20992 White
8637 Hispanic
235 Asian
104 Amer Indian
94 Unknown
92 Bi-Racial
4
Statistics for eyes:
51714 Brown
7808 Blue
4259 Hazel
2469 Green
1382 Black
420 Not Available
87 Gray
9 Maroon
1 Unknown
---END README---
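The README's notes on the CSV layout (semicolon delimiters, a trailing semicolon on every row, placeholder values such as "N/A") and on the height conversion can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the logic of htmltocsv.py: the function names, the sample rows, and the exact set of placeholder strings are hypothetical, and the real files may need additional handling.

```python
import csv
import io
import re

# Placeholder strings the README mentions for missing data (assumed list).
MISSING = {"", "N/A", "Not Available", "Unknown"}


def read_semicolon_csv(text):
    """Parse semicolon-delimited CSV text whose rows end with a trailing ';'.

    Returns a list of dicts keyed by the header row, with placeholder
    values mapped to None.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = None
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if header is None:
            # The trailing semicolon yields one empty final field; drop it.
            header = fields[:-1] if fields[-1] == "" else fields
            continue
        # Trim the extra empty field so each row lines up with the header.
        fields = fields[: len(header)]
        rows.append(
            {key: (None if value in MISSING else value)
             for key, value in zip(header, fields)}
        )
    return rows


def height_to_inches(value):
    """Convert a string like '5 ft 11 in.' to total inches, or None."""
    match = re.fullmatch(r"\s*(\d+)\s*ft\.?\s*(\d+)\s*in\.?\s*", value)
    if match is None:
        return None
    return int(match.group(1)) * 12 + int(match.group(2))


if __name__ == "__main__":
    # Made-up sample in the layout the README describes (not real records).
    sample = "id;height;weight;\nA1;71;180;\nB2;N/A;;\n"
    for row in read_semicolon_csv(sample):
        print(row)
    print(height_to_inches("5 ft 11 in."))
```

Mapping placeholders to None up front keeps later numeric conversions (for example, casting height or weight to integers) from failing on the few hundred incomplete records the README mentions.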
Here is a formal summary:
---BEGIN SUMMARY---
Documentation:
Title: Illinois DOC dataset
Source Information
-- Creators: Illinois DOC
-- Illinois Department of Corrections
1301 Concordia Court
P.O. Box 19277
Springfield, IL 62794-9277
(217) 558-2200 x 2008
-- Donor: Anonymous
-- Date: 2019
Past Usage:
-- None
Relevant Information:
-- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
-- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv...
moclauq/PopularCatClassification dataset hosted on Hugging Face and contributed by the HF Datasets community
Boundaries and areas of the different hunting management figures of Catalonia, derived from Law 1/1970 and classified as special regime, and of the land classified as common use. This dataset conforms to the INSPIRE technical specifications for Area management/restriction/regulation zones and reporting units.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.
The package contains the following folders and files.
/R-analysis
This folder contains all the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and Tables in the paper is as follows:
- RQ1-1-analyse-story-rates.R: Table 1, user story rates
- RQ1-1-analyse-role-rates.R: Table 1, role rates
- RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates
- RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates
- RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2
- RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2
- RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.
The .csv files used for statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.
- RQ1-1-story-rates.csv: Figure 4
- RQ1-1-role-rates.csv: Figure 5
- RQ1-2-categories-phase-1.csv: Figure 8
- RQ1-2-role-category-phase-1.csv: Figure 9
- RQ2-1-user-story-and-roles-phase-2.csv: Figure 13
- RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14
- RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17
- IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15
- IMG-only-RQ2.2-frequent-roles.csv: Figure 18
NOTE: The last two .csv files do not have associated statistical tests, but are used solely to produce boxplots.
/Data-Analysis
This folder contains all the data used to answer the research questions.
RQ1.xlsx: includes all the data associated with the RQ1 subquestions, with two tabs for each subquestion (one for user stories and one for roles). The names of the tabs describe their content.
RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: for each category of user story, and for each analyst, there are two lines.
The first one reports the number of user stories in that category for phase 1, and the second one reports the
number of user stories in that category for phase 2, considering the specific analyst.
- Data Source-role: for each category of role, and for each analyst, there are two lines.
The first one reports the number of user stories in that role for phase 1, and the second one reports the
number of user stories in that role for phase 2, considering the specific analyst.
- RQ2.1 rates: reports the final rates for RQ2.1.
NOTE: The other tabs are used to support the computation of the final rates.
RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14
- RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17
- RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated to single categories of user stories.
A separate tab is used given the complexity of the computations.
- Data Source-US-category: same as RQ2.1.xlsx
- Totals: total number of user stories for each analyst in phase 1 and phase 2
- Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file
"img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15
- Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19
- RQ2-3-most-frequent-categories: most frequent novel categories
/Raw-Data-Phase-I
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:
- Evaluation: includes the annotation of the user stories as existing user story in the original categories (annotated with "E"), novel user story in a certain category (refinement, annotated with "N"), and novel user story in novel category (name of the category in column "New Feature"). **NOTE 1:** In the paper, the "refinement" case is annotated with "R" (instead of "N", as in the files) to make the paper clearer and easier to read.
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Raw-Data-Phaes-II
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:
- Analysis: includes the annotation of the user stories as belonging to existing original
category (X), or to categories introduced after interviews, or to categories introduced
after app store inspired elicitation (name of category in "Cat. Created in PH1"), or to
entirely novel categories (name of category in "New Category").
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Figures
This folder includes the figures reported in the paper. The boxplots are generated from the
data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are
produced with Excel, and are also reported in the excel files listed above.