Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Gender Classification is a dataset for object detection tasks - it contains Gender annotations for 1,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Gender Classification Yolov8 is a dataset for object detection tasks - it contains Male annotations for 1,233 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains audio recordings of 12 different accents across the UK: Northern Ireland (NI), Scotland, Wales (SW), North East England (NE), North West England (NW), Yorkshire and Humber (YAH), East Midlands (EM), West Midlands (WM), East of England (EE), Greater London (GL), South East England (SE), South West England (SW). We split the data into a 1:1 male:female ratio, labelled with '_M' for male or '_F' for female within the dataset. The audio dataset was compiled from open-source YouTube videos as a collation of different accents, and the audio files were trimmed for uniformity. The audio files are 30 seconds long, with the first 5 seconds and last 5 seconds of the signal left blank. We also resampled the audio signals at 8 kHz, again for uniformity and to remove noise present in the signals whilst retaining their underlying characteristics. The intended application of this dataset is to be used in conjunction with a deep neural network for accent and gender classification tasks.
The dataset also includes an unseen set from the Google open-source digit dataset, which contains audio files of the digits 1-9. This is included to test any models developed on the original dataset and confirm model performance under data variations.
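The preprocessing described above can be mirrored in a few lines; the sketch below is a minimal illustration (assuming the librosa and soundfile packages and a hypothetical file name), not the script used to build the dataset.

```python
# Minimal preprocessing sketch: resample a clip to 8 kHz and drop the blank
# first/last 5 seconds, as described above. File name is a placeholder.
import librosa
import soundfile as sf

SR = 8000          # target sample rate used by the dataset description
BLANK_SECONDS = 5  # silent padding at the start and end of each 30 s clip

signal, _ = librosa.load("accent_clip_M.wav", sr=SR)        # load and resample in one step
trimmed = signal[BLANK_SECONDS * SR : -BLANK_SECONDS * SR]  # keep the 20 s of speech

sf.write("accent_clip_M_8khz_trimmed.wav", trimmed, SR)
```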
This dataset was created by Akshit Madan
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender-biased text. In this work, we propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to, and bias from the gender of the speaker. Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information. In addition, we collect a novel, crowdsourced evaluation benchmark of utterance-level gender rewrites. Distinguishing between gender bias along multiple dimensions is important, as it enables us to train finer-grained gender bias classifiers. We show our classifiers prove valuable for a variety of important applications, such as controlling for gender bias in generative models, detecting gender bias in arbitrary text, and shedding light on offensive language in terms of genderedness.
I do a lot of work with image data sets. Often it is necessary to partition the images into male and female data sets. Doing this by hand can be a long and tedious task particularly on large data sets. So I decided to create a classifier that could do the task for me.
I used the CelebA aligned data set to provide the images. I went through and separated the images visually into 1747 female and 1747 male training images. I also created 100 male and 100 female test images, and 100 male and 100 female validation images. I wanted only the face to be in each image, so I developed an image cropping function using MTCNN to crop all the images. That function is included as one of the notebooks should anyone have a need for a good face cropping function. I also created an image duplicate detector to try to eliminate any of the training images from appearing in the test or validation sets. I have developed a general-purpose image classification function that works very well for most image classification tasks. It contains the option to select 1 of 7 models. For this application I used the MobileNet model because it is less computationally expensive and gives excellent results. On the test set, accuracy is near 100%.
The CelebA aligned data set was used. This data set is very large and of good quality. To crop the images to include only the face, I developed a face cropping function using MTCNN. MTCNN is very accurate and reasonably fast; however, it is not flawless, so after cropping the images you should always visually inspect the results.
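The cropping notebook itself is not reproduced here; a minimal sketch of face cropping with the mtcnn package (the package choice and bounding-box handling are assumptions, not the author's exact function) might look like this:

```python
# Illustrative face-cropping sketch using the mtcnn package.
# As noted above, MTCNN occasionally fails, so inspect the cropped output visually.
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def crop_face(image_path, output_path):
    bgr = cv2.imread(image_path)
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)     # MTCNN expects RGB input
    detections = detector.detect_faces(rgb)
    if not detections:                             # no face found: flag for manual review
        return False
    x, y, w, h = detections[0]["box"]              # take the first detected face
    x, y = max(x, 0), max(y, 0)                    # boxes can have small negative offsets
    cv2.imwrite(output_path, bgr[y : y + h, x : x + w])
    return True
```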
I developed this data set to train a classifier able to distinguish the gender shown in an image. Why bother, you may ask, when you can just look at an image and tell? True, but suppose you have a data set of 50,000 images that you want to separate into male and female sets. Doing that by hand would take forever. With a trained classifier of near 100% accuracy, you can use model.predict to do the job for you.
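As a rough illustration of that workflow, the sketch below partitions a folder of images with model.predict; it assumes a saved Keras model with 224x224 inputs and a two-class softmax output (index 0 = female, 1 = male), which may differ from the author's exact setup. File and folder names are placeholders.

```python
# Hedged sketch: partition a folder of face crops using a trained classifier's model.predict.
import shutil
from pathlib import Path

import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image as kimage

model = load_model("gender_classifier.h5")     # hypothetical path to the trained model

for img_path in Path("unsorted_faces").glob("*.jpg"):
    img = kimage.load_img(img_path, target_size=(224, 224))
    batch = np.expand_dims(kimage.img_to_array(img) / 255.0, axis=0)
    probs = model.predict(batch, verbose=0)[0]
    label = "male" if np.argmax(probs) == 1 else "female"
    shutil.copy(img_path, Path(label) / img_path.name)   # assumes male/ and female/ folders exist
```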
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classification effectiveness indicators across machine learning algorithms: Gender by age with temperament features.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Gender Classified Dataset with Masked Face: a versatile resource for AI enthusiasts. It combines FFHQ's high-quality images with Google-scraped pictures, enabling gender classification and facial recognition research, even in mask-wearing scenarios.
The HHD_gender dataset contains 819 handwritten forms written by volunteers of different educational backgrounds and ages (from as young as 11 years old to their late 60s), both native and non-native Hebrew speakers.
There are 50 variations of the forms; each form contains a text paragraph with 62 words on average.
For the experiments, the HHD_gender dataset was randomly subdivided into training (80%), validation (10%), and test (10%) sets.
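The split script itself is not published with the dataset; a generic random 80%/10%/10% split over a hypothetical directory of form images could be done as follows:

```python
# Generic random 80/10/10 split sketch (directory layout and file format are assumptions).
import random
from pathlib import Path

random.seed(42)                                          # arbitrary seed for reproducibility
forms = sorted(Path("HHD_gender_forms").glob("*.png"))   # hypothetical image directory
random.shuffle(forms)

n = len(forms)
train = forms[: int(0.8 * n)]
val = forms[int(0.8 * n) : int(0.9 * n)]
test = forms[int(0.9 * n) :]
```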
---------------------------------------------------------------------------------------------------------------
This database may be used for non-commercial research purposes only.
If you publish material based on this database, we request that you include a reference to the following papers:
[1] I. Rabaev, B. Kurar Barakat, A. Churkin and J. El-Sana. The HHD Dataset. The 17th International Conference on Frontiers in Handwriting Recognition, pp. 228-233, 2020, DOI: 10.1109/ICFHR2020.2020.00050
[2] I. Rabaev, M. Litvak, S. Asulin and O.H. Tabibi. Automatic Gender Classification from Handwritten Images: a Case Study, the 19th International Conference on Computer Analysis of Images and Patterns, 2021.
This dataset was created by alfredhhw
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Government Services Group (GSG): Workforce report of PS Act Employees in GSG by Stream, Appointment Type and Gender as at December 2014. GSG was previously a business unit of the Department of the Premier and Cabinet.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This data frame presents the results of a quantitative content analysis of the occurrence of gender-based concepts, themes, issues, and solutions within large-scale political and sociological survey questionnaires fielded cross-nationally in Europe and in six European countries: Denmark, Germany, Hungary, Switzerland and the UK, spanning 2000-2023. Data was collected by teams from each country between September 2023 and January 2024. Teams collected questions in the original language and provided a translation into English. Analysis was conducted using the translated text. The unit of analysis ('CODING_UNIT_TEXT') was the individual 'gender-related argument' within a survey question. This could be the entire survey question, a sub-question (in the case of matrix questions), or a singular response option (for multiple-choice questions). Coding units were coded in three key domains: (1) Gender concepts, (2) Themes/issues, and (3) Solutions. Up to two Themes/Issues and Solutions could be coded per coding unit. Several coding categories within the Themes/Issues and Solutions domains function hierarchically, where a coder first assigned a higher-level category and then as many subcategories as applicable. For example, a question concerning government-funded childcare is coded as B1_Economy -> B1_4_LabourMarket -> B1_4_1_CareWork -> B1_4_1_3_Childcare. The corresponding codebook presents the unitisation process and coding categories in full detail.
Attribution-NonCommercial 2.5 (CC BY-NC 2.5) https://creativecommons.org/licenses/by-nc/2.5/
License information was derived automatically
What is STraDa?
STraDa is a dataset that was presented at the late-breaking demo session of ISMIR 2023. The detailed description of the dataset is in README.md.
STraDa is a large-scale music audio dataset that contains singers' metadata, tracks' metadata, and IDs for downloading 30-second audio previews via the Deezer API. The dataset can be used for various MIR tasks, such as singer identification, singer recognition, singer gender/age detection, genre classification, and language classification. The training set contains 25,194 30-second excerpts from 5,264 singers. The testing set contains 200 songs from 200 singers, balanced across two genders, 5 languages and 4 age groups (5 songs per gender/language/age group), which can be used for bias analysis.
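As an illustration of how the track IDs can be used, the sketch below fetches one 30-second preview through the public Deezer track endpoint; the `preview` field name is an assumption about the API response and this is not part of the dataset's own tooling.

```python
# Hedged sketch: fetch the 30 s preview for one Deezer track ID from the dataset.
import requests

def download_preview(track_id: int, out_path: str) -> None:
    meta = requests.get(f"https://api.deezer.com/track/{track_id}", timeout=10).json()
    preview_url = meta.get("preview")          # URL of the 30-second MP3 excerpt
    if not preview_url:
        raise ValueError(f"No preview available for track {track_id}")
    with open(out_path, "wb") as f:
        f.write(requests.get(preview_url, timeout=10).content)
```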
What does STraDa contain?
An important feature of STraDa is that each track only has a single lead singer, which improves the accuracy of annotations.
The annotations in the training set are gathered and cross-validated from 4 different data sources: Deezer, Wikidata, MusicBrainz, and Discogs.
The testing set is curated and annotated manually to ensure perfect accuracy.
Singers' metadata contains gender, birth year and active country. Tracks' metadata contains genre, language and release date.
What could STraDa be used for?
STraDa could be used for singer identification, singer recognition, singer gender/age detection, song genre/language identification. The balance in the testing set could enable bias analysis.
Dataset use
This dataset is available only for non-commercial research related to audio analysis, under the Creative Commons Attribution-NonCommercial 2.5 Generic license. Note that this license covers the data contained in STraDa and does not apply to the audio files. We do NOT grant permission for any modification, generation, or manipulation using these audio files.
We wholeheartedly welcome researchers to use STraDa for their own research purpose. Please send an email to ykong@deezer.com if you have any questions about the data.
Citation
If you use STraDa, please cite the following paper:
@inproceedings{kong2024stradasingertraitsdataset,
  title={STraDa: A Singer Traits Dataset},
  author={Yuexuan Kong and Viet-Anh Tran and Romain Hennequin},
  booktitle={Interspeech 2024},
  year={2024}
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Compared with Table 2, identifying gender was more difficult than identifying age. Models with image objects yielded greater performance than those with tags. For combined features, we used TF-IDF and Word2Vec, yielding an F1 score of 0.74.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains audio recordings of 12 different accents across the UK: Northern Ireland, Scotland, Wales, North East England, North West England, Yorkshire and Humber, East Midlands, West Midlands, East of England, Greater London, South East England, South West England. We split the data into a 1:1 male:female ratio. The audio dataset was compiled from open-source YouTube videos as a collation of different accents, and the audio files were trimmed for uniformity. The audio files are 30 seconds long, with the first 5 seconds and last 5 seconds of the signal left blank. We also resampled the audio signals at 8 kHz, again for uniformity and to remove noise present in the signals whilst retaining their underlying characteristics. The intended application of this dataset is to be used in conjunction with a deep neural network for accent and gender classification tasks.
This dataset was recorded for an experiment looking into applying machine learning techniques to the task of classifying song preference amongst Generation Z (18 to 24 years old) participants. We define a labelling system for specific songs with 5 ratings: hate, dislike, neutral, like and love. The songs used for this experiment were chosen due to their success in various awards, such as the BRIT Awards (BRIT), the Mercury Prize (MERC), and Rolling Stone's most influential albums (ROLS). They are as follows:
S1: One Kiss by Calvin Harris and Dua Lipa (BRIT)
S2: Don't Delete the Kisses by Wolf Alice (MERC)
S3: Money by Pink Floyd (ROLS)
S4: Shotgun by George Ezra (BRIT)
S5: Location by Dave (MERC)
S6: Smells Like Teen Spirit by Nirvana (ROLS)
S7: God's Plan by Drake (BRIT)
S8: Breezeblocks by alt-J (MERC)
S9: Lucy In The Sky With Diamonds by The Beatles (ROLS)
S10: Thank U, Next by Ariana Grande (BRIT)
S11: Shutdown by Skepta (MERC)
S12: Billie Jean by Michael Jackson (ROLS)
A Unicorn Hybrid Black was used to record the EEG data from the participants whilst they were played the control songs listed above. For each of the 12 songs played to a participant during the experiment, there were 8 EEG lead recordings of 20 seconds each, with the first 5 seconds and the last 5 seconds being blank for control purposes. The EEG signals were sampled at 250 Hz by the Unicorn Hybrid Black devices, which also filtered the signals to between 2 Hz and 30 Hz in order to remove any noise recorded during the experiment. There are approximately 5000 data points per reading of a given song, with 12 songs played to a total of 10 participants.
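For reference, the preprocessing described above (2-30 Hz band-pass filtering at a 250 Hz sampling rate, plus removal of the blank 5-second segments) could be mirrored with a sketch like the following; the released data is already filtered, and the SciPy-based code is an assumption rather than the experimenters' pipeline.

```python
# Illustrative sketch of 2-30 Hz band-pass filtering and trimming of the blank segments.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 250                       # sampling rate (Hz)
BLANK = 5 * FS                 # 5-second blank segments at each end

def bandpass_2_30(eeg: np.ndarray) -> np.ndarray:
    b, a = butter(4, [2, 30], btype="bandpass", fs=FS)   # 4th-order Butterworth
    return filtfilt(b, a, eeg, axis=-1)

# Example: one 20 s, 8-lead recording -> keep the middle 10 s of filtered signal
recording = np.random.randn(8, 20 * FS)          # placeholder for a real recording
filtered = bandpass_2_30(recording)[:, BLANK:-BLANK]
```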
The dataset is used for age and gender classification from ear images.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used for the experiments presented in the article "Reconstructive Classification for Age and Gender Identification in Social Networks" - IEEE Transactions on Computational Social Systems
The dataset contains text data from 548,761 pins corresponding to 264 users of Pinterest.
There are 7 files.
The first 5 files correspond to the extracted textual features from the pins that are aggregated per user: ats, emojis/emoticons, hashtags, links, and words.
There are 264 lines in each file (one per user), as the concatenation of the extracted features from all the pins corresponding to each user.
The last 2 files are the labels for the age and gender of the users. There are also 264 lines (one per user).
For age, there are 4 possible labels: 18-24, 25-34, 35-46, and 50+
For gender, there are 2 possible labels: F and M
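The exact file names in the release are not listed above, so the loading sketch below uses placeholder names; it only illustrates aligning the 264 per-user feature lines with the age and gender label files.

```python
# Hedged loading sketch: align per-user feature lines with the age and gender labels.
# The actual file names in the release may differ; the ones below are placeholders.
def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

words = read_lines("words.txt")            # one concatenated text line per user
ages = read_lines("age_labels.txt")        # labels: 18-24, 25-34, 35-46, 50+
genders = read_lines("gender_labels.txt")  # labels: F or M

assert len(words) == len(ages) == len(genders) == 264
users = list(zip(words, ages, genders))
```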
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In general, models from image objects showed greater performance than those from tags. For combined features, we used TF-IDF and Word2Vec, and model performance increased when the combined features were used; the highest performance was 0.88.
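The feature-combination code is not given here; one common way to concatenate TF-IDF vectors with averaged Word2Vec vectors (sketched below with scikit-learn and gensim, which are assumptions about the toolchain rather than the authors' implementation) is:

```python
# Hedged sketch: combine TF-IDF features with averaged Word2Vec vectors for classification.
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["example user text one", "another user text"]   # placeholder documents
tokenized = [d.split() for d in docs]

tfidf = TfidfVectorizer().fit_transform(docs).toarray()
w2v = Word2Vec(tokenized, vector_size=100, min_count=1, seed=1)
avg_vecs = np.array([
    np.mean([w2v.wv[t] for t in toks], axis=0) for toks in tokenized
])

combined = np.hstack([tfidf, avg_vecs])    # combined feature matrix fed to the classifier
```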
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Government Services Group (GSG) Workforce Report: PS Act Employees in GSG Classified ASO6 & Above by Gender as at December 2014.
Government Services Group (GSG) was previously a section/business unit of the Department of the Premier and Cabinet.