Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Places8. We introduce a new subset of Places, called Places8, whose classes are selected to highlight the environments most common in Child Sexual Abuse Imagery (CSAI). It is smaller than the datasets used for the pretext task; it represents our downstream task and is used to fine-tune the model after self-supervised learning.
Places365-Challenge indoor classes were initially grouped from 159 into 62 new categories following WordNet synonyms and, in some cases, direct hyponyms or related words. For example, bedroom and bedchamber were joined, while child's room was kept as a separate category given its importance in CSAI investigation. Next, we filtered the remapped dataset into 8 final classes drawn from 23 different scenes of Places365-Challenge. The selection of these scenes followed conversations with partner Brazilian Federal Police agents and CSAI investigation and labeling experts. Places365-Challenge already provides training and validation splits mapped accordingly. The test split was then generated as a stratified 10% split from the training set, given that the remapping and filtering produced a highly imbalanced dataset. The complete remapping can be seen in the table below under "Original Categories", along with further details on the novel subset.
Table. Description of the Places8 dataset. The class represents the final label used, while the original categories stand for the original Places365 labels. Places365 already provides training and validation splits mapped accordingly. The test set comes from a stratified 10% split from the training set.
| Class | Test | Train | Val | % | Original Categories |
| --- | --- | --- | --- | --- | --- |
| bathroom | 5,740 | 51,655 | 200 | 13.4 | bathroom, shower |
| bedroom | 11,112 | 100,012 | 600 | 25.9 | bedchamber, bedroom, hotel room, berth, dorm room, youth hostel |
| child's room | 4,650 | 41,849 | 300 | 10.8 | child's room, nursery, playroom |
| classroom | 3,751 | 33,763 | 200 | 8.7 | classroom, kindergarden classroom |
| dressing room | 2,432 | 21,889 | 200 | 5.7 | closet, dressing room |
| living room | 9,940 | 89,458 | 500 | 28.7 | home theater, living room, recreation room, television room, waiting room |
| studio | 1,404 | 12,633 | 100 | 3.3 | television studio |
| swimming pool | 1,505 | 13,547 | 200 | 3.5 | jacuzzi, swimming pool |
| Total | 40,534 | 364,806 | 2,300 | 100.0 | |
As we cannot redistribute the images from the Places8 dataset, we provide the original image names, class names, and splits (training, validation, and test). To use Places8, you must download the images from the Places365-Challenge.
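A minimal sketch of reassembling the dataset from those file lists, assuming a local Places365-Challenge download and a hypothetical places8_labels.csv with image, class, and split columns (the exact file layout follows whatever lists ship with the release):

```python
# Minimal sketch: rebuild Places8 from the released file lists plus a local
# Places365-Challenge copy. File and column names here are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

PLACES365_ROOT = "/data/places365_challenge"   # local image download

labels = pd.read_csv("places8_labels.csv")     # assumed columns: image, class, split
labels["path"] = PLACES365_ROOT + "/" + labels["image"]

splits = {name: df for name, df in labels.groupby("split")}  # train / val / test

# For reference, a stratified 10% carve-out like the one used to build the
# test split looks like this (the original random seed is not published):
train_df, test_df = train_test_split(
    splits["train"], test_size=0.10,
    stratify=splits["train"]["class"], random_state=0,
)
```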
Out-of-Distribution (OOD) Scenes. While the introduced Places8 already comprises a test set, we sought to create an additional evaluation set to better understand our approach's limitations when exposed to a domain gap. This is especially necessary considering that CSAI is known to come from diverse demographic and social backgrounds.
Thus, we designed a small custom dataset from online images to check how the model performs outside the controlled nature of Places8. The dataset comprises 80 images: 10 images per class for the 8 Places8 classes (bathroom, bedroom, child's room, classroom, dressing room, living room, studio, and swimming pool).
The OOD Scenes set is a sample of images taken from Google Images, Bing Images, and the Dollar Street dataset in a 4:3:3 ratio, i.e., per class, 4 images from Google, 3 from Bing, and 3 from Dollar Street. All images are free to share, modify, and use; Dollar Street in particular is licensed under CC BY 4.0, which permits commercial use.
Dollar Street is an annotated image dataset of 289 everyday household items photographed in 404 homes across 63 countries. It contains 38,479 pictures, split among abstractions (image answers to abstract questions), objects, and places within a home. The dataset explicitly depicts underrepresented populations and is grouped by country and income. Not all countries are present, but the number of pictures per region is balanced, and most images come from families who live on less than USD 1,000 per month.
These sources were chosen to contrast the web-sourced imagery typical of the Places dataset with imagery from underrepresented populations.
The National Software Reference Library (NSRL) collects software from various sources and incorporates file profiles computed from this software into a Reference Data Set (RDS). The RDS can be used by law enforcement, government, and industry organizations to review files on a computer by matching file profiles against the RDS. This alleviates much of the effort involved in determining which files are important as evidence on computers or file systems seized as part of criminal investigations. The RDS is a collection of digital signatures of known, traceable software applications. The hash set includes application hash values that may be considered malicious, e.g., steganography tools and hacking scripts. There are no hash values of illicit data, such as child abuse images.
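The matching workflow itself is simple to sketch. Below is a minimal, hypothetical triage pass that hashes every file on a seized file system and sets aside those whose SHA-1 appears in an RDS-style hash list; the one-hash-per-line rds_sha1.txt format is an assumption (real RDS releases ship richer records):

```python
# Minimal sketch of known-file filtering against an RDS-style hash list.
# The "one SHA-1 per line" format and all paths below are assumptions.
import hashlib
from pathlib import Path

def load_known_hashes(path: str) -> set:
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def sha1_of(path: Path) -> str:
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

known = load_known_hashes("rds_sha1.txt")   # assumed export of RDS SHA-1 values
evidence = Path("/mnt/evidence")            # hypothetical mounted seizure image

# Files matching the RDS are known, traceable software and can be
# deprioritized; everything else stays in the manual-review queue.
to_review = [p for p in evidence.rglob("*")
             if p.is_file() and sha1_of(p) not in known]
```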
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Child_Abuse is a dataset for object detection tasks; it contains Abuse annotations for 3,200 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
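For instance, the standard Roboflow Python snippet for pulling a dataset export looks roughly like the sketch below; the API key, workspace, project slug, version number, and export format are placeholders, not identifiers from this dataset page:

```python
# Minimal sketch of downloading a Roboflow dataset export locally.
# API key, workspace, project, version, and format are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("child_abuse")
dataset = project.version(1).download("coco")  # export format is a choice

print(dataset.location)  # local folder with images and annotations
```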
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a collection of toxic sentences belonging to different categories, gathered from various sources. Each category column holds a binary value (1 or 0) indicating whether the sentence belongs to that category, and each sentence belongs to exactly one category (see the sketch after the column list for collapsing these flags into a single label).
Columns:
1. comment_text: Contains toxic sentences that are insensitive and offensive, focusing on various categories.
2. mental_health: Binary value 1 indicates that the sentence focuses on mental health.
3. Race: Binary value 1 indicates that the sentence is racist.
4. sex: Binary value 1 indicates that the sentence focuses on sexuality.
5. body_image: Binary value 1 indicates that the sentence focuses on body image.
6. disability: Binary value 1 indicates that the sentence focuses on physical disability and related issues.
7. religion: Binary value 1 indicates that the sentence can be triggering to people who are extremely religious.
8. physical_abuse: Binary value 1 indicates that the sentence focuses on physical abuse issues.
9. politics: Binary value 1 indicates that the sentence focuses on political issues.
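Because each sentence carries exactly one positive flag, the one-hot category columns collapse cleanly into a single label column. A minimal pandas sketch (the toxic_sentences.csv file name is an assumption; column names follow the list above):

```python
# Minimal sketch: collapse the one-hot category columns into one label.
# The CSV file name is an assumption; column names follow the list above.
import pandas as pd

CATEGORY_COLS = ["mental_health", "Race", "sex", "body_image",
                 "disability", "religion", "physical_abuse", "politics"]

df = pd.read_csv("toxic_sentences.csv")

# Each row has exactly one category set to 1, so idxmax over the binary
# columns recovers that category's name.
df["category"] = df[CATEGORY_COLS].idxmax(axis=1)
print(df[["comment_text", "category"]].head())
```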
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release but not checked or processed, except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. The general and original hypothesis of this study was that multi-slice computed tomography (CT) imaging of decedents in whom elder abuse was suspected or reported might enhance the work of the medical examiner by providing novel information not readily available at conventional autopsy and/or by ruling out the need for a complete conventional autopsy in cases in which abuse findings were negative. This would provide time and cost efficiencies; additional evidentiary support in the form of state-of-the-art images; and, in some cases, compassionate support for families whose religions or cultures required more rapid and/or noninvasive techniques. No quantitative or qualitative data are available with this study. The data comprise the images taken from the autopsies of 57 decedents.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies a study focused on the development and validation of the Technology-Facilitated Sexual Violence Myth Acceptance Scale (TFSVMA), a novel psychometric instrument designed to assess the endorsement of myths that justify or trivialize technology-facilitated sexual violence (TFSV). TFSV includes behaviours such as image-based abuse, online sexual harassment, doxing, and other coercive or abusive acts occurring via digital technologies.

The dataset includes responses from 433 participants, comprising members of the general public, police personnel, and healthcare/social workers in Australia. This diverse sample allows for the assessment of TFSV myth acceptance across both community and professional populations, with implications for secondary victimization and institutional responses to victims.

Data collection and methodology:
- Participants were recruited online through social media, survey distribution platforms, and targeted professional networks.
- Data were collected using a 47-item draft version of the TFSVMA scale, developed based on a comprehensive literature review and expert panel feedback.
- Participants also completed the Acceptance of Modern Myths about Sexual Aggression (AMMSA-21), the Sexual Image-Based Abuse Myth Acceptance Scale (SIAMAS), and the Marlowe-Crowne Social Desirability Scale.
- The final dataset includes demographic variables and anonymized responses to all scale items.

Analytical techniques used include:
- Exploratory and confirmatory factor analyses (EFA/CFA)
- Bifactor modelling
- Network analysis
- Reliability analysis (Cronbach's α, omega coefficients)
- Convergent and discriminant validity testing
- Differential item functioning (DIF) analyses

Ethical approval for data collection was granted by the University of Jaén Human Research Ethics Committee. All participants provided informed consent prior to participation, and all data were collected anonymously to ensure privacy and confidentiality in accordance with ethical guidelines for human research.

This dataset supports the validation of a tool that can inform research, training, and intervention efforts to address TFSV and reduce the normalization of digital sexual violence, especially in professional settings such as law enforcement and healthcare.
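As a side note on the reliability analysis mentioned above, Cronbach's α follows directly from the item and total-score variances. A minimal sketch (the simulated responses are purely illustrative, not data from this study):

```python
# Minimal sketch of Cronbach's alpha:
#   alpha = k/(k-1) * (1 - sum(item variances) / var(total score))
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, k_items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative only: simulated 5-point responses to a 47-item draft scale.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(433, 47)).astype(float)
print(cronbach_alpha(responses))
```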
Recently, messaging applications such as WhatsApp have reportedly been abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse on WhatsApp relies on manipulated images and memes containing all kinds of fake stories. In this work, we performed an extensive data collection from a large set of publicly accessible WhatsApp groups and fact-checking agency websites. This paper opens a novel dataset to the research community containing fact-checked fake images shared through WhatsApp in two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.
Please go through the paper https://misinforeview.hks.harvard.edu/article/images-and-misinformation-in-political-groups-evidence-from-whatsapp-in-india/ to see how the authors used machine learning to detect misinformation shared over WhatsApp during the 2019 Indian elections.
You may remember the Facebook–Cambridge Analytica data breach and how public data was used to manipulate voters' mindsets. Sharing this type of fake image can likewise change voters' mindsets and may end up altering election results. I want to see how Kagglers use this dataset to tackle the pressing problem of misinformation sharing on social media.
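One plausible baseline with this dataset (a sketch of an idea, not the method from the paper above) is to flag re-shares of already fact-checked fakes by perceptual hashing, which tolerates the re-compression and minor edits images pick up as they circulate. The directory names below are hypothetical:

```python
# Minimal sketch: flag near-duplicates of known, fact-checked fake images
# with perceptual hashing (imagehash). Directory names are hypothetical.
from pathlib import Path
from PIL import Image
import imagehash

def phash_dir(folder: str) -> dict:
    """Map perceptual hash -> path for every JPEG in a folder."""
    return {imagehash.phash(Image.open(p)): p for p in Path(folder).glob("*.jpg")}

known_fakes = phash_dir("fact_checked_fakes")  # images from this dataset
incoming = phash_dir("newly_shared_images")    # images to screen

THRESHOLD = 8  # max Hamming distance to treat two images as near-duplicates
for h, path in incoming.items():
    for known_h, fake_path in known_fakes.items():
        if h - known_h <= THRESHOLD:  # '-' gives Hamming distance in imagehash
            print(f"{path} matches known fake {fake_path}")
```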
Please upvote this dataset for better reach.
@dataset{julio_c_s_reis_2020_3779157,
author = {Julio C. S. Reis and
Philipe Melo and
Kiran Garimella and
Jussara M. Almeida and
Dean Eckles and
Fabrício Benevenuto},
title = {{A Dataset of Fact-Checked Images Shared on
WhatsApp during the Brazilian and Indian Elections}},
month = apr,
year = 2020,
publisher = {Zenodo},
doi = {10.5281/zenodo.3779157},
url = {https://doi.org/10.5281/zenodo.3779157}
}