7 datasets found
  1. Data from: Leveraging Self-Supervised Learning for Scene Classification in...

    • zenodo.org
    csv, zip
    Updated Oct 31, 2024
    Cite
    Pedro H. V. Valois; João Macedo; Leo Sampaio Ferraz Ribeiro; Jefersson dos Santos; Sandra Avila (2024). Leveraging Self-Supervised Learning for Scene Classification in Child Sexual Abuse Imagery [Dataset]. http://doi.org/10.5281/zenodo.13910526
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pedro H. V. Valois; João Macedo; Leo Sampaio Ferraz Ribeiro; Jefersson dos Santos; Sandra Avila
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 10, 2024
    Description

    Places8. We introduce a new subset of Places, called Places8, whose classes are selected to highlight the environments most common in Child Sexual Abuse Imagery (CSAI). This is a smaller dataset than the ones used for the pretext task; it represents our downstream task and is used for fine-tuning the model after self-supervised learning.

    Places365-Challenge indoor classes were initially grouped from 159 into 62 new categories following WordNet synonyms and, in some cases, direct hyponyms or related words. For example, bedroom and bedchamber were joined, while child's room was kept as a separate category given its importance in CSAI investigation. Next, we filtered the remapped dataset into 8 final classes drawn from 23 different scenes of Places365-Challenge. The selection of these scenes followed conversations with partner Brazilian Federal Police agents and CSAI investigation and labeling experts. Places365-Challenge already provides training and validation splits, which were mapped accordingly. The test split was then generated as a stratified 10% split from the training set, since the remapping and filtering produced a highly imbalanced dataset. The complete remapping can be seen in the table below under "Original Categories", along with further details on the novel subset.

    Table. Description of the Places8 dataset. The class represents the final label used, while the original categories stand for the original Places365 labels. Places365 already provides training and validation splits mapped accordingly. The test set comes from a stratified 10% split from the training set.

    Class         | Test   | Train   | Val   | %    | Original Categories
    bathroom      | 5,740  | 51,655  | 200   | 13.4 | bathroom, shower
    bedroom       | 11,112 | 100,012 | 600   | 25.9 | bedchamber, bedroom, hotel room, berth, dorm room, youth hostel
    child's room  | 4,650  | 41,849  | 300   | 10.8 | child's room, nursery, playroom
    classroom     | 3,751  | 33,763  | 200   | 8.7  | classroom, kindergarden classroom
    dressing room | 2,432  | 21,889  | 200   | 5.7  | closet, dressing room
    living room   | 9,940  | 89,458  | 500   | 28.7 | home theater, living room, recreation room, television room, waiting room
    studio        | 1,404  | 12,633  | 100   | 3.3  | television studio
    swimming pool | 1,505  | 13,547  | 200   | 3.5  | jacuzzi, swimming pool
    Total         | 40,534 | 364,806 | 2,300 | 100  |
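
    To make the split procedure concrete, here is a minimal sketch of the stratified 10% test split described above, assuming a pandas DataFrame with hypothetical "image" and "class" columns; the file name and random seed are illustrative placeholders, not the authors' exact procedure.

    ```python
    # Stratified hold-out: 10% of the training set becomes the test set,
    # preserving the (highly imbalanced) class proportions in both splits.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("places8_train_full.csv")  # hypothetical release file

    train_df, test_df = train_test_split(
        df,
        test_size=0.10,          # 10% held out for testing
        stratify=df["class"],    # keep per-class ratios identical across splits
        random_state=42,         # any fixed seed, for reproducibility
    )
    print(len(train_df), len(test_df))
    ```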

    As it is not possible to provide the images from the Places8 dataset, we provide the original image names, class names, and splits (training, validation, and test). To use Places8, you must download the images from the Places365-Challenge.
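
    Since only names and splits are released, the listed files must be resolved against a local Places365-Challenge download. A minimal sketch, assuming a hypothetical CSV name, column names, and a directory layout that mirrors the released image names:

    ```python
    # Map released image names onto a local Places365-Challenge extraction
    # and report any files that are missing before training.
    from pathlib import Path
    import pandas as pd

    PLACES365_ROOT = Path("data/places365_challenge")  # wherever you extracted it

    split = pd.read_csv("places8_test.csv")  # hypothetical: "image", "class" columns
    split["path"] = [PLACES365_ROOT / name for name in split["image"]]

    missing = [p for p in split["path"] if not p.exists()]
    print(f"{len(missing)} of {len(split)} listed images not found locally")
    ```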

    Out-of-Distribution (OOD) Scenes. While the introduced Places8 already comprises a test set, we sought to create an additional evaluation set to better understand our approach's limitations when exposed to a domain gap. This is especially necessary considering that CSAI is known to come from diverse demographic and social backgrounds.

    Thus, we designed a small "custom dataset" from online images to check how the model performs outside the controlled nature of Places8. The dataset comprises 80 images: 10 images per class for the 8 Places8 classes (bathroom, bedroom, child's room, classroom, dressing room, living room, studio, and swimming pool).

    The OOD Scenes set is a sample of images taken from Google Images, Bing Images, and the Dollar Street dataset in a 4:3:3 ratio. All images are free to share, modify, and use, including those from Dollar Street, which is licensed under CC BY 4.0 with commercial use permitted.

    Dollar Street is an annotated image dataset of 289 everyday household items photographed in 404 homes across 63 countries worldwide. It contains 38,479 pictures, split among abstractions (image answers to abstract questions), objects, and places within a home. The dataset explicitly depicts underrepresented populations and is grouped by country and income. Not all countries are present, but there is a balanced number of pictures per region, and most images come from families who live on less than USD 1,000 per month.

    These sources were chosen to contrast the web-sourced imagery of the Places dataset with data from underrepresented populations.

  2. National Software Reference Library (NSRL) Reference Data Set (RDS) - NIST...

    • datasets.ai
    • datadiscoverystudio.org
    • +4 more
    Updated Aug 6, 2024
    Cite
    National Institute of Standards and Technology (2024). National Software Reference Library (NSRL) Reference Data Set (RDS) - NIST Special Database 28 [Dataset]. https://datasets.ai/datasets/national-software-reference-library-nsrl-reference-data-set-rds-nist-special-database-28-72db0
    Explore at:
    Available download formats
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The National Software Reference Library (NSRL) collects software from various sources and incorporates file profiles computed from this software into a Reference Data Set (RDS) of information. The RDS can be used by law enforcement, government, and industry organizations to review files on a computer by matching file profiles in the RDS. This alleviates much of the effort involved in determining which files are important as evidence on computers or file systems that have been seized as part of criminal investigations. The RDS is a collection of digital signatures of known, traceable software applications. The hash set does include hash values of applications that may be considered malicious, e.g., steganography tools and hacking scripts. There are no hash values of illicit data, e.g., child abuse images.
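
    To illustrate the kind of known-file filtering the RDS enables, here is a minimal sketch that hashes files on a mounted evidence volume and drops anything whose digest appears in a reference set. The plain-text hash list and path names are assumptions for illustration; consult the NSRL documentation for the actual RDS distribution format.

    ```python
    # Filter out known files by SHA-1: anything matching the reference set
    # can typically be deprioritized during an investigation.
    import hashlib
    from pathlib import Path

    def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
        """Hash in chunks so large evidence files never sit fully in memory."""
        h = hashlib.sha1()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest().upper()

    # Hypothetical file with one known SHA-1 digest per line.
    known = set(Path("rds_sha1.txt").read_text().split())

    evidence = Path("mounted_image")
    unknown = [p for p in evidence.rglob("*")
               if p.is_file() and sha1_of(p) not in known]
    print(f"{len(unknown)} files remain for manual review")
    ```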

  3. Child_abuse Dataset

    • universe.roboflow.com
    zip
    Updated Feb 22, 2024
    Cite
    CBAIT (2024). Child_abuse Dataset [Dataset]. https://universe.roboflow.com/cbait/child_abuse-jlhtk/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 22, 2024
    Dataset authored and provided by
    CBAIT
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Abuse Bounding Boxes
    Description

    Child_Abuse

    ## Overview
    
    Child_Abuse is a dataset for object detection tasks - it contains Abuse annotations for 3,200 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
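
    A minimal download sketch using the `roboflow` pip package, with the workspace, project, and version slugs read off the dataset URL above; the API key is a placeholder you must replace with your own, and the COCO export format is just one assumption among the formats Roboflow offers.

    ```python
    # Download version 1 of the dataset into a local folder via the Roboflow API.
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")           # placeholder key
    project = rf.workspace("cbait").project("child_abuse-jlhtk")
    dataset = project.version(1).download("coco")   # assumed export format
    print(dataset.location)  # local directory with images and annotations
    ```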
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. Toxic Sentence Classification Dataset with labels of categories such as...

    • zenodo.org
    csv, txt
    Updated Nov 21, 2024
    Cite
    Taha Roshan Mohammed; Pradhan Moksha; Mallapalle Babitha Meenakshi; Methuku Paramesh Kumar; Bhaskarjyoti Das (2024). Toxic Sentence Classification Dataset with labels of categories such as religion, mental health, race, sex, body image, disability, physical abuse, and politics [Dataset]. http://doi.org/10.5281/zenodo.14196419
    Explore at:
    Available download formats: csv, txt
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo
    Authors
    Taha Roshan Mohammed; Pradhan Moksha; Mallapalle Babitha Meenakshi; Methuku Paramesh Kumar; Bhaskarjyoti Das
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a collection of toxic sentences drawn from various sources, each labeled with the category it belongs to. The category columns hold binary values (1 or 0) indicating whether a sentence belongs to that category; each sentence belongs to exactly one category (see the loading sketch after the column list).

    Columns:
    1. comment_text: Contains toxic sentences that are insensitive and offensive, focusing on various categories.
    2. mental_health: Binary value 1 indicates that the sentence focuses on mental health.
    3. Race: Binary value 1 indicates that the sentence is racist.
    4. sex: Binary value 1 indicates that the sentence focuses on sexuality.
    5. body_image: Binary value 1 indicates that the sentence focuses on body image.
    6. disability: Binary value 1 indicates that the sentence focuses on physical disability and related issues.
    7. religion: Binary value 1 indicates that the sentence can be triggering to people who are extremely religious.
    8. physical_abuse: Binary value 1 indicates that the sentence focuses on physical abuse issues.
    9. politics: Binary value 1 indicates that the sentence focuses on political issues.
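
    A minimal loading sketch for the layout above, assuming the CSV file name and that the headers match the column list; because exactly one category column is 1 per row, idxmax over those columns recovers a single label.

    ```python
    # Collapse the one-hot category columns into a single label column.
    import pandas as pd

    CATEGORIES = ["mental_health", "Race", "sex", "body_image",
                  "disability", "religion", "physical_abuse", "politics"]

    df = pd.read_csv("toxic_sentences.csv")  # hypothetical file name
    assert (df[CATEGORIES].sum(axis=1) == 1).all()  # one category per sentence
    df["label"] = df[CATEGORIES].idxmax(axis=1)
    print(df["label"].value_counts())
    ```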

  5. Data from: Utility of Whole-Body CT Imaging in the Post Mortem Detection of...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    + more versions
    Cite
    National Institute of Justice (2025). Utility of Whole-Body CT Imaging in the Post Mortem Detection of Elder Abuse and Neglect in Maryland, 2007 [Dataset]. https://catalog.data.gov/dataset/utility-of-whole-body-ct-imaging-in-the-post-mortem-detection-of-elder-abuse-and-neglect-i-f7509
    Explore at:
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Description

    These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.

    The general and original hypothesis to be explored in this study was that multi-slice computed tomography (CT) imaging of decedents in whom elder abuse was suspected or reported might enhance the work of the medical examiner by providing novel information not readily available at conventional autopsy and/or by ruling out the need for complete conventional autopsy in cases in which abuse findings were negative, thereby providing: time and cost efficiencies, additional evidentiary support in the form of state-of-the-art images, and, in some cases, compassionate support for families whose religions or cultures required more rapid and/or noninvasive techniques. No quantitative or qualitative data are available with this study. The data comprise the images taken from the autopsies of 57 decedents.

  6. Technology-Facilitated Sexual Violence Myth Acceptance Validation Dataset

    • figshare.com
    bin
    Updated Apr 23, 2025
    Cite
    Kate Gray (2025). Technology-Facilitated Sexual Violence Myth Acceptance Validation Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28839650.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    figshare
    Authors
    Kate Gray
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies a study focused on the development and validation of the Technology-Facilitated Sexual Violence Myth Acceptance Scale (TFSVMA), a novel psychometric instrument designed to assess the endorsement of myths that justify or trivialize technology-facilitated sexual violence (TFSV). TFSV includes behaviours such as image-based abuse, online sexual harassment, doxing, and other coercive or abusive acts occurring via digital technologies.

    The dataset includes responses from 433 participants, comprising members of the general public, police personnel, and healthcare/social workers in Australia. This diverse sample allows for the assessment of TFSV myth acceptance across both community and professional populations, with implications for secondary victimization and institutional responses to victims.

    Data collection and methodology:
    - Participants were recruited online through social media, survey distribution platforms, and targeted professional networks.
    - Data were collected using a 47-item draft version of the TFSVMA scale, developed based on a comprehensive literature review and expert panel feedback.
    - Participants also completed the Acceptance of Modern Myths about Sexual Aggression (AMMSA-21), the Sexual Image-Based Abuse Myth Acceptance Scale (SIAMAS), and the Marlowe-Crowne Social Desirability Scale.
    - The final dataset includes demographic variables and anonymized responses to all scale items.

    Analytical techniques used include:
    - Exploratory and confirmatory factor analyses (EFA/CFA)
    - Bifactor modelling
    - Network analysis
    - Reliability analysis (Cronbach's α, omega coefficients)
    - Convergent and discriminant validity testing
    - Differential item functioning (DIF) analyses

    Ethical approval for data collection was granted by the University of Jaén Human Research Ethics Committee. All participants provided informed consent prior to participation, and all data were collected anonymously to ensure privacy and confidentiality in accordance with ethical guidelines for human research.

    This dataset supports the validation of a tool that can inform research, training, and intervention efforts to address TFSV and reduce the normalization of digital sexual violence, especially in professional settings such as law enforcement and healthcare.
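
    As a worked example of one reliability analysis named above, here is a minimal sketch computing Cronbach's alpha directly from its definition over an items matrix; the response values are random placeholders, not the study's data.

    ```python
    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
    import numpy as np

    rng = np.random.default_rng(0)
    items = rng.integers(1, 6, size=(433, 47)).astype(float)  # participants x items

    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - sum_item_var / total_var)
    print(f"Cronbach's alpha = {alpha:.3f}")
    ```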

  7. Fact-Checked Images Shared During Elections(IND)

    • kaggle.com
    Updated Jul 9, 2020
    Cite
    Abhinav Prakash (2020). Fact-Checked Images Shared During Elections(IND) [Dataset]. https://www.kaggle.com/abhinavsp0730/factchecked-images-shared-during-electionsind/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abhinav Prakash
    Description

    A Dataset of Fact-Checked Images Shared on WhatsApp during the Indian Elections

    Description

    Recently, messaging applications, such as WhatsApp, have been reportedly abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse in WhatsApp relies on several manipulated images and memes containing all kinds of fake stories. In this work, we performed an extensive data collection from a large set of WhatsApp publicly accessible groups and fact-checking agency websites. This paper opens a novel dataset to the research community containing fact-checked fake images shared through WhatsApp for two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.

    Paper

    Please go through the paper at https://misinforeview.hks.harvard.edu/article/images-and-misinformation-in-political-groups-evidence-from-whatsapp-in-india/. In it you can see how the authors used machine learning to detect misinformation shared over WhatsApp during the 2019 Indian elections.

    Motive

    You may remember the Facebook–Cambridge Analytica data breach, and how public data was used to manipulate voters' mindsets. In the same way, sharing fake images like these can change voters' minds and may alter election results. I want to see how Kagglers use this dataset to tackle the important problem of misinformation shared over social media.

    Upvote

    Please upvote this dataset for better reach.

    Citation

    @dataset{julio_c_s_reis_2020_3779157,
      author    = {Julio C. S. Reis and
                   Philipe Melo and
                   Kiran Garimella and
                   Jussara M. Almeida and
                   Dean Eckles and
                   Fabrício Benevenuto},
      title     = {{A Dataset of Fact-Checked Images Shared on
                    WhatsApp during the Brazilian and Indian Elections}},
      month     = apr,
      year      = 2020,
      publisher = {Zenodo},
      doi       = {10.5281/zenodo.3779157},
      url       = {https://doi.org/10.5281/zenodo.3779157}
    }
    