7 datasets found
  1. Data from: Leveraging Self-Supervised Learning for Scene Classification in...

    • zenodo.org
    csv, zip
    Updated Oct 31, 2024
    Cite
    Pedro H. V. Valois; João Macedo; Leo Sampaio Ferraz Ribeiro; Jefersson dos Santos; Sandra Avila (2024). Leveraging Self-Supervised Learning for Scene Classification in Child Sexual Abuse Imagery [Dataset]. http://doi.org/10.5281/zenodo.13910526
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pedro H. V. Valois; João Macedo; Leo Sampaio Ferraz Ribeiro; Jefersson dos Santos; Sandra Avila
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 10, 2024
    Description

    Places8. We introduce a new subset of Places, called Places8, whose classes are selected to highlight the environments most common in Child Sexual Abuse Imagery (CSAI). This is a smaller dataset than the ones used for the pretext task; it represents our downstream task and is used for fine-tuning the model after self-supervised learning.

    Places365-Challenge indoor classes were initially grouped from 159 into 62 new categories following WordNet synonyms and, in some cases, direct hyponyms or related words. For example, bedroom and bedchamber were joined, while child's room was kept as a separate category given its importance in CSAI investigation. Next, we filtered the remapped dataset into 8 final classes drawn from 23 different scenes of Places365-Challenge. The selection of these scenes followed conversations with partner Brazilian Federal Police agents and CSAI investigation and labeling experts. Places365-Challenge already provides training and validation splits, which were mapped accordingly. The test split was then generated as a stratified 10% split from the training set, since the remapping and filtering produced a highly imbalanced dataset. The complete remapping can be seen in the table below under "Original Categories", along with further details on the novel subset.

    Table. Description of the Places8 dataset. The class represents the final label used, while the original categories stand for the original Places365 labels. Places365 already provides training and validation splits mapped accordingly. The test set comes from a stratified 10% split from the training set.

    Class         | Test   | Train   | Val   | %    | Original Categories
    bathroom      | 5,740  | 51,655  | 200   | 13.4 | bathroom, shower
    bedroom       | 11,112 | 100,012 | 600   | 25.9 | bedchamber, bedroom, hotel room, berth, dorm room, youth hostel
    child's room  | 4,650  | 41,849  | 300   | 10.8 | child's room, nursery, playroom
    classroom     | 3,751  | 33,763  | 200   | 8.7  | classroom, kindergarden classroom
    dressing room | 2,432  | 21,889  | 200   | 5.7  | closet, dressing room
    living room   | 9,940  | 89,458  | 500   | 28.7 | home theater, living room, recreation room, television room, waiting room
    studio        | 1,404  | 12,633  | 100   | 3.3  | television studio
    swimming pool | 1,505  | 13,547  | 200   | 3.5  | jacuzzi, swimming pool
    Total         | 40,534 | 364,806 | 2,300 | 100  |
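
    To make the split procedure concrete, here is a minimal sketch of the stratified 10% test split described above, assuming a pandas DataFrame with hypothetical "image" and "class" columns; the file name and random seed are illustrative placeholders, not the authors' exact procedure.

    ```python
    # Stratified hold-out: 10% of the training set becomes the test set,
    # preserving the (highly imbalanced) class proportions in both splits.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("places8_train_full.csv")  # hypothetical release file

    train_df, test_df = train_test_split(
        df,
        test_size=0.10,          # 10% held out for testing
        stratify=df["class"],    # keep per-class ratios identical across splits
        random_state=42,         # any fixed seed, for reproducibility
    )
    print(len(train_df), len(test_df))
    ```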

    As it is not possible to provide the images from the Places8 dataset, we provide the original image names, class names, and splits (training, validation, and test). To use Places8, you must download the images from the Places365-Challenge.
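
    Since only names and splits are released, the listed files must be resolved against a local Places365-Challenge download. A minimal sketch, assuming a hypothetical CSV name, column names, and a directory layout that mirrors the released image names:

    ```python
    # Map released image names onto a local Places365-Challenge extraction
    # and report any files that are missing before training.
    from pathlib import Path
    import pandas as pd

    PLACES365_ROOT = Path("data/places365_challenge")  # wherever you extracted it

    split = pd.read_csv("places8_test.csv")  # hypothetical: "image", "class" columns
    split["path"] = [PLACES365_ROOT / name for name in split["image"]]

    missing = [p for p in split["path"] if not p.exists()]
    print(f"{len(missing)} of {len(split)} listed images not found locally")
    ```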

    Out-of-Distribution (OOD) Scenes. While the introduced Places8 already comprises a test set, we sought to create an additional evaluation set to better understand our approach's limitations when exposed to a domain gap. This is especially necessary considering that CSAI is known to come from diverse demographic and social backgrounds.

    Thus, we designed a small "custom dataset" from online images to check how the model performs outside the controlled nature of Places8. The dataset comprises 80 images: 10 images per class for the 8 Places8 classes (bathroom, bedroom, child's room, classroom, dressing room, living room, studio, and swimming pool).

    The OOD Scenes set is a sample of images taken from Google Images, Bing Images, and the Dollar Street dataset in a 4:3:3 ratio. All images are free to share, modify, and use, including those from Dollar Street, which is licensed under CC BY 4.0 with commercial use permitted.

    Dollar Street is an annotated image dataset of 289 everyday household items photographed in 404 homes across 63 countries worldwide. It contains 38,479 pictures, split among abstractions (image answers to abstract questions), objects, and places within a home. The dataset explicitly depicts underrepresented populations and is grouped by country and income. Not all countries are present, but there is a balanced number of pictures per region, and most images come from families who live on less than USD 1,000 per month.

    These sources were chosen to contrast the web-sourced imagery of the Places dataset with data from underrepresented populations.

  2. National Software Reference Library (NSRL) Reference Data Set (RDS) - NIST...

    • datasets.ai
    • datadiscoverystudio.org
    • +4 more
    Updated Aug 6, 2024
    Cite
    National Institute of Standards and Technology (2024). National Software Reference Library (NSRL) Reference Data Set (RDS) - NIST Special Database 28 [Dataset]. https://datasets.ai/datasets/national-software-reference-library-nsrl-reference-data-set-rds-nist-special-database-28-72db0
    Explore at:
    Available download formats
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The National Software Reference Library (NSRL) collects software from various sources and incorporates file profiles computed from this software into a Reference Data Set (RDS) of information. The RDS can be used by law enforcement, government, and industry organizations to review files on a computer by matching file profiles in the RDS. This alleviates much of the effort involved in determining which files are important as evidence on computers or file systems that have been seized as part of criminal investigations. The RDS is a collection of digital signatures of known, traceable software applications. The hash set does include hash values of applications that may be considered malicious, e.g., steganography tools and hacking scripts. There are no hash values of illicit data, e.g., child abuse images.
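
    To illustrate the kind of known-file filtering the RDS enables, here is a minimal sketch that hashes files on a mounted evidence volume and drops anything whose digest appears in a reference set. The plain-text hash list and path names are assumptions for illustration; consult the NSRL documentation for the actual RDS distribution format.

    ```python
    # Filter out known files by SHA-1: anything matching the reference set
    # can typically be deprioritized during an investigation.
    import hashlib
    from pathlib import Path

    def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
        """Hash in chunks so large evidence files never sit fully in memory."""
        h = hashlib.sha1()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest().upper()

    # Hypothetical file with one known SHA-1 digest per line.
    known = set(Path("rds_sha1.txt").read_text().split())

    evidence = Path("mounted_image")
    unknown = [p for p in evidence.rglob("*")
               if p.is_file() and sha1_of(p) not in known]
    print(f"{len(unknown)} files remain for manual review")
    ```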

  3. Child_abuse Dataset

    • universe.roboflow.com
    zip
    Updated Feb 22, 2024
    Cite
    CBAIT (2024). Child_abuse Dataset [Dataset]. https://universe.roboflow.com/cbait/child_abuse-jlhtk/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 22, 2024
    Dataset authored and provided by
    CBAIT
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Abuse Bounding Boxes
    Description

    Child_Abuse

    ## Overview
    
    Child_Abuse is a dataset for object detection tasks - it contains Abuse annotations for 3,200 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
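
    A minimal download sketch using the `roboflow` pip package, with the workspace, project, and version slugs read off the dataset URL above; the API key is a placeholder you must replace with your own, and the COCO export format is just one assumption among the formats Roboflow offers.

    ```python
    # Download version 1 of the dataset into a local folder via the Roboflow API.
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")           # placeholder key
    project = rf.workspace("cbait").project("child_abuse-jlhtk")
    dataset = project.version(1).download("coco")   # assumed export format
    print(dataset.location)  # local directory with images and annotations
    ```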
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. Toxic Sentence Classification Dataset with labels of categories such as...

    • zenodo.org
    csv, txt
    Updated Nov 21, 2024
    Cite
    Taha Roshan Mohammed; Pradhan Moksha; Mallapalle Babitha Meenakshi; Methuku Paramesh Kumar; Bhaskarjyoti Das (2024). Toxic Sentence Classification Dataset with labels of categories such as religion, mental health, race, sex, body image, disability, physical abuse, and politics [Dataset]. http://doi.org/10.5281/zenodo.14196419
    Explore at:
    Available download formats: csv, txt
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo
    Authors
    Taha Roshan Mohammed; Pradhan Moksha; Mallapalle Babitha Meenakshi; Methuku Paramesh Kumar; Bhaskarjyoti Das
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a collection of toxic sentences drawn from various sources, each labeled with the category it belongs to. The category columns hold binary values (1 or 0) indicating whether a sentence belongs to that category; each sentence belongs to exactly one category (see the loading sketch after the column list).

    Columns:
    1. comment_text: Contains toxic sentences that are insensitive and offensive, focusing on various categories.
    2. mental_health: Binary value 1 indicates that the sentence focuses on mental health.
    3. Race: Binary value 1 indicates that the sentence is racist.
    4. sex: Binary value 1 indicates that the sentence focuses on sexuality.
    5. body_image: Binary value 1 indicates that the sentence focuses on body image.
    6. disability: Binary value 1 indicates that the sentence focuses on physical disability and related issues.
    7. religion: Binary value 1 indicates that the sentence can be triggering to people who are extremely religious.
    8. physical_abuse: Binary value 1 indicates that the sentence focuses on physical abuse issues.
    9. politics: Binary value 1 indicates that the sentence focuses on political issues.
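
    A minimal loading sketch for the layout above, assuming the CSV file name and that the headers match the column list; because exactly one category column is 1 per row, idxmax over those columns recovers a single label.

    ```python
    # Collapse the one-hot category columns into a single label column.
    import pandas as pd

    CATEGORIES = ["mental_health", "Race", "sex", "body_image",
                  "disability", "religion", "physical_abuse", "politics"]

    df = pd.read_csv("toxic_sentences.csv")  # hypothetical file name
    assert (df[CATEGORIES].sum(axis=1) == 1).all()  # one category per sentence
    df["label"] = df[CATEGORIES].idxmax(axis=1)
    print(df["label"].value_counts())
    ```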

  5. Data from: Utility of Whole-Body CT Imaging in the Post Mortem Detection of...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    + more versions
    Cite
    National Institute of Justice (2025). Utility of Whole-Body CT Imaging in the Post Mortem Detection of Elder Abuse and Neglect in Maryland, 2007 [Dataset]. https://catalog.data.gov/dataset/utility-of-whole-body-ct-imaging-in-the-post-mortem-detection-of-elder-abuse-and-neglect-i-f7509
    Explore at:
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Description

    These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.

    The general and original hypothesis to be explored in this study was that multi-slice computed tomography (CT) imaging of decedents in whom elder abuse was suspected or reported might enhance the work of the medical examiner by providing novel information not readily available at conventional autopsy and/or by ruling out the need for complete conventional autopsy in cases in which abuse findings were negative, thereby providing: time and cost efficiencies, additional evidentiary support in the form of state-of-the-art images, and, in some cases, compassionate support for families whose religions or cultures required more rapid and/or noninvasive techniques. No quantitative or qualitative data are available with this study. The data comprise the images taken from the autopsies of 57 decedents.

  6. Technology-Facilitated Sexual Violence Myth Acceptance Validation Dataset

    • figshare.com
    bin
    Updated Apr 23, 2025
    Cite
    Kate Gray (2025). Technology-Facilitated Sexual Violence Myth Acceptance Validation Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28839650.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    figshare
    Authors
    Kate Gray
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies a study focused on the development and validation of the Technology-Facilitated Sexual Violence Myth Acceptance Scale (TFSVMA), a novel psychometric instrument designed to assess the endorsement of myths that justify or trivialize technology-facilitated sexual violence (TFSV). TFSV includes behaviours such as image-based abuse, online sexual harassment, doxing, and other coercive or abusive acts occurring via digital technologies.

    The dataset includes responses from 433 participants, comprising members of the general public, police personnel, and healthcare/social workers in Australia. This diverse sample allows for the assessment of TFSV myth acceptance across both community and professional populations, with implications for secondary victimization and institutional responses to victims.

    Data collection and methodology:
    - Participants were recruited online through social media, survey distribution platforms, and targeted professional networks.
    - Data were collected using a 47-item draft version of the TFSVMA scale, developed based on a comprehensive literature review and expert panel feedback.
    - Participants also completed the Acceptance of Modern Myths about Sexual Aggression (AMMSA-21), the Sexual Image-Based Abuse Myth Acceptance Scale (SIAMAS), and the Marlowe-Crowne Social Desirability Scale.
    - The final dataset includes demographic variables and anonymized responses to all scale items.

    Analytical techniques used include:
    - Exploratory and confirmatory factor analyses (EFA/CFA)
    - Bifactor modelling
    - Network analysis
    - Reliability analysis (Cronbach's α, omega coefficients)
    - Convergent and discriminant validity testing
    - Differential item functioning (DIF) analyses

    Ethical approval for data collection was granted by the University of Jaén Human Research Ethics Committee. All participants provided informed consent prior to participation, and all data were collected anonymously to ensure privacy and confidentiality in accordance with ethical guidelines for human research.

    This dataset supports the validation of a tool that can inform research, training, and intervention efforts to address TFSV and reduce the normalization of digital sexual violence, especially in professional settings such as law enforcement and healthcare.
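
    As a worked example of one reliability analysis named above, here is a minimal sketch computing Cronbach's alpha directly from its definition over an items matrix; the response values are random placeholders, not the study's data.

    ```python
    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
    import numpy as np

    rng = np.random.default_rng(0)
    items = rng.integers(1, 6, size=(433, 47)).astype(float)  # participants x items

    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - sum_item_var / total_var)
    print(f"Cronbach's alpha = {alpha:.3f}")
    ```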

  7. Fact-Checked Images Shared During Elections(IND)

    • kaggle.com
    Updated Jul 9, 2020
    Cite
    Abhinav Prakash (2020). Fact-Checked Images Shared During Elections(IND) [Dataset]. https://www.kaggle.com/abhinavsp0730/factchecked-images-shared-during-electionsind/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abhinav Prakash
    Description

    A Dataset of Fact-Checked Images Shared on WhatsApp during the Indian Elections

    Description

    Recently, messaging applications, such as WhatsApp, have been reportedly abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse in WhatsApp relies on several manipulated images and memes containing all kinds of fake stories. In this work, we performed an extensive data collection from a large set of WhatsApp publicly accessible groups and fact-checking agency websites. This paper opens a novel dataset to the research community containing fact-checked fake images shared through WhatsApp for two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.

    Paper

    Please go through the paper at https://misinforeview.hks.harvard.edu/article/images-and-misinformation-in-political-groups-evidence-from-whatsapp-in-india/. In it you can see how the authors used machine learning to detect misinformation shared over WhatsApp during the 2019 Indian elections.

    Motive

    You may remember the Facebook–Cambridge Analytica data breach, and how public data was used to manipulate voters' mindsets. In the same way, sharing fake images like these can change voters' minds and may alter election results. I want to see how Kagglers use this dataset to tackle the important problem of misinformation shared over social media.

    Upvote

    Please upvote this dataset for better reach.

    Citation

    @dataset{julio_c_s_reis_2020_3779157,
      author    = {Julio C. S. Reis and
                   Philipe Melo and
                   Kiran Garimella and
                   Jussara M. Almeida and
                   Dean Eckles and
                   Fabrício Benevenuto},
      title     = {{A Dataset of Fact-Checked Images Shared on
                    WhatsApp during the Brazilian and Indian Elections}},
      month     = apr,
      year      = 2020,
      publisher = {Zenodo},
      doi       = {10.5281/zenodo.3779157},
      url       = {https://doi.org/10.5281/zenodo.3779157}
    }
    