83 datasets found
  1. EmotionLib Media Filter Dataset Extended + Inter

    • kaggle.com
    zip
    Updated Sep 28, 2025
    + more versions
    Cite
    SAI Course (2025). EmotionLib Media Filter Dataset Extended + Inter [Dataset]. https://www.kaggle.com/datasets/saicourse/emotionlib-media-filter-dataset-extended-inter
    Explore at:
Available download formats: zip (33512290 bytes)
    Dataset updated
    Sep 28, 2025
    Authors
    SAI Course
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Data for Training and Evaluating Video Content Safety Classifiers

    Context

    This dataset is part of the EmotionLib project, an open-source C library designed for real-time media content analysis. The primary goal of EmotionLib is to identify potentially sensitive material (NSFW/Gore) and predict appropriate age ratings (MPAA-like).

    This specific dataset was created to train and evaluate the final classification stage implemented within the samp.dll component of EmotionLib. This stage uses a hybrid model (involving pre-calculated features and an LSTM) to make a final decision on video safety and rating based on inputs from earlier processing stages.

    Content

    The dataset consists of two main parts:

    1. mlp-intermediate.csv:

      • This file contains pre-computed features for each video clip.
      • filename: An anonymized identifier for the video clip (e.g., G-101, R-1020, NC-17-118, GORE-1). These filenames are intentionally anonymized and DO NOT correspond to the original video titles. This was done to address ethical concerns, as the dataset contains a significant amount of NSFW (Not Safe For Work) and Gore content derived from various movies and media.
      • predict1 to predict30: These 30 columns represent the intermediate outputs from an ensemble of three Multilayer Perceptrons (MLPs). The architectures of these MLPs were found using Neural Architecture Search (NAS) techniques (inspired by works like F. Minutoli, M. Ciranni). These NAS-MLPs processed various statistical features (mean, std dev, skewness, kurtosis, etc.) extracted from the per-frame analysis performed by filter.dll and positiveness.dll. These 30 features serve as condensed statistical representations of the video content.
      • target: The ground truth label for the safety classification task.
        • 0.0: Represents content deemed "Safe for Work" (derived from original MPAA ratings G, PG, PG-13, R).
        • 1.0: Represents content deemed "Not Safe for Work" or potentially harmful (derived from original MPAA rating NC-17 or explicit Gore classifications).
    2. /Data Directory:

      • This directory contains the raw, per-frame analysis outputs from the initial EmotionLib components for each corresponding anonymized filename. These are stored in binary files:
      • .efp files (EmotionLib Filter Predictions):
        • Generated by filter.dll.
        • Binary format: int32 (num_frames), int32 (frame_sec_interval), followed by num_frames records of float32 (Safe prob.), float32 (Explicit prob.), float32 (Gore prob.).
      • .epp files (EmotionLib Positiveness Predictions):
        • Generated by positiveness.dll.
        • Binary format: int32 (num_frames), int32 (frame_sec_interval), followed by num_frames records of float32 (Negative prob.), float32 (Positive prob.).

    Anonymization and Ethical Considerations

    As mentioned, the filenames in this dataset are anonymized (G-xxx, PG-xxx, R-xxxx, NC-17-xxx, GORE-x, etc.) and do not reflect the original source titles. This is a crucial ethical consideration due to the inclusion of sensitive content (NSFW/Gore) from various media sources. Providing original titles could lead to direct association with potentially disturbing or copyrighted material outside the intended research context of evaluating the EmotionLib filtering system. The focus is on the content patterns recognized by the preliminary filters, not the specific source media itself.

    Examples and Dataset Characteristics

    The dataset includes a diverse range of content types and presents interesting challenges for classification models.

    • MPAA Rating Challenges: When evaluating models trained on this dataset (such as the MPAA prediction part of samp.dll, although the primary target here is safety), some misclassifications highlighted the difficulty of the task. For instance, models sometimes struggled with boundary cases (the most deviant examples):

      • 'Kickboxer' (1989, typically R) as PG-13 equivalent (filename: R-1010).
      • 'Titanic' (PG-13) as R equivalent (filename: PG-13-404).
      • 'Sonic the Hedgehog' (PG) as PG-13 equivalent (filename: PG-353).
      • 'Way of the Dragon' (1972, often PG but contains significant martial arts violence sometimes pushing it towards R in modern contexts) as PG equivalent (filename: R-1020).
    • Video Length: The dataset contains clips of varying durations. The longest video processed corresponds to the anonymized file R-1041, with a duration of approximately 6 hours and 26 minutes. This clip, derived from the anime series 'Fate/Zero', was correctly identified as requiring an R-equivalent rating by the system based on its content. This demonstrates the syste...

  2. A Baseflow Filter for Hydrologic Models in R

    • catalog.data.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). A Baseflow Filter for Hydrologic Models in R [Dataset]. https://catalog.data.gov/dataset/a-baseflow-filter-for-hydrologic-models-in-r-41440
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
    Description

A Baseflow Filter for Hydrologic Models in R. Resources in this dataset: Resource Title: A Baseflow Filter for Hydrologic Models in R. File Name: Web Page. URL: https://www.ars.usda.gov/research/software/download/?softwareid=383&modecode=20-72-05-00 (download page)

  3. Global exporters importers-export import data of R o filter

    • volza.com
    csv
    Updated Nov 14, 2025
    + more versions
    Cite
    Volza FZ LLC (2025). Global exporters importers-export import data of R o filter [Dataset]. https://www.volza.com/trade-data-global/global-exporters-importers-export-import-data-of-r+o+filter
    Explore at:
Available download formats: csv
    Dataset updated
    Nov 14, 2025
    Dataset authored and provided by
    Volza FZ LLC
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of exporters, Count of importers, Count of shipments, Sum of export import value
    Description

672 global export/import shipment records of R o filter, with prices, volumes, and current buyer/supplier relationships, based on an actual global export trade database.

  4. Raw data and R filtering code for "An investigation of genetic connectivity...

    • datasetcatalog.nlm.nih.gov
    • opal.latrobe.edu.au
    • +1more
    Updated Mar 17, 2022
    Cite
    Tonkin, Zeb; Dawson, David; Amtstaetter, Frank; Lyon, Jarod; Harrisson, Katherine; Murphy, Nicholas; O'Dwyer, James; Koster, Wayne (2022). Raw data and R filtering code for "An investigation of genetic connectivity shines a light on the relative roles of isolation by distance and oceanic currents in three diadromous fish species" [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000445393
    Explore at:
    Dataset updated
    Mar 17, 2022
    Authors
    Tonkin, Zeb; Dawson, David; Amtstaetter, Frank; Lyon, Jarod; Harrisson, Katherine; Murphy, Nicholas; O'Dwyer, James; Koster, Wayne
    Description

This dataset contains the raw SNP output files for each of the three diadromous species studied in this manuscript. Additionally, the covariates file containing all environmental and individual data for all individuals is included. All R code used to filter SNPs to the quality thresholds within this paper is also provided.

  5. Meta-Analysis and modeling of vegetated filter removal of sediment using...

    • catalog.data.gov
    Updated Nov 22, 2021
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Meta-Analysis and modeling of vegetated filter removal of sediment using global dataset [Dataset]. https://catalog.data.gov/dataset/meta-analysis-and-modeling-of-vegetated-filter-removal-of-sediment-using-global-dataset
    Explore at:
    Dataset updated
    Nov 22, 2021
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    Data on vegetated filter strips, sediment loading into and out of riparian corridors/buffers (VFS), removal efficiency of sediment, meta-analysis of removal efficiencies, dimensional analysis of predictor variables, and regression modeling of VFS removal efficiencies. This dataset is associated with the following publication: Ramesh, R., L. Kalin, M. Hantush, and A. Chaudhary. A secondary assessment of sediment trapping effectiveness by vegetated buffers. ECOLOGICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 159: 106094, (2021).

  6. Dataset from: Browsing is a strong filter for savanna tree...

    • data.niaid.nih.gov
    Updated Oct 1, 2021
    Cite
    Archibald, Sally; Wayne Twine; Craddock Mthabini; Nicola Stevens (2021). Dataset from : Browsing is a strong filter for savanna tree seedlings in their first growing season [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4972083
    Explore at:
    Dataset updated
    Oct 1, 2021
    Dataset provided by
    Centre for African Ecology, School of Animal Plant and Environmental Sciences, University of Witwatersrand, Johannesburg, South Africa
    School of Animal Plant and Environmental Sciences, University of Witwatersrand, Johannesburg, South Africa
    Centre for African Ecology, School of Animal Plant and Environmental Sciences, University of Witwatersrand, Johannesburg, South Africa AND Environmental Change Institute, School of Geography and the Environment, University of Oxford, Oxford OX1 3QY, United Kingdom
    Authors
    Archibald, Sally; Wayne Twine; Craddock Mthabini; Nicola Stevens
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data presented here were used to produce the following paper:

    Archibald, Twine, Mthabini, Stevens (2021) Browsing is a strong filter for savanna tree seedlings in their first growing season. J. Ecology.

    The project under which these data were collected is: Mechanisms Controlling Species Limits in a Changing World. NRF/SASSCAL Grant number 118588

    For information on the data or analysis please contact Sally Archibald: sally.archibald@wits.ac.za

    Description of file(s):

File 1: cleanedData_forAnalysis.csv (required to run the R code "finalAnalysis_PostClipResponses_Feb2021_requires_cleanData_forAnalysis_.R")

    The data represent monthly survival and growth data for ~740 seedlings from 10 species under various levels of clipping.

    The data consist of one .csv file with the following column names:

    • treatment: Clipping treatment (1 - 5 months clip plus control unclipped)
    • plot_rep: One of three randomised plots per treatment
    • matrix_no: Where in the plot the individual was placed
    • species_code: First three letters of the genus name and first three letters of the species name; uniquely identifies the species
    • species: Full species name
    • sample_period: Classification of sampling period into time since clip
    • status: Alive or Dead
    • standing.height: Vertical height above ground (in mm)
    • height.mm: Length of the longest branch (in mm)
    • total.branch.length: Total length of all the branches (in mm)
    • stemdiam.mm: Basal stem diameter (in mm)
    • maxSpineLength.mm: Length of the longest spine
    • postclipStemNo: Number of resprouting stems (only recorded AFTER clipping)
    • date.clipped: Date clipped
    • date.measured: Date measured
    • date.germinated: Date germinated
    • Age.of.plant: Date measured minus date germinated
    • newtreat: Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)

File 2: Herbivory_SurvivalEndofSeason_march2017.csv (required to run the R code "FinalAnalysisResultsSurvival_requires_Herbivory_SurvivalEndofSeason_march2017.R")

    The data consist of one .csv file with the following column names:

    • treatment: Clipping treatment (1 - 5 months clip plus control unclipped)
    • plot_rep: One of three randomised plots per treatment
    • matrix_no: Where in the plot the individual was placed
    • species_code: First three letters of the genus name and first three letters of the species name; uniquely identifies the species
    • species: Full species name
    • sample_period: Classification of sampling period into time since clip
    • status: Alive or Dead
    • standing.height: Vertical height above ground (in mm)
    • height.mm: Length of the longest branch (in mm)
    • total.branch.length: Total length of all the branches (in mm)
    • stemdiam.mm: Basal stem diameter (in mm)
    • maxSpineLength.mm: Length of the longest spine
    • postclipStemNo: Number of resprouting stems (only recorded AFTER clipping)
    • date.clipped: Date clipped
    • date.measured: Date measured
    • date.germinated: Date germinated
    • Age.of.plant: Date measured minus date germinated
    • newtreat: Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
    • genus: Genus
    • MAR: Mean Annual Rainfall for that species' distribution (mm)
    • rainclass: High/medium/low

File 3: allModelParameters_byAge.csv (required to run the R code "FinalModelSeedlingSurvival_June2021_.R")

    Consists of a .csv file with the following column headings

    • Age.of.plant: Age in days
    • species_code: Species code
    • pred_SD_mm: Predicted stem diameter in mm
    • pred_SD_up: Top 75th quantile of stem diameter in mm
    • pred_SD_low: Bottom 25th quantile of stem diameter in mm
    • treatdate: Date when clipped
    • pred_surv: Predicted survival probability
    • pred_surv_low: Predicted 25th quantile survival probability
    • pred_surv_high: Predicted 75th quantile survival probability
    • Bite.probability: Daily probability of being eaten
    • max_bite_diam_duiker_mm: Maximum bite diameter of a duiker for this species
    • duiker_sd: Standard deviation of bite diameter for a duiker for this species
    • max_bite_diameter_kudu_mm: Maximum bite diameter of a kudu for this species
    • kudu_sd: Standard deviation of bite diameter for a kudu for this species
    • mean_bite_diam_duiker_mm: Mean bite diameter of a duiker for this species
    • duiker_mean_sd: Standard deviation of the mean bite diameter for a duiker
    • mean_bite_diameter_kudu_mm: Mean bite diameter of a kudu for this species
    • kudu_mean_sd: Standard deviation of the mean bite diameter for a kudu
    • genus: Genus
    • rainclass: Low/med/high

File 4: EatProbParameters_June2020.csv (required to run the R code "FinalModelSeedlingSurvival_June2021_.R")

    Consists of a .csv file with the following column headings

    • shtspec: Species name
    • species_code: Species code
    • genus: Genus
    • rainclass: Low/medium/high
    • seed mass: Mass of seed (g per 1000 seeds)
    • Surv_intercept: Coefficient of the model predicting survival from age of clip for this species
    • Surv_slope: Coefficient of the model predicting survival from age of clip for this species
    • GR_intercept: Coefficient of the model predicting stem diameter from seedling age for this species
    • GR_slope: Coefficient of the model predicting stem diameter from seedling age for this species
    • max_bite_diam_duiker_mm: Maximum bite diameter of a duiker for this species
    • duiker_sd: Standard deviation of bite diameter for a duiker for this species
    • max_bite_diameter_kudu_mm: Maximum bite diameter of a kudu for this species
    • kudu_sd: Standard deviation of bite diameter for a kudu for this species
    • mean_bite_diam_duiker_mm: Mean bite diameter of a duiker for this species
    • duiker_mean_sd: Standard deviation of the mean bite diameter for a duiker
    • mean_bite_diameter_kudu_mm: Mean bite diameter of a kudu for this species
    • kudu_mean_sd: Standard deviation of the mean bite diameter for a kudu
    • AgeAtEscape_duiker[t]: Age of plant when its stem diameter is larger than a mean duiker bite
    • AgeAtEscape_duiker_min[t]: Age of plant when its stem diameter is larger than a min duiker bite
    • AgeAtEscape_duiker_max[t]: Age of plant when its stem diameter is larger than a max duiker bite
    • AgeAtEscape_kudu[t]: Age of plant when its stem diameter is larger than a mean kudu bite
    • AgeAtEscape_kudu_min[t]: Age of plant when its stem diameter is larger than a min kudu bite
    • AgeAtEscape_kudu_max[t]: Age of plant when its stem diameter is larger than a max kudu bite

  7. Studies Generation R Autism-Spectrum Quotient (AQ-28) Filter results

    • data.individualdevelopment.nl
    Updated Oct 17, 2024
    Cite
    (2024). Studies Generation R Autism-Spectrum Quotient (AQ-28) Filter results [Dataset]. https://data.individualdevelopment.nl/dataset/f14199ca2602b31991ce13fb1fbd06aa
    Explore at:
    Dataset updated
    Oct 17, 2024
    Description

The Autism-Spectrum Quotient (AQ-28) is a 28-item questionnaire that assesses self-reported autistic-like traits in adults with normal intelligence. The questionnaire rates symptoms of autism spectrum disorder on a 4-point Likert scale. In Generation R, a validated abbreviated version, the AQ-28, was used among parents.

  8. Studies Generation R Five-Minute Speech Sample Filter results

    • data.individualdevelopment.nl
    Updated Oct 17, 2024
    Cite
    (2024). Studies Generation R Five-Minute Speech Sample Filter results [Dataset]. https://data.individualdevelopment.nl/dataset/d225138c28a45d30826a944b3db136d1
    Explore at:
    Dataset updated
    Oct 17, 2024
    Description

The Coding of Expressed Emotion from the Five Minute Speech Sample is a method to assess the emotional climate in families. The Five Minute Speech Sample is a task in which a family member speaks about a topic of their choice for five minutes while being recorded. The sample is then transcribed and coded for emotional expression using the EE coding system, which has three components: criticism, emotional overinvolvement, and warmth. The speech sample was obtained during a home visit during pregnancy. Generation R used an adapted version of the Expressed Emotion coding.

  9. Studies Generation R Rosenberg Self-Esteem Scale (RSE) Filter results

    • data.individualdevelopment.nl
    Updated Oct 17, 2024
    + more versions
    Cite
    (2024). Studies Generation R Rosenberg Self-Esteem Scale (RSE) Filter results [Dataset]. https://data.individualdevelopment.nl/dataset/57d6fff6dc4d6dc79ce38cf56fafa3b8
    Explore at:
    Dataset updated
    Oct 17, 2024
    Description

    The Rosenberg Self-Esteem Scale (RSE) is a 10-item scale that measures global self-worth in adolescents by measuring both positive (5 items) and negative (5 items) feelings about the self. Although originally constructed as a Guttman-type scale (i.e., items with an ordinal pattern on the attribute), most researchers use a 4-point response format ranging from strongly agree to strongly disagree.

  10. Data, Statistical Models and R package

    • figshare.com
    zip
    Updated Jun 4, 2023
    Cite
    Natália Mello; Luís Gustavo Sanchez; Felipe Gawryszewski (2023). Data, Statistical Models and R package [Dataset]. http://doi.org/10.6084/m9.figshare.14547441.v1
    Explore at:
Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
Figshare (http://figshare.com/)
    Authors
    Natália Mello; Luís Gustavo Sanchez; Felipe Gawryszewski
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Statistical models and data. Unzip the files and open the data files in R using the readRDS function. Access models and data using the brms package. Data are stored in model_name$data. A custom-made R package, Argiope, is included to process RAW images.

  11. NCHS mortality data 2014-2022

    • zenodo.org
    bin
    Updated Jul 24, 2024
    Cite
Weinberger, Daniel (2024). NCHS mortality data 2014-2022 [Dataset]. http://doi.org/10.5281/zenodo.12808102
    Explore at:
Available download formats: bin
    Dataset updated
    Jul 24, 2024
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
Weinberger, Daniel
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

This is a database (parquet format) containing publicly available multiple-cause mortality data from the US (CDC/NCHS) for 2014-2022. Not all variables are included in this export. Please see below for restrictions on the use of these data imposed by NCHS. You can use the arrow package in R to open the file. See https://github.com/DanWeinberger/pneumococcal_mortality/blob/main/analysis_nongeo.Rmd for an example analysis. For instance, save this file in a folder called "parquet3":

    library(arrow)
    library(dplyr)

    pneumo.deaths.in <- open_dataset("R:/parquet3", format = "parquet") %>% # open the dataset
      filter(grepl("J13|A39|J181|A403|B953|G001", all_icd)) %>% # keep records that have the selected ICD codes
      collect() # pull the dataset into memory; do as many operations as you can before calling collect() to avoid memory issues

    The variables included are named: (see full dictionary:https://www.cdc.gov/nchs/nvss/mortality_public_use_data.htm)

    year: Calendar year of death

    month: Calendar month of death

age_detail_number: a number indicating the year or part of a year; it cannot be interpreted by itself here. See the agey variable instead.

    sex: M/F

place_of_death: Place of Death and Decedent’s Status

    1 ... Hospital, Clinic or Medical Center - Inpatient
    2 ... Hospital, Clinic or Medical Center - Outpatient or admitted to Emergency Room
    3 ... Hospital, Clinic or Medical Center - Dead on Arrival
    4 ... Decedent’s home
    5 ... Hospice facility
    6 ... Nursing home/long term care
    7 ... Other
    9 ... Place of death unknown

    all_icd: Cause of death coded as ICD10 codes. ICD1-ICD21 pasted into a single string, with separation of codes by an underscore

    hisp_recode: 0=Non-Hispanic; 1=Hispanic; 999= Not specified

    race_recode: race coding prior to 2018 (reconciled in race_recode_new)

    race_recode_alt: race coding after 2018 (reconciled in race_recode_new)

    race_recode_new:

    1='White'

    2= 'Black'

    3='Hispanic'

    4='American Indian'

    5='Asian/Pacific Islanders'

agey: age in years (or partial years for kids <12 months)
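Since all_icd concatenates ICD1-ICD21 with underscore separators (per the all_icd entry above), a record can be expanded back into individual codes with a small helper. This is a hypothetical sketch, not part of the dataset's tooling:

```python
def split_icd(all_icd):
    """Expand the underscore-separated all_icd string into individual ICD-10 codes."""
    return [code for code in all_icd.split("_") if code]

# e.g. split_icd("J13_A403") -> ["J13", "A403"]
```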

    https://www.cdc.gov/nchs/data_access/restrictions.htm

    Please Read Carefully Before Using NCHS Public Use Survey Data

    The National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC), conducts statistical and epidemiological activities under the authority granted by the Public Health Service Act (42 U.S.C. § 242k). NCHS survey data are protected by Federal confidentiality laws including Section 308(d) Public Health Service Act [42 U.S.C. 242m(d)] and the Confidential Information Protection and Statistical Efficiency Act or CIPSEA [Pub. L. No. 115-435, 132 Stat. 5529 § 302]. These confidentiality laws state the data collected by NCHS may be used only for statistical reporting and analysis. Any effort to determine the identity of individuals and establishments violates the assurances of confidentiality provided by federal law.

    Terms and Conditions

    NCHS does all it can to assure that the identity of individuals and establishments cannot be disclosed. All direct identifiers, as well as any characteristics that might lead to identification, are omitted from the dataset. Any intentional identification or disclosure of an individual or establishment violates the assurances of confidentiality given to the providers of the information. Therefore, users will:

    1. Use the data in this dataset for statistical reporting and analysis only.
    2. Make no attempt to learn the identity of any person or establishment included in these data.
    3. Not link this dataset with individually identifiable data from other NCHS or non-NCHS datasets.
    4. Not engage in any efforts to assess disclosure methodologies applied to protect individuals and establishments or any research on methods of re-identification of individuals and establishments.

    By using these data you signify your agreement to comply with the above-stated statutorily based requirements.

    Sanctions for Violating NCHS Data Use Agreement

Anyone who willfully discloses any information that could identify a person or establishment, in any manner, to a person or agency not entitled to receive it shall be guilty of a class E felony and may be imprisoned for not more than 5 years, fined not more than $250,000, or both.

  12. Theft Filter

    • data.cityofchicago.org
    Updated Dec 1, 2025
    + more versions
    Cite
    Chicago Police Department (2025). Theft Filter [Dataset]. https://data.cityofchicago.org/Public-Safety/Theft-Filter/aqvv-ggim
    Explore at:
    application/geo+json, xml, kml, csv, xlsx, kmzAvailable download formats
    Dataset updated
    Dec 1, 2025
    Authors
    Chicago Police Department
    Description

    This dataset reflects reported incidents of crime (with the exception of murders, where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org.

    Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that has not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation, and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information, and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of, this information. All data visualizations on maps should be considered approximate, and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user.

    The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words, or the unauthorized use of the Chicago Police Department logo, is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as WordPad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e

  13. claude-filter

    • huggingface.co
    Updated Jan 14, 2025
    Cite
    Lipika R (2025). claude-filter [Dataset]. https://huggingface.co/datasets/lra10/claude-filter
    Explore at:
Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 14, 2025
    Authors
    Lipika R
    Description

    lra10/claude-filter dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. Data from: Electricity cost of rapid filter backwashing in a water treatment...

    • scielo.figshare.com
    • resodate.org
    png
    Updated May 31, 2023
    Cite
    Raynner Menezes Lopes; Ananda Cristina Froes Alves; Jorge Fernando Hungria Ferreira; Marcelo Giulian Marques; José Almir Rodrigues Pereira (2023). Electricity cost of rapid filter backwashing in a water treatment plant [Dataset]. http://doi.org/10.6084/m9.figshare.11997345.v1
    Explore at:
Available download formats: png
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO journals
    Authors
    Raynner Menezes Lopes; Ananda Cristina Froes Alves; Jorge Fernando Hungria Ferreira; Marcelo Giulian Marques; José Almir Rodrigues Pereira
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

ABSTRACT This paper aims to determine the cost of electricity for rapid filter backwashing procedures at a water treatment plant whose flow rate is 45 L/s. Flow and electricity consumption were monitored in order to analyze the performance of the pumping systems. Additionally, the effluent water quality of three washes was monitored in filter 4, and a survey of the electricity fare data of the treatment unit was carried out. With the results obtained, it was observed that the effluent turbidity values at the end of the three washes, in this case 31, 30, and 27 NTU, did not reach the minimum values recommended in the technical literature, which is at least 15 NTU. It was also observed that prolonging the backwashing time in order to reach the standard of the literature was not possible, due to the double function of the elevated washing water reservoir (treated water), whose main purpose is to provide treated water to the water distribution network. Taking into account these limitations and the final quality of the wash effluent, it was observed that the duration of backwashing procedures should be around 380 seconds (6.3 minutes), consuming 23.36 m3/wash. The backwashing procedure cost was estimated at R$ 1.36/m3, which resulted in R$ 31.83/wash. Considering the whole filtration unit, the backwashing procedure cost was R$ 254.64/day, R$ 7,639.20/month, and R$ 91,670.4/year. This cost can be classified as too expensive for the treatment plant studied.

  15. HOMICIDE FILTER

    • data.cityofchicago.org
    Updated Nov 24, 2025
    + more versions
    Chicago Police Department (2025). HOMICIDE FILTER [Dataset]. https://data.cityofchicago.org/Public-Safety/HOMICIDE-FILTER/4ser-6e2h
    Explore at:
    Available download formats: kml, xml, csv, kmz, application/geo+json, xlsx
    Dataset updated
    Nov 24, 2025
    Authors
    Chicago Police Department
    Description

    This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. 
The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e

  16. Case Study: Cyclist

    • kaggle.com
    zip
    Updated Jul 27, 2021
    PatrickRCampbell (2021). Case Study: Cyclist [Dataset]. https://www.kaggle.com/patrickrcampbell/case-study-cyclist
    Explore at:
    Available download formats: zip (193057270 bytes)
    Dataset updated
    Jul 27, 2021
    Authors
    PatrickRCampbell
    Description

    Phase 1: ASK

    Key Objectives:

    1. Business Task * Cyclist is looking to increase their earnings, and wants to know if creating a social media campaign can influence "Casual" users to become "Annual" members.

    2. Key Stakeholders: * The main stakeholder from Cyclist is Lily Moreno, who is the Director of Marketing and is responsible for the development of campaigns and initiatives to promote their bike-share program. The other teams involved with this project are Marketing & Analytics and the Executive Team.

    3. Business Task: * Comparing the two kinds of users and defining how they use the platform, which variables they have in common, which variables differ, and how Casual users can be converted into Annual members.

    Phase 2: PREPARE:

    Key Objectives:

    1. Determine Data Credibility * Cyclist provided data from years 2013-2021 (through March 2021), all of which is first-hand data collected by the company.

    2. Sort & Filter Data: * The stakeholders want to know how the current users are using their service, so I am focusing on using the data from 2020-2021 since this is the most relevant period of time to answer the business task.

    #Installing packages
    install.packages("tidyverse", repos = "http://cran.us.r-project.org")
    install.packages("readr", repos = "http://cran.us.r-project.org")
    install.packages("janitor", repos = "http://cran.us.r-project.org")
    install.packages("geosphere", repos = "http://cran.us.r-project.org")
    install.packages("gridExtra", repos = "http://cran.us.r-project.org")
    
    library(tidyverse)
    library(readr)
    library(janitor)
    library(geosphere)
    library(gridExtra)
    
    #Importing data & verifying the information within the dataset
    all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
    
    glimpse(all_tripdata_clean)
    
    summary(all_tripdata_clean)
    
    

    Phase 3: PROCESS

    Key Objectives:

    1. Cleaning Data & Preparing for Analysis: * Once the data has been placed into one dataset and checked for errors, we begin cleaning it. * Eliminating data that corresponds to the company servicing the bikes, and any ride with a traveled distance of zero. * New columns will be added to assist in the analysis and to provide accurate assessments of who is using the bikes.

    #Eliminating any data that represents the company performing maintenance, and trips with a negative ride length
    #(note: ride_length is computed in the next step; this filter must run after it exists)
    all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length<0),] 
    
    #Creating columns for the individual date components (date must be created first; the others derive from it)
    all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
    all_tripdata_clean$day_of_week <- format(all_tripdata_clean$date, "%A")
    all_tripdata_clean$day <- format(all_tripdata_clean$date, "%d")
    all_tripdata_clean$month <- format(all_tripdata_clean$date, "%m")
    all_tripdata_clean$year <- format(all_tripdata_clean$date, "%Y")
    
    

    **Now I will begin calculating the length of rides taken, the distance traveled, and the mean time & distance.**

    #Calculating the ride length in miles & minutes
    all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
    
    all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
    all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
    
    #Calculating the mean time and distance based on the user groups
    userType_means <- all_tripdata_clean %>% 
     group_by(member_casual) %>% 
     summarise(mean_time = mean(ride_length), mean_distance = mean(ride_distance))
    

    Adding calculations that differentiate between bike types and show which type of user rides each bike type.

    #Calculations
    
    with_bike_type <- all_tripdata_clean %>% filter(rideable_type=="classic_bike" | rideable_type=="electric_bike")
    
    #Totals per user type, bike type, and weekday
    with_bike_type %>%
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, rideable_type, weekday) %>%
     summarise(totals = n(), .groups = "drop")
    
    #Totals per user type and bike type
    with_bike_type %>%
     group_by(member_casual, rideable_type) %>%
     summarise(totals = n(), .groups = "drop")
    
    #Calculating the ride differential
    all_tripdata_clean %>% 
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, weekday) %>% 
     summarise(number_of_rides = n(),
          average_duration = mean(ride_length), .groups = "drop") %>% 
     arrange(me...
    
  17. Additional file 4 of Impact of adaptive filtering on power and false...

    • springernature.figshare.com
    txt
    Updated Jun 13, 2023
    Sonja Zehetmayer; Martin Posch; Alexandra Graf (2023). Additional file 4 of Impact of adaptive filtering on power and false discovery rate in RNA-seq experiments [Dataset]. http://doi.org/10.6084/m9.figshare.21206774.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sonja Zehetmayer; Martin Posch; Alexandra Graf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4. Example code for real data example for programming language R.

  18. Open data: Visual load effects on the auditory steady-state responses to...

    • su.figshare.com
    • demo.researchdata.se
    • +2more
    txt
    Updated May 30, 2023
    Stefan Wiens; Malina Szychowska (2023). Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones [Dataset]. http://doi.org/10.17045/sthlmuni.12582002.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Stockholm University
    Authors
    Stefan Wiens; Malina Szychowska
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The main results files are saved separately:
    - ASSR2.html: R output of the main analyses (N = 33)
    - ASSR2_subset.html: R output of the main analyses for the smaller sample (N = 25)
    
    FIGSHARE METADATA
    Categories: Biological psychology; Neuroscience and physiological psychology; Sensory processes, perception, and performance
    Keywords: crossmodal attention; electroencephalography (EEG); early-filter theory; task difficulty; envelope following response
    References:
    - https://doi.org/10.17605/OSF.IO/6FHR8
    - https://github.com/stamnosslin/mn
    - https://doi.org/10.17045/sthlmuni.4981154.v3
    - https://biosemi.com/
    - https://www.python.org/
    - https://mne.tools/stable/index.html#
    - https://www.r-project.org/
    - https://rstudio.com/products/rstudio/
    
    GENERAL INFORMATION
    1. Title of Dataset: Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones
    2. Author Information
       A. Principal Investigator: Stefan Wiens, Department of Psychology, Stockholm University, Sweden. Internet: https://www.su.se/profiles/swiens-1.184142. Email: sws@psychology.su.se
       B. Co-investigator: Malina Szychowska, Department of Psychology, Stockholm University, Sweden. Internet: https://www.researchgate.net/profile/Malina_Szychowska. Email: malina.szychowska@psychology.su.se
    3. Date of data collection: Subjects (N = 33) were tested between 2019-11-15 and 2020-03-12.
    4. Geographic location of data collection: Department of Psychology, Stockholm, Sweden
    5. Funding: Swedish Research Council (Vetenskapsrådet) 2015-01181
    
    SHARING/ACCESS INFORMATION
    1. Licenses/restrictions placed on the data: CC BY 4.0
    2. Links to publications that cite or use the data: Szychowska M., & Wiens S. (2020). Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones. Submitted manuscript. The study was preregistered: https://doi.org/10.17605/OSF.IO/6FHR8
    3. Links to other publicly accessible locations of the data: N/A
    4. Links/relationships to ancillary data sets: N/A
    5. Was data derived from another source? No
    6. Recommended citation for this dataset: Wiens, S., & Szychowska M. (2020). Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones. Stockholm: Stockholm University. https://doi.org/10.17045/sthlmuni.12582002
    
    DATA & FILE OVERVIEW
    The files contain the raw data, scripts, and results of main and supplementary analyses of an electroencephalography (EEG) study. Links to the hardware and software are provided under methodological information.
    - ASSR2_experiment_scripts.zip: contains the Python files to run the experiment.
    - ASSR2_rawdata.zip: contains raw datafiles for each subject (data_EEG: EEG data in bdf format, generated by Biosemi; data_log: logfiles of the EEG session, generated by Python)
    - ASSR2_EEG_scripts.zip: Python-MNE scripts to process the EEG data
    - ASSR2_EEG_preprocessed_data.zip: EEG data in fif format after preprocessing with the Python-MNE scripts
    - ASSR2_R_scripts.zip: R scripts to analyze the data together with the main datafiles. The main files in the folder are ASSR2.html (R output of the main analyses) and ASSR2_subset.html (R output of the main analyses after excluding eight subjects who were recorded as pilots before preregistering the study)
    - ASSR2_results.zip: contains all figures and tables that are created by Python-MNE and R.
    
    METHODOLOGICAL INFORMATION
    1. Description of methods used for collection/generation of data: The auditory stimuli were amplitude-modulated tones with a carrier frequency (fc) of 500 Hz and modulation frequencies (fm) of 20.48 Hz, 40.96 Hz, or 81.92 Hz. The experiment was programmed in Python (https://www.python.org/) and used extra functions from https://github.com/stamnosslin/mn. The EEG data were recorded with an Active Two BioSemi system (BioSemi, Amsterdam, Netherlands; www.biosemi.com) and saved in .bdf format. For more information, see linked publication.
    2. Methods for processing the data: We conducted frequency analyses and computed event-related potentials. See linked publication.
    3. Instrument- or software-specific information needed to interpret the data: MNE-Python (Gramfort A., et al., 2013): https://mne.tools/stable/index.html#; RStudio used with R (R Core Team, 2020): https://rstudio.com/products/rstudio/; Wiens, S. (2017). Aladins Bayes Factor in R (Version 3): https://www.doi.org/10.17045/sthlmuni.4981154.v3
    4. Standards and calibration information: see linked publication.
    5. Environmental/experimental conditions: see linked publication.
    6. Quality-assurance procedures performed on the data: see linked publication.
    7. People involved with sample collection, processing, analysis and/or submission: Data collection: Malina Szychowska with assistance from Jenny Arctaedius. Data processing, analysis, and submission: Malina Szychowska and Stefan Wiens.
    
    DATA-SPECIFIC INFORMATION
    All relevant information can be found in the MNE-Python and R scripts (in the EEG_scripts and analysis_scripts folders) that process the raw data. For example, we added notes to explain what different variables mean.

  19. Data from: Snow depth estimation from Geoprecision-Maxbotic ultrasonic...

    • produccioncientifica.uca.es
    Updated 2025
    de Pablo, Miguel Ángel; Rosado Moscoso, Belén; de Pablo, Miguel Ángel; Rosado Moscoso, Belén (2025). Snow depth estimation from Geoprecision-Maxbotic ultrasonic devices: R processing code and example datasets from Antarctica [Dataset]. https://produccioncientifica.uca.es/documentos/688b604017bb6239d2d4a3e6
    Explore at:
    Dataset updated
    2025
    Authors
    de Pablo, Miguel Ángel; Rosado Moscoso, Belén; de Pablo, Miguel Ángel; Rosado Moscoso, Belén
    Area covered
    Antártida
    Description

    This dataset provides the R script and example data used to estimate snow depth from ultrasonic distance measurements collected by low-cost Geoprecision-Maxbotic devices, designed for autonomous operation in polar conditions. The dataset includes:

    The full R script used for data preprocessing, filtering, and snow depth calculation, with all parameters fully documented.

    Example raw and clean data files, ready to use, acquired from a sensor installed in the South Shetland Islands (Antarctica) between 2023 and 2024.

    The processing pipeline includes outlier removal (Hampel filter), gap interpolation, moving average smoothing, reference level estimation, and snow depth conversion in millimetres and centimetres. Derived snow depths are exported alongside summary statistics.

    This code was developed as part of a research project evaluating the performance and limitations of low-cost ultrasonic snow depth measurement systems in Antarctic permafrost monitoring networks. Although the script was designed for the specific configuration of Geoprecision dataloggers and Maxbotic MB7574-SCXL-Maxsonar-WRST7 sensors, it can be easily adapted to other distance-measuring devices providing similar output formats.

    All files are provided in open formats (CSV, and R) to facilitate reuse and reproducibility. Users are encouraged to modify the script to fit their own instrumentation and field conditions.
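    The pipeline described above (outlier removal with a Hampel filter, then conversion of the measured sensor-to-surface distance into snow depth) can be sketched as follows. This is only an illustrative re-implementation in Python with toy values; the dataset's own, fully documented implementation is the provided R script.

```python
from statistics import median

def hampel_filter(x, window=5, n_sigmas=3.0):
    """Replace outliers with the rolling median (a simple Hampel filter sketch)."""
    out = [float(v) for v in x]
    k = 1.4826  # scale factor: MAD approximates one std for Gaussian data
    for i in range(len(x)):
        lo, hi = max(0, i - window), min(len(x), i + window + 1)
        win = [float(v) for v in x[lo:hi]]
        med = median(win)
        mad = k * median(abs(v - med) for v in win)
        if mad > 0 and abs(float(x[i]) - med) > n_sigmas * mad:
            out[i] = med
    return out

def snow_depth_cm(distance_mm, reference_mm):
    """Snow depth = reference distance to bare ground minus measured distance."""
    return [(reference_mm - d) / 10.0 for d in distance_mm]

# Toy example: 2000 mm reference (bare ground), one echo-spike outlier
raw = [2000, 1995, 1990, 500, 1985, 1980]
clean = hampel_filter(raw)
depth_cm = snow_depth_cm(clean, reference_mm=2000)
```

    The spike at 500 mm is replaced by the window median before depth conversion, which is the role the Hampel step plays in the script.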

  20. Data from: Parentage and relatedness reconstruction in Pinus sylvestris...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Mar 4, 2020
    David Hall; Wei Zhao; Ulfstand Wennström; Bengt Andersson Gull; Xiao-Ru Wang (2020). Parentage and relatedness reconstruction in Pinus sylvestris using genotyping by sequencing [Dataset]. http://doi.org/10.5061/dryad.h44j0zpg5
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Dryad
    Authors
    David Hall; Wei Zhao; Ulfstand Wennström; Bengt Andersson Gull; Xiao-Ru Wang
    Time period covered
    Feb 17, 2020
    Description

    The dataset contains several files:

    Vasthus_001_m06.vcf.gz: VCF file that has been slightly pre-filtered to reduce size; see vcf_filter.txt

    Vasterhus.txt: Sample names of samples in the study

    refkeep.txt: Sample names of the samples used as allele frequency reference

    Parental_ID.txt: The registered names for the parental trees in the study

    vcf_filter.txt: Description on how to filter the VCF file according to the manuscript

    Rfiles Dataset-2.RData: The data set resulting from working through the VCF filtering and the R-script files

    Rcode_related.R: R-script for relatedness estimation using the R-package 'related'

    view_relationships.R: Utilizing the result from the previous R-script to visualize pairwise relatedness and reconstructing some figures from the manuscript


EmotionLib Media Filter Dataset Extended + Inter

Explore at:
Available download formats: zip (33512290 bytes)
Dataset updated
Sep 28, 2025
Authors
SAI Course
License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

Data for Training and Evaluating Video Content Safety Classifiers

Context

This dataset is part of the EmotionLib project, an open-source C library designed for real-time media content analysis. The primary goal of EmotionLib is to identify potentially sensitive material (NSFW/Gore) and predict appropriate age ratings (MPAA-like).

This specific dataset was created to train and evaluate the final classification stage implemented within the samp.dll component of EmotionLib. This stage uses a hybrid model (involving pre-calculated features and an LSTM) to make a final decision on video safety and rating based on inputs from earlier processing stages.

Content

The dataset consists of two main parts:

  1. mlp-intermediate.csv:

    • This file contains pre-computed features for each video clip.
    • filename: An anonymized identifier for the video clip (e.g., G-101, R-1020, NC-17-118, GORE-1). These filenames are intentionally anonymized and DO NOT correspond to the original video titles. This was done to address ethical concerns, as the dataset contains a significant amount of NSFW (Not Safe For Work) and Gore content derived from various movies and media.
    • predict1 to predict30: These 30 columns represent the intermediate outputs from an ensemble of three Multilayer Perceptrons (MLPs). The architectures of these MLPs were found using Neural Architecture Search (NAS) techniques (inspired by works like F. Minutoli, M. Ciranni). These NAS-MLPs processed various statistical features (mean, std dev, skewness, kurtosis, etc.) extracted from the per-frame analysis performed by filter.dll and positiveness.dll. These 30 features serve as condensed statistical representations of the video content.
    • target: The ground truth label for the safety classification task.
      • 0.0: Represents content deemed "Safe for Work" (derived from original MPAA ratings G, PG, PG-13, R).
      • 1.0: Represents content deemed "Not Safe for Work" or potentially harmful (derived from original MPAA rating NC-17 or explicit Gore classifications).
  2. /Data Directory:

    • This directory contains the raw, per-frame analysis outputs from the initial EmotionLib components for each corresponding anonymized filename. These are stored in binary files:
    • .efp files (EmotionLib Filter Predictions):
      • Generated by filter.dll.
      • Binary format: int32 (num_frames), int32 (frame_sec_interval), followed by num_frames records of float32 (Safe prob.), float32 (Explicit prob.), float32 (Gore prob.).
    • .epp files (EmotionLib Positiveness Predictions):
      • Generated by positiveness.dll.
      • Binary format: int32 (num_frames), int32 (frame_sec_interval), followed by num_frames records of float32 (Negative prob.), float32 (Positive prob.).
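Given the documented layouts, a reader for the .efp/.epp files can be sketched as follows. A minimal sketch: little-endian byte order and the demo.efp filename are assumptions for illustration, not stated by the dataset.

```python
import struct

def read_efp(path):
    """Parse a .efp file: int32 num_frames, int32 frame_sec_interval,
    then num_frames records of float32 (safe, explicit, gore) probabilities."""
    with open(path, "rb") as f:
        num_frames, interval = struct.unpack("<ii", f.read(8))
        frames = [struct.unpack("<fff", f.read(12)) for _ in range(num_frames)]
    return interval, frames

def read_epp(path):
    """Parse a .epp file: same header, then float32 (negative, positive) per frame."""
    with open(path, "rb") as f:
        num_frames, interval = struct.unpack("<ii", f.read(8))
        frames = [struct.unpack("<ff", f.read(8)) for _ in range(num_frames)]
    return interval, frames

# Round-trip check with a synthetic two-frame .efp payload
with open("demo.efp", "wb") as f:
    f.write(struct.pack("<ii", 2, 5))
    f.write(struct.pack("<fff", 0.90, 0.05, 0.05))
    f.write(struct.pack("<fff", 0.20, 0.30, 0.50))
interval, frames = read_efp("demo.efp")
```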

Anonymization and Ethical Considerations

As mentioned, the filenames in this dataset are anonymized (G-xxx, PG-xxx, R-xxxx, NC-17-xxx, GORE-x, etc.) and do not reflect the original source titles. This is a crucial ethical consideration due to the inclusion of sensitive content (NSFW/Gore) from various media sources. Providing original titles could lead to direct association with potentially disturbing or copyrighted material outside the intended research context of evaluating the EmotionLib filtering system. The focus is on the content patterns recognized by the preliminary filters, not the specific source media itself.

Examples and Dataset Characteristics

The dataset includes a diverse range of content types and presents interesting challenges for classification models.

  • MPAA Rating Challenges: When evaluating models trained on this dataset (like the MPAA prediction part of samp.dll, although the primary target here is safety), some misclassifications highlighted the difficulty of the task. For instance, models sometimes struggled with boundary cases; the most deviant examples were:

    • 'Kickboxer' (1989, typically R) as PG-13 equivalent (filename: R-1010).
    • 'Titanic' (PG-13) as R equivalent (filename: PG-13-404).
    • 'Sonic the Hedgehog' (PG) as PG-13 equivalent (filename: PG-353).
    • 'Way of the Dragon' (1972, often PG but contains significant martial arts violence sometimes pushing it towards R in modern contexts) as PG equivalent (filename: R-1020).
  • Video Length: The dataset contains clips of varying durations. The longest video processed corresponds to the anonymized file R-1041, with a duration of approximately 6 hours and 26 minutes. This clip, derived from the anime series 'Fate/Zero', was correctly identified as requiring an R-equivalent rating by the system based on its content. This demonstrates the syste...
