100+ datasets found
  1. sensitive_document_classification

    • huggingface.co
    Cite
    Mouhamet, sensitive_document_classification [Dataset]. https://huggingface.co/datasets/mouhamet/sensitive_document_classification
    Authors
    Mouhamet
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Sensitive Document Classification

    Preventing data violations is becoming increasingly crucial. Several data breaches have been reported in recent years. To prevent data violations, we need to determine the sensitivity level of documents. Deep learning techniques perform well in document classification but require large amounts of data. However, the lack of public datasets in this context, due to the sensitive nature of the documents, prevents researchers from designing powerful models. We… See the full description on the dataset page: https://huggingface.co/datasets/mouhamet/sensitive_document_classification.

  2. Radio Science Documentation Bundle - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Radio Science Documentation Bundle - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/radio-science-documentation-bundle
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    This bundle contains documentation about data products that are collected using radio science and supporting equipment. With one exception, each member collection contains one or more versions of a single Software Interface Specification (SIS) or an equivalent document. A SIS describes the format and content of a data file at a granularity sufficient for use -- typically byte-level, but sometimes bit-level. Examples of products and descriptions of their use may also be included in a collection, as appropriate. The exception is the DOCUMENT collection, which contains supporting material -- usually journal publications, technical reports, or other documents that describe investigations, analysis methods, and/or data but not at the level of a SIS. Members of the DOCUMENT collection were usually released once, whereas a SIS often evolves over many years.

  3. Data from: How to Document Ontology Design Patterns : Supporting Data Part 2...

    • data.europa.eu
    • researchdata.se
    unknown
    Updated Aug 24, 2016
    + more versions
    Cite
    Jönköping University (2016). How to Document Ontology Design Patterns : Supporting Data Part 2 [Dataset]. https://data.europa.eu/88u/dataset/https-doi-org-10-57817-g1th-p594
    Available download formats: unknown
    Dataset updated
    Aug 24, 2016
    Dataset authored and provided by
    Jönköping University
    Description

    Survey data presented and discussed in the paper 'How to Document Ontology Design Patterns' presented at the Workshop on Ontology and Semantic Web Patterns in conjunction with the International Semantic Web Conference 2016.

    The dataset contains two CSV files, each corresponding to one of the two surveys discussed in Section 3 of the paper in question. Both files include the questions (row 1), answer options (row 2), and provided answers (row 3 and onward). OEMS-Data.csv contains the data discussed in Section 3.1 (Table 2/3) and ODPT-Data.csv contains the data discussed in Section 3.2 (Table 4).

    The dataset was originally published in DiVA and moved to SND in 2024.
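As a rough illustration of the layout described above (questions in row 1, answer options in row 2, answers from row 3 onward), the following Python sketch regroups such a file by question. The sample text, the question wording, and the `;` separator inside the options cell are made up for illustration; the actual contents of OEMS-Data.csv and ODPT-Data.csv may be formatted differently.

```python
import csv
import io


def parse_survey_csv(text):
    """Group a survey CSV by question.

    Assumes row 1 holds the questions, row 2 the answer options,
    and every later row one respondent's answers.
    """
    rows = list(csv.reader(io.StringIO(text)))
    questions, options, answers = rows[0], rows[1], rows[2:]
    return [
        {
            "question": question,
            "options": option,
            # Collect this column's value from every respondent row.
            "answers": [row[i] for row in answers if i < len(row)],
        }
        for i, (question, option) in enumerate(zip(questions, options))
    ]


# Hypothetical sample; not taken from the real survey files.
sample = (
    "Do you use patterns?,How often?\n"
    "Yes;No,Daily;Weekly;Never\n"
    "Yes,Weekly\n"
    "No,Never\n"
)
parsed = parse_survey_csv(sample)
print(parsed[0]["answers"])
```

Keeping the options cell as a raw string avoids guessing how the real files encode multiple options.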

  4. 2019 Public Data File - Parents

    • catalog.data.gov
    • data.cityofnewyork.us
    • +2 more
    Updated Nov 29, 2024
    Cite
    data.cityofnewyork.us (2024). 2019 Public Data File - Parents [Dataset]. https://catalog.data.gov/dataset/2019-public-data-file-parents
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    The data represents feedback from families on the learning environment. It aids understanding of families' perceptions of the students, teachers, and environment of their school. The survey is aligned to the DOE's framework for great schools. It is designed to collect important information about each school's ability to support success.

  5. Training images

    • redivis.com
    Updated Oct 20, 2025
    Cite
    Redivis Demo Organization (2025). Training images [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb
    Dataset updated
    Oct 20, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Aug 8, 2022
    Description

    This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.

  6. Document Fraud Detection Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Document Fraud Detection Market Research Report 2033 [Dataset]. https://dataintelo.com/report/document-fraud-detection-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Document Fraud Detection Market Outlook



    According to our latest research, the global Document Fraud Detection market size reached USD 8.2 billion in 2024, reflecting robust demand across various sectors. The market is expected to expand at a compound annual growth rate (CAGR) of 13.7% from 2025 to 2033, projecting a value of USD 25.1 billion by 2033. This impressive growth is driven primarily by the increasing sophistication of fraud attempts, rapid digital transformation, and heightened regulatory requirements for identity verification and data security in both public and private sectors worldwide.



    A major growth factor for the Document Fraud Detection market is the escalating complexity and frequency of fraudulent activities targeting sensitive documentation. As businesses and governments digitize more of their operations, the risk of document forgery, identity theft, and data manipulation has surged. Organizations are increasingly investing in advanced fraud detection solutions to safeguard their assets, maintain customer trust, and comply with evolving regulations. The proliferation of remote onboarding and digital transactions, especially post-pandemic, has further amplified the need for robust document authentication and identity verification processes. Consequently, innovative technologies such as artificial intelligence, machine learning, and biometrics are being integrated into document fraud detection systems to enhance accuracy, speed, and scalability.



    Another significant driver for the market is the tightening of regulatory frameworks across the globe. Governments and regulatory bodies are mandating stringent Know Your Customer (KYC), Anti-Money Laundering (AML), and data privacy standards, especially in sectors like banking, financial services, healthcare, and government services. Failure to comply with these regulations can result in hefty fines and reputational damage. As a result, organizations are prioritizing investment in comprehensive document fraud detection solutions that not only ensure compliance but also provide audit trails and real-time alerts. The need for continuous monitoring and proactive risk management is pushing companies to adopt both on-premises and cloud-based solutions, depending on their operational requirements and data sensitivity.



    Furthermore, the rising adoption of digital identity verification in emerging markets is propelling the growth of the Document Fraud Detection market. Countries in Asia Pacific, Latin America, and Africa are experiencing rapid digitalization, with increased access to smartphones and the internet. This has led to a surge in online banking, e-commerce, e-governance, and digital healthcare services, all of which require secure and reliable document verification processes. Local and international vendors are tapping into these opportunities by offering scalable, cloud-native solutions tailored to regional needs and regulatory environments. The growing awareness about the risks of document fraud, coupled with the need for seamless user experiences, is expected to further accelerate market expansion in these regions.



    From a regional perspective, North America currently dominates the Document Fraud Detection market due to its advanced technological infrastructure, high incidence of digital fraud, and strict regulatory landscape. However, Asia Pacific is expected to witness the fastest growth over the forecast period, driven by rapid digital adoption, increasing investments in cybersecurity, and supportive government initiatives for digital identity management. Europe remains a significant market, supported by GDPR and other data protection regulations, while the Middle East & Africa and Latin America are emerging as lucrative markets, fueled by ongoing digital transformation projects and rising awareness about document security.



    Component Analysis



    The Component segment of the Document Fraud Detection market is categorized into software, hardware, and services, each playing a crucial role in the overall ecosystem. Software solutions form the backbone of fraud detection initiatives, providing the necessary algorithms, analytics, and user interfaces for identifying and preventing fraudulent activities. These solutions are continuously evolving, incorporating advanced technologies such as artificial intelligence, machine learning, and natural language processing to enhance detection accuracy and reduce false positives. Software vendors are also focusing on developing modular, scalable platfor

  7. Digital Record Store (including data from former Electronic Document...

    • ckan.publishing.service.gov.uk
    • data.europa.eu
    • +1 more
    Updated Sep 6, 2013
    Cite
    ckan.publishing.service.gov.uk (2013). Digital Record Store (including data from former Electronic Document Management (EDM) operated by Capita to May 2011 [Dataset]. https://ckan.publishing.service.gov.uk/dataset/digital-record-store-including-data-from-former-electronic-document-management-edm-operate-2011
    Dataset updated
    Sep 6, 2013
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    Records associated with claims for compensation

  8. IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection...

    • zenodo.org
    zip
    Updated Nov 14, 2025
    + more versions
    Cite
    Lulu Xie; Yancheng Wang; Hong Guan; Soham Nag; Rajeev Goel; Niranjan Erappa Narayana Swamy; Yingzhen Yang; Chaowei Xiao; Jonathan Prisby; Ross Maciejewski; Jia Zou (2025). IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection (part 6) [Dataset]. http://doi.org/10.5281/zenodo.13855175
    Available download formats: zip
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lulu Xie; Yancheng Wang; Hong Guan; Soham Nag; Rajeev Goel; Niranjan Erappa Narayana Swamy; Yingzhen Yang; Chaowei Xiao; Jonathan Prisby; Ross Maciejewski; Jia Zou
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is part 6 of the IDNet dataset from our research paper "IDNet: A Novel Identity Document Dataset via Few-Shot and Quality-Driven Synthetic Data Generation". Here's a link to the paper: https://ieeexplore.ieee.org/document/10825017

    Citation:

    @inproceedings{xie2024idnet,
    title={IDNet: A Novel Identity Document Dataset via Few-Shot and Quality-Driven Synthetic Data Generation},
    author={Xie, Lulu and Wang, Yancheng and Guan, Hong and Nag, Soham and Goel, Rajeev and Swamy, Niranjan and Yang, Yingzhen and Xiao, Chaowei and Prisby, Jonathan and Maciejewski, Ross and others},
    booktitle={2024 IEEE International Conference on Big Data (BigData)},
    pages={2244--2253},
    year={2024},
    organization={IEEE}
    }

    @article{guan2024idnet,
    title={IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection},
    author={Guan, Hong and Wang, Yancheng and Xie, Lulu and Nag, Soham and Goel, Rajeev and Swamy, Niranjan Erappa Narayana and Yang, Yingzhen and Xiao, Chaowei and Prisby, Jonathan and Maciejewski, Ross and Zou, Jia},
    journal={arXiv preprint arXiv:2408.01690},
    year={2024}
    }

  9. Records Search Help Document

    • data-test-lakecountyil.opendata.arcgis.com
    • datasets.ai
    • +4 more
    Updated Jan 22, 2019
    Cite
    Lake County Illinois GIS (2019). Records Search Help Document [Dataset]. https://data-test-lakecountyil.opendata.arcgis.com/documents/c619181f6f414eb58bf4be95a798db45
    Dataset updated
    Jan 22, 2019
    Dataset authored and provided by
    Lake County Illinois GIS
    License

    https://www.arcgis.com/sharing/rest/content/items/89679671cfa64832ac2399a0ef52e414/data

    Description

    Use the Records Search to do the following: Search for records, such as agreements with other government agencies, maps and other documents from the Public Works, Transportation, and Planning, Building and Development departments.

  10. Data from: A New Image Dataset for Document Corner Localization

    • narcis.nl
    • data.mendeley.com
    Updated Dec 8, 2020
    Cite
    Dizaj, S (via Mendeley Data) (2020). A New Image Dataset for Document Corner Localization [Dataset]. http://doi.org/10.17632/x3nm4cxr83.3
    Dataset updated
    Dec 8, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Dizaj, S (via Mendeley Data)
    Description

    To use this dataset and respect copyright, please cite the following paper: https://ieeexplore.ieee.org/abstract/document/9116896/ We present a new dataset that covers almost all the scenarios that may occur in document images taken by a smartphone. The collection includes 1111 images. We tested two state-of-the-art algorithms for finding the corners of the document on our dataset, and the results are also provided. The results indicate that there are still situations in which these algorithms fail and that more research is needed.

  11. Data from: Data Dictionary Template

    • data-academy-tempegov.hub.arcgis.com
    • data.tempe.gov
    • +8 more
    Updated Jun 5, 2020
    Cite
    City of Tempe (2020). Data Dictionary Template [Dataset]. https://data-academy-tempegov.hub.arcgis.com/datasets/data-dictionary-template
    Dataset updated
    Jun 5, 2020
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Dictionary template for Tempe Open Data.

  12. Contract - Electronic scanning and document storage - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Oct 3, 2023
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2023). Contract - Electronic scanning and document storage - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/contract-electronic-scanning-and-document-storage
    Dataset updated
    Oct 3, 2023
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    The London Borough of Barnet has entered into a contract for Electronic scanning and document storage with Stor-A-File. The contract is for the provision of electronic scanning and document storage and commenced on 1st August 2023 and will run until 31st July 2024. Further details on the Contract Award can be found in the link below. Personal data relating to junior officer names and commercial interests has been redacted from the contract attachment.

  13. Understanding Different File Formats

    • kaggle.com
    zip
    Updated Jan 24, 2022
    Cite
    AKR (2022). Understanding Different File Formats [Dataset]. https://www.kaggle.com/datasets/raj401/understanding-different-file-formats
    Available download formats: zip (18580 bytes)
    Dataset updated
    Jan 24, 2022
    Authors
    AKR
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Working with different File Formats

    I have created a supporting notebook to explain all these file formats. You can find the notebook here: https://www.kaggle.com/raj401/working-with-different-file-formats

    The Datasets

    1. datafile.csv

    2. datafile.json

    3. datafile.ods

    4. datafile.xls

    All of the above datasets contain the same data in different file formats.

    The data shows the pattern of land utilisation under categories such as forest, permanent pastures and other grazing lands, land not available for cultivation, etc.

    The data contains the following features:

    'Year (Col.1)' 'Geographical Area (Col.2)' 'Reporting area for Land utilisation statistics (Col.3 = Col.4+Col.7+ Col.11+Col.14+Col.15)' 'Forests (Col.4)' 'Not available for cultivation - Area under non-agricultural uses (Col.5)' 'Not available for cultivation - Barren and unculturable Land (Col.6)' 'Not available for cultivation - Total (Col.7 = Col.5+Col.6)' 'Other uncultivated Land excluding Fallow Land - Permanent pastures & other Grazing Lands (Col.8)' 'Other uncultivated Land excluding Fallow Land - Land under Misc. tree crops & groves (not incl. in net area sown) (Col.9)' 'Other uncultivated Land excluding Fallow Land - Culturable waste Land (Col.10)' 'Other uncultivated Land excluding Fallow Land - Total (Col.11 = Col.8 to Col.10)' 'Fallow Lands - Fallow Lands other than current fallows (Col.12)' 'Fallow Lands - Current fallows (Col.13)' 'Fallow Lands - Total Col.14 = (Col.12+Col.13)' 'Net area Sown (Col.15)' 'Total cropped area (Col.16)' 'Area sown more than once (Col.17 = Col.16-Col.15)' 'Agricultural Land/Cultivable Land/Culturable Land/Arable Land (Col.18 = Col.9+Col.10+Col.14+Col.15)' 'Cultivated Land (Col.19 = Col.13+Col.15)' 'Cropping Intensity (Col.20 = % of Col.16 over Col.15)'
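The bracketed formulas in the column names above can be read as arithmetic checks on each row. The following is a minimal sketch, assuming that "Col.11 = Col.8 to Col.10" means the sum of columns 8 through 10 and that "% of Col.16 over Col.15" means 100 × Col.16 / Col.15. The function name and the numeric values are invented for illustration and are not taken from the dataset.

```python
def derive_columns(base):
    """Compute the derived land-use columns from base columns keyed by column number."""
    c = dict(base)
    c[7] = c[5] + c[6]                          # Not available for cultivation - Total
    c[11] = c[8] + c[9] + c[10]                 # Other uncultivated land - Total
    c[14] = c[12] + c[13]                       # Fallow lands - Total
    c[3] = c[4] + c[7] + c[11] + c[14] + c[15]  # Reporting area
    c[17] = c[16] - c[15]                       # Area sown more than once
    c[18] = c[9] + c[10] + c[14] + c[15]        # Agricultural / cultivable land
    c[19] = c[13] + c[15]                       # Cultivated land
    c[20] = 100.0 * c[16] / c[15]               # Cropping intensity (%)
    return c


# Hypothetical base values for a single year (units not specified here).
example_row = {4: 70, 5: 25, 6: 20, 8: 10, 9: 3, 10: 13, 12: 10, 13: 15, 15: 140, 16: 190}
derived = derive_columns(example_row)
print(derived[3], derived[17], round(derived[20], 1))
```

Comparing these recomputed values against the stored totals is one way to spot rows where a file format conversion altered the numbers.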

    Acknowledgements

    I am really thankful to the Indian government for storing this valuable data. Source: https://data.gov.in/

    Inspiration

    I am inspired by everyone here on Kaggle for the level of their dedication and hard work.

  14. Documentation and Metadata

    • dataverse.harvard.edu
    • dataverse.lib.virginia.edu
    Updated May 22, 2015
    Cite
    Harvard Dataverse (2015). Documentation and Metadata [Dataset]. http://doi.org/10.7910/DVN/8KN41O
    Available download formats: application/x-download(21383), pptx(3299456), doc(71680), application/x-download(30506), xlsx(67819), application/x-download(33870), pdf(286050), doc(72192)
    Dataset updated
    May 22, 2015
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data Documentation and Metadata session from the 2015 Virginia Data Management Bootcamp. Introduces non-structural (data dictionaries, read me files, code books) and structured ways (XML schemas) to document research data.

  15. Recorded Document

    • hub.arcgis.com
    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    • +1 more
    Updated Mar 31, 2021
    Cite
    Delaware County, Ohio (2021). Recorded Document [Dataset]. https://hub.arcgis.com/maps/Delco::recorded-document
    Dataset updated
    Mar 31, 2021
    Dataset authored and provided by
    Delaware County, Ohio
    Description

    This dataset consists of points that represent recorded documents in the Delaware County Recorder's Plat Books, Cabinet/Slides and Instruments Records which are not represented by subdivision plats that are active. They are documents such as vacations, subdivisions, centerline surveys, surveys, annexations, and miscellaneous documents within Delaware County, Ohio.

  16. Relationship and Entity Extraction Evaluation Dataset (Documents)

    • data.europa.eu
    • data.wu.ac.at
    json
    Updated Jun 30, 2022
    Cite
    Defence Science and Technology Laboratory (2022). Relationship and Entity Extraction Evaluation Dataset (Documents) [Dataset]. https://data.europa.eu/data/datasets/relationship-and-entity-extraction-evaluation-dataset?locale=ga
    Available download formats: json
    Dataset updated
    Jun 30, 2022
    Dataset authored and provided by
    Defence Science and Technology Laboratory
    Description

    This document dataset was the output of a project that aimed to create a 'gold standard' dataset that could be used to train and validate machine learning approaches to natural language processing (NLP). The project was carried out by Aleph Insights and Committed Software on behalf of the Defence Science and Technology Laboratory (Dstl). The dataset focuses specifically on entity and relationship extraction relevant to somebody operating in the role of a defence and security intelligence analyst. The dataset was therefore constructed using documents and structured schemas that were relevant to the defence and security analysis domain. A number of data subsets were produced (this is the BBC Online data subset). Further information about this data subset (BBC Online) and the others produced (together with licence conditions, attribution and schemas) may be found at the main project GitHub repository webpage (https://github.com/dstl/re3d). Note that the 'documents.json' file is to be used together with the 'entities.json' and 'relations.json' files (also found on this data.gov.uk webpage, with their structures and relationships described on the given GitHub webpage).

  17. Protected documents - Dataset - Open Government Data Portal

    • opendata.gov.jo
    Updated Nov 28, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). Protected documents - Dataset - Open Government Data Portal [Dataset]. https://opendata.gov.jo/dataset/protected-documents-2871-2022
    Dataset updated
    Nov 28, 2023
    Description

    Documents issued by the Protected Documents Office to civil status and passport offices, including passports, cards, certificates, and family books.

  18. Invasive Plant Inventory at San Diego National Wildlife Refuge- Data...

    • catalog.data.gov
    • datasets.ai
    Updated Nov 25, 2025
    + more versions
    Cite
    U.S. Fish and Wildlife Service (2025). Invasive Plant Inventory at San Diego National Wildlife Refuge- Data Documentation [Dataset]. https://catalog.data.gov/dataset/invasive-plant-inventory-at-san-diego-national-wildlife-refuge-data-documentation
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service (http://www.fws.gov/)
    Description

    In 2012, an invasive plant inventory of priority invasive plant species in priority areas was conducted at San Diego National Wildlife Refuge. Results from this effort will inform the development of invasive plant management objectives and strategies, and serve as a baseline for assessing change in the status of invasive plant distribution or abundance over time.

  19. Extracted Data From: TRI Basic Data Plus Files

    • dataverse.harvard.edu
    Updated Feb 18, 2025
    Cite
    US EPA (2025). Extracted Data From: TRI Basic Data Plus Files [Dataset]. http://doi.org/10.7910/DVN/PFMTZR
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    US EPA
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2016 - Dec 31, 2023
    Area covered
    United States
    Description

    This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information: TRI basic plus data files guides. (2024, September 18). US EPA. https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-guides

    If you have questions about the underlying data stored here, please contact tri.help@epa.gov. If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at: climatecafe@bu.edu.

    "EPA has been collecting Toxics Release Inventory (TRI) data since 1987. The "Basic Plus" data files include ten file types that collectively contain all of the data fields from the TRI Reporting Form R and Form A. The files themselves are in tab-delimited .txt format and then compressed into a .zip file.

    1a: Facility, chemical, releases and other waste management summary information
    1b: Chemical activities and uses
    2a: On- and off-site disposal, treatment, energy recovery, and recycling information; non-production-related waste managed quantities; production/activity ratio information; and source reduction activities
    2b: Detailed on-site waste treatment methods and efficiency
    3a: Transfers off site for disposal and further waste management
    3b: Transfers to Publicly Owned Treatment Works (POTWs) (RY1987 - RY2010)
    3c: Transfers to Publicly Owned Treatment Works (POTWs) (RY2011 - Present)
    4: Facility information
    5: Optional information on source reduction, recycling and pollution control (RY2005 - Present)
    6: Additional miscellaneous and optional information (RY2010 - Present)

    Quantities of dioxin and dioxin-like compounds are reported in grams, while all other chemicals are reported in pounds. This webpage contains the most recent versions of all TRI data files; facilities may revise previous years' TRI submissions if necessary, and any such changes will be reflected in these files. For this reason, data contained in these files may differ from data used to construct the TRI National Analysis." [Quote from https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-calendar-years-1987-present]
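Since the description says the files are tab-delimited .txt files compressed into a .zip, one way to read them without unpacking to disk is sketched below in Python. The member name, the column names, and the UTF-8 encoding are assumptions made for illustration; the real archives may use different names, headers, and encodings.

```python
import csv
import io
import zipfile


def read_tab_delimited_from_zip(zip_source, member_name):
    """Return the rows of a tab-delimited text member of a zip archive as dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # Assumed encoding; adjust if the real files differ.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))


# Build a tiny in-memory archive with hypothetical contents to exercise the reader.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "example_1a.txt",
        "FACILITY\tCHEMICAL\tUNIT\nPlant A\tToluene\tPounds\n",
    )
buffer.seek(0)

rows = read_tab_delimited_from_zip(buffer, "example_1a.txt")
print(rows[0]["CHEMICAL"])
```

Because dioxin quantities are reported in grams and other chemicals in pounds, any aggregation over such rows would need to check the unit before summing.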

  20. Security Request Documentation Process

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Sep 19, 2025
    Cite
    Social Security Administration (2025). Security Request Documentation Process [Dataset]. https://catalog.data.gov/dataset/security-request-documentation-process
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    An electronic repository created to streamline the storing/recording of various Security Requests, including SSA-120s/1121s, ATSAFE-613, e-mails, etc.
