100+ datasets found
  1. Dataset of pdf files

    • kaggle.com
    Updated May 1, 2024
    Cite
    Manisha717 (2024). Dataset of pdf files [Dataset]. https://www.kaggle.com/datasets/manisha717/dataset-of-pdf-files
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Manisha717
    Description

    The dataset consists of diverse PDF files covering a wide range of topics. These files include reports, articles, manuals, and more, spanning various fields such as science, technology, history, literature, and business. With its broad content, the dataset offers versatility for testing and various purposes, making it valuable for researchers, developers, educators, and enthusiasts alike.

  2. Louisville Metro KY - Annual Open Data Report 2022

    • catalog.data.gov
    • data.louisvilleky.gov
    • +3more
    Updated Jul 30, 2025
    Cite
    Louisville/Jefferson County Information Consortium (2025). Louisville Metro KY - Annual Open Data Report 2022 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-annual-open-data-report-2022
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    On August 25th, 2022, Metro Council passed the Open Data Ordinance; previously, open data reports were published under Mayor Fischer's Executive Order. You can find here both the Open Data Ordinance, 2022 (PDF) and the Mayor's Open Data Executive Order, 2013.

    Open Data Annual Reports

    Page 6 of the Open Data Ordinance: "Within one year of the effective date of this Ordinance, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor and Metro Council an annual Open Data Report."

    The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer, Andrew McKinney, in the Office of Civic Innovation and Technology. Previously, it was led by the former Data Officer, Michael Schnuerle, and prior to that by the Director of IT.

    Open Data Ordinance O-243-22 Text

    Louisville Metro Government
    Legislation Text
    File #: O-243-22, Version: 3
    ORDINANCE NO. _, SERIES 2022
    AN ORDINANCE CREATING A NEW CHAPTER OF THE LOUISVILLE/JEFFERSON COUNTY METRO CODE OF ORDINANCES CREATING AN OPEN DATA POLICY AND REVIEW. (AMENDMENT BY SUBSTITUTION) (AS AMENDED).
    SPONSORED BY: COUNCIL MEMBERS ARTHUR, WINKLER, CHAMBERS ARMSTRONG, PIAGENTINI, DORSEY, AND PRESIDENT JAMES

    WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation and a high quality of life;

    WHEREAS, it should be easy to do business with Metro Government. Online government interactions mean more convenient services for citizens and businesses and online government interactions improve the cost effectiveness and accuracy of government operations;

    WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices;

    WHEREAS, every citizen has the right to prompt, efficient service from Metro Government;

    WHEREAS, the adoption of open standards improves transparency, access to public information and improved coordination and efficiencies among Departments and partner organizations across the public, non-profit and private sectors;

    WHEREAS, by publishing structured standardized data in machine readable formats, Metro Government seeks to encourage the local technology community to develop software applications and tools to display, organize, analyze, and share public record data in new and innovative ways;

    WHEREAS, Metro Government’s ability to review data and datasets will facilitate a better understanding of the obstacles the city faces with regard to equity;

    WHEREAS, Metro Government’s understanding of inequities, through data and datasets, will assist in creating better policies to tackle inequities in the city;

    WHEREAS, through this Ordinance, Metro Government desires to maintain its continuous improvement in open data and transparency that it initiated via Mayoral Executive Order No. 1, Series 2013;

    WHEREAS, Metro Government’s open data work has repeatedly been recognized as evidenced by its achieving What Works Cities Silver (2018), Gold (2019), and Platinum (2020) certifications. What Works Cities recognizes and celebrates local governments for their exceptional use of data to inform policy and funding decisions, improve services, create operational efficiencies, and engage residents. The Certification program assesses cities on their data-driven decision-making practices, such as whether they are using data to set goals and track progress, allocate funding, evaluate the effectiveness of programs, and achieve desired outcomes. These data-informed strategies enable Certified Cities to be more resilient, respond in crisis situations, increase economic mobility, protect public health, and increase resident satisfaction; and

    WHEREAS, in commitment to the spirit of Open Government, Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act.

    NOW, THEREFORE, BE IT ORDAINED BY THE COUNCIL OF THE LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT AS FOLLOWS:

    SECTION I: A new chapter of the Louisville Metro Code of Ordinances (“LMCO”) mandating an Open Data Policy and review process is hereby created as follows:

    § XXX.01 DEFINITIONS. For the purpose of this Chapter, the following definitions shall apply unless the context clearly indicates or requires a different meaning.

    OPEN DATA. Any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible, that is not Protected Information or Sensitive Information, with no legal restrictions on use or reuse. Open Data is not information that is treated as exempt under KRS 61.878 by Metro Government.

    OPEN DATA REPORT. The annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year, including, but not limited to, the progress toward achieving the goals of Metro Government’s Open Data portal, an assessment of the current scope of compliance, a list of datasets currently available on the Open Data portal and a description and publication timeline for datasets envisioned to be published on the portal in the following year; and (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality.

    OPEN DATA MANAGEMENT TEAM. A group consisting of representatives from each Department within Metro Government and chaired by the Data Officer who is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.

    DATA COORDINATORS. The members of an Open Data Management Team facilitated by the Data Officer and the Office of Civic Innovation and Technology.

    DEPARTMENT. Any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government.

    DATA OFFICER. The staff person designated by the city to coordinate and implement the city’s open data program and policy.

    DATA. The statistical, factual, quantitative or qualitative information that is maintained or created by or on behalf of Metro Government.

    DATASET. A named collection of related records, with the collection containing data organized or formatted in a specific or prescribed way.

    METADATA. Contextual information that makes the Open Data easier to understand and use.

    OPEN DATA PORTAL. The internet site established and maintained by or on behalf of Metro Government located at https://data.louisvilleky.gov/ or its successor website.

    OPEN FORMAT. Any widely accepted, nonproprietary, searchable, platform-independent, machine-readable method for formatting data which permits automated processes.

    PROTECTED INFORMATION. Any Dataset or portion thereof to which the Department may deny access pursuant to any law, rule or regulation.

    SENSITIVE INFORMATION. Any Data which, if published on the Open Data Portal, could raise privacy, confidentiality or security concerns or have the potential to jeopardize public health, safety or welfare to an extent that is greater than the potential public benefit of publishing that data.

    § XXX.02 OPEN DATA PORTAL

    (A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.
    (B) Any Open Data made accessible on Metro Government’s Open Data Portal shall use an Open Format.
    (C) In the event a successor website is used, the Data Officer shall notify the Metro Council and shall provide notice to the public on the main city website.

    § XXX.03 OPEN DATA MANAGEMENT TEAM

    (A) The Data Officer of Metro Government will work with the head of each Department to identify a Data Coordinator in each Department. The Open Data Management Team will work to establish a robust, nationally recognized, platform that addresses digital infrastructure and Open Data.
    (B) The Open Data Management Team will develop an Open Data Policy that will adopt prevailing Open Format standards for Open Data and develop agreements with regional partners to publish and maintain Open Data that is open and freely available while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

    § XXX.04 DEPARTMENT OPEN DATA CATALOGUE

    (A) Each Department shall retain ownership over the Datasets they submit to the Open Data Portal. The Departments shall also be responsible for all aspects of the quality, integrity and security of the Dataset contents, including updating its Data and associated Metadata.
    (B) Each Department shall be responsible for creating an Open Data catalogue which shall include comprehensive inventories of information possessed and/or managed by the Department.
    (C) Each Department’s Open Data catalogue will classify information holdings as currently “public” or “not yet public;” Departments will work with the Office of Civic Innovation and Technology to develop strategies and timelines for publishing Open Data containing information in a way that is complete, reliable and has a high level of detail.

    § XXX.05 OPEN DATA REPORT AND POLICY REVIEW

    (A) Within one year of the effective date of this Ordinance, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor and Metro Council an annual Open Data Report.
    (B) Metro Council may request a specific Department to report on any data or dataset that may be beneficial or pertinent in implementing policy and legislation.
    (C) In acknowledgment that technology changes rapidly, in the future, the Open Data Policy shall be reviewed annually and considered for revisions or additions that will continue to position Metro Government as a leader on issues of

  3. Open Data Portal Tutorial for Maryland State Agencies

    • opendata.maryland.gov
    • datasets.ai
    • +1more
    csv, xlsx, xml
    Updated Feb 13, 2017
    Cite
    Department of Information Technology (2017). Open Data Portal Tutorial for Maryland State Agencies [Dataset]. https://opendata.maryland.gov/Administrative/Open-Data-Portal-Tutorial-for-Maryland-State-Agenc/qr3x-vmfc
    Explore at:
    Available download formats: csv, xlsx, xml
    Dataset updated
    Feb 13, 2017
    Dataset authored and provided by
    Department of Information Technology
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    Maryland
    Description

    This is a PDF document created by the Department of Information Technology (DoIT) and the Governor's Office of Performance Improvement to assist training Maryland state employees on use of the Open Data Portal, https://opendata.maryland.gov. This document covers direct data entry, uploading Excel spreadsheets, connecting source databases, and transposing data. Please note that this tutorial is intended for use by state employees, as non-state users cannot upload datasets to the Open Data Portal.

  4. City of Rochester's Open Data Portal Terms of Use

    • hub.arcgis.com
    Updated Jun 2, 2020
    Cite
    Open_Data_Admin (2020). City of Rochester's Open Data Portal Terms of Use [Dataset]. https://hub.arcgis.com/documents/f404506f94bd47198b661f84f42eee50
    Explore at:
    Dataset updated
    Jun 2, 2020
    Dataset authored and provided by
    Open_Data_Admin
    Description

    The City's Open Data Portal provides free, public access to the City's data. For specific terms of use, please read this PDF document that outlines the terms of use for all content created by the City of Rochester that is hosted on this Open Data Portal. The document outlines:

    • Disclaimer of liability
    • Uses of the data provided on the portal
    • Indemnity clause
    • Privacy policy
    • How to report errors or problems on the site

    If you have any questions or concerns about this policy or any content on the portal, please email us at opendata@cityofrochester.gov.

    Note that content created by other organizations may follow different terms of use. Please reference the "Terms of Use" section on each item page to ensure proper use of any/all content.

  5. McKinsey opensource PDF dataset

    • kaggle.com
    zip
    Updated Jan 29, 2024
    Cite
    Rahul Sharma (2024). McKinsey opensource PDF dataset [Dataset]. https://www.kaggle.com/datasets/rahultheogre/mckinsey-opensource-pdf-dataset
    Explore at:
    Available download formats: zip (24712244 bytes)
    Dataset updated
    Jan 29, 2024
    Authors
    Rahul Sharma
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Rahul Sharma

    Released under CC0: Public Domain


  6. Company Documents Dataset

    • kaggle.com
    zip
    Updated May 23, 2024
    Cite
    Ayoub Cherguelaine (2024). Company Documents Dataset [Dataset]. https://www.kaggle.com/datasets/ayoubcherguelaine/company-documents-dataset
    Explore at:
    Available download formats: zip (9789538 bytes)
    Dataset updated
    May 23, 2024
    Authors
    Ayoub Cherguelaine
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Overview

    This dataset contains a collection of over 2,000 company documents, categorized into four main types: invoices, inventory reports, purchase orders, and shipping orders. Each document is provided in PDF format, accompanied by a CSV file that includes the text extracted from these documents, their respective labels, and the word count of each document. This dataset is ideal for various natural language processing (NLP) tasks, including text classification, information extraction, and document clustering.

    Dataset Content

    PDF Documents: The dataset includes 2,677 PDF files, each representing a unique company document. These documents are derived from the Northwind dataset, which is commonly used for demonstrating database functionalities.

    The document types are:

    • Invoices: Detailed records of transactions between a buyer and a seller.
    • Inventory Reports: Records of inventory levels, including items in stock and units sold.
    • Purchase Orders: Requests made by a buyer to a seller to purchase products or services.
    • Shipping Orders: Instructions for the delivery of goods to specified recipients.

    Example Entries

    Here are a few example entries from the CSV file:

    Shipping Order:

    • Order ID: 10718
    • Shipping Details: "Ship Name: Königlich Essen, Ship Address: Maubelstr. 90, Ship City: ..."
    • Word Count: 120

    Invoice:

    • Order ID: 10707
    • Customer Details: "Customer ID: Arout, Order Date: 2017-10-16, Contact Name: Th..."
    • Word Count: 66

    Purchase Order:

    • Order ID: 10892
    • Order Details: "Order Date: 2018-02-17, Customer Name: Catherine Dewey, Products: Product ..."
    • Word Count: 26

    Applications

    This dataset can be used for:

    • Text Classification: Train models to classify documents into their respective categories.
    • Information Extraction: Extract specific fields and details from the documents.
    • Document Clustering: Group similar documents together based on their content.
    • OCR and Text Mining: Improve OCR (Optical Character Recognition) models and text mining techniques using real-world data.
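
A minimal Python sketch of the text-classification use case listed above. The column names (`label`, `text`), the label spellings, and the keyword rules are assumptions made for illustration; the dataset's real CSV header may differ. The sample rows reuse shortened fragments of the example entries, so their word counts will not match the counts reported for the full documents.

```python
import csv
import io

# Hypothetical CSV layout: the real file's column names may differ.
SAMPLE_CSV = '''label,text
ShippingOrder,"Ship Name: Königlich Essen, Ship Address: Maubelstr. 90"
Invoice,"Customer ID: Arout, Order Date: 2017-10-16"
PurchaseOrder,"Order Date: 2018-02-17, Customer Name: Catherine Dewey"
'''


def load_rows(handle):
    """Read rows and attach a whitespace-based word count to each one."""
    rows = []
    for row in csv.DictReader(handle):
        row["word_count"] = len(row["text"].split())
        rows.append(row)
    return rows


def guess_label(text):
    """Toy keyword rule standing in for a trained classifier."""
    lowered = text.lower()
    if "ship name" in lowered:
        return "ShippingOrder"
    if "customer id" in lowered:
        return "Invoice"
    if "customer name" in lowered:
        return "PurchaseOrder"
    return "Unknown"


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["label"], guess_label(row["text"]), row["word_count"])
```

In practice the keyword rule would be replaced by a model trained on the labelled text; the loading step stays the same once the actual column names are known.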

  7. 1 - Open Data User Guide (PDF)

    • hub.arcgis.com
    Updated Feb 3, 2017
    Cite
    Avon and Somerset Constabulary (2017). 1 - Open Data User Guide (PDF) [Dataset]. https://hub.arcgis.com/datasets/ASPolice::1-open-data-user-guide-pdf
    Explore at:
    Dataset updated
    Feb 3, 2017
    Dataset authored and provided by
    Avon and Somerset Constabulary
    Description

    Downloadable PDF user guide and FAQs for partners and public users of the Avon and Somerset Constabulary (ASC) open data portal.

  8. Release Program - Department of the Premier and Cabinet dataset listing

    • data.qld.gov.au
    • researchdata.edu.au
    • +1more
    csv
    Updated Jun 16, 2025
    Cite
    Premier and Cabinet (2025). Release Program - Department of the Premier and Cabinet dataset listing [Dataset]. https://www.data.qld.gov.au/dataset/release-program-department-of-the-premier-and-cabinet-dataset-listing
    Explore at:
    Available download formats: csv (5.7 KiB)
    Dataset updated
    Jun 16, 2025
    Dataset authored and provided by
    Premier and Cabinet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A list of the Department of the Premier and Cabinet (DPC) datasets assessed for release to the open data portal in line with the department's Open Data Strategy.

  9. Government Open Data Management Platform Market Analysis, Size, and Forecast...

    • technavio.com
    pdf
    Updated Jul 20, 2025
    Cite
    Technavio (2025). Government Open Data Management Platform Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Italy, and UK), APAC (Australia, China, and India), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/government-open-data-management-platform-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jul 20, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    United States
    Description


    Government Open Data Management Platform Market Size 2025-2029

    The government open data management platform market is forecast to grow by USD 189.4 million, at a CAGR of 12.5%, from 2024 to 2029. Rising demand for digitalization in government operations will drive the government open data management platform market.

    Market Insights

    North America dominated the market and accounted for 38% of growth during 2025-2029.
    By End-user - Large enterprises segment was valued at USD 108.50 million in 2023
    By Deployment - On-premises segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 138.56 million 
    Market Future Opportunities 2024: USD 189.40 million
    CAGR from 2024 to 2029 : 12.5%
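
The increase and the CAGR quoted above can be related through the compound-growth formula. The Python sketch below assumes that the USD 189.4 million increase and the 12.5% CAGR both cover the same 2024 to 2029 window (five compounding years); the function name and the implied base and end values are illustrative and are not stated in the report.

```python
def implied_base(increase: float, cagr: float, years: int) -> float:
    """Return the starting value B such that B * ((1 + cagr) ** years - 1) == increase."""
    growth_factor = (1.0 + cagr) ** years
    return increase / (growth_factor - 1.0)


# Assumption: five compounding years between 2024 and 2029.
base_2024 = implied_base(increase=189.4, cagr=0.125, years=5)
end_2029 = base_2024 * (1.0 + 0.125) ** 5

print(f"implied 2024 base: USD {base_2024:.1f} million")
print(f"implied 2029 size: USD {end_2029:.1f} million")
```

If the report measures growth over a different window, the `years` argument would need to change accordingly.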
    

    Market Summary

    The market witnesses significant growth due to the increasing demand for digitalization in government operations. Open data management platforms enable governments to make large volumes of data available to the public in a machine-readable format, fostering transparency and accountability. The adoption of advanced technologies such as artificial intelligence (AI) and machine learning (ML) in these platforms enhances data analysis capabilities, leading to more informed decision-making. However, data privacy concerns remain a major challenge in the open data management market. Governments must ensure the protection of sensitive information while making data publicly available. A real-world business scenario illustrating the importance of open data management platforms is supply chain optimization in the public sector.
    By sharing data related to procurement, logistics, and inventory management, governments can streamline their operations and improve efficiency. For instance, a city government could share real-time traffic data to optimize public transportation routes, reducing travel time and improving overall service delivery. Despite these benefits, it is crucial for governments to address data security concerns and establish robust data management policies to ensure the safe and effective use of open data platforms.
    

    What will be the size of the Government Open Data Management Platform Market during the forecast period?


    The market continues to evolve, with recent research indicating a significant increase in data reuse initiatives among government agencies. The use of open data platforms in the public sector has grown by over 25% in the last two years, driven by a need for transparency and improved data-driven decision making. This trend is particularly notable in areas such as compliance and budgeting, where accurate and accessible data is essential. Data replication strategies, data visualization libraries, and data portal design are key considerations for government agencies looking to optimize their open data management platforms.
    Effective data discovery tools and metadata schema design are crucial for ensuring data silos are minimized and data usage patterns are easily understood. Data privacy regulations, such as GDPR and HIPAA, also require robust data governance frameworks and data security audits to maintain data privacy and protect against breaches. Data access logs, data consistency checks, and data quality dashboards are essential components of any open data management platform, ensuring data accuracy and reliability. Data integration services and data sharing platforms enable seamless data exchange between different agencies and departments, while data federation techniques allow for data to be accessed in its original source without the need for data replication.
    Ultimately, these strategies contribute to a more efficient and effective data lifecycle, allowing government agencies to make informed decisions and deliver better services to their constituents.
    

    Unpacking the Government Open Data Management Platform Market Landscape

    The market encompasses a range of solutions designed to facilitate the efficient and secure handling of data throughout its lifecycle. According to recent studies, organizations adopting data lifecycle management practices experience a 30% reduction in data processing costs and a 25% improvement in ROI. Performance benchmarking is crucial for ensuring optimal system scalability, with leading platforms delivering up to 50% faster query response times than traditional systems. Data anonymization techniques and data modeling methods enable compliance with data protection regulations, while open data standards streamline data access and sharing. Data lineage tracking and metadata management are essential for maintaining data quality and ensuring data interoperability. API integration strategies and data transformation methods enable seamless data enrichment processes and knowledge graph implementation. Data access control, data versioning, and data security protocols

  10. 2018 Open Data Plan: FOIL Summary Statistics

    • data.wu.ac.at
    csv, json, xml
    Updated Oct 4, 2018
    (2018). 2018 Open Data Plan: FOIL Summary Statistics [Dataset]. https://data.wu.ac.at/schema/data_ny_gov/Y3ZzZS1wZXJk
    Explore at:
    Available download formats: json, xml, csv
    Dataset updated
    Oct 4, 2018
    Description

    Local Law 7 of 2016 requires agencies to “review responses to freedom of information law [FOIL] requests that include the release of data to determine if such responses consist of or include public data sets that have not yet been included on the single web portal or the inclusion” on the Open Data Portal. Additionally, each City agency shall disclose “the total number, since the last update, of such agency’s freedom of information law responses that included the release of data, the total number of such responses determined to consist of or include a public data set that had not yet been included on the single web portal and the name of such public data set, where applicable, and the total number of such responses that resulted in voluntarily disclosed information being made accessible through the single web portal.”

    See the itemized public datasets used to respond to FOIL requests not yet published on the Open Data Portal in FY2018 here: https://data.cityofnewyork.us/City-Government/2018-Open-Data-Plan-FOIL-Datasets/sjdi-a6us

    See the 2018 Open Data for All Report and Open Data Plan here: https://opendata.cityofnewyork.us/wp-content/uploads/2018/09/2018-NYC-OD4A-report.pdf

  11. Dataset - Understanding the software and data used in the social sciences

    • eprints.soton.ac.uk
    Updated Mar 30, 2023
    Cite
    Chue Hong, Neil; Aragon, Selina; Antonioletti, Mario; Walker, Johanna (2023). Dataset - Understanding the software and data used in the social sciences [Dataset]. http://doi.org/10.5281/zenodo.7785710
    Explore at:
    Dataset updated
    Mar 30, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chue Hong, Neil; Aragon, Selina; Antonioletti, Mario; Walker, Johanna
    Description

    This is a repository for a UKRI Economic and Social Research Council (ESRC) funded project to understand the software used to analyse social sciences data. Any software produced has been made available under a BSD 2-Clause license and any data and other non-software derivative is made available under a CC-BY 4.0 International License. Note that the software that analysed the survey is provided for illustrative purposes - it will not work on the decoupled anonymised data set.

    Exceptions to this are:

    • Data from the UKRI ESRC is mostly made available under a CC BY-NC-SA 4.0 Licence.
    • Data from Gateway to Research is made available under an Open Government Licence (Version 3.0).

    Contents

    • Survey data & analysis: esrc_data-survey-analysis-data.zip
    • Other data: esrc_data-other-data.zip
    • Transcripts: esrc_data-transcripts.zip
    • Data Management Plan: esrc_data-dmp.zip

    Survey data & analysis

    The survey ran from 3rd February 2022 to 6th March 2023 during which 168 responses were received. Of these responses, three were removed because they were supplied by people from outside the UK without a clear indication of involvement with the UK or associated infrastructure. A fourth response was removed because it and another response came from the same person, which leaves 164 responses in the data.

    The survey responses, Questions (Q) Q1-Q16, have been decoupled from the demographic data, Q17-Q23. Questions Q24-Q28 are for follow-up and have been removed from the data. The institutions (Q17) and funding sources (Q18) have been provided in a separate file as this could be used to identify respondents. Q17, Q18 and Q19-Q23 have all been independently shuffled.

    The data has been made available as Comma Separated Values (CSV) with the question number as the header of each column and the encoded responses in the column below. To see what the questions and the responses correspond to, consult survey-results-key.csv, which decodes the questions and responses accordingly. A PDF copy of the survey questions is available on GitHub.

    The survey data has been decoupled into:

    • survey-results-key.csv - maps a question number and the responses to the actual question values.
    • q1-16-survey-results.csv - the non-demographic component of the survey responses (Q1-Q16).
    • q19-23-demographics.csv - the demographic part of the survey (Q19-Q21, Q23).
    • q17-institutions.csv - the institution/location of the respondent (Q17).
    • q18-funding.csv - funding sources within the last 5 years (Q18).

    Please note the code that has been used to do the analysis will not run with the decoupled survey data.

    Other data files included

    • CleanedLocations.csv - normalised version of the institutions that the survey respondents volunteered.
    • DTPs.csv - information on the UKRI Doctoral Training Partnerships (DTPs) scraped from the UKRI DTP contacts web page in October 2021.
    • projectsearch-1646403729132.csv.gz - data snapshot from the UKRI Gateway to Research released on the 24th February 2022 made available under an Open Government Licence.
    • locations.csv - latitude and longitude for the institutions in the cleaned locations.
    • subjects.csv - research classifications for the ESRC projects for the 24th February data snapshot.
    • topics.csv - topic classification for the ESRC projects for the 24th February data snapshot.

    Interview transcripts

    The interview transcripts have been anonymised and converted to markdown so that they are easier to process in general. List of interview transcripts: 1269794877.md, 1578450175.md, 1792505583.md, 2964377624.md, 3270614512.md, 40983347262.md, 4288358080.md, 4561769548.md, 4938919540.md, 5037840428.md, 5766299900.md, 5996360861.md, 6422621713.md, 6776362537.md, 7183719943.md, 7227322280.md, 7336263536.md, 75909371872.md, 7869268779.md, 8031500357.md, 9253010492.md

    Data Management Plan

    The study's Data Management Plan is provided in PDF format and shows the different data sets used throughout the duration of the study and where they have been deposited, as well as how long the SSI will keep these records.
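
The description above says the encoded survey responses are decoded through survey-results-key.csv. A minimal Python sketch of that lookup follows. The key layout (`question`, `code`, `meaning` columns) and the sample values are hypothetical, since the actual structure of the key file is not given here; only the idea of question-number column headers with encoded values beneath comes from the description.

```python
import csv
import io

# Hypothetical stand-ins for survey-results-key.csv and q1-16-survey-results.csv.
KEY_CSV = """question,code,meaning
Q1,1,Yes
Q1,2,No
"""

RESULTS_CSV = """Q1
1
2
1
"""


def load_key(handle):
    """Map (question number, encoded response) to the decoded response text."""
    return {
        (row["question"], row["code"]): row["meaning"]
        for row in csv.DictReader(handle)
    }


def decode_results(handle, key):
    """Replace each encoded cell with its decoded text; unknown codes pass through."""
    decoded = []
    for row in csv.DictReader(handle):
        decoded.append({q: key.get((q, code), code) for q, code in row.items()})
    return decoded


key = load_key(io.StringIO(KEY_CSV))
decoded = decode_results(io.StringIO(RESULTS_CSV), key)
print(decoded)
```

Leaving unknown codes unchanged, rather than raising an error, makes gaps in the key visible in the output without stopping the decode.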

  12. Data PDF to Text

    • kaggle.com
    zip
    Updated Jan 13, 2024
    Cite
    Nguyễn Gia Bảo (2024). Data PDF to Text [Dataset]. https://www.kaggle.com/datasets/baorbaor/data-pdf-to-text
    Available download formats: zip (17483337 bytes)
    Dataset updated
    Jan 13, 2024
    Authors
    Nguyễn Gia Bảo
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Nguyễn Gia Bảo

    Released under Apache 2.0

    Contents

  13. 2018 NYC Open Data Plan: Future Releases

    • datasets.ai
    23, 40, 55, 8
    Updated Nov 10, 2020
    + more versions
    Cite
    City of New York (2020). 2018 NYC Open Data Plan: Future Releases [Dataset]. https://datasets.ai/datasets/2018-nyc-open-data-plan-future-releases
    Available download formats: 23, 8, 55, 40
    Dataset updated
    Nov 10, 2020
    Dataset authored and provided by
    City of New York
    Description

    This inventory includes all datasets scheduled for release after September 15, 2018.

    See the 2018 Open Data for All Report and Open Data Plan here: https://opendata.cityofnewyork.us/wp-content/uploads/2018/09/2018-NYC-OD4A-report.pdf

  14. Printable Maps

    • data.pompanobeachfl.gov
    Updated Jan 11, 2022
    Cite
    External Datasets (2022). Printable Maps [Dataset]. https://data.pompanobeachfl.gov/dataset/printable-maps
    Available download formats: arcgis geoservices rest api, html
    Dataset updated
    Jan 11, 2022
    Dataset provided by
    cjennings_BCGIS
    Authors
    External Datasets

  15. GeoHub Gallery PDF Navigation Gallery

    • data.pompanobeachfl.gov
    • geohub-bcgis.opendata.arcgis.com
    Updated Feb 23, 2023
    Cite
    External Datasets (2023). GeoHub Gallery PDF Navigation Gallery [Dataset]. https://data.pompanobeachfl.gov/dataset/geohub-gallery-pdf-navigation-gallery
    Available download formats: html, arcgis geoservices rest api
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    cjennings_BCGIS
    Authors
    External Datasets
    Description

    Portal gallery for all printable maps. These maps are in PDF format.

  16. Data Sheet 1_Challenges of open data in aquatic sciences: issues faced by...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Dec 17, 2024
    Cite
    Barbosa, Carolina C.; Calhoun-Grosch, Stacy; Pilla, Rachel M.; Ladwig, Robert; Lewis, Abigail S. L.; Münzner, Karla; La Fuente, R. Sofia; Suresh, Keerthana; Grossart, Hans-Peter; Olsson, Freya; Nkwalale, Lipa G. T.; Wain, Danielle J.; Mesman, Jorrit P. (2024). Data Sheet 1_Challenges of open data in aquatic sciences: issues faced by data users and data providers.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001417085
    Dataset updated
    Dec 17, 2024
    Authors
    Barbosa, Carolina C.; Calhoun-Grosch, Stacy; Pilla, Rachel M.; Ladwig, Robert; Lewis, Abigail S. L.; Münzner, Karla; La Fuente, R. Sofia; Suresh, Keerthana; Grossart, Hans-Peter; Olsson, Freya; Nkwalale, Lipa G. T.; Wain, Danielle J.; Mesman, Jorrit P.
    Description

    Free use and redistribution of data (i.e., Open Data) increases the reproducibility, transparency, and pace of aquatic sciences research. However, barriers to both data users and data providers may limit the adoption of Open Data practices. Here, we describe common Open Data challenges faced by data users and data providers within the aquatic sciences community (i.e., oceanography, limnology, hydrology, and others). These challenges were synthesized from literature, authors’ experiences, and a broad survey of 174 data users and data providers across academia, government agencies, industry, and other sectors. Through this work, we identified seven main challenges: 1) metadata shortcomings, 2) variable data quality and reusability, 3) open data inaccessibility, 4) lack of standardization, 5) authorship and acknowledgement issues, 6) lack of funding, and 7) unequal barriers around the globe. Our key recommendation is to improve resources to advance Open Data practices. This includes dedicated funds for capacity building, hiring and retaining skilled personnel, and robust digital infrastructures for preparation, storage, and long-term maintenance of Open Data. Further, to incentivize data sharing, we reinforce the need for standardized best practices to handle data acknowledgement and citations for both data users and data providers. We also highlight and discuss regional disparities in resources and research practices within a global perspective.

  17. Data supporting the Master thesis "Monitoring von Open Data Praktiken -...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Nov 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
    Available download formats: zip
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katharina Zinke
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Dresden
    Description

    Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

    This ZIP file contains the data the thesis is based on, interim exports of the results, and the R script with all pre-processing, data merging and analyses carried out. Documentation of the additional, explorative analysis is also included. The actual PDFs and text files of the scientific papers used are not included, as they are published open access.

    The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication following soon).

    ## Data sources

    Folder 01_SourceData/

    - PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

    - ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

    - ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

    - Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

    ## Automatic classification

    Folder 02_AutomaticClassification/

    - (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

    - (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

    - PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

    - oddpub_results_wDOIs.csv (results file of the ODDPub classification)

    - PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)

    ## Manual coding

    Folder 03_ManualCheck/

    - CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

    - ManualCheck_2023-06-08.csv (Manual coding results file)

    - PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

    ## Explorative analysis for the discoverability of open data

    Folder 04_FurtherAnalyses/

    - Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

    ## R-Script

    Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)

  18. GeoHub Gallery PDF Capital Improvements

    • data.pompanobeachfl.gov
    Updated Oct 14, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    External Datasets (2021). GeoHub Gallery PDF Capital Improvements [Dataset]. https://data.pompanobeachfl.gov/dataset/geohub-gallery-pdf-capital-improvements
    Available download formats: arcgis geoservices rest api, html
    Dataset updated
    Oct 14, 2021
    Dataset provided by
    cjennings_BCGIS
    Authors
    External Datasets

  19. Cambridgeshire Insight Open Data inventory list - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 25, 2014
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ckan.publishing.service.gov.uk (2014). Cambridgeshire Insight Open Data inventory list - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/cambridgeshire-insight-open-data-inventory-list
    Dataset updated
    Jun 25, 2014
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Cambridgeshire
    Description

    The Dataset Inventory is a list of all the electronic data held by the Cambridgeshire Insight: Open Data website (http://opendata.cambridgeshireinsight.org.uk). It contains data from a variety of information themes and organisations.

    - Dataset: sets of records, however structured, and not just datasets according to standard database terminology.
    - Resource: the resource(s) for the dataset. Here they are either “Data” (files or feeds/streams of data records available) or “Document” (files or web pages describing datasets).
    - Rendition: relates to the resource and is the format that the resource is available in (can be multiple). For data, this could be, for example, csv, xml, or pdf format; renditions may represent physical files or API calls. For a document, this could be, for example, odf, pdf, or html.

  20. Charles County Real Property Assessments: Hidden Property Owner Names

    • opendata.maryland.gov
    Updated Nov 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SDAT (State Department of Assessments and Taxation) and MDP (Maryland Department of Planning) (2025). Charles County Real Property Assessments: Hidden Property Owner Names [Dataset]. https://opendata.maryland.gov/Business-and-Economy/Charles-County-Real-Property-Assessments-Hidden-Pr/bjbs-4wm4
    Available download formats: xml, kmz, application/geo+json, xlsx, kml, csv
    Dataset updated
    Nov 6, 2025
    Dataset authored and provided by
    SDAT (State Department of Assessments and Taxation) and MDP (Maryland Department of Planning)
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    Charles County
    Description

    Please read all metadata before accessing the dataset. Note that records shown here are updated at different frequencies from data in products from MDP and SDAT. Please see the full documentation at https://opendata.maryland.gov/api/views/ed4q-f8tm/files/WtRzMltUzm25OasOCYtu7PgOGUfrplWsZTalSH4Iukg?download=true&filename=Real%20Property%20Records%20Documentation.pdf and review the dedicated metadata site (https://opendata.maryland.gov/dataset/Beta-Maryland-Statewide-Real-Property-Assessments-/ed4q-f8tm/about).

Manisha717 (2024). Dataset of pdf files [Dataset]. https://www.kaggle.com/datasets/manisha717/dataset-of-pdf-files

Dataset of pdf files

Dataset of pdf for testing

18 scholarly articles cite this dataset (View in Google Scholar)
