benitoals/my-pdf-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Saivivek25/my-pdf-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About
IUST-PDFCorpus is a large set of diverse PDF files, intended for building and manipulating new PDF files in order to test, debug, and improve the quality of real-world PDF readers such as Adobe Acrobat Reader, Foxit Reader, Nitro Reader, and MuPDF. IUST-PDFCorpus contains 6,141 complete PDF files of various sizes and contents. The corpus also includes 507,299 PDF data objects and 151,132 PDF streams extracted from this set of complete files. Data objects are in a textual format, while streams are in a binary format; together they make up PDF files. In addition, we attached the code coverage of each PDF file when it was used as test data for testing MuPDF. The coverage information is available in both binary and XML formats.
PDF data objects are organized into three categories. The first category contains all objects in the corpus: each file in this category holds all PDF objects extracted from one PDF file, without any preprocessing. The second category is a dataset built by merging all files in the first category, with some preprocessing. This dataset is split into training, test, and validation sets, which makes it suitable for machine learning tasks. The third category is the same as the second but smaller, for use during the development stage of different algorithms.
IUST-PDFCorpus is collected from various sources, including the Mozilla PDF.js open test corpus, some PDFs used as initial seeds in AFL, and PDFs gathered from existing e-books, software documentation, and the public web in different languages. We first introduced IUST-PDFCorpus in our paper “Format-aware learn&fuzz: deep test data generation for efficient fuzzing”, where we used it to build an intelligent file format fuzzer called IUST-DeepFuzz. We are currently gathering other file formats to automate the testing of related applications.
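IUST-PDFCorpus distinguishes textual data objects from binary streams. A minimal Python sketch of that split follows, assuming an uncompressed, non-incremental PDF in which every indirect object appears as `N G obj ... endobj`. It is not the corpus authors' extraction tool, and the function name `split_objects` is hypothetical; object streams, cross-reference streams, and encrypted files are not handled.

```python
import re

# An indirect object: "<object number> <generation number> obj ... endobj".
# DOTALL lets bodies span lines; the lazy match stops at the first "endobj".
# Naive sketch: a stream whose binary data happens to contain "endobj"
# would be cut short.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# The raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def split_objects(pdf_bytes):
    """Split raw PDF bytes into textual data objects and binary streams.

    Returns (data_objects, streams):
      data_objects: list of (obj_num, gen_num, text), stream bodies removed
      streams:      list of (obj_num, gen_num, raw_bytes)
    """
    data_objects, streams = [], []
    for m in OBJ_RE.finditer(pdf_bytes):
        num, gen, body = int(m.group(1)), int(m.group(2)), m.group(3)
        s = STREAM_RE.search(body)
        if s:
            streams.append((num, gen, s.group(1)))
            body = body[: s.start()]  # keep only the dictionary before "stream"
        # latin-1 maps every byte to a character, so decoding never fails.
        data_objects.append((num, gen, body.strip().decode("latin-1")))
    return data_objects, streams
```

For a two-object file whose second object carries a content stream, this yields two data objects (the catalog dictionary and the stream dictionary) and one stream with the raw content bytes.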
Citing IUST-PDFCorpus
If IUST-PDFCorpus is used in your work in any form, please cite the relevant paper: https://arxiv.org/abs/1812.09961v2
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
PDF Figure Detection is a dataset for object detection tasks: it contains Figure annotations for 264 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Open Software License 3.0 (OSL-3.0): https://choosealicense.com/licenses/osl-3.0/
Atlas PDF Image Cluster Dataset
Derived from the following Python pipeline code: https://github.com/atlasunified/PDF-to-Image-Cluster
Dataset Description
This dataset is a collection of text extracted from PDF files, originating from various online resources. The dataset was generated using a series of Python scripts forming a robust pipeline that automated the tasks of downloading, converting, and managing the data.
Dataset Summary
Corresponding… See the full description on the dataset page: https://huggingface.co/datasets/AtlasUnified/atlas-pdf-img-cluster.
Lake Mead is a large interstate reservoir located in the Mojave Desert of southeastern Nevada and northwestern Arizona. It was impounded in 1935 by the construction of Hoover Dam and is one of a series of multi-purpose reservoirs on the Colorado River. The lake extends 183 km from the mouth of the Grand Canyon to Black Canyon, the site of Hoover Dam, and provides water for residential, commercial, industrial, recreational, and other non-agricultural users in communities across the southwestern United States. Extensive research has been conducted on Lake Mead, but most studies have focused on determining levels of anthropogenic contaminants such as synthetic organic compounds, heavy metals and dissolved ions, furans/dioxins, and nutrient loading in lake water, sediment, and biota (Preissler et al., 1998; Bevans et al., 1996; Bevans et al., 1998; Covay and Leiker, 1998; LaBounty and Horn, 1997; Paulson, 1981). By contrast, little work has focused on the lake's sediments and the processes of their deposition (Gould, 1951). To address this gap, the USGS, in cooperation with researchers from the University of Nevada, Las Vegas (UNLV), collected sidescan-sonar imagery and high-resolution seismic-reflection profiles throughout Lake Mead. These data allow detailed mapping of the surficial geology and of the distribution and thickness of sediment that has accumulated in the lake since the completion of Hoover Dam. Results indicate that post-impoundment sediment has accumulated primarily in former river and stream beds that are now submerged below the lake, while the margins of the lake appear to be devoid of post-impoundment sediment. The sediment cover along the original Colorado River bed is continuous and is typically more than 10 m thick through much of its length. Sediment thickness in some areas exceeds 35 m, while the smaller tributary valleys are typically filled with less than 4 m of sediment.
Away from the river beds that are now covered with post-impoundment sediment, pre-impoundment alluvial deposits and rock outcrops are still exposed on the lake floor.
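Seismic-reflection profiles record two-way travel time rather than depth, so sediment thickness is obtained by converting time to distance: thickness = v × t / 2, where t is the two-way travel time and v is the acoustic velocity in the sediment. The sketch below assumes a constant velocity of 1500 m/s (roughly the speed of sound in water); the velocity model actually used for the Lake Mead survey is not stated here, and the function names are hypothetical.

```python
ASSUMED_VELOCITY_M_S = 1500.0  # assumption: ~sound speed in water, not the survey's value


def thickness_from_twt(twt_ms, velocity_m_s=ASSUMED_VELOCITY_M_S):
    """Layer thickness (m) from two-way travel time (ms).

    The pulse travels down and back up, so the one-way time is half the
    recorded two-way time: thickness = velocity * (t / 2).
    """
    return velocity_m_s * (twt_ms / 1000.0) / 2.0


def twt_for_thickness(thickness_m, velocity_m_s=ASSUMED_VELOCITY_M_S):
    """Inverse conversion: expected two-way travel time (ms) for a thickness (m)."""
    return 2.0 * thickness_m / velocity_m_s * 1000.0
```

Under this assumed velocity, the 35 m thickness mentioned above corresponds to a two-way travel time of about 47 ms, and 10 m to about 13 ms. A higher sediment velocity would shrink those times proportionally.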
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Pdf Batch_0 is a dataset for object detection tasks: it contains annotations of Exercise Problems for 1,171 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
asoria/pdf-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PDF Solutions reported $0.19 in earnings per share (EPS) for its fiscal quarter ending in June 2025. Data for PDF Solutions (PDFS) EPS, including historical values, tables, and charts, were last updated by Trading Economics in October 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
See the corresponding blog post. Identifying bibliographic data and links to the source PDFs are available here: http://dx.doi.org/10.6084/m9.figshare.105633
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PDF Solutions reported $134.79M in current assets for its fiscal quarter ending in June 2025. Data for PDF Solutions (PDFS) current assets, including historical values, tables, and charts, were last updated by Trading Economics in October 2025.
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global PDF reader software market size was USD 1958.2 million in 2024 and will expand at a compound annual growth rate (CAGR) of 13.30% from 2024 to 2031.
North America held the largest market share, around 40% of global revenue, with a market size of USD 783.28 million in 2024, and will grow at a CAGR of 11.5% from 2024 to 2031.
Europe accounted for around 30% of global revenue, with a market size of USD 587.46 million.
Asia Pacific held around 23% of global revenue, with a market size of USD 450.39 million in 2024, and will grow at a CAGR of 15.3% from 2024 to 2031.
Latin America held around 5% of global revenue, with a market size of USD 97.91 million in 2024, and will grow at a CAGR of 12.7% from 2024 to 2031.
Middle East and Africa held around 2% of global revenue, with an estimated market size of USD 39.16 million in 2024, and will grow at a CAGR of 13.0% from 2024 to 2031.
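The regional figures above can be checked against the global total, and each stated CAGR can be turned into a projection with the compound-growth formula value × (1 + r)^years. This minimal Python sketch uses only the numbers quoted above; Europe's CAGR is not given, so it is stored as `None` and not projected. The 2031 projections assume seven compounding years (2024 to 2031).

```python
GLOBAL_2024 = 1958.2  # USD million
GLOBAL_CAGR = 0.133   # 2024-2031

# Region -> (2024 size in USD million, CAGR 2024-2031, or None if not stated).
REGIONS = {
    "North America": (783.28, 0.115),
    "Europe": (587.46, None),
    "Asia Pacific": (450.39, 0.153),
    "Latin America": (97.91, 0.127),
    "Middle East and Africa": (39.16, 0.130),
}


def project(value, cagr, years):
    """Compound growth: value * (1 + cagr) ** years."""
    return value * (1.0 + cagr) ** years


def shares(regions, total):
    """Each region's fraction of the total 2024 market size."""
    return {name: size / total for name, (size, _) in regions.items()}


if __name__ == "__main__":
    print(f"Sum of regions: {sum(size for size, _ in REGIONS.values()):.2f}")
    for name, share in shares(REGIONS, GLOBAL_2024).items():
        size, cagr = REGIONS[name]
        line = f"{name}: {share:.1%} of 2024 revenue"
        if cagr is not None:
            line += f", ~USD {project(size, cagr, 7):,.1f} million in 2031"
        print(line)
    print(f"Global 2031: ~USD {project(GLOBAL_2024, GLOBAL_CAGR, 7):,.1f} million")
```

The regional sizes sum exactly to USD 1958.2 million, and the shares come out at 40%, 30%, 23%, 5%, and 2%. Projecting the global figure at 13.3% for seven years gives roughly USD 4.7 billion in 2031.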
PDF readers without editing functions held the highest share of PDF reader software market revenue in 2024.
Market Dynamics of the PDF Reader Software Market
Key Drivers for the PDF Reader Software Market
Growing adoption of digital documents to increase the demand globally
The growing adoption of digital documents is significantly increasing demand globally for PDF reader software. As businesses and individuals transition towards digital workflows, the need for efficient document management tools becomes paramount. Digital documents offer advantages such as easier storage, faster retrieval, and reduced environmental impact compared to traditional paper-based systems. This shift is particularly evident in sectors like finance, healthcare, education, and legal services, where paper-intensive processes are being replaced by digital solutions. Furthermore, the rise in remote work and virtual collaboration due to global events has accelerated this trend, driving up the demand for versatile PDF readers capable of supporting seamless document sharing, annotation, and editing across different devices and platforms. As a result, PDF reader software providers are poised to capitalize on these trends by continually innovating and enhancing their offerings to meet the evolving needs of digital document users worldwide.
Rising mobile device usage to propel market growth
The increasing prevalence of mobile devices is a significant catalyst for market growth in PDF reader software. With more people relying on smartphones and tablets as primary computing devices, the demand for mobile-friendly PDF readers is on the rise. Mobile devices enable users to access and interact with documents on the go, enhancing productivity and convenience. This trend is particularly pronounced in sectors such as sales, field service, and education, where mobile devices facilitate real-time access to critical documents and information. PDF reader software that optimizes for mobile platforms by offering intuitive interfaces, responsive design, and features like annotation and cloud integration stands to capitalize on this trend. As mobile device usage continues to grow globally, PDF reader providers have a strategic opportunity to innovate and expand their market presence by catering to the evolving needs of mobile-centric users.
Restraint Factor for the PDF Reader Software Market
Competition from free alternatives to limit sales
Competition from free alternatives poses a significant challenge to the sales potential of PDF reader software. Many users opt for freely available PDF readers such as Adobe Acrobat Reader DC, Foxit Reader, or the PDF viewers built into operating systems, which offer basic functionality without requiring payment. These free alternatives often satisfy casual users who need only simple document viewing and basic interaction features. To counter this competition, vendors of paid PDF reader software must differentiate their products by offering compelling value propositions such as advanced editing capabilities, enhanced security features, seamless integration with other software ecosystems, and superior customer support. Furthermore, emphasizing additional benefits such as improved user experience, regular updates, and specialized featur...
broadfield-dev/pdf-ocr-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
https://dataintelo.com/privacy-and-policy
The PDF Editor Software market size is poised to witness significant growth from 2024 to 2032, with a projected CAGR of 11.5% during this period. In 2023, the global market size was valued at approximately USD 1.5 billion and is expected to reach USD 4.1 billion by 2032. This rapid expansion is driven by increasing digitalization, the rising need for efficient document management, and the growing adoption of electronic signatures in various sectors.
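The three figures above (USD 1.5 billion in 2023, USD 4.1 billion by 2032, 11.5% CAGR) can be cross-checked by solving the compound-growth formula for the rate: r = (end / start)^(1/years) - 1. This is a minimal sketch; it assumes nine compounding years from the 2023 base value to 2032, since no 2024 value is given.

```python
def implied_cagr(start_value, end_value, years):
    """Rate r such that start_value * (1 + r) ** years == end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(value, cagr, years):
    """Compound growth: value * (1 + cagr) ** years."""
    return value * (1.0 + cagr) ** years


if __name__ == "__main__":
    r = implied_cagr(1.5, 4.1, 2032 - 2023)
    print(f"Implied CAGR 2023-2032: {r:.1%}")
    print(f"USD 1.5B at 11.5% for 9 years: USD {project(1.5, 0.115, 9):.2f}B")
```

With these rounded inputs the implied rate is about 11.8%, slightly above the stated 11.5%; compounding USD 1.5 billion at 11.5% for nine years gives about USD 4.0 billion. The gap is consistent with rounding of the published values and with the 2023 base year preceding the 2024 start of the forecast period.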
One of the primary growth factors contributing to this market surge is the ubiquitous adoption of digital documentation across industries. The shift from paper-based processes to digital solutions has been accelerated by the global move towards sustainability and efficiency. Enterprises and government bodies are increasingly deploying PDF editor software to streamline their document management processes, which significantly reduces operational costs and enhances productivity. Moreover, the integration of advanced features such as Optical Character Recognition (OCR) and AI-based editing tools in PDF editors has further fueled their adoption.
Another critical driver for the PDF Editor Software market is the rise in remote working and the demand for collaborative tools. The COVID-19 pandemic has prompted a permanent shift towards remote and hybrid work environments, necessitating efficient digital tools to manage and edit documents. PDF editor software has become indispensable for professionals working remotely, enabling seamless collaboration, editing, and sharing of documents in real-time. This trend is expected to continue, further propelling the demand for PDF editor software in the coming years.
The increasing demand for enhanced security features in document management systems is also a significant growth factor. With the rise in cyber threats and data breaches, organizations are prioritizing the security of their digital documents. PDF editor software that offers robust security features such as encryption, password protection, and secure sharing capabilities is witnessing higher adoption rates. This focus on security is particularly pronounced in sectors such as finance, healthcare, and government, where the confidentiality of documents is paramount.
Regionally, North America currently holds the largest market share and is expected to maintain its dominance throughout the forecast period. The region's advanced IT infrastructure, coupled with the high adoption rate of digital technologies among enterprises, drives this dominance. Furthermore, the presence of major PDF editor software providers in the region contributes to the sustained market growth. However, the Asia Pacific region is anticipated to register the highest CAGR due to the rapid digital transformation in emerging economies and increasing investments in IT infrastructure.
The PDF Editor Software market is segmented by components into software and services. The software segment dominates the market and is expected to maintain its lead throughout the forecast period. This segment includes standalone PDF editor applications as well as integrated solutions within larger document management systems. The continuous advancements in software features, such as enhanced user interfaces, cloud integration, and AI capabilities, are driving the adoption of PDF editor software. Additionally, the increasing availability of subscription-based pricing models has made these software solutions more accessible to a broader range of users.
On the other hand, the services segment, though smaller, plays a crucial role in the overall market. This includes various support services, such as implementation, training, and maintenance, which are essential for the effective utilization of PDF editor software. Managed services are also gaining traction, offering enterprises the convenience of outsourcing their document management needs. The rising complexity of digital document workflows and the need for customized solutions are further fueling the demand for professional services in this segment.
The integration of cloud services with PDF editor software is another noteworthy trend within the component segment. Cloud-based PDF editors offer several advantages, including easier accessibility, real-time collaboration, and automatic updates. These benefits are particularly appealing to small and medium enterprises (SMEs) that may lack the resources to maintain extensive IT infrastructure. As a result, the services segment is witnessing a growing demand for cloud management and support
This study was conducted with a group of parents to determine the relationship between obesity and the family environment related to nutrition and physical activity in children aged 5–14 years, and to examine how this environment relates to school level, gender, and parental education level. The study was conducted online during the fall semester of 2024 with 531 parents (289 fathers and 242 mothers) whose children attend preschool, primary, or secondary school. Data were collected with questions on sociodemographic characteristics and with the Turkish adaptation of the Family Nutrition and Physical Activity Screening Scale (FNPA-TR). The relationships between FNPA scores, children's body mass index (BMI), and selected sociodemographic variables were examined using variance models and correlation analyses chosen according to the structure and distribution of the data. The results showed that higher parental education levels contribute to lower BMI values in children. In addition, family and child activities played an important role in children's BMI, and children with lower BMI were more active. A healthy environment and family sleep patterns were also found to positively affect BMI. Children's gender did not make a significant difference in BMI. Family dietary habits and physical activity levels are clearly important factors influencing the risk of childhood obesity; however, family eating patterns and dietary habits did not directly influence BMI in interaction with environmental factors.
A dataset of mentions, growth rate, and total volume of the keyphrase 'Pdf' over time.
This is the landing page for generating PDF reports for Form 6180.54 Rail Equipment Accident/Incident, Form 6180.55a Injury/Illness [Casualty], Form 6180.57 Highway-Rail Grade Crossing Accident/Incident and Form 6180.71 Crossing Inventory.
Demo of a PDF document
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This corpus consists of 2,110 PDF files and 2,110 XML files containing the text extracted from the PDF files. All PDF files are contracts in German that are publicly available on the internet. Most of these contracts are from the city governments of Hamburg and Bremen and were collected from the websites http://suche.transparenz.hamburg.de/dataset?q=vertrag&esq_title=&check_all_ and https://www.transparenz.bremen.de.
In the XML files, the texts are segmented into sentences. Each sentence also carries additional information on its frequency of use in the corpus.
The root of each XML file is the element document, which has an attribute referencing the original PDF. A document is divided into pages, and pages consist of heading and sentence elements. Each sentence has two identifiers: sid for the sentence and cid for the cluster it belongs to. Sentences with the same sentence identifier are identical. Sentences with the same cluster identifier are very similar but not necessarily identical. Sentences were clustered with single-link clustering based on character trigram overlap.
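Single-link clustering on trigram overlap can be sketched as follows. The description does not say how overlap is measured or which cut-off was used, so this sketch assumes Jaccard similarity between sets of character trigrams and a hypothetical threshold of 0.6. With a fixed threshold, single-link clusters are exactly the connected components of the graph that links every sufficiently similar pair, which union-find computes directly.

```python
from itertools import combinations


def trigrams(text):
    """Set of character trigrams of a sentence (lower-cased, space-padded)."""
    s = f"  {text.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a, b):
    """Overlap of two trigram sets: |a & b| / |a | b| (assumed measure)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def single_link_clusters(sentences, threshold=0.6):
    """Return one cluster label per sentence.

    Any pair with similarity >= threshold (hypothetical cut-off) is linked;
    clusters are the connected components, i.e. single-link clustering.
    Pairwise comparison is O(n^2), which is fine for a sketch.
    """
    parent = list(range(len(sentences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    grams = [trigrams(s) for s in sentences]
    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    # Dense labels 0, 1, 2, ... in order of first appearance.
    ids, labels = {}, []
    for i in range(len(sentences)):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels
```

Because the linkage is single, two sentences can end up in the same cluster without being directly similar, as long as a chain of similar sentences connects them.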
The corpus consists of 106,539 (non-unique) sentences and 3,635,371 tokens, including punctuation.
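Statistics like the sentence and token counts above could be recomputed from the XML files with Python's standard `xml.etree.ElementTree`. The snippet below is a hedged sketch on a made-up document: the `page` element name and the `src` attribute on `document` are assumptions (the description names only `document`, `heading`, `sentence`, `sid`, and `cid`). The regex tokenizer, which counts each word and each punctuation mark as one token, is not necessarily the tokenizer behind the published token count.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up example file. "page" and "src" are assumed names, not confirmed ones.
SAMPLE = """<document src="vertrag_0001.pdf">
  <page>
    <heading>Vertrag</heading>
    <sentence sid="s1" cid="c1">Der Vertrag tritt am 1. Januar in Kraft.</sentence>
    <sentence sid="s2" cid="c1">Der Vertrag tritt am 1. Februar in Kraft.</sentence>
  </page>
</document>"""

# Naive tokenizer (assumption): runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def corpus_stats(xml_text):
    """Count sentences, tokens, and sentences per cluster id in one XML file."""
    root = ET.fromstring(xml_text)
    sentence_elems = list(root.iter("sentence"))  # found at any depth, under any page
    texts = [s.text or "" for s in sentence_elems]
    return {
        "sentences": len(texts),
        "tokens": sum(len(TOKEN_RE.findall(t)) for t in texts),
        "clusters": Counter(s.get("cid") for s in sentence_elems),
    }
```

Summing these per-file results over all 2,110 XML files would give corpus-wide totals comparable to the figures above, subject to the tokenizer assumption.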
https://www.archivemarketresearch.com/privacy-policy
Market Overview
The global PDF Merge Software market is projected to reach USD XXX million by 2033, growing at a CAGR of XX% during the forecast period (2025-2033). The rising demand for efficient document management solutions, coupled with the increasing adoption of digital workflows, is driving the market growth. The ability of PDF Merge Software to combine multiple PDF files into a single cohesive document, simplifying editing, sharing, and storage, has made it indispensable for individuals and businesses alike.
Key Drivers and Trends
Key drivers propelling the market include the increasing popularity of cloud-based PDF Merge Software, offering greater accessibility and collaboration options. The adoption of mobile devices and the proliferation of remote work models further fuel demand for solutions that enable seamless document merging on any platform. Additionally, the growing awareness of data security and compliance regulations is driving the adoption of secure and compliant PDF Merge Software solutions.
Trends shaping the market include the integration of artificial intelligence (AI) and machine learning (ML) technologies to automate document merging tasks, enhancing accuracy and efficiency. The emergence of advanced features, such as drag-and-drop functionality and real-time collaboration tools, is also contributing to the market's growth prospects.