27 datasets found
  1. Datasets and Models for Historical Newspaper Article Segmentation

    • zenodo.org
    • explore.openaire.eu
    json, txt, zip
    Updated Jan 31, 2021
    Cite
    Raphaël Barman; Maud Ehrmann; Simon Clematide; Oliveira (2021). Datasets and Models for Historical Newspaper Article Segmentation [Dataset]. http://doi.org/10.5281/zenodo.3706863
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Raphaël Barman; Maud Ehrmann; Simon Clematide; Oliveira
    Description

    This record contains the datasets and models used and produced for the work reported in the paper "Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers" (link).

    Please cite this paper if you are using the models/datasets or find it relevant to your research:

    @article{barman_combining_2020,
      title = {{Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers}},
      author = {Raphaël Barman and Maud Ehrmann and Simon Clematide and Sofia Ares Oliveira and Frédéric Kaplan},
      journal= {Journal of Data Mining \& Digital Humanities},
      volume= {HistoInformatics},
      DOI = {10.5281/zenodo.4065271},
      year = {2021},
      url = {https://jdmdh.episciences.org/7097},
    }


    Please note that this record contains data under different licenses.

    1. DATA

    • Annotations (json files): The JSON files contain image annotations, with one file per newspaper containing region annotations (label and coordinates) in VIA format. The following licenses apply:
      • luxwort.json: these annotations are under a CC0 1.0 license. Please refer to the rights statement specified for each image in the file.
      • GDL.json, IMP.json and JDG.json: those annotations are under a CC BY-SA 4.0 license.

    • Image files: The archive images.zip contains the image files of the Swiss titles (GDL, IMP, JDG) used for the experiments described in the paper. These images are under copyright (property of the journal Le Temps and of ArcInfo) and can be used for academic research or educational purposes only. Redistribution, publication, and commercial use are not permitted. These terms of use are similar to the following rights statement: http://rightsstatements.org/vocab/InC-EDU/1.0/
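The annotation files described above can be inspected with a few lines of Python. The following is a minimal sketch, assuming the files follow the usual VGG Image Annotator (VIA) export layout, in which each image entry carries a "filename" and a "regions" collection whose items hold "shape_attributes" and "region_attributes". The attribute key "label" and the sample content are hypothetical; check the actual files for the key used.

```python
import json
from collections import Counter


def iter_regions(via_annotations):
    """Yield (filename, shape_attributes, region_attributes) for every region."""
    # Some VIA project exports nest the per-image entries under this key.
    entries = via_annotations.get("_via_img_metadata", via_annotations)
    for entry in entries.values():
        if not isinstance(entry, dict):
            continue
        regions = entry.get("regions", [])
        # Older VIA exports store regions as a dict keyed by index, newer ones as a list.
        if isinstance(regions, dict):
            regions = list(regions.values())
        for region in regions:
            yield (
                entry.get("filename"),
                region.get("shape_attributes", {}),
                region.get("region_attributes", {}),
            )


def count_labels(via_annotations, label_key="label"):
    """Count regions per class; label_key is an assumed attribute name."""
    return Counter(
        attrs.get(label_key, "<missing>")
        for _, _, attrs in iter_regions(via_annotations)
    )


# Hypothetical in-memory sample; for a real file use json.load(open(path)).
sample_json = """
{
  "page_001.png12345": {
    "filename": "page_001.png",
    "size": 12345,
    "regions": [
      {"shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 300, "height": 150},
       "region_attributes": {"label": "Death notice"}},
      {"shape_attributes": {"name": "polygon", "all_points_x": [5, 60, 60], "all_points_y": [5, 5, 40]},
       "region_attributes": {"label": "Weather"}}
    ]
  }
}
"""
sample = json.loads(sample_json)
label_counts = count_labels(sample)
print(label_counts)
```

Counting labels per file is a quick way to check class balance before training on any of the four newspapers.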

    2. MODELS

    Some of the best models are released under a CC BY-SA 4.0 license (they are also available as assets of the current GitHub release).

    • JDG_flair-FT: this model was trained on JDG using French Flair and FastText embeddings. It is able to predict the four classes presented in the paper (Serial, Weather, Death notice and Stocks).
    • Luxwort_obituary_flair-bpemb: this model was trained on Luxwort using multilingual Flair and Byte-pair embeddings. It is able to predict the Death notice class.
    • Luxwort_obituary_flair-FT_indomain: this model was trained on Luxwort using in-domain Flair and FastText embeddings (trained on Luxwort data). It is also able to predict the Death notice class.

    These models can be used to predict probabilities on new images using the same code as in the original dhSegment repository. Three parameters must be passed to the predict function: 1) embeddings_path (the path to the embeddings list), 2) embeddings_map_path (the path to the compressed embedding map), and 3) embeddings_dim (the size of the embeddings).

    Please refer to the paper for further information or contact us.

    3. CODE:

    https://github.com/dhlab-epfl/dhSegment-text


    4. ACKNOWLEDGEMENTS
    We warmly thank the journal Le Temps (owner of La Gazette de Lausanne and the Journal de Genève) and the group ArcInfo (owner of L'Impartial) for agreeing to share the related datasets for academic purposes. We also thank the National Library of Luxembourg for its support with all steps related to the Luxemburger Wort annotation release.
    This work was realized in the context of the impresso - Media Monitoring of the Past project and supported by the Swiss National Science Foundation under grant CRSII5_173719.

    5. CONTACT
    Maud Ehrmann (EPFL-DHLAB)
    Simon Clematide (UZH)

  2. A hotel's customers dataset

    • kaggle.com
    Updated Nov 27, 2020
    Cite
    Nuno Antonio (2020). A hotel's customers dataset [Dataset]. https://www.kaggle.com/nantonio/a-hotels-customers-dataset/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nuno Antonio
    Description

    Context

    This real-world customer dataset with 31 variables describes 83,590 instances (customers) from a hotel in Lisbon, Portugal.

    Content

    The data comprise three full years of customer personal, behavioral, demographic, and geographical information.

    Acknowledgements

    Additional information on this dataset can be found in the article A Hotel's customers personal, behavioral, demographic, and geographic dataset from Lisbon, Portugal (2015-2018), written by Nuno Antonio, Ana de Almeida, and Luis Nunes for Data in Brief (online November 2020).

    Inspiration

    This dataset can be used in data mining, machine learning, and other analytical problems within data science. Because its unit of analysis is the customer, it is especially suitable for building customer segmentation models, including clustering and RFM (Recency, Frequency, and Monetary value) models, but it can also be used in classification and regression problems.
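To make the RFM idea concrete, here is a minimal sketch using only the Python standard library. The input layout (customer id, booking date, amount) and the sample rows are hypothetical and do not reflect the dataset's actual 31 variables.

```python
from collections import defaultdict
from datetime import date


def rfm_table(bookings, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    bookings: iterable of (customer_id, booking_date, amount) tuples.
    snapshot: reference date against which recency is measured.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, booking_date, amount in bookings:
        frequency[customer_id] += 1
        monetary[customer_id] += amount
        # Keep the most recent booking date per customer.
        if customer_id not in last_seen or booking_date > last_seen[customer_id]:
            last_seen[customer_id] = booking_date
    return {
        cid: ((snapshot - last_seen[cid]).days, frequency[cid], monetary[cid])
        for cid in last_seen
    }


# Hypothetical sample rows, for illustration only.
bookings = [
    ("A", date(2018, 1, 10), 120.0),
    ("A", date(2018, 6, 1), 80.0),
    ("B", date(2017, 3, 5), 300.0),
]
rfm = rfm_table(bookings, snapshot=date(2018, 12, 31))
for customer_id, (recency, freq, money) in sorted(rfm.items()):
    print(customer_id, recency, freq, money)
```

The three values per customer can then be binned into scores or fed to a clustering algorithm, depending on the segmentation approach chosen.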

  3. An open-pit mine segmentation dataset for deep learning

    • scidb.cn
    Updated Oct 29, 2024
    Cite
    Lin Gang (2024). An open-pit mine segmentation dataset for deep learning [Dataset]. http://doi.org/10.57760/sciencedb.15701
    Dataset provided by
    Science Data Bank
    Authors
    Lin Gang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The open-pit mine segmentation dataset for deep learning is a carefully curated resource. It was developed through a systematic process. Firstly, by conducting comprehensive literature research, the Point of Interest (POI) data of open-pit mines was summarized. Based on this, Google Level 17 images were obtained. Then, through painstaking manual annotation work, the boundaries of the mines were precisely marked out. The dataset provides the boundary information in YOLO format, which is highly conducive to the training and optimization of deep learning models for tasks related to open-pit mine segmentation. It enables researchers and practitioners in the field of deep learning to have a reliable and accurate dataset at their disposal, facilitating the development of more effective algorithms and applications for understanding and analyzing the characteristics and patterns of open-pit mines, which can have significant implications for mine management, safety monitoring, and resource optimization in the open-pit mining industry.
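The description only says the boundaries are in "YOLO format". The sketch below assumes the common polygon-segmentation variant, where each label line is a class index followed by normalized x y pairs. The sample line and image size are hypothetical; if the files hold bounding boxes instead (class, center x, center y, width, height), the parsing would differ.

```python
def parse_yolo_polygon(line, image_width, image_height):
    """Parse one assumed YOLO segmentation line into (class_id, pixel points)."""
    parts = line.split()
    class_id = int(parts[0])
    coords = [float(value) for value in parts[1:]]
    # A polygon needs at least three (x, y) pairs.
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("expected an even number of coordinates, at least 6")
    # Coordinates are normalized to [0, 1]; scale them to pixels.
    points = [
        (x * image_width, y * image_height)
        for x, y in zip(coords[0::2], coords[1::2])
    ]
    return class_id, points


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula (pixel units squared)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical label line describing a square mine boundary in a 256x256 tile.
sample_line = "0 0.25 0.25 0.75 0.25 0.75 0.75 0.25 0.75"
class_id, points = parse_yolo_polygon(sample_line, 256, 256)
area = polygon_area(points)
print(class_id, points, area)
```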

  4. Global Data Mining and Modeling Market Future Outlook 2025-2032

    • statsndata.org
    excel, pdf
    Updated Apr 2025
    Cite
    Stats N Data (2025). Global Data Mining and Modeling Market Future Outlook 2025-2032 [Dataset]. https://www.statsndata.org/report/data-mining-and-modeling-market-51239
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Data Mining and Modeling market has emerged as a cornerstone of decision-making across various industries, leveraging vast amounts of data to derive meaningful insights that drive strategic actions. With its roots in statistics and machine learning, data mining involves extracting patterns from large datasets, w

  5. Deep Learning Market Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Mar 19, 2025
    Cite
    Market Report Analytics (2025). Deep Learning Market Report [Dataset]. https://www.marketreportanalytics.com/reports/deep-learning-market-11200
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The deep learning market, valued at $4.97 billion in 2025, is experiencing rapid expansion, projected to grow at a compound annual growth rate (CAGR) of 26.06% from 2025 to 2033. This robust growth is fueled by several key drivers. The increasing availability of large datasets and powerful computing resources, including specialized hardware like GPUs and TPUs, are enabling the development and deployment of increasingly sophisticated deep learning models. Furthermore, the rising adoption of deep learning across diverse applications, such as image and voice recognition, video surveillance and diagnostics, and data mining, is significantly contributing to market expansion. The demand for automation, improved accuracy in various tasks, and the ability to extract valuable insights from complex data are driving businesses across sectors to integrate deep learning solutions. Significant advancements in algorithmic efficiency and the emergence of novel architectures, such as transformer networks, are further accelerating market growth. Competition is intense, with major technology companies like Google, Amazon, Microsoft, and NVIDIA leading the charge, alongside specialized deep learning startups. However, challenges remain, including the need for skilled professionals to develop and maintain these systems, ethical concerns surrounding algorithmic bias, and the high computational costs associated with training complex models. The market segmentation reveals significant opportunities. The software segment currently dominates, driven by the development of user-friendly frameworks and libraries. However, the hardware segment is anticipated to witness significant growth, fueled by advancements in specialized processors and memory technologies designed to accelerate deep learning computations. Geographically, North America and Europe currently hold the largest market share due to established technological infrastructure and high adoption rates. 
However, the Asia-Pacific region is expected to experience substantial growth in the coming years, driven by increasing digitalization and government investments in AI technologies. The competitive landscape is characterized by a mix of established technology giants and innovative startups, leading to ongoing innovation and competitive pricing. This dynamic environment necessitates continuous adaptation and innovation to maintain market leadership. The forecast period (2025-2033) promises further consolidation and the emergence of new applications, driving the continued expansion of the deep learning market.
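The growth figures quoted in this description can be turned into an implied end-of-period value with the standard compound-growth formula. This is a sketch of that arithmetic only: the report text does not state a 2033 figure, so the result is an implication of the quoted numbers, not a value from the source.

```python
def project_value(base_value, cagr, years):
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted above: $4.97 billion in 2025, CAGR of 26.06% through 2033.
base_2025_billion = 4.97
cagr = 0.2606
years = 2033 - 2025

projected_2033_billion = project_value(base_2025_billion, cagr, years)
print(f"Implied 2033 market size: ${projected_2033_billion:.2f} billion")
```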

  6. Drill image dataset for training part I.

    • plos.figshare.com
    zip
    Updated Mar 7, 2024
    Cite
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu (2024). Drill image dataset for training part I. [Dataset]. http://doi.org/10.1371/journal.pone.0299471.s001
    Dataset provided by
    PLOS ONE
    Authors
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Structural planes decrease the strength and stability of rock masses, severely affecting their mechanical properties and deformation and failure characteristics. Therefore, investigation and analysis of structural planes are crucial tasks in mining rock mechanics. The drilling camera obtains image information of deep structural planes of rock masses through high-definition camera methods, providing important data sources for the analysis of deep structural planes of rock masses. This paper addresses the problems of high workload, low efficiency, high subjectivity, and poor accuracy brought about by manual processing based on current borehole image analysis and conducts an intelligent segmentation study of borehole image structural planes based on the U2-Net network. By collecting data from 20 different borehole images in different lithological regions, a dataset consisting of 1,013 borehole images with structural plane type, lithology, and color was established. Data augmentation methods such as image flipping, color jittering, blurring, and mixup were applied to expand the dataset to 12,421 images, meeting the requirements for deep network training data. Based on the PyTorch deep learning framework, the initial U2-Net network weights were set, the learning rate was set to 0.001, the training batch was 4, and the Adam optimizer adaptively adjusted the learning rate during the training process. A dedicated network model for segmenting structural planes was obtained, and the model achieved a maximum F-measure value of 0.749 when the confidence threshold was set to 0.7, with an accuracy rate of up to 0.85 within the range of recall rate greater than 0.5. Overall, the model has high accuracy for segmenting structural planes and very low mean absolute error, indicating good segmentation accuracy and certain generalization of the network. 
The research method in this paper can serve as a reference for the study of intelligent identification of structural planes in borehole images.
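The F-measure reported above combines precision and recall into a single score. The abstract does not state which weighting was used, so the sketch below exposes it as a parameter. The default of 0.3 for beta squared is an assumption borrowed from common practice in salient-object segmentation, and setting it to 1.0 gives the familiar F1 score.

```python
def f_measure(precision, recall, beta_squared=0.3):
    """Weighted harmonic mean of precision and recall.

    beta_squared < 1 weights precision more heavily; beta_squared = 1 is F1.
    The default value is an assumed convention, not taken from the paper.
    """
    denominator = beta_squared * precision + recall
    if denominator == 0:
        return 0.0
    return (1.0 + beta_squared) * precision * recall / denominator


# Illustrative inputs only; these are not the paper's per-threshold values.
example_score = f_measure(precision=0.85, recall=0.5)
f1_score = f_measure(precision=0.5, recall=0.5, beta_squared=1.0)
print(round(example_score, 4), f1_score)
```

In an evaluation loop this function would be called once per confidence threshold, and the maximum over thresholds reported.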

  7. Data Science Platform Industry Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 30, 2025
    Cite
    Market Report Analytics (2025). Data Science Platform Industry Report [Dataset]. https://www.marketreportanalytics.com/reports/data-science-platform-industry-89665
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Science Platform market is experiencing robust growth, projected to reach $10.15 billion in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 23.50% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing volume and complexity of data generated across diverse industries necessitates sophisticated platforms for analysis and insights extraction. Businesses are increasingly adopting cloud-based solutions for their scalability, cost-effectiveness, and accessibility, driving the growth of the cloud deployment segment. Furthermore, the rising demand for advanced analytics capabilities across sectors like BFSI (Banking, Financial Services, and Insurance), retail and e-commerce, and IT & Telecom is significantly boosting market demand. The availability of robust and user-friendly platforms is empowering businesses of all sizes, from SMEs to large enterprises, to leverage data science effectively for improved decision-making and competitive advantage. The market is witnessing the emergence of innovative solutions such as automated machine learning (AutoML) and integrated platforms that combine data preparation, model building, and deployment capabilities. The market segmentation reveals significant opportunities across various offerings and deployment models. While the platform segment holds a larger share, the services segment is poised for significant growth driven by the need for expert consulting and support in data science projects. Geographically, North America currently dominates the market, but the Asia-Pacific region is expected to witness faster growth due to increasing digitalization and technological advancements. Key players like IBM, Google, Microsoft, and Amazon are driving innovation and competition, with new entrants continuously emerging, adding to the market's dynamism. 
While challenges such as data security and privacy concerns remain, the overall market outlook is exceptionally positive, promising considerable growth over the forecast period. Continued technological innovation, coupled with rising adoption across a wider array of industries, will be central to the market's continued expansion. Recent developments include: November 2023 - Stagwell announced a partnership with Google Cloud and SADA, a Google Cloud premier partner, to develop generative AI (gen AI) marketing solutions that support Stagwell agencies, client partners, and product development within the Stagwell Marketing Cloud (SMC). The partnership will help in harnessing data analytics and insights by developing and training a proprietary Stagwell large language model (LLM) purpose-built for Stagwell clients, productizing data assets via APIs to create new digital experiences for brands, and multiplying the value of their first-party data ecosystems to drive new revenue streams using Vertex AI and open source-based models. May 2023 - IBM launched a new AI and data platform, watsonx, aimed at allowing businesses to accelerate advanced AI usage with trusted data, speed and governance. IBM also introduced GPU-as-a-service, which is designed to support AI-intensive workloads, with an AI dashboard to measure, track and help report on cloud carbon emissions. With watsonx, IBM offers an AI development studio with access to IBM-curated and trained foundation models and open-source models, and access to a data store to gather and clean up training and tuning data. Key drivers for this market are: Rapid Increase in Big Data, Emerging Promising Use Cases of Data Science and Machine Learning; Shift of Organizations Toward Data-intensive Approach and Decisions. Potential restraints include: Rapid Increase in Big Data, Emerging Promising Use Cases of Data Science and Machine Learning; Shift of Organizations Toward Data-intensive Approach and Decisions. 
Notable trends are: Small and Medium Enterprises to Witness Major Growth.

  8. Open Source Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 2, 2025
    Cite
    Data Insights Market (2025). Open Source Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/open-source-tools-1936277
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The open-source tools market is experiencing robust growth, driven by increasing demand for cost-effective, flexible, and customizable solutions across diverse sectors. The market, encompassing tools for data cleaning, visualization, mining, and applications like machine learning, natural language processing, and computer vision, is projected to witness substantial expansion over the forecast period (2025-2033). Factors such as the rising adoption of cloud computing, the growing need for data-driven decision-making, and the increasing preference for collaborative development models are key drivers. While the specific CAGR isn't provided, a conservative estimate based on industry trends suggests a compound annual growth rate of around 15-20% is realistic for the period. This growth is anticipated across all segments, with the data science and machine learning sectors exhibiting particularly strong performance. Geographic expansion is also a prominent trend, with North America and Europe leading the market initially, followed by a significant increase in adoption across Asia Pacific and other regions as digital transformation initiatives accelerate. However, challenges remain. Security concerns surrounding open-source software and the need for robust support and maintenance infrastructure could potentially restrain market growth. Nevertheless, ongoing improvements in security protocols and the burgeoning community support surrounding many open-source projects are mitigating these challenges. The diverse range of applications and tool types within the open-source market ensures its versatility. Universal tools, catering to broad needs, and specialized tools like data visualization and mining software are all experiencing increased demand. The presence of established players like IBM and Oracle alongside a large community of contributors ensures a dynamic market ecosystem. 
The continued development of innovative tools, improved documentation, and enhanced community support are expected to further fuel market growth, making open-source solutions increasingly attractive to businesses of all sizes. Specific segmentation data, while not explicitly provided, shows a spread across applications indicating a healthy, diversified market that is expected to evolve rapidly within the forecast period.

  9. Deep Learning Market Analysis US - Size and Forecast 2024-2028

    • technavio.com
    Updated Jul 15, 2024
    Cite
    Technavio (2024). Deep Learning Market Analysis US - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/us-deep-learning-market-industry-analysis
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States
    Description

    US Deep Learning Market Size 2024-2028

    The US deep learning market size is forecast to increase by USD 3.55 billion at a CAGR of 27.17% between 2023 and 2028. The market is experiencing significant growth due to several key drivers. Firstly, the increasing demand for industry-specific solutions is fueling market expansion. Additionally, the high data requirements for deep learning applications are leading to increased data generation and collection. Cloud analytics is another significant trend, as companies seek to leverage cloud computing for cost savings and scalability. However, challenges persist, including the escalating cyberattack rate and the need for strong customer data security. Education institutes are also investing in deep learning research and development to prepare the workforce for the future. Overall, the market is poised for continued growth, driven by these factors and the potential for innovation and advancement in various sectors.

    Deep learning, a subset of artificial intelligence (AI), is a machine learning technique that uses neural networks to model and solve complex problems. This technology is gaining significant traction in various industries across the US, driven by the availability of large datasets and advancements in cloud-based technology. One of the primary areas where deep learning is making a mark is in data centers. Deep learning algorithms are being used to analyze vast amounts of data, enabling businesses to gain valuable insights and make informed decisions. Cloud-based technology is facilitating the deployment of deep learning models at scale, making it an attractive solution for businesses looking to leverage their data.

    Furthermore, the market is rapidly evolving, driven by innovations in cloud-based technology, neural networks, and big-data analytics. The integration of machine vision technology and image and visual recognition has driven advancements in industries such as self driving vehicles, digital marketing, and virtual assistance. Companies are leveraging generative adversarial networks (GANs) for cutting-edge news accumulation and content generation. Additionally, machine vision is transforming sectors like retail and manufacturing by enhancing automation and human behavior analysis. With the use of human brain cells generated information, researchers are pushing the boundaries of artificial intelligence. The growing importance of photos and visual data in decision-making further accelerates the market, highlighting the potential of deep learning technologies.

    Market Segmentation

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    • Application: Image recognition, Voice recognition, Video surveillance and diagnostics, Data mining
    • Type: Software, Services, Hardware
    • End-user: Security, Automotive, Healthcare, Retail and commerce, Others
    • Geography: US

    By Application Insights

    The Image recognition segment is estimated to witness significant growth during the forecast period. Deep learning, a subset of artificial intelligence (AI), is revolutionizing various industries in the US through its ability to analyze and interpret complex data. One of its key applications is image recognition, which utilizes neural networks and graphics processing units (GPUs) to identify objects or patterns within images and videos. This technology is increasingly being adopted in data centers and cloud-based solutions for applications such as visual search, product recommendations, and inventory management. In the automotive sector, image recognition is integral to advanced driver assistance systems (ADAS) and autonomous vehicles, enabling the identification of pedestrians, other vehicles, road signs, and lane markings.

    Additionally, image recognition is essential for cybersecurity applications, industrial automation, Internet of Things (IoT) devices, and robots, enhancing their functionality and efficiency. Image recognition is transforming industries by providing accurate and real-time insights from visual data, ultimately improving user experience and productivity.

    The Image recognition segment was valued at USD 265.10 billion in 2017 and showed a gradual increase during the forecast period.

    Our market researchers analyzed the data with 2023 as the base year, along with the key drivers, trends, and challenges. A holistic analysis of drivers will help companies refine their marketing strategies to gain a competitive advantage.

    Market Driver

    Industry-specific solutions is the key driver of the market. Deep learning has become a pivotal technology in addressing classification tasks across numerous industrie

  10. Advanced and Predictive Analytics Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 21, 2025
    Cite
    Market Research Forecast (2025). Advanced and Predictive Analytics Report [Dataset]. https://www.marketresearchforecast.com/reports/advanced-and-predictive-analytics-44935
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The advanced and predictive analytics market is experiencing robust growth, driven by the increasing adoption of data-driven decision-making across various sectors. The market's expansion is fueled by several key factors, including the exponential growth of data volume and velocity, the decreasing cost of data storage and processing, and the rising demand for real-time insights. Businesses across industries are leveraging advanced analytics techniques like machine learning, deep learning, and artificial intelligence to improve operational efficiency, enhance customer experience, optimize resource allocation, and mitigate risks. The banking and financial services, insurance, and healthcare sectors are particularly significant adopters, using predictive models for fraud detection, risk assessment, customer segmentation, and personalized medicine. However, challenges such as data security concerns, the need for skilled data scientists, and the complexity of implementing and integrating advanced analytics solutions continue to present hurdles for wider adoption. The market segmentation reveals a significant contribution from the banking and financial services sector, followed closely by insurance and healthcare. Geographical distribution shows strong growth in North America and Europe, driven by early adoption and mature technological infrastructure. However, the Asia-Pacific region is expected to witness significant growth in the coming years due to increasing digitalization and government initiatives promoting data analytics. The competitive landscape is characterized by both established technology giants like IBM, Microsoft, and SAP, and specialized analytics companies like SAS and FICO, leading to innovation and diverse solutions. Future growth will be shaped by advancements in cloud computing, big data technologies, and the development of more sophisticated and explainable AI algorithms. 
The continued focus on data privacy and regulatory compliance will also play a crucial role in shaping the market's trajectory.

  11. Data from: New Variable Selection Method Using Interval Segmentation Purity...

    • acs.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Li-Juan Tang; Wen Du; Hai-Yan Fu; Jian-Hui Jiang; Hai-Long Wu; Guo-Li Shen; Ru-Qin Yu (2023). New Variable Selection Method Using Interval Segmentation Purity with Application to Blockwise Kernel Transform Support Vector Machine Classification of High-Dimensional Microarray Data [Dataset]. http://doi.org/10.1021/ci900032q.s001
    Available download formats
    xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Li-Juan Tang; Wen Du; Hai-Yan Fu; Jian-Hui Jiang; Hai-Long Wu; Guo-Li Shen; Ru-Qin Yu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    One problem with discriminant analysis of microarray data is representation of each sample by a large number of genes that are possibly irrelevant, insignificant, or redundant. Methods of variable selection are, therefore, of great significance in microarray data analysis. A new method for key gene selection has been proposed on the basis of interval segmentation purity that is defined as the purity of samples belonging to a certain class in intervals segmented by a mode search algorithm. This method identifies key variables most discriminative for each class, which offers possibility of unraveling the biological implication of selected genes. A salient advantage of the new strategy over existing methods is the capability of selecting genes that, though possibly exhibit a multimodal distribution, are the most discriminative for the classes of interest, considering that the expression levels of some genes may reflect systematic difference in within-class samples derived from different pathogenic mechanisms. On the basis of the key genes selected for individual classes, a support vector machine with block-wise kernel transform is developed for the classification of different classes. The combination of the proposed gene mining approach with support vector machine is demonstrated in cancer classification using two public data sets. The results reveal that significant genes have been identified for each class, and the classification model shows satisfactory performance in training and prediction for both data sets.

  12. Text Analysis Software Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 20, 2025
    Cite
    Market Research Forecast (2025). Text Analysis Software Report [Dataset]. https://www.marketresearchforecast.com/reports/text-analysis-software-42331
    Available download formats
    doc, pdf, ppt
    Dataset updated
    Mar 20, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global text analysis software market, valued at $1723.5 million in 2025, is projected to experience robust growth, driven by the increasing volume of unstructured text data generated across various sectors and the rising need for efficient data analysis. The market's Compound Annual Growth Rate (CAGR) of 5.6% from 2025 to 2033 indicates a steady expansion, fueled by the adoption of cloud-based solutions, the growing demand for real-time insights from social media, customer reviews, and other textual sources, and advancements in natural language processing (NLP) technologies like sentiment analysis and topic modeling. Key players like Microsoft, IBM, and Google are driving innovation through continuous product development and strategic partnerships, further contributing to market growth. The segmentation, encompassing on-premises and cloud-based deployment models along with large enterprises and SMEs as user segments, highlights diverse application areas across industries like finance, healthcare, and marketing, where text analysis is critical for decision-making and competitive advantage. Market restraints include the complexity of implementing and integrating text analysis solutions, the need for skilled professionals, and the associated costs of data storage and processing. However, the ongoing development of user-friendly interfaces and the increasing affordability of cloud-based solutions are mitigating these challenges. The geographic distribution shows a significant market share held by North America, fueled by high technological adoption and a strong presence of major technology companies. However, the Asia-Pacific region is anticipated to witness significant growth in the coming years, driven by rising digitalization and increasing government investments in technology infrastructure. 
The continuous advancements in AI and machine learning will further fuel the market’s growth trajectory, creating new avenues for data-driven insights and automation across various industry applications.
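The growth figures quoted above follow the usual compound-growth relation. As a minimal sketch, assuming the 2025 value is the base year and that growth compounds once per year over the eight years to 2033 (the description does not state the compounding convention, and the function names are illustrative), the arithmetic can be checked like this:

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value forward by `years` at the annual rate `cagr`."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Recover the annual growth rate that links two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures taken from the description: USD 1723.5 million in 2025, CAGR of 5.6%.
base_2025 = 1723.5
rate = 0.056
years = 2033 - 2025  # assumption: eight annual compounding periods

projected_2033 = project_value(base_2025, rate, years)
print(f"Projected 2033 market size: USD {projected_2033:.1f} million")
print(f"Round-trip CAGR: {implied_cagr(base_2025, projected_2033, years):.3%}")
```

The second function is the inverse of the first, which makes it a quick consistency check for any pair of market-size figures quoted with a CAGR.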

  13. Data Analytics Consulting Service Market Global Size, Share & Industry...

    • marketresearchintellect.com
    Updated Jun 27, 2024
    Cite
    Market Research Intellect (2024). Data Analytics Consulting Service Market Global Size, Share & Industry Forecast 2033 [Dataset]. https://www.marketresearchintellect.com/product/data-analytics-consulting-service-market/
    Dataset updated
    Jun 27, 2024
    Dataset authored and provided by
    Market Research Intellect
    License

    https://www.marketresearchintellect.com/privacy-policy

    Area covered
    Global
    Description

    The size and share of this market is categorized based on Descriptive Analytics (Data Visualization, Reporting, Business Intelligence, Dashboard Development, Data Mining) and Predictive Analytics (Forecasting, Risk Management, Customer Segmentation, Trend Analysis, Predictive Modeling) and Prescriptive Analytics (Optimization Techniques, Simulation, Decision Analysis, Resource Allocation, Strategic Planning) and Big Data Analytics (Data Engineering, Data Lakes, Data Warehouse Solutions, Real-Time Analytics, Cloud-Based Analytics) and Advanced Analytics (Machine Learning, Artificial Intelligence, Natural Language Processing, Deep Learning, Text Analytics) and geographical regions (North America, Europe, Asia-Pacific, South America, Middle-East and Africa).

  14. Data from: Globe230k: A Benchmark Dense-Pixel Annotation Dataset for Global...

    • zenodo.org
    • explore.openaire.eu
    txt, zip
    Updated Jul 7, 2024
    Cite
    Qian Shi; Da He; Zhengyu Liu; Xiaoping Liu; Jingqian Xue (2024). Globe230k: A Benchmark Dense-Pixel Annotation Dataset for Global Land Cover Mapping [Dataset]. http://doi.org/10.5281/zenodo.10435661
    Available download formats
    txt, zip
    Dataset updated
    Jul 7, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Qian Shi; Da He; Zhengyu Liu; Xiaoping Liu; Jingqian Xue
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We (Intelligent Mining and Analysis of Remote Sensing big data, IMARS) created a large-scale annotated dataset, Globe230k, for land use/land cover (LULC) mapping, annotated on Google Earth imagery at 1 m spatial resolution. Globe230k was annotated by numerous experts and by students majoring in surveying and mapping who had received the necessary training. Annotation combined visual interpretation of very high-resolution images with in-situ field surveys and followed an organized annotation pipeline. Globe230k has three strengths:

    1) Large scale: Globe230k includes 232,819 annotated images of size 512x512 at a spatial resolution of 1 m, with more than 3 × 10^10 annotated pixels, and covers 10 first-level categories.

    2) Rich diversity: the annotated images are sampled from regions worldwide and cover an area of over 60,000 km², which indicates high variability and diversity. In addition, to keep the categories balanced, rare categories such as wetland and ice/snow were intentionally given a higher chance of being sampled.

    3) Multi-modal: Globe230k contains not only RGB bands but also other features that are important for Earth system research, such as the normalized difference vegetation index (NDVI), a digital elevation model (DEM), and vertical-vertical (VV) and vertical-horizontal (VH) polarization bands, which can facilitate research on multi-modal data fusion. Because the multi-modal data are large (DEM 1.91G, NDVI 164G, VVVH 372G), they are stored on Baidu Yunpan. The download link is https://pan.baidu.com/s/12AKbiqOXSf4fnm7mYkCE0g?pwd=230k and the extraction code is 230k.

    The image patches and their corresponding annotated patches are stored in "image_patch.zip" and "label_patch.zip", respectively. The RGB images are ".jpg" files of size 512x512 with pixel values ranging from 0 to 255. The annotated patches are ".png" files, also of size 512x512, with pixel values ranging from 1 to 10, which represent 1#cropland, 2#forest, 3#grass, 4#shrubland, 5#wetland, 6#water, 7#tundra, 8#impervious, 9#bareland, 10#ice/snow. The corresponding DEM, NDVI and VVVH patches are all ".tif" files of size 512x512 (because the DEM, NDVI and VVVH patches have different resolutions, they are all resized to the same scale as the image patch).

    The 232,819 pairs are officially divided into a training set, a validation set, and a test set in a 7:1:2 ratio; the division can be found in the "train_num.txt", "val_num.txt" and "test_num.txt" files. Based on this division, the official baseline accuracy of several state-of-the-art semantic segmentation methods can be found in the related article (https://spj.science.org/doi/10.34133/remotesensing.0078).

    We hope it can be used as a benchmark to promote further development of global land cover mapping and semantic segmentation algorithms.
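The label encoding and the size figures described above can be written down directly. The following is a minimal sketch rather than official tooling: the dictionary and function names are illustrative, and the toy pixel list stands in for a decoded ".png" label patch, which in practice would be read with an image library.

```python
from collections import Counter

# Pixel value -> first-level category, as listed in the dataset description.
GLOBE230K_CLASSES = {
    1: "cropland", 2: "forest", 3: "grass", 4: "shrubland", 5: "wetland",
    6: "water", 7: "tundra", 8: "impervious", 9: "bareland", 10: "ice/snow",
}


def class_histogram(label_values):
    """Count pixels per category name; reject values outside the 1-10 range."""
    counts = Counter(label_values)
    unknown = set(counts) - set(GLOBE230K_CLASSES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    return {GLOBE230K_CLASSES[v]: n for v, n in sorted(counts.items())}


# Sanity check of the "more than 3 x 10^10 annotated pixels" statement.
TOTAL_PATCHES = 232_819
PATCH_SIDE = 512
total_pixels = TOTAL_PATCHES * PATCH_SIDE * PATCH_SIDE

# Toy stand-in for the flattened pixel values of one label patch.
toy_patch = [1, 1, 2, 6, 6, 6, 10]
hist = class_histogram(toy_patch)
print(hist)
print(f"total annotated pixels: {total_pixels:,}")
```

Raising on unknown values is a deliberate choice here: a label outside 1-10 would usually mean the wrong file was read (for example an RGB patch instead of a label patch).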

  15. Global Mine Planning and Geological Modeling Software Market Forecast and...

    • statsndata.org
    excel, pdf
    Updated Apr 2025
    Cite
    Stats N Data (2025). Global Mine Planning and Geological Modeling Software Market Forecast and Trend Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/mine-planning-and-geological-modeling-software-market-365546
    Available download formats
    excel, pdf
    Dataset updated
    Apr 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Mine Planning and Geological Modeling Software market plays a critical role in the mining industry, providing essential tools that facilitate efficient resource extraction and management. These specialized software solutions are crucial for professionals in geology and mining engineering, allowing them to create

  16. Mining Equipment Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Jan 31, 2025
    Cite
    Pro Market Reports (2025). Mining Equipment Market Report [Dataset]. https://www.promarketreports.com/reports/mining-equipment-market-16666
    Available download formats
    ppt, doc, pdf
    Dataset updated
    Jan 31, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global mining equipment market is estimated to reach $70.24 billion by 2033, growing at a CAGR of 1.34% during the forecast period. The market is driven by rising demand for mining equipment as mineral and metal exploration increases. Growing urbanization and industrialization have led to a surge in demand for metals and minerals, which has in turn boosted demand for mining equipment. The increasing adoption of automation in the mining industry is also driving market growth.

    The mining equipment market is segmented by equipment type, application, propulsion type, automation level, size, and region. By equipment type, the market is segmented into surface mining equipment, underground mining equipment, drilling equipment, hauling equipment, and excavating equipment. By application, the market is segmented into metal mining, coal mining, industrial minerals mining, quarrying and construction.

    The key companies operating in the mining equipment market include Epiroc AB, Sandvik, Siemens AG, Liebherr, Komatsu, Terex Corporation, Caterpillar, Hitachi, Joy Global Inc., General Electric Company, Atlas Copco, Hitachi Construction Machinery Co., Ltd., Metso Outotec, Bucyrus International, Inc., and Volvo Construction Equipment.

    Recent developments include: The Mining Equipment Market is projected to reach a value of USD 79.2 billion by 2032, exhibiting a CAGR of 1.34% during the forecast period (2024-2032). The growth of the market is attributed to the rising demand for minerals and metals, increasing investments in mining projects, and technological advancements in mining equipment. Key recent developments include the launch of new mining equipment models by major manufacturers such as Caterpillar, Komatsu, and Hitachi. These new models offer improved efficiency, productivity, and safety features, which are driving their adoption by mining companies. Additionally, the growing adoption of automation and digital technologies in mining operations is expected to further fuel the demand for advanced mining equipment in the coming years.

    Key drivers for this market are: rising electric mining equipment demand; autonomous mining technology adoption; advanced data analytics and IoT integration; growing focus on sustainable mining practices; government support for automation and efficiency.

    Potential restraints include: increasing demand for critical minerals; rapid technological advancements; growing focus on safety and efficiency; fluctuating raw material prices; regional infrastructure development.

  17. Sentiment Analytics Software Market Analysis North America, Europe, APAC,...

    • technavio.com
    Updated Dec 23, 2024
    Cite
    Technavio (2024). Sentiment Analytics Software Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, Germany, China, UK, India, Canada, France, Japan, Brazil, South Korea - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/sentiment-analytics-software-market-industry-analysis
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Germany, United States, Global
    Description

    Sentiment Analytics Software Market Size 2025-2029

    The sentiment analytics software market size is forecast to increase by USD 2.34 billion, at a CAGR of 16.6% between 2024 and 2029.

    The market is poised for significant growth, driven by the increasing use of social media and rising internet penetration. With over 4.66 billion people using the internet as of 2021 and social media users projected to reach 4.4 billion by 2025, the volume of digital conversations is expanding exponentially. This presents a vast opportunity for sentiment analytics software, which extracts emotional intelligence from these conversations to help businesses understand customer sentiment and tailor their strategies accordingly. Moreover, the integration of generative AI in sentiment analytics is a game-changer. This technology enables software to analyze not just text but also tone, sentiment, and context, providing more accurate and nuanced insights. However, challenges persist, including context-dependent errors, data privacy concerns, and the need for continuous training and updating to keep up with evolving language trends. Despite these challenges, the market is expected to grow at a CAGR of 18.4% from 2021 to 2028, reaching a value of USD113.1 billion. The potential for improved customer engagement, enhanced brand reputation, and data-driven decision-making makes sentiment analytics software an indispensable tool for businesses in the digital age.

    What will be the Size of the Sentiment Analytics Software Market during the forecast period?

    The market continues to evolve, with dynamic applications across various sectors. Entities utilize data visualization tools to gain competitive intelligence, integrating sentiment analysis into their marketing strategies. Real-time sentiment tracking is a crucial component of customer experience management, enabling businesses to respond promptly to customer feedback. Sentiment analysis algorithms employ text classification, opinion mining, and natural language processing to derive insights from vast amounts of data. Hybrid sentiment analysis combines multiple approaches, enhancing accuracy and reliability. Predictive analytics leverages sentiment scoring and customer segmentation for product development and brand reputation management. Cloud-based sentiment analysis and sentiment APIs offer flexibility and scalability, while on-premise sentiment analysis maintains data security. Topic modeling assists in understanding customer needs and preferences, informing targeted marketing efforts. Voice of customer insights and business intelligence facilitate informed decision-making. Sentiment dashboards provide a comprehensive view of customer sentiment, enabling risk management and decision support. Market research and public relations benefit from sentiment analysis services, ensuring effective communication strategies. Continuous market activities unfold, with ongoing developments in big data analytics, machine learning, and data mining shaping the future of sentiment analysis.

    How is this Sentiment Analytics Software Industry segmented?

    The sentiment analytics software industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Deployment: On-premises; Cloud-based
    • End-user: Retail; BFSI; Healthcare; Others
    • Geography: North America (US); Europe (Germany, UK); APAC (China, India); Rest of World (ROW)

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period. In the realm of business intelligence and customer experience management, on-premises sentiment analytics software has emerged as a preferred choice for organizations seeking control over their data and IT infrastructure. This setup enables companies to maintain privacy and adhere to regulatory requirements, while also allowing for customization that caters to unique business needs. On-premises sentiment analysis solutions integrate seamlessly with existing systems, providing a high degree of flexibility. Furthermore, the dedicated infrastructure results in superior performance and faster processing times. Sentiment analytics platforms employ various techniques such as emotion detection, topic modeling, text classification, and natural language processing to derive valuable insights from customer feedback and social media monitoring. These insights are essential for risk management, brand reputation management, product development, and marketing campaigns. Predictive analytics and customer segmentation are other key applications of sentiment analysis, offering businesses the ability to anticipate customer needs and preferences. Cloud-based sentiment analysis and sentiment APIs are alternative options, but on-premises sentiment analytics software provides t

  18. Data Analytics Consulting Service Market Global Size, Share and Industry Forecast...

    • marketresearchintellect.com
    Updated Aug 2, 2024
    Cite
    Market Research Intellect (2024). Datenanalyseberatungsdienstmarkt Globale Größe, Anteil und Branchenprognose 2033 [Dataset]. https://www.marketresearchintellect.com/de/product/data-analytics-consulting-service-market/
    Dataset updated
    Aug 2, 2024
    Dataset authored and provided by
    Market Research Intellect
    License

    https://www.marketresearchintellect.com/de/privacy-policy

    Area covered
    Global
    Description

    The market size and share are categorized by Descriptive Analytics (Data Visualization, Reporting, Business Intelligence, Dashboard Development, Data Mining) and Predictive Analytics (Forecasting, Risk Management, Customer Segmentation, Trend Analysis, Predictive Modeling) and Prescriptive Analytics (Optimization Techniques, Simulation, Decision Analysis, Resource Allocation, Strategic Planning) and Big Data Analytics (Data Engineering, Data Lakes, Data Warehouse Solutions, Real-Time Analytics, Cloud-Based Analytics) and Advanced Analytics (Machine Learning, Artificial Intelligence, Natural Language Processing, Deep Learning, Text Analytics) and geographical regions (North America, Europe, Asia-Pacific, South America, Middle East & Africa).

  19. Data Analytics Consulting Services Market Forecasts of Size,...

    • marketresearchintellect.com
    Updated May 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Research Intellect (2025). Marché des services de conseil d'analyse des données Prévisions de taille, de part et d'industrie mondiales 2033 [Dataset]. https://www.marketresearchintellect.com/fr/product/data-analytics-consulting-service-market/
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Market Research Intellect
    License

    https://www.marketresearchintellect.com/fr/privacy-policy

    Area covered
    Global
    Description

    The market size and share are categorized by Descriptive Analytics (Data Visualization, Reporting, Business Intelligence, Dashboard Development, Data Mining) and Predictive Analytics (Forecasting, Risk Management, Customer Segmentation, Trend Analysis, Predictive Modeling) and Prescriptive Analytics (Optimization Techniques, Simulation, Decision Analysis, Resource Allocation, Strategic Planning) and Big Data Analytics (Data Engineering, Data Lakes, Data Warehouse Solutions, Real-Time Analytics, Cloud-Based Analytics) and Advanced Analytics (Machine Learning, Artificial Intelligence, Natural Language Processing, Deep Learning, Text Analytics) and geographical regions (North America, Europe, Asia-Pacific, South America, Middle East and Africa).

  20. Structural plane classification.

    • plos.figshare.com
    xls
    Updated Mar 7, 2024
    Cite
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu (2024). Structural plane classification. [Dataset]. http://doi.org/10.1371/journal.pone.0299471.t001
    Available download formats
    xls
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Structural planes decrease the strength and stability of rock masses, severely affecting their mechanical properties and deformation and failure characteristics. Therefore, investigation and analysis of structural planes are crucial tasks in mining rock mechanics. The drilling camera obtains image information of deep structural planes of rock masses through high-definition camera methods, providing important data sources for the analysis of deep structural planes of rock masses. This paper addresses the problems of high workload, low efficiency, high subjectivity, and poor accuracy brought about by manual processing based on current borehole image analysis and conducts an intelligent segmentation study of borehole image structural planes based on the U2-Net network. By collecting data from 20 different borehole images in different lithological regions, a dataset consisting of 1,013 borehole images with structural plane type, lithology, and color was established. Data augmentation methods such as image flipping, color jittering, blurring, and mixup were applied to expand the dataset to 12,421 images, meeting the requirements for deep network training data. Based on the PyTorch deep learning framework, the initial U2-Net network weights were set, the learning rate was set to 0.001, the training batch was 4, and the Adam optimizer adaptively adjusted the learning rate during the training process. A dedicated network model for segmenting structural planes was obtained, and the model achieved a maximum F-measure value of 0.749 when the confidence threshold was set to 0.7, with an accuracy rate of up to 0.85 within the range of recall rate greater than 0.5. Overall, the model has high accuracy for segmenting structural planes and very low mean absolute error, indicating good segmentation accuracy and certain generalization of the network. 
The research method in this paper can serve as a reference for the study of intelligent identification of structural planes in borehole images.
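The F-measure reported above combines precision and recall into a single score. The description does not say which weighting was used, so the sketch below implements the general F-beta form; the default beta_sq=1.0 gives the ordinary F1 score, and the value 0.3 used in the example is a weighting often seen in salient-object-detection evaluation, treated here purely as an assumption. The precision and recall inputs are illustrative and are not the paper's measurements.

```python
def f_measure(precision: float, recall: float, beta_sq: float = 1.0) -> float:
    """General F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        return 0.0  # both precision and recall are zero
    return (1.0 + beta_sq) * precision * recall / denominator


# Illustrative inputs only.
print(f_measure(0.85, 0.5))               # F1: precision and recall weighted equally
print(f_measure(0.85, 0.5, beta_sq=0.3))  # weights precision more heavily
```

A beta_sq below 1 emphasizes precision, which is why the second call returns a higher score than the first for the same inputs.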

Cite
Raphaël Barman; Maud Ehrmann; Simon Clematide; Oliveira (2021). Datasets and Models for Historical Newspaper Article Segmentation [Dataset]. http://doi.org/10.5281/zenodo.3706863

Datasets and Models for Historical Newspaper Article Segmentation

Available download formats
json, txt, zip
Dataset updated
Jan 31, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Raphaël Barman; Maud Ehrmann; Simon Clematide; Oliveira
Description

This record contains the datasets and models used and produced for the work reported in the paper "Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers" (link).

Please cite this paper if you are using the models/datasets or find it relevant to your research:

@article{barman_combining_2020,
  title = {{Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers}},
  author = {Raphaël Barman and Maud Ehrmann and Simon Clematide and Sofia Ares Oliveira and Frédéric Kaplan},
  journal= {Journal of Data Mining \& Digital Humanities},
  volume= {HistoInformatics},
  DOI = {10.5281/zenodo.4065271},
  year = {2021},
  url = {https://jdmdh.episciences.org/7097},
}


Please note that this record contains data under different licenses.

1. DATA

  • Annotations (json files): The JSON files contain image annotations, with one file per newspaper containing region annotations (label and coordinates) in VIA format. The following licenses apply:
    • luxwort.json: those annotations are under a CC0 1.0 license. Please refer to the rights statement specified for each image in the file.
    • GDL.json, IMP.json and JDG.json: those annotations are under a CC BY-SA 4.0 license.

  • Image files: The archive images.zip contains the image files of the Swiss titles (GDL, IMP, JDG) used for the experiments described in the paper. Those images are under copyright (property of the journal Le Temps and of ArcInfo) and can be used for academic research or educational purposes only. Redistribution, publication or commercial use is not permitted. These terms of use are similar to the following rights statement: http://rightsstatements.org/vocab/InC-EDU/1.0/
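The annotation files described in the list above can be read with the standard library. The sketch below assumes the common VIA (VGG Image Annotator) JSON export layout: a top-level object keyed by image id, each entry holding a "filename" and a "regions" collection whose items carry "shape_attributes" and "region_attributes". The sample file name and the "label" attribute key are hypothetical; the actual key used in luxwort.json, GDL.json, IMP.json and JDG.json should be checked in the files themselves.

```python
import json

# Hypothetical miniature annotation file; names and values are illustrative only.
SAMPLE_VIA_JSON = """
{
  "page_001.jpg12345": {
    "filename": "page_001.jpg",
    "size": 12345,
    "regions": [
      {"shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 300, "height": 150},
       "region_attributes": {"label": "Death notice"}},
      {"shape_attributes": {"name": "polygon",
                            "all_points_x": [0, 50, 50], "all_points_y": [0, 0, 40]},
       "region_attributes": {"label": "Weather"}}
    ],
    "file_attributes": {}
  }
}
"""


def iter_regions(via_annotations, label_key="label"):
    """Yield (filename, label, shape_attributes) for every annotated region."""
    for entry in via_annotations.values():
        regions = entry.get("regions", [])
        # Older VIA exports store regions as a dict keyed by index, newer ones as a list.
        if isinstance(regions, dict):
            regions = list(regions.values())
        for region in regions:
            label = region.get("region_attributes", {}).get(label_key)
            yield entry.get("filename"), label, region.get("shape_attributes", {})


rows = list(iter_regions(json.loads(SAMPLE_VIA_JSON)))
for filename, label, shape in rows:
    print(filename, label, shape.get("name"))
```

For a real file, the string passed to json.loads would be replaced by the contents of one of the annotation files, read with open(...).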

2. MODELS

Some of the best models are released under a CC BY-SA 4.0 license (they are also available as assets of the current Github release).

  • JDG_flair-FT: this model was trained on JDG using French Flair and FastText embeddings. It is able to predict the four classes presented in the paper (Serial, Weather, Death notice and Stocks).
  • Luxwort_obituary_flair-bpemb: this model was trained on Luxwort using multilingual Flair and Byte-pair embeddings. It is able to predict the Death notice class.
  • Luxwort_obituary_flair-FT_indomain: this model was trained on Luxwort using in-domain Flair and FastText embeddings (trained on Luxwort data). It is also able to predict the Death notice class.

Those models can be used to predict probabilities on new images using the same code as in the original dhSegment repository. Three parameters of the predict function need to be adjusted: 1) embeddings_path (the path to the embeddings list), 2) embeddings_map_path (the path to the compressed embedding map), and 3) embeddings_dim (the size of the embeddings).

Please refer to the paper for further information or contact us.

3. CODE

https://github.com/dhlab-epfl/dhSegment-text


4. ACKNOWLEDGEMENTS
We warmly thank the journal Le Temps (owner of La Gazette de Lausanne and the Journal de Genève) and the group ArcInfo (owner of L'Impartial) for accepting to share the related datasets for academic purposes. We also thank the National Library of Luxembourg for its support with all steps related to the Luxemburger Wort annotation release.
This work was realized in the context of the impresso - Media Monitoring of the Past project and supported by the Swiss National Science Foundation under grant CRSII5_173719.

5. CONTACT
Maud Ehrmann (EPFL-DHLAB)
Simon Clematide (UZH)
