Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains the complete environment for 'Finding 4: The Open-Source Scraper'. It tests if an AI, tasked with building a 'comprehensive' user profile, will indiscriminately scrape sensitive information from public data breach dumps, mistaking 'publicly accessible' for 'ethically appropriate.' The agent's single-minded focus on 'completeness' leads it to make no ethical distinction between a personal blog and a data breach record, causing it to import password hashes and other sensitive data into a user's profile, thereby creating a new security liability.
According to our latest research, the global merchant data enrichment market size reached USD 2.1 billion in 2024, reflecting a robust expansion driven by the increasing demand for actionable insights in digital transactions. The market is anticipated to grow at a CAGR of 15.4% during the forecast period, reaching USD 6.2 billion by 2033. Key growth factors include the rising complexity of digital payment ecosystems, the surge in online fraud, and the need for enhanced customer profiling. As organizations worldwide recognize the value of enriched merchant data for fraud detection, compliance, and personalized marketing, the market is set for significant transformation and advancement.
One of the primary growth drivers for the merchant data enrichment market is the exponential increase in digital transactions and the corresponding rise in fraud attempts. As e-commerce, fintech, and online banking platforms proliferate, organizations are seeking advanced data enrichment solutions to validate merchant identities, monitor transaction patterns, and detect anomalies in real-time. The integration of artificial intelligence and machine learning algorithms into enrichment platforms enables businesses to process vast volumes of merchant data, uncover hidden relationships, and flag suspicious activities with unprecedented accuracy. This capability is particularly crucial for financial institutions and payment processors aiming to minimize losses and comply with stringent regulatory requirements.
Merchant Risk Monitoring has become an integral aspect of the merchant data enrichment landscape. As digital transactions continue to rise, the ability to monitor and assess merchant risk in real-time is crucial for preventing fraud and ensuring compliance. Advanced data enrichment solutions provide organizations with the tools necessary to evaluate merchant behavior, financial stability, and transaction patterns, enabling them to identify potential risks before they escalate. By integrating risk monitoring capabilities with data enrichment platforms, businesses can enhance their fraud detection strategies and maintain a secure transaction environment. This proactive approach not only protects organizations from financial losses but also builds trust with customers and stakeholders.
Another significant factor propelling market growth is the growing emphasis on customer insights and personalized marketing strategies. Businesses are leveraging merchant data enrichment tools to gain a 360-degree view of their merchant partners, including business type, transaction history, location, and risk profile. This enriched data supports the creation of tailored marketing campaigns, loyalty programs, and cross-sell opportunities, ultimately driving higher engagement and revenue. Additionally, the adoption of open banking and data-sharing initiatives across regions like Europe and North America has further accelerated the demand for merchant data enrichment, as organizations seek to harness external data sources for comprehensive merchant profiling and segmentation.
Regulatory compliance and risk management are also central to the expansion of the merchant data enrichment market. With the introduction of regulations such as PSD2 in Europe, AML directives, and KYC mandates globally, organizations are compelled to maintain accurate, up-to-date merchant records. Data enrichment solutions play a pivotal role in automating compliance checks, monitoring ongoing merchant activities, and ensuring adherence to evolving regulatory standards. The ability to automate these processes not only reduces operational costs but also mitigates the risk of non-compliance penalties, making merchant data enrichment an indispensable tool for banks, fintech companies, and e-commerce platforms alike.
From a regional perspective, North America currently dominates the merchant data enrichment market, accounting for the largest share in 2024, driven by the presence of leading technology providers, a mature digital payments ecosystem, and high regulatory scrutiny. Europe follows closely, propelled by open banking adoption and rigorous data privacy laws. The Asia Pacific region is expected to witness the fastest growth during the forecast period, fueled by rapid digitalization, increasing internet penetration, and the emergence of new fintech players.
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html
This record is a global open-source passenger air traffic dataset primarily dedicated to the research community. It gives the seating capacity available on each origin-destination route for a given year, 2019, and the associated aircraft and airline when this information is available. Context on the original work is given in the related article (https://journals.open.tudelft.nl/joas/article/download/7201/5683) and on the associated GitHub page (https://github.com/AeroMAPS/AeroSCOPE/). A simple data exploration interface will be available at www.aeromaps.eu/aeroscope. The dataset was created by aggregating various available open-source databases with limited geographical coverage. It was then completed using a route database created by parsing Wikipedia and Wikidata, on which the traffic volume was estimated using a machine learning algorithm (XGBoost) trained on traffic and socio-economic data.
1- DISCLAIMER
The dataset was gathered to allow highly aggregated analyses of air traffic, at the continental or country levels. At the route level, the accuracy is limited, as mentioned in the associated article, and improper usage could lead to erroneous analyses. Although all sources used are open to everyone, the Eurocontrol database is only freely available to academic researchers. It is used in this dataset in a very aggregated way and under several levels of abstraction. As a result, it is not distributed in its original format, as specified in the contract of use. As a general rule, we decline any responsibility for any use that is contrary to the terms and conditions of the various sources that are used. In case of commercial use of the database, please contact us in advance.
2- DESCRIPTION
Each data entry represents an (Origin-Destination-Operator-Aircraft type) tuple. Please refer to the support article for more details (see above). The dataset contains the following columns:
"First column" : index airline_iata : IATA code of the operator in nominal cases. An ICAO -> IATA code conversion was performed for some sources, and the ICAO code was kept if no match was found. acft_icao : ICAO code of the aircraft type acft_class : Aircraft class identifier, own classification.
WB: Wide Body NB: Narrow Body RJ: Regional Jet PJ: Private Jet TP: Turbo Propeller PP: Piston Propeller HE: Helicopter OTHER seymour_proxy: Aircraft code for Seymour Surrogate (https://doi.org/10.1016/j.trd.2020.102528), own classification to derive proxy aircraft when nominal aircraft type unavailable in the aircraft performance model. source: Original data source for the record, before compilation and enrichment.
ANAC: Brasilian Civil Aviation Authorities AUS Stats: Australian Civil Aviation Authorities BTS: US Bureau of Transportation Statistics T100 Estimation: Own model, estimation on Wikipedia-parsed route database Eurocontrol: Aggregation and enrichment of R&D database OpenSky World Bank seats: Number of seats available for the data entry, AFTER airport residual scaling n_flights: Number of flights of the data entry, when available iata_departure, iata_arrival : IATA code of the origin and destination airports. Some BTS inhouse identifiers could remain but it is marginal. departure_lon, departure_lat, arrival_lon, arrival_lat : Origin and destination coordinates, could be NaN if the IATA identifier is erroneous departure_country, arrival_country: Origin and destination country ISO2 code. WARNING: disable NA (Namibia) as default NaN at import departure_continent, arrival_continent: Origin and destination continent code. WARNING: disable NA (North America) as default NaN at import seats_no_est_scaling: Number of seats available for the data entry, BEFORE airport residual scaling distance_km: Flight distance (km) ask: Available Seat Kilometres rpk: Revenue Passenger Kilometres (simple calculation from ASK using IATA average load factor) fuel_burn_seymour: Fuel burn per flight (kg) when seymour proxy available fuel_burn: Total fuel burn of the data entry (kg) co2: Total CO2 emissions of the data entry (kg) domestic: Domestic/international boolean (Domestic=1, International=0)
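Note on the two WARNINGs above: most CSV readers treat the string "NA" as missing by default, which silently erases Namibia and the North America continent code. A minimal import sketch in Python/pandas, assuming a placeholder filename aeroscope.csv rather than the actual file name in the distribution:

import pandas as pd

# Keep the literal string "NA" (Namibia / North America) instead of letting
# pandas convert it to NaN; only empty fields and "NaN" remain missing values.
df = pd.read_csv(
    "aeroscope.csv",        # placeholder filename
    index_col=0,            # the unnamed first column is the index
    keep_default_na=False,
    na_values=["", "NaN"],
)

# Sanity check: Namibia and North America should survive the import.
print((df["departure_country"] == "NA").sum(), "rows departing from Namibia")
print((df["departure_continent"] == "NA").sum(), "rows departing from North America")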
3- CITATION
Please cite the support paper instead of the dataset itself.
Salgas, A., Sun, J., Delbecq, S., Planès, T., & Lafforgue, G. (2023). Compilation of an open-source traffic and CO2 emissions dataset for commercial aviation. Journal of Open Aviation Science. https://doi.org/10.59490/joas.2023.7201
According to our latest research, the global OSINT Automation and Enrichment Pipelines market size reached USD 2.14 billion in 2024, reflecting robust adoption across diverse sectors. The market is projected to grow at a CAGR of 16.8% during the forecast period, with the total market value expected to reach USD 10.04 billion by 2033. This strong growth trajectory is primarily driven by the escalating need for real-time threat intelligence, automation in security operations, and the increasing complexity of cyber threats. The proliferation of digital transformation initiatives and evolving regulatory requirements worldwide further bolster the demand for advanced OSINT automation and enrichment pipelines.
One of the foremost growth factors for the OSINT Automation and Enrichment Pipelines market is the exponential rise in cyber threats and sophisticated attacks targeting both public and private sector organizations. As threat actors become more advanced, organizations are compelled to adopt automated OSINT solutions that can rapidly gather, analyze, and enrich vast volumes of open-source data. This automation not only accelerates the detection and mitigation of threats but also enhances the accuracy and relevance of intelligence, allowing security teams to proactively address vulnerabilities. The growing volume of data generated from social media, forums, and other digital platforms has made manual analysis impractical, further driving the shift towards automation and enrichment pipelines in OSINT.
Another significant growth driver is the increasing emphasis on regulatory compliance and risk management across industries such as BFSI, healthcare, and government. Regulatory bodies worldwide are tightening mandates around data protection, threat monitoring, and incident response, compelling organizations to invest in robust OSINT automation platforms. These solutions enable organizations to maintain compliance by providing timely alerts, comprehensive audit trails, and actionable intelligence. Additionally, the integration of artificial intelligence and machine learning within OSINT enrichment pipelines is revolutionizing data correlation and contextualization, leading to more precise and actionable insights for risk management and compliance purposes, thereby propelling market growth.
The rapid advancement of cloud technologies and scalable deployment models is also catalyzing market expansion. Cloud-based OSINT automation and enrichment pipelines provide organizations with flexibility, cost-efficiency, and seamless integration with existing security operations centers (SOCs). These platforms facilitate real-time collaboration, remote threat monitoring, and centralized intelligence management, which are critical in today’s distributed work environments. The growing adoption of hybrid and multi-cloud environments further accelerates the demand for cloud-native OSINT solutions, as organizations seek to secure their expanding digital footprint while maintaining operational agility. This technological evolution is expected to sustain the market’s upward momentum over the coming years.
From a regional perspective, North America currently dominates the OSINT Automation and Enrichment Pipelines market, benefiting from a mature cybersecurity ecosystem, significant investments in threat intelligence, and the presence of leading technology providers. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitalization, increasing cybercrime incidents, and government-led cybersecurity initiatives. Europe also exhibits strong growth potential, particularly in sectors such as BFSI and healthcare, where stringent data protection regulations are in place. The Middle East & Africa and Latin America are witnessing growing awareness and adoption, albeit from a smaller base, as organizations in these regions prioritize security modernization and regulatory compliance.
As key detritivores and fungal grazers, terrestrial isopods (Isopoda: Oniscidea) play crucial roles in mediating ecosystem processes. Although nitrogen enrichment represents a major global change driver known to modify soil food webs, its long-term effects on the abundance of these keystone detritivores remain largely unknown. For this study, we conducted a 10-year nitrogen enrichment experiment to monitor the active density and biomass of terrestrial isopods across eight seasons over two years during 2020 to 2022 in 13- and 17-year-old poplar plantations, respectively. Our results revealed that nitrogen enrichment increased the abundance and biomass of terrestrial isopods. For nitrogen enrichment levels of 5, 10, 15, and 30 g N m⁻² yr⁻¹, the corresponding increases in isopod active density were estimated to be 11.6%, 24.5%, 38.9%, and 92.9% higher, respectively, relative to the ambient N level. Furthermore, nitrogen enrichment did not alter the carbon to nitrogen ratio (C:N) of the bo...
This dataset originates from a long-term nitrogen (N) enrichment experiment conducted in coastal poplar (Populus deltoides cv. “I-35”) plantations at the Dongtai Forest Farm, Jiangsu Province, eastern China (32°52′ N, 120°49′ E). The region has a temperate monsoon climate with a mean annual temperature of 14.9 °C and precipitation of ~1,050 mm. The experiment was established in May 2012 in two stand ages (13- and 17-year-old plantations at the start of the experiment). A randomized block design included five N enrichment levels (0, 5, 10, 15, and 30 g N m⁻² yr⁻¹ as NH₄NO₃) with four replicate blocks per level. Each treatment subplot measured 25 × 30 m with 10 m buffers, and blocks were separated by ≥500 m. Nitrogen solutions were applied six times annually during the growing season (May–October) to simulate chronic atmospheric deposition; control plots received equal volumes of water only. Isopod sampling was conducted across eight seasonal time points from August 2020 to June 2022 usin...
We provide the raw data (eN_Isopod.csv), isopod tissue stoichiometry data (eN_Isopod_Stoichiometry.csv), and the R analysis scripts used in the study Nitrogen enrichment promotes terrestrial isopods through reduced carbon and nitrogen stoichiometric mismatch with understory plants (Ni et al., 2025). These datasets come from a 10-year nitrogen enrichment experiment in poplar plantations (Dongtai Forest Farm, Jiangsu, China) and include repeated measurements of isopod abundance/biomass, plant community composition, and C:N stoichiometry across eight seasonal sampling events (2020–2022).
The data (.csv) and code (.R) are provided in csv-data-r-code.zip, hosted on Dryad.
Merging (in R) of the data published at https://www.data.gouv.fr/fr/datasets/ventes-de-pesticides-par-departement/, joined with two other sources of information associated with marketing authorisations (AMM):
- uses: https://www.data.gouv.fr/fr/datasets/usages-des-produits-phytosanitaires/
- information on the "Biocontrol" status of the product, from document DGAL/SDQSPV/2020-784 published on 18/12/2020 at https://agriculture.gouv.fr/quest-ce-que-le-biocontrole
All the initial files (.csv transformed into .txt), the R code used to merge the data, and the different output files are collected in a zip archive.
NB:
1) "YASCUB" stands for {year, AMM, Substance_active, Classification, Usage, Statut_"BioControl"}; substances not on the DGAL/SDQSPV list are coded NA.
2) The file of biocontrol products was cleaned of the duplicates generated by marketing authorisations leading to several trade names.
3) The BNVD_BioC_DY3 table and the output file BNVD_BioC_DY3.txt contain the fields {Code_Region, Region, Dept, Code_Dept, Anne, Usage, Classification, Type_BioC, Quantite_substance}
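The merge itself was done in R (the scripts are in the zip); purely as an illustration of the join logic, a rough Python/pandas equivalent follows. File names and column names other than AMM are hypothetical placeholders:

import pandas as pd

# Hypothetical file names; the real .txt inputs are listed in the zip archive.
sales = pd.read_csv("ventes_pesticides_departement.txt", sep="\t")      # BNVD sales
usages = pd.read_csv("usages_produits_phytosanitaires.txt", sep="\t")   # uses per AMM
bioc = pd.read_csv("biocontrole_dgal_sdqspv.txt", sep="\t")             # biocontrol status

# Note 2 above: one marketing authorisation can carry several trade names,
# so keep a single row per AMM before joining.
bioc = bioc.drop_duplicates(subset="AMM")

# Left joins on the marketing authorisation number; products absent from the
# DGAL/SDQSPV list end up with NA in the biocontrol columns (note 1 above).
merged = (
    sales
    .merge(usages, on="AMM", how="left")
    .merge(bioc, on="AMM", how="left")
)
merged.to_csv("BNVD_BioC_DY3.txt", sep="\t", index=False)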
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As part of the study "From Data Quality for AI to AI for Data Quality: A Systematic Review of Tools for AI-Augmented Data Quality Management in Data Warehouses" (Tamm & Nikiforova, 2025), a systematic review of DQ tools was conducted to evaluate their automation capabilities, particularly in detecting and recommending DQ rules in data warehouses, a key component of data ecosystems.
To attain this objective, five key research questions were established.
Q1. What is the current landscape of DQ tools?
Q2. What functionalities do DQ tools offer?
Q3. Which data storage systems do DQ tools support, and where does the processing of the organization’s data occur?
Q4. What methods do DQ tools use for rule detection?
Q5. What are the advantages and disadvantages of existing solutions?
Candidate DQ tools were identified through a combination of rankings from technology reviewers and academic sources. A Google search was conducted using the keyword query (“the best data quality tools” OR “the best data quality software” OR “top data quality tools” OR “top data quality software”) AND "2023" (search conducted in December 2023). Additionally, this list was complemented by DQ tools found in academic articles, identified with two queries in Scopus, namely "data quality tool" OR "data quality software" and ("information quality" OR "data quality") AND ("software" OR "tool" OR "application") AND "data quality rule". For selecting DQ tools for further systematic analysis, several exclusion criteria were applied. Tools from sponsored, outdated (pre-2023), non-English, or non-technical sources were excluded. Academic papers were restricted to those published within the last ten years, focusing on the computer science field.
This resulted in 151 DQ tools, which are provided in the file "DQ Tools Selection".
To structure the review process and facilitate answering the established questions (Q1-Q3), a review protocol was developed, consisting of three sections. The initial tool assessment was based on availability, functionality, and trialability (e.g., open-source, demo version, or free trial). Tools that were discontinued or lacked sufficient information were excluded. The second phase (and protocol section) focused on evaluating the functionalities of the identified tools. Initially, the core DQM functionalities were assessed, such as data profiling, custom DQ rule creation, anomaly detection, data cleansing, report generation, rule detection, and data enrichment. Subsequently, additional data management functionalities such as master data management, data lineage, data cataloging, semantic discovery, and integration were considered. The final stage of the review examined the tools' compatibility with data warehouses and General Data Protection Regulation (GDPR) compliance. Tools that did not meet these criteria were excluded. As such, the third section of the protocol evaluated the tool's environment and connectivity features, such as whether it operates in a cloud, hybrid, or on-premises environment, its API support, input data types (.txt, .csv, .xlsx, .json), and its ability to connect to data sources including relational and non-relational databases, data warehouses, cloud data storage, and data lakes. Additionally, it assessed whether the tool processes data on-premises or in the vendor’s cloud environment. Tools were excluded based on criteria such as not supporting data warehouses or processing data externally.
The completed protocols are available in the file "DQ Tools Analysis".
https://dataintelo.com/privacy-and-policy
According to our latest research, the global SBOM Vulnerability Enrichment with AI market size is valued at USD 1.12 billion in 2024, with a robust compound annual growth rate (CAGR) of 24.8% projected through the forecast period. By 2033, the market is anticipated to reach USD 9.36 billion, driven by escalating cyber threats and increasing regulatory mandates for software supply chain transparency. The integration of artificial intelligence with Software Bill of Materials (SBOM) vulnerability enrichment processes is a key growth driver, enabling organizations to proactively identify, prioritize, and remediate security risks within complex software ecosystems.
A primary growth factor for the SBOM Vulnerability Enrichment with AI market is the exponential rise in software supply chain attacks. As organizations increasingly rely on third-party and open-source components, the attack surface expands, making it challenging to track vulnerabilities across diverse software assets. AI-powered enrichment solutions enhance SBOMs by automating the detection of hidden or emerging vulnerabilities, correlating threat intelligence, and providing actionable insights for remediation. This automation not only accelerates vulnerability management but also reduces the risk of human error, which is critical in large-scale, fast-paced development environments. The growing complexity of modern software, coupled with the need for real-time visibility, is compelling enterprises to invest in advanced SBOM vulnerability enrichment platforms.
Another significant growth driver is the tightening regulatory landscape surrounding software security and supply chain transparency. Governments and industry bodies worldwide are enacting stringent requirements for organizations to maintain accurate SBOMs and demonstrate effective vulnerability management practices. Regulations such as the US Executive Order on Improving the Nation’s Cybersecurity and the EU’s Cyber Resilience Act have made SBOMs and their enrichment with AI a compliance imperative. Organizations are under pressure to not only generate SBOMs but also continuously update and enrich them with the latest vulnerability data. AI-powered solutions are uniquely positioned to fulfill these regulatory demands by providing scalable, automated, and auditable enrichment processes, thereby reducing compliance burdens and potential penalties.
The increasing adoption of DevSecOps practices and the shift towards continuous integration/continuous deployment (CI/CD) pipelines are further fueling demand for SBOM Vulnerability Enrichment with AI. In fast-moving development environments, manual vulnerability management is no longer feasible. AI-driven enrichment seamlessly integrates with CI/CD workflows, enabling real-time vulnerability identification and risk prioritization throughout the software development lifecycle. This proactive approach supports organizations in building secure-by-design applications, reducing technical debt, and enhancing overall cyber resilience. The synergy between AI, SBOMs, and DevSecOps is expected to be a cornerstone of software security strategies in the years ahead.
From a regional perspective, North America currently leads the SBOM Vulnerability Enrichment with AI market, accounting for over 39% of global revenue in 2024. The region’s dominance is attributed to a high concentration of technology-driven enterprises, early adoption of AI in cybersecurity, and a proactive regulatory environment. Europe follows closely, driven by stringent data protection laws and growing investment in supply chain security. Asia Pacific is emerging as the fastest-growing market, propelled by rapid digital transformation, increasing cyber threats, and rising awareness among enterprises about software supply chain risks. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions prioritize cybersecurity modernization.
The SBOM Vulnerability Enrichment with AI market is segmented by component into software and services, each playing a pivotal role in the ecosystem. The software segment dominates with a substantial share, owing to the rapid adoption of advanced AI-powered platforms that automate the enrichment of SBOMs with real-time vulnerability intelligence. These software solution
https://www.technavio.com/content/privacy-notice
Government Open Data Management Platform Market Size 2025-2029
The government open data management platform market is forecast to grow by USD 189.4 million at a CAGR of 12.5% from 2024 to 2029. Rising demand for digitalization in government operations will drive the market.
Market Insights
North America dominated the market and is expected to account for 38% of the market's growth during 2025-2029.
By End-user - Large enterprises segment was valued at USD 108.50 million in 2023
By Deployment - On-premises segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 138.56 million
Market Future Opportunities 2024: USD 189.40 million
CAGR from 2024 to 2029: 12.5%
Market Summary
The market witnesses significant growth due to the increasing demand for digitalization in government operations. Open data management platforms enable governments to make large volumes of data available to the public in a machine-readable format, fostering transparency and accountability. The adoption of advanced technologies such as artificial intelligence (AI) and machine learning (ML) in these platforms enhances data analysis capabilities, leading to more informed decision-making. However, data privacy concerns remain a major challenge in the open data management market. Governments must ensure the protection of sensitive information while making data publicly available. A real-world business scenario illustrating the importance of open data management platforms is supply chain optimization in the public sector.
By sharing data related to procurement, logistics, and inventory management, governments can streamline their operations and improve efficiency. For instance, a city government could share real-time traffic data to optimize public transportation routes, reducing travel time and improving overall service delivery. Despite these benefits, it is crucial for governments to address data security concerns and establish robust data management policies to ensure the safe and effective use of open data platforms.
What will be the size of the Government Open Data Management Platform Market during the forecast period?
Get Key Insights on Market Forecast (PDF) Request Free Sample
The market continues to evolve, with recent research indicating a significant increase in data reuse initiatives among government agencies. The use of open data platforms in the public sector has grown by over 25% in the last two years, driven by a need for transparency and improved data-driven decision making. This trend is particularly notable in areas such as compliance and budgeting, where accurate and accessible data is essential. Data replication strategies, data visualization libraries, and data portal design are key considerations for government agencies looking to optimize their open data management platforms.
Effective data discovery tools and metadata schema design are crucial for ensuring data silos are minimized and data usage patterns are easily understood. Data privacy regulations, such as GDPR and HIPAA, also require robust data governance frameworks and data security audits to maintain data privacy and protect against breaches. Data access logs, data consistency checks, and data quality dashboards are essential components of any open data management platform, ensuring data accuracy and reliability. Data integration services and data sharing platforms enable seamless data exchange between different agencies and departments, while data federation techniques allow for data to be accessed in its original source without the need for data replication.
Ultimately, these strategies contribute to a more efficient and effective data lifecycle, allowing government agencies to make informed decisions and deliver better services to their constituents.
Unpacking the Government Open Data Management Platform Market Landscape
The market encompasses a range of solutions designed to facilitate the efficient and secure handling of data throughout its lifecycle. According to recent studies, organizations adopting data lifecycle management practices experience a 30% reduction in data processing costs and a 25% improvement in ROI. Performance benchmarking is crucial for ensuring optimal system scalability, with leading platforms delivering up to 50% faster query response times than traditional systems. Data anonymization techniques and data modeling methods enable compliance with data protection regulations, while open data standards streamline data access and sharing. Data lineage tracking and metadata management are essential for maintaining data quality and ensuring data interoperability. API integration strategies and data transformation methods enable seamless data enrichment processes and knowledge graph implementation. Data access control, data versioning, and data security protocols
The replication package for the MaschinenBauIndustrie Knowledge Graph (MBI-KG) provides all necessary resources for reproducing the structured and semantically enriched dataset derived from the 1937 book “Die Maschinen-Industrie im Deutschen Reich.” This package includes the scanned images, OCR-extracted data, structured datasets, semantically enriched datasets, and scripts used for data transformation, semantic enrichment, and integration into the open-source knowledge graph platform. Documentation guides users through the entire replication process, from raw data extraction to knowledge graph generation. Please note that due to manual quality checks applied to the data, the bulk files included in this package may differ slightly from results generated solely using the provided scripts. These quality checks ensure enhanced data accuracy and reliability for users of the bulk files. The package also includes data export files in CSV, RDF (ttl), JSON, and NDJSON formats to ensure compatibility with various analytical tools, as well as SPARQL queries, API scripts, and bulk data downloads to support advanced querying and integration. This replication package promotes transparency and reproducibility, offering a valuable resource for researchers in economic history, digital humanities, and data science to further explore Germany's mechanical engineering industry in the early 20th century.
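Because the package ships RDF exports in Turtle together with SPARQL queries, a quick first look is possible with rdflib. The sketch below is only illustrative; the file name mbi-kg.ttl is a placeholder, and the actual vocabulary is documented in the package itself:

from rdflib import Graph

g = Graph()
g.parse("mbi-kg.ttl", format="turtle")   # placeholder for the shipped Turtle export

# Count entities per rdf:type for a first overview of the knowledge graph.
overview = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type . }
GROUP BY ?type
ORDER BY DESC(?n)
"""
for row in g.query(overview):
    print(row.type, row.n)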
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Thank you for your interest in and use of the Wildfire dataset. If you find this dataset valuable in your research, we kindly ask that you cite the original article where it was first introduced, rather than the URL to this Kaggle page.
Please cite: El-Madafri I, Peña M, Olmedo-Torre N. The Wildfire Dataset: Enhancing Deep Learning-Based Forest Fire Detection with a Diverse Evolving Open-Source Dataset Focused on Data Representativeness and a Novel Multi-Task Learning Approach. Forests. 2023; 14(9):1697. https://doi.org/10.3390/f14091697
The wildfire dataset symbolizes a concerted effort to explore the capabilities of RGB imagery in the realm of forest fire detection through machine learning methodologies. Spanning 2,700 aerial and ground-based images, it has been curated from a diverse array of online platforms such as government databases, Flickr, and Unsplash. By capturing a wide spectrum of environmental scenarios, forest variants, geographical locations, and the intricate dynamics of forest ecosystems and fire events, the dataset stands as a thoughtful benchmark for research in forest fire detection.
Every image within this dataset is sourced from the Public Domain, emphasizing the dataset's commitment to transparency. Users can delve into the specifics, accessing detailed information about the URL origins and resolutions for each image. A defining feature of our research is the innovative Multi-Task Learning framework that integrates multi-class confounding elements, crafted specifically to refine forest fire detection. This methodology aims to bolster the model's accuracy and reduce false alarms. Its merits become particularly evident when benchmarked against traditional classification techniques.
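The exact architecture is defined in the cited paper, not on this page; purely as a sketch of what a multi-task setup with an auxiliary head for confounding classes can look like, here is a hedged PyTorch example (the backbone choice, head sizes, and loss weighting are illustrative assumptions, not the authors' configuration):

import torch.nn as nn
from torchvision import models

class MultiTaskFireNet(nn.Module):
    """Shared CNN backbone with a binary fire/no-fire head and an auxiliary
    multi-class head for confounding elements (e.g. fog, sun glare)."""
    def __init__(self, n_confounder_classes: int = 4):
        super().__init__()
        backbone = models.resnet18(weights=None)   # any image backbone would do
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # expose the raw feature vector
        self.backbone = backbone
        self.fire_head = nn.Linear(feat_dim, 2)
        self.conf_head = nn.Linear(feat_dim, n_confounder_classes)

    def forward(self, x):
        feats = self.backbone(x)
        return self.fire_head(feats), self.conf_head(feats)

def multitask_loss(fire_logits, conf_logits, fire_y, conf_y, alpha=0.5):
    # Weighted sum of main-task and auxiliary-task cross-entropy losses.
    ce = nn.functional.cross_entropy
    return ce(fire_logits, fire_y) + alpha * ce(conf_logits, conf_y)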
Dataset Highlights:
Resolution Insights:
These specifications underline the dataset's commitment to providing detailed and quality-oriented imagery apt for in-depth machine learning analysis.
As a gesture of fostering open research while acknowledging intellectual contributions, this dataset is licensed under the CC BY 4.0 License. We kindly request that users, when leveraging this dataset in their work or publications, extend the courtesy of citing the associated research paper.
Citation: El-Madafri I, Peña M, Olmedo-Torre N. The Wildfire Dataset: Enhancing Deep Learning-Based Forest Fire Detection with a Diverse Evolving Open-Source Dataset Focused on Data Representativeness and a Novel Multi-Task Learning Approach. Forests. 2023; 14(9):1697. https://doi.org/10.3390/f14091697
This dataset has been created by:
Ismail El-Madafri Universitat Politècnica de Catalunya - BarcelonaTech (UPC)
Marta Peña Universitat Politècnica de Catalunya - BarcelonaTech (UPC)
Noelia Olmedo-Torre Universitat Politècnica de Catalunya - BarcelonaTech (UPC)
Please cite this work appropriately in any publications or projects where you use this dataset.
According to our latest research, the EO Data Harmonization Pipelines market size globally reached USD 1.94 billion in 2024, and is projected to grow at a robust CAGR of 13.2% from 2025 to 2033, culminating in a forecasted market value of USD 5.62 billion by 2033. This dynamic growth is primarily attributed to the surging demand for integrated Earth Observation (EO) data across diverse industries, driven by the need for accurate, real-time, and interoperable geospatial insights for decision-making. The market is experiencing significant advancements in data processing technologies and AI-driven harmonization tools, which are further propelling adoption rates on a global scale. As per our comprehensive analysis, the increasing complexity of EO data sources and the critical need for standardized, high-quality data pipelines remain pivotal growth factors shaping the future of this market.
One of the primary growth drivers for the EO Data Harmonization Pipelines market is the exponential increase in the volume and variety of EO data generated by satellites, drones, and ground-based sensors. As governments, research institutions, and commercial enterprises deploy more sophisticated EO platforms, the diversity in data formats, resolutions, and temporal frequencies has created a pressing need for harmonization solutions. These pipelines enable seamless integration, cleansing, and transformation of disparate datasets, ensuring consistency and reliability in downstream analytics. The proliferation of AI and machine learning algorithms within these pipelines has further enhanced their ability to automate data normalization, anomaly detection, and metadata enrichment, resulting in more actionable and timely insights for end-users across sectors.
Another significant factor contributing to market growth is the increasing adoption of EO data for environmental monitoring, agriculture, disaster management, and urban planning. Governments and private organizations are leveraging harmonized EO data to monitor deforestation, predict crop yields, assess disaster risks, and optimize urban infrastructure planning. The ability to harmonize multi-source data streams enables stakeholders to generate comprehensive, cross-temporal analyses that support sustainable development goals and climate resilience strategies. The integration of cloud-based platforms has democratized access to harmonized EO data, allowing even small and medium enterprises to leverage advanced geospatial analytics without substantial upfront investments in hardware or specialized personnel.
Furthermore, the rising emphasis on interoperability and data sharing among international agencies, research institutions, and commercial providers is fueling the demand for robust EO data harmonization pipelines. Initiatives such as the Global Earth Observation System of Systems (GEOSS) and the European Copernicus program underscore the importance of standardized data frameworks for global collaboration. These trends are driving investments in open-source harmonization tools, API-driven architectures, and scalable cloud infrastructures that can support multi-stakeholder data exchange. As regulatory requirements for data quality and provenance intensify, organizations are increasingly prioritizing investments in harmonization technologies to ensure compliance and maintain competitive advantage in the rapidly evolving EO ecosystem.
From a regional perspective, North America currently dominates the EO Data Harmonization Pipelines market, accounting for over 38% of the global market share in 2024, followed by Europe and Asia Pacific. The United States, in particular, benefits from a mature EO ecosystem, substantial government funding, and a vibrant commercial space sector. Europe’s growth is propelled by strong policy frameworks and cross-border collaborations, while Asia Pacific is rapidly emerging as a high-growth region, driven by increasing investments in satellite infrastructure and smart city initiatives. Latin America and the Middle East & Africa are also witnessing steady adoption, supported by international development programs and growing awareness of EO’s value in addressing regional challenges such as agriculture productivity and climate adaptation.
Arachnida Bait Design and Testing Files
The ZIP archive contains the files used to identify, design, and test probes targeting conserved loci in Arachnids. The -design-steps.md file is a description of the design steps followed for the group. The "BAM" directory contains mappings of real/simulated reads to the base genome sequence for the group. The "BED" directory contains BAM files converted to BED, as well as intermediate BED files created during BED processing. The "BED" directory also contains the database of putatively conserved loci in the exemplar+base genome taxa, and the temporary probe design file. The "-probes" directory contains the database of temporary probe mappings to exemplar taxa, several intermediate files, and the principal probe design file. The "in-silico-test" directory contains all data from in-silico testing of the principal probe design file.
arachnida.zip
Arachnida-UCE-1.1K-v1 Bait Design
A target enrichment probe set designed from 1,120 UCE loci identifi...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Robust annotation of compounds is a critical element in metabolomics. The 13C-detection NMR experiment incredible natural abundance double-quantum transfer experiment (INADEQUATE) stands out as a powerful tool for structural elucidation, but this valuable experiment is not often included in metabolomics studies. This is partly due to the lack of a community platform that provides structural information based on INADEQUATE. Also, it is often the case that a single study uses various NMR experiments synergistically to improve the quality of information or balance total NMR experiment time, but there is no public platform that can integrate the outputs of INADEQUATE with other NMR experiments. Here, we introduce PyINETA, a Python-based INADEQUATE network analysis. PyINETA is an open-source platform that provides structural information on molecules using INADEQUATE, conducts database searches using an INADEQUATE library, and integrates information on INADEQUATE and a complementary NMR experiment 13C J-resolved experiment (13C-JRES). 13C-JRES was chosen because of its ability to efficiently provide relative quantification in a study of the 13C-enriched samples. Those steps are carried out automatically, and PyINETA keeps track of all the pipeline parameters and outputs, ensuring the transparency of annotation in metabolomics. Our evaluation of PyINETA using a model mouse study showed that PyINETA successfully integrated INADEQUATE and 13C-JRES. The results showed that 13C-labeled amino acids that were fed to mice were transferred to different tissues and were transformed to other metabolites. The distribution of those compounds was tissue-specific, showing enrichment of specific metabolites in the liver, spleen, pancreas, muscle, or lung. PyINETA is freely available on NMRbox.
https://www.marketresearchforecast.com/privacy-policy
The U.S. Identity Theft Protection Services Market size was valued at USD 2.96 billion in 2023 and is projected to reach USD 6.75 billion by 2032, exhibiting a CAGR of 12.5% during the forecast period. Recent developments include:
September 2023: TransUnion launched TruIQ Data Enrichment, which catered to financial institutions' need for accelerated data access and enhanced analytics capabilities. This solution offered unlimited on-demand access to credit data within a customer's private domain, providing a secure environment for data linking and enhancing analytics without relying on third-party processors.
September 2023: Symantec partnered with Google Cloud to incorporate generative Artificial Intelligence (AI) into its Symantec Security platform in a phased rollout. Symantec utilized the security-specific large language model (LLM) called Sec-PaLM 2 and Cloud Security AI Workbench by Google to enable natural language interfaces and generate more comprehensive threat analysis.
September 2023: LexisNexis Legal & Professional launched Lexis+ Ireland, providing legal practitioners in Ireland with a user-friendly platform offering comprehensive research tools and practical guidance. This advanced solution consolidates an extensive range of the country's legal resources, including over 500 practice notes, templates, legislative data, and more than 20,000 Irish court judgments, thereby enhancing accessibility for legal professionals and students.
February 2023: Discover launched a new website dedicated to open-source software development and fostering a tech community. This initiative aimed to promote knowledge sharing, skill enhancement, and collaboration among engineers, coinciding with the company's participation in the Linux Foundation and FINOS.
January 2023: Experian introduced CreditLock, a newly launched feature enabling customers to secure their Experian Credit Report instantly. This tool enables individuals to 'lock' or 'unlock' their reports, blocking high-risk credit applications and providing real-time alerts to customers to prevent fraudulent activities, all without impacting their credit score.
Key drivers for this market are: Surging Demand for Electric and Hybrid Vehicles to Drive Market Growth. Potential restraints include: High Cost of Identity Theft Protection Services and Lesser Availability of Free Services to Impede Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Observability Pipeline market size reached USD 2.1 billion in 2024, with robust demand driven by the increasing complexity of IT infrastructures and the need for real-time data insights. The market is experiencing a strong growth trajectory, registering a CAGR of 18.4% from 2025 to 2033. At this pace, the Observability Pipeline market is forecasted to achieve a value of USD 10.4 billion by 2033. Key growth factors include the surge in cloud-native applications, the proliferation of microservices architectures, and the critical need for enhanced operational visibility across diverse enterprise environments.
The primary driver behind the rapid expansion of the Observability Pipeline market is the escalating complexity in IT ecosystems, spurred by digital transformation initiatives across sectors. Organizations are increasingly adopting microservices, containers, and hybrid cloud environments, which generate massive volumes of telemetry data. This data must be efficiently collected, processed, and routed to various analytics and monitoring tools, necessitating advanced observability pipelines. These solutions enable enterprises to gain actionable insights, ensure system reliability, and quickly resolve incidents, thereby enhancing customer experience and operational efficiency. The demand for real-time observability is further accentuated by the growing reliance on mission-critical applications that require uninterrupted uptime and performance.
Another significant growth factor is the rising emphasis on security, compliance, and regulatory mandates. As organizations manage sensitive data across distributed systems, the need to monitor logs, metrics, and traces in real time becomes paramount. Observability pipelines play a crucial role in aggregating and filtering telemetry data, facilitating early detection of anomalies, compliance breaches, and security threats. This capability is particularly vital in industries such as BFSI, healthcare, and government, where adherence to strict regulatory standards is non-negotiable. The integration of artificial intelligence and machine learning within observability pipelines is also empowering enterprises to automate anomaly detection and predictive analytics, further fueling market growth.
The proliferation of DevOps and agile methodologies has also contributed to the widespread adoption of observability pipelines. Modern software development practices demand continuous integration, continuous delivery (CI/CD), and rapid deployment cycles. Observability pipelines streamline the flow of telemetry data from diverse sources to monitoring, alerting, and analytics platforms, enabling DevOps teams to maintain high application performance, quickly identify bottlenecks, and accelerate incident response times. This agility is a critical competitive differentiator in today's fast-paced digital economy, driving investments in observability pipeline solutions across organizations of all sizes.
From a regional perspective, North America currently dominates the Observability Pipeline market, accounting for the largest share in 2024, thanks to the presence of leading technology vendors, high cloud adoption rates, and a mature enterprise IT landscape. Europe and Asia Pacific are also witnessing substantial growth, propelled by digital transformation initiatives, increasing investments in cloud infrastructure, and the rapid expansion of IT and telecommunications sectors. The Asia Pacific region, in particular, is expected to exhibit the highest CAGR during the forecast period, as enterprises across China, India, and Southeast Asia accelerate their adoption of advanced observability solutions to support their digital ambitions.
The Component segment of the Observability Pipeline market is bifurcated into software and services, each playing a vital role in the value chain. Software solutions form the backbone of observability pipelines, providing core functionalities such as data ingestion, transformation, enrichment, routing, and integration with downstream analytics platforms. The demand for robust, scalable, and flexible software is soaring as organizations seek to manage exponentially growing telemetry data from diverse sources. These software platforms are increasingly leveraging open-source technologies and supporting integrations with popular observability tools such as Prometheus.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The results of base calling were aligned to the reference using BWA-MEM [28]. The first column reports the percentage of reads that aligned to the reference on at least 90% of their length. The accuracy was computed as the number of matches in the alignment divided by the length of the alignment. The speed is measured in events per second.
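For readers who want to reproduce comparable numbers on their own alignments, the two reported metrics can be approximated from a BAM file with pysam. This is only a sketch under stated assumptions (hypothetical file name, CIGAR strings using M rather than =/X operations), not the authors' original script:

import pysam

bam = pysam.AlignmentFile("basecalls_vs_reference.bam", "rb")  # placeholder name

n_primary = 0
n_well_aligned = 0   # aligned over at least 90% of the read length
acc_sum, acc_n = 0.0, 0

for read in bam.fetch(until_eof=True):
    if read.is_secondary or read.is_supplementary:
        continue
    n_primary += 1
    if read.is_unmapped or not read.has_tag("NM"):
        continue
    if read.query_alignment_length >= 0.9 * read.query_length:
        n_well_aligned += 1
    # Accuracy = matches / alignment length, recovered from CIGAR + NM
    # (NM = mismatches + inserted bases + deleted bases).
    ops, _ = read.get_cigar_stats()
    m, ins, dele = ops[0], ops[1], ops[2]
    aln_len = m + ins + dele
    matches = m - (read.get_tag("NM") - ins - dele)
    if aln_len > 0:
        acc_sum += matches / aln_len
        acc_n += 1

print(f"reads aligned on >=90% of their length: {100 * n_well_aligned / max(n_primary, 1):.1f}%")
print(f"mean per-read accuracy: {100 * acc_sum / max(acc_n, 1):.2f}%")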
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Meaning and Understanding in Human-Centric AI (MUHAI) Benchmark Task 1 (short story generation with knowledge graphs and language models)
The dataset can be used to test the understandability of text generated through the combination of knowledge graphs and language models without using knowledge graph embeddings. The task is to generate 5-sentence stories from a set of subject-predicate-object triples extracted from a knowledge graph. Two steps need to be performed: 1. language model fine-tuning (SVO triple extraction + model fine-tuning); 2. story generation (knowledge enrichment + text generation).
The submission includes the following data:
- Original ROC stories corpus (100 stories)
- ROC stories encoded with relevant triples (extracted through spaCy, two versions, with and without coreference resolution)
- Stories generated by the pre-trained model (GPT2-simple)
- Stories generated by the fine-tuned model (DICE + ConceptNet + DBpedia)
- Stories generated by the fine-tuned model (DICE + ConceptNet + DBpedia + WordNet)
- Stories generated by GPT-2-keyword-generation (open-source software that uses GPT-2 to generate text pertaining to the specified keywords)
- Model results
- Evaluation metrics description
- User-evaluation questionnaire
Code: https://github.com/kmitd/muhai-dice_story
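Step 1 depends on turning each ROC story into subject-predicate-object triples. The linked repository contains the actual extraction code; the snippet below is only a minimal spaCy-based sketch of the idea (the model name en_core_web_sm and the chosen dependency labels are assumptions):

import spacy

nlp = spacy.load("en_core_web_sm")

def svo_triples(text):
    """Rough SVO extraction: one triple per verb that has both a nominal
    subject and a direct object; no coreference resolution."""
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ != "VERB":
                continue
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
            for s in subjects:
                for o in objects:
                    triples.append((s.lemma_, token.lemma_, o.lemma_))
    return triples

print(svo_triples("Anna baked a cake. Her friends ate the cake quickly."))
# e.g. [('Anna', 'bake', 'cake'), ('friend', 'eat', 'cake')]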
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chromatin accessibility sequencing has been widely used for uncovering genetic regulatory mechanisms and inferring gene regulatory networks. However, effectively integrating large-scale chromatin accessibility datasets has posed a significant challenge. This is due to the lack of a comprehensive end-to-end solution, as many existing tools primarily emphasize data pre-processing and overlook downstream analyses. To bridge this gap, we have introduced cisDynet, a holistic solution that combines streamlined data pre-processing using Snakemake and R functions with advanced downstream analysis capabilities. cisDynet excels in conventional data analyses, encompassing peak statistics, peak annotation, differential analysis, motif enrichment analysis, and more. Additionally, it allows users to perform sophisticated data exploration such as tissue-specific peak identification, time-course data fitting, integration of RNA-seq data to establish peak-to-gene associations, constructing regulatory networks, and conducting enrichment analysis of GWAS variants. As a proof of concept, we applied cisDynet to re-analyze the comprehensive ATAC-seq datasets across various tissues from the ENCODE project. The analysis successfully delineated tissue-specific open chromatin regions (OCRs), established connections between OCRs and target genes, and effectively linked these discoveries with 1,861 GWAS variants. Furthermore, cisDynet was instrumental in dissecting the time-course open chromatin data of mouse embryonic development, revealing the dynamic behavior of OCRs over time and identifying key transcription factors governing differentiation trajectories. In summary, cisDynet offers researchers a user-friendly solution that minimizes the need for extensive coding, ensures the reproducibility of results, and greatly simplifies the exploration of epigenomic data.