100+ datasets found

Data Fusion Solutions Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Oct 5, 2024
Dataintelo (2024). Data Fusion Solutions Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-fusion-solutions-market
Available download formats
csv, pdf, pptx
Dataset updated
Oct 5, 2024
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Fusion Solutions Market Outlook

The global data fusion solutions market size is anticipated to grow significantly from USD 10.2 billion in 2023 to USD 25.7 billion by 2032, with a compound annual growth rate (CAGR) of 11.2% during the forecast period. This robust growth is primarily driven by the increasing demand for real-time data analysis, the integration of advanced technologies such as AI and machine learning, and the rising need for comprehensive data management solutions across various industries.

One of the primary growth factors for the data fusion solutions market is the exponential increase in data generation and the subsequent need for effective data management and analysis tools. As businesses and government entities increasingly rely on data-driven decision-making, the ability to amalgamate diverse data sources into a coherent and actionable format becomes crucial. Technologies like IoT, AI, and machine learning are further augmenting this demand by enabling more sophisticated data fusion capabilities, thereby providing deeper insights and fostering innovation across sectors.

Another significant driver is the growing complexity and diversity of data types that organizations need to manage. Traditional data management systems are often inadequate for handling the vast volumes and varieties of data generated today. Data fusion solutions, which integrate data from multiple sources to produce more accurate and comprehensive information, are becoming essential. This is particularly true in industries such as healthcare, defense, and transportation, where timely and accurate data integration can lead to better outcomes and operational efficiencies.

The third major growth factor is the critical role of data fusion in enhancing security and surveillance systems. In the defense and surveillance sector, for example, data fusion technologies are employed to combine inputs from various sensors, cameras, and other sources to provide a complete situational awareness picture. This capability is not only vital for national security but also for public safety, traffic management, and disaster response. The growing investments in smart cities and intelligent transportation systems are further propelling the demand for advanced data fusion solutions.

Regionally, North America is expected to dominate the data fusion solutions market throughout the forecast period. This can be attributed to the high adoption rate of advanced technologies, significant investments in R&D, and the presence of major market players in the region. Europe and Asia Pacific are also anticipated to witness substantial growth, driven by technological advancements, increasing government initiatives, and the rapid expansion of industries such as healthcare, transportation, and defense in these regions.

Component Analysis

The data fusion solutions market is segmented by components into software, hardware, and services. The software segment is expected to hold the largest market share, driven by the increasing demand for advanced data analytics and management tools. These software solutions are versatile and can be tailored to meet the specific needs of various industries, thereby enhancing their appeal. Moreover, the integration of AI and machine learning technologies into data fusion software is providing more sophisticated and accurate data analysis capabilities, which is further fuelling market growth.

Hardware components, although not as dominant as software, still play a crucial role in the data fusion ecosystem. The hardware segment includes sensors, data storage devices, and processing units that are essential for collecting, storing, and analyzing vast amounts of data. Advances in sensor technology and the increasing deployment of IoT devices are driving the demand for more robust and high-performance hardware solutions. Additionally, the development of edge computing technologies is enhancing the capability of hardware to process data closer to the source, thereby reducing latency and improving real-time decision-making.

The services segment encompasses various support services such as consulting, implementation, and maintenance, which are vital for the successful deployment and operation of data fusion solutions. As businesses increasingly invest in data fusion technologies, the demand for specialized services to ensure seamless integration and optimal performance
Data from: Combining data sets with different phylogenetic histories
zenodo.org
data.niaid.nih.gov
+2more
html
Updated May 29, 2022
John J. Wiens (2022). Data from: Combining data sets with different phylogenetic histories [Dataset]. http://doi.org/10.5061/dryad.123
Available download formats
html
Unique identifier
https://doi.org/10.5061/dryad.123
Dataset updated
May 29, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
John J. Wiens
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The possibility that two data sets may have different underlying phylogenetic histories (such as gene trees that deviate from species trees) has become an important argument against combining data in phylogenetic analysis. However, two data sets sampled for a large number of taxa may differ in only part of their histories. This is a realistic scenario and one in which the relative advantages of combined, separate, and consensus analysis become much less clear. I suggest a simple methodology for dealing with this situation that involves (1) partitioning the available data to maximize detection of different histories, (2) performing separate analyses of the data sets, and (3) combining the data but considering questionable or unresolved those parts of the combined tree that are strongly contested in the separate analyses (and which therefore may have different histories), until a majority of unlinked data sets supports one resolution over another. In support of this methodology, computer simulations suggest that (1) the accuracy of combined analysis at recovering the true species phylogeny may exceed that of either of two separately analyzed data sets under some conditions, particularly when the mismatch between phylogenetic histories is small and the estimates of the underlying histories are imperfect (few characters and/or high homoplasy), and (2) combined analysis provides a poor estimate of the species tree in areas of the phylogenies with different histories but an improved estimate in regions that share the same history. Thus, when there is a localized mismatch between the histories of two data sets, separate, consensus, and combined analysis may all give unsatisfactory results in certain parts of the phylogeny. Similarly, approaches that allow data combination only after a global test of heterogeneity will suffer from the potential failings of either separate or combined analysis, depending on the outcome of the test. 
Excision of conflicting taxa is also problematic in that it may obfuscate the position of conflicting taxa within a larger tree, even when their placement is congruent between data sets. Application of the proposed methodology to molecular and morphological data sets for Sceloporus lizards is discussed.
Addresses (Open Data)
catalog.data.gov
data.tempe.gov
+10more
Updated Jul 19, 2025
City of Tempe (2025). Addresses (Open Data) [Dataset]. https://catalog.data.gov/dataset/addresses-open-data
Dataset updated
Jul 19, 2025
Dataset provided by
City of Tempe
Description
This dataset is a compilation of address point data for the City of Tempe. The dataset contains a point location and the official address (as defined by The Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. There are several additional attributes that may be populated for an address, but they may not be populated for every address.

Contact: Lynn Flaaen-Hanna, Development Services Specialist
Contact E-mail
Link: Map that Lets You Explore and Export Address Data
Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information being created and maintained by The Building Safety Division of Community Development.
Data Source Type: ESRI ArcGIS Enterprise Geodatabase
Preparation Method: N/A
Publish Frequency: Weekly
Publish Method: Automatic
Data Dictionary
School Learning Modalities, 2021-2022
s.cnmilf.com
datahub.hhs.gov
+4more
Updated Mar 26, 2025
Centers for Disease Control and Prevention (2025). School Learning Modalities, 2021-2022 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/school-learning-modalities
Dataset updated
Mar 26, 2025
Dataset provided by
Centers for Disease Control and Prevention
Description
The 2021-2022 School Learning Modalities dataset provides weekly estimates of school learning modality (including in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2021-2022 school year and the Fall 2022 semester, from August 2021 – December 2022. These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. These data should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Educational Statistics (NCES) for 2020-2021.

School learning modality types are defined as follows:
In-Person: All schools within the district offer face-to-face instruction 5 days per week to all students at all available grade levels.
Remote: Schools within the district do not offer face-to-face instruction; all learning is conducted online/remotely to all students at all available grade levels.
Hybrid: Schools within the district offer a combination of in-person and remote learning; face-to-face instruction is offered less than 5 days per week, or only to a subset of students.

Data Information

School learning modality data provided here are model estimates using combined input data and are not guaranteed to be 100% accurate. This learning modality dataset was generated by combining data from four different sources: Burbio [1], MCH Strategic Data [2], the AEI/Return to Learn Tracker [3], and state dashboards [4-20]. These data were combined using a Hidden Markov model which infers the sequence of learning modalities (In-Person, Hybrid, or Remote) for each district that is most likely to produce the modalities reported by these sources. This model was trained using data from the 2020-2021 school year. Metadata describing the location, number of schools and number of students in each district comes from NCES [21]. You can read more about the model in the CDC MMWR: COVID-19–Related School Closures and Learning Modality Changes — United States, August 1–September 17, 2021.

The metrics listed for each school learning modality reflect totals by district and the number of enrolled students per district for which data are available. School districts represented here exclude private schools and include the following NCES subtypes:
Public school district that is NOT a component of a supervisory union
Public school district that is a component of a supervisory union
Independent charter district
“BI” in the state column refers to school districts funded by the Bureau of Indian Education.

Technical Notes

Data from August 1, 2021 to June 24, 2022 correspond to the 2021-2022 school year. During this time frame, data from the AEI/Return to Learn Tracker and most state dashboards were not available. Inferred modalities with a probability below 0.6 were deemed inconclusive and were omitted. During the Fall 2022 semester, modalities for districts with a school closure reported by Burbio were updated to either “Remote”, if the closure spanned the entire week, or “Hybrid”, if the closure spanned 1-4 days of the week. Data from August
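The Hidden Markov model step described in this entry (finding the modality sequence most likely to have produced the reported modalities) can be illustrated with a small Viterbi decoder. This is a minimal sketch and not the CDC implementation: the start, transition, and emission probabilities, the use of a single report stream, and the function name `viterbi` are all assumptions invented for illustration.

```python
import math

STATES = ["In-Person", "Hybrid", "Remote"]

# Hypothetical parameters, chosen only for illustration.
START = {"In-Person": 0.6, "Hybrid": 0.2, "Remote": 0.2}
TRANS = {
    "In-Person": {"In-Person": 0.90, "Hybrid": 0.07, "Remote": 0.03},
    "Hybrid": {"In-Person": 0.10, "Hybrid": 0.80, "Remote": 0.10},
    "Remote": {"In-Person": 0.05, "Hybrid": 0.10, "Remote": 0.85},
}
# EMIT[true][reported]: probability that a noisy source reports a modality
# given the district's true modality (assumed values).
EMIT = {
    "In-Person": {"In-Person": 0.85, "Hybrid": 0.10, "Remote": 0.05},
    "Hybrid": {"In-Person": 0.15, "Hybrid": 0.70, "Remote": 0.15},
    "Remote": {"In-Person": 0.05, "Hybrid": 0.10, "Remote": 0.85},
}


def viterbi(reports):
    """Return the most likely hidden modality sequence for weekly reports."""
    # Work in log space to avoid underflow on long sequences.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][reports[0]]) for s in STATES
    }
    backpointers = []
    for obs in reports[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s this week.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


weekly_reports = ["In-Person", "In-Person", "Remote", "In-Person", "In-Person"]
decoded = viterbi(weekly_reports)
print(decoded)
```

With these made-up parameters, the isolated "Remote" report in the third week is treated as noise and the decoded sequence stays In-Person throughout. A model that combines several sources would multiply one emission term per source for each week.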
Internet Yellow Pages
catalog.caida.org
Updated Oct 10, 2023
Internet Health Report (2023). Internet Yellow Pages [Dataset]. https://catalog.caida.org/dataset/internet_yellow_pages
Dataset updated
Oct 10, 2023
Dataset authored and provided by
Internet Health Report
Description
Graph database of Internet resources combining data from multiple sources
Data from: Modelling of ready biodegradability based on combined public and...
zenodo.org
data.niaid.nih.gov
bin
Updated Jan 24, 2020
Filippo Lunghini; Gilles Marcou; Philippe Gantzer; Philippe Azam; Dragos Horvath; Erik Van Miert; Alexandre Varnek (2020). Modelling of ready biodegradability based on combined public and industrial data sources [Dataset]. http://doi.org/10.5281/zenodo.3540701
Available download formats
bin
Unique identifier
https://doi.org/10.5281/zenodo.3540701
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Filippo Lunghini; Gilles Marcou; Philippe Gantzer; Philippe Azam; Dragos Horvath; Erik Van Miert; Alexandre Varnek
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB). In-silico prediction is a valid alternative to expensive and time-consuming experimental testing. However, currently available models may not be suitable for compounds of industrial interest, because of limited accuracy and restricted applicability domains.

In this work we present a new and extended RB dataset (2830 compounds), obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from the industrial context. The new models showed good performance in terms of predictive power (BA = 0.74 – 0.79) and data coverage (83 – 91 %).
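The two figures reported above can be made concrete with a short sketch. Balanced accuracy (BA) is the mean of sensitivity and specificity. Data coverage is taken here to mean the share of compounds for which a model returns a prediction, which is an assumption about the authors' definition. The labels and predictions are invented toy values, not data from the study, and the function names are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity over compounds that got a prediction."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def coverage(y_pred):
    """Fraction of compounds that received a prediction at all."""
    return sum(p is not None for p in y_pred) / len(y_pred)


# Toy data: 1 = readily biodegradable, 0 = not,
# None = no prediction (assumed to mean outside the applicability domain).
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, None, 0, None]

ba = balanced_accuracy(y_true, y_pred)
cov = coverage(y_pred)
print(f"BA = {ba:.2f}, coverage = {cov:.0%}")
```

Reporting both numbers together matters because a model can raise its accuracy simply by declining to predict difficult compounds, which lowers coverage.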

The Generative Topographic Mapping approach was employed to compare the chemical space of the various data sources: several chemotypes and structural motifs unique to the industrial dataset were identified, highlighting for which chemical classes currently available models may have less reliable predictions.

Finally, public and industrial data were merged into a Global dataset containing 3146 compounds and including a significant subset of compounds from the industrial context. This is the largest dataset reported in the literature so far, and it covers some chemotypes absent from the public data. Thus, the predictive model developed on the Global dataset has a much larger applicability domain than related models built on publicly available data. The developed model is available to users on the Laboratory of Chemoinformatics website.

This dataset is only the "All-Public" set, since the industrial compounds cannot be disclosed.

This update contains additional entries from [J. Chem. Inf. Model. 52 (2012), pp. 655–669] and [J. Chem. Inf. Model. 53 (2013), pp. 867–878]
Joiner
search.dataone.org
dataverse.harvard.edu
Updated Sep 24, 2024
HU, Tao (2024). Joiner [Dataset]. http://doi.org/10.7910/DVN/0BM2IQ
Unique identifier
https://doi.org/10.7910/DVN/0BM2IQ
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
HU, Tao
Description
The joiner is a component often used in workflows to merge or join data from different sources or intermediate steps into a single output. In the context of Common Workflow Language (CWL), the joiner can be implemented as a step that combines multiple inputs into a cohesive dataset or output. This might involve concatenating files, merging data frames, or aggregating results from different computations.
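As a rough illustration of what such a joiner does, the sketch below concatenates several CSV files that share a header into one output file. It is plain Python rather than a CWL step definition, and the function name `join_csv_files` and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def join_csv_files(input_paths, output_path):
    """Concatenate CSV files with identical headers; return the data rows written."""
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader)
                if header is None:
                    # The first file defines the header of the joined output.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Usage with two small intermediate outputs in a temporary directory.
workdir = Path(tempfile.mkdtemp())
(workdir / "step_a.csv").write_text("id,value\n1,10\n2,20\n")
(workdir / "step_b.csv").write_text("id,value\n3,30\n")
total = join_csv_files(
    [workdir / "step_a.csv", workdir / "step_b.csv"], workdir / "joined.csv"
)
print(total)
```

Rejecting mismatched headers is one possible design choice; a joiner that merges on a key column instead of concatenating would need a different strategy.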
Data Integration Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Mar 20, 2024
Dataintelo (2024). Data Integration Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-integration-market
Available download formats
pdf, pptx, csv
Dataset updated
Mar 20, 2024
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Integration Market Outlook 2032

The global data integration market size was USD 13.33 Billion in 2023 and is likely to reach USD 36.76 Billion by 2032, expanding at a CAGR of 11.93 % during 2024–2032. The market growth is attributed to the growing need for businesses to comply with various regulatory requirements and the increasing demand for real-time data.
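The growth rate quoted above can be checked from the two endpoint values. The sketch below assumes nine compounding periods (2023 to 2032); the function name `cagr` is illustrative.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate, returned as a fraction."""
    return (end_value / start_value) ** (1 / periods) - 1


# Endpoint values in USD billion, as stated in the report summary.
rate = cagr(13.33, 36.76, 2032 - 2023)
print(f"{rate:.2%}")
```

Under that assumption the result comes out at about 11.93%, consistent with the stated CAGR.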

Increasing demand for real-time data is expected to boost the global data integration market. Real-time data allows businesses to make decisions quickly and accurately. However, to make the most of this data, it needs to be integrated with other data sources to provide a complete picture. This is where data integration comes in, enabling businesses to combine data from different sources and make informed decisions. Therefore, the rising demand for real-time data is propelling the market.

Data integration solutions are widely used in several industries, including BFSI, healthcare, IT & telecom, retail, and manufacturing, because they give businesses better analysis and insights, leading to more effective strategies and actions. Additionally, data integration automates the process of gathering, combining, and processing data. This saves time and reduces the risk of errors compared to manual data handling. These benefits encourage industries to deploy data integration solutions in their operations for better decision-making.

Impact of Artificial Intelligence (AI) in Data Integration Market

Artificial Intelligence (AI) is revolutionizing the data integration market by automating and optimizing the process of combining data from different sources. AI algorithms identify patterns and relationships in data, enabling them to accurately map and integrate data from various sources. This not only reduces
Descriptive statistics of sexual violence victim-survivors in the Crime...
plos.figshare.com
xls
Updated Jan 14, 2025
Estela Capelas Barbosa; Niels Blom; Annie Bunce (2025). Descriptive statistics of sexual violence victim-survivors in the Crime Survey for England and Wales (CSEW) and Rape Crisis England & Wales (RCEW) datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0301155.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301155.t001
Dataset updated
Jan 14, 2025
Dataset provided by
PLOS ONE
Authors
Estela Capelas Barbosa; Niels Blom; Annie Bunce
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Descriptive statistics of sexual violence victim-survivors in the Crime Survey for England and Wales (CSEW) and Rape Crisis England & Wales (RCEW) datasets.
Cv Project 4 C Dataset
universe.roboflow.com
zip
Updated May 4, 2024
Cvproject (2024). Cv Project 4 C Dataset [Dataset]. https://universe.roboflow.com/cvproject-d8hm5/cv-project-4-c/model/5
Available download formats
zip
Dataset updated
May 4, 2024
Dataset authored and provided by
Cvproject
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Cattle Polygons
Description
Dataset Description:

The dataset consists of 6570 grayscale images, handpicked and curated for instance segmentation tasks. The images are annotated to delineate individual object instances, providing a dataset for training and evaluating instance segmentation models.

Data Collection Process:

The images within the dataset were collected through a rigorous process involving multiple sources and datasets. Leveraging the capabilities of Roboflow Universe, the team behind the project meticulously handpicked images from various publicly available sources and datasets relevant to the domain of interest. These sources may include online repositories, research datasets, and proprietary collections, ensuring a diverse and representative sample of data.

Preprocessing and Data Integration:

To ensure uniformity and consistency across the dataset, several preprocessing techniques were applied. First, the images were automatically oriented to correct any orientation discrepancies. Next, they were resized to a standardized resolution of 640x640 pixels, facilitating efficient training and inference. Moreover, to simplify the data and focus on the essential features, the images were converted to grayscale.

Furthermore, to augment the dataset and enhance its diversity, multiple datasets were combined and integrated into a single cohesive collection. This involved harmonizing annotation formats, resolving potential conflicts, and ensuring compatibility across different datasets. Through meticulous preprocessing and integration efforts, disparate datasets were seamlessly merged into a unified dataset, enriching its variability and ensuring comprehensive coverage of object instances and scenarios.

Model Details:

The instance segmentation model deployed for this dataset is built on the Roboflow 3.0 architecture, using the Fast variant for efficient inference. It was trained starting from a COCO instance segmentation checkpoint and is intended to delineate object boundaries and classify instances within the images.

Performance Metrics:

The model achieves impressive performance metrics, including a mAP of 76.5%, precision of 76.7%, and recall of 73.5%. These metrics underscore the model's effectiveness in accurately localizing and classifying object instances, demonstrating its suitability for various computer vision tasks.
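Precision and recall are often summarized as a single F1 score, their harmonic mean. The source does not report F1; the sketch below derives it from the two stated values purely as an illustration, and the function name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the description above.
f1 = f1_score(0.767, 0.735)
print(f"F1 = {f1:.3f}")
```

With the stated precision of 76.7% and recall of 73.5%, this gives an F1 of roughly 0.75.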

Conclusion:

In summary, the dataset represents a culmination of meticulous data collection, preprocessing, and integration efforts, resulting in a comprehensive resource for instance segmentation tasks. By combining multiple datasets and leveraging advanced preprocessing techniques, the dataset offers diverse and representative imagery, enabling robust model training and evaluation. With the high-performance instance segmentation model and impressive performance metrics, the dataset serves as a valuable asset for researchers, developers, and practitioners in the field of computer vision.

For further information and access to the dataset, please visit Roboflow Universe.
School Learning Modalities, 2020-2021
s.cnmilf.com
datahub.hhs.gov
+3more
Updated Mar 26, 2025
Centers for Disease Control and Prevention (2025). School Learning Modalities, 2020-2021 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/school-learning-modalities-2020-2021
Dataset updated
Mar 26, 2025
Dataset provided by
Centers for Disease Control and Prevention
Description
The 2020-2021 School Learning Modalities dataset provides weekly estimates of school learning modality (including in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2020-2021 school year, from August 2020 – June 2021. These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. These data should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Educational Statistics (NCES) for 2020-2021.

School learning modality types are defined as follows:
In-Person: All schools within the district offer face-to-face instruction 5 days per week to all students at all available grade levels.
Remote: Schools within the district do not offer face-to-face instruction; all learning is conducted online/remotely to all students at all available grade levels.
Hybrid: Schools within the district offer a combination of in-person and remote learning; face-to-face instruction is offered less than 5 days per week, or only to a subset of students.

Data Information

School learning modality data provided here are model estimates using combined input data and are not guaranteed to be 100% accurate. This learning modality dataset was generated by combining data from four different sources: Burbio [1], MCH Strategic Data [2], the AEI/Return to Learn Tracker [3], and state dashboards [4-20]. These data were combined using a Hidden Markov model which infers the sequence of learning modalities (In-Person, Hybrid, or Remote) for each district that is most likely to produce the modalities reported by these sources. This model was trained using data from the 2020-2021 school year. Metadata describing the location, number of schools and number of students in each district comes from NCES [21]. You can read more about the model in the CDC MMWR: COVID-19–Related School Closures and Learning Modality Changes — United States, August 1–September 17, 2021.

The metrics listed for each school learning modality reflect totals by district and the number of enrolled students per district for which data are available. School districts represented here exclude private schools and include the following NCES subtypes:
Public school district that is NOT a component of a supervisory union
Public school district that is a component of a supervisory union
Independent charter district
“BI” in the state column refers to school districts funded by the Bureau of Indian Education.

Technical Notes

Data from September 1, 2020 to June 25, 2021 correspond to the 2020-2021 school year. During this timeframe, all four sources of data were available. Inferred modalities with a probability below 0.75 were deemed inconclusive and were omitted. Data for the month of July may show “In Person” status although most school districts are effectively closed during this time for summer break. Users may wish to exclude July data from use for this reason where applicable.

Sources

K-12 School Opening Tracker. Burbio 2021; https
TRACE-A Merge Data - Dataset - NASA Open Data Portal
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Apr 1, 2025
nasa.gov (2025). TRACE-A Merge Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/trace-a-merge-data-f7243
Dataset updated
Apr 1, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
TRACE-A_Merge_Data consists of merge data files created from data collected onboard the DC-8 aircraft during the Transport and Atmospheric Chemistry near the Equator - Atlantic (TRACE-A) suborbital campaign. Data collection for this product is complete.

The TRACE-A mission was a part of NASA’s Global Tropospheric Experiment (GTE) – an assemblage of missions conducted from 1983-2001 with various research goals and objectives. TRACE-A was conducted in the Atlantic from September 21 to October 24, 1992. TRACE-A had the objective of determining the cause and source of the high concentrations of ozone that accumulated over the Atlantic Ocean between southern Africa and South America from August to October. NASA partnered with the Brazilian Space Agency (INPE) to accomplish this goal. The NASA DC-8 aircraft and ozonesondes were utilized during TRACE-A to collect the necessary data. The DC-8 was equipped with 19 instruments. A few instruments on the DC-8 include the Differential Absorption Lidar (DIAL), the Laser-Induced Fluorescence, the O3-NO Ethylene/Forward Scattering Spectrometer, the Modified Licor, and the DACOM IR Laser Spectrometer. The DIAL was responsible for a variety of measurements, which include Nadir IR aerosols, Nadir UV aerosols, Zenith IR aerosols, Zenith VS aerosols, ozone, and ozone column. The Laser-Induced Fluorescence instrument collected measurements on NxOy in the atmosphere. Measurements of ozone were recorded by the O3-NO Ethylene/Forward Scattering Spectrometer while the Modified Licor recorded CO2. Finally, the DACOM IR Laser Spectrometer gathered an assortment of data points, including CO, O3, N2O, CH4, and CO2. Ozonesondes played a role in data collection for TRACE-A along with the DC-8 aircraft. The sondes were dropped from the DC-8 aircraft in order to gather data on ozone, temperature, and atmospheric pressure.
Data supporting: Methodological overview and data-merging approaches in the...
data.niaid.nih.gov
datadryad.org
zip
Updated Jun 29, 2021
Elena Quintero; Jorge Isla; Pedro Jordano (2021). Data supporting: Methodological overview and data-merging approaches in the study of plant-frugivore interactions [Dataset]. http://doi.org/10.5061/dryad.jm63xsjb8
Available download formats
zip
Unique identifier
https://doi.org/10.5061/dryad.jm63xsjb8
Dataset updated
Jun 29, 2021
Dataset provided by
Estación Biológica de Doñana
Authors
Elena Quintero; Jorge Isla; Pedro Jordano
License
https://spdx.org/licenses/CC0-1.0.html
Description
Recording species interactions is one of the main challenges in ecological studies. Frugivory has received much attention for decades as a model for mutualisms among free-living species, and a variety of methods have been designed and developed for sampling and monitoring plant–frugivore interactions. The diversity of techniques poses an important challenge when comparing, combining or replicating results from different sources with different methodologies. With the emergence of modern techniques, such as molecular analysis or multimedia remote recorders, issues when combining data from different sources have become especially relevant. We provide an overview of all the techniques used for monitoring endozoochorous primary seed dispersal, focusing on a critical appraisal of the advantages and limitations, as well as the context-dependency nature, of the different methods. We propose five data merging approaches potentially useful to combine frugivory interactions data from different methodologies. Additionally, we provide two case studies where we combine empirical data from plant–animal interactions in Mediterranean shrublands using different methodologies. Data merging resulted in a net increase in the number of distinct pairwise interactions recorded and compensated biases inherent to different methods, resulting in a more robust estimation of network topological descriptors. These case studies clarify the context-dependent character of the merging approaches, highlighting the value of collecting detailed information on the sampling effort in terms of reliable results and reproducibility. Finally, we discuss the trends with different methodological approaches used in the last decades and future perspectives in this field.

Methods

We used two empirical datasets to illustrate data merging approaches, with two different organization levels.
Both case studies are focused on plant–frugivore interactions taking place in the Mediterranean shrubland of Doñana National Park, Huelva, Spain. In each case study two sampling methods were used to maximise animal–plant interactions detected. The first case is an individual-based study on the avian frugivore assemblage of Pistacia lentiscus (Anacardiaceae) in El Puntal area, where monitoring cameras and DNA-barcoding were used to record interactions. Cameras methodology involved placing continuous-monitoring cameras (GoPro Hero® 7 model) facing individual plants. Forty individual plants were filmed for approximately 2 hours in several runs in different days (total of 84.5h). Any avian visitation was recorded as an interaction, yielding a total of 397 visitation records. Cameras were operative from sunrise for 2h and recording was set at maximum resolution. Data resulting from this sampling can be given as total number of records, or standardized by sampling time (no. records h-1). DNA-barcoding methodology (González-Varo et al. 2014) was done to faecal samples collected in seed traps located under the same forty individual trees. Seed traps were working for 102.7±8.9 days (mean ±SD) per plant. A total of 1371 faecal samples were analyzed (mean no. per plant: 33.8±15.2). Samples were collected regardless of whether or not they had seeds, as an indicator of a visitation event. Data resulting from this sampling can be given as a total number of records with positive identification of a given frugivore species, or standardized by the sampling time with seed traps actively operating in the field (no. records/trap/day or similar) The second case is a community-based study aiming to document species-specific plant–frugivore interactions in Hato Ratón, where analysis of fecal samples obtained with mist-netting and focal observations were used to detect interactions. 
Estimation of the dietary diversity of frugivore species through mist-netting, relied on seeds identification found in bird feces together with microhistological techniques to identify the non-seed remains of fruits by examining under microscope (40X, 100X) the shape, size, and structures (trichomes, glands) of exocarp tissue cells. This allowed not only the identification of fruit species when no seeds are present but also the relative volume occupied in the sample, so that an estimate of the corresponding number of fruits ingested can be derived (Jordano 1988). Between 6-10 mist nets were operated weekly for 1–2 days (for a total of 84 sampling days and 4080.5 mist-net hours), totalling 3541 fecal samples analyzed. Feeding records of frugivores visiting fruiting plants were obtained during 1.0 km-length walk censuses in the area, with 2–5 censuses carried out per month (123 sampling days), totalling 89.5 km and 2031 records. These focal observations were based on spot censuses where interactions are recorded during short stops as the observer advances along a fixed transect. In some cases (<15 % of the records) where no handling was observed but just the visit to the plant, the number of fruits was approximated from data on feeding rate (no. fruits/visit). Data resulting from this sampling can be given as total number of records, or standardized by sampling time (no. records km-1 census or no. records h-1 or day-1, or similar). For more information please refer to the Supplementary Material of our manuscript: Quintero, Isla & Jordano 2021 Methodological overview and data-merging approaches in the study of plant-frugivore interactions. Oikos 00: 1–18, 2021. doi: 10.1111/oik.08379. Additional data, codes of analysis, final merged datasets, and supplementary figures are available in Zenodo (http://doi.org/10.5281/zenodo.4751889) or GitHub repository (https://github.com/PJordano-Lab/MS_Oikos_FSD_Monitoring_interactions).
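The standardization by sampling effort and the merging of interaction lists described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code (that is in the linked Zenodo and GitHub repositories). The function names and the placeholder frugivore labels (`bird_A`, `bird_B`, `bird_C`) are hypothetical; only the record totals and effort figures come from the description.

```python
def records_per_unit_effort(n_records: float, effort: float) -> float:
    """Return records per unit of sampling effort (hours, km, trap-days, ...)."""
    if effort <= 0:
        raise ValueError("sampling effort must be positive")
    return n_records / effort


def merge_interactions(*methods):
    """Return the set of distinct (plant, frugivore) pairs seen by any method."""
    merged = set()
    for pairs in methods:
        merged |= set(pairs)
    return merged


# Totals reported in the description.
camera_rate = records_per_unit_effort(397, 84.5)    # visitation records per camera hour
census_rate = records_per_unit_effort(2031, 89.5)   # feeding records per km of census

# Hypothetical pairwise interactions, for illustration only.
camera_pairs = {("Pistacia lentiscus", "bird_A"), ("Pistacia lentiscus", "bird_B")}
barcoding_pairs = {("Pistacia lentiscus", "bird_B"), ("Pistacia lentiscus", "bird_C")}
merged_pairs = merge_interactions(camera_pairs, barcoding_pairs)

print(f"{camera_rate:.2f} records/h, {census_rate:.2f} records/km")
print(f"{len(merged_pairs)} distinct interactions after merging")
```

A plain set union only counts distinct pairs. Combining interaction frequencies across methods would additionally require putting the rates on a comparable effort scale, which is the context-dependent step the description emphasises.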
paperlists
huggingface.co
Updated May 28, 2025
Cite
Paper Copilot (2025). paperlists [Dataset]. https://huggingface.co/datasets/papercopilot/paperlists
Explore at:
Dataset updated
May 28, 2025
Dataset authored and provided by
Paper Copilot
Description
Paper Lists

This repository powers Paper Copilot, combining data from multiple sources to ensure coherence, consistency, and comprehensiveness. Typically, records from OpenReview, official conference sources, or open access sites are scattered, leading to fragmented information and extra effort to navigate between them. The aim of this repository is to serve as a comprehensive link collection for major conferences, enabling easier access to relevant information, and statistical… See the full description on the dataset page: https://huggingface.co/datasets/papercopilot/paperlists.
Western North American FLEXPART Back Trajectory 1994-2021 Merge Data
catalog.data.gov
Updated Jul 10, 2025
Cite
NASA/LARC/SD/ASDC (2025). Western North American FLEXPART Back Trajectory 1994-2021 Merge Data [Dataset]. https://catalog.data.gov/dataset/western-north-american-flexpart-back-trajectory-1994-2021-merge-data
Explore at:
Dataset updated
Jul 10, 2025
Dataset provided by
NASA (http://nasa.gov/)
Area covered
Western North Region, United States
Description
WNA-FLEXPART-BackTraj-1994-2021-Merge is the combined 1994-2021 Western North America Back Trajectory data using the FLEXible PARTicle (FLEXPART) dispersion model. Data collection for this product is complete. Backward simulations of airmass transport using a Lagrangian Particle Dispersion Model (LPDM) framework can establish source-receptor relationships (SRRs), supporting analysis of source contributions from various geospatial regions and atmospheric layers to downwind observations. In this study, we selected receptor locations to match gridded ozone observations over Western North America (WNA) from ozonesonde, lidar, commercial aircraft sampling, and aircraft campaigns (1994-2021). For each receptor, we used the FLEXible PARTicle (FLEXPART) dispersion model, driven by ERA5 reanalysis data, to achieve 15-day backwards SRR calculations, providing global simulations at high temporal (hourly) and spatial (1° x 1°) resolution, from the surface up to 20 km above ground level. This product retains detailed information for each receptor, including the gridded ozone value product, allowing the user to illustrate and identify source contributions to various subsets of ozone observations in the troposphere above WNA over nearly 3 decades at different vertical layers and temporal scales, such as diurnal, daily, seasonal, intra-annual, and decadal. This model product can also support source contribution analyses for other atmospheric components observed over WNA, if other co-located observations have been made at the spatial and temporal scales defined for some or all of the gridded ozone receptors used here.
Annual Population Survey: Well-Being, April 2011 - March 2015: Secure Access...
beta.ukdataservice.ac.uk
Updated 2016
Cite
Social Survey Division Office For National Statistics (2016). Annual Population Survey: Well-Being, April 2011 - March 2015: Secure Access [Dataset]. http://doi.org/10.5255/ukda-sn-7961-1
Explore at:
Unique identifier
https://doi.org/10.5255/ukda-sn-7961-1
Dataset updated
2016
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
datacite
Authors
Social Survey Division Office For National Statistics
Description
The Annual Population Survey (APS) is a major survey series, which aims to provide data that can produce reliable estimates at local authority level. Key topics covered in the survey include education, employment, health and ethnicity. The APS comprises key variables from the Labour Force Survey (LFS) (held at the UK Data Archive under GN 33246), all of its associated LFS boosts and the APS boost. Thus, the APS combines results from five different sources: the LFS (waves 1 and 5); the English Local Labour Force Survey (LLFS), the Welsh Labour Force Survey (WLFS), the Scottish Labour Force Survey (SLFS) and the Annual Population Survey Boost Sample (APS(B) - however, this ceased to exist at the end of December 2005, so APS data from January 2006 onwards will contain all the above data apart from APS(B)). Users should note that the LLFS, WLFS, SLFS and APS(B) are not held separately at the UK Data Archive. For further detailed information about methodology, users should consult the Labour Force Survey User Guide, selected volumes of which have been included with the APS documentation for reference purposes (see 'Documentation' table below).

The APS aims to provide enhanced annual data for England, covering a target sample of at least 510 economically active persons for each Unitary Authority (UA)/Local Authority District (LAD) and at least 450 in each Greater London Borough. In combination with local LFS boost samples such as the WLFS and SLFS, the survey provides estimates for a range of indicators down to Local Education Authority (LEA) level across the United Kingdom.

APS Well-Being data
Since April 2011, the APS has included questions about personal and subjective well-being. The responses to these questions have been made available as annual sub-sets to the APS Person level files. It is important to note that the size of the achieved sample of the well-being questions within the dataset is approximately 165,000 people. This reduction arises because the well-being questions are asked only of persons aged 16 and above who gave a personal interview; proxy answers are not accepted. As a result, some caution should be used when analysing responses to well-being questions at detailed geography areas and also in relation to any other variables where respondent numbers are relatively small. For lower level geography analysis, it is recommended that the variable UACNTY09 is used.

As well as annual datasets, three-year pooled datasets are available. When combining multiple APS datasets together, it is important to account for the rotational design of the APS and ensure that no person appears more than once in the multiple year dataset. This is because the well-being datasets are not designed to be longitudinal, i.e. they are not designed to track individuals over time or to be used for longitudinal analysis. They are instead cross-sectional, and are designed to use a cross-section of the population to make inferences about the whole population. For this reason, the three-year dataset has been designed to include only a selection of the cases from the individual year APS datasets, chosen in such a way that no individuals are included more than once, and the cases included are approximately equally spread across the three years. Further information is available in the 'Documentation' section below.
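The requirement that no person appears more than once in a pooled file can be sketched as follows. This is an illustration only, not the ONS selection procedure: the `person_id` field is a hypothetical identifier, and keeping the first year in which a person appears does not reproduce the equal spread across years that the official three-year dataset is designed to achieve.

```python
def pool_without_repeats(datasets_by_year):
    """Pool yearly record lists, keeping each person at most once.

    Assumption: every record is a dict with a hypothetical "person_id" key.
    A person seen in several years is kept only for the earliest year.
    """
    seen = set()
    pooled = []
    for year in sorted(datasets_by_year):
        for record in datasets_by_year[year]:
            person_id = record["person_id"]
            if person_id not in seen:
                seen.add(person_id)
                pooled.append({**record, "year": year})
    return pooled


# Hypothetical yearly files: person 2 is interviewed in both years.
yearly = {
    2012: [{"person_id": 1}, {"person_id": 2}],
    2013: [{"person_id": 2}, {"person_id": 3}],
}
pooled = pool_without_repeats(yearly)
print([(r["person_id"], r["year"]) for r in pooled])
```

Users of the published three-year pooled files do not need this step, because the selection has already been made in those files.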

Secure Access APS Well-Being data
Secure Access datasets for the APS Well-Being include additional variables not included in either the standard End User Licence (EUL) versions (see under GN 33357) or the Special Licence (SL) access versions (see under GN 33376). Extra variables that typically can be found in the Secure Access version but not in the EUL or SL versions relate to:
geography, including:
Postcodes
Census Area Statistics (CAS) Wards
Census Output Areas
Nomenclature of Units for Territorial Statistics (NUTS) level 2 and 3 areas
Lower and Middle Layer Super Output Areas
Travel to Work Areas
Unitary authority / Local Authority District of place of work (main job)
region of place of work for first and second jobs
qualifications, education and training including level of highest qualification, qualifications from Government schemes, qualifications related to work, qualifications from school, qualifications from university or college and qualifications gained from outside the UK
detailed ethnic group for Scottish respondents
detailed religious denomination for Northern Irish respondents
length health problem has limited activity
learning difficulty or learning disability
occupation in apprenticeship or second job
number of bedrooms
number of dependent children in household aged under 19
Prospective users of the Secure Access version of the APS Well-Being will need to fulfil additional requirements, commencing with the completion of an extra application form to demonstrate to the data owners exactly why they need access to the extra, more detailed variables, in order to obtain permission to use that version. Secure Access data users must also complete face-to-face training and agree to the Secure Access User Agreement and Licence Compliance Policy (see 'Access' section below). Therefore, users are encouraged to download and inspect the EUL version of the data prior to ordering the Secure Access (or SL) version. Further details and links to all APS studies available from the UK Data Archive can be found via the APS Key Data series webpage.

APS Well-Being Datasets: Information, July 2016
From 2012-2015, the ONS published separate APS datasets aimed at providing initial estimates of subjective well-being, based on the Integrated Household Survey. In 2015 these were discontinued. A separate set of well-being variables and a corresponding weighting variable have been added to the April-March APS person datasets from A11M12 onwards. Users should no longer use the bespoke well-being datasets (SNs 6994, 6999, 7091, 7092, 7364, 7365, 7565, 7566 and 7961), but should now use the variables included on the April-March APS person datasets instead. Further information on the transition can be found on the Personal well-being in the UK: 2015 to 2016

Documentation and coding frames
The APS is compiled from variables present in the LFS. For variable and value labelling and coding frames that are not included either in the data or in the current APS documentation (e.g. coding frames for education, industrial and geographic variables, which are held in LFS User Guide Vol.5, Classifications), users are advised to consult the latest versions of the LFS User Guides, which are available from the ONS Labour Force Survey - User Guidance webpages.

May 2018 Update
Due to a change in the Travel-to-Work Area coding structure from 2001 to 2011, the variable TTWA9D has been relabelled in the pooled data file for 2012-2015.
Merger of BNV-D data (2008 to 2019) and enrichment
data.europa.eu
zip
Cite
Patrick VINCOURT, Merger of BNV-D data (2008 to 2019) and enrichment [Dataset]. https://data.europa.eu/data/datasets/5f1c3eca9d149439e50c740f
Explore at:
Available download formats: zip (18530465)
Dataset authored and provided by
Patrick VINCOURT
Description
Merging (in Table R) data published on https://www.data.gouv.fr/fr/datasets/ventes-de-pesticides-par-departement/, and joining two other sources of information associated with MAs: — uses: https://www.data.gouv.fr/fr/datasets/usages-des-produits-phytosanitaires/ — information on the “Biocontrol” status of the product, from document DGAL/SDQSPV/2020-784 published on 18/12/2020 at https://agriculture.gouv.fr/quest-ce-que-le-biocontrole

All the initial files (.csv transformed into .txt), the R code used to merge data and different output files are collected in a zip. NB:
1) “YASCUB” for {year,AMM,Substance_active,Classification,Usage,Statut_“BioConttrol”}, substances not on the DGAL/SDQSPV list being coded NA.
2) The file of biocontrol products must be cleaned of the duplicates generated by marketing authorisations that lead to several trade names.
3) The BNVD_BioC_DY3 table and the output file BNVD_BioC_DY3.txt contain the fields {Code_Region,Region,Dept,Code_Dept,Anne,Usage,Classification,Type_BioC,Quantite_substance}
Data from: Global data on crop nutrient concentration and harvest indices
data.niaid.nih.gov
datadryad.org
zip
Updated Nov 27, 2023
Cite
Cameron I. Ludemann; Renske Hijbeek; Marloes van Loon; T. Scott Murrell; Achim Dobermann; Martin van Ittersum (2023). Global data on crop nutrient concentration and harvest indices [Dataset]. http://doi.org/10.5061/dryad.n2z34tn0x
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.n2z34tn0x
Dataset updated
Nov 27, 2023
Dataset provided by
International Fertilizer Association (http://www.fertilizer.org/)
Wageningen University & Research
African Plant Nutrition Institute
Authors
Cameron I. Ludemann; Renske Hijbeek; Marloes van Loon; T. Scott Murrell; Achim Dobermann; Martin van Ittersum
License
https://spdx.org/licenses/CC0-1.0.html
Description
Estimates of crop nutrient removal (as crop products and crop residues) are an important component of crop nutrient balances. Crop nutrient removal can be estimated through multiplication of the quantity of crop products or crop residues (removed) by the nutrient concentration of those crop products and crop residue components respectively. Data for quantities of crop products removed at a country level are available through FAOSTAT (https://www.fao.org/faostat/en/), but equivalent data for quantities of crop residues are not available at a global level. However, quantities of crop residues can be estimated if the relationship between quantity of crop residues and crop products is known. Harvest index (HI) provides one such indication of the relationship between quantity of crop products and crop residues. HI is the proportion of above-ground biomass as crop products and can be used to estimate quantity of crop residues based on quantity of crop products. Previously, meta-analyses or surveys have been performed to estimate nutrient concentrations of crop products and crop residues and harvest indices (collectively known as crop coefficients). The challenges for using these coefficients in global nutrient balances include the representativeness of world regions or countries. Moreover, it may be unclear which countries or crop types are actually represented in the analyses of data. In addition, units used among studies differ which makes comparisons challenging. To overcome these challenges, data from meta-analyses and surveys were collated in one dataset with standardised units and referrals to the original region and crop names used by the sources of data. 
Original region and crop names were converted into internationally recognised names, and crop coefficients were summarised into two Tiers of data, representing the world (Tier 1, with single coefficient values for the world) and specific regions or countries of the world (Tier 2, with single coefficient values for each country). This dataset will aid both global and regional analyses for crop nutrient balances.
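The two relationships described above can be written out as a short Python sketch. It assumes, as the description states, that the harvest index (HI) is the proportion of above-ground biomass harvested as crop products, so that HI = products / (products + residues). The function names and the example quantities are hypothetical and are not taken from the dataset or its scripts.

```python
def crop_residues_from_products(products: float, harvest_index: float) -> float:
    """Estimate crop residue quantity from crop product quantity and HI.

    Rearranging HI = products / (products + residues) gives
    residues = products * (1 - HI) / HI.
    """
    if not 0 < harvest_index <= 1:
        raise ValueError("harvest index must be in (0, 1]")
    return products * (1 - harvest_index) / harvest_index


def nutrient_removal(quantity: float, concentration_pct: float) -> float:
    """Nutrient removed = quantity removed x nutrient concentration (percent)."""
    return quantity * concentration_pct / 100


# Hypothetical example: 100 t of grain with HI = 0.4 and 2 % N in the grain.
residues = crop_residues_from_products(100, 0.4)   # about 150 t of residues
grain_n = nutrient_removal(100, 2.0)               # about 2 t of N removed in grain
print(round(residues, 1), round(grain_n, 2))
```

Quantity and concentration must be on the same basis (both dry matter or both fresh weight) for the removal estimate to be meaningful.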

Methods

Data acquisition

Data were primarily collated from meta-analyses found in scientific literature. Terms used in Ovid (https://ovidsp.ovid.com/), CAB Abstracts (https://www.cabdirect.org/) and Google Scholar (https://scholar.google.com/) were: (crop) AND (“nutrient concentration” OR “nutrient content” OR “harvest index”) across any time. This search resulted in over 245,000 results. These results were refined to include studies that purported to represent crop nutrient concentration and/or harvest index of crops for geographic regions of the world, as opposed to site-specific field experiments. Given the range in different crops grown globally, preference was given to acquiring datasets that included multiple crops. In some cases, authors of meta-analyses were asked for raw data to aid the standardisation process. In addition, the International Fertilizer Association (IFA), and the Food and Agriculture Organization of the United Nations (UN FAO) provided data used for crop nutrient balances (FAOSTAT 2020). The request to UN FAO yielded phosphorus and potassium crop nutrient concentrations in addition to their publicly available nitrogen concentration values (FAOSTAT 2020). In total the refined search resulted in 26 different sources of data. Data files were converted to separate comma-delimited CSV files for each source of data, whereby a unique ‘source’ was a dataset from an article from the scientific literature or a dataset sent by the UN FAO or IFA. Crop nutrient concentrations were expressed as a percentage of dry matter and/or the percentage of fresh weight depending on which units were reported and whether dry matter percentages of crop fresh weight were reported. Meta-data text files were written to accompany each standardized CSV file. The standardized CSV files for each source of data included information on the name of the original region, the crop coefficients it purported to represent, as well as the original names of the crops as categorised by the authors of the data. If the data related to a meta-analysis of multiple sources, information was included for the primary source of data when available. Data from the separate source files were collated into one file named ‘Combined_crop_data.csv’ using R Studio (version 4.1.0) (hereafter referred to as R) with the scripts available at https://github.com/ludemannc/Tier_1_2_crop_coefficients.git.

Processing of data

When transforming the combined data file (‘Combined_crop_data.csv’) into representative crop coefficients for different regions (available in ‘Tier_1_and_2_crop_coefficients.csv’), crop coefficients that were duplicates from the same primary source of data were excluded from processing. For instance, Zhang et al. (2021) referred to multiple primary sources of data, and the data requested from the UN FAO and the IFA referred (in many cases) to crop coefficients from IPNI (2014). Duplicate crop coefficient data that came from the same primary source were therefore excluded from the summarised dataset of crop coefficients.

Two tiers of data

The data were sub-divided into two Tiers to help overcome the challenge of using these data in a global nutrient balance when data are not available for every country. This follows the approach taken by the Intergovernmental Panel on Climate Change (IPCC 2019). Data were assigned different ‘Tiers’ based on complexity and data requirements.
· Tier 1: crop coefficients at the world level.
· Tier 2: crop coefficients at more granular geographic regions of the world (e.g. at regional, country or sub-country levels).

Crop coefficients were summarised as means for each crop item and crop component based on either ‘Tier 1’ or ‘Tier 2’. One could also envision a more detailed site-specific level (Tier 3). The data in this dataset did not meet the required level of complexity or data requirements for Tier 3, unlike, say, the site-specific data being collected as part of the Consortium for Precision Crop Nutrition (CPCN) (www.cropnutrientdata.net), which could be described as being Tier 3. No data from the current dataset were therefore assigned to Tier 3. It is expected that in the future, site-specific data will be used to improve the crop coefficients further with a Tier 3 approach. The ‘Tier_1_and_2_crop_coefficients.csv’ file includes mean crop coefficients for the Tier 1 data, and mean crop coefficients for the Tier 2 data. The Tier 1 estimates of crop coefficients were mean values across Tier 1 data that purported to represent the World. Crop coefficients found in the data sources represent quite different geographic areas or regions. To enable combining data with different spatial overlaps for Tier 2, data were disaggregated to the country level. First, each region was assigned a list of countries (which the regional averages were assumed to represent, as listed in the ‘Original_region_names_and_assigned_countries.csv’ file). Countries were assigned alpha-3 country codes following the ISO 3166 international standards (https://www.iso.org/publication/PUB500001.html). Second, for each country, mean crop coefficients were calculated based on coefficients from regions listed for each country. For Australia for example, the mean values for each crop coefficient were calculated from values that represented sub-country (e.g. Australia New South Wales South East), country (Australia), and multi-country (e.g. Oceania) regions. For instance, if there was a harvest index value of 0.5 for wheat for the original region ‘Australia New South Wales South East’, a value of 0.51 for the original region named ‘Australia’ and a value of 0.47 for the original region named ‘Oceania’, then the mean Tier 2 harvest index for wheat for the country Australia would be 0.493, the unweighted mean. Using our dataset, a user can assign different weights to each entry. To aid analysis, the names of the original categories of crop were converted into UN FAO crop ‘item’ categories, following UN FAO standards (FAOSTAT 2022) (available in the ‘Original_crop_names_in_each_item_category.csv’ file). These item categories were also assigned categorical numeric codes following UN FAO standards (FAOSTAT 2022). Data related to crop products (e.g. grain, beans, saleable tubers or fibre) were assigned the category “Crop_products” and crop residues (e.g. straw, stover) were assigned the category “Crop_residues”.

Dry and fresh matter weights

In some cases nutrient concentration values from the original sources were available on a dry matter or a fresh weight basis, but not both. Gaps in either the nutrient concentration on a dry matter or fresh weight basis were given imputed values. If the data source mentioned the dry matter percentage of the crop component then this was preferentially used to impute the other missing nutrient concentration data. If dry matter percentage information was not available for a particular crop item or crop component, missing data were imputed using the mean dry matter percentage values across all Tier 1 and Tier 2 data.

Global means for the UN FAO Cropland Nutrient Budget

Data were also summarised as means for nitrogen (N), elemental phosphorus (P) and elemental potassium (K) nutrient concentrations of crop products using data that represented the world (Tier 1) for the 2023 UN FAO Cropland Nutrient Budget. These data are available in the file named World_crop_coefficients_for_UN_FAO.csv.
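The worked Australia example and the dry/fresh conversion can be reproduced with a few lines of Python. This is a sketch rather than the authors' R scripts: the function names, the example weights and the 88 % dry matter figure are hypothetical, while the three harvest index values come from the description above.

```python
def tier2_mean(values, weights=None):
    """Mean of regional coefficients; unweighted unless weights are given."""
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def dry_to_fresh_basis(concentration_dry_pct: float, dry_matter_pct: float) -> float:
    """Convert a nutrient concentration from % of dry matter to % of fresh weight."""
    return concentration_dry_pct * dry_matter_pct / 100


# Wheat harvest index for New South Wales South East, Australia and Oceania.
harvest_indices = [0.5, 0.51, 0.47]
unweighted = tier2_mean(harvest_indices)

# Hypothetical weights that count the country-level value twice.
weighted = tier2_mean(harvest_indices, weights=[1, 2, 1])

# Hypothetical imputation: 2.0 % N in dry matter, 88 % dry matter.
n_fresh = dry_to_fresh_basis(2.0, 88.0)

print(round(unweighted, 3), round(weighted, 4), round(n_fresh, 2))
```

The unweighted result rounds to the 0.493 quoted in the description. Weighting is left to the user, as the description notes, so the weights shown here carry no recommendation.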
Supplementary material for "Spatio-temporal modelling of abundance from...
zenodo.org
zip
Updated May 2, 2022
Cite
Nicolas Strebel; Marc Kéry; Jérôme Guélat; Thomas Sattler (2022). Supplementary material for "Spatio-temporal modelling of abundance from multiple data sources in an integrated spatial distribution model" [Dataset]. http://doi.org/10.5281/zenodo.5840377
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5840377
Dataset updated
May 2, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nicolas Strebel; Marc Kéry; Jérôme Guélat; Thomas Sattler
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Abstract

Aim: In biodiversity monitoring, observational data are often collected in multiple, disparate schemes with greatly varying degrees of standardization and possibly at different spatial and temporal scales. Technical advances also change the type of data over time. The resulting heterogeneous data sets are often deemed to be incompatible. Consequently, many available data sets may be ignored in practical analyses. Here, we propose a more efficient use of disparate biodiversity data to assess species distributions and population trends.

Location: Switzerland (Europe)

Taxon: Birds

Methods: We developed an integrated, hierarchical species distribution model with a joint likelihood for all data sets using a shared state process (e.g., latent species abundance or occurrence), but distinct observation process for each data set. We show how the abundance submodel of a binomial N-mixture model can fuse four different data types (count, detection/non-detection, presence-only, and absence-only data) and enable improved inferences about spatio-temporal patterns in abundance. As case studies, we use data from multiple avian biodiversity monitoring schemes. In the first, the goal is estimating abundance-based species distribution maps. In the second, we infer trends in population abundance across time.

Results: Accuracy and precision of abundance estimates increased when combining data from different sources compared to using a single data source alone. This is particularly valuable when data from each single data source are too sparse for reliable parameter estimation.

Main conclusions: We show that exploiting the complementary nature of "cheap", but abundant, citizen-science data and less abundant, but more information-rich, data from structured monitoring programs might be ideal to estimate distribution and population trends more accurately, especially for rare species. Joint likelihoods allow the inclusion of a wide variety of different data sets to (1) combine all the available information and (2) mitigate the weaknesses of one data set with the strengths of another.
Data for: "Linking Datasets on Organizations Using Half a Billion...
dataverse.harvard.edu
Updated Jan 13, 2025
Cite
Connor Jerzak (2025). Data for: "Linking Datasets on Organizations Using Half a Billion Open-Collaborated Records" [Dataset]. http://doi.org/10.7910/DVN/EHRQQL
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/EHRQQL
Dataset updated
Jan 13, 2025
Dataset provided by
Harvard Dataverse
Authors
Connor Jerzak
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Abstract: Scholars studying organizations often work with multiple datasets lacking shared unique identifiers or covariates. In such situations, researchers usually use approximate string (“fuzzy”) matching methods to combine datasets. String matching, although useful, faces fundamental challenges. Even when two strings appear similar to humans, fuzzy matching often does not work because it fails to adapt to the informativeness of the character combinations. In response, a number of machine-learning methods have been developed to refine string matching. Yet, the effectiveness of these methods is limited by the size and diversity of training data. This paper introduces data from a prominent employment networking site (LinkedIn) as a massive training corpus to address these limitations. We show how, by leveraging information from LinkedIn regarding organizational name-to-name links, we can improve upon existing matching benchmarks, incorporating the trillions of name pair examples from LinkedIn into various methods to improve performance by explicitly maximizing match probabilities inferred from the LinkedIn corpus. We also show how relationships between organization names can be modeled using a network representation of the LinkedIn data. In illustrative merging tasks involving lobbying firms, we document improvements when using the LinkedIn corpus in matching calibration and make all data and methods open source.

Keywords: Record linkage; Interest groups; Text as data; Unstructured data
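For readers unfamiliar with the baseline the abstract refers to, approximate string ("fuzzy") matching of organization names can be sketched with Python's standard-library `difflib`. This illustrates the generic baseline only, not the LinkedIn-calibrated method of the paper. The organization names, the helper names and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(name, candidates, threshold=0.6):
    """Return the most similar candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    score, candidate = max((name_similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None


# Hypothetical organization names from two datasets without shared identifiers.
print(best_match("Acme Corporation", ["Globex Inc", "Acme Corp"]))
print(best_match("Acme Corporation", ["Globex Inc"]))
```

A character-based score of this kind treats every character alike, so generic tokens such as "Corporation" or "Inc" count as much as the distinctive part of a name. That is the limitation the abstract describes as failing to adapt to the informativeness of character combinations.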

Cite

Dataintelo (2024). Data Fusion Solutions Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-fusion-solutions-market

Data Fusion Solutions Market Report | Global Forecast From 2025 To 2033

Explore at:

Available download formats: csv, pdf, pptx

Dataset updated

Oct 5, 2024

Authors

Dataintelo

License

https://dataintelo.com/privacy-and-policy

Time period covered

2024 - 2032

Area covered

Global

Description

Data Fusion Solutions Market Outlook

The global data fusion solutions market size is anticipated to grow significantly from USD 10.2 billion in 2023 to USD 25.7 billion by 2032, with a compound annual growth rate (CAGR) of 11.2% during the forecast period. This robust growth is primarily driven by the increasing demand for real-time data analysis, the integration of advanced technologies such as AI and machine learning, and the rising need for comprehensive data management solutions across various industries.

One of the primary growth factors for the data fusion solutions market is the exponential increase in data generation and the subsequent need for effective data management and analysis tools. As businesses and government entities increasingly rely on data-driven decision-making, the ability to amalgamate diverse data sources into a coherent and actionable format becomes crucial. Technologies like IoT, AI, and machine learning are further augmenting this demand by enabling more sophisticated data fusion capabilities, thereby providing deeper insights and fostering innovation across sectors.

Another significant driver is the growing complexity and diversity of data types that organizations need to manage. Traditional data management systems are often inadequate for handling the vast volumes and varieties of data generated today. Data fusion solutions, which integrate data from multiple sources to produce more accurate and comprehensive information, are becoming essential. This is particularly true in industries such as healthcare, defense, and transportation, where timely and accurate data integration can lead to better outcomes and operational efficiencies.

The third major growth factor is the critical role of data fusion in enhancing security and surveillance systems. In the defense and surveillance sector, for example, data fusion technologies are employed to combine inputs from various sensors, cameras, and other sources to provide a complete situational awareness picture. This capability is not only vital for national security but also for public safety, traffic management, and disaster response. The growing investments in smart cities and intelligent transportation systems are further propelling the demand for advanced data fusion solutions.

Regionally, North America is expected to dominate the data fusion solutions market throughout the forecast period. This can be attributed to the high adoption rate of advanced technologies, significant investments in R&D, and the presence of major market players in the region. Europe and Asia Pacific are also anticipated to witness substantial growth, driven by technological advancements, increasing government initiatives, and the rapid expansion of industries such as healthcare, transportation, and defense in these regions.

Component Analysis

The data fusion solutions market is segmented by components into software, hardware, and services. The software segment is expected to hold the largest market share, driven by the increasing demand for advanced data analytics and management tools. These software solutions are versatile and can be tailored to meet the specific needs of various industries, thereby enhancing their appeal. Moreover, the integration of AI and machine learning technologies into data fusion software is providing more sophisticated and accurate data analysis capabilities, which is further fuelling market growth.

Hardware components, although not as dominant as software, still play a crucial role in the data fusion ecosystem. The hardware segment includes sensors, data storage devices, and processing units that are essential for collecting, storing, and analyzing vast amounts of data. Advances in sensor technology and the increasing deployment of IoT devices are driving the demand for more robust and high-performance hardware solutions. Additionally, the development of edge computing technologies is enhancing the capability of hardware to process data closer to the source, thereby reducing latency and improving real-time decision-making.

The services segment encompasses various support services such as consulting, implementation, and maintenance, which are vital for the successful deployment and operation of data fusion solutions. As businesses increasingly invest in data fusion technologies, the demand for specialized services to ensure seamless integration and optimal performance


Data Fusion Solutions Market Report | Global Forecast From 2025 To 2033

Data Fusion Solutions Market Outlook

Component Analysis

Data from: Combining data sets with different phylogenetic histories

Addresses (Open Data)

School Learning Modalities, 2021-2022

Internet Yellow Pages

Data from: Modelling of ready biodegradability based on combined public and...

Joiner

Data Integration Market Report | Global Forecast From 2025 To 2033

Data Integration Market Outlook 2032

Impact of Artificial Intelligence (AI) in Data Integration Market

Descriptive statistics of sexual violence victim-survivors in the Crime...

Cv Project 4 C Dataset

School Learning Modalities, 2020-2021

TRACE-A Merge Data - Dataset - NASA Open Data Portal

Data supporting: Methodological overview and data-merging approaches in the...

paperlists

Western North American FLEXPART Back Trajectory 1994-2021 Merge Data

Annual Population Survey: Well-Being, April 2011 - March 2015: Secure Access...

Merger of BNV-D data (2008 to 2019) and enrichment

Data from: Global data on crop nutrient concentration and harvest indices

Supplementary material for "Spatio-temporal modelling of abundance from...

Data for: "Linking Datasets on Organizations Using Half a Billion...
