100+ datasets found
  1. Joiner

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    HU, Tao (2024). Joiner [Dataset]. http://doi.org/10.7910/DVN/0BM2IQ
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    HU, Tao
    Description

    The joiner is a component often used in workflows to merge or join data from different sources or intermediate steps into a single output. In the context of Common Workflow Language (CWL), the joiner can be implemented as a step that combines multiple inputs into a cohesive dataset or output. This might involve concatenating files, merging data frames, or aggregating results from different computations.
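As a rough illustration of the "concatenating files" case described above, the sketch below joins several CSV texts that share a header into one list of rows. It is not taken from the dataset: the function name `join_csv_sources` and the sample inputs are hypothetical, and a real CWL step would wrap logic like this in a tool definition.

```python
import csv
import io


def join_csv_sources(sources):
    """Concatenate CSV texts that share a header into one list of row dicts."""
    merged = []
    header = None
    for text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            # Refuse to join inputs whose columns differ.
            raise ValueError("sources do not share the same columns")
        merged.extend(reader)
    return merged


# Two hypothetical intermediate outputs from earlier workflow steps.
step_a = "sample,value\ns1,10\ns2,20\n"
step_b = "sample,value\ns3,30\n"

joined = join_csv_sources([step_a, step_b])
print(len(joined), joined[-1]["sample"])
```

Rejecting mismatched headers is one possible design choice; a joiner could instead fill missing columns with empty values.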

  2. COVID-19 Global Case and Death Data

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    The Devastator (2023). COVID-19 Global Case and Death Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/covid-19-global-case-and-death-data
    Explore at:
    Available download formats: zip (81724234 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    The Devastator
    Description

    COVID-19 Global Case and Death Data

    Global COVID-19 Cases and Deaths Over Time

    By Coronavirus (COVID-19) Data Hub [source]

    About this dataset

    The COVID-19 Global Time Series Case and Death Data is a comprehensive collection of global COVID-19 case and death information recorded over time. This dataset includes data from various sources such as JHU CSSE COVID-19 Data and The New York Times.

    The dataset consists of several columns providing detailed information on different aspects of the COVID-19 situation. The COUNTRY_SHORT_NAME column represents the short name of the country where the data is recorded, while the Data_Source column indicates the source from which the data was obtained.

    Other important columns include Cases, which denotes the number of COVID-19 cases reported, and Difference, which indicates the difference in case numbers compared to the previous day. Additionally, there are columns such as CONTINENT_NAME, DATA_SOURCE_NAME, COUNTRY_ALPHA_3_CODE, COUNTRY_ALPHA_2_CODE that provide additional details about countries and continents.

    Furthermore, this dataset also includes information on deaths related to COVID-19. The column PEOPLE_DEATH_NEW_COUNT shows the number of new deaths reported on a specific date.

    To provide more context to the data, certain columns offer demographic details about locations. For instance, Population_Count provides population counts for different areas. Moreover, a FIPS code is available for provincial/state regions for identification purposes.

    It is important to note that this dataset covers both confirmed cases (Case_Type: confirmed) as well as probable cases (Case_Type: probable). These classifications help differentiate between various types of COVID-19 infections.

    Overall, this dataset offers a comprehensive picture of the global COVID-19 situation by providing accurate and up-to-date information on cases, deaths, demographic details (such as population count or FIPS code), source references (such as JHU CSSE or The New York Times), and geographical information (country names coded with ALPHA codes), making it useful for researchers studying patterns and trends associated with this pandemic.

    How to use the dataset

    • Understanding the Dataset Structure:

      • The dataset is available in two files: COVID-19 Activity.csv and COVID-19 Cases.csv.
      • Both files contain different columns that provide information about the COVID-19 cases and deaths.
      • Some important columns to look out for are:
        a. PEOPLE_POSITIVE_CASES_COUNT: The total number of confirmed positive COVID-19 cases.
        b. COUNTY_NAME: The name of the county where the data is recorded.
        c. PROVINCE_STATE_NAME: The name of the province or state where the data is recorded.
        d. REPORT_DATE: The date when the data was reported.
        e. CONTINENT_NAME: The name of the continent where the data is recorded.
        f. DATA_SOURCE_NAME: The name of the data source.
        g. PEOPLE_DEATH_NEW_COUNT: The number of new deaths reported on a specific date.
        h. COUNTRY_ALPHA_3_CODE: The three-letter alpha code that represents the country.
        i. Lat, Long: The latitude and longitude coordinates of the location.
        j. Country_Region or COUNTRY_SHORT_NAME: The country or region where cases were reported.
    • Choosing Relevant Columns: It's important to determine which columns are relevant to your analysis or research question before proceeding with further analysis.

    • Exploring Data Patterns: Use various statistical techniques like summarizing statistics, creating visualizations (e.g., bar charts, line graphs), etc., to explore patterns in different variables over time or across regions/countries.

    • Filtering Data: You can filter your dataset based on specific criteria using column(s) such as COUNTRY_SHORT_NAME, CONTINENT_NAME, or PROVINCE_STATE_NAME to focus on specific countries, continents, or regions of interest.

    • Combining Data: You can combine data from different sources (e.g., COVID-19 cases and deaths) to perform advanced analysis or create insightful visualizations.

    • Analyzing Trends: Use the dataset to analyze and identify trends in COVID-19 cases and deaths over time. You can examine factors such as population count, testing count, hospitalization count, etc., to gain deeper insights into the impact of the virus.

    • Comparing Countries/Regions: Compare COVID-19
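The filtering and aggregation steps in this list can be sketched with the standard library alone. The column names below come from the description; the sample values are invented for illustration, and whether all of these columns sit in the same CSV file is an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows: column names are from the dataset description,
# the values are made up.
SAMPLE = """COUNTRY_SHORT_NAME,REPORT_DATE,PEOPLE_POSITIVE_CASES_COUNT,PEOPLE_DEATH_NEW_COUNT
France,2020-04-01,100,5
France,2020-04-02,130,7
Brazil,2020-04-01,50,1
Brazil,2020-04-02,80,2
"""


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


def total_by_country(rows, column):
    """Sum an integer column per country."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["COUNTRY_SHORT_NAME"]] += int(row[column])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
france = filter_rows(rows, "COUNTRY_SHORT_NAME", "France")
new_deaths = total_by_country(rows, "PEOPLE_DEATH_NEW_COUNT")
print(len(france), new_deaths)
```

Summing is appropriate for PEOPLE_DEATH_NEW_COUNT because it is a per-day count; a cumulative column such as PEOPLE_POSITIVE_CASES_COUNT would need the latest value per country instead.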

    Research Ideas

    • Trend Analysis: This dataset can be used to analyze and track the trends of COVID-19 cases and deaths over time. It provides comprehensive global data, allowing researchers and po...
  3. Data sources used in our study.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 6, 2024
    Cite
    Crooks, Andrew; Mahabir, Ron; Gkountouna, Olga; Sasse, Kuleen; Croitoru, Arie (2024). Data sources used in our study. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001376412
    Explore at:
    Dataset updated
    Jun 6, 2024
    Authors
    Crooks, Andrew; Mahabir, Ron; Gkountouna, Olga; Sasse, Kuleen; Croitoru, Arie
    Description

    The COVID-19 pandemic prompted governments worldwide to implement a range of containment measures, including mass gathering restrictions, social distancing, and school closures. Despite these efforts, vaccines continue to be the safest and most effective means of combating such viruses. Yet, vaccine hesitancy persists, posing a significant public health concern, particularly with the emergence of new COVID-19 variants. To effectively address this issue, timely data is crucial for understanding the various factors contributing to vaccine hesitancy. While previous research has largely relied on traditional surveys for this information, recent sources of data, such as social media, have gained attention. However, the potential of social media data as a reliable proxy for information on population hesitancy, especially when compared with survey data, remains underexplored. This paper aims to bridge this gap. Our approach uses social, demographic, and economic data to predict vaccine hesitancy levels in the ten most populous US metropolitan areas. We employ machine learning algorithms to compare a set of baseline models that contain only these variables with models that incorporate survey data and social media data separately. Our results show that XGBoost algorithm consistently outperforms Random Forest and Linear Regression, with marginal differences between Random Forest and XGBoost. This was especially the case with models that incorporate survey or social media data, thus highlighting the promise of the latter data as a complementary information source. Results also reveal variations in influential variables across the five hesitancy classes, such as age, ethnicity, occupation, and political inclination. Further, the application of models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. 
    In summary, this study underscores social media data’s potential for understanding vaccine hesitancy, emphasizes the importance of tailoring interventions to specific communities, and suggests the value of combining different data sources.

  4. School Learning Modalities, 2021-2022

    • catalog.data.gov
    • datahub.hhs.gov
    • +5more
    Updated Mar 26, 2025
    + more versions
    Cite
    Centers for Disease Control and Prevention (2025). School Learning Modalities, 2021-2022 [Dataset]. https://catalog.data.gov/dataset/school-learning-modalities
    Explore at:
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Centers for Disease Control and Prevention
    Description

    The 2021-2022 School Learning Modalities dataset provides weekly estimates of school learning modality (including in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2021-2022 school year and the Fall 2022 semester, from August 2021 – December 2022.

    These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. These data should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Educational Statistics (NCES) for 2020-2021.

    School learning modality types are defined as follows:

      • In-Person: All schools within the district offer face-to-face instruction 5 days per week to all students at all available grade levels.
      • Remote: Schools within the district do not offer face-to-face instruction; all learning is conducted online/remotely to all students at all available grade levels.
      • Hybrid: Schools within the district offer a combination of in-person and remote learning; face-to-face instruction is offered less than 5 days per week, or only to a subset of students.

    Data Information

      • School learning modality data provided here are model estimates using combined input data and are not guaranteed to be 100% accurate. This learning modality dataset was generated by combining data from four different sources: Burbio [1], MCH Strategic Data [2], the AEI/Return to Learn Tracker [3], and state dashboards [4-20]. These data were combined using a Hidden Markov model which infers the sequence of learning modalities (In-Person, Hybrid, or Remote) for each district that is most likely to produce the modalities reported by these sources. This model was trained using data from the 2020-2021 school year. Metadata describing the location, number of schools and number of students in each district comes from NCES [21].
      • You can read more about the model in the CDC MMWR: COVID-19–Related School Closures and Learning Modality Changes — United States, August 1–September 17, 2021.
      • The metrics listed for each school learning modality reflect totals by district and the number of enrolled students per district for which data are available. School districts represented here exclude private schools and include the following NCES subtypes:
        • Public school district that is NOT a component of a supervisory union
        • Public school district that is a component of a supervisory union
        • Independent charter district
      • “BI” in the state column refers to school districts funded by the Bureau of Indian Education.

    Technical Notes

      • Data from August 1, 2021 to June 24, 2022 correspond to the 2021-2022 school year. During this time frame, data from the AEI/Return to Learn Tracker and most state dashboards were not available. Inferred modalities with a probability below 0.6 were deemed inconclusive and were omitted.
      • During the Fall 2022 semester, modalities for districts with a school closure reported by Burbio were updated to either “Remote”, if the closure spanned the entire week, or “Hybrid”, if the closure spanned 1-4 days of the week.
      • Data from August

  5. TRACE-A Merge Data

    • catalog.data.gov
    • gimi9.com
    • +3more
    Updated Sep 18, 2025
    + more versions
    Cite
    NASA/LARC/SD/ASDC (2025). TRACE-A Merge Data [Dataset]. https://catalog.data.gov/dataset/trace-a-merge-data-204e5
    Explore at:
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    TRACE-A_Merge_Data consists of merge data files created from data collected onboard the DC-8 aircraft during the Transport and Atmospheric Chemistry near the Equator - Atlantic (TRACE-A) suborbital campaign. Data collection for this product is complete. The TRACE-A mission was a part of NASA’s Global Tropospheric Experiment (GTE), an assemblage of missions conducted from 1983-2001 with various research goals and objectives. TRACE-A was conducted in the Atlantic from September 21 to October 24, 1992. TRACE-A had the objective of determining the cause and source of the high concentrations of ozone that accumulated over the Atlantic Ocean between southern Africa and South America from August to October. NASA partnered with the Brazilian Space Agency (INPE) to accomplish this goal. The NASA DC-8 aircraft and ozonesondes were utilized during TRACE-A to collect the necessary data. The DC-8 was equipped with 19 instruments, including the Differential Absorption Lidar (DIAL), the Laser-Induced Fluorescence, the O3-NO Ethylene/Forward Scattering Spectrometer, the Modified Licor, and the DACOM IR Laser Spectrometer. The DIAL was responsible for a variety of measurements, which include Nadir IR aerosols, Nadir UV aerosols, Zenith IR aerosols, Zenith VS aerosols, ozone, and ozone column. The Laser-Induced Fluorescence instrument collected measurements on NxOy in the atmosphere. Measurements of ozone were recorded by the O3-NO Ethylene/Forward Scattering Spectrometer while the Modified Licor recorded CO2. Finally, the DACOM IR Laser Spectrometer gathered an assortment of data points, including CO, O3, N2O, CH4, and CO2. Ozonesondes played a role in data collection for TRACE-A along with the DC-8 aircraft. The sondes were dropped from the DC-8 aircraft in order to gather data on ozone, temperature, and atmospheric pressure.

  6. Data from: Combining data sets with different phylogenetic histories

    • data.niaid.nih.gov
    • datadryad.org
    • +1more
    zip
    Updated Apr 8, 2008
    Cite
    John J. Wiens (2008). Combining data sets with different phylogenetic histories [Dataset]. http://doi.org/10.5061/dryad.123
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 8, 2008
    Authors
    John J. Wiens
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The possibility that two data sets may have different underlying phylogenetic histories (such as gene trees that deviate from species trees) has become an important argument against combining data in phylogenetic analysis. However, two data sets sampled for a large number of taxa may differ in only part of their histories. This is a realistic scenario and one in which the relative advantages of combined, separate, and consensus analysis become much less clear. I suggest a simple methodology for dealing with this situation that involves (1) partitioning the available data to maximize detection of different histories, (2) performing separate analyses of the data sets, and (3) combining the data but considering questionable or unresolved those parts of the combined tree that are strongly contested in the separate analyses (and which therefore may have different histories), until a majority of unlinked data sets supports one resolution over another. In support of this methodology, computer simulations suggest that (1) the accuracy of combined analysis at recovering the true species phylogeny may exceed that of either of two separately analyzed data sets under some conditions, particularly when the mismatch between phylogenetic histories is small and the estimates of the underlying histories are imperfect (few characters and/or high homoplasy), and (2) combined analysis provides a poor estimate of the species tree in areas of the phylogenies with different histories but an improved estimate in regions that share the same history. Thus, when there is a localized mismatch between the histories of two data sets, separate, consensus, and combined analysis may all give unsatisfactory results in certain parts of the phylogeny. Similarly, approaches that allow data combination only after a global test of heterogeneity will suffer from the potential failings of either separate or combined analysis, depending on the outcome of the test. 
    Excision of conflicting taxa is also problematic in that it may obfuscate the position of conflicting taxa within a larger tree, even when their placement is congruent between data sets. Application of the proposed methodology to molecular and morphological data sets for Sceloporus lizards is discussed.

  7. Associations between health impact and other variables in RCEW data, CSEW...

    • plos.figshare.com
    xls
    Updated Jan 14, 2025
    Cite
    Estela Capelas Barbosa; Niels Blom; Annie Bunce (2025). Associations between health impact and other variables in RCEW data, CSEW data, and the imputed synthetic dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0301155.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 14, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Estela Capelas Barbosa; Niels Blom; Annie Bunce
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Associations between health impact and other variables in RCEW data, CSEW data, and the imputed synthetic dataset.

  8. Patents and Publications from Federal Grants and

    • kaggle.com
    zip
    Updated Jan 29, 2023
    Cite
    The Devastator (2023). Patents and Publications from Federal Grants and [Dataset]. https://www.kaggle.com/datasets/thedevastator/patents-and-publications-from-federal-grants-and
    Explore at:
    Available download formats: zip (7783025 bytes)
    Dataset updated
    Jan 29, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Patents and Publications from Federal Grants and Contracts

    Detailed Contract-Level Data for Empirical Research in Innovation and Science

    By [source]

    About this dataset

    This dataset provides comprehensive insights into the relationship between public procurement contracts, research grants, patents, and scientific publications. By combining data from multiple sources (e.g., USPTO, Federal Procurement Database System, Award Submission Portal) we have compiled a detailed contract-level dataset for empirical research in innovation and science.

    The dataset contains columns related to both patents and contracts. Patent-specific information includes the filing date, priority rating, and claims made on the invention; contractual details include the awarding agency name, the recipient organization’s name, and the award size. It also features detailed grant information such as CFDA program numbers, program titles, and project descriptions, along with vendor data such as contact name and organizational type linked to these grants or contracts. Finally, the compilation is enriched with scientific publication data, such as year of publication and WOS ID, related to patented inventions.

    Get your hands on this ultimate combination of facts on federal government grants & procurement contracts coupled with patent records & scientific publications!

    How to use the dataset

    This dataset is an invaluable asset for empirical researchers in the field of innovation and science. It can be used to explore trends related to federally funded research and development, as well as investigate the impact of public funding on invention activity among different types of organizations. To leverage the full potential of this dataset, here are some tips on how to use it:

    • Examine patent metadata such as filing date and priority claims for a better understanding of the patented inventions associated with federal grants and contracts.
    • Explore publication information connected to patents, such as works cited and WOS ID, to gain insights into both patents and publications that may have been affected by federal funding sources over time.
    • Analyze grant-level data such as CFDA program numbers or project descriptions to evaluate the effectiveness and impact of different programs over time, or to compare grant outcomes across industries or fields of study.
    • Look into contract-level information on awards given by federal agencies, including award size or recipient organization, to learn how government funds are being used within different sectors or regions of the US economy.
    • Lastly, analyze vendor data such as contact names and organization type to determine whether certain vendors have been more successful than others at obtaining government contracts over time.

    Research Ideas

    • Identifying innovative companies, organizations or research programs in order to develop strategic partnerships.
    • Comparing the quality and quantity of innovation for different federal grants.
    • Analyzing patent and publication trends over time to uncover shifts in the technological landscape, gauge policy effectiveness, and inform future government procurement strategies or research initiatives.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: 01_patent_contract.csv

    | Column name         | Description                                                            |
    |:--------------------|:-----------------------------------------------------------------------|
    | Project Description | A description of the project associated with the grant. (String)       |
    | Filing Date         | The date the patent was filed. (Date)                                  |
    | Vendor Name         | The name of the vendor associated with the grant or contract. (String) |

    File: 09_paper_information.csv | Column name | Description ...

  9. School Learning Modalities, 2020-2021

    • data.virginia.gov
    • datahub.hhs.gov
    • +3more
    csv, json, rdf, xsl
    Updated Jun 28, 2024
    + more versions
    Cite
    Centers for Disease Control and Prevention (2024). School Learning Modalities, 2020-2021 [Dataset]. https://data.virginia.gov/dataset/school-learning-modalities-2020-2021
    Explore at:
    Available download formats: rdf, csv, xsl, json
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Centers for Disease Control and Prevention
    Description

    The 2020-2021 School Learning Modalities dataset provides weekly estimates of school learning modality (including in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2020-2021 school year, from August 2020 – June 2021.

    These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. These data should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Educational Statistics (NCES) (https://nces.ed.gov/ccd/files.asp#Fiscal:2,LevelId:5,SchoolYearId:35,Page:1) for 2020-2021.

    School learning modality types are defined as follows:

      • In-Person: All schools within the district offer face-to-face instruction 5 days per week to all students at all available grade levels.
      • Remote: Schools within the district do not offer face-to-face instruction; all learning is conducted online/remotely to all students at all available grade levels.
      • Hybrid: Schools within the district offer a combination of in-person and remote learning; face-to-face instruction is offered less than 5 days per week, or only to a subset of students.

    Data Information

      • School learning modality data provided here are model estimates using combined input data and are not guaranteed to be 100% accurate. This learning modality dataset was generated by combining data from four different sources: Burbio [1], MCH Strategic Data [2], the AEI/Return to Learn Tracker [3], and state dashboards [4-20]. These data were combined using a Hidden Markov model which infers the sequence of learning modalities (In-Person, Hybrid, or Remote) for each district that is most likely to produce the modalities reported by these sources. This model was trained using data from the 2020-2021 school year. Metadata describing the location, number of schools and number of students in each district comes from NCES [21].
      • You can read more about the model in the CDC MMWR: COVID-19–Related School Closures and Learning Modality Changes — United States, August 1–September 17, 2021 (https://www.cdc.gov/mmwr/volumes/70/wr/mm7039e2.htm).
      • The metrics listed for each school learning modality reflect totals by district and the number of enrolled students per district for which data are available. School districts represented here exclude private schools and include the following NCES subtypes:
        • Public school district that is NOT a component of a supervisory union
        • Public school district that is a component of a supervisory union
        • Independent charter district
      • “BI” in the state column refers to school districts funded by the Bureau of Indian Education.
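The "most likely sequence" inference mentioned above is commonly computed with the Viterbi algorithm. The sketch below is a simplified stand-in, not the CDC model: it uses a single reported modality per week rather than four sources, and every probability in `START`, `TRANSITION`, and `EMISSION` is invented for illustration.

```python
STATES = ("In-Person", "Hybrid", "Remote")

# All probabilities below are hypothetical, chosen only for the example.
START = {"In-Person": 0.6, "Hybrid": 0.3, "Remote": 0.1}
TRANSITION = {  # P(next week's true modality | this week's true modality)
    "In-Person": {"In-Person": 0.8, "Hybrid": 0.15, "Remote": 0.05},
    "Hybrid": {"In-Person": 0.2, "Hybrid": 0.7, "Remote": 0.1},
    "Remote": {"In-Person": 0.1, "Hybrid": 0.2, "Remote": 0.7},
}
EMISSION = {  # P(reported modality | true modality)
    "In-Person": {"In-Person": 0.8, "Hybrid": 0.15, "Remote": 0.05},
    "Hybrid": {"In-Person": 0.15, "Hybrid": 0.7, "Remote": 0.15},
    "Remote": {"In-Person": 0.05, "Hybrid": 0.15, "Remote": 0.8},
}


def viterbi(reports):
    """Return the most likely sequence of true modalities for weekly reports."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (START[s] * EMISSION[s][reports[0]], [s]) for s in STATES}
    for report in reports[1:]:
        step = {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path probability.
            prob, path = max(
                ((best[p][0] * TRANSITION[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            step[s] = (prob * EMISSION[s][report], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


weekly_reports = ["In-Person", "In-Person", "Remote", "Remote", "Remote"]
print(viterbi(weekly_reports))
```

With these made-up parameters, a sustained run of "Remote" reports moves the inferred state, while a single conflicting week between two "In-Person" reports is smoothed over.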

    Technical Notes

      • Data from September 1, 2020 to June 25, 2021 correspond to the 2020-2021 school year. During this timeframe, all four sources of data were available. Inferred modalities with a probability below 0.75 were deemed inconclusive and were omitted.
      • Data for the month of July may show “In Person” status although most school districts are effectively closed during this time for summer break. Users may wish to exclude July data from use for this reason where applicable.

    Sources

      1. K-12 School Opening Tracker. Burbio 2021; https

  10. Merger of BNV-D data (2008 to 2019) and enrichment

    • data.europa.eu
    zip
    Updated Jan 16, 2025
    Cite
    Patrick VINCOURT (2025). Merger of BNV-D data (2008 to 2019) and enrichment [Dataset]. https://data.europa.eu/data/datasets/5f1c3eca9d149439e50c740f?locale=en
    Explore at:
    Available download formats: zip (18530465)
    Dataset updated
    Jan 16, 2025
    Dataset authored and provided by
    Patrick VINCOURT
    Description

    Merging (in Table R) data published on https://www.data.gouv.fr/fr/datasets/ventes-de-pesticides-par-departement/, and joining two other sources of information associated with MAs: — uses: https://www.data.gouv.fr/fr/datasets/usages-des-produits-phytosanitaires/ — information on the “Biocontrol” status of the product, from document DGAL/SDQSPV/2020-784 published on 18/12/2020 at https://agriculture.gouv.fr/quest-ce-que-le-biocontrole

    All the initial files (.csv converted to .txt), the R code used to merge the data, and the various output files are collected in a zip archive.

    NB:
    1) “YASCUB” stands for {year,AMM,Substance_active,Classification,Usage,Statut_“BioConttrol”}; substances not on the DGAL/SDQSPV list are coded NA.
    2) The file of biocontrol products must be cleaned of the duplicates generated by marketing authorisations that lead to several trade names.
    3) The BNVD_BioC_DY3 table and the output file BNVD_BioC_DY3.txt contain the fields {Code_Region,Region,Dept,Code_Dept,Anne,Usage,Classification,Type_BioC,Quantite_substance}
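The deduplication and NA-coding described in the notes can be sketched as follows. The dataset itself uses R; this Python version is only an illustration, and it assumes that the marketing-authorisation number (AMM) is the join key. The records, the `trade_name` field, and the `Type_BioC` values are hypothetical.

```python
# Hypothetical records; AMM is assumed to be the marketing-authorisation number.
sales = [
    {"year": 2018, "AMM": "2000001", "Quantite_substance": 12.5},
    {"year": 2019, "AMM": "2000002", "Quantite_substance": 3.0},
]
biocontrol = [
    {"AMM": "2000001", "trade_name": "Product A", "Type_BioC": "micro-organism"},
    {"AMM": "2000001", "trade_name": "Product A bis", "Type_BioC": "micro-organism"},
]


def dedupe_by_amm(records):
    """Keep the first record for each AMM, dropping trade-name duplicates."""
    unique = {}
    for record in records:
        unique.setdefault(record["AMM"], record)
    return unique


def left_join(sales_rows, status_by_amm):
    """Attach Type_BioC to each sale; None plays the role of NA."""
    joined = []
    for row in sales_rows:
        status = status_by_amm.get(row["AMM"])
        joined.append({**row, "Type_BioC": status["Type_BioC"] if status else None})
    return joined


merged = left_join(sales, dedupe_by_amm(biocontrol))
print(merged)
```

Deduplicating before the join keeps each sale as a single row; joining first would repeat a sale once per trade name.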

  11. Supplementary material for "Spatio-temporal modelling of abundance from...

    • data.niaid.nih.gov
    Updated May 2, 2022
    Cite
    Strebel, Nicolas; Kéry, Marc; Guélat, Jérôme; Sattler, Thomas (2022). Supplementary material for "Spatio-temporal modelling of abundance from multiple data sources in an integrated spatial distribution model" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5840376
    Explore at:
    Dataset updated
    May 2, 2022
    Dataset provided by
    Swiss Ornithological Institute, Sempach, Switzerland
    Authors
    Strebel, Nicolas; Kéry, Marc; Guélat, Jérôme; Sattler, Thomas
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Aim: In biodiversity monitoring, observational data are often collected in multiple, disparate schemes with greatly varying degrees of standardization and possibly at different spatial and temporal scales. Technical advances also change the type of data over time. The resulting heterogeneous data sets are often deemed to be incompatible. Consequently, many available data sets may be ignored in practical analyses. Here, we propose a more efficient use of disparate biodiversity data to assess species distributions and population trends.

    Location: Switzerland (Europe)

    Taxon: Birds

    Methods: We developed an integrated, hierarchical species distribution model with a joint likelihood for all data sets using a shared state process (e.g., latent species abundance or occurrence), but distinct observation process for each data set. We show how the abundance submodel of a binomial N-mixture model can fuse four different data types (count, detection/non-detection, presence-only, and absence-only data) and enable improved inferences about spatio-temporal patterns in abundance. As case studies, we use data from multiple avian biodiversity monitoring schemes. In the first, the goal is estimating abundance-based species distribution maps. In the second, we infer trends in population abundance across time.

    Results: Accuracy and precision of abundance estimates increased when combining data from different sources compared to using a single data source alone. This is particularly valuable when data from each single data source are too sparse for reliable parameter estimation.

    Main conclusions: We show that exploiting the complementary nature of "cheap", but abundant, citizen-science data and less abundant, but more information-rich, data from structured monitoring programs might be ideal to estimate distribution and population trends more accurately, especially for rare species. Joint likelihoods make it possible to include a wide variety of different data sets in order to (1) combine all the available information and (2) mitigate the weaknesses of one by the strengths of another.

  12. merge-test-again

    • huggingface.co
    Updated Aug 31, 2025
    + more versions
    Cite
    Jack Vial (2025). merge-test-again [Dataset]. https://huggingface.co/datasets/jackvial/merge-test-again
    Explore at:
    Dataset updated
    Aug 31, 2025
    Authors
    Jack Vial
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Merged LeRobot Dataset

    This dataset was created by merging multiple LeRobot datasets using the LeRobot Data Studio merge tool.

    Source Datasets

    This merged dataset combines the following 2 datasets:

    • jackvial/koch_screwdriver_attach_orange_panel_28_e5
    • jackvial/koch_screwdriver_attach_orange_panel_29_e5

    Merge Details

    • Merge Date: Generated automatically
    • Source Count: 2 datasets
    • Episode Renumbering: Episodes are renumbered sequentially starting from 0…

    See the full description on the dataset page: https://huggingface.co/datasets/jackvial/merge-test-again.

  13. P-values for comparison of relative risks for study outcomes assessed using...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Aug 7, 2024
    Cite
    Harding, Jane E.; Milne, Barry; Shahbaz, Mohammad; von Randow, Martin; Gamble, Greg D.; Walters, Anthony (2024). P-values for comparison of relative risks for study outcomes assessed using different data sources alone or in combination. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001424576
    Explore at:
    Dataset updated
    Aug 7, 2024
    Authors
    Harding, Jane E.; Milne, Barry; Shahbaz, Mohammad; von Randow, Martin; Gamble, Greg D.; Walters, Anthony
    Description

    P-values for comparison of relative risks for study outcomes assessed using different data sources alone or in combination.

  14. Kensho College Equity Data

    • kaggle.com
    zip
    Updated Jul 22, 2020
    Cite
    Kensho Impact (2020). Kensho College Equity Data [Dataset]. https://www.kaggle.com/kenshoimpactteam/kensho-college-equity-data
    Explore at:
    Available download formats: zip (832804 bytes)
    Dataset updated
    Jul 22, 2020
    Authors
    Kensho Impact
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The Kensho College Equity Dataset is a simple, flat data set of colleges that combines the best information from several available source data sets.

  15. Small Magellanic Cloud ATCA 8640-MHz Source Catalog - Dataset - NASA Open...

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). Small Magellanic Cloud ATCA 8640-MHz Source Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/small-magellanic-cloud-atca-8640-mhz-source-catalog
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This table contains a new catalog of radio-continuum sources in the field of the Small Magellanic Cloud (SMC). This catalog contains sources found at 8640 MHz (lambda = 3 cm) by combining data from various Australia Telescope Compact Array (ATCA) projects that covered the SMC. Some 457 sources have been detected at 3 cm in the new high-sensitivity and resolution radio-continuum image of the SMC from Crawford et al. (2011, SerAJ, 183, 95). The 3 cm map has a resolution of 20 arcseconds, and a sensitivity of 0.8 mJy/beam. The field size of the image used in this study covered from 00h 26m to 01h 27m in RA (J2000.0) and from -70° 35' to -75° 21' in Dec (J2000.0). The MIRIAD task 'imsad' was used to detect sources in the 3 cm image, requiring a fitted Gaussian flux density > 5 sigma (3.5 mJy). All sources were then visually examined to confirm that they are genuine point sources, excluding extended emission, bright side lobes, etc. This table was created by the HEASARC in September 2014 based on CDS Catalog J_other/Ser/184.93/ file tablea1.dat. This is a service provided by NASA HEASARC.

  16. Data supporting: Methodological overview and data-merging approaches in the...

    • search.dataone.org
    • datadryad.org
    Updated Jul 20, 2025
    Cite
    Elena Quintero; Jorge Isla; Pedro Jordano (2025). Data supporting: Methodological overview and data-merging approaches in the study of plant-frugivore interactions [Dataset]. http://doi.org/10.5061/dryad.jm63xsjb8
    Explore at:
    Dataset updated
    Jul 20, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Elena Quintero; Jorge Isla; Pedro Jordano
    Time period covered
    Jan 1, 2021
    Description

    Recording species interactions is one of the main challenges in ecological studies. Frugivory has received much attention for decades as a model for mutualisms among free-living species, and a variety of methods have been designed and developed for sampling and monitoring plant–frugivore interactions. The diversity of techniques poses an important challenge when comparing, combining or replicating results from different sources with different methodologies. With the emergence of modern techniques, such as molecular analysis or multimedia remote recorders, issues when combining data from different sources have become especially relevant. We provide an overview of all the techniques used for monitoring endozoochorous primary seed dispersal, focusing on a critical appraisal of the advantages and limitations, as well as the context-dependency nature, of the different methods. We propose five data merging approaches potentially useful to combine frugivory interactions data from different met...

    We used two empirical datasets to illustrate data merging approaches, with two different organization levels. Both case studies are focused on plant–frugivore interactions taking place in the Mediterranean shrubland of Doñana National Park, Huelva, Spain. In each case study two sampling methods were used to maximise animal–plant interactions detected. The first case is an individual-based study on the avian frugivore assemblage of Pistacia lentiscus (Anacardiaceae) in El Puntal area, where monitoring cameras and DNA-barcoding were used to record interactions. Cameras methodology involved placing continuous-monitoring cameras (GoPro Hero® 7 model) facing individual plants. Forty individual plants were filmed for approximately 2 hours in several runs in different days (total of 84.5h). Any avian visitation was recorded as an interaction, yielding a total of 397 visitation records. Cameras were operative from sunrise for 2h and recording was set at maximum resolution. Data resulting from t...

    • pl_bc.csv - Observation matrix obtained with DNA-barcoding method in El Puntal case study with Pistacia lentiscus individual plants.
    • pl_cam.csv - Observation matrix obtained with monitoring cameras method in El Puntal case study with Pistacia lentiscus individual plants.
    • bc_sampling_effort.csv - Sampling effort for DNA-barcoding method in El Puntal case study at Pistacia lentiscus individual plants.
    • cam_sampling_effort.csv - Sampling effort for monitoring cameras method in El Puntal case study at Pistacia lentiscus individual plants.
    • hr_mn.csv - Observation matrix obtained with mist-netting method in Hato Ratón case study
    • hr_obs.csv - Observation matrix obtained with focal observations method in Hato Ratón case study

  17. Daily Global Trends - Insights on Popularity

    • kaggle.com
    zip
    Updated Jan 16, 2023
    Cite
    The Devastator (2023). Daily Global Trends - Insights on Popularity [Dataset]. https://www.kaggle.com/datasets/thedevastator/daily-global-trends-2020-insights-on-popularity
    Explore at:
    Available download formats: zip (28034217 bytes)
    Dataset updated
    Jan 16, 2023
    Authors
    The Devastator
    Description

    Daily Global Trends - Insights on Popularity

    Analyzing Crowd Behaviour and Buzz Worldwide

    By Jeffrey Mvutu Mabilama [source]

    About this dataset

    This dataset provides a comprehensive look into 2020’s top trends worldwide, with information on the hottest topics and conversations happening all around the globe. With details such as trending type, country origin, dates of interest, URLs to find further information, keywords related to the trend and more - it's an invaluable insight into what's driving society today.

    You can use this data in conjunction with other sources to get ideas for businesses or products tailored to popular desires or opinions. If you are interested in international business perspectives then this is also your go-to source; you can adjust how best to interact with people from certain countries upon learning what they hold important in terms of search engine activity.

    It also gives insight into how buzz forms: by monitoring trends across many countries and over different periods of time, you can analyse whether an event's effect is long-lasting or short-lived, and how much impact it made as measured by the ‘traffic’ column (the number of searches for an individual topic) over the period in which it held higher positions and affected opinion polls. In addition, marketing and advertising professionals can anticipate which content is likely to be best received by audiences, based on previous trends and on the related images and snippets provided with each trend or topic, as well as the URL links tracking users who have shown interest. This leaves them better prepared when rolling out campaigns targeted at specific regions or areas, taking cultural perspective into consideration rather than just raw numbers.

    Last but not least, it serves as good starting material when getting acquainted with foreigners online (at least we know which conversation starters won't be awkward!) before deepening our empathetic understanding of terms used largely within particular cultures, such as TV program titles. So the question is: what will be the next big thing? See for yourself.

    How to use the dataset

    How to use this dataset for Insights on Popularity?

    This Daily Global Trends 2020 dataset provides valuable information about trends around the world, including insights on their popularity. It can be used to identify popular topics and find ways to capitalize on them through marketing, business ideas and more. Below are some tips for how to use this data in order to gain insight into global trends and the level of popularity they have.

    • For Business Ideas: Use the URL information provided in order to research each individual trend, analyzing both when it gained traction as well as when its popularity faded away (if at all). This will give insight into transforming a brief trend into a long-lived one or making use of an existing but brief surge in interest – think new apps related to a trending topic! Combining the geographic region listed with these timeframes gives even more granular insight that could be used for product localization or regional target marketing.

    • To study Crowd Behaviour & Dynamics: Explore both country-wise and globally trending topics by looking at which countries exhibit similar interest levels for those topics. Go further by understanding what drives people’s interest in particular subjects in different countries; here web scraping techniques can be applied to the URLs provided, together with basic text analysis techniques such as word clouds. This allows researchers and marketers to get better feedback from customers in multiple regions, enabling smarter decisions based upon real behaviour rather than assumptions.

    • For Building Better Products & Selling Techniques: Combine Category (Business, Social, etc.), Country and Related keywords with the traffic figures to obtain granular information about what excites people across cultures. For example, ‘Food’ is popular everywhere, but certain variations may not sell in a given location unless they cater to local taste buds. Selling frozen food that requires little preparation via supermarket chains, while showing parallels between nutritional requirements and the expenses incurred while shopping, could drive an effective sales strategy using this data set. Combining date information further helps make predictions based upon buyers' behaviour over the seasons; for example, buying seedless watermelons during the winter season would be futile.

    • For Social & Small Talk opportunities - Incorporating recently descr...

  18. Data and code for "How do households respond to job loss? Lessons from...

    • openicpsr.org
    Updated May 11, 2022
    Cite
    Asger Lau Andersen; Amalie Sofie Jensen; Niels Johannesen; Claus Thustrup Kreiner; Søren Leth-Petersen; Adam Sheridan (2022). Data and code for "How do households respond to job loss? Lessons from multiple high-frequency data sets" [Dataset]. http://doi.org/10.3886/E170201V1
    Explore at:
    Dataset updated
    May 11, 2022
    Dataset provided by
    American Economic Association
    Authors
    Asger Lau Andersen; Amalie Sofie Jensen; Niels Johannesen; Claus Thustrup Kreiner; Søren Leth-Petersen; Adam Sheridan
    Time period covered
    2009 - 2016
    Area covered
    Denmark
    Description

    How much and through which channels do households self-insure against job loss? Combining data from a large bank and from government sources, we quantify a broad range of responses to job loss in a unified empirical framework. Cumulated over a two-year period, households reduce spending by 30% of their income loss. They mainly self-insure through adjustments of liquid balances, which account for 50% of the income loss. Other channels – spousal labor supply, private transfers, home equity extraction, mortgage refinancing, and consumer credit – contribute less to self-insurance. Both overall self-insurance and the channels vary with household characteristics in intuitive ways.

  19. Replication Data for: Leveraging Large Language Models for Fuzzy String...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Mar 29, 2024
    Cite
    Yu Wang (2024). Replication Data for: Leveraging Large Language Models for Fuzzy String Matching in Political Science [Dataset]. http://doi.org/10.7910/DVN/A8MKLO
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 29, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Yu Wang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Fuzzy string matching remains a key issue when political scientists combine data from different sources. Existing matching methods invariably rely on string distances, such as Levenshtein distance and cosine similarity. As such, they are inherently incapable of matching strings that refer to the same entity with different names such as "JP Morgan" and "Chase Bank", "DPRK" and "North Korea", "Chuck Fleischmann (R)" and "Charles Fleischmann (R)". In this letter, we propose to use large language models to entirely sidestep this problem in an easy and intuitive manner. Extensive experiments show that our proposed methods can improve the state of the art by as much as 39% in terms of average precision while being substantially easier and more intuitive to use by political scientists. Moreover, our results are robust against various temperatures. We further note that enhanced prompting can lead to additional performance improvements.
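
    To illustrate why purely character-based distances can struggle with the name pairs quoted in the description above, here is a minimal sketch of a Levenshtein-based similarity score. It is not taken from the replication data: the function names `levenshtein` and `similarity`, and the normalisation by the longer string's length, are assumptions made for this illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalised score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Name pairs quoted in the dataset description.
pairs = [
    ("JP Morgan", "Chase Bank"),
    ("DPRK", "North Korea"),
    ("Chuck Fleischmann (R)", "Charles Fleischmann (R)"),
]
for left, right in pairs:
    print(f"{left!r} vs {right!r}: similarity = {similarity(left, right):.2f}")
```

    Pairs such as "DPRK" and "North Korea" share almost no characters, so any score of this kind stays low even though both names refer to the same entity.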

  20. Data sources which record educational outcomes.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 31, 2023
    Cite
    Joanne Given; Rebecca L. Bromley; Florence Coste; Sandra Lopez-Leon; Maria Loane (2023). Data sources which record educational outcomes. [Dataset]. http://doi.org/10.1371/journal.pone.0275979.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Joanne Given; Rebecca L. Bromley; Florence Coste; Sandra Lopez-Leon; Maria Loane
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data sources which record educational outcomes.
