50 datasets found
  1. An aggregated dataset of day 3 post-inoculation viral titer measurements from influenza A virus-infected ferret tissues

    • catalog.data.gov
    • data.virginia.gov
    • +1more
    Updated Jul 4, 2025
    Cite
    Centers for Disease Control and Prevention (2025). An aggregated dataset of day 3 post-inoculation viral titer measurements from influenza A virus-infected ferret tissues [Dataset]. https://catalog.data.gov/dataset/an-aggregated-dataset-of-day-3-post-inoculation-viral-titer-measurements-from-influenza-a-
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    Data from influenza A virus (IAV) infected ferrets (Mustela putorius furo) provides invaluable information towards the study of novel and emerging viruses that pose a threat to human health. This gold standard animal model can recapitulate many clinical signs of infection present in IAV-infected humans, supports virus replication of human and zoonotic strains without prior adaptation, and permits evaluation of virus transmissibility by multiple modes. While ferrets have been employed in risk assessment settings for >20 years, results from this work are typically reported in discrete stand-alone publications, making aggregation of raw data from this work over time nearly impossible. Here, we describe a dataset of 333 ferrets inoculated with 107 unique IAV, conducted by a single research group (NCIRD/ID/IPB/Pathogenesis Laboratory Team) under a uniform experimental protocol. This collection of ferret tissue viral titer data on a per-individual ferret level represents a companion dataset to ‘An aggregated dataset of serially collected influenza A virus morbidity and titer measurements from virus-infected ferrets’. However, care must be taken when combining datasets at the level of individual animals (see PMID 40245007 for guidance on best practices for comparing datasets comprised of serially-collected and fixed-timepoint in vivo-generated data).

    Publications using and describing these data:
    • Kieran TJ, Sun X, Tumpey TM, Maines TR, Belser JA. 202X. Spatial variation of infectious virus load in aggregated day 3 post-inoculation respiratory tract tissues from influenza A virus-infected ferrets. Under peer review.
    • Kieran TJ, Sun X, Maines TR, Belser JA. 2025. Predictive models of influenza A virus lethal disease: insights from ferret respiratory tract and brain tissues. Scientific Reports, in press.
    • Bullock TA, Pappas C, Uyeki TM, Brock N, Kieran TJ, Olsen SJ, Davis CD, Tumpey TM, Maines TR, Belser JA. 2025. The (digestive) path less traveled: influenza A virus and the gastrointestinal tract. mBio, in press.
    • Kieran TJ, Sun X, Maines TR, Beauchemin CAA, Belser JA. 2024. Exploring associations between viral titer measurements and disease outcomes in ferrets inoculated with 125 contemporary influenza A viruses. J Virol 98: e01661-23. https://doi.org/10.1038/s41597-024-03256-6

    Related dataset:
    • Kieran TJ, Sun X, Creager HM, Tumpey TM, Maines TR, Belser JA. 2025. An aggregated dataset of serial morbidity and titer measurements from influenza A virus-infected ferrets. Sci Data, 11(1):510. https://doi.org/10.1038/s41597-024-03256-6 https://data.cdc.gov/National-Center-for-Immunization-and-Respiratory-D/An-aggregated-dataset-of-serially-collected-influe/cr56-k9wj/about_data

    Other relevant publications for best practices on data handling and interpretation:
    • Kieran TJ, Maines TR, Belser JA. 2025. Eleven quick tips to unlock the power of in vivo data science. PLoS Comput Biol, 21(4):e1012947. https://doi.org/10.1371/journal.pcbi.1012947
    • Kieran TJ, Maines TR, Belser JA. 2025. Data alchemy, from lab to insight: Transforming in vivo experiments into data science gold. PLoS Pathog, 20(8):e1012460. https://doi.org/10.1371/journal.ppat.1012460

  2. Conversations on Coding, Debugging, Storytelling

    • kaggle.com
    zip
    Updated Dec 1, 2023
    Cite
    The Devastator (2023). Conversations on Coding, Debugging, Storytelling [Dataset]. https://www.kaggle.com/datasets/thedevastator/conversations-on-coding-debugging-storytelling-s
    Explore at:
    zip (1371478 bytes)
    Dataset updated
    Dec 1, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Conversations on Coding, Debugging, Storytelling & Science

    By Peevski (From Huggingface) [source]

    About this dataset

    The OpenLeecher/GPT4-10k dataset is a collection of 100 diverse conversations, presented in text format, covering a wide range of topics. The conversations span domains such as coding, debugging, storytelling, and science, and are intended to support training and analysis by researchers and developers alike.

    Each conversation addresses a different subject, from coding techniques, debugging strategies, and storytelling methods to concepts such as spatial and logical thinking. The conversations also touch on scientific fields including chemistry, physics, and biology, and, for further breadth, include discussions on the topic of law.

    By packaging this assortment of conversations from multiple domains and disciplines into a single train.csv file on the Kaggle platform, the dataset lets users explore and analyse the dialogue examples with little effort. It is a useful resource for studying coding practices alongside scientific discussions spanning multiple fields.

    How to use the dataset

    Introduction:

    • Understanding the Dataset Structure: The dataset consists of a CSV file named 'train.csv'. When examining the file's columns using software or programming language of your choice (e.g., Python), you will notice two key columns: 'chat' and '**chat'. Both these columns contain text data representing conversations between two or more participants.

    • Exploring Different Topics: The dataset covers a vast spectrum of subjects including coding techniques, debugging strategies, storytelling methods, spatial thinking, logical thinking, chemistry, physics, biology, and law. Each conversation falls under one or more of the following areas:

      • Coding Techniques: Discover discussions on various programming concepts and best practices.
      • Debugging Strategies: Explore conversations related to identifying and fixing software issues.
      • Storytelling Methods: Dive into dialogues about effective storytelling techniques in different contexts.
      • Spatial Thinking: Engage with conversations that involve developing spatial reasoning skills for problem-solving.
      • Logical Thinking: Learn from discussions focused on enhancing logical reasoning abilities related to different domains.
      • Chemistry
      • Physics
      • Biology
      • Law
    • Analyzing Conversations: leverage natural language processing (NLP) tools or techniques, such as sentiment analysis, to study the dialogues (a minimal loading sketch appears at the end of this section).

    • Accessible Code Examples

    Maximize Training Efficiency:

    • Taking Advantage of Diversity:

    • Creating New Applications:

    Conclusion:
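    The column layout and file name above suggest a very small loading script. The sketch below assumes only what the description states (a train.csv file with a 'chat' text column); everything else, including the length statistics, is illustrative.

    ```python
    # Minimal sketch: load the conversation data and report simple statistics.
    # Assumes the Kaggle file is named train.csv and contains a 'chat' text column,
    # as stated in the description above; adjust the path/column name if they differ.
    import pandas as pd

    df = pd.read_csv("train.csv")

    print("Number of conversations:", len(df))
    print("Columns:", list(df.columns))

    # Basic length statistics for the conversation text.
    lengths = df["chat"].astype(str).str.len()
    print("Mean characters per conversation:", lengths.mean())
    print("Longest conversation (characters):", lengths.max())
    ```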

    Research Ideas

    • Natural Language Processing Research: Researchers can leverage this dataset to train and evaluate natural language processing models, particularly in the context of conversational understanding and generation. The diverse conversations on coding, debugging, storytelling, and science can provide valuable insights into modeling human-like conversation patterns.
    • Chatbot Development: The dataset can be utilized for training chatbots or virtual assistants that can engage in conversations related to coding, debugging, storytelling, and science. By exposing the chatbot to a wide range of conversation samples from different domains, developers can ensure that their chatbots are capable of providing relevant and accurate responses.
    • Domain-specific Intelligent Assistants: Organizations or individuals working in fields such as coding education or scientific research may use this dataset to develop intelligent assistants tailored specifically for these domains. These assistants can help users navigate complex topics by answering questions related to coding techniques, debugging strategies, storytelling methods, or scientific concepts. Overall, 'train.csv' provides a rich resource for researchers and developers interested in building conversational AI systems with knowledge across multiple domains, including legal matters.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.


  3. Levels of obesity and inactivity related illnesses (physical illnesses): Summary (England)

    • hub.arcgis.com
    • data.catchmentbasedapproach.org
    Updated Apr 7, 2021
    Cite
    The Rivers Trust (2021). Levels of obesity and inactivity related illnesses (physical illnesses): Summary (England) [Dataset]. https://hub.arcgis.com/datasets/76bef8a953c44f36b569c37d7bdec45e
    Dataset updated
    Apr 7, 2021
    Dataset authored and provided by
    The Rivers Trust
    Area covered
    Description

    SUMMARY
    This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of physical illnesses that are linked with obesity and inactivity. Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

    ANALYSIS METHODOLOGY
    The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to:
    - Asthma (in persons of all ages)
    - Cancer (in persons of all ages)
    - Chronic kidney disease (in adults aged 18+)
    - Coronary heart disease (in persons of all ages)
    - Diabetes mellitus (in persons aged 17+)
    - Hypertension (in persons of all ages)
    - Stroke and transient ischaemic attack (in persons of all ages)
    This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
    For each of the above illnesses, the percentage of each MSOA’s population with that illness was estimated. This was achieved by calculating a weighted average based on:
    - The percentage of the MSOA area that was covered by each GP practice’s catchment area
    - Of the GPs that covered part of that MSOA: the percentage of patients registered with each GP that have that illness
    The estimated percentage of each MSOA’s population with each illness was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with each illness, within the relevant age range.
    For each illness, each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
    A) the PERCENTAGE of the population within that MSOA who are estimated to have that illness
    B) the NUMBER of people within that MSOA who are estimated to have that illness
    An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA predicted to have that illness, compared to other MSOAs. In other words, those are areas where a large number of people are predicted to suffer from an illness, and where those people make up a large percentage of the population, indicating there is a real issue with that illness within the population and the investment of resources to address that issue could have the greatest benefits.
    The scores for each of the 7 illnesses were added together then converted to a relative score between 1 and 0 (1 = worst, 0 = best), to give an overall score for each MSOA: a score close to 1 would indicate that an area has high predicted levels of all obesity/inactivity-related illnesses, and these are areas where the local population could benefit the most from interventions to address those illnesses. A score close to 0 would indicate very low predicted levels of obesity/inactivity-related illnesses and therefore interventions might not be required.

    LIMITATIONS
    1. GPs do not have catchments that are mutually exclusive from each other: they overlap, with some geographic areas being covered by 30+ practices. This dataset should be viewed in combination with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset to identify where there are areas that are covered by multiple GP practices but at least one of those GP practices did not provide data. Results of the analysis in these areas should be interpreted with caution, particularly if the levels of obesity/inactivity-related illnesses appear to be significantly lower than the immediate surrounding areas.
    2. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
    3. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
    4. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of obesity/inactivity-related illnesses, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of these illnesses.

    TO BE VIEWED IN COMBINATION WITH
    This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
    - Health and wellbeing statistics (GP-level, England): Missing data and potential outliers

    DOWNLOADING THIS DATA
    To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

    DATA SOURCES
    This dataset was produced using:
    - Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
    - GP Catchment Outlines: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.

    COPYRIGHT NOTICE
    The reproduction of this data must be accompanied by the following statement: © Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.

    CaBA HEALTH & WELLBEING EVIDENCE BASE
    This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
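    The scoring methodology above can be illustrated with a short sketch. This is not Ribble Rivers Trust's code; the table layout and column names (msoa, overlap_share, pct_with_illness, population) are hypothetical placeholders, but the steps follow the description: a catchment-weighted average of GP prevalence per MSOA, a percentage score and a count score, and their average rescaled so that 1 is worst.

    ```python
    # Illustrative sketch of the MSOA scoring approach described above (hypothetical
    # column names, not the Trust's actual code).
    import pandas as pd

    def minmax_worst(s: pd.Series) -> pd.Series:
        """Relative score between 0 and 1, where 1 = worst (highest value)."""
        return (s - s.min()) / (s.max() - s.min())

    def illness_scores(gp_msoa: pd.DataFrame, msoa_pop: pd.DataFrame) -> pd.DataFrame:
        # Estimated % of each MSOA's population with the illness: a weighted average
        # of GP-level prevalence, weighted by how much of the MSOA each catchment covers.
        est = (
            gp_msoa.groupby("msoa")
            .apply(lambda d: (d["overlap_share"] * d["pct_with_illness"]).sum()
                   / d["overlap_share"].sum())
            .rename("est_pct")
            .reset_index()
            .merge(msoa_pop, on="msoa")
        )
        # Estimated number of people with the illness (ONS mid-year population).
        est["est_count"] = est["est_pct"] / 100 * est["population"]
        # Score A (percentage) and score B (count), averaged and rescaled to 0-1.
        est["illness_score"] = minmax_worst(
            (minmax_worst(est["est_pct"]) + minmax_worst(est["est_count"])) / 2
        )
        return est
    ```

    Repeating this per illness, summing the seven per-illness scores, and applying the same min-max rescaling would give the overall MSOA score described above.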

  4. Multiple scenario gate entry

    • kaggle.com
    zip
    Updated Apr 4, 2023
    Cite
    Nasser Al Musalhi (2023). Multiple scenario gate entry [Dataset]. https://www.kaggle.com/datasets/nasseralmusalhi/multiple-scenario-gate-entry
    Explore at:
    zip (52500137 bytes)
    Dataset updated
    Apr 4, 2023
    Authors
    Nasser Al Musalhi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset contains 4 videos covering different entry scenarios: 1. One person in and one out at the same time. 2. Two people in and one out at the same time. 3. Two people in and two people out at the same time. 4. Multiple people in and out at the same time.

    The dataset can be used to practise and implement visitor counting for historical places, schools, hospitals, etc.

    This dataset can be used under open access license. To cite this dataset refer to: https://doi.org/10.21123/bsj.2024.10540

  5. Dataset - What are the Machine Learning best practices reported by practitioners on Stack Exchange?

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    bin, csv, txt
    Updated Jun 25, 2023
    Cite
    Anamaria Mojica-Hanke; Andrea Bayona; Mario Linares-Vásquez; Steffen Herbold; Fabio A. González (2023). Dataset - What are the Machine Learning best practices reported by practitioners on Stack Exchange? [Dataset]. http://doi.org/10.5281/zenodo.8058979
    Explore at:
    csv, txt, bin
    Dataset updated
    Jun 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anamaria Mojica-Hanke; Andrea Bayona; Mario Linares-Vásquez; Steffen Herbold; Fabio A. González
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data correspond to the posts (questions and answers) retrieved by querying for posts related to the tag 'machine learning' and the phrase 'best practice(s)'. The data were used as the basis for a study, currently under review, on machine learning best practices as discussed by practitioners in question-and-answer communities such as Stack Exchange. The information for each type of post (i.e., questions and answers) is presented in multiple formats (i.e., .txt, .csv, and .xlsx).

    Answers - Variables

    • AID: Unique identification of the answer in the Q&A website.
    • ParentId: Unique identification of the question associated with the answer in the Q&A website
    • AcceptedAnswerId: If this answer is the most-voted answer associated with the ParentId and differs from the accepted answer, this field holds an identifier different from AID. If the accepted answer had a score lower than 1, a -1 is assigned.
    • ABody: HTML text of the answer.
    • Score: Upvotes - downvotes of the answer.
    • url_Answer: URL of the answer. The question URL can be from different websites.
    • type: best or accepted. 'Accepted' when the information belongs to the accepted answer of the ParentId question, and 'best' when it is the most-voted answer for the ParentId question.
    • Date: Creation date of the answer.

    Questions - Variables

    • QID: Unique identification of the question in the Q&A website.
    • AcceptedAnswerId: Unique identification of the accepted answer for a specific question in the Q&A website. In the case in which a question had a most-voted answer different from the accepted one, and the accepted one had a negative score, a -1 was assigned to the AcceptedAnswerId.
    • BestAnswerId: Unique identification of the most voted answer for a specific question in the Q&A website. In the case in which the most voted and accepted answers were the same, a -1 was assigned to the BestAnswerId.
    • Qtitle: Title of the question.
    • QBody: HTML text of the question.
    • Score: Upvotes - downvotes of the questions.
    • QTags: Tags that are associated with each question.
    • url_question: URL of the question. The question URL can be from different websites.
    • Date: Creation date of the question

    This dataset is a subset of the Stack Exchange dump of 03.2021 (https://archive.org/details/stackexchange_20210301) in which a series of filters were applied to obtain the data used in the study.
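    A minimal sketch of linking the two tables described above is shown below. The file names (questions.csv, answers.csv) are assumptions; the column names follow the variable lists in the description.

    ```python
    # Minimal sketch: join accepted answers back to their questions.
    # File names are hypothetical; columns follow the variable lists above.
    import pandas as pd

    questions = pd.read_csv("questions.csv")  # QID, AcceptedAnswerId, BestAnswerId, Qtitle, QBody, Score, QTags, url_question, Date
    answers = pd.read_csv("answers.csv")      # AID, ParentId, AcceptedAnswerId, ABody, Score, url_Answer, type, Date

    # Keep only the accepted answers and merge them onto their parent questions.
    accepted = answers[answers["type"] == "accepted"]
    qa = questions.merge(accepted, left_on="QID", right_on="ParentId",
                         suffixes=("_q", "_a"))
    print(qa[["QID", "AID", "Score_q", "Score_a"]].head())
    ```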

  6. Spatiotemporal data for 2019-Novel Coronavirus Covid-19 Cases and deaths

    • data.amerigeoss.org
    csv, pdf, txt
    Updated Jan 4, 2022
    Cite
    UN Humanitarian Data Exchange (2022). Spatiotemporal data for 2019-Novel Coronavirus Covid-19 Cases and deaths [Dataset]. https://data.amerigeoss.org/it/dataset/2019-novel-coronavirus-cases
    Explore at:
    txt (23645), csv (4916), pdf (15032), txt (7422), csv (795112664)
    Dataset updated
    Jan 4, 2022
    Dataset provided by
    UN Humanitarian Data Exchange
    Description

    Data Overview

    This repository contains spatiotemporal data from many official sources for the 2019 Novel Coronavirus ("nCoV_2019") outbreak that began in 2019 in Hubei, China.

    You may not use this data for commercial purposes. If there is a need for commercial use of the data, please contact Metabiota at info@metabiota.com to obtain a commercial use license.

    The incidence data are in a CSV file format. One row in an incidence file contains a piece of epidemiological data extracted from the specified source.

    The file contains data from multiple sources at multiple spatial resolutions in cumulative and non-cumulative formats by confirmation status. To select a single time series of case or death data, filter the incidence dataset by source, spatial resolution, location, confirmation status, and cumulative flag.

    Data are collected, structured, and validated by Metabiota’s digital surveillance experts. The data structuring process is designed to produce the most reliable estimates of reported cases and deaths over space and time. The data are cleaned and provided in a uniform format such that information can be compared across multiple sources. Data are collected at the time of publication in the highest geographic and temporal resolutions available in the original report.

    This repository is intended to provide a single access point for data from a wide range of data sources. Data will be updated periodically with the latest epidemiological data. Metabiota maintains a database of epidemiological information for over two thousand high-priority infectious disease events. Please contact us (info@metabiota.com) if you are interested in licensing the complete dataset.

    Cumulative vs. Non-Cumulative Incidence

    Reporting sources provide either cumulative incidence, non-cumulative incidence, or both. If the source only provides a non-cumulative incidence value, the cumulative values are inferred using prior reports from the same source. Use the CUMULATIVE FLAG variable to subset the data to cumulative (TRUE) or non-cumulative (FALSE) values.

    Case Confirmation Status

    The incidence datasets include the confirmation status of cases and deaths when this information is provided by the reporting source. Subset the data by the CONFIRMATION_STATUS variable to either TOTAL, CONFIRMED, SUSPECTED, or PROBABLE to obtain the data of your choice.

    Total incidence values include confirmed, suspected, and probable incidence values. If a source only provides suspected, probable, or confirmed incidence, the total incidence is inferred to be the sum of the provided values. If the report does not specify confirmation status, the value is included in the "total" confirmation status value.

    The data provided under the "Metabiota Composite Source" often does not include suspected incidence due to inconsistencies in reporting cases and deaths with this confirmation status.

    Outcome - Cases vs. Deaths

    The incidence datasets include cases and deaths. Subset the data to either CASE or DEATH using the OUTCOME variable. It should be noted that deaths are included in case counts.

    Spatial Resolution

    Data are provided at multiple spatial resolutions. Data should be subset to a single spatial resolution of interest using the SPATIAL_RESOLUTION variable.

    Information is included at the finest spatial resolution provided in the original epidemic report. We also aggregate incidence to coarser geographic resolutions. For example, if a source only provides data at the province-level, then province-level data are included in the dataset as well as country-level totals. Users should avoid summing all cases or deaths in a given country for a given date without specifying the SPATIAL_RESOLUTION value. For example, subset the data to SPATIAL_RESOLUTION equal to “AL0” in order to view only the aggregated country level data.
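    Putting the filtering advice above together, a single time series can be selected with a few pandas filters. The file name (incidence.csv) and the exact spelling of the cumulative-flag column are assumptions; SOURCE, SPATIAL_RESOLUTION, CONFIRMATION_STATUS, OUTCOME and the example values come from the description.

    ```python
    # Minimal sketch: select one cumulative, country-level case series from the
    # incidence file. The file name and the CUMULATIVE_FLAG column spelling are
    # assumptions; the other columns and values are named in the description above.
    import pandas as pd

    df = pd.read_csv("incidence.csv")

    country_cases = df[
        (df["SOURCE"] == "Metabiota Composite Source")
        & (df["SPATIAL_RESOLUTION"] == "AL0")       # aggregated country-level rows
        & (df["CONFIRMATION_STATUS"] == "TOTAL")
        & (df["OUTCOME"] == "CASE")
        & (df["CUMULATIVE_FLAG"])                   # cumulative values only
    ]
    print(country_cases.head())
    ```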

    There are differences in administrative division naming practices by country. Administrative levels in this dataset are defined using the Google Geolocation API (https://developers.google.com/maps/documentation/geolocation/). For example, the data for the 2019-nCoV from one source provides information for the city of Beijing, which Google Geolocations indicates is a “locality.” Beijing is also the name of the municipality where the city Beijing is located. Thus, the 2019-nCoV dataset includes rows of data for both the city Beijing, as well as the municipality of the same name. If additional cities in the Beijing municipality reported data, those data would be aggregated with the city Beijing data to form the municipality Beijing data.

    Sources

    Data sources in this repository were selected to provide comprehensive spatiotemporal data for each outbreak. Data from a specific source can be selected using the SOURCE variable.

    In addition to the original reporting sources, Metabiota compiles multiple sources to generate the most comprehensive view of an outbreak. This compilation is stored in the database under the source name “Metabiota Composite Source.” The purpose of generating this new view of the outbreak is to provide the most accurate and precise spatiotemporal data for the outbreak. At this time, Metabiota does not incorporate unofficial - including media - sources into the “Metabiota Composite Source” dataset.

    Quality Assurance

    Data are collected by a team of digital surveillance experts and undergo many quality assurance tests. After data are collected, they are independently verified by at least one additional analyst. The data also pass an automated validation program to ensure data consistency and integrity.

    NonCommercial Use License

    • Creative Commons License Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)

    • This is a human-readable summary of the Legal Code.

    • You are free:

      to Share — to copy, distribute and transmit the work
      to Remix — to adapt the work

    • Under the following conditions:

      Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

      Noncommercial — You may not use this work for commercial purposes.

      Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

    • With the understanding that:

      Waiver — Any of the above conditions can be waived if you get permission from the copyright holder.

      Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.

      Other Rights — In no way are any of the following rights affected by the license: Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations; The author's moral rights; Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.

      Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.

    For details and the full license text, see http://creativecommons.org/licenses/by-nc-sa/3.0/

    Liability

    Metabiota shall in no event be liable for any decision taken by the user based on the data made available. Under no circumstances, shall Metabiota be liable for any damages (whatsoever) arising out of the use or inability to use the database. The entire risk arising out of the use of the database remains with the user.

  7. Data from: JAWS: Justified AWS-like data through workflow enhancements that ease access and add scientific value

    • cloud.csiss.gmu.edu
    • data.nasa.gov
    html
    Updated Jan 29, 2020
    Cite
    United States (2020). JAWS: Justified AWS-like data through workflow enhancements that ease access and add scientific value [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/jaws-justified-aws-like-data-through-workflow-enhancements-that-ease-access-and-add-scient
    Explore at:
    html
    Dataset updated
    Jan 29, 2020
    Dataset provided by
    United States
    Description

    Automated Weather Station and AWS-like networks are the primary source of surface-level meteorological data in remote polar regions. These networks have developed organically and independently, and deliver data to researchers in idiosyncratic ASCII formats that hinder automated processing and intercomparison among networks. Moreover, station tilt causes significant biases in polar AWS measurements of radiation and wind direction. Researchers, network operators, and data centers would benefit from AWS-like data in a common format, amenable to automated analysis, and adjusted for known biases. This project addresses these needs by developing a scientific software workflow called "Justified AWS" (JAWS) to ingest Level 2 (L2) data in the multiple formats now distributed, harmonize it into a common format, and deliver value-added Level 3 (L3) output suitable for distribution by the network operator, analysis by the researcher, and curation by the data center.

    Polar climate researchers currently face daunting problems including how to easily:
    1. Automate analysis (subsetting, statistics, unit conversion) of AWS-like L2 ASCII data.
    2. Combine or intercompare data and data quality from among unharmonized L2 datasets.
    3. Adjust L2 data for biases such as AWS tilt angle and direction.

    JAWS addresses these common issues by harmonizing AWS L2 data into a common format, and applying accepted methods to quantify quality and estimate biases. Specifically, JAWS enables users and network operators to:
    1. Convert L2 data (usually ASCII tables) into a netCDF-based L3 format compliant with metadata conventions (Climate-Forecast and ACDD) that promote automated discovery and analysis.
    2. Include value-added L3 features like the Retrospective, Iterative, Geometry-Based (RIGB) tilt angle and direction corrections, solar angles, and standardized quality flags.
    3. Provide a scriptable API to extend the initial L2-to-L3 conversion to newer AWS-like networks and instruments.

    Polar AWS network experts and NSIDC DAAC personnel, each with decades of experience, will help guide and deliberate the L3 conventions implemented in Stages 2-3. The project will start on July 1, 2017 at entry Technology Readiness Level 3 and will exit on June 30, 2019 at TRL 6. JAWS is now a heterogeneous collection of scripts and methods developed and validated at UCI over the past 15 years. At exit, JAWS will comprise three modular stages written in or wrapped by Python, installable by Conda: Stage 1 ingests and translates L2 data into netCDF. Stage 2 annotates the netCDF with CF and ACDD metadata. Stage 3 derives value-added scientific and quality information. The labor-intensive tasks include turning our heterogeneous workflow into a robust, standards-compliant, extensible workflow with an API based on best practices of modern scientific information systems and services. Implementation of Stages 1-2 may be straightforward though tedious due to the menagerie of L2 formats, instruments, and assumptions. The RIGB component of Stage 3 requires ongoing assimilation of ancillary NASA data (CERES, AIRS) and use of automated data transfer protocols (DAP, THREDDS).

    The immediate target recipient elements are polar AWS network managers, users, and data distributors. L2 borehole data suffers from similar interoperability issues, as does non-polar AWS data. Hence our L3 format will be extensible to global AWS and permafrost networks. JAWS will increase in situ data accessibility and utility, and enable new derived products (both are AIST goals). The PI is a long-standing researcher, open source software developer, and educator who understands obstacles to harmonizing disparate datasets with NASA interoperability recommendations. Our team participates in relevant geoscience communities, including ESDS working groups, ESIP, AGU, and EarthCube.
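    To make the Stage 1 idea concrete, the sketch below converts a hypothetical L2 ASCII table into a small CF-style netCDF file with pandas and xarray. This is not the JAWS code, and the file and column names are assumptions; it only illustrates the kind of translation the abstract describes.

    ```python
    # Sketch of the Stage 1 idea (L2 ASCII table -> netCDF with basic CF-style
    # metadata). Not the JAWS code; station.txt, time and air_temp are hypothetical.
    import pandas as pd
    import xarray as xr

    table = pd.read_csv("station.txt", sep=r"\s+", parse_dates=["time"])

    ds = xr.Dataset(
        {"air_temperature": ("time", table["air_temp"].values)},
        coords={"time": table["time"].values},
    )
    ds["air_temperature"].attrs = {"units": "degC", "standard_name": "air_temperature"}
    ds.attrs = {"Conventions": "CF-1.6, ACDD-1.3", "title": "AWS L2 data (example)"}
    ds.to_netcdf("station_L3.nc")
    ```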

  8. Dataset to Practice SVM, KNN and PCA

    • kaggle.com
    Updated Aug 29, 2022
    Cite
    Piyush Kumar (2022). Dataset to Practice SVM, KNN and PCA [Dataset]. https://www.kaggle.com/datasets/piyushkr101200/nn-assign1-2ddata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 29, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Piyush Kumar
    Description

    This dataset is from a college assignment at NIT Bhopal, India, for practising ML classification techniques: 1. Support Vector Machine (SVM) 2. K-Nearest Neighbour (KNN) classifier 3. Principal Component Analysis (PCA).

    You can try the tasks yourself. Task 1: Using one of the training datasets, train an SVM classifier that separates the two classes. Classify the test dataset using this SVM classifier. Compute the classification error and confusion matrix. Task 2: Using one of the training datasets, predict the class labels of the test data points using a K-Nearest Neighbour classifier. Compute the classification error and confusion matrix. Task 3: Import one of the training data files and the test data file. Combine the data from both files and apply PCA to reduce the dimension of the dataset from 2 to 1. (A sketch of all three tasks follows below.)
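    A minimal scikit-learn sketch of the three tasks is shown below. The file names (train.csv, test.csv) and the column layout (two feature columns x1, x2 plus a label column) are assumptions about this dataset; adjust them to the actual files.

    ```python
    # Sketch of the three assignment tasks with scikit-learn.
    # File and column names are assumptions about this Kaggle dataset.
    import pandas as pd
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.decomposition import PCA
    from sklearn.metrics import confusion_matrix, accuracy_score

    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")
    X_train, y_train = train[["x1", "x2"]], train["label"]
    X_test, y_test = test[["x1", "x2"]], test["label"]

    # Task 1: SVM classifier, classification error and confusion matrix.
    svm = SVC().fit(X_train, y_train)
    pred = svm.predict(X_test)
    print("SVM error:", 1 - accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))

    # Task 2: K-Nearest Neighbour classifier.
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    pred = knn.predict(X_test)
    print("KNN error:", 1 - accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))

    # Task 3: PCA on the combined data, reducing from 2 dimensions to 1.
    combined = pd.concat([X_train, X_test])
    reduced = PCA(n_components=1).fit_transform(combined)
    print("Reduced shape:", reduced.shape)
    ```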

  9. IFS

    • datacatalogue.ukdataservice.ac.uk
    Updated Mar 3, 2022
    Cite
    IFF Research (2022). IFS [Dataset]. http://doi.org/10.5255/UKDA-SN-7281-2
    Dataset updated
    Mar 3, 2022
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    IFF Research
    Time period covered
    Jan 1, 2010
    Area covered
    United Kingdom
    Description

    The Infant Feeding Survey (IFS) has been carried out every five years since 1975, in order to establish information about infant feeding practices. Government policy in the United Kingdom has consistently supported breastfeeding as the best way of ensuring a healthy start for infants and of promoting women's health. Current guidance on infant feeding is as follows:

    • breastmilk is the best form of nutrition for infants;
    • exclusive breastfeeding is recommended for around the first six months (26 weeks) of an infant's life;
    • infant formula is the only recommended alternative to breastfeeding for babies who are under 12 months old;
    • around six months is the recommended age for the introduction of solid foods for infants, whether breastfed or fed on breastmilk substitutes;
    • breastfeeding (and/or breastmilk substitutes) should continue beyond the first six months, along with appropriate types and amounts of solid foods;
    • mothers who are unable to, or choose not to, follow these recommendations should be supported to optimise their infants' nutrition.
    Since the IFS began, the content of the survey has evolved to reflect the prevailing government policy agenda, while recognising the importance of maintaining consistency over time to allow comparison and trend analysis. The first IFS in 1975 took place in England and Wales only. From 1980 the survey covered Scotland, while from 1990 Northern Ireland was also included. The 2005 survey was the first to provide separate estimates for England, Wales, Scotland and Northern Ireland, as well as for the UK as a whole, and to provide estimates of exclusive breast-feeding (where the baby is given only breast milk, no other liquids or solids).

    Further information about the IFS series may be found on the Health and Social Care Information Centre website (search for 'Infant Feeding Survey').

    The UK Data Archive holds IFS data from 1985 onwards. A separate survey, Infant Feeding in Asian Families, 1994-1996, covering England only, is held under SN 3759.

    The 2010 IFS was based on an initial representative sample of mothers who were selected from all UK births registered during August and October 2010. Three stages of data collection were conducted, with Stage 1 being carried out when babies were around 4-10 weeks old, Stage 2 when they were 4-6 months old, and Stage 3 when they were 8-10 months old. A total of 10,768 mothers completed and returned all three questionnaires. For the first time in 2010, additional questions were included alongside the main Stage 2 questionnaire for mothers of multiple births.

    Users should note that the UK Data Archive study currently includes questionnaire data from Stages 1, 2 and 3 and the multiple births data, with Excel data tables relating to survey methodology and sampling error.

    The main aims of the 2010 survey were broadly similar to previous IFS, and were as follows:
    • to establish how infants born in 2010 were being fed and to provide national figures on the incidence, prevalence and duration of breastfeeding and exclusive breastfeeding;
    • to examine trends in infant feeding practices over recent years, in particular to compare changes between 2005 and 2010;
    • to investigate variations in feeding practices among different socio-demographic groups and the factors associated with mothers' feeding intentions and with the feeding practices adopted in the early weeks;
    • to establish the age at which solid foods are introduced and to examine practices associated with introducing solid foods up to 9 months;
    • to measure the proportion of mothers who smoke and drink during pregnancy, and to look at the patterns of smoking and drinking behaviour before, during and after the birth; and
    • to measure levels of awareness of and registration on the Healthy Start scheme and understand how Healthy Start vouchers are being used. (The Healthy Start scheme provides support for mothers in receipt of certain benefits and tax credits. Vouchers are provided that can be spent on milk, infant formula, fresh fruit or vegetables for pregnant women and children under 4 years old and coupons are also available for free vitamins for pregnant women, mothers and babies.)
    For the second edition (July 2013), data and documentation from Stage 3 of the survey were added to the study.

    Linking files in Stata - a warning
    Stata users should note that the case identifier variable (ID) number structure may differ across datasets for all three stages. The letter prefixing the ID number may be upper case in one dataset and lower case in another. This is related to whether an online, face-to-face, CATI or postal route was used to complete the questionnaire: for example, one respondent has the ID number 'E00157' in Stage 1 and Stage 2, but 'e00157' in Stage 3. Apart from the upper/lower case prefix letter, the ID number is exactly the same. However, the Stata command used to link the datasets (the 'merge' function) requires an exact match on the matching variable (ID), so if the prefix letter is lower case in one stage and upper case in another stage, Stata will reject the link and assume those cases are different respondents. At present, 441 cases are affected by this. The original datasets were compiled in SPSS, which does not distinguish between the upper and lower case prefix letters while merging datasets.
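    The warning above concerns Stata's 'merge' command, but the same pitfall and fix apply outside Stata. A minimal sketch, assuming hypothetical file names for the stage datasets: normalise the case of the ID prefix before merging so that 'E00157' and 'e00157' match.

    ```python
    # Sketch of working around the ID-case issue described above when linking the
    # stages in pandas. File names are hypothetical; the fix is simply to normalise
    # the letter prefix of ID before merging.
    import pandas as pd

    stage1 = pd.read_stata("ifs2010_stage1.dta")
    stage3 = pd.read_stata("ifs2010_stage3.dta")

    for df in (stage1, stage3):
        df["ID"] = df["ID"].str.upper()

    linked = stage1.merge(stage3, on="ID", suffixes=("_s1", "_s3"))
    print(len(linked), "cases linked across stages")
    ```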

    Note from the depositor, September 2016:
    The depositor has sent the following note to data users: "An error in the Stage 1 dataset has been identified. Ninety-nine mothers stated that it was their first birth (Q3), that they had a total of 1 child (Q4) but then also selected the option to say that they had a multiple birth (Q5). The Stage 2 and Stage 3 data are unaffected and no figures in the published report or tables are affected. Users analysing the Stage 1 dataset should take this anomaly into account when including multiple births data in Stage 1 in their analysis."

  10. Data from: Statistical issues in randomized trials of cancer screening

    • catalog.data.gov
    • healthdata.gov
    • +1more
    Updated Sep 6, 2025
    Cite
    National Institutes of Health (2025). Statistical issues in randomized trials of cancer screening [Dataset]. https://catalog.data.gov/dataset/statistical-issues-in-randomized-trials-of-cancer-screening
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: The evaluation of randomized trials for cancer screening involves special statistical considerations not found in therapeutic trials. Although some of these issues have been discussed previously, we present important recent and new methodologies.
    Methods: Our emphasis is on simple approaches.
    Results: We make the following recommendations: (1) Use death from cancer as the primary endpoint, but review death records carefully and report all causes of death. (2) Use a simple "causal" estimate to adjust for nonattendance and contamination occurring immediately after randomization. (3) Use a simple adaptive estimate to adjust for dilution in follow-up after the last screen.
    Conclusion: The proposed guidelines combine recent methodological work on screening endpoints and noncompliance/contamination with a new adaptive method to adjust for dilution in a study where follow-up continues after the last screen. These guidelines ensure good practice in the design and analysis of randomized trials of cancer screening.

  11. Data from: CLARISSA Cash Plus Social Protection intervention: quantitative and qualitative data

    • ordo.open.ac.uk
    png
    Updated Dec 16, 2024
    Cite
    Neil Howard; Keetie Roelen; Giel Ton; Mauricio Espinoza; Afrin Aktar; Saklain Al Mamum (2024). CLARISSA Cash Plus Social Protection intervention: quantitative and qualitative data [Dataset]. http://doi.org/10.21954/ou.rd.26061106.v3
    Explore at:
    png
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    The Open University
    Authors
    Neil Howard; Keetie Roelen; Giel Ton; Mauricio Espinoza; Afrin Aktar; Saklain Al Mamum
    License

    Attribution-ShareAlike 2.0 (CC BY-SA 2.0): https://creativecommons.org/licenses/by-sa/2.0/
    License information was derived automatically

    Description

    The CLARISSA Cash Plus intervention represented an innovative social protection scheme for tackling social ills, including the worst forms of child labour (WFCL). A universal and unconditional ‘cash plus’ programme, it combined community mobilisation, case work, and cash transfers (CTs). It was implemented in a high-density, low-income neighbourhood in Dhaka to build individual, family, and group capacities to meet needs. This, in turn, was expected to lead to a corresponding decrease in deprivation and community-identified social issues that negatively affect wellbeing, including WFCL. Four principles underpinned the intervention: Unconditionality, Universality, Needs-centred and people-led, and Emergent and open-ended.

    The intervention took place in Dhaka – North Gojmohol – over a 27-month period, between October 2021 and December 2023, to test and study the impact of providing unconditional and people‑led support to everyone in a community. Cash transfers were provided between January and June 2023 in monthly instalments, plus one investment transfer in September 2023. A total of 1,573 households received cash, through the Upay mobile financial service. Cash was complemented by a ‘plus’ component, implemented between October 2021 and December 2023. Referred to as relational needs-based community organising (NBCO), a team of 20 community mobilisers (CMs) delivered case work at the individual and family level and community mobilisation at the group level. The intervention was part of the wider CLARISSA programme, led by the Institute of Development Studies (IDS) and funded by UK’s Foreign, Commonwealth & Development Office (FCDO). The intervention was implemented by Terre des hommes (Tdh) in Bangladesh and evaluated in collaboration with the BRAC Institute of Governance and Development (BIGD) and researchers from the University of Bath and the Open University, UK.

    The evaluation of the CLARISSA Social Protection pilot was rooted in contribution analysis that combined multiple methods over more than three years, in line with emerging best practice guidelines for mixed methods research on children, work, and wellbeing. Quantitative research included bi-monthly monitoring surveys administered by the project’s community mobilisers (CMs), including basic questions about wellbeing, perceived economic resilience, school attendance, etc. This was complemented by baseline, midline, and endline surveys, which collected information about key outcome indicators within the sphere of influence of the intervention, such as children’s engagement with different forms of work and working conditions, with schooling and other activities, household living conditions and sources of income, and respondents’ perceptions of change. Qualitative tools were used to probe topics and results of interest, as well as impact pathways. These included reflective diaries written by the community mobilisers; three rounds of focus group discussions (FGDs) with community members; three rounds of key informant interviews (KIIs) with members of case study households; and long-term ethnographic observation.

    Quantitative Data
    The quantitative evaluation of the CLARISSA Cash Plus intervention involved several data collection methods to gather information about household living standards, children’s education and work, and social dynamics. The data collection included a pre-intervention census, four periodic surveys, and 13 rounds of bi-monthly monitoring surveys, all conducted between late 2020 and late 2023. Details of each instrument are as follows:
    · Census: Conducted in October/November 2020 in the target neighbourhood of North Gojmohol (n=1,832) and the comparison neighbourhood of Balurmath (n=2,365)
    · Periodic surveys: Baseline (February 2021, n=752 in North Gojmohol), Midline 1 (before cash) (October 2022, n=771 in North Gojmohol), Midline 2 (after 6 rounds of cash) (July 2023, n=769 in North Gojmohol), and Endline (December 2023, n=750 in North Gojmohol and n=773 in Balurmath)
    · Bi-monthly monitoring data (13 rounds): Conducted between December 2021 and December 2023 in North Gojmohol (average of 1,400 households per round)

    The present repository summarizes this information, organized as follows:
    1.1 Bimonthly survey (household): Panel dataset comprising 13 rounds of bi-monthly monitoring data at the household level (average of 1,400 households per round, total of 18,379 observations)
    1.2 Bimonthly survey (child): Panel dataset comprising 13 rounds of bi-monthly monitoring data at the child level (aged 5 to 16 at census) (average of 940 children per round, total of 12,213 observations)
    2.1 Periodic survey (household): Panel dataset comprising 5 periodic surveys (census, baseline, midline 1, midline 2, endline) at the household level (average of 750 households per period, total of 3,762 observations)
    2.2 Periodic survey (child): Panel dataset comprising 4 periodic surveys (baseline, midline 1, midline 2, endline) at the child level (average of 3,100 children per period, total of 12,417 observations)
    3.0 Balurmat - North Gojmohol panel: Balanced panel dataset comprising 558 households in North Gojmohol and 773 households in Balurmath, observed both at 2020 census and 2023 endline (total of 2,662 observations)
    4.0 Questionnaires: Original questionnaires for all datasets
    All datasets are provided in Stata format (.dta) and Excel format (.xlsx) and are accompanied by their respective dictionary in Excel format (.xlsx).

    Qualitative Data
    The qualitative study was conducted in three rounds: the first round of IDIs and FGDs took place between December 2022 and January 2023; the second round took place from April to May 2023; and the third round took place from November to December 2023. KIIs were taken during the 2nd round of the study in May 2023.
    The sample size by round and instrument type is shown below:
    Round                          | IDIs with children | IDIs with parents | IDIs with CMs | FGDs | KIIs
    1st Round (12/2022 – 01/2023)  | 30                 | 26                | -             | 06   | -
    2nd Round (04/2023 – 05/2023)  | 30                 | 23                | -             | 06   | 05
    3rd Round (11/2023 – 12/2023)  | 26                 | 25                | 03            | 07   | -

    The files in this archive contain the qualitative data and include six types of transcripts:
    · 1.1 Interviews with children in case study households (IDI): 30 families in round 1, 30 in round 2, and 26 in round 3
    · 1.2 Interviews with parents in case study households (IDI): 26 families in round 1, 23 in round 2, and 25 in round 3
    · 1.3 Interviews with community mobilisers (IDI): 3 CMs in round 3
    · 2.0 Key informant interviews (KII): 5 in round 2
    · 3.0 Focus group discussions (FGD): 6 in round 1, 6 in round 2, and 7 in round 3
    · 4.0 Community mobiliser micro-narratives (556 cases)
    Additionally, this repository includes a comprehensive list of all qualitative data files ("List of all qualitative data+MC.xlsx").
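    A minimal sketch of loading one of the panel datasets with pandas is shown below. The repository provides each dataset in Stata (.dta) and Excel (.xlsx) formats; the exact file and column names used here are hypothetical placeholders.

    ```python
    # Sketch: load the "2.1 Periodic survey (household)" panel and check how many
    # survey waves each household appears in. File and column names are hypothetical.
    import pandas as pd

    panel = pd.read_stata("2.1_periodic_survey_household.dta")  # hypothetical file name

    # One row per household per wave (hypothetical columns household_id, wave).
    waves_per_household = panel.groupby("household_id")["wave"].nunique()
    print(waves_per_household.value_counts())
    ```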

  12. Table_2_Integrating Peak Colocalization and Motif Enrichment Analysis for the Discovery of Genome-Wide Regulatory Modules and Transcription Factor Recruitment Rules.xlsx

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Feb 21, 2020
    Cite
    Dolfini, Diletta; Mantovani, Roberto; Ronzio, Mirko; Zambelli, Federico; Pavesi, Giulio (2020). Table_2_Integrating Peak Colocalization and Motif Enrichment Analysis for the Discovery of Genome-Wide Regulatory Modules and Transcription Factor Recruitment Rules.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000479625
    Dataset updated
    Feb 21, 2020
    Authors
    Dolfini, Diletta; Mantovani, Roberto; Ronzio, Mirko; Zambelli, Federico; Pavesi, Giulio
    Description

    Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq) has opened new avenues of research in the genome-wide characterization of regulatory DNA-protein interactions at the genetic and epigenetic level. As a consequence, it has become the de facto standard for studies on the regulation of transcription, and literally thousands of data sets for transcription factors and cofactors in different conditions and species are now available to the scientific community. However, while pipelines and best practices have been established for the analysis of a single experiment, there is still no consensus on the best way to perform an integrated analysis of multiple datasets in the same condition, in order to identify the most relevant and widespread regulatory modules composed of different transcription factors and cofactors. We present here a computational pipeline for this task, that integrates peak summit colocalization, a novel statistical framework for the evaluation of its significance, and motif enrichment analysis. We show examples of its application to ENCODE data, that led to the identification of relevant regulatory modules composed of different factors, as well as the organization on DNA of the binding motifs responsible for their recruitment.

  13. ICES Phytobenthos community dataset

    • erddap.eurobis.org
    • obis.org
    • +2more
    Cite
    Pinto, ICES Phytobenthos community dataset [Dataset]. https://erddap.eurobis.org/erddap/info/DOME-Phytobenthos/index.html
    Dataset authored and provided by
    Pinto
    Time period covered
    Jul 14, 2007 - Oct 26, 2016
    Area covered
    Variables measured
    time, aphia_id, latitude, TimeOfDay, longitude, DayCollected, MaximumDepth, MinimumDepth, BasisOfRecord, YearCollected, and 3 more
    Description

    Phytobenthos community data; a large portion of the data held are monitoring data submitted for the OSPAR CEMP and HELCOM COMBINE monitoring programmes and therefore follow the guidelines of those specific monitoring programmes.

    Data are quality assured using internal and external programmes. For example, the national laboratories that take part in monitoring programmes related to contaminants and biological effects and that submit information to ICES subscribe to the Quality Assurance of Information for Marine Environmental Monitoring in Europe (QUASIMEME) or the Biological Effects Quality Assurance in Monitoring Programmes (BEQUALM) inter-laboratory proficiency-testing schemes and perform internal quality assurance. ICES operates through a network of scientific expert and advisory groups. These groups, and the processes they feed into, act as a quality check on the marine evidence, both in terms of how the evidence was gathered and how it has subsequently been treated. The groups, in cooperation with regional programmes under the Regional Sea Conventions, set standards and guidelines for the collection, transmission and analysis of these data. In addition, the ICES Secretariat provides supplementary quality assurance through its internal programmes for the different types of marine data collections, which is fed back to the participating national and regional programmes. These internal and external programmes and procedures have been established over a period of 30 or more years; they continue to evolve and strive to reflect the best available practices in the collection and treatment of marine data relevant to the ICES community.

    Key metadata (selected from the dataset's attribute record):
    Standard title: ICES Phytobenthos community dataset
    Citation: ICES Environmental Database (DOME), Phytobenthos community. Available online at http://dome.ices.dk. ICES, Copenhagen. Consulted on yyyy-mm-dd.
    Institution: ICES; origin: monitoring (field survey); marine dataset
    License / access constraints: Attribution (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/; ICES Data Policy: https://www.ices.dk/data/Documents/ICES-Data-policy.pdf
    Geographic extent: 54.28–62.88° N, 11.33–27.92° E
    Time coverage: 2007-07-14T01:00:00Z to 2016-10-26T01:00:00Z
    Conventions: COARDS, CF-1.6, ACDD-1.3; CF Standard Name Table v70
    Subset variables: ScientificName, BasisOfRecord, YearCollected, MonthCollected, DayCollected, aphia_id
    Lineage: prior to publication, data undergo quality-control checks described at https://github.com/EMODnet/EMODnetBiocheck?tab=readme-ov-file#understanding-the-output
    Status: in progress; released 2018-04-16; last modified 2025-04-25
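
    The data themselves are served through the EurOBIS ERDDAP instance linked above. As a minimal sketch (assuming the standard ERDDAP tabledap CSV URL pattern for the DOME-Phytobenthos dataset id; the variable selection and date constraint below are illustrative), a subset could be pulled into Python like this:

        # Minimal sketch: fetch a subset of the DOME-Phytobenthos dataset from the
        # EurOBIS ERDDAP server via the tabledap CSV endpoint (URL pattern assumed).
        import pandas as pd

        url = (
            "https://erddap.eurobis.org/erddap/tabledap/DOME-Phytobenthos.csv"
            "?time,latitude,longitude,aphia_id,ScientificName"   # columns to return
            "&time>=2010-01-01&time<=2010-12-31"                 # illustrative constraint
        )
        df = pd.read_csv(url, skiprows=[1])  # ERDDAP's second output line holds units
        print(df.head())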

  14. GDR Data Management and Best Practices for Submitters and Curators

    • data.openei.org
    • gdr.openei.org
    • +2more
    image_document +1
    Updated Mar 31, 2021
    Cite
    Jon Weers; Nicole Taverna; Jay Huggins; RJ Scavo (2021). GDR Data Management and Best Practices for Submitters and Curators [Dataset]. https://data.openei.org/submissions/6476
    Explore at:
    website, image_document. Available download formats.
    Dataset updated
    Mar 31, 2021
    Dataset provided by
    Open Energy Data Initiative (OEDI)
    National Renewable Energy Laboratory
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    Authors
    Jon Weers; Nicole Taverna; Jay Huggins; RJ Scavo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Resources for GDR data submitters and curators, including training videos, step-by-step guides on data submission, and detailed documentation of the GDR. The Data Management and Submission Best Practices document also contains API access and metadata schema information for developers interested in harvesting GDR metadata for federation or inclusion in their local catalogs.
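
    For developers, the harvesting workflow described above can be automated once the actual API routes are known. The sketch below is purely illustrative: the endpoint URL and JSON field names are placeholders, not documented GDR API details; consult the Data Management and Submission Best Practices document for the real routes and metadata schema before harvesting.

        # Hypothetical harvesting sketch; API_URL and the field names are placeholders.
        import json
        import urllib.request

        API_URL = "https://gdr.openei.org/api/submissions"  # placeholder, see GDR docs

        def harvest(url: str = API_URL) -> list:
            """Fetch submission records and keep the fields a local catalog might index."""
            with urllib.request.urlopen(url) as resp:
                records = json.load(resp)
            return [
                {"title": r.get("title"), "doi": r.get("doi"), "keywords": r.get("keywords")}
                for r in records
            ]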

  15. Global Ocean Colour for Carbon Cycle Research (full product set)

    • search.dataone.org
    • doi.pangaea.de
    Updated Jan 5, 2018
    Cite
    GlobColour; Observation de la Terre - Environnement (ACRI-ST), Sophia Antipolis; ACRI-ST (2018). Global Ocean Colour for Carbon Cycle Research (full product set) [Dataset]. http://doi.org/10.1594/PANGAEA.695832
    Explore at:
    Dataset updated
    Jan 5, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    GlobColour; Observation de la Terre - Environnement (ACRI-ST), Sophia Antipolis; ACRI-ST
    Time period covered
    Sep 1, 1997 - Dec 31, 2007
    Description

    In 2005, the International Ocean Colour Coordinating Group (IOCCG) convened a working group to examine the state of the art in ocean colour data merging, which showed that the research techniques had matured sufficiently for creating long multi-sensor datasets (IOCCG, 2007). As a result, ESA initiated and funded the DUE GlobColour project (http://www.globcolour.info/) to develop a satellite-based ocean colour data set to support global carbon-cycle research. It aims to satisfy the scientific requirement for a long (10+ year) time series of consistently calibrated global ocean colour information with the best possible spatial coverage. This has been achieved by merging data from the three most capable sensors: SeaWiFS on GeoEye's OrbView-2 mission, MODIS on NASA's Aqua mission and MERIS on ESA's ENVISAT mission.

    In setting up the GlobColour project, three user organisations were invited to help. Their roles were to specify the detailed user requirements, act as a channel to the broader end-user community, and provide feedback and assessment of the results. The International Ocean Carbon Coordination Project (IOCCP), based at UNESCO in Paris, provides direct access to the carbon-cycle modelling community's requirements and to the modellers themselves who will use the final products. The UK Met Office's National Centre for Ocean Forecasting (NCOF) in Exeter, UK, provides an understanding of the requirements of oceanography users, and the IOCCG brings its understanding of global user needs and valuable advice on best practice within the ocean colour science community.

    The three-year project kicked off in November 2005 under the leadership of ACRI-ST (France). The first year was a feasibility demonstration phase that was successfully concluded at a user consultation workshop organised by the Laboratoire d'Océanographie de Villefranche, France, in December 2006. Error statistics and inter-sensor biases were quantified by comparison with in-situ measurements from moored optical buoys and ship-based campaigns, and used as an input to the merging. The second year was dedicated to the production of the time series. In total, more than 25 TB of input (level 2) data have been ingested and 14 TB of intermediate and output products created, with 4 TB of data distributed to the user community. Quality control (QC) is provided through the Diagnostic Data Sets (DDS), which are extracted sub-areas covering locations of in-situ data collection or interesting oceanographic phenomena.

    This Full Product Set (FPS) covers global daily merged ocean colour products for the period 1997-2006 and is freely available to the worldwide science community at http://www.globcolour.info/data_access_full_prod_set.html. The GlobColour service distributes global daily, 8-day and monthly data sets at 4.6 km resolution for chlorophyll-a concentration, normalised water-leaving radiances (412, 443, 490, 510, 531, 555, 620, 670, 681 and 709 nm), diffuse attenuation coefficient, coloured dissolved and detrital organic materials, total suspended matter or particulate backscattering coefficient, turbidity index, cloud fraction and quality indicators. Error statistics from the initial sensor characterisation are used as an input to the merging methods and propagate through the merging process to provide error estimates for the output merged products. These error estimates are a key component of GlobColour, as they are invaluable to the users, particularly the modellers who need them in order to assimilate the ocean colour data into ocean simulations.

    An intensive phase of validation has been undertaken to assess the quality of the data set. In addition, inter-comparisons between the different merged datasets will help in further refining the techniques used. Both the final products and the quality assessment were presented at a second user consultation in Oslo on 20-22 November 2007, organised by the Norwegian Institute for Water Research (NIVA); presentations are available on the GlobColour WWW site. On request of the ESA Technical Officer for the GlobColour project, the FPS data set was mirrored in the PANGAEA data library.

  16. Monarch Butterfly Classification Dataset

    • universe.roboflow.com
    zip
    Updated Jun 11, 2023
    Cite
    Scott Cole (2023). Monarch Butterfly Classification Dataset [Dataset]. https://universe.roboflow.com/scott-cole-a3ty4/monarch-butterfly-classification/dataset/1
    Explore at:
    zip. Available download formats.
    Dataset updated
    Jun 11, 2023
    Dataset authored and provided by
    Scott Cole
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Monarch Butterfly
    Description

    Monarch Butterfly Classification

    The Monarch Butterfly Classification project is an advanced deep learning model designed to classify images of Monarch butterflies. With its cutting-edge technology and high accuracy, this model enables accurate identification and categorization of Monarch butterflies, aiding in research, conservation efforts, and educational initiatives.

    Key Features

    • Accurate Classification: The Monarch Butterfly Classification model utilizes state-of-the-art deep learning algorithms to accurately classify images of Monarch butterflies.

    • Versatile Use Cases: This powerful model has diverse applications, ranging from scientific research and conservation efforts to citizen science projects and environmental education programs.

    • Easy Integration: The Monarch Butterfly Classification model can be seamlessly integrated into existing platforms, apps, or websites, making it accessible to many users and enabling them to contribute effortlessly to butterfly monitoring.

    • User-Friendly Interface: We provide a user-friendly interface/API that allows users to easily interact with the model, upload images, and obtain instant classification results.

    Getting Started

    To get started with the Monarch Butterfly Classification project, follow these simple steps:

    1. Clone this repository and navigate to the project directory.
    2. Install the required dependencies using the provided requirements.txt file.
    3. Prepare your dataset by organizing images of Monarch butterflies in the desired structure.
    4. Train the classification model using the provided script and your prepared dataset.
    5. Once trained, you can use the trained model to classify new images by running the classification script and providing the path to the image you want to classify.
    6. Receive the predicted class label for the image.

    For detailed documentation and tutorials on using Roboflow and the Monarch Butterfly Classification model, please refer to docs.roboflow.com
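
    The repository's own training script is not reproduced here, but steps 3-5 above typically amount to fine-tuning an image classifier on a folder-per-class dataset. A minimal sketch of that idea follows (assuming a torchvision-style setup; the project's actual script, dataset layout and hyperparameters may differ):

        # Illustrative fine-tuning sketch, not the project's actual training script.
        import torch
        from torch import nn
        from torchvision import datasets, models, transforms

        def train(data_dir: str = "dataset/train", epochs: int = 3) -> nn.Module:
            tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
            data = datasets.ImageFolder(data_dir, transform=tfm)   # one folder per class
            loader = torch.utils.data.DataLoader(data, batch_size=16, shuffle=True)
            model = models.resnet18(weights="IMAGENET1K_V1")
            model.fc = nn.Linear(model.fc.in_features, len(data.classes))
            optim = torch.optim.Adam(model.parameters(), lr=1e-4)
            loss_fn = nn.CrossEntropyLoss()
            for _ in range(epochs):
                for images, labels in loader:
                    optim.zero_grad()
                    loss = loss_fn(model(images), labels)
                    loss.backward()
                    optim.step()
            return model

        # After training, classify a new image by applying the same transform and
        # taking the argmax of model(image_tensor.unsqueeze(0)).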

    Contribution Guidelines

    We welcome contributions from the open-source community to enhance the Monarch Butterfly Classification project. If you're interested in contributing, please follow the guidelines outlined in [CONTRIBUTING.md] and submit your pull requests.

    License

    This project is licensed under the [Roboflow License]. For more information, see the [LICENSE] file provided by Roboflow.

    Contact Information

    For any questions, suggestions, or collaborations, please reach out to us at savetheworld at 150left.com

    Congratulations if you have made it this far! 🥳

    🎁🎁🎁 10 suggestions for trying out the Monarch Butterfly Classification model and contributing to its success:

    1. "Unveil the captivating world of Monarch butterflies with our powerful classification model. Join us in exploring their beauty and contributing to important research and conservation efforts."

    2. "Take a leap into the realm of Monarch butterflies with our cutting-edge classification model. Let's work together to protect these magnificent creatures and their habitats."

    3. "Calling all nature enthusiasts and citizen scientists! Embrace the Monarch Butterfly Classification model and make a real difference in our understanding of these delicate pollinators."

    4. "Unlock the wonders of Monarch butterflies with our state-of-the-art classification model. Join us in unraveling their secrets and advocating for their conservation."

    5. "Become a Monarch detective! Empower yourself with our classification model and contribute to the preservation of these iconic butterflies. Together, we can protect their future."

    6. "Join the Monarch Butterfly Classification community and contribute to the world of scientific research. Help us understand and safeguard these remarkable creatures for generations to come."

    7. "Immerse yourself in the world of Monarch butterflies and experience the joy of accurate classification. Let's come together to protect these majestic pollinators and ensure their survival."

    8. "Make a lasting impact on butterfly conservation by using our Monarch Butterfly Classification model. Every classification counts in our mission to preserve these awe-inspiring creatures."

    9. "Inspire others to appreciate the beauty of Monarch butterflies. Share your findings with our classification model and play a vital role in raising awareness and fostering conservation efforts."

    10. "Step into the realm of Monarch butterflies and contribute to groundbreaking research. Try our classification model and join us in safeguarding these enchanting creatures and their habitats."

  17. Home For Everyone Tracker Open Data

    • housing-data-portal-boise.hub.arcgis.com
    • opendata.cityofboise.org
    • +1more
    Updated Jul 5, 2023
    Cite
    City of Boise, Idaho (2023). Home For Everyone Tracker Open Data [Dataset]. https://housing-data-portal-boise.hub.arcgis.com/datasets/home-for-everyone-tracker-open-data
    Explore at:
    Dataset updated
    Jul 5, 2023
    Dataset authored and provided by
    City of Boise, Idaho
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A Home for Everyone is the City of Boise’s (city) initiative to address needs in the community by supporting the development and preservation of housing affordable to residents on Boise budgets. A Home for Everyone has three core goals: produce new homes affordable at 60% of area median income, create permanent supportive housing for households experiencing homelessness, and preserve homes affordable at 80% of area median income. This dataset includes information about all homes that count toward the city’s Home for Everyone goals.

    While the “produce affordable housing” and “create permanent supportive housing” goals are focused on supporting the development of new housing, the preservation goal is focused on maintaining the affordability of existing housing. As a result, many of the data fields related to new development are not relevant to preservation projects. For example, zoning incentives are only applicable to new construction projects.

    Data may be unavailable for some projects and details are subject to change until construction is complete. Addresses are excluded for projects with fewer than five homes for privacy reasons.

    The dataset includes details on the number of “homes”. We use the word "home" to refer to any single unit of housing regardless of size, type, or whether it is rented or owned. For example, a building with 40 apartments counts as 40 homes, and a single detached house counts as one home.

    The dataset includes details about the phase of each project when a project involves constructing new housing. The process for building a new development is as follows: First, one must receive approval from the city’s Planning Division, which is also known as being “entitled.” Next, one must apply for and receive a permit from the city’s Building Division before beginning construction. Finally, once construction is complete and all city inspections have been passed, the building can be occupied.

    To contribute to a city goal, homes must meet affordability requirements based on a standard called area median income. The city considers housing affordable if it is targeted to households earning at or below 80% of area median income. For a three-person household in Boise, that equates to an annual income of $60,650 and a monthly housing cost of $1,516. Deeply affordable housing sets the income limit at 60% of area median income, or even 30% of area median income. Boise defines a home as “affordable” if it is rented or sold at or below 80% of area median income; see Boise Income Guidelines for more details.

    The dataset contains the following fields:

    Project Name – The name of each project. If a row is related to the Home Improvement Loan program, that row aggregates data for all homes that received a loan in that quarter or year.
    Primary Address – The primary address for the development. Some developments encompass multiple addresses.
    Project Address(es) – All addresses that are included as part of the development project.
    Parcel Number(s) – The identification code for all parcels of land included in the development.
    Acreage – The number of acres for the parcel(s) included in the project.
    Planning Permit Number – The identification code for all permits the development has received from the Planning Division for the City of Boise. The number and types of permits required vary based on the location and type of development.
    Date Entitled – The date a development was approved by the city’s Planning Division.
    Building Permit Number – The identification code for all permits the development has received from the city’s Building Division.
    Date Building Permit Issued – Building permits are required to begin construction on a development.
    Date Final Certificate of Occupancy Issued – A certificate of occupancy is the final approval by the city for a development, once construction is complete. Not all developments require a certificate of occupancy.
    Studio – The number of homes in the development that are classified as a studio. A studio is typically defined as a home in which there is no separate bedroom; a single room serves as both a bedroom and a living room.
    1-Bedroom – The number of homes in a development that have exactly one bedroom.
    2-Bedroom – The number of homes in a development that have exactly two bedrooms.
    3-Bedroom – The number of homes in a development that have exactly three bedrooms.
    4+ Bedroom – The number of homes in a development that have four or more bedrooms.
    # of Total Project Units – The total number of homes in the development.
    # of units toward goals – The number of homes in a development that contribute to either the city’s goal to produce housing affordable at or under 60% of area median income, or the city’s goal to create permanent supportive housing for households experiencing homelessness.
    Rent at or under 60% AMI – The number of homes in a development that are required to be rented at or below 60% of area median income.
    Rent 61-80% AMI – The number of homes in a development that are required to be rented at between 61% and 80% of area median income.
    Rent 81-120% AMI – The number of homes in a development that are required to be rented at between 81% and 120% of area median income.
    Own at or under 60% AMI – The number of homes in a development that are required to be sold at or below 60% of area median income.
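
    As a quick check of the figures above (assuming the common 30%-of-gross-income standard for housing cost, which this description does not state explicitly), the cited monthly figure can be reproduced as follows:

        # Worked example, assuming housing is "affordable" at 30% of gross income.
        def affordable_monthly_cost(annual_income: float, share: float = 0.30) -> int:
            """Monthly housing cost considered affordable at a given annual income."""
            return round(annual_income / 12 * share)

        # Three-person household at 80% of area median income in Boise:
        # affordable_monthly_cost(60_650) -> 1516, matching the $1,516 cited above.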

  18. The Pizza Problem

    • kaggle.com
    zip
    Updated Feb 8, 2019
    Cite
    Jeremy Jeanne (2019). The Pizza Problem [Dataset]. https://www.kaggle.com/jeremyjeanne/google-hashcode-pizza-training-2019
    Explore at:
    zip (178852 bytes). Available download formats.
    Dataset updated
    Feb 8, 2019
    Authors
    Jeremy Jeanne
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Problem description

    Pizza

    The pizza is represented as a rectangular, 2-dimensional grid of R rows and C columns. The cells within the grid are referenced using a pair of 0-based coordinates [r, c], denoting respectively the row and the column of the cell.

    Each cell of the pizza contains either:

    mushroom, represented in the input file as M
    tomato, represented in the input file as T
    

    Slice

    A slice of pizza is a rectangular section of the pizza delimited by two rows and two columns, without holes. The slices we want to cut out must contain at least L cells of each ingredient (that is, at least L cells of mushroom and at least L cells of tomato) and at most H cells of any kind in total - surprising as it is, there is such a thing as too much pizza in one slice. The slices being cut out cannot overlap. The slices being cut do not need to cover the entire pizza.

    Goal

    The goal is to cut correct slices out of the pizza, maximizing the total number of cells in all slices.

    Input data set

    The input data is provided as a data set file: a plain text file containing exclusively ASCII characters, with lines terminated by a single '\n' character at the end of each line (UNIX-style line endings).

    File format

    The file consists of:

    one line containing the following natural numbers separated by single spaces:
    R (1 ≤ R ≤ 1000) is the number of rows
    C (1 ≤ C ≤ 1000) is the number of columns
    L (1 ≤ L ≤ 1000) is the minimum number of each ingredient cells in a slice
    H (1 ≤ H ≤ 1000) is the maximum total number of cells of a slice
    


    R lines describing the rows of the pizza (one after another). Each of these lines contains C characters describing the ingredients in the cells of the row (one cell after another). Each character is either ‘M’ (for mushroom) or ‘T’ (for tomato).

    Example

    3 5 1 6
    TTTTT
    TMMMT
    TTTTT
    

    3 rows, 5 columns, min 1 of each ingredient per slice, max 6 cells per slice

    Example input file.
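
    As an illustration (not part of the original challenge materials), the input format above can be parsed in a few lines; the function and file names here are my own:

        # Parse an input data set: first line holds R C L H, then R rows of M/T characters.
        from typing import List, Tuple

        def parse_pizza(path: str) -> Tuple[int, int, int, int, List[str]]:
            with open(path) as fh:
                r, c, l, h = map(int, fh.readline().split())
                grid = [fh.readline().strip() for _ in range(r)]
            return r, c, l, h, grid

        # With the 3x5 example above saved as "example.in":
        # parse_pizza("example.in") -> (3, 5, 1, 6, ["TTTTT", "TMMMT", "TTTTT"])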

    Submissions

    File format

    The file must consist of:

    one line containing a single natural number S (0 ≤ S ≤ R × C), representing the total number of slices to be cut,
    S lines describing the slices. Each of these lines must contain the following natural numbers separated by single spaces:
    r1, c1, r2, c2, describing a slice of pizza delimited by the rows r1 and r2 (0 ≤ r1, r2 < R) and the columns c1 and c2 (0 ≤ c1, c2 < C), including the cells of the delimiting rows and columns. The rows (r1 and r2) can be given in any order. The columns (c1 and c2) can be given in any order too.
    

    Example

    3
    0 0 2 1
    0 2 2 2
    0 3 2 4
    

    3 slices.

    First slice between rows (0,2) and columns (0,1).
    Second slice between rows (0,2) and columns (2,2).
    Third slice between rows (0,2) and columns (3,4).
    Example submission file.
    

    © Google 2017, All rights reserved.

    Slices described in the example submission file are marked in green, orange and purple (figure not reproduced here).

    Validation

    For the solution to be accepted:

    the format of the file must match the description above,
    each cell of the pizza must be included in at most one slice,
    each slice must contain at least L cells of mushroom,
    each slice must contain at least L cells of tomato,
    total area of each slice must be at most H
    

    Scoring

    The submission gets a score equal to the total number of cells in all slices. Note that there are multiple data sets representing separate instances of the problem. The final score for your team is the sum of your best scores on the individual data sets.

    Scoring example

    The example submission file given above cuts the slices of 6, 3 and 6 cells, earning 6 + 3 + 6 = 15 points.
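
    The validation and scoring rules above can also be checked programmatically. The sketch below (again illustrative, not an official checker) rejects invalid submissions and otherwise returns the score:

        # Validate a submission against the rules above and return its score.
        from typing import List, Tuple

        Slice = Tuple[int, int, int, int]  # r1, c1, r2, c2 (inclusive bounds)

        def score_submission(grid: List[str], l_min: int, h_max: int,
                             slices: List[Slice]) -> int:
            rows, cols = len(grid), len(grid[0])
            covered = set()
            total = 0
            for r1, c1, r2, c2 in slices:
                r1, r2 = sorted((r1, r2))
                c1, c2 = sorted((c1, c2))
                if not (0 <= r1 and r2 < rows and 0 <= c1 and c2 < cols):
                    raise ValueError("slice out of bounds")
                cells = [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]
                if len(cells) > h_max:
                    raise ValueError("slice larger than H cells")
                mushrooms = sum(grid[r][c] == "M" for r, c in cells)
                if mushrooms < l_min or len(cells) - mushrooms < l_min:
                    raise ValueError("fewer than L cells of an ingredient")
                if covered & set(cells):
                    raise ValueError("slices overlap")
                covered.update(cells)
                total += len(cells)
            return total

        # Example: score_submission(["TTTTT", "TMMMT", "TTTTT"], 1, 6,
        #     [(0, 0, 2, 1), (0, 2, 2, 2), (0, 3, 2, 4)]) returns 6 + 3 + 6 = 15.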

  19. Scientific Data Management Systems Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 29, 2025
    Cite
    Growth Market Reports (2025). Scientific Data Management Systems Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/scientific-data-management-systems-market
    Explore at:
    pdf, csv, pptx. Available download formats.
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Scientific Data Management Systems Market Outlook



    According to our latest research, the global scientific data management systems (SDMS) market size reached USD 4.3 billion in 2024, reflecting robust adoption across multiple scientific disciplines and industries. The market is projected to grow at a compound annual growth rate (CAGR) of 12.1% from 2025 to 2033, reaching an estimated USD 12.1 billion by 2033. This remarkable growth trajectory is primarily driven by the increasing complexity and volume of scientific data, the growing demand for integrated data management solutions, and the critical need for compliance with regulatory standards in research-intensive sectors such as pharmaceuticals, biotechnology, and healthcare.



    The expanding volume of scientific data, generated by advanced research methodologies and sophisticated laboratory instruments, is a primary growth driver for the scientific data management systems market. Organizations across life sciences, environmental sciences, and healthcare are generating and acquiring massive datasets, often in disparate formats and from multiple sources. This data deluge necessitates robust SDMS platforms that can efficiently capture, store, organize, and retrieve data, ensuring data integrity and facilitating seamless collaboration among research teams. Furthermore, the integration of artificial intelligence and machine learning capabilities into SDMS solutions is enhancing the ability to analyze complex datasets, extract actionable insights, and accelerate scientific discoveries, further fueling market expansion.



    Another significant growth factor for the scientific data management systems market is the stringent regulatory landscape governing scientific research and data management. Regulatory bodies such as the FDA, EMA, and other international agencies mandate rigorous data documentation, traceability, and security protocols, especially in drug development, clinical trials, and genomics research. SDMS platforms play a pivotal role in ensuring compliance with these regulations by providing audit trails, electronic signatures, and secure data storage. The increasing focus on data privacy, reproducibility of research, and adherence to Good Laboratory Practice (GLP) and Good Clinical Practice (GCP) guidelines are compelling organizations to invest in advanced SDMS solutions to mitigate compliance risks and maintain competitive advantage.



    The growing adoption of cloud-based and hybrid deployment models is further propelling the scientific data management systems market. Cloud-based SDMS solutions offer scalability, flexibility, and cost-effectiveness, enabling organizations to manage large volumes of data without the need for significant infrastructure investments. Hybrid models, which combine on-premises and cloud capabilities, are gaining traction among organizations seeking to balance data security with operational efficiency. The increasing digital transformation initiatives across the scientific community, coupled with the rising trend of collaborative research, are creating a fertile environment for SDMS vendors to innovate and expand their offerings, driving sustained market growth over the forecast period.



    From a regional perspective, North America currently dominates the scientific data management systems market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of leading pharmaceutical and biotechnology companies, well-established research infrastructure, and a strong emphasis on regulatory compliance. Europe follows closely, driven by significant investments in life sciences research and increasing adoption of digital technologies in academic and clinical settings. The Asia Pacific region is emerging as a high-growth market, supported by expanding research activities, government initiatives to modernize healthcare infrastructure, and growing collaborations between academic institutions and industry players. These regional dynamics underscore the global nature of the SDMS market and highlight the diverse opportunities for stakeholders across different geographies.



    In the realm of life sciences, the Life Sciences Controlled Substance Ordering System is becoming an integral component for organizations dealing with regulated substances. This system is designed to streamline the ordering process of controlled substances.

  20. Data from: precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions

    • data.nist.gov
    • catalog.data.gov
    Updated Jan 26, 2021
    + more versions
    Cite
    Nathanael David Olson (2021). precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions [Dataset]. http://doi.org/10.18434/mds2-2336
    Explore at:
    Dataset updated
    Jan 26, 2021
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Nathanael David Olson
    License

    https://www.nist.gov/open/license

    Description

    The precisionFDA Truth Challenge V2 aimed to assess the state-of-the-art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods out-performed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants. This dataset includes the fastq files provided to participants, the submitted variant callset as vcfs, and the benchmarking results, along with challenge submission metadata.
