Data from influenza A virus (IAV) infected ferrets (Mustela putorius furo) provides invaluable information towards the study of novel and emerging viruses that pose a threat to human health. This gold standard animal model can recapitulate many clinical signs of infection present in IAV-infected humans, supports virus replication of human and zoonotic strains without prior adaptation, and permits evaluation of virus transmissibility by multiple modes. While ferrets have been employed in risk assessment settings for >20 years, results from this work are typically reported in discrete stand-alone publications, making aggregation of raw data from this work over time nearly impossible. Here, we describe a dataset of 333 ferrets inoculated with 107 unique IAV, conducted by a single research group (NCIRD/ID/IPB/Pathogenesis Laboratory Team) under a uniform experimental protocol. This collection of ferret tissue viral titer data on a per-individual ferret level represents a companion dataset to 'An aggregated dataset of serially collected influenza A virus morbidity and titer measurements from virus-infected ferrets'. However, care must be taken when combining datasets at the level of individual animals (see PMID 40245007 for guidance on best practices for comparing datasets comprised of serially collected and fixed-timepoint in vivo-generated data).

See the publications using and describing these data for more information:

Kieran TJ, Sun X, Tumpey TM, Maines TR, Belser JA. 202X. Spatial variation of infectious virus load in aggregated day 3 post-inoculation respiratory tract tissues from influenza A virus-infected ferrets. Under peer review.

Kieran TJ, Sun X, Maines TR, Belser JA. 2025. Predictive models of influenza A virus lethal disease: insights from ferret respiratory tract and brain tissues. Scientific Reports, in press.

Bullock TA, Pappas C, Uyeki TM, Brock N, Kieran TJ, Olsen SJ, Davis CD, Tumpey TM, Maines TR, Belser JA. 2025. The (digestive) path less traveled: influenza A virus and the gastrointestinal tract. mBio, in press.

Kieran TJ, Sun X, Maines TR, Beauchemin CAA, Belser JA. 2024. Exploring associations between viral titer measurements and disease outcomes in ferrets inoculated with 125 contemporary influenza A viruses. J Virol, 98: e01661-23.

Related dataset:

Kieran TJ, Sun X, Creager HM, Tumpey TM, Maines TR, Belser JA. 2024. An aggregated dataset of serial morbidity and titer measurements from influenza A virus-infected ferrets. Sci Data, 11(1):510. https://doi.org/10.1038/s41597-024-03256-6
https://data.cdc.gov/National-Center-for-Immunization-and-Respiratory-D/An-aggregated-dataset-of-serially-collected-influe/cr56-k9wj/about_data

Other relevant publications for best practices on data handling and interpretation:

Kieran TJ, Maines TR, Belser JA. 2025. Eleven quick tips to unlock the power of in vivo data science. PLoS Comput Biol, 21(4):e1012947. https://doi.org/10.1371/journal.pcbi.1012947

Kieran TJ, Maines TR, Belser JA. 2024. Data alchemy, from lab to insight: Transforming in vivo experiments into data science gold. PLoS Pathog, 20(8):e1012460. https://doi.org/10.1371/journal.ppat.1012460
https://creativecommons.org/publicdomain/zero/1.0/
By Peevski (From Huggingface) [source]
The OpenLeecher/GPT4-10k dataset is a comprehensive collection of 100 diverse conversations, presented in text format, revolving around a wide range of topics. These conversations cover various domains such as coding, debugging, storytelling, and science. Aimed at facilitating training and analysis purposes for researchers and developers alike, this dataset offers an extensive array of conversation samples.
Each conversation in this dataset delves into a different subject, from coding techniques, debugging strategies, and storytelling methods to concepts such as spatial and logical thinking. The conversations also touch on scientific fields, including chemistry, physics, and biology, and extend to discussions of law.
By providing this assortment of conversations from multiple domains and disciplines in a single train.csv file, the dataset makes it easy to explore and analyze these dialogue examples. The compilation serves as a valuable resource for studying coding practices alongside scientific discussions spanning multiple fields.
Introduction:
Understanding the Dataset Structure: The dataset consists of a CSV file named 'train.csv'. When examining the file's columns using the software or programming language of your choice (e.g., Python), you will find a 'chat' column containing text data that represents conversations between two or more participants.
Exploring Different Topics: The dataset covers a broad spectrum of subjects, including coding techniques, debugging strategies, storytelling methods, spatial thinking, logical thinking, chemistry, physics, biology, and law. Each conversation touches on one or more of the following areas:
- Coding Techniques: Discover discussions on various programming concepts and best practices.
- Debugging Strategies: Explore conversations related to identifying and fixing software issues.
- Storytelling Methods: Dive into dialogues about effective storytelling techniques in different contexts.
- Spatial Thinking: Engage with conversations that involve developing spatial reasoning skills for problem-solving.
- Logical Thinking: Learn from discussions focused on enhancing logical reasoning abilities related to different domains.
- Chemistry
- Physics
- Biology
- Law
Analyzing Conversations: Leverage natural language processing (NLP) tools and techniques, such as sentiment analysis, to study the conversations; a minimal loading example is shown under 'Accessible Code Examples' below.
Accessible Code Examples
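As a starting point, here is a minimal sketch using pandas; it assumes the file is named train.csv and that the conversation text lives in the 'chat' column described above.

```python
import pandas as pd

# Load the conversation dataset (assumes train.csv is in the working directory)
df = pd.read_csv("train.csv")

# Basic exploration
print("Number of conversations:", len(df))
print("Columns:", list(df.columns))

# Peek at the beginning of the first conversation in the 'chat' column
print(df["chat"].iloc[0][:500])
```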
Maximize Training Efficiency:
Taking Advantage of Diversity:
Creating New Applications:
Conclusion:
- Natural Language Processing Research: Researchers can leverage this dataset to train and evaluate natural language processing models, particularly in the context of conversational understanding and generation. The diverse conversations on coding, debugging, storytelling, and science can provide valuable insights into modeling human-like conversation patterns.
- Chatbot Development: The dataset can be utilized for training chatbots or virtual assistants that can engage in conversations related to coding, debugging, storytelling, and science. By exposing the chatbot to a wide range of conversation samples from different domains, developers can ensure that their chatbots are capable of providing relevant and accurate responses.
- Domain-specific Intelligent Assistants: Organizations or individuals working in fields such as coding education or scientific research may use this dataset to develop intelligent assistants tailored specifically for these domains. These assistants can help users navigate complex topics by answering questions related to coding techniques, debugging strategies, storytelling methods, or scientific concepts. Overall, 'train.csv' provides a rich resource for researchers and developers interested in building conversational AI systems with knowledge across multiple domains, including legal matters.
If you use this dataset in your research, please credit the original authors. Data Source
**Li...
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of physical illnesses that are linked with obesity and inactivity. Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to:
- Asthma (in persons of all ages)
- Cancer (in persons of all ages)
- Chronic kidney disease (in adults aged 18+)
- Coronary heart disease (in persons of all ages)
- Diabetes mellitus (in persons aged 17+)
- Hypertension (in persons of all ages)
- Stroke and transient ischaemic attack (in persons of all ages)
This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
For each of the above illnesses, the percentage of each MSOA's population with that illness was estimated. This was achieved by calculating a weighted average based on:
- The percentage of the MSOA area that was covered by each GP practice's catchment area
- Of the GPs that covered part of that MSOA: the percentage of patients registered with each GP that have that illness
The estimated percentage of each MSOA's population with each illness was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with each illness, within the relevant age range.
For each illness, each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have that illness
B) the NUMBER of people within that MSOA who are estimated to have that illness
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA predicted to have that illness, compared to other MSOAs. In other words, those are areas where a large number of people are predicted to suffer from an illness, and where those people make up a large percentage of the population, indicating there is a real issue with that illness within the population and the investment of resources to address that issue could have the greatest benefits.
The scores for each of the 7 illnesses were added together then converted to a relative score between 1 and 0 (1 = worst, 0 = best), to give an overall score for each MSOA: a score close to 1 would indicate that an area has high predicted levels of all obesity/inactivity-related illnesses, and these are areas where the local population could benefit the most from interventions to address those illnesses. A score close to 0 would indicate very low predicted levels of obesity/inactivity-related illnesses and therefore interventions might not be required.

LIMITATIONS
1. GPs do not have catchments that are mutually exclusive from each other: they overlap, with some geographic areas being covered by 30+ practices. This dataset should be viewed in combination with the 'Health and wellbeing statistics (GP-level, England): Missing data and potential outliers' dataset to identify where there are areas that are covered by multiple GP practices but at least one of those GP practices did not provide data. Results of the analysis in these areas should be interpreted with caution, particularly if the levels of obesity/inactivity-related illnesses appear to be significantly lower than the immediate surrounding areas.
2. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the 'Health and wellbeing statistics (GP-level, England): Missing data and potential outliers' dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
3. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the 'Levels of obesity, inactivity and associated illnesses: Summary (England)' dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
4. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice's catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of obesity/inactivity-related illnesses, rather than interpreting the boundaries between areas as 'hard' boundaries that mark definite divisions between areas with differing levels of these illnesses.

TO BE VIEWED IN COMBINATION WITH
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
- Health and wellbeing statistics (GP-level, England): Missing data and potential outliers

DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the 'Levels of obesity, inactivity and associated illnesses: Summary (England)' dataset.

DATA SOURCES
This dataset was produced using:
Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.

COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.

CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
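To make the weighted-average and scoring steps described under ANALYSIS METHODOLOGY concrete for a single illness, here is a minimal sketch; the file names and column names are hypothetical, and the published analysis applied this approach to all seven illnesses before summing and rescaling the scores.

```python
import pandas as pd

# Hypothetical inputs (one illness):
#   msoa_gp_overlaps.csv: one row per (MSOA, GP practice) pair, with
#     coverage_fraction = share of the MSOA's area inside that GP's catchment
#     illness_pct       = share of that GP's registered patients with the illness
#   msoa_population.csv: msoa, population (ONS mid-year estimate)
overlaps = pd.read_csv("msoa_gp_overlaps.csv")
populations = pd.read_csv("msoa_population.csv")

# Weighted-average illness percentage per MSOA (weights = catchment coverage)
overlaps["weighted_pct"] = overlaps["coverage_fraction"] * overlaps["illness_pct"]
grouped = overlaps.groupby("msoa")
msoa = (grouped["weighted_pct"].sum() / grouped["coverage_fraction"].sum()).rename("est_pct").reset_index()

# Estimated number of people with the illness in each MSOA
msoa = msoa.merge(populations, on="msoa")
msoa["est_count"] = msoa["est_pct"] / 100 * msoa["population"]

# Relative score (1 = worst, 0 = best): rescale percentage and count to 0-1,
# average them, then rescale the average to 0-1 again
def rescale(s):
    return (s - s.min()) / (s.max() - s.min())

msoa["illness_score"] = rescale((rescale(msoa["est_pct"]) + rescale(msoa["est_count"])) / 2)
```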
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains 4 videos covering different entry scenarios: 1. one person entering and one exiting at the same time; 2. two people entering and one exiting at the same time; 3. two people entering and two exiting at the same time; 4. multiple people entering and exiting at the same time.
The dataset can be used to develop and evaluate visitor-counting systems for historical places, schools, hospitals, and similar venues.
This dataset can be used under an open access license. To cite this dataset, refer to: https://doi.org/10.21123/bsj.2024.10540
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data correspond to the posts (questions and answers) retrieved by querying for posts related to the tag 'machine learning' and the phrase 'best practice(s)'. The data were used as the basis for a study, currently under review, on machine learning best practices as discussed by practitioners in question-and-answer communities such as Stack Exchange. The information from each type of post (i.e., questions and answers) is presented in multiple formats (i.e., .txt, .csv, and .xlsx).
Answers - Variables
Questions - Variables
This dataset is a subset of the Stack Exchange dump of 03.2021 (https://archive.org/details/stackexchange_20210301) in which a series of filters were applied to obtain the data used in the study.
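As an illustration of this kind of filtering, here is a minimal sketch; the column names follow the public Stack Exchange dump schema (Id, PostTypeId, ParentId, Tags, Body), but the file name and exact criteria are placeholders rather than the study's actual filters.

```python
import pandas as pd

# Hypothetical flat export of the Posts table from the Stack Exchange dump
posts = pd.read_csv("Posts.csv")

# Questions (PostTypeId == 1) tagged 'machine-learning' that mention "best practice(s)"
questions = posts[
    (posts["PostTypeId"] == 1)
    & posts["Tags"].fillna("").str.contains("machine-learning")
    & posts["Body"].fillna("").str.contains(r"best practice(?:s)?", case=False, regex=True)
]

# Their answers can then be pulled in via ParentId
answers = posts[(posts["PostTypeId"] == 2) & posts["ParentId"].isin(questions["Id"])]
print(len(questions), "questions and", len(answers), "answers retained")
```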
This repository contains spatiotemporal data from many official sources for 2019-Novel Coronavirus beginning 2019 in Hubei, China ("nCoV_2019").
You may not use this data for commercial purposes. If there is a need for commercial use of the data, please contact Metabiota at info@metabiota.com to obtain a commercial use license.
The incidence data are in a CSV file format. One row in an incidence file contains a piece of epidemiological data extracted from the specified source.
The file contains data from multiple sources at multiple spatial resolutions in cumulative and non-cumulative formats by confirmation status. To select a single time series of case or death data, filter the incidence dataset by source, spatial resolution, location, confirmation status, and cumulative flag.
Data are collected, structured, and validated by Metabiota’s digital surveillance experts. The data structuring process is designed to produce the most reliable estimates of reported cases and deaths over space and time. The data are cleaned and provided in a uniform format such that information can be compared across multiple sources. Data are collected at the time of publication in the highest geographic and temporal resolutions available in the original report.
This repository is intended to provide a single access point for data from a wide range of data sources. Data will be updated periodically with the latest epidemiological data. Metabiota maintains a database of epidemiological information for over two thousand high-priority infectious disease events. Please contact us (info@metabiota.com) if you are interested in licensing the complete dataset.
Reporting sources provide either cumulative incidence, non-cumulative incidence, or both. If the source only provides a non-cumulative incidence value, the cumulative values are inferred using prior reports from the same source. Use the CUMULATIVE FLAG variable to subset the data to cumulative (TRUE) or non-cumulative (FALSE) values.
The incidence datasets include the confirmation status of cases and deaths when this information is provided by the reporting source. Subset the data by the CONFIRMATION_STATUS variable to either TOTAL, CONFIRMED, SUSPECTED, or PROBABLE to obtain the data of your choice.
Total incidence values include confirmed, suspected, and probable incidence values. If a source only provides suspected, probable, or confirmed incidence, the total incidence is inferred to be the sum of the provided values. If the report does not specify confirmation status, the value is included in the "total" confirmation status value.
The data provided under the "Metabiota Composite Source" often does not include suspected incidence due to inconsistencies in reporting cases and deaths with this confirmation status.
The incidence datasets include cases and deaths. Subset the data to either CASE or DEATH using the OUTCOME variable. It should be noted that deaths are included in case counts.
Data are provided at multiple spatial resolutions. Data should be subset to a single spatial resolution of interest using the SPATIAL_RESOLUTION variable.
Information is included at the finest spatial resolution provided in the original epidemic report. We also aggregate incidence to coarser geographic resolutions. For example, if a source only provides data at the province-level, then province-level data are included in the dataset as well as country-level totals. Users should avoid summing all cases or deaths in a given country for a given date without specifying the SPATIAL_RESOLUTION value. For example, subset the data to SPATIAL_RESOLUTION equal to "AL0" in order to view only the aggregated country level data.
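For example, a single cumulative time series of confirmed, country-level case counts from the composite source could be extracted roughly as follows; the file name is a placeholder and the column names are written as the variables are described above, so they should be checked against the actual CSV header.

```python
import pandas as pd

incidence = pd.read_csv("ncov_2019_incidence.csv")  # placeholder file name

# One time series: cumulative confirmed cases, country level (AL0), composite source
ts = incidence[
    (incidence["SOURCE"] == "Metabiota Composite Source")
    & (incidence["SPATIAL_RESOLUTION"] == "AL0")
    & (incidence["CONFIRMATION_STATUS"] == "CONFIRMED")
    & (incidence["OUTCOME"] == "CASE")
    & (incidence["CUMULATIVE_FLAG"] == True)  # may be stored as a string flag in the raw file
]
print(ts.head())
```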
There are differences in administrative division naming practices by country. Administrative levels in this dataset are defined using the Google Geolocation API (https://developers.google.com/maps/documentation/geolocation/). For example, the data for the 2019-nCoV from one source provides information for the city of Beijing, which Google Geolocations indicates is a “locality.” Beijing is also the name of the municipality where the city Beijing is located. Thus, the 2019-nCoV dataset includes rows of data for both the city Beijing, as well as the municipality of the same name. If additional cities in the Beijing municipality reported data, those data would be aggregated with the city Beijing data to form the municipality Beijing data.
Data sources in this repository were selected to provide comprehensive spatiotemporal data for each outbreak. Data from a specific source can be selected using the SOURCE variable.
In addition to the original reporting sources, Metabiota compiles multiple sources to generate the most comprehensive view of an outbreak. This compilation is stored in the database under the source name “Metabiota Composite Source.” The purpose of generating this new view of the outbreak is to provide the most accurate and precise spatiotemporal data for the outbreak. At this time, Metabiota does not incorporate unofficial - including media - sources into the “Metabiota Composite Source” dataset.
Data are collected by a team of digital surveillance experts and undergo many quality assurance tests. After data are collected, they are independently verified by at least one additional analyst. The data also pass an automated validation program to ensure data consistency and integrity.
Creative Commons License Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
This is a human-readable summary of the Legal Code.
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Noncommercial — You may not use this work for commercial purposes.
Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
With the understanding that:
Waiver — Any of the above conditions can be waived if you get permission from the copyright holder.
Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
Other Rights — In no way are any of the following rights affected by the license: Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations; The author's moral rights; Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights. Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
For details and the full license text, see http://creativecommons.org/licenses/by-nc-sa/3.0/
Metabiota shall in no event be liable for any decision taken by the user based on the data made available. Under no circumstances, shall Metabiota be liable for any damages (whatsoever) arising out of the use or inability to use the database. The entire risk arising out of the use of the database remains with the user.
Automated Weather Station and AWS-like networks are the primary source of surface-level meteorological data in remote polar regions. These networks have developed organically and independently, and deliver data to researchers in idiosyncratic ASCII formats that hinder automated processing and intercomparison among networks. Moreover, station tilt causes significant biases in polar AWS measurements of radiation and wind direction. Researchers, network operators, and data centers would benefit from AWS-like data in a common format, amenable to automated analysis, and adjusted for known biases. This project addresses these needs by developing a scientific software workflow called "Justified AWS" (JAWS) to ingest Level 2 (L2) data in the multiple formats now distributed, harmonize it into a common format, and deliver value-added Level 3 (L3) output suitable for distribution by the network operator, analysis by the researcher, and curation by the data center.
Polar climate researchers currently face daunting problems including how to easily:
1. Automate analysis (subsetting, statistics, unit conversion) of AWS-like L2 ASCII data.
2. Combine or intercompare data and data quality from among unharmonized L2 datasets.
3. Adjust L2 data for biases such as AWS tilt angle and direction.
JAWS addresses these common issues by harmonizing AWS L2 data into a common format, and applying accepted methods to quantify quality and estimate biases. Specifically, JAWS enables users and network operators to:
1. Convert L2 data (usually ASCII tables) into a netCDF-based L3 format compliant with metadata conventions (Climate-Forecast and ACDD) that promote automated discovery and analysis.
2. Include value-added L3 features like the Retrospective, Iterative, Geometry-Based (RIGB) tilt angle and direction corrections, solar angles, and standardized quality flags.
3. Provide a scriptable API to extend the initial L2-to-L3 conversion to newer AWS-like networks and instruments.
Polar AWS network experts and NSIDC DAAC personnel, each with decades of experience, will help guide and deliberate the L3 conventions implemented in Stages 2-3. The project will start on July 1, 2017 at entry Technology Readiness Level 3 and will exit on June 30, 2019 at TRL 6. JAWS is now a heterogeneous collection of scripts and methods developed and validated at UCI over the past 15 years. At exit, JAWS will comprise three modular stages written in or wrapped by Python, installable by Conda: Stage 1 ingests and translates L2 data into netCDF. Stage 2 annotates the netCDF with CF and ACDD metadata. Stage 3 derives value-added scientific and quality information. The labor-intensive tasks include turning our heterogeneous workflow into a robust, standards-compliant, extensible workflow with an API based on best practices of modern scientific information systems and services. Implementation of Stages 1-2 may be straightforward though tedious due to the menagerie of L2 formats, instruments, and assumptions. The RIGB component of Stage 3 requires ongoing assimilation of ancillary NASA data (CERES, AIRS) and use of automated data transfer protocols (DAP, THREDDS). The immediate target recipient elements are polar AWS network managers, users, and data distributors. L2 borehole data suffers from similar interoperability issues, as does non-polar AWS data. Hence our L3 format will be extensible to global AWS and permafrost networks. JAWS will increase in situ data accessibility and utility, and enable new derived products (both are AIST goals).
The PI is a long-standing researcher, open source software developer, and educator who understands obstacles to harmonizing disparate datasets with NASA interoperability recommendations. Our team participates in relevant geoscience communities, including ESDS working groups, ESIP, AGU, and EarthCube.
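To make the Stage 1-2 idea concrete, here is a minimal sketch (not JAWS itself) that reads a hypothetical whitespace-delimited L2 ASCII table with pandas, converts it to netCDF with xarray, and attaches a few CF/ACDD-style attributes; the column names, units, and file names are assumptions.

```python
import pandas as pd
import xarray as xr

# Stage 1 (sketch): ingest a hypothetical L2 ASCII table
l2 = pd.read_csv(
    "aws_station_l2.txt",
    sep=r"\s+",
    names=["year", "day_of_year", "hour", "air_temp", "wind_speed", "wind_dir"],
)
ds = xr.Dataset.from_dataframe(l2)

# Stage 2 (sketch): annotate variables and dataset with CF/ACDD-style metadata
ds["air_temp"].attrs = {"standard_name": "air_temperature", "units": "degC"}
ds["wind_speed"].attrs = {"standard_name": "wind_speed", "units": "m s-1"}
ds.attrs = {
    "Conventions": "CF-1.6, ACDD-1.3",
    "title": "Example harmonized AWS output (illustrative only)",
    "summary": "Surface meteorology from a polar AWS, converted from L2 ASCII to netCDF.",
}

# Write the L3-style netCDF file
ds.to_netcdf("aws_station_l3.nc")
```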
This dataset is from a college assignment at NIT Bhopal, India, used to practice ML classification techniques: 1. Support Vector Machine (SVM); 2. K-Nearest Neighbour (KNN) classifier; 3. Principal Component Analysis (PCA) for dimensionality reduction.
You can try the tasks yourself.
Task 1: Using one of the training data sets, train an SVM classifier that separates the two classes. Classify the test data set using this SVM classifier. Compute the classification error and confusion matrix.
Task 2: Using one of the training data sets, predict the class labels of the test data points using a K-Nearest Neighbour classifier. Compute the classification error and confusion matrix.
Task 3: Import one of the training data files and the corresponding test data file. Combine the data from both files and apply PCA to reduce the dimension of the dataset from 2 to 1.
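As a rough starting point, here is a minimal scikit-learn sketch of the three tasks; the file names, and the assumption that the class label is stored in the last column, are placeholders rather than part of the assignment.

```python
import pandas as pd
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix

# Placeholder file names; assumes the class label is the last column
train = pd.read_csv("train1.csv")
test = pd.read_csv("test1.csv")
X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]
X_test, y_test = test.iloc[:, :-1], test.iloc[:, -1]

# Task 1: SVM classifier, classification error and confusion matrix
svm = SVC().fit(X_train, y_train)
pred = svm.predict(X_test)
print("SVM error:", 1 - accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))

# Task 2: K-Nearest Neighbour classifier
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
pred = knn.predict(X_test)
print("KNN error:", 1 - accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))

# Task 3: combine train and test features, reduce from 2 dimensions to 1 with PCA
X_all = pd.concat([X_train, X_test])
X_reduced = PCA(n_components=1).fit_transform(X_all)
print(X_reduced.shape)
```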
The Infant Feeding Survey (IFS) has been carried out every five years since 1975, in order to establish information about infant feeding practices. Government policy in the United Kingdom has consistently supported breastfeeding as the best way of ensuring a healthy start for infants and of promoting women's health. Current guidance on infant feeding is as follows:
Background: The evaluation of randomized trials for cancer screening involves special statistical considerations not found in therapeutic trials. Although some of these issues have been discussed previously, we present important recent and new methodologies.
Methods: Our emphasis is on simple approaches.
Results: We make the following recommendations: (1) use death from cancer as the primary endpoint, but review death records carefully and report all causes of death; (2) use a simple "causal" estimate to adjust for nonattendance and contamination occurring immediately after randomization; (3) use a simple adaptive estimate to adjust for dilution in follow-up after the last screen.
Conclusion: The proposed guidelines combine recent methodological work on screening endpoints and noncompliance/contamination with a new adaptive method to adjust for dilution in a study where follow-up continues after the last screen. These guidelines ensure good practice in the design and analysis of randomized trials of cancer screening.
Attribution-ShareAlike 2.0 (CC BY-SA 2.0): https://creativecommons.org/licenses/by-sa/2.0/
License information was derived automatically
The CLARISSA Cash Plus intervention represented an innovative social protection scheme for tackling social ills, including the worst forms of child labour (WFCL). A universal and unconditional 'cash plus' programme, it combined community mobilisation, case work, and cash transfers (CTs). It was implemented in a high-density, low-income neighbourhood in Dhaka to build individual, family, and group capacities to meet needs. This, in turn, was expected to lead to a corresponding decrease in deprivation and community-identified social issues that negatively affect wellbeing, including WFCL. Four principles underpinned the intervention: Unconditionality, Universality, Needs-centred and people-led, and Emergent and open-ended.

The intervention took place in Dhaka – North Gojmohol – over a 27-month period, between October 2021 and December 2023, to test and study the impact of providing unconditional and people-led support to everyone in a community. Cash transfers were provided between January and June 2023 in monthly instalments, plus one investment transfer in September 2023. A total of 1,573 households received cash, through the Upay mobile financial service. Cash was complemented by a 'plus' component, implemented between October 2021 and December 2023. Referred to as relational needs-based community organising (NBCO), a team of 20 community mobilisers (CMs) delivered case work at the individual and family level and community mobilisation at the group level. The intervention was part of the wider CLARISSA programme, led by the Institute of Development Studies (IDS) and funded by the UK's Foreign, Commonwealth & Development Office (FCDO). The intervention was implemented by Terre des hommes (Tdh) in Bangladesh and evaluated in collaboration with the BRAC Institute of Governance and Development (BIGD) and researchers from the University of Bath and the Open University, UK.

The evaluation of the CLARISSA Social Protection pilot was rooted in contribution analysis that combined multiple methods over more than three years, in line with emerging best practice guidelines for mixed methods research on children, work, and wellbeing. Quantitative research included bi-monthly monitoring surveys administered by the project's community mobilisers (CMs), including basic questions about wellbeing, perceived economic resilience, school attendance, etc. This was complemented by baseline, midline, and endline surveys, which collected information about key outcome indicators within the sphere of influence of the intervention, such as children's engagement with different forms of work and working conditions, with schooling and other activities, household living conditions and sources of income, and respondents' perceptions of change. Qualitative tools were used to probe topics and results of interest, as well as impact pathways. These included reflective diaries written by the community mobilisers; three rounds of focus group discussions (FGDs) with community members; three rounds of key informant interviews (KIIs) with members of case study households; and long-term ethnographic observation.

Quantitative Data
The quantitative evaluation of the CLARISSA Cash Plus intervention involved several data collection methods to gather information about household living standards, children's education and work, and social dynamics. The data collection included a pre-intervention census, four periodic surveys, and 13 rounds of bi-monthly monitoring surveys, all conducted between late 2020 and late 2023.

Details of each instrument are as follows:
Census: Conducted in October/November 2020 in the target neighbourhood of North Gojmohol (n=1,832) and the comparison neighbourhood of Balurmath (n=2,365)
Periodic surveys: Baseline (February 2021, n=752 in North Gojmohol), Midline 1 (before cash) (October 2022, n=771 in North Gojmohol), Midline 2 (after 6 rounds of cash) (July 2023, n=769 in North Gojmohol), and Endline (December 2023, n=750 in North Gojmohol and n=773 in Balurmath)
Bi-monthly monitoring data (13 rounds): Conducted between December 2021 and December 2023 in North Gojmohol (average of 1,400 households per round)

The present repository summarizes this information, organized as follows:
1.1 Bimonthly survey (household): Panel dataset comprising 13 rounds of bi-monthly monitoring data at the household level (average of 1,400 households per round, total of 18,379 observations)
1.2 Bimonthly survey (child): Panel dataset comprising 13 rounds of bi-monthly monitoring data at the child level (aged 5 to 16 at census) (average of 940 children per round, total of 12,213 observations)
2.1 Periodic survey (household): Panel dataset comprising 5 periodic surveys (census, baseline, midline 1, midline 2, endline) at the household level (average of 750 households per period, total of 3,762 observations)
2.2 Periodic survey (child): Panel dataset comprising 4 periodic surveys (baseline, midline 1, midline 2, endline) at the child level (average of 3,100 children per period, total of 12,417 observations)
3.0 Balurmath - North Gojmohol panel: Balanced panel dataset comprising 558 households in North Gojmohol and 773 households in Balurmath, observed both at 2020 census and 2023 endline (total of 2,662 observations)
4.0 Questionnaires: Original questionnaires for all datasets
All datasets are provided in Stata format (.dta) and Excel format (.xlsx) and are accompanied by their respective dictionary in Excel format (.xlsx).

Qualitative Data
The qualitative study was conducted in three rounds: the first round of IDIs and FGDs took place between December 2022 and January 2023; the second round took place from April to May 2023; and the third round took place from November to December 2023. KIIs were taken during the 2nd round of study in May 2023.

The sample size by round and instrument type is shown below:
Round | IDIs with children | IDIs with parents | IDIs with CMs | FGDs | KIIs
1st Round (12/2022 – 01/2023) | 30 | 26 | - | 6 | -
2nd Round (04/2023 – 05/2023) | 30 | 23 | - | 6 | 5
3rd Round (11/2023 – 12/2023) | 26 | 25 | 3 | 7 | -

The files in this archive contain the qualitative data and include six types of transcripts:
· 1.1 Interviews with children in case study households (IDI): 30 families in round 1, 30 in round 2, and 26 in round 3
· 1.2 Interviews with parents in case study households (IDI): 26 families in round 1, 23 in round 2, and 25 in round 3
· 1.3 Interviews with community mobilisers (IDI): 3 CMs in round 3
· 2.0 Key informant interviews (KII): 5 in round 2
· 3.0 Focus group discussions (FGD): 6 in round 1, 6 in round 2, and 7 in round 3
· 4.0 Community mobiliser micro-narratives (556 cases)
Additionally, this repository includes a comprehensive list of all qualitative data files ("List of all qualitative data+MC.xlsx").
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq) has opened new avenues of research in the genome-wide characterization of regulatory DNA-protein interactions at the genetic and epigenetic level. As a consequence, it has become the de facto standard for studies on the regulation of transcription, and literally thousands of data sets for transcription factors and cofactors in different conditions and species are now available to the scientific community. However, while pipelines and best practices have been established for the analysis of a single experiment, there is still no consensus on the best way to perform an integrated analysis of multiple datasets in the same condition, in order to identify the most relevant and widespread regulatory modules composed of different transcription factors and cofactors. We present here a computational pipeline for this task, which integrates peak summit colocalization, a novel statistical framework for the evaluation of its significance, and motif enrichment analysis. We show examples of its application to ENCODE data, which led to the identification of relevant regulatory modules composed of different factors, as well as the organization on DNA of the binding motifs responsible for their recruitment.
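The statistical framework for significance is specific to the paper, but the basic summit-colocalization idea can be sketched in a few lines; the window size and the input structure below are illustrative assumptions, not the pipeline's actual implementation.

```python
import bisect

def colocalized(summits_a, summits_b, window=100):
    """Count summits of factor A with at least one factor B summit within
    `window` bp on the same chromosome. Inputs are dicts mapping chromosome
    names to sorted lists of summit positions."""
    count = 0
    for chrom, positions_a in summits_a.items():
        positions_b = summits_b.get(chrom, [])
        for pos in positions_a:
            # First B summit at or beyond (pos - window); check it is within the window
            i = bisect.bisect_left(positions_b, pos - window)
            if i < len(positions_b) and positions_b[i] <= pos + window:
                count += 1
    return count

# Toy example: only the summit at 1000 has a partner within 100 bp
a = {"chr1": [1000, 5000, 9000]}
b = {"chr1": [1040, 8000]}
print(colocalized(a, b, window=100))  # -> 1
```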
Phytobenthos community data, a large portion of the data held are monitoring data submitted for the OSPAR CEMP and HELCOM COMBINE monitoring programmes and therefore follow specific monitoring programme guidelines.

AccConID=21 AccConstrDescription=This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licenses offered. Recommended for maximum dissemination and use of licensed materials. AccConstrDisplay=This dataset is licensed under a Creative Commons Attribution 4.0 International License. AccConstrEN=Attribution (CC BY) AccessConstraint=Attribution (CC BY) AccessConstraints=ICES Data Policy: https://www.ices.dk/data/Documents/ICES-Data-policy.pdf Acronym=None added_date=2018-04-16 09:46:22.350000 BrackishFlag=0 CDate=2017-07-10 cdm_data_type=Other CheckedFlag=0 Citation=ICES Environmental Database (DOME), Phytobenthos community. Available online at http://dome.ices.dk. ICES, Copenhagen. Consulted on yyyy-mm-dd. Comments=None ContactEmail=None Conventions=COARDS, CF-1.6, ACDD-1.3 CurrencyDate=None DasID=5755 DasOrigin=Monitoring: field survey DasType=Data DasTypeID=1 DateLastModified={'date': '2025-04-25 01:33:51.812809', 'timezone_type': 1, 'timezone': '+02:00'} DescrCompFlag=0 DescrTransFlag=0 Easternmost_Easting=27.92 EmbargoDate=None EngAbstract=Phytobenthos community data, a large portion of the data held are monitoring data submitted for the OSPAR CEMP and HELCOM COMBINE monitoring programmes and therefore follow specific monitoring programme guidelines. EngDescr=Data are quality assured using internal and external programmes. For example, the national laboratories that take part in monitoring programmes related to contaminants and biological effects that submit information to ICES, subscribe to the Quality Assurance of Information for Marine Environmental Monitoring in Europe (QUASIMEME) or the Biological Effects Quality Assurance in Monitoring Programmes (BEQUALM) inter-laboratory proficiency-testing schemes and perform internal quality assurance. ICES operates through a network of scientific expert and advisory groups. These groups, and the processes they feed into, act as a quality check on the marine evidence, both in terms of how the evidence was gathered and how the evidence has been subsequently treated. The groups, in cooperation with regional programmes under the Regional Sea Conventions, set standards and guidelines for the collection, transmission and analysis of these data. In addition, the ICES Secretariat provides supplementary quality assurance through its internal programmes related to the different types of marine data collection datasets, which is fedback to the participating national and regional programmes. These internal and external programmes and procedures have been established over a period of 30 or more years. They continue to evolve and strive to reflect the best available practices in the collection and treatment of marine data relevant to the ICES community.
FreshFlag=0 geospatial_lat_max=62.88 geospatial_lat_min=54.28 geospatial_lat_units=degrees_north geospatial_lon_max=27.92 geospatial_lon_min=11.33 geospatial_lon_units=degrees_east infoUrl=None InputNotes=None institution=ICES License=https://creativecommons.org/licenses/by/4.0/ Lineage=Prior to publication data undergo quality control checked which are described in https://github.com/EMODnet/EMODnetBiocheck?tab=readme-ov-file#understanding-the-output MarineFlag=1 modified_sync=2021-02-04 00:00:00 Northernmost_Northing=62.88 OrigAbstract=None OrigDescr=None OrigDescrLang=None OrigDescrLangNL=None OrigLangCode=None OrigLangCodeExtended=None OrigLangID=None OrigTitle=None OrigTitleLang=None OrigTitleLangCode=None OrigTitleLangID=None OrigTitleLangNL=None Progress=In Progress PublicFlag=1 ReleaseDate=Apr 16 2018 12:00AM ReleaseDate0=2018-04-16 RevisionDate=None SizeReference=None sourceUrl=(local files) Southernmost_Northing=54.28 standard_name_vocabulary=CF Standard Name Table v70 StandardTitle=ICES Phytobenthos community dataset StatusID=1 subsetVariables=ScientificName,BasisOfRecord,YearCollected,MonthCollected,DayCollected,aphia_id TerrestrialFlag=0 time_coverage_end=2016-10-26T01:00:00Z time_coverage_start=2007-07-14T01:00:00Z UDate=2025-04-17 VersionDate=None VersionDay=None VersionMonth=None VersionName=None VersionYear=None VlizCoreFlag=1 Westernmost_Easting=11.33
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Resources for GDR data submitters and curators, including training videos, step-by-step guides on data submission, and detailed documentation of the GDR. The Data Management and Submission Best Practices document also contains API access and metadata schema information for developers interested in harvesting GDR metadata for federation or inclusion in their local catalogs.
In 2005, the International Ocean Colour Coordinating Group (IOCCG) convened a working group to examine the state of the art in ocean colour data merging, which showed that the research techniques had matured sufficiently for creating long multi-sensor datasets (IOCCG, 2007). As a result, ESA initiated and funded the DUE GlobColour project (http://www.globcolour.info/) to develop a satellite based ocean colour data set to support global carbon-cycle research. It aims to satisfy the scientific requirement for a long (10+ year) time-series of consistently calibrated global ocean colour information with the best possible spatial coverage. This has been achieved by merging data from the three most capable sensors: SeaWiFS on GeoEye's Orbview-2 mission, MODIS on NASA's Aqua mission and MERIS on ESA's ENVISAT mission. In setting up the GlobColour project, three user organisations were invited to help. Their roles are to specify the detailed user requirements, act as a channel to the broader end user community and to provide feedback and assessment of the results. The International Ocean Carbon Coordination Project (IOCCP) based at UNESCO in Paris provides direct access to the carbon cycle modelling community's requirements and to the modellers themselves who will use the final products. The UK Met Office's National Centre for Ocean Forecasting (NCOF) in Exeter, UK, provides an understanding of the requirements of oceanography users, and the IOCCG bring their understanding of the global user needs and valuable advice on best practice within the ocean colour science community.
The three year project kicked off in November 2005 under the leadership of ACRI-ST (France). The first year was a feasibility demonstration phase that was successfully concluded at a user consultation workshop organised by the Laboratoire d'Océanographie de Villefranche, France, in December 2006. Error statistics and inter-sensor biases were quantified by comparison with in situ measurements from moored optical buoys and ship based campaigns, and used as an input to the merging. The second year was dedicated to the production of the time series. In total, more than 25 Tb of input (level 2) data have been ingested and 14 Tb of intermediate and output products created, with 4 Tb of data distributed to the user community. Quality control (QC) is provided through the Diagnostic Data Sets (DDS), which are extracted sub-areas covering locations of in-situ data collection or interesting oceanographic phenomena. This Full Product Set (FPS) covers global daily merged ocean colour products in the time period 1997-2006 and is also freely available for use by the worldwide science community at http://www.globcolour.info/data_access_full_prod_set.html.
The GlobColour service distributes global daily, 8-day and monthly data sets at 4.6 km resolution for chlorophyll-a concentration, normalised water-leaving radiances (412, 443, 490, 510, 531, 555, 620, 670, 681 and 709 nm), diffuse attenuation coefficient, coloured dissolved and detrital organic materials, total suspended matter or particulate backscattering coefficient, turbidity index, cloud fraction and quality indicators. Error statistics from the initial sensor characterisation are used as an input to the merging methods and propagate through the merging process to provide error estimates for the output merged products.
These error estimates are a key component of GlobColour as they are invaluable to the users; particularly the modellers who need them in order to assimilate the ocean colour data into ocean simulations. An intensive phase of validation has been undertaken to assess the quality of the data set. In addition, inter-comparisons between the different merged datasets will help in further refining the techniques used. Both the final products and the quality assessment were presented at a second user consultation in Oslo on 20-22 November 2007 organised by the Norwegian Institute for Water Research (NIVA); presentations are available on the GlobColour WWW site. On request of the ESA Technical Officer for the GlobColour project, the FPS data set was mirrored in the PANGAEA data library.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Monarch Butterfly Classification project is an advanced deep learning model designed to classify images of Monarch butterflies. With its cutting-edge technology and high accuracy, this model enables accurate identification and categorization of Monarch butterflies, aiding in research, conservation efforts, and educational initiatives.
Accurate Classification: The Monarch Butterfly Classification model utilizes state-of-the-art deep learning algorithms to accurately classify images of Monarch butterflies.
Versatile Use Cases: This powerful model has diverse applications, ranging from scientific research and conservation efforts to citizen science projects and environmental education programs.
Easy Integration: The Monarch Butterfly Classification model can be seamlessly integrated into existing platforms, apps, or websites, making it accessible to many users and enabling them to contribute effortlessly to butterfly monitoring.
User-Friendly Interface: We provide a user-friendly interface/API that allows users to easily interact with the model, upload images, and obtain instant classification results.
To get started with the Monarch Butterfly Classification project, follow these simple steps:
requirements.txt file. For detailed documentation and tutorials on using Roboflow and the Monarch Butterfly Classification model, please refer to docs.roboflow.com.
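As a rough illustration, hosted inference with the Roboflow Python package generally follows the pattern below; the API key, workspace, project slug, version number, and image path are all placeholders, and the exact calls should be verified against docs.roboflow.com.

```python
from roboflow import Roboflow  # pip install roboflow

# Placeholders: substitute your own API key, workspace, project slug, and version
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("monarch-butterfly-classification")
model = project.version(1).model

# Classify a local image and print the raw JSON response
prediction = model.predict("monarch.jpg")
print(prediction.json())
```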
We welcome contributions from the open-source community to enhance the Monarch Butterfly Classification project. If you're interested in contributing, please follow the guidelines outlined in [CONTRIBUTING.md] and submit your pull requests.
This project is licensed under the [Roboflow License]. For more information, see the [LICENSE] file provided by Roboflow.
For any questions, suggestions, or collaborations, please reach out to us at savetheworld at 150left.com
Congratulations if you have made it this far! 🥳
🎁🎁🎁 10 suggestions for trying out the Monarch Butterfly Classification model and contributing to its success:
"Unveil the captivating world of Monarch butterflies with our powerful classification model. Join us in exploring their beauty and contributing to important research and conservation efforts."
"Take a leap into the realm of Monarch butterflies with our cutting-edge classification model. Let's work together to protect these magnificent creatures and their habitats."
"Calling all nature enthusiasts and citizen scientists! Embrace the Monarch Butterfly Classification model and make a real difference in our understanding of these delicate pollinators."
"Unlock the wonders of Monarch butterflies with our state-of-the-art classification model. Join us in unraveling their secrets and advocating for their conservation."
"Become a Monarch detective! Empower yourself with our classification model and contribute to the preservation of these iconic butterflies. Together, we can protect their future."
"Join the Monarch Butterfly Classification community and contribute to the world of scientific research. Help us understand and safeguard these remarkable creatures for generations to come."
"Immerse yourself in the world of Monarch butterflies and experience the joy of accurate classification. Let's come together to protect these majestic pollinators and ensure their survival."
"Make a lasting impact on butterfly conservation by using our Monarch Butterfly Classification model. Every classification counts in our mission to preserve these awe-inspiring creatures."
"Inspire others to appreciate the beauty of Monarch butterflies. Share your findings with our classification model and play a vital role in raising awareness and fostering conservation efforts."
"Step into the realm of Monarch butterflies and contribute to groundbreaking research. Try our classification model and join us in safeguarding these enchanting creatures and their habitats."
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A Home for Everyone is the City of Boise's (city) initiative to address needs in the community by supporting the development and preservation of housing affordable to residents on Boise budgets. A Home for Everyone has three core goals: produce new homes affordable at 60% of area median income, create permanent supportive housing for households experiencing homelessness, and preserve homes affordable at 80% of area median income. This dataset includes information about all homes that count toward the city's Home for Everyone goals.
While the “produce affordable housing” and “create permanent supportive housing” goals are focused on supporting the development of new housing, the preservation goal is focused on maintaining existing housing affordable. As a result, many of the data fields related to new development are not relevant to preservation projects. For example, zoning incentives are only applicable to new construction projects.
Data may be unavailable for some projects and details are subject to change until construction is complete. Addresses are excluded for projects with fewer than five homes for privacy reasons.
The dataset includes details on the number of “homes”. We use the word "home" to refer to any single unit of housing regardless of size, type, or whether it is rented or owned. For example, a building with 40 apartments counts as 40 homes, and a single detached house counts as one home.
The dataset includes details about the phase of each project when a project involves constructing new housing. The process for building a new development is as follows: First, one must receive approval from the city’s Planning Division, which is also known as being “entitled.” Next, one must apply for and receive a permit from the city’s Building Division before beginning construction. Finally, once construction is complete and all city inspections have been passed, the building can be occupied.
To contribute to a city goal, homes must meet affordability requirements based on a standard called area median income (AMI). The city considers housing affordable if it is targeted to households earning at or below 80% of the area median income. For a three-person household in Boise, that equates to an annual income of $60,650 and a monthly housing cost of $1,516. Deeply affordable housing sets the income limit at 60% of area median income, or even 30% of area median income. See Boise Income Guidelines for more details.

The dataset contains the following fields:

Project Name – The name of each project. If a row is related to the Home Improvement Loan program, that row aggregates data for all homes that received a loan in that quarter or year.
Primary Address – The primary address for the development. Some developments encompass multiple addresses.
Project Address(es) – All addresses that are included as part of the development project.
Parcel Number(s) – The identification code for all parcels of land included in the development.
Acreage – The number of acres for the parcel(s) included in the project.
Planning Permit Number – The identification code for all permits the development has received from the Planning Division for the City of Boise. The number and types of permits required vary based on the location and type of development.
Date Entitled – The date a development was approved by the city’s Planning Division.
Building Permit Number – The identification code for all permits the development has received from the city’s Building Division.
Date Building Permit Issued – The date the building permit was issued; building permits are required to begin construction on a development.
Date Final Certificate of Occupancy Issued – A certificate of occupancy is the final approval by the city for a development, once construction is complete. Not all developments require a certificate of occupancy.
Studio – The number of homes in the development that are classified as a studio. A studio is typically defined as a home in which there is no separate bedroom; a single room serves as both a bedroom and a living room.
1-Bedroom – The number of homes in a development that have exactly one bedroom.
2-Bedroom – The number of homes in a development that have exactly two bedrooms.
3-Bedroom – The number of homes in a development that have exactly three bedrooms.
4+ Bedroom – The number of homes in a development that have four or more bedrooms.
# of Total Project Units – The total number of homes in the development.
# of units toward goals – The number of homes in a development that contribute to either the city’s goal to produce housing affordable at or under 60% of area median income, or the city’s goal to create permanent supportive housing for households experiencing homelessness.
Rent at or under 60% AMI – The number of homes in a development that are required to be rented at or below 60% of area median income.
Rent 61-80% AMI – The number of homes in a development that are required to be rented at between 61% and 80% of area median income.
Rent 81-120% AMI – The number of homes in a development that are required to be rented at between 81% and 120% of area median income.
Own at or under 60% AMI – The number of homes in a development that are required to be sold at or below 60% of area median income.

For the AMI-based fields, see the explanation of area median income above or Boise Income Guidelines for more details; Boise defines a home as “affordable” if it is rented or sold at or below 80% of area median income.
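As a rough check on the affordability figures above, the quoted $1,516 monthly housing cost is consistent with the common rule of thumb that housing is affordable when it consumes no more than 30% of gross income; that 30% convention is an assumption here, not something stated in the dataset description. A minimal sketch:

```python
# Sanity check of the affordability figures quoted above.
# Assumption (not stated in the source): "affordable monthly housing cost"
# is 30% of gross monthly income.

AMI_80_INCOME_3_PERSON = 60_650  # annual income at 80% AMI, 3-person household (from the text)

monthly_income = AMI_80_INCOME_3_PERSON / 12
affordable_monthly_cost = monthly_income * 0.30

print(f"Monthly income:          ${monthly_income:,.2f}")
print(f"Affordable housing cost: ${affordable_monthly_cost:,.2f}")  # ~$1,516, matching the figure above
```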
https://creativecommons.org/publicdomain/zero/1.0/
Problem description
Pizza
The pizza is represented as a rectangular, 2-dimensional grid of R rows and C columns. The cells within the grid are referenced using a pair of 0-based coordinates [r, c], denoting respectively the row and the column of the cell.
Each cell of the pizza contains either:
mushroom, represented in the input file as M
tomato, represented in the input file as T
Slice
A slice of pizza is a rectangular section of the pizza delimited by two rows and two columns, without holes. The slices we want to cut out must contain at least L cells of each ingredient (that is, at least L cells of mushroom and at least L cells of tomato) and at most H cells of any kind in total - surprising as it is, there is such a thing as too much pizza in one slice. The slices being cut out cannot overlap. The slices being cut do not need to cover the entire pizza.
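To make the slice constraints concrete, here is a minimal sketch of a validity check for a single slice, assuming the pizza is held as a list of R strings of length C; the function name and representation are illustrative, not part of the problem statement:

```python
# Minimal sketch of the per-slice constraints described above.
# `grid` is a list of R strings of length C containing 'M' and 'T';
# a slice is the inclusive rectangle (r1, c1)-(r2, c2).

def slice_is_valid(grid, r1, c1, r2, c2, L, H):
    r1, r2 = sorted((r1, r2))
    c1, c2 = sorted((c1, c2))
    cells = [grid[r][c] for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]
    mushrooms = cells.count('M')
    tomatoes = cells.count('T')
    # at least L of each ingredient, and at most H cells in total
    return mushrooms >= L and tomatoes >= L and len(cells) <= H
```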
Goal
The goal is to cut correct slices out of the pizza, maximizing the total number of cells in all slices.
Input data set
The input data is provided as a data set file - a plain text file containing exclusively ASCII characters with lines terminated with a single ‘\n’ character at the end of each line (UNIX-style line endings).
File format
The file consists of:
one line containing the following natural numbers separated by single spaces:
R (1 ≤ R ≤ 1000) is the number of rows
C (1 ≤ C ≤ 1000) is the number of columns
L (1 ≤ L ≤ 1000) is the minimum number of cells of each ingredient in a slice
H (1 ≤ H ≤ 1000) is the maximum total number of cells of a slice
R lines describing the rows of the pizza (one after another). Each of these lines contains C characters describing the ingredients in the cells of the row (one cell after another). Each character is either ‘M’ (for mushroom) or ‘T’ (for tomato).
Example
3 5 1 6
TTTTT
TMMMT
TTTTT
3 rows, 5 columns, min 1 of each ingredient per slice, max 6 cells per slice
Example input file.
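A small parser sketch for this input format, applied to the example file above (function and variable names are illustrative):

```python
# Sketch of a parser for the input format described above.

def parse_input(text):
    lines = text.strip().split("\n")
    R, C, L, H = map(int, lines[0].split())
    grid = lines[1:1 + R]
    assert all(len(row) == C for row in grid)
    return R, C, L, H, grid

example = """3 5 1 6
TTTTT
TMMMT
TTTTT"""

R, C, L, H, grid = parse_input(example)
print(R, C, L, H)   # 3 5 1 6
print(grid[1][2])   # 'M' -- cell [1, 2] of the example pizza
```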
Submissions
File format
The file must consist of:
one line containing a single natural number S (0 ≤ S ≤ R × C), representing the total number of slices to be cut,
S lines describing the slices. Each of these lines must contain the following natural numbers separated by single spaces:
r1, c1, r2, c2 (0 ≤ r1, r2 < R; 0 ≤ c1, c2 < C) describe a slice of pizza delimited by the rows r1 and r2 and the columns c1 and c2, including the cells of the delimiting rows and columns. The rows (r1 and r2) can be given in any order. The columns (c1 and c2) can be given in any order too.
Example
3
0 0 2 1
0 2 2 2
0 3 2 4
3 slices.
First slice between rows (0,2) and columns (0,1).
Second slice between rows (0,2) and columns (2,2).
Third slice between rows (0,2) and columns (3,4).
Example submission file.
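A matching sketch for reading a submission file into slice tuples, again with illustrative names, applied to the example submission above:

```python
# Sketch of reading a submission file into slice coordinate tuples.

def parse_submission(text):
    lines = text.strip().split("\n")
    S = int(lines[0])
    slices = [tuple(map(int, line.split())) for line in lines[1:1 + S]]
    assert len(slices) == S
    return slices

example_submission = """3
0 0 2 1
0 2 2 2
0 3 2 4"""

print(parse_submission(example_submission))
# [(0, 0, 2, 1), (0, 2, 2, 2), (0, 3, 2, 4)]
```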
The slices described in the example submission file are marked in green, orange and purple in the accompanying figure.
Validation
For the solution to be accepted:
the format of the file must match the description above,
each cell of the pizza must be included in at most one slice,
each slice must contain at least L cells of mushroom,
each slice must contain at least L cells of tomato,
the total number of cells in each slice must be at most H.
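A minimal validation sketch covering these rules, assuming the same list-of-strings grid representation used earlier; it is an illustration of the constraints, not the official checker:

```python
# Checks: coordinates in bounds, no cell in more than one slice,
# at least L of each ingredient per slice, at most H cells per slice.

def validate(grid, slices, L, H):
    R, C = len(grid), len(grid[0])
    used = set()
    for r1, c1, r2, c2 in slices:
        r1, r2 = sorted((r1, r2))
        c1, c2 = sorted((c1, c2))
        if not (0 <= r1 <= r2 < R and 0 <= c1 <= c2 < C):
            return False
        cells = [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]
        ingredients = [grid[r][c] for r, c in cells]
        if ingredients.count('M') < L or ingredients.count('T') < L or len(cells) > H:
            return False
        if used & set(cells):          # each cell may appear in at most one slice
            return False
        used.update(cells)
    return True
```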
Scoring
The submission gets a score equal to the total number of cells in all slices. Note that there are multiple data sets representing separate instances of the problem. The final score for your team is the sum of your best scores on the individual data sets.
Scoring example
The example submission file given above cuts the slices of 6, 3 and 6 cells, earning 6 + 3 + 6 = 15 points.
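The scoring arithmetic can be reproduced directly from the slice coordinates; the sketch below recomputes the 15-point example score (the helper name is illustrative):

```python
# Score = total number of cells covered by the slices.

def score(slices):
    return sum((abs(r2 - r1) + 1) * (abs(c2 - c1) + 1) for r1, c1, r2, c2 in slices)

example_slices = [(0, 0, 2, 1), (0, 2, 2, 2), (0, 3, 2, 4)]
print(score(example_slices))  # 6 + 3 + 6 = 15, matching the scoring example above
```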
According to our latest research, the global scientific data management systems (SDMS) market size reached USD 4.3 billion in 2024, reflecting robust adoption across multiple scientific disciplines and industries. The market is projected to grow at a compound annual growth rate (CAGR) of 12.1% from 2025 to 2033, reaching an estimated USD 12.1 billion by 2033. This remarkable growth trajectory is primarily driven by the increasing complexity and volume of scientific data, the growing demand for integrated data management solutions, and the critical need for compliance with regulatory standards in research-intensive sectors such as pharmaceuticals, biotechnology, and healthcare.
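As a quick arithmetic check of these figures (assuming the 2024 value as the base and nine compounding years through 2033, which the report does not spell out explicitly):

```python
# Sanity check of the projection quoted above.
# Assumption: 2024 is the base year and growth compounds for 9 years through 2033.

base_2024 = 4.3          # USD billion (from the text)
cagr = 0.121             # 12.1% (from the text)
years = 2033 - 2024      # 9 compounding periods

projected_2033 = base_2024 * (1 + cagr) ** years
print(f"Projected 2033 market size: USD {projected_2033:.1f} billion")
# ~USD 12.0 billion, in line with the stated estimate of USD 12.1 billion
```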
The expanding volume of scientific data, generated by advanced research methodologies and sophisticated laboratory instruments, is a primary growth driver for the scientific data management systems market. Organizations across life sciences, environmental sciences, and healthcare are generating and acquiring massive datasets, often in disparate formats and from multiple sources. This data deluge necessitates robust SDMS platforms that can efficiently capture, store, organize, and retrieve data, ensuring data integrity and facilitating seamless collaboration among research teams. Furthermore, the integration of artificial intelligence and machine learning capabilities into SDMS solutions is enhancing the ability to analyze complex datasets, extract actionable insights, and accelerate scientific discoveries, further fueling market expansion.
Another significant growth factor for the scientific data management systems market is the stringent regulatory landscape governing scientific research and data management. Regulatory bodies such as the FDA, EMA, and other international agencies mandate rigorous data documentation, traceability, and security protocols, especially in drug development, clinical trials, and genomics research. SDMS platforms play a pivotal role in ensuring compliance with these regulations by providing audit trails, electronic signatures, and secure data storage. The increasing focus on data privacy, reproducibility of research, and adherence to Good Laboratory Practice (GLP) and Good Clinical Practice (GCP) guidelines are compelling organizations to invest in advanced SDMS solutions to mitigate compliance risks and maintain competitive advantage.
The growing adoption of cloud-based and hybrid deployment models is further propelling the scientific data management systems market. Cloud-based SDMS solutions offer scalability, flexibility, and cost-effectiveness, enabling organizations to manage large volumes of data without the need for significant infrastructure investments. Hybrid models, which combine on-premises and cloud capabilities, are gaining traction among organizations seeking to balance data security with operational efficiency. The increasing digital transformation initiatives across the scientific community, coupled with the rising trend of collaborative research, are creating a fertile environment for SDMS vendors to innovate and expand their offerings, driving sustained market growth over the forecast period.
From a regional perspective, North America currently dominates the scientific data management systems market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of leading pharmaceutical and biotechnology companies, well-established research infrastructure, and a strong emphasis on regulatory compliance. Europe follows closely, driven by significant investments in life sciences research and increasing adoption of digital technologies in academic and clinical settings. The Asia Pacific region is emerging as a high-growth market, supported by expanding research activities, government initiatives to modernize healthcare infrastructure, and growing collaborations between academic institutions and industry players. These regional dynamics underscore the global nature of the SDMS market and highlight the diverse opportunities for stakeholders across different geographies.
In the realm of life sciences, the Life Sciences Controlled Substance Ordering System is becoming an integral component for organizations dealing with regulated substances. This system is designed to streamline the ordering process of controlled substances.
https://www.nist.gov/open/license
The precisionFDA Truth Challenge V2 aimed to assess the state-of-the-art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods out-performed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants. This dataset includes the fastq files provided to participants, the submitted variant callset as vcfs, and the benchmarking results, along with challenge submission metadata.
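The headline comparison metrics in small-variant benchmarking are precision, recall, and F1, computed from true-positive, false-positive, and false-negative counts. The sketch below illustrates that arithmetic only; the challenge itself used the GA4GH/GIAB best-practice comparison tooling rather than this simplified calculation, and the counts shown are hypothetical:

```python
# Precision, recall, and F1 from TP/FP/FN counts (illustrative only).

def benchmark_metrics(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts for a single submission in one stratified region:
p, r, f1 = benchmark_metrics(tp=95_000, fp=1_200, fn=2_500)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```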