Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need to integrate other features, possibly measured on different scales (e.g., clinical or cytogenetic factors), is becoming increasingly important. The analysis results (e.g., a selection of relevant genes) are then visualized, with further information, such as clinical factors, added on top. However, a more integrative approach is desirable, in which all available data are analyzed jointly and different data sources are also combined more naturally in the visualization. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with a special focus on clustering variables. Clustering of variables receives less attention in the literature than clustering of samples. We extend the methodology for clustering variables with two new approaches, one based on combining different association measures and the other on distance correlation. In simulation studies we evaluate and compare different clustering strategies. Methods designed specifically for mixed-type data perform comparably to, and in many cases better than, standard approaches applied to the corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices.
We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix.
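As a rough illustration of the distance-correlation idea, the sketch below computes the sample distance correlation (dCor) between every pair of variables and turns it into a dissimilarity matrix, 1 - dCor, of the kind that can feed hierarchical clustering or a heatmap. It is a minimal Python sketch, not the CluMix implementation (CluMix is an R package); it assumes each variable is either quantitative or binary coded 0/1, and the function names are hypothetical.

```python
import numpy as np

def _double_centered_distances(x):
    """Pairwise |x_i - x_j| distances with row, column and grand means removed."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    d = np.abs(x - x.T)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation (Szekely et al.), a value in [0, 1]."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()    # squared distance covariance
    dvar_x = (a * a).mean()   # squared distance variance of x
    dvar_y = (b * b).mean()   # squared distance variance of y
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant variable carries no information
    # dcov2 is non-negative in theory; clip tiny negative rounding error
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_x * dvar_y)))

def dcor_dissimilarity(data):
    """Dissimilarity matrix 1 - dCor between the columns (variables) of a 2-D array."""
    data = np.asarray(data, dtype=float)
    p = data.shape[1]
    diss = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            diss[i, j] = diss[j, i] = 1.0 - distance_correlation(data[:, i], data[:, j])
    return diss

# Three variables on 5 samples: x, a linear transform of x, and a 0/1 indicator.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
example = np.column_stack([x, [2 * v + 1 for v in x], [1, 0, 1, 0, 1]])
print(np.round(dcor_dissimilarity(example), 3))
```

The resulting square matrix could, for example, be condensed with `scipy.spatial.distance.squareform(D, checks=False)` and passed to `scipy.cluster.hierarchy.linkage` with `method="average"`. Handling nominal variables with more than two categories would need a different coding than the 0/1 assumption made here.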
https://www.wiseguyreports.com/pages/privacy-policy
BASE YEAR | 2024 |
HISTORICAL DATA | 2019 - 2024 |
REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
MARKET SIZE 2023 | 2.07 (USD Billion) |
MARKET SIZE 2024 | 2.17 (USD Billion) |
MARKET SIZE 2032 | 3.2 (USD Billion) |
SEGMENTS COVERED | Deployment Type, Organization Size, Industry Vertical, Data Type, Analysis Type, Regional |
COUNTRIES COVERED | North America, Europe, APAC, South America, MEA |
KEY MARKET DYNAMICS | Cloud Deployment, Machine Learning Integration, Big Data Analytics, Predictive Analytics, Prescriptive Analytics |
MARKET FORECAST UNITS | USD Billion |
KEY COMPANIES PROFILED | KNIME, DAX Analytics, Minitab, Alteryx, MVSP, XLSTAT, RapidMiner, Statistica, IBM, TIBCO Software, SPSS, SAS Institute, Oracle, JMP |
MARKET FORECAST PERIOD | 2025 - 2032 |
KEY MARKET OPPORTUNITIES | Healthcare analytics, Financial risk assessment, Customer segmentation, Fraud detection, Anomaly detection |
COMPOUND ANNUAL GROWTH RATE (CAGR) | 4.99% (2025 - 2032) |
https://dataintelo.com/privacy-and-policy
In 2023, the global qualitative data analysis software market was valued at approximately USD 1.2 billion. With a compound annual growth rate (CAGR) of 15%, the market is projected to reach USD 3.3 billion by 2032. This growth is driven by increasing demand for data-driven decision-making across various industries, as well as by advances in artificial intelligence and machine learning that are enhancing the capabilities of qualitative data analysis tools. Organizations increasingly recognize the value of qualitative insights, which complement quantitative data by providing a deeper, context-rich understanding of phenomena; this recognition is a significant growth factor in the market.
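The headline figures can be checked with the standard CAGR formula, CAGR = (end / start)^(1 / years) - 1. Taking the quoted 2023 and 2032 values at face value (nine compounding years), the sketch below gives an implied rate of roughly 11.9% per year, while compounding USD 1.2 billion at 15% for nine years would give roughly USD 4.2 billion, so the quoted rate and endpoints may refer to different windows or bases. The function names are illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0

def project(start, rate, years):
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1.0 + rate) ** years

# Figures quoted above (USD billion); 2023 -> 2032 is 9 compounding years.
implied = cagr(1.2, 3.3, 9)
projected = project(1.2, 0.15, 9)
print(f"implied CAGR: {implied:.1%}")                 # about 11.9%
print(f"1.2 at 15% for 9 years: {projected:.2f}")     # about 4.22
```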
The demand for qualitative data analysis software is expanding due to the growing need for holistic research methods that incorporate diverse data types. In academic research, qualitative data analysis plays a critical role in understanding complex social phenomena by analyzing text, audio, video, and images. The rise of interdisciplinary studies that demand robust qualitative analysis solutions is propelling software adoption. Additionally, the business and enterprise sector has increasingly leveraged these tools to extract consumer insights from unstructured data sources like social media, reviews, and customer feedback. These insights are crucial for developing marketing strategies and enhancing customer engagement, thus driving market growth.
Healthcare is another sector significantly contributing to the market's expansion. Qualitative data analysis is crucial for understanding patient narratives and improving patient-centered care models. With the shift towards personalized medicine, healthcare providers are utilizing qualitative insights to better comprehend patient experiences and treatment outcomes. Moreover, the integration of qualitative data analysis tools with other healthcare systems is enhancing clinical research and operational efficiency. The continuous development in healthcare analytics and the increasing volume of healthcare data are expected to further boost demand in this sector.
Government and public sector organizations are also adopting qualitative data analysis software to improve policy formulation and public services. By analyzing feedback from citizens and stakeholders, governments can make informed decisions that address public needs more effectively. The growing emphasis on transparency and accountability in governance is driving the adoption of these tools. Additionally, the ongoing digital transformation across public sectors globally is facilitating the integration of advanced data analysis tools in government operations, thus contributing to the market's growth.
Regionally, North America dominates the market due to its advanced technological infrastructure and high adoption rate of data-driven decision-making processes across various sectors. Europe follows, with a strong presence of academic research institutions and enterprises investing in qualitative data analysis tools. The Asia Pacific region is expected to witness the fastest growth, driven by rapid digitalization and increasing research activities in countries like China, India, and Japan. Latin America and the Middle East & Africa regions are also beginning to explore the potential of qualitative data analysis, although they currently constitute a smaller portion of the market.
The qualitative data analysis software market is segmented by component into software and services. The software segment is the backbone of the market, offering a variety of tools that allow users to code, categorize, and analyze qualitative data. The demand for sophisticated software solutions is rising as organizations seek tools that offer enhanced features such as data visualization, collaboration capabilities, and integration with other data sources. The push towards comprehensive data analysis platforms that can manage large datasets and provide intuitive interfaces is driving innovation in software development. Furthermore, the integration of artificial intelligence into these software solutions is significantly enhancing their capabilities, making them more efficient and reducing the time required for data analysis.
In contrast, the services segment encompasses a range of offerings including consulting, implementation, training, and support services. As organizations increasingly adopt sophisticated qualitative data analysis tools, there is a growing need for professional services to ensure
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The primary data types collected.
This is the third lab in an Introductory Physical Geography/Environmental Studies course. It introduces students to different data types (qualitative vs. quantitative), basic statistical analyses (correlation analysis, t-test), and graphing techniques.
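For readers working through the lab, the two statistics named above can be computed by hand in a few lines. This minimal Python sketch uses only the standard library; it computes Pearson's correlation coefficient and Welch's two-sample t statistic (one common variant of the t-test that does not assume equal variances). The lab itself may use a different t-test variant or software, and the sample values below are hypothetical.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances allowed)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical field data: elevation (m) vs. air temperature (deg C)
elevation = [100, 200, 300, 400]
temperature = [15.0, 14.2, 13.5, 12.9]
print(pearson_r(elevation, temperature))

# Hypothetical soil moisture (%) at two sites
print(welch_t([21.0, 23.5, 22.1], [18.2, 19.0, 17.6]))
```

Turning the t statistic into a p-value requires the t distribution; `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and reports both.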
The FDA Device Dataset by Dataplex provides comprehensive access to over 24 million rows of detailed information, covering 9 key data types essential for anyone involved in the medical device industry. Sourced directly from the U.S. Food and Drug Administration (FDA), this dataset is a critical resource for regulatory compliance, market analysis, and product safety assessment.
Dataset Overview:
This dataset includes data on medical device registrations, approvals, recalls, and adverse events, among other crucial aspects. The dataset is meticulously cleaned and structured to ensure that it meets the needs of researchers, regulatory professionals, and market analysts.
24 Million Rows of Data:
With over 24 million rows, this dataset offers an extensive view of the regulatory landscape for medical devices. It includes data types such as classification, event, enforcement, 510(k), registration listings, recall, PMA, UDI, and COVID-19 serology. This wide range of data types allows users to perform granular analysis on a broad spectrum of device-related topics.
Sourced from the FDA:
All data in this dataset is sourced directly from the FDA, ensuring that it is accurate, up-to-date, and reliable. Regular updates ensure that the dataset remains current, reflecting the latest in device approvals, clearances, and safety reports.
Key Features:
Comprehensive Coverage: Includes 9 key device data types, such as 510(k) clearances, premarket approvals, device classifications, and adverse event reports.
Regulatory Compliance: Provides detailed information necessary for tracking compliance with FDA regulations, including device recalls and enforcement actions.
Market Analysis: Analysts can utilize the dataset to assess market trends, monitor competitor activities, and track the introduction of new devices.
Product Safety Analysis: Researchers can analyze adverse event reports and device recalls to evaluate the safety and performance of medical devices.
Use Cases:
Regulatory Compliance: Ensure your devices meet FDA standards, monitor compliance trends, and stay informed about regulatory changes.
Market Research: Identify trends in the medical device market, track new device approvals, and analyze competitive landscapes with up-to-date and historical data.
Product Safety: Assess the safety and performance of medical devices by examining detailed adverse event reports and recall data.
Data Quality and Reliability:
The FDA Device Dataset prioritizes data quality and reliability. Each record is meticulously sourced from the FDA's official databases, ensuring that the information is both accurate and up-to-date. This makes the dataset a trusted resource for critical applications, where data accuracy is vital.
Integration and Usability:
The dataset is provided in CSV format, making it compatible with most data analysis tools and platforms. Users can easily import, analyze, and utilize the data for various applications, from regulatory reporting to market analysis.
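Because the data set has over 24 million rows, streaming the CSV row by row is often safer than loading it all into memory at once. The sketch below uses Python's standard `csv` module to count rows per value of one column. The column name `data_type` and the sample rows are hypothetical; the actual file layout and headers should be taken from the included metadata files.

```python
import csv
import io
from collections import Counter

def count_by_column(lines, column):
    """Stream a CSV (header row first) and count rows per value of `column`.

    Rows are read one at a time, so memory use does not grow with file size.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts

# Hypothetical excerpt; the real files and column names may differ.
sample = io.StringIO(
    "record_id,data_type\n"
    "1,recall\n"
    "2,event\n"
    "3,recall\n"
)
print(count_by_column(sample, "data_type"))  # Counter({'recall': 2, 'event': 1})
```

For a file on disk, pass an open handle instead, e.g. `with open(path, newline="") as f: counts = count_by_column(f, "data_type")`.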
User-Friendly Structure and Metadata:
The data is organized for easy navigation, with clear metadata files included to help users identify relevant records. The dataset is structured by device type, approval and clearance processes, and adverse event reports, allowing for efficient data retrieval and analysis.
Ideal For:
Regulatory Professionals: Monitor FDA compliance, track regulatory changes, and prepare for audits with comprehensive and up-to-date product data.
Market Analysts: Conduct detailed research on market trends, assess new device entries, and analyze competitive dynamics with extensive FDA data.
Healthcare Researchers: Evaluate the safety and efficacy of medical devices using product data, identify potential risks, and contribute to improved patient outcomes through detailed analysis.
This dataset is an indispensable resource for anyone involved in the medical device industry, providing the data and insights necessary to drive informed decisions and ensure compliance with FDA regulations.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
With the recent attention and focus on quantitative methods for species delimitation, an overlooked but equally important issue is what has actually been delimited. This study investigates the apparent arbitrariness of some taxonomic distinctions, and in particular how species and subspecies are assigned. Specifically, we use a recently developed Bayesian model-based approach to show that in the Hercules beetles (genus Dynastes) there is no statistical difference in the probability that putative taxa represent different species, irrespective of whether they were given species or subspecies designations. By considering multiple data types, as opposed to relying on genetic data alone, we also show that both previously recognized species and subspecies represent a variety of points along the speciation spectrum (i.e., previously recognized species are not systematically further along the continuum than subspecies). For example, based on evolutionary models of divergence, some taxa are statistically distinguishable on more than one axis of differentiation (e.g., along both phenotypic and genetic dimensions), whereas other taxa can only be delimited statistically from a single data type. Because both phenotypic and genetic data are analyzed in a common Bayesian framework, our study offers a way to investigate whether disagreements in species boundaries among data types reflect (i) genuine discordance with the actual history of lineage splitting, or instead (ii) differences among data types in the amount of time required for differentiation to become apparent among the delimited taxa. We discuss what the answers to these questions imply about which characters are used to delimit species, as well as the diverse processes involved in the origin and maintenance of species boundaries. With this in mind, we then reflect more generally on how quantitative methods for species delimitation are used to assign taxonomic status.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data supporting the Springer Nature Data Availability Statement (DAS) analysis in the State of Open Data 2024. SOOD_2024_special_analysis_DAS_SN.xlsx contains the DAS, DOI, publication date, DAS categories, and related country by institution of any author. SOOD 2024_DAS_analysis_sharing.xlsx contains the summary data by country and data sharing type.

Utilizing the Dimensions database, we identified articles containing key DAS identifiers such as “Data Availability Statement” or “Availability of Data and Materials” within their full text. Digital Object Identifiers (DOIs) of these articles were collected and matched against Springer Nature’s XML database to extract the DAS for each article. The extracted DAS were categorized into specific sharing types using text and data matching terms. For statements indicating that data are publicly available in a repository, we matched against a predefined list of repository identifiers, names, and URLs. The DAS were classified into the following categories:
1. Data are available from the author on request.
2. Data are included in the manuscript or its supplementary material.
3. Some or all of the data are publicly available, for example in a repository.
4. Figure source data are included with the manuscript.
5. Data availability is not applicable.
6. Data are declared as not available by the author.
7. Data available online but not in a repository.

These categories are non-exclusive: more than one can apply to any one article. Publications outside the 2019–2023 range and non-article publication types (e.g., book chapters) that were initially included in the Dimensions search results were excluded from the final dataset. Articles were included in the final analysis after applying the exclusion criteria. Upon processing, only 370 results were returned for Botswana across the five-year period; due to this low number, Botswana was not included in the DAS-focused country-level analysis.
This analysis does not assess the accuracy of the DAS in the context of each individual article. There was no manual verification of the categories applied; as a result, terms used out of context could have led to misclassification. Approximately 5% of articles remained unclassified following text and data matching due to these limitations.
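The text-and-data-matching step described above can be pictured as keyword rules applied to each statement, where every matching category is recorded (because the categories are non-exclusive) and statements matching no rule remain unclassified. The sketch below is a hypothetical Python illustration: its patterns cover only four of the seven categories and are not the term lists actually used in the analysis.

```python
import re

# Hypothetical matching terms; the analysis's real term lists are not given.
CATEGORY_PATTERNS = {
    "on request": [r"\bon (reasonable )?request\b"],
    "in manuscript/supplement": [
        r"\bsupplementary (material|information)\b",
        r"\bincluded in (this|the) (article|manuscript)\b",
    ],
    "repository": [r"\bzenodo\b", r"\bfigshare\b", r"\bdryad\b"],
    "not applicable": [r"\bnot applicable\b"],
}

def classify_das(statement):
    """Return every category whose patterns match (categories are non-exclusive)."""
    text = statement.lower()
    matched = {
        category
        for category, patterns in CATEGORY_PATTERNS.items()
        if any(re.search(p, text) for p in patterns)
    }
    return matched or {"unclassified"}

print(classify_das("Data are available from the corresponding author on "
                   "reasonable request; raw reads are deposited in Zenodo."))
```

Because such rules only look at surface wording, a phrase used out of context can still trigger a category, which is the misclassification risk noted above.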
Data are included for two types of field surveys conducted for freshwater mussels in the mainstem of the middle and upper Delaware River in the Mid-Atlantic region of the United States from 2000 to 2002. Timed-search (qualitative) surveys were conducted during 2000-2001 continuously downstream from the confluence of the East and West Branches of the Delaware River near Hancock, NY, to the mouth of the Paulins Kill River near Columbia, NJ. In this qualitative survey, mussel species and counts were recorded in the field, and catch-per-unit-effort (CPUE) data were determined for all mussel species within each of 1,095 consecutive stream sections ~200 m in length. Subsequent quantitative surveys were conducted with quadrats in selected 200-m sections of river during 2002 to estimate the abundance and density of mussels present in these sections. One Excel file contains data from the qualitative surveys, and a second Excel file contains data from the quantitative quadrat surveys.
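The two survey types yield different quantities: an effort-standardized rate from the timed searches and an area-standardized density from the quadrats. The source does not state the effort unit or quadrat size, so this Python sketch assumes CPUE is expressed as mussels found per person-hour of searching and that all quadrats have the same known area (0.25 m² here is a hypothetical value). The function names and numbers are illustrative only.

```python
def cpue(count, searchers, minutes):
    """Catch per unit effort, as mussels found per person-hour of searching."""
    person_hours = searchers * minutes / 60.0
    return count / person_hours

def quadrat_density(counts, quadrat_area_m2):
    """Mean density (mussels per m^2) from counts in equal-sized quadrats."""
    return sum(counts) / (len(counts) * quadrat_area_m2)

# Hypothetical numbers for one ~200 m section
print(cpue(count=45, searchers=3, minutes=30))               # 30.0 mussels per person-hour
print(quadrat_density([0, 2, 1, 3], quadrat_area_m2=0.25))   # 6.0 mussels per m^2
```

Multiplying such a density by the wetted area of a section would give an abundance estimate for that section, under the same assumptions.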
https://www.wiseguyreports.com/pages/privacy-policy
BASE YEAR | 2024 |
HISTORICAL DATA | 2019 - 2024 |
REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
MARKET SIZE 2023 | 0.06 (USD Billion) |
MARKET SIZE 2024 | 0.08 (USD Billion) |
MARKET SIZE 2032 | 0.34 (USD Billion) |
SEGMENTS COVERED | Modality, Animal Type, Application, Automated Features, Output Type, Regional |
COUNTRIES COVERED | North America, Europe, APAC, South America, MEA |
KEY MARKET DYNAMICS | Rising pet ownership, Technological advancements, Increasing focus on animal welfare, Growing demand for remote monitoring, Veterinary industry expansion |
MARKET FORECAST UNITS | USD Billion |
KEY COMPANIES PROFILED | eScription, OptiTrack, VICON, BTS Bioengineering, Genovation, ZEBRIS, Xsens, SMART, Gait Up, Qualisys, Motion Analysis Corporation, Noraxon, IMV imaging, Phoenix Controls, VASG |
MARKET FORECAST PERIOD | 2025 - 2032 |
KEY MARKET OPPORTUNITIES | Veterinary diagnostics, Precision animal farming, Animal health monitoring, Livestock management, Disease prevention |
COMPOUND ANNUAL GROWTH RATE (CAGR) | 20.53% (2025 - 2032) |
https://creativecommons.org/publicdomain/zero/1.0/
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from US Geological Survey (USGS) and USFS data. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types).
This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes rather than forest management practices.
Some background information for these four wilderness areas: Neota (area 2) probably has the highest mean elevational value of the 4 wilderness areas. Rawah (area 1) and Comanche Peak (area 3) would have a lower mean elevational value, while Cache la Poudre (area 4) would have the lowest mean elevational value.
As for primary major tree species in these areas, Neota would have spruce/fir (type 1), while Rawah and Comanche Peak would probably have lodgepole pine (type 2) as their primary species, followed by spruce/fir and aspen (type 5). Cache la Poudre would tend to have Ponderosa pine (type 3), Douglas-fir (type 6), and cottonwood/willow (type 4).
The Rawah and Comanche Peak areas would tend to be more typical of the overall dataset than either Neota or Cache la Poudre, due to their assortment of tree species and range of predictive variable values (elevation, etc.). Cache la Poudre would probably be more distinctive than the others, due to its relatively low elevation range and species composition.
Given is the attribute name, attribute type, the measurement unit and a brief description. The forest cover type is the classification problem. The order of this listing corresponds to the order of numerals along the rows of the database.
Name / Data Type / Measurement / Description
Elevation / quantitative / meters / Elevation in meters
Aspect / quantitative / azimuth / Aspect in degrees azimuth
Slope / quantitative / degrees / Slope in degrees
Horizontal_Distance_To_Hydrology / quantitative / meters / Horizontal distance to nearest surface water features
Vertical_Distance_To_Hydrology / quantitative / meters / Vertical distance to nearest surface water features
Horizontal_Distance_To_Roadways / quantitative / meters / Horizontal distance to nearest roadway
Hillshade_9am / quantitative / 0 to 255 index / Hillshade index at 9am, summer solstice
Hillshade_Noon / quantitative / 0 to 255 index / Hillshade index at noon, summer solstice
Hillshade_3pm / quantitative / 0 to 255 index / Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points / quantitative / meters / Horizontal distance to nearest wildfire ignition points
Wilderness_Area (4 binary columns) / qualitative / 0 (absence) or 1 (presence) / Wilderness area designation
Soil_Type (40 binary columns) / qualitative / 0 (absence) or 1 (presence) / Soil type designation
Cover_Type (7 types) / integer / 1 to 7 / Forest cover type designation
Class Labels
Spruce/Fir, Lodgepole Pine, Ponderosa Pine, Cottonwood/Willow, Aspen, Douglas-fir, Krummholz
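Because the qualitative variables are stored as one-hot binary columns, it can be convenient to collapse each group back into a single categorical value. The Python sketch below assumes a row holds the 55 values in the listed order (10 quantitative columns, 4 wilderness columns, 40 soil columns, then the cover type) and that exactly one indicator is set in each group; the function name and example row values are hypothetical.

```python
QUANTITATIVE = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]
# Class labels in cover-type order 1..7
COVER_TYPES = ["Spruce/Fir", "Lodgepole Pine", "Ponderosa Pine",
               "Cottonwood/Willow", "Aspen", "Douglas-fir", "Krummholz"]

def decode_row(values):
    """Turn one 55-value row into named features plus decoded categories."""
    values = list(values)
    if len(values) != 55:
        raise ValueError(f"expected 55 values, got {len(values)}")
    record = dict(zip(QUANTITATIVE, values[:10]))
    wilderness = values[10:14]  # 4 binary columns
    soil = values[14:54]        # 40 binary columns
    # index() is 0-based and raises ValueError if no indicator is set
    record["Wilderness_Area"] = wilderness.index(1) + 1
    record["Soil_Type"] = soil.index(1) + 1
    record["Cover_Type"] = COVER_TYPES[int(values[54]) - 1]
    return record

# Hypothetical row: wilderness area 1, soil type 29, cover type 5
row = ([3000, 90, 10, 100, 5, 500, 200, 220, 150, 1000]
       + [1, 0, 0, 0] + [0] * 28 + [1] + [0] * 11 + [5])
print(decode_row(row))
```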
This study examines various dimensions of primary health care delivery in Uganda, using a baseline survey of public and private dispensaries, the most common lower level health facilities in the country.
The survey was designed and implemented by the World Bank in collaboration with the Makerere Institute for Social Research and the Ugandan Ministries of Health and of Finance, Planning and Economic Development. It was carried out in October - December 2000 and covered 155 local health facilities and seven district administrations in ten districts. In addition, 1617 patients exiting health facilities were interviewed. Three types of dispensaries (both with and without maternity units) were included: those run by the government, by private for-profit providers, and by private nonprofit providers, mainly religious.
This research is a Quantitative Service Delivery Survey (QSDS). It collected microlevel data on service provision and analyzed health service delivery from a public expenditure perspective with a view to informing expenditure and budget decision-making, as well as sector policy.
Objectives of the study included:
1) Measuring and explaining the variation in cost-efficiency across health units in Uganda, with a focus on the flow and use of resources at the facility level;
2) Diagnosing problems with facility performance, including the extent of drug leakage, as well as staff performance and availability;
3) Providing information on pricing and user fee policies and assessing the types of service actually provided;
4) Shedding light on the quality of service across the three categories of service provider - government, for-profit, and nonprofit;
5) Examining the patterns of remuneration, pay structure, and oversight and monitoring and their effects on health unit performance;
6) Assessing the private-public partnership, particularly the program of financial aid to nonprofits.
The study districts were Mpigi, Mukono, and Masaka in the central region; Mbale, Iganga, and Soroti in the east; Arua and Apac in the north; and Mbarara and Bushenyi in the west.
The survey covered government, for-profit and nonprofit private dispensaries with or without maternity units in ten Ugandan districts.
Sample survey data [ssd]
The sample design was governed by three principles. First, to ensure a degree of homogeneity across sampled facilities, attention was restricted to dispensaries, with and without maternity units (that is, to the health center III level). Second, subject to security constraints, the sample was intended to capture regional differences. Finally, the sample had to include facilities in the main ownership categories: government, private for-profit, and private nonprofit (religious organizations and NGOs). The sample of government and nonprofit facilities was based on the Ministry of Health facility register for 1999. Since no nationwide census of for-profit facilities was available, these facilities were chosen by asking sampled government facilities to identify the closest private dispensary.
Of the 155 health facilities surveyed, 81 were government facilities, 30 were private for-profit facilities, and 44 were nonprofit facilities. An exit poll of clients covered 1,617 individuals.
The final sample consisted of 155 primary health care facilities drawn from ten districts in the central, eastern, northern, and western regions of the country. It included government, private for-profit, and private nonprofit facilities. The nonprofit sector includes facilities owned and operated by religious organizations and NGOs. Approximately one third of the surveyed facilities were dispensaries without maternity units; the rest provided maternity care. The facilities varied considerably in size, from units run by a single individual to facilities with as many as 19 staff members.
The Ministry of Health facility register for 1999 was used to design the sampling frame. Ten districts were randomly selected. From the selected districts, a sample of government and private nonprofit facilities and a reserve list of replacement facilities were randomly drawn. Because of the unreliability of the register for private for-profit facilities, it was decided that for-profit facilities would be identified on the basis of information from the government facilities sampled. The administrative records for facilities in the original sample were first reviewed at the district headquarters, where some facilities that did not meet selection criteria and data collection requirements were dropped from the sample. These were replaced by facilities from the reserve list. Overall, 30 facilities were replaced.
The sample was designed in such a way that the proportion of facilities drawn from different regions and ownership categories broadly mirrors that of the universe of facilities. Because no nationwide census of for-profit health facilities is available, it is difficult to assess the extent to which the sample is representative of this category. A census of health care facilities in selected districts, carried out in the context of the Delivery of Improved Services for Health (DISH) project supported by the U.S. Agency for International Development (USAID), suggests that about 63 percent of all facilities operate on a for-profit basis, while government and nonprofit providers run 26 and 11 percent of facilities, respectively. This would suggest an undersampling of private providers in the survey. It is not clear, however, whether the DISH districts are representative of other districts in Uganda in terms of the market for health care.
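The undersampling argument can be made concrete by comparing the sample's ownership shares with the DISH census shares quoted above. This short Python sketch does that arithmetic using the counts reported for the 155 surveyed facilities (81 government, 30 private for-profit, 44 private nonprofit); as the text notes, the DISH districts may not be representative of Uganda as a whole.

```python
def ownership_shares(counts):
    """Share of facilities in each ownership category."""
    total = sum(counts.values())
    return {owner: n / total for owner, n in counts.items()}

sample_counts = {"government": 81, "private for-profit": 30, "private nonprofit": 44}
dish_shares = {"government": 0.26, "private for-profit": 0.63, "private nonprofit": 0.11}

for owner, share in ownership_shares(sample_counts).items():
    print(f"{owner:<20} sample {share:6.1%}   DISH census {dish_shares[owner]:6.1%}")
```

On these figures, private for-profit facilities make up about 19% of the sample against 63% in the DISH census, and private providers overall about 48% against 74%, while government facilities are correspondingly over-represented.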
For the exit poll, 10 interviews per facility were carried out in approximately 85 percent of the facilities. In the remaining facilities the target of 10 interviews was not met, as a result of low activity levels.
In the first stage in the sampling process, eight districts (out of 45) had to be dropped from the sample frame due to security concerns. These districts were Bundibugyo, Gulu, Kabarole, Kasese, Kibaale, Kitgum, Kotido, and Moroto.
Face-to-face [f2f]
The following survey instruments are available:
The survey collected data at three levels: district administration, health facility, and client. In this way it was possible to capture central elements of the relationships between the provider organization, the frontline facility, and the user. In addition, comparison of data from different levels (triangulation) permitted cross-validation of information.
At the district level, a District Health Team Questionnaire was administered to the district director of health services (DDHS), who was interviewed on the role of the DDHS office in health service delivery. Specifically, the questionnaire collected data on health infrastructure, staff training, support and supervision arrangements, and sources of financing.
The District Facility Data Sheet was used at the district level to collect more detailed information on the sampled health units for fiscal 1999-2000, including data on staffing and the related salary structures, vaccine supplies and immunization activity, and basic and supplementary supplies of drugs to the facilities. In addition, patient data, including monthly returns from facilities on total numbers of outpatients, inpatients, immunizations, and deliveries, were reviewed for the period April-June 2000.
At the facility level, the Uganda Health Facility Survey Questionnaire collected a broad range of information related to the facility and its activities. The questionnaire, which was administered to the in-charge, covered characteristics of the facility (location, type, level, ownership, catchment area, organization, and services); inputs (staff, drugs, vaccines, medical and nonmedical consumables, and capital inputs); outputs (facility utilization and referrals); financing (user charges, cost of services by category, expenditures, and financial and in-kind support); and institutional support (supervision, reporting, performance assessment, and procurement). Each health facility questionnaire was supplemented by a Facility Data Sheet (FDS). The FDS was designed to obtain data from the health unit records on staffing and the related salary structure; daily patient records for fiscal 1999-2000; the type of patients using the facility; vaccinations offered; and drug supply and use at the facility.
Finally, at the facility level, an exit poll was used to interview about 10 patients per facility on the cost of treatment, drugs received, perceived quality of services, and reasons for using that unit instead of alternative sources of health care.
Detailed information about data editing procedures is available in "Data Cleaning Guide for PETS/QSDS Surveys" in external resources.
STATA cleaning do-files and the data quality reports on the datasets can also be found in external resources.
The NOAA NEXRAD Quantitative Precipitation Estimation (QPE) Climate Data Record (CDR) is created from the Multi-Radar/Multi-Sensor (MRMS) Reanalysis, which produces severe weather and precipitation products intended to improve decision-making for severe weather forecasts and warnings, hydrology, aviation, and numerical weather prediction. The data cover the period from 2002-01-01 to 2011-12-31. NOAA's NEXRAD reanalysis consists of two primary components: (1) severe weather and radar-reflectivity data generation, and (2) Quantitative Precipitation Estimation (including associated precipitation variables and merged rain gauge and radar estimates). This document focuses on the second component, the QPE. The primary files in this data set are radar-only and merged radar-gauge precipitation products (ROQPE, GCQPE, and MOS2D), as well as ancillary information on precipitation type (PRATE and PFLAG) and radar quality (RQIND). Radar-only reflectivity, gauge, precipitation flag, and Radar Quality Index fields are provided as 5-minute data on a 1 km regular grid over CONUS; radar-only and radar-gauge QPEs are provided at an hourly scale on the same 1 km grid. MRMS QPE uses the most advanced radar technologies and provides high-resolution information about precipitation types and amounts for the nation. The data are stored in netCDF version 4.0 files that include the necessary metadata and supplementary data fields. The data set can be useful for identifying various types of precipitation, estimating radar reflectivity, recognizing storm patterns, developing forecasting technologies for rainfall estimation, and associating different phases of precipitation, such as hail, freezing rain, and snow, with radar observations.
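Since some fields come as 5-minute data and others at an hourly scale, it may help to see how instantaneous rain rates relate to an hourly depth. The Python sketch below assumes rates in mm/h, each held constant over its 5-minute interval, so the hourly accumulation is the sum of rate × (5/60) h over twelve intervals. This is generic arithmetic for illustration, not a description of how the MRMS hourly QPE products are actually derived; the function name is hypothetical.

```python
def hourly_accumulation(rates_mm_per_h, step_minutes=5):
    """Accumulated depth (mm) over one hour from fixed-interval rain rates."""
    expected = 60 // step_minutes
    if len(rates_mm_per_h) != expected:
        raise ValueError(f"need {expected} rates, got {len(rates_mm_per_h)}")
    # each rate (mm/h) applies for step_minutes / 60 hours
    return sum(rate * step_minutes / 60.0 for rate in rates_mm_per_h)

# Hypothetical hour: dry for 30 minutes, then 12 mm/h for 30 minutes
print(hourly_accumulation([0.0] * 6 + [12.0] * 6))  # 6.0
```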
As our generation and collection of quantitative digital data increase, so do our ambitions for extracting new insights and knowledge from those data. In recent years, those ambitions have manifested themselves in so-called “Grand Challenge” projects coordinated by academic institutions. These projects are often broadly interdisciplinary and attempt to address major issues facing the world in the present and the future through the collection and integration of diverse types of scientific data. In general, however, disciplines that focus on the past are underrepresented in this environment, in part because these grand challenges tend to look forward rather than back, and in part because historical disciplines tend to produce qualitative, incomplete data that are difficult to mesh with the more continuous quantitative data sets provided by scientific observation. Yet historical information is essential for our understanding of long-term processes, and should thus be incorporated into our efforts to solve present and future problems. Archaeology, an inherently interdisciplinary field of knowledge that bridges the gap between the quantitative and the qualitative, can act as a connector between the study of the past and data-driven attempts to address the challenges of the future. To do so, however, we must find new ways to integrate the results of archaeological research into the digital platforms used for the modeling and analysis of much bigger data.
Planet Texas 2050 (PT2050) is a grand challenge project recently launched by The University of Texas at Austin. Its central goal is to understand the dynamic interactions among water supply, urbanization, energy use, and ecosystem services in Texas, a state that will be especially affected by climate change and population mobility by the middle of the 21st century. Like many such projects, one of the products of Planet Texas 2050 will be an integrated data platform that will make it possible to model various scenarios and help decision-makers project the results of present policies or trends into the future. Unlike other such projects, however, PT2050 incorporates data collected from past societies, primarily through archaeological inquiry. We are currently designing a data integration and modeling platform that will allow us to bring together quantitative sensor data related to the present environment with “fuzzier” data collected in the course of research in the social sciences and humanities. Digital archaeological data, from LiDAR surveys to genomic information to excavation documentation, will be a central component of this platform. In this paper, I discuss the conceptual integration between scientific “big data” and “medium-sized” archaeological data in PT2050; the process that we are following to catalogue data types, identify domain-specific ontologies, and understand the points of intersection between heterogeneous datasets of varying resolution and precision as we construct the data platform; and how we propose to incorporate digital data from archaeological research into integrated modeling and simulation modules.
Expression profiling of restricted neural populations using microarrays can facilitate neuronal classification and provide insight into the molecular bases of cellular phenotypes. Due to the formidable heterogeneity of intermixed cell types that make up the brain, isolating cell types prior to microarray processing poses steep technical challenges that have been met in various ways. These methodological differences have the potential to distort cell-type-specific gene expression profiles insofar as they may insufficiently filter out contaminating mRNAs or induce aberrant cellular responses not normally present in vivo. We therefore compared the repeatability, susceptibility to contamination from off-target cell types, and evidence for stress-responsive gene expression of five different purification methods: Laser Capture Microdissection (LCM), Translating Ribosome Affinity Purification (TRAP), Immunopanning (PAN), Fluorescence Activated Cell Sorting (FACS), and manual sorting of fluorescently labeled cells (Manual). We found that all methods achieved comparably high levels of repeatability; however, data from LCM and TRAP showed significantly higher levels of contamination than the other methods. While PAN samples showed higher activation of apoptosis-related, stress-related and immediate early genes, samples from FACS and Manual studies, which also require dissociated cells, did not. Given that TRAP targets actively translated mRNAs, whereas the other methods target all transcribed mRNAs, the observed differences may also reflect translational regulation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset systematically collects and classifies published quantitative data on the main classes and categories of clay figurines from 3rd millennium BCE Mesopotamia. It covers sites in modern-day Iraq and Syria situated between the Euphrates and Tigris rivers, for which there is representative data available, i.e. the total number of figurines found at a given site within defined contextual or chronological frameworks. Sites that produced fewer than ten specimens have been excluded.
The dataset consists of four Excel spreadsheets and a separate text file containing references. Spreadsheet AZ includes quantitative data on the two main types of figurines (anthropomorphic and zoomorphic), while spreadsheet ACat includes quantitative data on the four main morphological categories of anthropomorphic figurines (pillar-shaped, standing, cone/cylindrical-shaped, and seated). Spreadsheet ZCat includes quantitative data on the three main morphological categories of zoomorphic figurines (solid, wheeled and hollow), while spreadsheet ZSpec includes quantitative data on the kinds of animals represented among zoomorphic figurines (equids, sheep, goats, cattle, dogs, birds, other quadrupeds and unidentified quadrupeds). However, not all sites provided data on all of the listed figurine classes and categories.
CHRONOLOGY
The chronology of the figurine corpora from North Mesopotamia was established based on the results of the ARCANE project (see Quenet 2011; Finkbeiner 2015; Renette 2019). The applied periodisation is based on regional systems for the Middle Euphrates (Early Middle Euphrates/EME 1–5 periods), Jazirah (Early Jazirah/EJZ 1–5), and Tigris (Early Tigridian/ETG 1–9) regions (see, for example, Lebeau 2018: x). For South Mesopotamia, however, the traditional system is used, which is largely based on historical periodisation and division into the Early Dynastic (ED) I–III, Akkadian, and Ur III periods.
CLASSIFICATION SYSTEM
The classification of the figurines in the dataset as either anthropomorphic or zoomorphic, as well as the identification of the kinds of animals represented among the zoomorphic figurines, follows the identification given in the original publication.
Despite the very general nature of these classification systems, limitations exist. Anthropomorphic figurines attached to the back of zoomorphic ones fall outside this system if they are found complete. If broken, they can easily be misclassified as either zoomorphic or anthropomorphic. In the dataset, they are added to the anthropomorphic figurines if traces of such are recognised, otherwise they remain with the zoomorphic group.
Quadrupeds that cannot be identified at species level (mostly due to their state of preservation) also escape the system, as do figurines of uncertain identification involving two kinds of animal (e.g. sheep and goats) as described in the original publication. In the dataset, they are included in the 'unidentified quadrupeds' category.
The identification of the anthropomorphic and zoomorphic figurine categories either follows the classification provided in the original publication or is based on the published illustrations of the figurines. Spreadsheets providing data on the anthropomorphic and zoomorphic figurine categories do not include figurines of uncertain classification.
Anthropomorphic figurines have been divided into four categories: pillar-shaped, standing, cylindrical/cone-shaped, and seated.
Pillar-shaped figurines are typically baked, with no rendering of the genitalia or legs. While the upper portions of these figurines are characterised by an abundance of detail, the lower portions are almost invariably plain and devoid of decoration (Sakal 2018: 225–226).
Standing figurines have clearly defined legs and are often set on small, flat bases. These figurines are typically elaborately decorated. A distinct subcategory comprises female figurines stylised into a violin shape (Sakal 2018: 227).
Cylindrical and cone-shaped figurines are usually crude and unsmoothed, with few marked details, and are either unbaked or only lightly fired (Sakal 2018: 225).
Seated figurines either have legs that are lowered below seat level and are designed to be seated on a separately modelled object, or they are integrated with a stool or feature a projection along the back, enabling vertical placement.
Zoomorphic figurines are divided into three major categories. The first category includes solid figurines. The second category comprises wheeled figurines that are adapted for mounting on axles with wheels and, in most cases, a string for pulling them. This category includes solid and hollow figurines/vessels shaped like whole animals, which may have openings in the torso walls and/or spouts in the head.
Specimens in the hollow figurines category are not adapted for mounting on axles and wheels. These are predominantly flutes and rattles.
REFERENCES
Finkbeiner, U. (2015). Stratigraphy. In U. Finkbeiner, M. Novák, F. Sakal, and P. Sconzo (eds), Associated Regional Chronologies for the Ancient Near East and Eastern Mediterranean. Vol. 4, Middle Euphrates (pp. 17–40). Turnhout: Brepols
Lebeau, M. (2018). Foreword. In M. Lebeau (ed.), Associated Regional Chronologies for the Ancient Near East and Eastern Mediterranean, Interregional. Vol. 2, Artefacts (pp. ix–x). Turnhout: Brepols
Quenet, P. (2011). Stratigraphy. In M. Lebeau (ed.), Associated Regional Chronologies for the Ancient Near East and the Eastern Mediterranean. Vol. 1, Jezirah (pp. 19–47). Turnhout: Brepols
Renette, S. (2019). Stratigraphy. In E. Rova (ed.), Associated Regional Chronologies for the Ancient Near East and the Eastern Mediterranean. Vol. 5, Tigridian Region (pp. 21–44). Turnhout: Brepols
Sakal, F. (2018). Anthropomorphic terracotta figurines. In M. Lebeau (ed.), Associated Regional Chronologies for the Ancient Near East and Eastern Mediterranean, Interregional. Vol. 2, Artefacts (pp. 221–243). Turnhout: Brepols
Two types of data sets were generated by our project: species inventories and quantitative counts of key organisms. The species inventories compile data collected by Chela Zabin of the Department of Zoology of the University of Hawaii in 2001 and by Zabin with the assistance of Erin Baumgartner's 9th grade Marine Science class at the Education Laboratory School in 2003, 2004 and 2005, through a National Science Foundation Graduate Teaching Fellowship. Each site was visited only once each year: by 50 students in 2003 and by 25 students in 2004 and 2005.
Attribution-ShareAlike 2.0 (CC BY-SA 2.0) https://creativecommons.org/licenses/by-sa/2.0/
The CLARISSA Cash Plus intervention was an innovative social protection scheme for tackling social ills, including the worst forms of child labour (WFCL). A universal and unconditional ‘cash plus’ programme, it combined community mobilisation, case work, and cash transfers (CTs). It was implemented in a high-density, low-income neighbourhood in Dhaka to build individual, family, and group capacities to meet needs. This, in turn, was expected to lead to a corresponding decrease in deprivation and in community-identified social issues that negatively affect wellbeing, including WFCL. Four principles underpinned the intervention: Unconditionality, Universality, Needs-centred and people-led, and Emergent and open-ended.
The intervention took place in North Gojmohol, Dhaka, over a 27-month period between October 2021 and December 2023, to test and study the impact of providing unconditional and people‑led support to everyone in a community. Cash transfers were provided in monthly instalments between January and June 2023, plus one investment transfer in September 2023. A total of 1,573 households received cash through the Upay mobile financial service. Cash was complemented by a ‘plus’ component, implemented between October 2021 and December 2023. In this component, referred to as relational needs-based community organising (NBCO), a team of 20 community mobilisers (CMs) delivered case work at the individual and family level and community mobilisation at the group level. The intervention was part of the wider CLARISSA programme, led by the Institute of Development Studies (IDS) and funded by the UK’s Foreign, Commonwealth & Development Office (FCDO).
The intervention was implemented by Terre des hommes (Tdh) in Bangladesh and evaluated in collaboration with the BRAC Institute of Governance and Development (BIGD) and researchers from the University of Bath and the Open University, UK.
The evaluation of the CLARISSA Social Protection pilot was rooted in contribution analysis that combined multiple methods over more than three years, in line with emerging best practice guidelines for mixed methods research on children, work, and wellbeing. Quantitative research included bi-monthly monitoring surveys administered by the project’s community mobilisers (CMs), including basic questions about wellbeing, perceived economic resilience, school attendance, etc. This was complemented by baseline, midline, and endline surveys, which collected information about key outcome indicators within the sphere of influence of the intervention, such as children’s engagement with different forms of work and working conditions, with schooling and other activities, household living conditions and sources of income, and respondents’ perceptions of change. Qualitative tools were used to probe topics and results of interest, as well as impact pathways. These included reflective diaries written by the community mobilisers; three rounds of focus group discussions (FGDs) with community members; three rounds of key informant interviews (KIIs) with members of case study households; and long-term ethnographic observation.
Quantitative Data
The quantitative evaluation of the CLARISSA Cash Plus intervention involved several data collection methods to gather information about household living standards, children’s education and work, and social dynamics. The data collection included a pre-intervention census, four periodic surveys, and 13 rounds of bi-monthly monitoring surveys, all conducted between late 2020 and late 2023.
Details of each instrument are as follows:
Census: Conducted in October/November 2020 in the target neighbourhood of North Gojmohol (n=1,832) and the comparison neighbourhood of Balurmath (n=2,365)
Periodic surveys: Baseline (February 2021, n=752 in North Gojmohol), Midline 1 (before cash) (October 2022, n=771 in North Gojmohol), Midline 2 (after 6 rounds of cash) (July 2023, n=769 in North Gojmohol), and Endline (December 2023, n=750 in North Gojmohol and n=773 in Balurmath)
Bi-monthly monitoring data (13 rounds): Conducted between December 2021 and December 2023 in North Gojmohol (average of 1,400 households per round)
The present repository summarizes this information, organized as follows:
1.1 Bimonthly survey (household): Panel dataset comprising 13 rounds of bi-monthly monitoring data at the household level (average of 1,400 households per round, total of 18,379 observations)
1.2 Bimonthly survey (child): Panel dataset comprising 13 rounds of bi-monthly monitoring data at the child level (aged 5 to 16 at census) (average of 940 children per round, total of 12,213 observations)
2.1 Periodic survey (household): Panel dataset comprising 5 periodic surveys (census, baseline, midline 1, midline 2, endline) at the household level (average of 750 households per period, total of 3,762 observations)
2.2 Periodic survey (child): Panel dataset comprising 4 periodic surveys (baseline, midline 1, midline 2, endline) at the child level (average of 3,100 children per period, total of 12,417 observations)
3.0 Balurmath - North Gojmohol panel: Balanced panel dataset comprising 558 households in North Gojmohol and 773 households in Balurmath, observed at both the 2020 census and the 2023 endline (total of 2,662 observations)
4.0 Questionnaires: Original questionnaires for all datasets
All datasets are provided in Stata format (.dta) and Excel format (.xlsx) and are accompanied by their respective dictionaries in Excel format (.xlsx).
Qualitative Data
The qualitative study was conducted in three rounds: the first round of in-depth interviews (IDIs) and FGDs took place between December 2022 and January 2023; the second round from April to May 2023; and the third round from November to December 2023. KIIs were conducted during the second round of the study, in May 2023.
The sample size by round and instrument type is shown below:
Round                           IDIs with children   IDIs with parents   IDIs with CMs   FGDs   KIIs
1st Round (12/2022 – 01/2023)   30                   26                  -               6      -
2nd Round (04/2023 – 05/2023)   30                   23                  -               6      5
3rd Round (11/2023 – 12/2023)   26                   25                  3               7      -
The files in this archive contain the qualitative data and include six types of transcripts:
· 1.1 Interviews with children in case study households (IDI): 30 families in round 1, 30 in round 2, and 26 in round 3
· 1.2 Interviews with parents in case study households (IDI): 26 families in round 1, 23 in round 2, and 25 in round 3
· 1.3 Interviews with community mobilisers (IDI): 3 CMs in round 3
· 2.0 Key informant interviews (KII): 5 in round 2
· 3.0 Focus group discussions (FGD): 6 in round 1, 6 in round 2, and 7 in round 3
· 4.0 Community mobiliser micro-narratives (556 cases)
Additionally, this repository includes a comprehensive list of all qualitative data files ("List of all qualitative data+MC.xlsx").
Alveolar type II (ATII) epithelial cells function as stem cells, contributing to alveolar renewal, repair and cancer. Therefore, they are a highly relevant model for studying lung diseases, including acute injury, fibrosis and cancer, in which signals transduced by RAS and transforming growth factor (TGF)-β play critical roles. To identify downstream molecular events following RAS and/or TGF-β activation, we performed proteomic analysis using a quantitative label-free approach (LC-HDMSE) to provide in-depth proteome coverage and estimates of protein concentration in absolute amounts. Contact: pjss@soton.ac.uk
This survey is the first detailed study of the phenomenon of teacher absenteeism in Indonesia, based on two unannounced visits to 147 sample schools in October 2002 and March 2003. The study was conducted by the SMERU Research Institute and the World Bank, affiliated with the Global Development Network (GDN). Similar surveys were carried out at the same time in seven other developing countries: Bangladesh, Ecuador, India, Papua New Guinea, Peru, Uganda, and Zambia.
This research focuses on primary school teacher absence rates and their relationship to individual teacher characteristics, conditions of the community and its institutions, and education policy at various levels of authority. A teacher was considered absent if, at the time of the visit, the researcher could not find the sample teacher in the school.
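The absence definition above, combined with the panel restriction described in the sampling notes (only teachers with information from both visits are kept), can be sketched as a small calculation. This is a minimal illustration, not the survey team's procedure: the record format `(teacher_id, visit, present)`, the function name `panel_absence_rates`, and the example data are all hypothetical, and the "average" is simply the mean of the two visit rates.

```python
from collections import defaultdict

def panel_absence_rates(records):
    """Return (panel size, absence rates) from (teacher_id, visit, present) records.

    Hypothetical record format: visit is 1 or 2; present is True if the teacher
    was found in the school at the time of the visit, False if not, and None if
    no information was obtained for that visit.
    """
    observed = defaultdict(dict)
    for teacher_id, visit, present in records:
        if present is not None:
            observed[teacher_id][visit] = present
    # Balanced panel: keep only teachers with information from both visits.
    panel = {t: v for t, v in observed.items() if 1 in v and 2 in v}
    if not panel:
        raise ValueError("no teacher has information from both visits")
    rates = {}
    for visit in (1, 2):
        # A teacher counts as absent if not found in the school during the visit.
        absent = sum(1 for visits in panel.values() if not visits[visit])
        rates[visit] = absent / len(panel)
    rates["average"] = (rates[1] + rates[2]) / 2
    return len(panel), rates

records = [
    ("t1", 1, True), ("t1", 2, False),
    ("t2", 1, True), ("t2", 2, True),
    ("t3", 1, False), ("t3", 2, None),  # data from one visit only: dropped
]
n, rates = panel_absence_rates(records)
print(n, rates)  # 2 {1: 0.0, 2: 0.5, 'average': 0.25}
```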
This survey was conducted in 10 randomly selected districts/cities in four Indonesian regions: Java-Bali, Sumatera, Kalimantan-Sulawesi, and Nusa Tenggara.
Java-Bali, Sumatera, Kalimantan-Sulawesi and Nusa Tenggara regions
Sample survey data [ssd]
Information from the Indonesian Statistics Agency (BPS) and the Ministry of Education was used as a basis to build a sample frame. The data gathered included the total population and a list of villages and primary school facilities in each district/city. Due to limited time and resources, this research focused only on primary schools. In Indonesia, there are two types of primary education facilities: primary schools and primary madrasah. Primary schools are regulated by the Ministry of National Education and use the general curriculum, while primary madrasah are regulated by the Ministry of Religious Affairs and use a mixed (general and Islamic) curriculum.
A sample of districts/cities and schools (consisting of primary schools and primary madrasah) was selected using the following steps. First, Indonesia was divided into several regions based on total population: Java-Bali, Sumatera, Kalimantan-Sulawesi, and Nusa Tenggara. Indonesian provinces that were suffering from various conflicts (such as Aceh, Central Sulawesi, Maluku, North Maluku, and Papua) were removed from the sample selection process. Then, from each region, five districts/cities were randomly selected, taking into account the population of each district/city.
Second, 12 schools were selected in each district/city. Before choosing the sample schools, researchers randomly selected 10 villages in each district/city, taking into account the location of these villages (in urban or rural areas). One of the 10 villages was a backup village, in case a village turned out to be too difficult to reach. In each sampled village, researchers asked residents about the location of primary schools/madrasah (both public and private). They then visited the schools, giving priority to public primary schools/madrasah. To meet the required number of schools in each district/city, additional samples were selected from private schools.
Third, in each sampled school, the researcher requested a list of teachers. In large schools, i.e. those with more than 15 teachers, the researcher interviewed only 15 randomly chosen teachers, so that survey quality could be maintained despite limited time and resources. Each school was visited twice, both times unannounced. From the 147 primary schools/madrasah in the sample, 1,441 teachers were selected in each visit. Because this is a panel study, the teacher absence data include only teachers who could be interviewed, or whose data were obtained, in both visits; teachers whose information was obtained in only one of the visits were not included in the panel dataset.
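The first sampling step says districts/cities were selected randomly "taking into account the population" but does not name the exact procedure. One common reading is selection with probability proportional to size (PPS); the sketch below, which is an assumption rather than the documented method, drops the conflict-affected provinces listed above and then draws districts one at a time without replacement, weighting each draw by population. The tuple format, the function name `select_districts`, and all district names and populations are hypothetical.

```python
import random

# Conflict-affected provinces excluded from sample selection (from the description above).
EXCLUDED_PROVINCES = {"Aceh", "Central Sulawesi", "Maluku", "North Maluku", "Papua"}

def select_districts(districts, k=5, seed=None):
    """Draw k districts/cities without replacement, weighting each draw by population.

    districts: list of (name, province, population) tuples for one region
    (hypothetical input format).
    """
    eligible = [d for d in districts if d[1] not in EXCLUDED_PROVINCES]
    if k > len(eligible):
        raise ValueError("fewer eligible districts than requested")
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        weights = [population for _, _, population in eligible]
        # Pick one index with probability proportional to population,
        # then remove it so it cannot be drawn again.
        idx = rng.choices(range(len(eligible)), weights=weights, k=1)[0]
        chosen.append(eligible.pop(idx)[0])
    return chosen

region = [
    ("District A", "Aceh", 900_000),  # excluded: conflict province
    ("District B", "Province 1", 1_200_000),
    ("District C", "Province 1", 300_000),
    ("District D", "Province 2", 750_000),
    ("District E", "Province 2", 450_000),
    ("District F", "Province 3", 2_000_000),
    ("District G", "Province 3", 150_000),
]
print(select_districts(region, k=5, seed=42))
```

Because the draws are made one at a time, the resulting inclusion probabilities are only approximately proportional to population; a formal design such as systematic PPS would be needed if exact inclusion probabilities (and hence survey weights) matter.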
Face-to-face [f2f]
The following survey instruments are available:
Detailed information about data editing procedures is available in "Data Cleaning Guide for PETS/QSDS Surveys" in external resources.
The Stata cleaning do-file and the data quality report on the dataset can also be found in external resources.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, integrating other features possibly measured on different scales, e.g. clinical or cytogenetic factors, is becoming increasingly important. Typically, the analysis results (e.g. a selection of relevant genes) are visualized, and further information, such as clinical factors, is added on top. A more integrative approach is desirable, however, in which all available data are analyzed jointly and different data sources are also combined more naturally in the visualization. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables, which receives less attention in the literature than clustering samples. We extend the methodology for clustering variables with two new approaches, one based on combining different association measures and the other on distance correlation. In simulation studies we evaluate and compare different clustering strategies. Methods designed specifically for mixed-type data perform comparably to, and in many cases better than, standard approaches applied to the corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for visualization. Real data examples illustrate various potential applications of the integrative heatmap and of other graphical displays based on dissimilarity matrices.
We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix.
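To make the distance-correlation idea concrete, the sketch below computes the standard sample distance correlation (built from double-centred pairwise distance matrices) between pairs of variables and turns it into a variable-by-variable dissimilarity matrix via 1 - dCor. It is not the CluMix implementation: treating categorical variables with the discrete metric (distance 1 between different categories, 0 otherwise) is an assumption made here for illustration, and the function names are hypothetical. The resulting matrix could then be handed to any hierarchical clustering routine.

```python
import numpy as np

def _pairwise_distances(values, categorical=False):
    """Distance matrix between observations of a single variable."""
    v = np.asarray(values)
    if categorical:
        # Assumed discrete metric: 0 for the same category, 1 otherwise.
        return (v[:, None] != v[None, :]).astype(float)
    v = v.astype(float)
    return np.abs(v[:, None] - v[None, :])

def _double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y, x_cat=False, y_cat=False):
    """Sample distance correlation (V-statistic version), in [0, 1]."""
    a = _double_center(_pairwise_distances(x, x_cat))
    b = _double_center(_pairwise_distances(y, y_cat))
    dcov2_xy = (a * b).mean()
    dvar2_x = (a * a).mean()
    dvar2_y = (b * b).mean()
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:  # a constant variable carries no dependence information
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

def dcor_dissimilarity(columns, categorical):
    """Symmetric matrix of 1 - dCor between all pairs of variables."""
    p = len(columns)
    diss = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            dcor = distance_correlation(columns[i], columns[j], categorical[i], categorical[j])
            diss[i, j] = diss[j, i] = 1.0 - dcor
    return diss

# Example with hypothetical mixed-type variables measured on the same 40 samples
rng = np.random.default_rng(0)
age = rng.normal(50, 10, size=40)
score = 2 * age + rng.normal(0, 1, size=40)
group = np.where(age > 50, "old", "young")
diss = dcor_dissimilarity([age, score, group], categorical=[False, False, True])
print(np.round(diss, 2))
```

Distance correlation is zero (in the population) only under independence and can capture non-linear association, which is one reason it is attractive for mixed-type variables; the 1 - dCor transformation is just one simple way to obtain a dissimilarity from it.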