U.S. Government Works: https://www.usa.gov/government-works/
The Poisson Process file presents the solution to an exercise from the fourth module of the Statistics and Applied Data Analysis Specialization at the University of Colorado Boulder, which I completed. In these notes, I explain the most important steps.
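As a small aside that is not part of the original exercise write-up: a homogeneous Poisson process with rate λ can be simulated by cumulatively summing i.i.d. exponential inter-arrival times. The sketch below is a generic illustration; the rate `lam` and horizon `t_max` are arbitrary values, not taken from the course exercise.

```python
import numpy as np

def simulate_poisson_process(lam, t_max, seed=None):
    """Simulate event times of a homogeneous Poisson process on [0, t_max).

    Inter-arrival times are i.i.d. Exponential(lam), so the event times
    are their cumulative sums, truncated at the horizon t_max.
    """
    rng = np.random.default_rng(seed)
    # Draw more inter-arrivals than we expect to need, then truncate.
    n_guess = int(lam * t_max * 1.5) + 10
    times = np.cumsum(rng.exponential(scale=1.0 / lam, size=n_guess))
    while times[-1] < t_max:  # top up in the rare case we fell short
        extra = rng.exponential(scale=1.0 / lam, size=n_guess)
        times = np.concatenate([times, times[-1] + np.cumsum(extra)])
    return times[times < t_max]

events = simulate_poisson_process(lam=2.0, t_max=10.0, seed=42)
print(len(events), "events; the count is ~Poisson(lam * t_max) = Poisson(20)")
```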
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Many capture-recapture surveys of wildlife populations operate in continuous time but detections are typically aggregated into occasions for analysis, even when exact detection times are available. This discards information and introduces subjectivity, in the form of decisions about occasion definition. We develop a spatio-temporal Poisson process model for spatially explicit capture-recapture (SECR) surveys that operate continuously and record exact detection times. We show that, except in some special cases (including the case in which detection probability does not change within occasion), temporally aggregated data do not provide sufficient statistics for density and related parameters, and that when detection probability is constant over time our continuous-time (CT) model is equivalent to an existing model based on detection frequencies. We use the model to estimate jaguar density from a camera-trap survey and conduct a simulation study to investigate the properties of a CT estimator and discrete-occasion estimators with various levels of temporal aggregation. This includes investigation of the effect on the estimators of spatio-temporal correlation induced by animal movement. The CT estimator is found to be unbiased and more precise than discrete-occasion estimators based on binary capture data (rather than detection frequencies) when there is no spatio-temporal correlation. It is also found to be only slightly biased when there is correlation induced by animal movement, and to be more robust to inadequate detector spacing, while discrete-occasion estimators with binary data can be sensitive to occasion length, particularly in the presence of inadequate detector spacing. Our model includes as a special case a discrete-occasion estimator based on detection frequencies, and at the same time lays a foundation for the development of more sophisticated CT models and estimators. It allows modelling within-occasion changes in detectability, readily accommodates variation in detector effort, removes subjectivity associated with user-defined occasions, and fully utilises CT data. We identify a need for developing CT methods that incorporate spatio-temporal dependence in detections and see potential for CT models being combined with telemetry-based animal movement models to provide a richer inference framework.
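To make the continuous-time detection model concrete, here is a minimal simulation sketch (not the authors' code): with a stationary activity centre and a half-normal detection function, detections at each detector form a homogeneous Poisson process, so, as the abstract notes for time-constant detectability, the detection frequencies carry the information and the exact times are uniform on the survey interval. The detector grid, `lam0`, `sigma`, and survey length `T` are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey: a 5x5 grid of detectors and one activity centre.
xs = np.linspace(0.0, 4.0, 5)
detectors = np.array([(x, y) for x in xs for y in xs])
centre = np.array([1.7, 2.3])

lam0, sigma, T = 0.5, 1.0, 30.0  # baseline rate, spatial scale, survey length

# Half-normal detection intensity at each detector (detections per unit time).
d2 = ((detectors - centre) ** 2).sum(axis=1)
rates = lam0 * np.exp(-d2 / (2.0 * sigma ** 2))

# With a time-constant rate, counts are Poisson and, given the counts,
# the exact detection times are i.i.d. Uniform(0, T) at each detector.
counts = rng.poisson(rates * T)
times = [np.sort(rng.uniform(0.0, T, size=n)) for n in counts]
print(counts.sum(), "detections across", (counts > 0).sum(), "detectors")
```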
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
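The following is a compact, unoptimized sketch of the nearest-neighbour construction such an estimator uses (in the style of Ross-type discrete–continuous MI estimators): for each point, the distance to its k-th nearest neighbour among points sharing its discrete value sets a radius, and counting full-sample neighbours within that radius yields the digamma terms. Tie and self-count conventions vary between implementations; this O(N²) version is for illustration only.

```python
import numpy as np
from scipy.special import digamma

def mi_discrete_continuous(d, c, k=3):
    """Estimate I(d; c) in nats for discrete d and 1-D continuous c
    via a Ross-style k-nearest-neighbour construction (O(N^2) sketch)."""
    d = np.asarray(d)
    c = np.asarray(c, dtype=float)
    N = len(c)
    nx_terms, m_terms = [], []
    for i in range(N):
        same = np.flatnonzero(d == d[i])            # points sharing d_i (incl. i)
        dists = np.sort(np.abs(c[same] - c[i]))     # dists[0] == 0 is point i itself
        r = dists[k]                                # k-th neighbour within the class
        m = np.count_nonzero(np.abs(c - c[i]) <= r) # neighbours in the full sample
        nx_terms.append(digamma(len(same)))
        m_terms.append(digamma(m))
    return digamma(N) + digamma(k) - np.mean(nx_terms) - np.mean(m_terms)

# Toy check: c depends on d, so the estimate should be clearly positive.
rng = np.random.default_rng(1)
d = rng.integers(0, 2, size=500)
c = rng.normal(loc=2.0 * d, scale=1.0)
print(mi_discrete_continuous(d, c, k=3))
```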
This dataset was created by Dr. H. Khalil.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Genomic Results Discrete Data Integration market size reached USD 2.18 billion in 2024, reflecting a robust expansion driven by the rapid adoption of precision medicine and the increasing integration of multi-omics data in healthcare and research. The market is projected to grow at a CAGR of 13.6% from 2025 to 2033, reaching an estimated USD 6.47 billion by 2033. This remarkable growth is primarily fueled by technological advancements in bioinformatics, an upsurge in clinical applications of genomics, and a growing demand for actionable insights from complex biological datasets.
One of the primary growth factors propelling the Genomic Results Discrete Data Integration market is the exponential increase in genomic data generated by next-generation sequencing (NGS) technologies. As the cost of sequencing continues to decrease, the volume of genomic, transcriptomic, proteomic, and metabolomic data being produced is rising dramatically. This surge necessitates advanced data integration solutions capable of transforming raw, heterogeneous datasets into structured, clinically relevant information. The ability to harmonize and standardize disparate data sources is crucial for supporting clinical diagnostics, personalized medicine, and drug discovery, all of which rely on robust data integration platforms to drive informed decisions and improve patient outcomes.
Another significant driver is the growing emphasis on personalized medicine and targeted therapeutics. Healthcare providers and pharmaceutical companies are increasingly leveraging discrete data integration platforms to correlate genomic variants with phenotypic outcomes, enabling more precise disease stratification and individualized treatment strategies. The integration of multi-omics data not only enhances the understanding of disease mechanisms but also accelerates the identification of novel therapeutic targets. This trend is further reinforced by regulatory agencies and reimbursement bodies that are placing greater value on the clinical utility of integrated genomic data, thereby incentivizing investments in advanced integration technologies.
Furthermore, the adoption of cloud-based solutions and artificial intelligence (AI) in genomic data integration is revolutionizing the market landscape. Cloud platforms offer scalable storage, computational power, and collaborative environments, making it feasible for institutions of all sizes to process and analyze vast datasets efficiently. AI-driven analytics are enhancing the extraction of actionable insights from integrated data, supporting applications across clinical diagnostics, research, and drug development. The convergence of these technologies is not only improving the speed and accuracy of data interpretation but also expanding the accessibility of genomic insights to a broader range of end-users, including hospitals, research institutes, and biotechnology companies.
Regionally, North America dominated the Genomic Results Discrete Data Integration market in 2024, accounting for the largest revenue share due to its advanced healthcare infrastructure, high adoption of precision medicine, and significant investments in genomics research. Europe followed closely, driven by strong government support and collaborative research initiatives. The Asia Pacific region is emerging as a high-growth market, propelled by increasing healthcare expenditure, expanding genomics research capabilities, and rising awareness of personalized medicine. Latin America and the Middle East & Africa are also witnessing gradual adoption, supported by international collaborations and capacity-building efforts. The regional outlook remains optimistic, with all major regions expected to contribute significantly to the market’s overall expansion through 2033.
The Genomic Results Discrete Data Integration market by component is segmented into software, hardware, and services, each playing a pivotal role in enabling seamless integration and interpretation of complex biological data. Software solutions represent the largest share, driven by the need for sophisticated algorithms that can harmonize, standardize, and analyze multi-omics datasets. These platforms facilitate data interoperability, support regulatory compliance, and enable advanced analytics, making them indispensable for both clinical and research applications. Key sof
As per our latest research, the global Genomic Results Discrete Data Integration market size reached USD 1.45 billion in 2024, demonstrating robust momentum driven by the increasing adoption of precision medicine and advanced data analytics in genomics. The market is projected to expand at a CAGR of 13.2% during the forecast period, reaching an estimated USD 4.14 billion by 2033. This impressive growth trajectory is fueled by the convergence of high-throughput sequencing technologies, the rising demand for integrated healthcare data, and the need for actionable insights from complex genomic datasets.
A primary growth factor in the Genomic Results Discrete Data Integration market is the exponential rise in genomic data generation, propelled by advancements in next-generation sequencing (NGS) and other high-throughput technologies. As the cost of sequencing continues to decline, the volume of raw genomic data produced by research laboratories, clinical settings, and biopharmaceutical companies has surged. However, the true value of this data is only realized when disparate datasets—spanning genomics, transcriptomics, proteomics, and metabolomics—are seamlessly integrated and analyzed. The integration of discrete genomic results enables researchers and clinicians to uncover complex biological relationships, identify novel biomarkers, and support the development of targeted therapies, thus driving widespread adoption of data integration platforms and solutions.
Another significant driver is the increasing focus on personalized medicine, which relies heavily on the integration of multi-omics data to tailor medical treatments to individual patients. Healthcare providers and pharmaceutical companies are leveraging integrated genomic data to stratify patient populations, predict disease susceptibility, and optimize therapeutic interventions. This shift toward data-driven healthcare is further supported by regulatory agencies encouraging the use of real-world evidence and integrated datasets for drug approval and post-market surveillance. Consequently, the demand for robust, scalable, and interoperable data integration solutions is surging, as stakeholders seek to harness the full potential of genomic and related datasets for clinical and research applications.
Furthermore, the Genomic Results Discrete Data Integration market benefits from technological innovations in artificial intelligence (AI), machine learning (ML), and cloud computing. These technologies facilitate the efficient aggregation, harmonization, and analysis of massive and heterogeneous datasets, overcoming traditional barriers to data integration such as data silos, format inconsistencies, and security concerns. The adoption of AI-driven analytics and cloud-based integration platforms is accelerating, enabling real-time data sharing, collaborative research, and scalable storage solutions. These advancements are not only enhancing the accuracy and speed of data interpretation but also democratizing access to integrated genomic insights across diverse healthcare and research environments.
From a regional perspective, North America continues to dominate the Genomic Results Discrete Data Integration market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The region’s leadership is attributed to its advanced healthcare infrastructure, significant investments in genomics research, and the presence of leading biopharmaceutical and technology companies. Meanwhile, Asia Pacific is emerging as the fastest-growing region, propelled by expanding genomic research initiatives, increasing healthcare expenditure, and government support for precision medicine. Europe also demonstrates steady growth, driven by collaborative research projects and strong regulatory frameworks supporting data integration. Latin America and Middle East & Africa represent nascent but promising markets, with growing awareness and gradual adoption of integrated genomic solutions.
The Com
This dataset includes profile discrete measurements of temperature, salinity, oxygen and CFCs obtained during the R/V Meteor cruise M85/1 (EXPOCODE 06M320110624) in the North Atlantic Ocean from 2011-06-24 to 2011-08-02. R/V Meteor Cruise M85, leg 1, was funded by the German Federal Ministry of Education and Research (BMBF) as part of the cooperative research program "North Atlantic".
Harmful algal blooms (HABs) are overgrowths of algae or cyanobacteria in water and can be harmful to humans and animals directly via toxin exposure or indirectly via changes in water quality and related impacts to ecosystem services, drinking water characteristics, and recreation. While HABs occur frequently throughout the United States, the driving conditions behind them are not well understood, especially in flowing waters. In order to facilitate future model development and characterization of HABs in the Illinois River Basin, this data release publishes a synthesized and cleaned collection of HABs-related water quality and quantity data for river and stream sites in the basin. It includes nutrients, major ions, sediment, physical properties, streamflow, chlorophyll, and other types of water data.

This data release contains files of harmonized data from the USGS National Water Information System (NWIS), the U.S. Army Corps of Engineers (USACE), the Illinois Environmental Protection Agency (IEPA), and a USGS Open File Report (OFR) containing toxin data in Illinois (Terrio and others, 2013: https://pubs.usgs.gov/of/2013/1019/pdf/ofr2013-1019.pdf). Both discrete data and continuous sensor data for 142 parameters (44 of which returned data) between October 1, 2015 and December 31, 2022 were downloaded from NWIS programmatically. All data were harmonized into a shared format (see files named data_{parameter_group}_combined.csv). The USGS NWIS data went through additional cleaning and were also grouped by generic parameters (see pcode_group_xwalk.csv for which parameter codes map to which generic parameters). Any data not from USGS NWIS were kept outside of the parameter grouping files. Additional streamflow data for select locations were retrieved from the USACE and are available in data_usace_00060_combined.csv. Additional algal toxin data provided by the IEPA and in a USGS OFR report (Terrio and others, 2013), which include some lake sites, are available in data_algaltoxins_combined.csv. We also provide collapsed datasets of daily metrics for each water quality ("generic parameter") group of USGS NWIS data (files named daily_metrics_{parameter_group}.csv). Lastly, we include site_metadata.csv, containing site identification and location information for all sites with water quality and quantity data, and mappings to the National Hydrography Dataset flowlines where available.

This work was completed as part of the USGS Proxies Project, an effort supported by the Water Mission Area (WMA) Water Quality Processes program to develop estimation methods for PFAS, harmful algal blooms, and metals at multiple spatial and temporal scales.
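As a hypothetical usage sketch, one way to combine a harmonized parameter file from this release with the site metadata using pandas is shown below; the column names (`site_no`, `date`, `value`) and the specific file `data_nutrients_combined.csv` are assumptions for illustration, not documented fields.

```python
import pandas as pd

# File names follow the release's naming pattern; column names below
# (site_no, date, value) are assumptions for illustration.
nutrients = pd.read_csv("data_nutrients_combined.csv", parse_dates=["date"])
sites = pd.read_csv("site_metadata.csv")

# Attach site identification/location to the harmonized records.
merged = nutrients.merge(sites, on="site_no", how="left")

# Example: daily mean per site for this generic parameter group.
daily = (merged
         .groupby(["site_no", pd.Grouper(key="date", freq="D")])["value"]
         .mean()
         .reset_index())
print(daily.head())
```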
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In multilevel data, units at level 1 are nested in clusters at level 2, which in turn may be nested in even larger clusters at level 3, and so on. For continuous data, several authors have shown how to model multilevel data in a ‘wide’ or ‘multivariate’ format approach. We provide a general framework to analyze random intercept multilevel SEM in the ‘wide format’ (WF) and extend this approach to discrete data. In a simulation study, we vary response scale (binary, four response options), covariate presence (no, between-level, within-level), design (balanced, unbalanced), model misspecification (present, not present), and the number of clusters (small, large) to determine the accuracy and efficiency of the estimated model parameters. With a small number of observations per cluster, results indicate that the WF approach is preferable for estimating multilevel models with discrete response options.
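For readers unfamiliar with the ‘wide format’ idea, it amounts to treating the level-1 units within each cluster as a multivariate response. A minimal, hypothetical reshape (column names are made up) illustrates the data preparation:

```python
import pandas as pd

# Hypothetical long-format multilevel data: one row per level-1 unit,
# nested in clusters. Column names are assumptions for illustration.
long = pd.DataFrame({
    "cluster": [1, 1, 1, 2, 2, 2],
    "unit":    [1, 2, 3, 1, 2, 3],
    "y":       [0, 1, 1, 0, 0, 1],   # binary response
})

# Wide format: one row per cluster; the level-1 units become columns
# y_1..y_3, which a single-level SEM can treat as a multivariate outcome.
# Unbalanced clusters would simply yield missing cells here.
wide = long.pivot(index="cluster", columns="unit", values="y")
wide.columns = [f"y_{u}" for u in wide.columns]
print(wide.reset_index())
```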
This dataset was created by DANIYAL KHAN.
The FORGE team is making these fracture models available to researchers wanting a set of natural fractures in the FORGE reservoir for use in their own modeling work. They have been used to predict stimulation distances during hydraulic stimulation at the open toe section of well 16A(78)-32. These fracture sets are fully stochastic and do not contain the deterministic set that matches the pilot well 58-32 FMI data. Well 58-32 has been completed and 16A(78)-32 is to be drilled as part of Phase 3. The original .fab files are not included due to redundancy. The *.fabgz data for the 800m and 1200m depth areas are in the native FracMan format and have been compressed using Gzip. Filtered data for the 800m depth area includes .csv spreadsheets, native FracMan (.fab), and GOCAD (.ts) files that are in a compressed zip format. The file titled "SGW 2020 Finnila and Podgorney DFN fracture files on GDR.pdf" is a description of the data and should be reviewed prior to data use.
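Since the .fabgz files are ordinary Gzip archives of native FracMan .fab files, a short Python snippet suffices to unpack one; the file name below is hypothetical.

```python
import gzip
import shutil

# Hypothetical file name; the release's .fabgz files are plain Gzip
# compressions of native FracMan .fab files.
src = "forge_800m_fractures.fabgz"
dst = src.removesuffix(".fabgz") + ".fab"

with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
    shutil.copyfileobj(fin, fout)  # stream-decompress to disk
print("wrote", dst)
```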
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This project was a collaboration between Dr. Christopher W. Hunt and Dr. Joseph Salisbury (of the University of New Hampshire) and Dr. Xuewu Liu and Dr. Robert H. Byrne (of the University of South Florida).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains discrete carbonate data (TA, DIC, pH, Temperature, Salinity), collected as part of the study described below. See the "Related Publications" section for autonomously collected pH data from this study.
Study description:
The increase in atmospheric carbon dioxide (CO2) over the last 200 years has largely been mitigated by the ocean’s function as a carbon sink. However, this continuous absorption of CO2 by seawater triggers ocean acidification (OA), a process in which water becomes more acidic and more depleted in the carbonate ions that are essential for calcifiers. OA is well studied in open ocean environments; however, understanding the unique manifestation of OA in coastal ecosystems presents myriad challenges due to considerable natural variability resulting from concurrent and sometimes opposing coastal processes (e.g., eutrophication, changing hydrological conditions, heterogeneous biological activity, and complex water mass mixing). This study analyzed high temporal resolution pH data collected during 2022 and 2023 from Narragansett Bay, RI (a mid-sized, urban estuary that has undergone a 50% reduction in nitrogen loading since 2005), together with weekly discrete bottle samples used to verify the sensor data. We used autonomous data for pH, temperature, salinity, and dissolved oxygen from 4 sensors in Narragansett Bay. The autonomous data spanned over a year, from 2022 to mid-2023, with temporal resolutions between 10 and 15 minutes. The data have been subjected to QA/QC protocols, such that all pH measurements are final and quality-controlled. In addition, pH values normalized to 15°C (using PyCO2SYS) are included. All pH values are on the total scale.
Discrete samples were taken weekly at the Narragansett Bay Long Term Phytoplankton Time Series site and monthly from Greenwich Bay, collocated with 2 of the sensors. Discrete data were analyzed in lab for dissolved inorganic carbon and total alkalinity, and include in situ temperature and salinity.
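As an illustration of the 15°C normalization mentioned above, here is a hedged sketch using PyCO2SYS's `pyco2.sys` interface; the TA, DIC, temperature, and salinity values are made up, not taken from this dataset.

```python
import PyCO2SYS as pyco2

# Hypothetical bottle sample: TA and DIC in micromol/kg, with in situ
# temperature and salinity; all values are invented for illustration.
result = pyco2.sys(
    par1=2250.0, par1_type=1,   # type 1 = total alkalinity
    par2=2050.0, par2_type=2,   # type 2 = dissolved inorganic carbon
    salinity=30.5,
    temperature=18.2,           # in situ (input) temperature, deg C
    temperature_out=15.0,       # solve the system again at 15 deg C
)

# pH on the total scale at in situ and at 15 deg C output conditions.
print("pH (in situ):", result["pH"])
print("pH (at 15 C):", result["pH_out"])
```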
The primary article (cited below under "Related works") introduces social work researchers to discrete choice experiments (DCEs) for studying stakeholder preferences. The article includes an online supplement with a worked example demonstrating DCE design and analysis with realistic simulated data. The worked example focuses on caregivers' priorities in choosing treatment for children with attention deficit hyperactivity disorder. This dataset includes the scripts (and, in some cases, Excel files) that we used to identify appropriate experimental designs, simulate population and sample data, estimate sample size requirements for the multinomial logit (MNL, also known as conditional logit) and random parameter logit (RPL) models, estimate parameters using the MNL and RPL models, and analyze attribute importance, willingness to pay, and predicted uptake. It also includes the associated data files (experimental designs, data generation parameters, simulated population data and parameters, ...). In the worked example, we used simulated data to examine caregiver preferences for 7 treatment attributes (medication administration, therapy location, school accommodation, caregiver behavior training, provider communication, provider specialty, and monthly out-of-pocket costs) identified by dosReis and colleagues in a previous DCE. We employed an orthogonal design with 1 continuous variable (cost) and 12 dummy-coded variables (representing the levels of the remaining attributes, which were categorical). Using the parameter estimates published by dosReis et al., with slight adaptations, we simulated utility values for a population of 100,000 people, then selected a sample of 500 for analysis. Relying on random utility theory, we used the mlogit package in R to estimate the MNL and RPL models, using 5,000 Halton draws for simulated maximum likelihood estimation of the RPL model. In addition to estimating the utility parameters, we measured the relative importance of each attribute, esti...

Data from: How to Use Discrete Choice Experiments to Capture Stakeholder Preferences in Social Work Research
This dataset supports the worked example in:
Ellis, A. R., Cryer-Coupet, Q. R., Weller, B. E., Howard, K., Raghunandan, R., & Thomas, K. C. (2024). How to use discrete choice experiments to capture stakeholder preferences in social work research. Journal of the Society for Social Work and Research. Advance online publication. https://doi.org/10.1086/731310
The referenced article introduces social work researchers to discrete choice experiments (DCEs) for studying stakeholder preferences. In a DCE, researchers ask participants to complete a series of choice tasks: hypothetical situations in which each participant is presented with alternative scenarios and selects one or more. For example, social work researchers may want to know how parents and other caregivers pr...
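To give a flavor of the random-utility estimation the worked example performs (in R with mlogit), below is a minimal conditional logit (MNL) fit written in Python with NumPy/SciPy on simulated data; the dimensions and attribute coefficients are invented, and the sketch omits the RPL model and Halton draws.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(7)

# Hypothetical DCE: 500 choice tasks, 3 alternatives, 4 attributes.
n_tasks, n_alt, n_attr = 500, 3, 4
X = rng.normal(size=(n_tasks, n_alt, n_attr))
beta_true = np.array([1.0, -0.5, 0.8, -1.2])

# Simulate choices under random utility: U = X @ beta + Gumbel noise,
# which yields conditional-logit choice probabilities.
utility = X @ beta_true + rng.gumbel(size=(n_tasks, n_alt))
y = utility.argmax(axis=1)

def neg_loglik(beta):
    v = X @ beta  # systematic utilities, shape (n_tasks, n_alt)
    # log P(chosen) = v_chosen - log sum_j exp(v_j)
    return -(v[np.arange(n_tasks), y] - logsumexp(v, axis=1)).sum()

fit = minimize(neg_loglik, np.zeros(n_attr), method="BFGS")
print("estimated beta:", fit.x.round(2))  # should be near beta_true
```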
This data set is a compilation of discrete and high frequency water quality data from sites on Allequash Creek in Wisconsin, and within the Allequash Creek watershed, for the water years (WY) 2019-2021.
U.S. Government Works: https://www.usa.gov/government-works
A combination of discrete and daily-aligned groundwater levels for the Mississippi River Valley alluvial aquifer, clipped to the Mississippi Alluvial Plain as defined by Painter and Westerman (2018), with corresponding metadata, is based on processing of U.S. Geological Survey National Water Information System (NWIS) data (U.S. Geological Survey, 2020). After retrieval, the data were aggregated and filtered with the infoGW2visGWDB software (Asquith and Seanor, 2019). The nomenclature GWmaster mimics that of the output from infoGW2visGWDB. Two separate NWIS data retrievals were made: first, the discrete data were retrieved, and second, continuous records from recorder sites with daily-mean or other daily statistics codes were retrieved. Each dataset was separately passed through the infoGW2visGWDB software to create a "GWmaster discrete" and a "GWmaster continuous" table, and these tables were combined and then sorted on the site identifier and date to form the data ...
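The final combine-and-sort step described above maps directly onto a concatenate-and-sort in pandas; the sketch below is hypothetical, with assumed file and column names (`site_no`, `lev_dt`).

```python
import pandas as pd

# Hypothetical file and column names; the release describes combining the
# discrete and continuous GWmaster tables, then sorting by site and date.
discrete = pd.read_csv("GWmaster_discrete.csv", parse_dates=["lev_dt"])
continuous = pd.read_csv("GWmaster_continuous.csv", parse_dates=["lev_dt"])

combined = (pd.concat([discrete, continuous], ignore_index=True)
              .sort_values(["site_no", "lev_dt"])
              .reset_index(drop=True))
print(combined.head())
```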
A discrete blood glucose data set, already interpolated with a spline method to measure MAGE (mean amplitude of glycemic excursions). The data set aims to provide an alternative to CGM (Continuous Glucose Monitoring) for predicting diabetes from discrete data. The discrete data consist of 27 blood glucose readings per patient, taken with a glucometer over 3 days. After spline interpolation, more than 150 points are available, approximating a CGM trace.
There are 42 patients. Column A (CLASS) divides them into 3 groups: 1 for pre-diabetic, 2 for diabetic, and 3 for normal patients.
Thanks to the 42 volunteers who were willing to spend time and energy on this study. Related article: http://beei.org/index.php/EEI/article/view/2387
We hope this data enables further studies on predicting diabetes for individual users, so that we can monitor our lifestyles.
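As a rough illustration of the pipeline the description outlines, the sketch below spline-interpolates a handful of made-up glucometer readings onto a dense grid and computes a simplified MAGE (mean of peak-to-nadir swings exceeding one standard deviation). Published MAGE algorithms differ in detail, and none of the numbers here come from this dataset.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical day of readings: times in hours, glucose in mg/dL.
t = np.array([0, 2, 5, 8, 11, 14, 17, 20, 24], dtype=float)
g = np.array([95, 140, 110, 165, 120, 150, 100, 130, 98], dtype=float)

# Spline-interpolate the discrete readings onto a dense, CGM-like grid.
spline = CubicSpline(t, g)
tt = np.linspace(t[0], t[-1], 150)
gg = spline(tt)

# Simplified MAGE: mean of peak-to-nadir swings larger than 1 SD.
sd = gg.std()
turning = np.flatnonzero(np.diff(np.sign(np.diff(gg)))) + 1  # local extrema
swings = np.abs(np.diff(gg[turning]))
mage = swings[swings > sd].mean()
print(f"SD = {sd:.1f} mg/dL, simplified MAGE = {mage:.1f} mg/dL")
```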
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This study identified subgroups of bladder pain syndrome/interstitial cystitis (BPS/IC) patients and potential treatment targets by combining validated questionnaires and patient diaries with discrete mathematical techniques. Hierarchical clustering of questionnaire data revealed three distinct patient groups. Analysis of patient diaries, employing natural language processing—a form of discrete data analysis—found keywords capturing emotional and psychological experiences, complementing the questionnaire results. Integration of questionnaire and diary data visualized the relationships between symptoms and treatment targets through a network graph. This personalized approach, akin to solving the traveling salesman problem in discrete mathematics, was validated through case studies, demonstrating its utility in guiding targeted interventions. The study emphasizes the significant potential of discrete mathematics-based data integration and visualization for personalized management of this complex condition.
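For readers unfamiliar with the clustering step, a generic hierarchical-clustering sketch on made-up questionnaire scores is shown below; it mirrors the three-group cut the study reports but uses none of the study's data, features, or linkage choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)

# Made-up questionnaire data: 30 patients x 5 validated symptom scores.
scores = rng.normal(size=(30, 5))

# Ward linkage on standardized scores, cut into three clusters
# (the study reports three distinct patient groups).
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
tree = linkage(z, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")
print(np.bincount(labels)[1:])  # patients per cluster
```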
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In applications such as clinical safety analysis, the data of the experiments usually consist of frequency counts. In the analysis of such data, researchers often face the problem of multiple testing based on discrete test statistics, aimed at controlling the family-wise error rate (FWER). Most existing FWER-controlling procedures are developed for continuous data and are often conservative when analyzing discrete data. By using minimal attainable p-values, several FWER-controlling procedures have been specifically developed for discrete data in the literature. In this article, by using known marginal distributions of true null p-values, three more powerful stepwise procedures are developed, which are modified versions of the conventional Bonferroni, Holm, and Hochberg procedures, respectively. It is shown that the first two procedures strongly control the FWER under arbitrary dependence and are more powerful than the existing Tarone-type procedures, while the last one only ensures control of the FWER in special settings. Through extensive simulation studies, we provide numerical evidence of the superior performance of the proposed procedures in terms of FWER control and minimal power. A real clinical safety data set is used to demonstrate applications of our proposed procedures. An R package "MHTdiscrete" and a web application are developed for implementing the proposed procedures.
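The article's new procedures are not reproduced here, but the baseline Tarone-type idea they improve upon, using minimal attainable p-values to shrink the effective number of tests, can be sketched as follows; the p-values in the example are made up.

```python
import numpy as np

def tarone_bonferroni(p, p_min, alpha=0.05):
    """Tarone's modified Bonferroni procedure for discrete tests.

    p     : observed p-values
    p_min : minimal attainable p-value of each test (determined by the
            discrete null distribution, e.g. the margins in exact tests)

    Tests that can never reach significance are excluded from the
    Bonferroni denominator, which is the source of the power gain.
    """
    p, p_min = np.asarray(p), np.asarray(p_min)
    m = len(p)
    for K in range(1, m + 1):
        if np.sum(p_min <= alpha / K) <= K:   # smallest feasible K
            return p <= alpha / K             # rejection decisions
    return np.zeros(m, dtype=bool)

# Made-up example: 6 tests, 3 of which are so discrete they can never
# fall below alpha and therefore should not inflate the correction.
p     = np.array([0.004, 0.020, 0.900, 0.500, 0.700, 0.010])
p_min = np.array([0.001, 0.001, 0.200, 0.300, 0.250, 0.001])
print(tarone_bonferroni(p, p_min))  # Bonferroni over 3, not 6, tests
```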