Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
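The abstract above does not spell out the estimator, so the following is only a minimal sketch of one nearest-neighbour construction commonly used for the discrete/continuous case, and it is an assumption that this matches the authors' method. For each point, it takes the distance d to the k-th nearest neighbour sharing the same discrete label, counts the m points of any label within d, and averages ψ(N) − ψ(N_x) + ψ(k) − ψ(m). The names `mixed_mi` and `digamma_int` are hypothetical helpers, and the continuous variable is assumed to be one-dimensional.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def mixed_mi(labels, values, k=3):
    """Estimate MI (in nats) between discrete labels and 1-D continuous values.

    Assumed nearest-neighbour scheme (not taken from the abstract itself):
    d = distance to the k-th nearest neighbour with the same label,
    m = number of other points, of any label, within distance d.
    """
    n = len(labels)
    total = 0.0
    for i in range(n):
        # Distances to the other points that share this point's label.
        same = sorted(
            abs(values[j] - values[i])
            for j in range(n)
            if j != i and labels[j] == labels[i]
        )
        if len(same) < k:
            raise ValueError("each label needs at least k + 1 points")
        d = same[k - 1]
        # Neighbours of any label within distance d (m >= k by construction).
        m = sum(1 for j in range(n) if j != i and abs(values[j] - values[i]) <= d)
        n_x = len(same) + 1  # size of this point's label group
        total += digamma_int(n) - digamma_int(n_x) + digamma_int(k) - digamma_int(m)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    labels = ["a"] * 10 + ["b"] * 10
    # Two well-separated groups, so the label almost determines the value.
    values = [rng.gauss(0, 1) for _ in range(10)] + [rng.gauss(100, 1) for _ in range(10)]
    print(mixed_mi(labels, values))
```

No binning of the continuous variable is needed here, which is the property the abstract emphasises; the brute-force neighbour search is quadratic in the sample size and would normally be replaced by a tree-based search for large data sets.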
This data release includes water-quality data collected at up to thirteen locations along the Merrimack River and Merrimack River Estuary in Massachusetts. In this study, conducted by the U.S. Geological Survey (USGS) in cooperation with the Massachusetts Department of Environmental Protection, discrete samples were collected, and continuous monitoring was completed from June to September 2020. The data include results of measured field properties (water temperature, specific conductivity, pH, dissolved oxygen) and laboratory concentrations of nitrogen and phosphorus species, total carbon, pheophytin-a, and chlorophyll-a. These data were collected to assess selected water-quality conditions (mainly nutrients) in the Merrimack River and Merrimack River Estuary at the thirteen locations and to identify areas where more water-quality monitoring is needed. The discrete samples and continuous-monitoring data are also available in the USGS National Water Information System at https://waterdata.usgs.gov/nwis. This data release consists of (1) Table of the discrete water-quality data collected (Merrimack_DiscreteWQ_Data.csv); (2) Statistical summaries including the minimum, median, and maximum of the discrete water-quality data collected (Merrimack_DiscreteWQ_Statistical_Data.original.csv); (3) Statistical summaries including the minimum, median, and maximum of the continuous water-quality data collected (Merrimack_ContinuousWQ_Statistical_Data.csv); (4) Table of vertical profile data (Merrimack_VerticalWQ_Profiles_Data.csv); (5) Table of continuous monitor deployment location and dates (Merrimack_ContinuousWQ_Deployment_Dates.csv); (6) Time-series plots of continuous water-quality data (Continuous_QW_Plots_All.zip); (7) Vertical profile plots (Vertical Profiles_QW_Plots.zip).
This dataset includes discrete sample and profile data collected from DISCOVERY in the Indian Ocean and Southern Ocean (> 60 degrees South) from 1994-02-19 to 1994-03-30. These data include CHLOROFLUOROCARBON-11 (CFC-11), CHLOROFLUOROCARBON-113 (CFC-113), CHLOROFLUOROCARBON-12 (CFC-12), Carbon tetrachloride (CCL4), DISSOLVED OXYGEN, Delta Oxygen-18, HYDROSTATIC PRESSURE, NITRATE, Potential temperature (theta), SALINITY, WATER TEMPERATURE, phosphate and silicate. The instruments used to collect these data include CTD and bottle. These data were collected by Robert R. Dickson of Fisheries Laboratory - Lowestoft as part of the WOCE_ISS01h_74DI19940219 dataset. CDIAC associated the following cruise ID(s) with this dataset: DIS94 and WOCE_ISS01h_1994. The World Ocean Circulation Experiment (WOCE) was a major component of the World Climate Research Program with the overall goal of better understanding the ocean's role in climate and climatic changes resulting from both natural and anthropogenic causes. The CO2 survey took advantage of the sampling opportunities provided by the WOCE Hydrographic Program (WHP) cruises during this period between 1990 and 1998. The final collection covers approximately 23,000 stations from 94 WOCE cruises.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This MATLAB code implements the Adaptive Multi-step Levenberg-Marquardt (AMLM) algorithm and can be used for nonlinear parameter fitting of discrete data. The expected input is a power system current or voltage signal that is to be fitted to an exponential function.
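As a rough illustration of the kind of fit described above, the following sketch applies the standard single-step Levenberg-Marquardt iteration to a two-parameter exponential. It is not the adaptive multi-step (AMLM) variant and is not derived from the released MATLAB code; the model form y = a·exp(b·t), the function name `fit_exponential`, and the damping schedule (multiply or divide λ by 10) are all assumptions chosen for illustration.

```python
import math


def fit_exponential(t, y, a0, b0, lam=1e-3, max_iter=200):
    """Fit y ~ a * exp(b * t) with plain Levenberg-Marquardt (illustrative only)."""

    def cost(a, b):
        return sum((yi - a * math.exp(b * ti)) ** 2 for ti, yi in zip(t, y))

    a, b = a0, b0
    current = cost(a, b)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r, where r = y - f and
        # J = [df/da, df/db] = [exp(b t), a t exp(b t)].
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for ti, yi in zip(t, y):
            e = math.exp(b * ti)
            r = yi - a * e
            j_a, j_b = e, a * ti * e
            s_aa += j_a * j_a
            s_ab += j_a * j_b
            s_bb += j_b * j_b
            g_a += j_a * r
            g_b += j_b * r
        # Marquardt damping: (J^T J + lam * diag(J^T J)) delta = J^T r.
        d_aa, d_bb = s_aa * (1.0 + lam), s_bb * (1.0 + lam)
        det = d_aa * d_bb - s_ab * s_ab
        if det == 0.0:
            break
        da = (g_a * d_bb - s_ab * g_b) / det
        db = (d_aa * g_b - s_ab * g_a) / det
        try:
            trial = cost(a + da, b + db)
        except OverflowError:
            trial = float("inf")
        if trial < current:
            # Accept the step and move toward Gauss-Newton behaviour.
            a, b, current = a + da, b + db, trial
            lam /= 10.0
        else:
            # Reject the step and move toward gradient-descent behaviour.
            lam *= 10.0
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


if __name__ == "__main__":
    ts = [0.05 * i for i in range(40)]
    ys = [2.0 * math.exp(-3.0 * ti) for ti in ts]  # synthetic, noise-free signal
    print(fit_exponential(ts, ys, a0=1.0, b0=-1.0))
```

The accept/reject rule on the damping parameter is what keeps the iteration stable when the starting guess is poor; a multi-step or adaptive scheme would change how λ and the step are updated.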
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
In multilevel data, units at level 1 are nested in clusters at level 2, which in turn may be nested in even larger clusters at level 3, and so on. For continuous data, several authors have shown how to model multilevel data in a ‘wide’ or ‘multivariate’ format approach. We provide a general framework to analyze random intercept multilevel SEM in the ‘wide format’ (WF) and extend this approach for discrete data. In a simulation study, we vary response scale (binary, four response options), covariate presence (no, between-level, within-level), design (balanced, unbalanced), model misspecification (present, not present), and the number of clusters (small, large) to determine accuracy and efficiency of the estimated model parameters. With a small number of observations in a cluster, results indicate that the WF approach is preferable for analyzing multilevel data with discrete response options.
Discrete sample data from manual field collection and laboratory analyses taken in 2025. The database contains water quality, sediment, biological, air, and soil samples from monitoring locations across the state of Texas.
The Interagency Ecological Program’s (IEP) Environmental Monitoring Program (EMP) was initiated in compliance with the Water Right Decision D-1379 (now mandated by Water Right Decision D-1641) and has monitored discrete water quality and nutrients in the upper San Francisco Estuary since 1975. The objectives of the EMP are to obtain consistent and accurate monthly data at established monitoring stations, provide and document information necessary to achieve compliance with salinity, flow, and dissolved oxygen standards, and to report this information for the purpose of management and conservation of the upper San Francisco Estuary. While the EMP also collects biological data, this dataset only includes the discrete water quality and nutrient data collected by the EMP from 1975-2021.
This is a point feature class of environmental monitoring stations maintained in the California Department of Water Resources’ (hereafter the Department) Water Data Library Database (WDL) for discrete “grab” water quality sampling stations. The WDL database contains DWR-collected, current and historical, chemical and physical parameters found in drinking water, groundwater, and surface waters throughout the state. This dataset comprises a Stations point feature class and a related “Period of Record by Station and Parameter” table. The Stations point feature class contains basic information about each station including station name, station type, latitude, longitude, and the dates of the first and last sample collection events on record. The related Period of Record Table contains the list of parameters (i.e. chemical analyte or physical parameter) collected at each station along with the start date and end date (period of record) for each parameter and the number of data points collected. The Lab and Field results data associated with this discrete grab water quality stations dataset can be accessed from the California Natural Resources Agency’s Open Data Platform at https://data.cnra.ca.gov/dataset/water-quality-data or from DWR’s Water Data Library web application at https://wdl.water.ca.gov/waterdatalibrary/index.cfm.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This paper demonstrates the flexibility of a general approach for the analysis of discrete time competing risks data that can accommodate complex data structures, different time scales for different causes, and nonstandard sampling schemes. The data may involve a single data source where all individuals contribute to analyses of both cause-specific hazard functions, overlapping datasets where some individuals contribute to the analysis of the cause-specific hazard function of only one cause while other individuals contribute to analyses of both cause-specific hazard functions, or separate data sources where each individual contributes to the analysis of the cause-specific hazard function of only a single cause. The approach is modularized into estimation and prediction. For the estimation step, the parameters and the variance-covariance matrix can be estimated using widely available software. The prediction step utilizes a generic program with plug-in estimates from the estimation step. The approach is illustrated with three prognostic models for stage IV male oral cancer using different data structures. The first model uses only men with stage IV oral cancer from population-based registry data. The second model strategically extends the cohort to improve the efficiency of the estimates. The third model improves the accuracy for those with a lower risk of other causes of death, by bringing in an independent data source collected under a complex sampling design with additional other-cause covariates. These analyses represent novel extensions of existing methodology, broadly applicable for the development of prognostic models capturing both the cancer and non-cancer aspects of a patient's health.
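The prediction step described above, which plugs estimated cause-specific hazards into a generic program, can be illustrated with the textbook discrete-time relation between hazards and cumulative incidence. This is a minimal sketch under stated assumptions: the hazards are taken as already estimated, the function name `cumulative_incidence` and the example hazard values are hypothetical, and nothing here reproduces the paper's own program or its variance calculations.

```python
def cumulative_incidence(hazards):
    """Turn discrete-time cause-specific hazards into cumulative incidence.

    hazards: dict mapping cause name -> list of hazards, one per time interval.
    Returns (cif, survival), where cif[cause][t] is the probability of having
    failed from that cause by the end of interval t, and survival is the
    probability of remaining event-free after the last interval.
    """
    causes = list(hazards)
    n_periods = len(hazards[causes[0]])
    surv = 1.0  # probability of being event-free at the start of the interval
    running = {c: 0.0 for c in causes}
    cif = {c: [] for c in causes}
    for t in range(n_periods):
        for c in causes:
            # Fail from cause c in interval t after surviving all earlier ones.
            running[c] += surv * hazards[c][t]
            cif[c].append(running[c])
        # Overall survival shrinks by the total hazard across causes.
        surv *= 1.0 - sum(hazards[c][t] for c in causes)
    return cif, surv


if __name__ == "__main__":
    # Hypothetical plug-in hazards for two causes over two intervals.
    example = {"cancer": [0.1, 0.1], "other": [0.2, 0.2]}
    print(cumulative_incidence(example))
```

Because the function only needs a hazard sequence per cause, the hazards can come from separate models or separate data sources, which is the modularity the abstract highlights.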
Discrete sample data from manual field collection and laboratory analyses taken since 2010. It contains water quality, sediment, biological, air, and soil samples from monitoring locations across the Lower Canadian Subregion of Texas, Hydrologic Unit Code (HUC) 1109.
Harmful algal blooms (HABs) are overgrowths of algae or cyanobacteria in water and can be harmful to humans and animals directly via toxin exposure or indirectly via changes in water quality and related impacts to ecosystem services, drinking water characteristics, and recreation. While HABs occur frequently throughout the United States, the driving conditions behind them are not well understood, especially in flowing waters. In order to facilitate future model development and characterization of HABs in the Illinois River Basin, this data release publishes a synthesized and cleaned collection of HABs-related water quality and quantity data for river and stream sites in the basin. It includes nutrients, major ions, sediment, physical properties, streamflow, chlorophyll and other types of water data. This data release contains files of harmonized data from the USGS National Water Information System (NWIS), the U.S. Army Corps of Engineers (USACE), the Illinois Environmental Protection Agency (IEPA), and a USGS Open File Report (OFR) containing toxin data in Illinois (Terrio and others, 2013: https://pubs.usgs.gov/of/2013/1019/pdf/ofr2013-1019.pdf). Both discrete data and continuous sensor data for 142 parameters (44 of which returned data) between October 1, 2015 and December 31, 2022 were downloaded from NWIS programmatically. All data were harmonized into a shared format (see files named data_{parameter_group}combined.csv). The USGS NWIS data went through additional cleaning and were also grouped by generic parameters (see pcode_group_xwalk.csv to see what parameter codes are mapped to which generic parameters). Any data not from USGS NWIS were kept outside of the parameter grouping files. Additional streamflow data for select locations were retrieved from the USACE and are available in data_usace_00060_combined.csv.
Additional algal toxin data provided by the IEPA and in a USGS OFR report (Terrio and others, 2013), which include some lake sites, are available in data_algaltoxins_combined.csv. We also provide collapsed datasets of daily metrics for each water quality (“generic parameter”) group of USGS NWIS data (files named daily_metrics{parameter_group}.csv). Lastly, we include a site_metadata.csv containing site identification and location information for all sites with water quality and quantity data, and mappings to the National Hydrography Dataset flowlines where available. This work was completed as part of the USGS Proxies Project, an effort supported by the Water Mission Area (WMA) Water Quality Processes program to develop estimation methods for PFAS, harmful algal blooms, and metals, at multiple spatial and temporal scales.
This dataset includes profile discrete measurements of temperature, salinity, oxygen and CFCs obtained during the R/V Meteor cruise M85/1 (EXPOCODE 06M320110624) in the North Atlantic Ocean from 2011-06-24 to 2011-08-02. R/V Meteor Cruise M85, leg 1, was funded by the German Federal Ministry of Education and Research (BMBF) as part of the cooperative research program "North Atlantic".
The goal of this study was to develop a suite of inter-related water quality monitoring approaches capable of modeling and estimating the spatial and temporal gradients of particulate and dissolved total mercury (THg) concentration, and particulate and dissolved methyl mercury (MeHg) concentration, in surface waters across the Sacramento / San Joaquin River Delta (SSJRD). This suite of monitoring approaches included: a) data collection at fixed continuous monitoring stations (CMS) outfitted with in-situ sensors, b) spatial mapping using boat-mounted flow-through sensors, and c) satellite-based remote sensing. The focus of this specific Child Page is to present laboratory measured spectral data associated with discrete surface water samples collected as part of both the CMS and boat mapping sampling efforts. All laboratory-based measurements presented herein were conducted by the U.S. Geological Survey (USGS) Organic Matter Research Laboratory (OMRL) in Sacramento, Calif. The machine-readable (comma separated value, *.csv) files presented herein include spectral data collected using two different instruments: 1) Laboratory-based absorbance and fluorescence measurements on filtered water using an Aqualog (Hansen and others, 2018) and 2) Laboratory-based absorption measurements using a Varian Cary spectrophotometer on particulate samples collected on glass fiber filters (Kishino and others, 1985; Roesler, 1998). The reported spectral data include: 1) fluorescence intensities across a wide range of excitation (240 to 800 nm) and emission (250 to 800 nm) wavelengths expressed as an excitation-emission matrix (EEM), 2) absorbance of light (from 239 nm to 800 nm) due to dissolved and colloidal substances, and 3) absorption coefficients (from 350 nm to 715 nm) for particulates using the quantitative filter technique (QFT).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
In applications such as clinical safety analysis, the data of the experiments usually consist of frequency counts. In the analysis of such data, researchers often face the problem of multiple testing based on discrete test statistics, aimed at controlling family-wise error rate (FWER). Most existing FWER controlling procedures are developed for continuous data, which are often conservative when analyzing discrete data. By using minimal attainable p-values, several FWER controlling procedures have been specifically developed for discrete data in the literature. In this article, by using known marginal distributions of true null p-values, three more powerful stepwise procedures are developed, which are modified versions of the conventional Bonferroni, Holm and Hochberg procedures, respectively. It is shown that the first two procedures strongly control the FWER under arbitrary dependence and are more powerful than the existing Tarone-type procedures, while the last one only ensures control of the FWER in special settings. Through extensive simulation studies, we provide numerical evidence of superior performance of the proposed procedures in terms of the FWER control and minimal power. Real clinical safety data are used to demonstrate applications of our proposed procedures. An R package “MHTdiscrete” and a web application are developed for implementing the proposed procedures.
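For orientation, the conventional Bonferroni and Holm procedures that the abstract takes as its starting point can be sketched as follows. This shows only the standard versions designed for continuous p-values; it does not implement the modified discrete procedures or anything from the MHTdiscrete package, and the function names and example p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Conventional Bonferroni: reject H_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Conventional Holm step-down procedure.

    Sort p-values ascending; compare the i-th smallest (0-based) against
    alpha / (m - i) and stop at the first one that fails.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


if __name__ == "__main__":
    pvals = [0.01, 0.02, 0.03, 0.005]  # hypothetical p-values
    print(bonferroni(pvals))
    print(holm(pvals))
```

With these example p-values Holm rejects all four hypotheses while Bonferroni rejects only two, which illustrates why stepwise procedures are the natural baseline for power improvements.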
Discrete sample data from manual field collection and laboratory analyses taken since 2020. It contains water quality, sediment, biological, air, and soil samples from monitoring locations across the Lower Brazos Subregion of Texas, Hydrologic Unit Code (HUC) 1207.
The FORGE team is making these fracture models available to researchers wanting a set of natural fractures in the FORGE reservoir for use in their own modeling work. They have been used to predict stimulation distances during hydraulic stimulation at the open toe section of well 16A(78)-32. These fracture sets are fully stochastic and do not contain the deterministic set that matches the pilot well 58-32 FMI data. Well 58-32 has been completed and 16A(78)-32 is to be drilled as part of Phase 3. The original .fab files are not included due to redundancy. The *.fabgz data for the 800m and 1200m depth areas are in the native FracMan format and have been compressed using Gzip. Filtered data for the 800m depth area includes .csv spreadsheets, native FracMan (.fab), and GOCAD (.ts) files that are in a compressed zip format. The file titled "SGW 2020 Finnila and Podgorney DFN fracture files on GDR.pdf" is a description of the data and should be reviewed prior to data use.
https://www.neonscience.org/data-samples/data-policies-citation
Unclassified three-dimensional point cloud by flightline and classified point cloud by 1 km tile, provided in LAZ format. Classifications follow standard ASPRS definitions. All point coordinates are provided in meters. Horizontal coordinates are referenced in the appropriate UTM zone and the ITRF00 datum. Elevations are referenced to Geoid12A.
Discrete sample data from manual field collection and laboratory analyses taken since 2020. It contains water quality, sediment, biological, air, and soil samples from monitoring locations across the Central Texas Coastal Subregion of Texas, Hydrologic Unit Code (HUC) 1210.
Discrete sample data from manual field collection and laboratory analyses taken since 2000. It contains water quality, sediment, biological, air, and soil samples from monitoring locations across the Lower Pecos Subregion of Texas, Hydrologic Unit Code (HUC) 1307.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The empirical analysis of discrete complete-information games has relied on behavioral restrictions in the form of solution concepts, such as Nash equilibrium. Choosing the right solution concept is crucial not just for the identification of payoff parameters, but also for the validity and informativeness of counterfactual exercises and policy implications. We say that a solution concept is discernible if it is possible to determine whether it generated the observed data on the players’ behavior and covariates. We propose a set of conditions that make it possible to discern solution concepts. In particular, our conditions are sufficient to tell whether the players’ choices emerged from Nash equilibria. We can also discriminate between rationalizable behavior, maxmin behavior, and collusive behavior. Finally, we identify the correlation structure of unobserved shocks in our model using a novel approach.