76 datasets found

Mutual Information between Discrete and Continuous Data Sets
plos.figshare.com
txt
Updated May 30, 2023
Cite
Brian C. Ross (2023). Mutual Information between Discrete and Continuous Data Sets [Dataset]. http://doi.org/10.1371/journal.pone.0087357
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0087357
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Brian C. Ross
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
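As a rough illustration of the non-binning idea described above, the sketch below estimates MI between one discrete and one continuous variable with a nearest-neighbor rule in plain Python. It uses the general form I = psi(N) - <psi(N_x)> + psi(k) - <psi(m)>, where N_x is the number of points sharing a label and m is the number of points in the full data set that lie within the distance to the k-th same-label neighbor. The function names, the default k = 3, the brute-force neighbor search, and the handling of ties are assumptions made for this example; it is not the author's code.

```python
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def mi_discrete_continuous(labels, values, k=3):
    """Nearest-neighbor MI estimate (in nats) for discrete labels and continuous values.

    Assumes every label occurs more than k times (a simplification for this sketch).
    """
    n = len(labels)
    counts = Counter(labels)
    if min(counts.values()) <= k:
        raise ValueError("every label needs more than k samples")
    total = 0.0
    for i in range(n):
        # Distance to the k-th nearest neighbor among points with the same label.
        same_label = sorted(
            abs(values[j] - values[i])
            for j in range(n)
            if j != i and labels[j] == labels[i]
        )
        d = same_label[k - 1]
        # Number of other points, regardless of label, within that distance.
        m = sum(1 for j in range(n) if j != i and abs(values[j] - values[i]) <= d)
        total += (
            digamma_int(n)
            - digamma_int(counts[labels[i]])
            + digamma_int(k)
            - digamma_int(m)
        )
    return total / n
```

The brute-force search is quadratic in the number of points, which is fine for a few hundred samples; a tree-based neighbor search would be the usual choice for larger data.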
Data from: Continuous-time spatially explicit capture-recapture models, with...
data.niaid.nih.gov
dataone.org
+2 more
zip
Updated Apr 21, 2014
Cite
Rebecca Foster; Bart Harmsen; Lorenzo Milazzo; Greg Distiller; David Borchers (2014). Continuous-time spatially explicit capture-recapture models, with an application to a jaguar camera-trap survey [Dataset]. http://doi.org/10.5061/dryad.mg5kv
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.mg5kv
Dataset updated
Apr 21, 2014
Dataset provided by
University of Cambridge
University of Cape Town
University of Belize
University of St Andrews
Authors
Rebecca Foster; Bart Harmsen; Lorenzo Milazzo; Greg Distiller; David Borchers
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Cockscomb Basin Wildlife Sanctuary, Belize
Description
Many capture-recapture surveys of wildlife populations operate in continuous time but detections are typically aggregated into occasions for analysis, even when exact detection times are available. This discards information and introduces subjectivity, in the form of decisions about occasion definition. We develop a spatio-temporal Poisson process model for spatially explicit capture-recapture (SECR) surveys that operate continuously and record exact detection times. We show that, except in some special cases (including the case in which detection probability does not change within occasion), temporally aggregated data do not provide sufficient statistics for density and related parameters, and that when detection probability is constant over time our continuous-time (CT) model is equivalent to an existing model based on detection frequencies. We use the model to estimate jaguar density from a camera-trap survey and conduct a simulation study to investigate the properties of a CT estimator and discrete-occasion estimators with various levels of temporal aggregation. This includes investigation of the effect on the estimators of spatio-temporal correlation induced by animal movement. The CT estimator is found to be unbiased and more precise than discrete-occasion estimators based on binary capture data (rather than detection frequencies) when there is no spatio-temporal correlation. It is also found to be only slightly biased when there is correlation induced by animal movement, and to be more robust to inadequate detector spacing, while discrete-occasion estimators with binary data can be sensitive to occasion length, particularly in the presence of inadequate detector spacing. Our model includes as a special case a discrete-occasion estimator based on detection frequencies, and at the same time lays a foundation for the development of more sophisticated CT models and estimators. 
It allows modelling within-occasion changes in detectability, readily accommodates variation in detector effort, removes subjectivity associated with user-defined occasions, and fully utilises CT data. We identify a need for developing CT methods that incorporate spatio-temporal dependence in detections and see potential for CT models being combined with telemetry-based animal movement models to provide a richer inference framework.
Data from: Continuous monitoring and discrete water-quality data from...
catalog.data.gov
data.usgs.gov
+1 more
Updated Nov 21, 2025
Cite
U.S. Geological Survey (2025). Continuous monitoring and discrete water-quality data from groundwater wells in the Edwards aquifer, Texas, 2014–15 [Dataset]. https://catalog.data.gov/dataset/continuous-monitoring-and-discrete-water-quality-data-from-groundwater-wells-in-the-edward
Dataset updated
Nov 21, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Texas
Description
In cooperation with the San Antonio Water System, continuous and discrete water-quality data were collected from groundwater wells completed in the Edwards aquifer, Texas, 2014-2015. Discrete measurements of nitrate were made by using a nitrate sensor. Precipitation data from two sites in the National Oceanic and Atmospheric Administration Global Historical Climatology Network are included in the dataset. The continuous monitoring data were collected using water quality sensors and include hourly measurements of nitrate, specific conductance, and water level in two wells. Discrete measurements of nitrate, specific conductance, and vertical flow rate were collected from one well site at different depths throughout the well bore.
Polarization Measurement and Inference in Many Dimensions when Subgroups...
dataverse.harvard.edu
Updated Sep 8, 2017
Cite
Gordon Anderson (2017). Polarization Measurement and Inference in Many Dimensions when Subgroups Cannot be Identified [Dataset]. http://doi.org/10.7910/DVN/0BPRU2
Unique identifier
https://doi.org/10.7910/DVN/0BPRU2
Dataset updated
Sep 8, 2017
Dataset provided by
Harvard Dataverse
Authors
Gordon Anderson
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
1987 - 2001
Area covered
China
Description
The most popular general univariate polarization indexes for discrete and continuous variables are extended and combined to describe the extent of polarization between agents in a distribution defined over a collection of many discrete and continuous agent characteristics. A formula for the asymptotic variance of the index is also provided. The implementation of the index is illustrated with an application to Chinese urban household data drawn from six provinces in the years 1987 and 2001 (years spanning the growth and urbanization period subsequent to the economic reforms). The data relate to household adult-equivalent log income and adult-equivalent living space, which are both continuous variables, and the education of the head of household, which is a discrete variable. For this data set, combining the characteristics changes the view of polarization that would be inferred from considering the indices individually.
Detecting Anomalies in Multivariate Data Sets with Switching Sequences and...
data.nasa.gov
Updated Mar 31, 2025
+ more versions
Cite
nasa.gov (2025). Detecting Anomalies in Multivariate Data Sets with Switching Sequences and Continuous Streams [Dataset]. https://data.nasa.gov/dataset/detecting-anomalies-in-multivariate-data-sets-with-switching-sequences-and-continuous-stre
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. Here, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain, novel algorithms, and also briefly discuss results on synthetic and real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams in the aviation industry which are not detectable using state-of-the-art methods.
Numpy , pandas and matplot lib practice
kaggle.com
zip
Updated Jul 16, 2023
Cite
pratham saraf (2023). Numpy , pandas and matplot lib practice [Dataset]. https://www.kaggle.com/datasets/prathamsaraf1389/numpy-pandas-and-matplot-lib-practise/suggestions
Available download formats: zip (385020 bytes)
Dataset updated
Jul 16, 2023
Authors
pratham saraf
License
https://cdla.io/permissive-1-0/
Description
The dataset has been created specifically for practicing Python, NumPy, Pandas, and Matplotlib. It is designed to provide a hands-on learning experience in data manipulation, analysis, and visualization using these libraries.

Specifics of the Dataset:

The dataset consists of 5000 rows and 20 columns, representing various features with different data types and distributions. The features include numerical variables with continuous and discrete distributions, categorical variables with multiple categories, binary variables, and ordinal variables. Each feature has been generated using different probability distributions and parameters to introduce variations and simulate real-world data scenarios. The dataset is synthetic and does not represent any real-world data. It has been created solely for educational purposes.

One of the defining characteristics of this dataset is the intentional incorporation of various real-world data challenges:

- Certain columns are randomly selected to be populated with NaN values, simulating the common challenge of missing data.
- The proportion of these missing values in each column varies randomly between 1% and 70%.
- Statistical noise has been introduced in the dataset. For numerical values in some features, this noise follows a distribution with mean 0 and standard deviation 0.1.
- Categorical noise is introduced in some features, with categories randomly altered in about 1% of the rows.
- Outliers have also been embedded in the dataset, in line with the Interquartile Range (IQR) rule.
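Two of the challenges listed above, missing values and IQR-rule outliers, can be checked with a few lines of standard-library Python. The sketch below is only illustrative: the function names, the example column, and the conventional factor of 1.5 are assumptions for this example, and the quartile method is Python's default for `statistics.quantiles`, which may differ slightly from whatever was used to generate the dataset.

```python
import math
import statistics


def missing_fraction(column):
    """Share of entries in a numeric column that are NaN."""
    return sum(1 for v in column if math.isnan(v)) / len(column)


def iqr_outliers(column, factor=1.5):
    """Values outside [Q1 - factor*IQR, Q3 + factor*IQR], ignoring NaN."""
    clean = sorted(v for v in column if not math.isnan(v))
    q1, _median, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in clean if v < low or v > high]


# Hypothetical column with one missing value and one obvious outlier.
column = [1.0, 2.0, 3.0, 4.0, 5.0, float("nan"), 100.0, 2.5, 3.5]
print(missing_fraction(column))
print(iqr_outliers(column))
```

With pandas, the same checks would typically be written with `isna().mean()` and `quantile([0.25, 0.75])` on a column.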

Context of the Dataset:

The dataset aims to provide a comprehensive playground for practicing Python, NumPy, Pandas, and Matplotlib. It allows learners to explore data manipulation techniques, perform statistical analysis, and create visualizations using the provided features. By working with this dataset, learners can gain hands-on experience in data cleaning, preprocessing, feature engineering, and visualization.

Sources of the Dataset:

The dataset has been generated programmatically using Python's random number generation functions and probability distributions. No external sources or real-world data have been used in creating this dataset.
Identify the Data type (Continuous/Discrete)
kaggle.com
zip
Updated Mar 10, 2021
Cite
Shubh (2021). Identify the Data type (Continuous/Discrete) [Dataset]. https://www.kaggle.com/shubhamsharma777/identify-the-data-type-continuousdiscrete
Available download formats: zip (69799 bytes)
Dataset updated
Mar 10, 2021
Authors
Shubh
Description
Dataset

This dataset was created by Shubh

Contents
Water Quality Data
data.cnra.ca.gov
data.ca.gov
+1 more
csv
Updated Nov 26, 2025
Cite
California Department of Water Resources (2025). Water Quality Data [Dataset]. https://data.cnra.ca.gov/dataset/water-quality-data
Available download formats: csv (334801812), csv (1084649919), csv (5978718), csv (112098838)
Dataset updated
Nov 26, 2025
Dataset authored and provided by
California Department of Water Resources (http://www.water.ca.gov/)
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
The California Department of Water Resources (DWR) discrete (vs. continuous) water quality datasets contain DWR-collected, current and historical, chemical and physical parameters found in routine environmental, regulatory compliance monitoring, and special studies throughout the state.
Harmonized discrete and continuous water quality data in support of modeling...
data.usgs.gov
datasets.ai
+2 more
Updated Oct 16, 2022
+ more versions
Cite
Lindsay Platt; Yaojia Chen; Jennifer Murphy; Elizabeth Nystrom; Noah Schmadel; Sarah Stackpoole; Michael Stouder; Jacob Zwart (2022). Harmonized discrete and continuous water quality data in support of modeling harmful algal blooms in the Illinois River Basin, 2005 - 2020 [Dataset]. http://doi.org/10.5066/P9RISQGE
Unique identifier
https://doi.org/10.5066/P9RISQGE
Dataset updated
Oct 16, 2022
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Lindsay Platt; Yaojia Chen; Jennifer Murphy; Elizabeth Nystrom; Noah Schmadel; Sarah Stackpoole; Michael Stouder; Jacob Zwart
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Oct 11, 2005 - Dec 31, 2020
Description
Harmful algal blooms (HABs) are overgrowths of algae or cyanobacteria in water and can be harmful to humans and animals directly via toxin exposure or indirectly via changes in water quality and related impacts to ecosystem services, drinking water characteristics, and recreation. While HABs occur frequently throughout the United States, the driving conditions behind them are not well understood, especially in flowing waters. In order to facilitate future model development and characterization of HABs in the Illinois River Basin, this data release publishes a synthesized and cleaned collection of HABs-related water quality and quantity data for river and stream sites in the basin. It includes nutrients, major ions, sediment, physical properties, streamflow, chlorophyll and other types of water data. This data release contains files of harmonized data from the USGS National Water Information System (NWIS), the U.S. Army Corps of Engineers (USACE), the Illinois Environmental Protec ...
Data from: Family-Wise Error Rate Controlling Procedures for Discrete Data
figshare.com
application/gzip
Updated May 30, 2023
Cite
Yalin Zhu; Wenge Guo (2023). Family-Wise Error Rate Controlling Procedures for Discrete Data [Dataset]. http://doi.org/10.6084/m9.figshare.9545174.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.9545174.v2
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Yalin Zhu; Wenge Guo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In applications such as clinical safety analysis, the data of the experiments usually consist of frequency counts. In the analysis of such data, researchers often face the problem of multiple testing based on discrete test statistics, aimed at controlling the family-wise error rate (FWER). Most existing FWER controlling procedures are developed for continuous data, which are often conservative when analyzing discrete data. By using minimal attainable p-values, several FWER controlling procedures have been specifically developed for discrete data in the literature. In this article, by using known marginal distributions of true null p-values, three more powerful stepwise procedures are developed, which are modified versions of the conventional Bonferroni, Holm and Hochberg procedures, respectively. It is shown that the first two procedures strongly control the FWER under arbitrary dependence and are more powerful than the existing Tarone-type procedures, while the last one only ensures control of the FWER in special settings. Through extensive simulation studies, we provide numerical evidence of superior performance of the proposed procedures in terms of the FWER control and minimal power. Real clinical safety data are used to demonstrate applications of our proposed procedures. An R package “MHTdiscrete” and a web application are developed for implementing the proposed procedures.
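For orientation, the conventional Bonferroni and Holm procedures that the article takes as its starting point can be sketched in a few lines of Python. This is only the textbook baseline, not the modified discrete-data procedures from the article or the MHTdiscrete package; the function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the single-step Bonferroni rule."""
    m = len(p_values)
    return sorted(i for i, p in enumerate(p_values) if p <= alpha / m)


def holm(p_values, alpha=0.05):
    """Indices of hypotheses rejected by Holm's step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for rank, i in enumerate(order):
        # The smallest p-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            rejected.append(i)
        else:
            break  # stop at the first hypothesis that is not rejected
    return sorted(rejected)


# Hypothetical p-values: Holm rejects more hypotheses than Bonferroni here.
p = [0.01, 0.04, 0.02, 0.005]
print(bonferroni(p))
print(holm(p))
```

Both rules control the FWER under arbitrary dependence; Holm is never less powerful than Bonferroni because its thresholds grow as hypotheses are rejected.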
Water-Quality Data for Discrete Samples and Continuous Monitoring on the...
data.usgs.gov
datasets.ai
+1 more
+ more versions
Cite
Kaitlin Laabs, Water-Quality Data for Discrete Samples and Continuous Monitoring on the Merrimack River, Massachusetts, June to September 2020 [Dataset]. http://doi.org/10.5066/P9H19THP
Unique identifier
https://doi.org/10.5066/P9H19THP
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Kaitlin Laabs
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Jun 2, 2020 - Sep 30, 2020
Area covered
Merrimack River, Massachusetts
Description
This data release includes water-quality data collected at up to thirteen locations along the Merrimack River and Merrimack River Estuary in Massachusetts. In this study, conducted by the U.S. Geological Survey (USGS) in cooperation with the Massachusetts Department of Environmental Protection, discrete samples were collected, and continuous monitoring was completed from June to September 2020. The data include results of measured field properties (water temperature, specific conductivity, pH, dissolved oxygen) and laboratory concentrations of nitrogen and phosphorus species, total carbon, pheophytin-a, and chlorophyll-a. These data were collected to assess selected (mainly nutrients) water-quality conditions in the Merrimack River and Merrimack River Estuary at the thirteen locations and identify areas where more water-quality monitoring is needed. The discrete samples and continuous-monitoring data are also available in the USGS National Water Information System at https://wate ...
EcoSheds Summarized Designing Sustainable Landscapes Project Data for Hydro...
catalog.data.gov
data.usgs.gov
+1 more
Updated Nov 12, 2025
+ more versions
Cite
U.S. Geological Survey (2025). EcoSheds Summarized Designing Sustainable Landscapes Project Data for Hydro Region 2 [Dataset]. https://catalog.data.gov/dataset/ecosheds-summarized-designing-sustainable-landscapes-project-data-for-hydro-region-2
Dataset updated
Nov 12, 2025
Dataset provided by
U.S. Geological Survey
Description
Summarization of the University of Massachusetts Landscape Ecology Lab Designing Sustainable Landscapes (DSL) datasets with the Spatial Hydro-Ecological Decision System (SHEDS) framework. These DSL data were summarized using the local and upstream total accumulation methods within SHEDS. The result is two sets of data, a continuous dataset and a discrete dataset. The continuous dataset contains the average value for the local SHEDS catchments and the area-weighted sums of the averages for the local and all upstream SHEDS catchments for all continuous variables in the DSL dataset. The discrete dataset contains the area in square meters covered by each class within all discrete variables in the DSL dataset for the local SHEDS catchments along with the area-weighted sum of the local and all upstream SHEDS catchment values.
Data for: Collapse mechanism analysis of historic masonry structures...
data.mendeley.com
Updated May 6, 2019
Cite
Francesco Portioli (2019). Data for: Collapse mechanism analysis of historic masonry structures subjected to lateral loads: a comparison between continuous and discrete models [Dataset]. http://doi.org/10.17632/ycxvmj77x5.1
Unique identifier
https://doi.org/10.17632/ycxvmj77x5.1
Dataset updated
May 6, 2019
Authors
Francesco Portioli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Finite element mesh, rigid block model coordinates and rigid block CAD models of numerical case study
UCI Automobile Dataset
kaggle.com
Updated Feb 12, 2023
Cite
Otrivedi (2023). UCI Automobile Dataset [Dataset]. https://www.kaggle.com/datasets/otrivedi/automobile-data/suggestions
Dataset updated
Feb 12, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Otrivedi
Description
In this project, I have done exploratory data analysis on the UCI Automobile dataset available at https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

This dataset consists of data from the 1985 Ward's Automotive Yearbook. Here are the sources:

1) 1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook.
2) Personal Auto Manuals, Insurance Services Office, 160 Water Street, New York, NY 10038
3) Insurance Collision Report, Insurance Institute for Highway Safety, Watergate 600, Washington, DC 20037

Number of Instances: 398
Number of Attributes: 9 including the class attribute

Attribute Information:

mpg: continuous
cylinders: multi-valued discrete
displacement: continuous
horsepower: continuous
weight: continuous
acceleration: continuous
model year: multi-valued discrete
origin: multi-valued discrete
car name: string (unique for each instance)

This data set consists of three types of entities:

I - The specification of an auto in terms of various characteristics

II - Its assigned insurance risk rating. This corresponds to the degree to which the auto is riskier than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is riskier (or less risky), this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling".

III - Its normalized losses in use as compared to other cars. This is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc...), and represents the average loss per car per year.

The analysis is divided into two parts:

Data Wrangling
- Pre-processing data in Python
- Dealing with missing values
- Data formatting
- Data normalization
- Binning

Exploratory Data Analysis
- Descriptive statistics
- Groupby
- Analysis of variance
- Correlation
- Correlation stats

Acknowledgment

Dataset: UCI Machine Learning Repository
Data link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
Data from: Multiple Kernel Learning for Heterogeneous Anomaly Detection:...
data.nasa.gov
datasets.ai
+3 more
Updated Mar 31, 2025
+ more versions
Cite
data.nasa.gov (2025). Multiple Kernel Learning for Heterogeneous Anomaly Detection: Algorithm and Aviation Safety Case Study [Dataset]. https://data.nasa.gov/dataset/multiple-kernel-learning-for-heterogeneous-anomaly-detection-algorithm-and-aviation-safety
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. In this paper, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain, novel algorithms, and also discuss results on real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams in the aviation industry which are not detectable using state-of-the-art methods.
Mean Amplitude Glucose Excursion Interpolation
kaggle.com
zip
Updated Sep 9, 2020
Cite
Merinda Lestandy (2020). Mean Amplitude Glucose Excursion Interpolation [Dataset]. https://www.kaggle.com/merinda33/mage-interpolation
Available download formats: zip (34322 bytes)
Dataset updated
Sep 9, 2020
Authors
Merinda Lestandy
Description
Context

A discrete blood glucose data set that has already been interpolated with a spline method in order to measure the value of MAGE. The data set aims to offer an alternative to CGM (Continuous Glucose Monitoring) for predicting diabetes from discrete data. The discrete data were obtained from 27 blood glucose fluctuations measured over 3 days with a glucometer. After interpolation, there are 150+ points that can represent the glucose curve in a way similar to a CGM model.
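To illustrate the interpolation step, the sketch below fits a natural cubic spline through a handful of discrete readings and evaluates it on a denser grid. The description only says "Spline Method", so the choice of a natural cubic spline, the function name, and the sample readings are all assumptions made for this example; the MAGE calculation itself is not shown.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).

    xs must be strictly increasing; second derivatives are zero at both ends.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients c.
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        pivot = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / pivot
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / pivot
    # Back substitution for c, then the linear (b) and cubic (d) coefficients.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Locate the interval containing x, clamping to the first/last segment.
        j = max(0, min(n - 1, bisect.bisect_right(xs, x) - 1))
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Hypothetical readings: hours since start and glucose values.
hours = [0.0, 3.0, 6.0, 9.0, 12.0]
glucose = [5.6, 7.8, 6.1, 9.0, 6.4]
curve = natural_cubic_spline(hours, glucose)
dense = [curve(t / 2) for t in range(0, 25)]  # one point every 30 minutes
print(len(dense))
```

A denser evaluation grid gives a CGM-like curve from which peaks and nadirs could then be located.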

Content

There are 42 patients. Column A (CLASS) divides the conditions into 3 groups (1 for pre-diabetic patients, 2 for diabetic patients, 3 for normal patients).

Acknowledgements

Thank you to the 42 volunteers who were willing to spend time and energy on this study.
Related article: http://beei.org/index.php/EEI/article/view/2387

Inspiration

We hope this data can support further studies on predicting diabetes for individual users, so that we can monitor our lifestyle.
black website
kaggle.com
zip
Updated Mar 23, 2023
Cite
listone (2023). black website [Dataset]. https://www.kaggle.com/datasets/listone/black-website
Available download formats: zip (22129518491 bytes)
Dataset updated
Mar 23, 2023
Authors
listone
Description
The data can only be used for scientific research; commercial use is strictly prohibited. This is an underground industry web site dataset. It contains nearly 400,000 pieces of data. Each piece of data contains 14 attributes. All properties are contained in the result.json file.

| Property | Description | Data type |
| --- | --- | --- |
| ip | IP address | character string |
| port | port number | continuous data |
| server | web container | discrete data |
| domain | domain name | text (domain name) |
| title | site title | text |
| org | organization | discrete data |
| country | country | discrete data |
| city | city | discrete data |
| html | HTML original code | text |
| screen | website screenshot | image |
| header | Web response header information | text |
| subject.CN | Common name information for SSL certificates | text (domain name) |
| subject.N | SSL certificate subject optional name | text (list of domain names) |
| links | Site external link | text (list of domain names) |
Data from: Data for multiple linear regression models for predicting...
data.usgs.gov
datasets.ai
+2 more
+ more versions
Cite
Adam Mosbrucker; Michael Zoeller; David Ramsey, Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. http://doi.org/10.5066/P9F1ZU8O
Unique identifier
https://doi.org/10.5066/P9F1ZU8O
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Adam Mosbrucker; Michael Zoeller; David Ramsey
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Jun 20, 2013 - Dec 19, 2017
Area covered
Ohio
Description
Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two in northeast Ohio on inland reservoirs) to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily- or continuously-measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models w ...
Discrete and daily-aligned groundwater levels, metadata, and other...
data.usgs.gov
s.cnmilf.com
+1 more
+ more versions
Cite
Angela Robinson; Erik Wojtylko; William Asquith; Ronald Seanor; Courtney Killian; Virginia McGuire, Discrete and daily-aligned groundwater levels, metadata, and other attributes useful for statistical modeling for the Mississippi River Valley Alluvial aquifer, Mississippi Alluvial Plain, 1980–2019 [Dataset]. http://doi.org/10.5066/P9O3XGBK
Unique identifier
https://doi.org/10.5066/P9O3XGBK
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Angela Robinson; Erik Wojtylko; William Asquith; Ronald Seanor; Courtney Killian; Virginia McGuire
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Jan 1, 1980 - Dec 31, 2019
Area covered
Mississippi River Alluvial Plain, Mississippi River
Description
A combination of discrete and daily-aligned groundwater levels for the Mississippi River Valley alluvial aquifer clipped to the Mississippi Alluvial Plain, as defined by Painter and Westerman (2018), with corresponding metadata are based on processing of U.S. Geological Survey National Water Information System (NWIS) (U.S. Geological Survey, 2020) data. The processing was made after retrieval using aggregation and filtering through the infoGW2visGWDB software (Asquith and Seanor, 2019). The nomenclature GWmaster mimics that of the output from infoGW2visGWDB. Two separate data retrievals for NWIS were made. First, the discrete data were retrieved, and second, continuous records from recorder sites with daily-mean or other daily statistics codes were retrieved. Each dataset was separately passed through the infoGW2visGWDB software to create a "GWmaster discrete" and "GWmaster continuous" and these tables were combined and then sorted on the site identifier and date to form the data ...
Fleet Level Anomaly Detection of Aviation Safety Data - Dataset - NASA Open...
data.nasa.gov
Updated Mar 31, 2025
+ more versions
Cite
nasa.gov (2025). Fleet Level Anomaly Detection of Aviation Safety Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/fleet-level-anomaly-detection-of-aviation-safety-data
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
For the purposes of this paper, the National Airspace System (NAS) encompasses the operations of all aircraft that are subject to air traffic control procedures. The NAS is a highly complex dynamic system that is sensitive to aeronautical decision-making and risk management skills. To ensure a healthy system with safe flights, a systematic approach to anomaly detection is very important when evaluating a given set of circumstances and determining the best possible course of action. Because the NAS is a vast and loosely integrated network of systems, it requires improved safety assurance capabilities to maintain an extremely low accident rate under increasingly dense operating conditions. Data-mining-based tools and techniques are required to support and aid operators’ (such as pilots, management, or policy makers) overall decision-making capacity. Within the NAS, the ability to analyze fleetwide aircraft data autonomously is still considered a significantly challenging task. For our purposes a fleet is defined as a group of aircraft sharing generally compatible parameter lists. In this effort, we aim to develop a system-level analysis scheme. In this paper we address the capability for detection of fleetwide anomalies as they occur, which is itself an important initiative toward the safety of real-world flight operations. Flight data recorders archive millions of data points with valuable information on flights every day. The operational parameters consist of both continuous and discrete (binary and categorical) data from several critical subsystems and numerous complex procedures. In this paper, we discuss a system-level anomaly detection approach based on the theory of kernel learning to detect potential safety anomalies in a very large database of commercial aircraft.
We also demonstrate that the proposed approach uncovers some operationally significant events due to environmental, mechanical, and human factors issues in high-dimensional, multivariate Flight Operations Quality Assurance (FOQA) data. We present the results of our detection algorithms on real FOQA data from a regional carrier.


