76 datasets found
  1. Mutual Information between Discrete and Continuous Data Sets

    • plos.figshare.com
    txt
    Updated May 30, 2023
    Brian C. Ross (2023). Mutual Information between Discrete and Continuous Data Sets [Dataset]. http://doi.org/10.1371/journal.pone.0087357
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Brian C. Ross
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
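
The description above names a non-binning estimator for one discrete and one continuous variable but does not spell it out. The following is a minimal sketch of a nearest-neighbour estimator of that kind, not the author's code. It assumes the commonly cited form psi(N) - <psi(N_x)> + psi(k) - <psi(m)>, where N_x is the number of points sharing a label and m is the number of points within the distance to the k-th same-label neighbour. The function names, the default `k=3`, and the synthetic data are illustrative assumptions, and the formula should be checked against the paper before any real use.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + H_(n-1)."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def mi_discrete_continuous(labels, values, k=3):
    """Nearest-neighbour MI estimate (in nats) between labels and scalar values.

    Illustrative sketch only: O(N^2), one-dimensional values.
    """
    n = len(values)
    counts = Counter(labels)
    if min(counts.values()) <= k:
        raise ValueError("every label needs more than k samples")
    total = 0.0
    for i in range(n):
        # Distance to the k-th nearest neighbour that shares this point's label.
        same = sorted(
            abs(values[i] - values[j])
            for j in range(n)
            if j != i and labels[j] == labels[i]
        )
        d = same[k - 1]
        # Number of points of any label within that distance (self excluded).
        m = sum(1 for j in range(n) if j != i and abs(values[i] - values[j]) <= d)
        total += (
            digamma_int(n) - digamma_int(counts[labels[i]])
            + digamma_int(k) - digamma_int(m)
        )
    return total / n


# Synthetic data (assumption): two equally sized, well-separated groups,
# so the label is fully determined by the value.
rng = random.Random(0)
labels = ["a"] * 50 + ["b"] * 50
values = [rng.gauss(0.0, 1.0) for _ in range(50)] + [
    rng.gauss(100.0, 1.0) for _ in range(50)
]
mi = mi_discrete_continuous(labels, values)
```

For two equally likely, perfectly separable groups the true MI is ln 2, so the estimate on this synthetic input should land near 0.69 nats.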

  2. Data from: Continuous-time spatially explicit capture-recapture models, with...

    • data.niaid.nih.gov
    • dataone.org
    • +2more
    zip
    Updated Apr 21, 2014
    Rebecca Foster; Bart Harmsen; Lorenzo Milazzo; Greg Distiller; David Borchers (2014). Continuous-time spatially explicit capture-recapture models, with an application to a jaguar camera-trap survey [Dataset]. http://doi.org/10.5061/dryad.mg5kv
    Dataset provided by
    University of Cambridge
    University of Cape Town
    University of Belize
    University of St Andrews
    Authors
    Rebecca Foster; Bart Harmsen; Lorenzo Milazzo; Greg Distiller; David Borchers
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Cockscomb Basin Wildlife Sanctuary, Belize
    Description

    Many capture-recapture surveys of wildlife populations operate in continuous time but detections are typically aggregated into occasions for analysis, even when exact detection times are available. This discards information and introduces subjectivity, in the form of decisions about occasion definition. We develop a spatio-temporal Poisson process model for spatially explicit capture-recapture (SECR) surveys that operate continuously and record exact detection times. We show that, except in some special cases (including the case in which detection probability does not change within occasion), temporally aggregated data do not provide sufficient statistics for density and related parameters, and that when detection probability is constant over time our continuous-time (CT) model is equivalent to an existing model based on detection frequencies. We use the model to estimate jaguar density from a camera-trap survey and conduct a simulation study to investigate the properties of a CT estimator and discrete-occasion estimators with various levels of temporal aggregation. This includes investigation of the effect on the estimators of spatio-temporal correlation induced by animal movement. The CT estimator is found to be unbiased and more precise than discrete-occasion estimators based on binary capture data (rather than detection frequencies) when there is no spatio-temporal correlation. It is also found to be only slightly biased when there is correlation induced by animal movement, and to be more robust to inadequate detector spacing, while discrete-occasion estimators with binary data can be sensitive to occasion length, particularly in the presence of inadequate detector spacing. Our model includes as a special case a discrete-occasion estimator based on detection frequencies, and at the same time lays a foundation for the development of more sophisticated CT models and estimators. 
    It allows modelling within-occasion changes in detectability, readily accommodates variation in detector effort, removes subjectivity associated with user-defined occasions, and fully utilises CT data. We identify a need for developing CT methods that incorporate spatio-temporal dependence in detections and see potential for CT models being combined with telemetry-based animal movement models to provide a richer inference framework.

  3. Data from: Continuous monitoring and discrete water-quality data from...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 21, 2025
    U.S. Geological Survey (2025). Continuous monitoring and discrete water-quality data from groundwater wells in the Edwards aquifer, Texas, 2014–15 [Dataset]. https://catalog.data.gov/dataset/continuous-monitoring-and-discrete-water-quality-data-from-groundwater-wells-in-the-edward
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Area covered
    Texas
    Description

    In cooperation with the San Antonio Water System, continuous and discrete water-quality data were collected from groundwater wells completed in the Edwards aquifer, Texas, 2014-2015. Discrete measurements of nitrate were made by using a nitrate sensor. Precipitation data from two sites in the National Oceanic and Atmospheric Administration Global Historical Climatology Network are included in the dataset. The continuous monitoring data were collected using water quality sensors and include hourly measurements of nitrate, specific conductance, and water level in two wells. Discrete measurements of nitrate, specific conductance, and vertical flow rate were collected from one well site at different depths throughout the well bore.

  4. Polarization Measurement and Inference in Many Dimensions when Subgroups...

    • dataverse.harvard.edu
    Updated Sep 8, 2017
    Gordon Anderson (2017). Polarization Measurement and Inference in Many Dimensions when Subgroups Cannot be Identified [Dataset]. http://doi.org/10.7910/DVN/0BPRU2
    Dataset provided by
    Harvard Dataverse
    Authors
    Gordon Anderson
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1987 - 2001
    Area covered
    China
    Description

    The most popular general univariate polarization indexes for discrete and continuous variables are extended and combined to describe the extent of polarization between agents in a distribution defined over a collection of many discrete and continuous agent characteristics. A formula for the asymptotic variance of the index is also provided. The implementation of the index is illustrated with an application to Chinese urban household data drawn from six provinces in the years 1987 and 2001 (years spanning the growth and urbanization period subsequent to the economic reforms). The data relate to household adult equivalent log income and adult equivalent living space, which are both continuous variables, and the education of the head of household, which is a discrete variable. For this data set, combining the characteristics changes the view of polarization that would be inferred from considering the indices individually.

  5. Detecting Anomalies in Multivariate Data Sets with Switching Sequences and...

    • data.nasa.gov
    Updated Mar 31, 2025
    nasa.gov (2025). Detecting Anomalies in Multivariate Data Sets with Switching Sequences and Continuous Streams [Dataset]. https://data.nasa.gov/dataset/detecting-anomalies-in-multivariate-data-sets-with-switching-sequences-and-continuous-stre
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. Here, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large data bases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain, novel algorithms, and also briefly discuss results on synthetic and real-world data sets. Our algorithm uncovers operationally significant events in high dimensional data streams in the aviation industry which are not detectable using state of the art methods.

  6. Numpy , pandas and matplot lib practice

    • kaggle.com
    zip
    Updated Jul 16, 2023
    pratham saraf (2023). Numpy , pandas and matplot lib practice [Dataset]. https://www.kaggle.com/datasets/prathamsaraf1389/numpy-pandas-and-matplot-lib-practise/suggestions
    Available download formats: zip (385020 bytes)
    Authors
    pratham saraf
    License

    https://cdla.io/permissive-1-0/

    Description

    The dataset has been created specifically for practicing Python, NumPy, Pandas, and Matplotlib. It is designed to provide a hands-on learning experience in data manipulation, analysis, and visualization using these libraries.

    Specifics of the Dataset:

    The dataset consists of 5000 rows and 20 columns, representing various features with different data types and distributions. The features include numerical variables with continuous and discrete distributions, categorical variables with multiple categories, binary variables, and ordinal variables. Each feature has been generated using different probability distributions and parameters to introduce variations and simulate real-world data scenarios. The dataset is synthetic and does not represent any real-world data. It has been created solely for educational purposes.

    One of the defining characteristics of this dataset is the intentional incorporation of various real-world data challenges:

    - Certain columns are randomly selected to be populated with NaN values, effectively simulating the common challenge of missing data.
    - The proportion of these missing values in each column varies randomly between 1% and 70%.
    - Statistical noise has been introduced in the dataset. For numerical values in some features, this noise adheres to a distribution with mean 0 and standard deviation 0.1.
    - Categorical noise is introduced in some features, with their categories randomly altered in about 1% of the rows.
    - Outliers have also been embedded in the dataset, following the Interquartile Range (IQR) rule.

    Context of the Dataset:

    The dataset aims to provide a comprehensive playground for practicing Python, NumPy, Pandas, and Matplotlib. It allows learners to explore data manipulation techniques, perform statistical analysis, and create visualizations using the provided features. By working with this dataset, learners can gain hands-on experience in data cleaning, preprocessing, feature engineering, and visualization.

    Sources of the Dataset:

    The dataset has been generated programmatically using Python's random number generation functions and probability distributions. No external sources or real-world data have been used in creating this dataset.
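
The description of this practice dataset mentions missing values and outliers tied to the Interquartile Range (IQR) rule. As a rough illustration of that rule only, the sketch below flags values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] after dropping NaN entries. The function name, the 1.5 factor, and the sample values are assumptions made for this example; the description does not say which factor or quantile method the dataset's author used.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    # Drop None and NaN first (NaN is the only float not equal to itself).
    clean = [v for v in values if v is not None and v == v]
    q1, _, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in clean if v < low or v > high]


# Made-up sample: one missing value and one obvious outlier.
sample = [10.1, 9.8, 10.3, float("nan"), 9.9, 10.0, 10.2, 55.0, 9.7]
outliers = iqr_outliers(sample)
```

Here `statistics.quantiles` uses its default "exclusive" method; other quantile conventions can move the fences slightly, which matters for borderline points.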

  7. Identify the Data type (Continuous/Discrete)

    • kaggle.com
    zip
    Updated Mar 10, 2021
    Shubh (2021). Identify the Data type (Continuous/Discrete) [Dataset]. https://www.kaggle.com/shubhamsharma777/identify-the-data-type-continuousdiscrete
    Available download formats: zip (69799 bytes)
    Authors
    Shubh
    Description

    Dataset

    This dataset was created by Shubh

  8. Water Quality Data

    • data.cnra.ca.gov
    • data.ca.gov
    • +1more
    csv
    Updated Nov 26, 2025
    California Department of Water Resources (2025). Water Quality Data [Dataset]. https://data.cnra.ca.gov/dataset/water-quality-data
    Available download formats: csv (334801812), csv (1084649919), csv (5978718), csv (112098838)
    Dataset authored and provided by
    California Department of Water Resources: http://www.water.ca.gov/
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The California Department of Water Resources (DWR) discrete (vs. continuous) water quality datasets contain DWR-collected, current and historical, chemical and physical parameters found in routine environmental, regulatory compliance monitoring, and special studies throughout the state.

  9. Harmonized discrete and continuous water quality data in support of modeling...

    • data.usgs.gov
    • datasets.ai
    • +2more
    Updated Oct 16, 2022
    Lindsay Platt; Yaojia Chen; Jennifer Murphy; Elizabeth Nystrom; Noah Schmadel; Sarah Stackpoole; Michael Stouder; Jacob Zwart (2022). Harmonized discrete and continuous water quality data in support of modeling harmful algal blooms in the Illinois River Basin, 2005 - 2020 [Dataset]. http://doi.org/10.5066/P9RISQGE
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Authors
    Lindsay Platt; Yaojia Chen; Jennifer Murphy; Elizabeth Nystrom; Noah Schmadel; Sarah Stackpoole; Michael Stouder; Jacob Zwart
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Oct 11, 2005 - Dec 31, 2020
    Description

    Harmful algal blooms (HABs) are overgrowths of algae or cyanobacteria in water and can be harmful to humans and animals directly via toxin exposure or indirectly via changes in water quality and related impacts to ecosystems services, drinking water characteristics, and recreation. While HABs occur frequently throughout the United States, the driving conditions behind them are not well understood, especially in flowing waters. In order to facilitate future model development and characterization of HABs in the Illinois River Basin, this data release publishes a synthesized and cleaned collection of HABs-related water quality and quantity data for river and stream sites in the basin. It includes nutrients, major ions, sediment, physical properties, streamflow, chlorophyll and other types of water data. This data release contains files of harmonized data from the USGS National Water Information System (NWIS), the U.S. Army Corps of Engineers (USACE), the Illinois Environmental Protec ...

  10. Data from: Family-Wise Error Rate Controlling Procedures for Discrete Data

    • figshare.com
    application/gzip
    Updated May 30, 2023
    Yalin Zhu; Wenge Guo (2023). Family-Wise Error Rate Controlling Procedures for Discrete Data [Dataset]. http://doi.org/10.6084/m9.figshare.9545174.v2
    Dataset provided by
    Taylor & Francis
    Authors
    Yalin Zhu; Wenge Guo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In applications such as clinical safety analysis, the data of the experiments usually consist of frequency counts. In the analysis of such data, researchers often face the problem of multiple testing based on discrete test statistics, aimed at controlling family-wise error rate (FWER). Most existing FWER controlling procedures are developed for continuous data, which are often conservative when analyzing discrete data. By using minimal attainable p-values, several FWER controlling procedures have been specifically developed for discrete data in the literature. In this article, by using known marginal distributions of true null p-values, three more powerful stepwise procedures are developed, which are modified versions of the conventional Bonferroni, Holm and Hochberg procedures, respectively. It is shown that the first two procedures strongly control the FWER under arbitrary dependence and are more powerful than the existing Tarone-type procedures, while the last one only ensures control of the FWER in special settings. Through extensive simulation studies, we provide numerical evidence of superior performance of the proposed procedures in terms of the FWER control and minimal power. Real clinical safety data are used to demonstrate applications of our proposed procedures. An R package “MHTdiscrete” and a web application are developed for implementing the proposed procedures.
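
For orientation, the sketch below shows the conventional Bonferroni and Holm procedures that the description takes as its starting point. It does not implement the modified discrete-data versions proposed in the article, and it is unrelated to the API of the MHTdiscrete package. The function names and the p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Conventional Bonferroni: reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Conventional Holm step-down procedure.

    Sort p-values ascending and compare the one at 0-based rank r against
    alpha / (m - r); stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


# Made-up p-values for four hypotheses.
p = [0.001, 0.04, 0.03, 0.015]
bonferroni = bonferroni_reject(p)
holm = holm_reject(p)
```

With these inputs Holm rejects the hypothesis with p = 0.015 while Bonferroni does not, which shows why step-down procedures are never less powerful than the single-step bound.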

  11. Water-Quality Data for Discrete Samples and Continuous Monitoring on the...

    • data.usgs.gov
    • datasets.ai
    • +1more
    Kaitlin Laabs, Water-Quality Data for Discrete Samples and Continuous Monitoring on the Merrimack River, Massachusetts, June to September 2020 [Dataset]. http://doi.org/10.5066/P9H19THP
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Authors
    Kaitlin Laabs
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jun 2, 2020 - Sep 30, 2020
    Area covered
    Merrimack River, Massachusetts
    Description

    This data release includes water-quality data collected at up to thirteen locations along the Merrimack River and Merrimack River Estuary in Massachusetts. In this study, conducted by the U.S. Geological Survey (USGS) in cooperation with the Massachusetts Department of Environmental Protection, discrete samples were collected, and continuous monitoring was completed from June to September 2020. The data include results of measured field properties (water temperature, specific conductivity, pH, dissolved oxygen) and laboratory concentrations of nitrogen and phosphorus species, total carbon, pheophytin-a, and chlorophyll-a. These data were collected to assess selected (mainly nutrients) water-quality conditions in the Merrimack River and Merrimack River Estuary at the thirteen locations and identify areas where more water-quality monitoring is needed. The discrete samples and continuous-monitoring data are also available in the USGS National Water Information System at https://wate ...

  12. EcoSheds Summarized Designing Sustainable Landscapes Project Data for Hydro...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 12, 2025
    U.S. Geological Survey (2025). EcoSheds Summarized Designing Sustainable Landscapes Project Data for Hydro Region 2 [Dataset]. https://catalog.data.gov/dataset/ecosheds-summarized-designing-sustainable-landscapes-project-data-for-hydro-region-2
    Dataset provided by
    U.S. Geological Survey
    Description

    Summarization of the University of Massachusetts Landscape Ecology Lab Designing Sustainable Landscapes (DSL) datasets with the Spatial Hydro-Ecological Decision System (SHEDS) framework. These DSL data were summarized using the local and upstream total accumulation methods within SHEDS. The result is two sets of data, a continuous dataset and a discrete dataset. The continuous dataset contains the average value for the local SHEDS catchments and the area-weighted sums of the averages for the local and all upstream SHEDS catchments for all continuous variables in the DSL dataset. The discrete dataset contains the area in square meters covered by each class within all discrete variables in the DSL dataset for the local SHEDS catchments along with the area-weighted sum of the local and all upstream SHEDS catchment values.
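
The description refers to combining local catchment averages with those of all upstream catchments by area weighting, without giving the exact formula. One plausible reading, sketched below, is an area-weighted mean over a catchment and everything that drains into it. The network layout, the names `upstream_of`, `area`, and `local_mean`, and the normalisation by total area are all assumptions for illustration, not the SHEDS implementation.

```python
def upstream_set(catchment, upstream_of):
    """Collect a catchment together with every catchment draining into it."""
    seen, stack = set(), [catchment]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(upstream_of.get(current, []))
    return seen


def area_weighted_mean(catchment, upstream_of, area, local_mean):
    """Area-weighted mean of local averages over the local and upstream catchments."""
    members = upstream_set(catchment, upstream_of)
    total_area = sum(area[c] for c in members)
    return sum(area[c] * local_mean[c] for c in members) / total_area


# Hypothetical three-catchment network: A and B both drain into C.
upstream_of = {"C": ["A", "B"], "A": [], "B": []}
area = {"A": 100.0, "B": 300.0, "C": 600.0}        # square meters
local_mean = {"A": 2.0, "B": 4.0, "C": 1.0}        # local average of one variable
value_at_c = area_weighted_mean("C", upstream_of, area, local_mean)
```

A headwater catchment with nothing upstream simply returns its own local average under this reading.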

  13. Data for: Collapse mechanism analysis of historic masonry structures...

    • data.mendeley.com
    Updated May 6, 2019
    Francesco Portioli (2019). Data for: Collapse mechanism analysis of historic masonry structures subjected to lateral loads: a comparison between continuous and discrete models [Dataset]. http://doi.org/10.17632/ycxvmj77x5.1
    Authors
    Francesco Portioli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Finite element mesh, rigid block model coordinates and rigid block CAD models of numerical case study

  14. UCI Automobile Dataset

    • kaggle.com
    Updated Feb 12, 2023
    Otrivedi (2023). UCI Automobile Dataset [Dataset]. https://www.kaggle.com/datasets/otrivedi/automobile-data/suggestions
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Otrivedi
    Description

    In this project, I have done exploratory data analysis on the UCI Automobile dataset available at https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

    This dataset consists of data from the 1985 Ward's Automotive Yearbook. Here are the sources:

    1) 1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook.
    2) Personal Auto Manuals, Insurance Services Office, 160 Water Street, New York, NY 10038
    3) Insurance Collision Report, Insurance Institute for Highway Safety, Watergate 600, Washington, DC 20037

    Number of Instances: 398
    Number of Attributes: 9 including the class attribute

    Attribute Information:

    mpg: continuous
    cylinders: multi-valued discrete
    displacement: continuous
    horsepower: continuous
    weight: continuous
    acceleration: continuous
    model year: multi-valued discrete
    origin: multi-valued discrete
    car name: string (unique for each instance)

    This data set consists of three types of entities:

    I - The specification of an auto in terms of various characteristics

    II - Its assigned insurance risk rating. This corresponds to the degree to which the auto is riskier than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is riskier (or less risky), this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling".

    III - Its normalized losses in use as compared to other cars. This is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc...), and represents the average loss per car per year.

    The analysis is divided into two parts:

    Data Wrangling

    1. Pre-processing data in Python
    2. Dealing with missing values
    3. Data formatting
    4. Data normalization
    5. Binning

    Exploratory Data Analysis

    1. Descriptive statistics
    2. Groupby
    3. Analysis of variance
    4. Correlation
    5. Correlation stats

    Acknowledgment
    Dataset: UCI Machine Learning Repository
    Data link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

  15. Data from: Multiple Kernel Learning for Heterogeneous Anomaly Detection:...

    • data.nasa.gov
    • datasets.ai
    • +3more
    Updated Mar 31, 2025
    data.nasa.gov (2025). Multiple Kernel Learning for Heterogeneous Anomaly Detection: Algorithm and Aviation Safety Case Study [Dataset]. https://data.nasa.gov/dataset/multiple-kernel-learning-for-heterogeneous-anomaly-detection-algorithm-and-aviation-safety
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. In this paper, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large data bases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain, novel algorithms, and also discuss results on real-world data sets. Our algorithm uncovers operationally significant events in high dimensional data streams in the aviation industry which are not detectable using state of the art methods.

  16. Mean Amplitude Glucose Excursion Interpolation

    • kaggle.com
    zip
    Updated Sep 9, 2020
    Merinda Lestandy (2020). Mean Amplitude Glucose Excursion Interpolation [Dataset]. https://www.kaggle.com/merinda33/mage-interpolation
    Available download formats: zip (34322 bytes)
    Authors
    Merinda Lestandy
    Description

    Context

    A discrete blood glucose data set that has already been interpolated with the spline method in order to measure the MAGE value. The data set aims to offer an alternative to CGM (Continuous Glucose Monitoring) for predicting diabetes from discrete data. The discrete data were obtained from 27 blood glucose fluctuations over 3 days, measured with a glucometer. After interpolation, there are 150+ points, which can represent the glucose profile much as a CGM model would.

    Content

    There are 42 patients. Column A (CLASS) divides the conditions into 3 groups (1 for pre-diabetic patients, 2 for diabetic patients, 3 for normal patients).

    Acknowledgements

    Thank you to the 42 volunteers who were willing to spend time and energy on this study.
    Related article - http://beei.org/index.php/EEI/article/view/2387

    Inspiration

    We hope this data can support further studies on predicting diabetes for individual users, so that we can monitor our lifestyles.

  17. black website

    • kaggle.com
    zip
    Updated Mar 23, 2023
    listone (2023). black website [Dataset]. https://www.kaggle.com/datasets/listone/black-website
    Available download formats: zip (22129518491 bytes)
    Authors
    listone
    Description

    The data can only be used for scientific research, and commercial use is strictly prohibited. This is an underground industry web site dataset. It contains nearly 400,000 pieces of data. Each piece of data contains 14 attributes. All properties are contained in the result.json file.

    | Property | Description | Data type |
    | --- | --- | --- |
    | ip | IP address | character string |
    | port | port number | continuous data |
    | server | web container | discrete data |
    | domain | domain name | text (domain name) |
    | title | site title | text |
    | org | organization | discrete data |
    | country | country | discrete data |
    | city | city | discrete data |
    | html | HTML original code | text |
    | screen | website screenshot | image |
    | header | Web response header information | text |
    | subject.CN | Common name information for SSL certificates | text (domain name) |
    | subject.N | SSL certificate subject optional name | text (list of domain names) |
    | links | Site external link | text (list of domain names) |

  18. Data from: Data for multiple linear regression models for predicting...

    • data.usgs.gov
    • datasets.ai
    • +2more
    Adam Mosbrucker; Michael Zoeller; David Ramsey, Data for multiple linear regression models for predicting microcystin concentration action-level exceedances in selected lakes in Ohio [Dataset]. http://doi.org/10.5066/P9F1ZU8O
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Authors
    Adam Mosbrucker; Michael Zoeller; David Ramsey
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jun 20, 2013 - Dec 19, 2017
    Area covered
    Ohio
    Description

    Site-specific multiple linear regression models were developed for eight sites in Ohio (six in the Western Lake Erie Basin and two on inland reservoirs in northeast Ohio) to quickly predict action-level exceedances for a cyanotoxin, microcystin, in recreational and drinking waters used by the public. Real-time models include easily or continuously measured factors that do not require that a sample be collected. Real-time models are presented in two categories: (1) six models with continuous monitor data, and (2) three models with on-site measurements. Real-time models commonly included variables such as phycocyanin, pH, specific conductance, and streamflow or gage height. Many of the real-time factors were averages over time periods antecedent to the time the microcystin sample was collected, including water-quality data compiled from continuous monitors. Comprehensive models use a combination of discrete sample-based measurements and real-time factors. Comprehensive models w ...

  19. Discrete and daily-aligned groundwater levels, metadata, and other...

    • data.usgs.gov
    • s.cnmilf.com
    • +1more
    Cite
    Angela Robinson; Erik Wojtylko; William Asquith; Ronald Seanor; Courtney Killian; Virginia McGuire, Discrete and daily-aligned groundwater levels, metadata, and other attributes useful for statistical modeling for the Mississippi River Valley Alluvial aquifer, Mississippi Alluvial Plain, 1980–2019 [Dataset]. http://doi.org/10.5066/P9O3XGBK
    Explore at:
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Angela Robinson; Erik Wojtylko; William Asquith; Ronald Seanor; Courtney Killian; Virginia McGuire
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    Jan 1, 1980 - Dec 31, 2019
    Area covered
    Mississippi River Alluvial Plain, Mississippi River
    Description

    This dataset combines discrete and daily-aligned groundwater levels for the Mississippi River Valley alluvial aquifer, clipped to the Mississippi Alluvial Plain as defined by Painter and Westerman (2018), with corresponding metadata. The data are based on processing of U.S. Geological Survey National Water Information System (NWIS) (U.S. Geological Survey, 2020) data. Processing was done after retrieval, using aggregation and filtering through the infoGW2visGWDB software (Asquith and Seanor, 2019). The nomenclature GWmaster mimics that of the output from infoGW2visGWDB. Two separate data retrievals from NWIS were made. First, the discrete data were retrieved, and second, continuous records from recorder sites with daily-mean or other daily statistics codes were retrieved. Each dataset was separately passed through the infoGW2visGWDB software to create a "GWmaster discrete" and a "GWmaster continuous" table; these tables were combined and then sorted on the site identifier and date to form the data ...
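
The combine-then-sort step described above can be sketched in a few lines. This is only an illustration, not the infoGW2visGWDB implementation: the column names `site_id`, `date`, `level`, and `source`, the function `combine_and_sort`, and the sample rows are all hypothetical.

```python
from datetime import date


def combine_and_sort(discrete_rows, continuous_rows):
    """Concatenate two tables (lists of dicts) and sort on site identifier, then date."""
    combined = list(discrete_rows) + list(continuous_rows)
    return sorted(combined, key=lambda row: (row["site_id"], row["date"]))


# Invented rows standing in for the "GWmaster discrete" and "GWmaster continuous" tables.
discrete = [
    {"site_id": "B", "date": date(1990, 5, 1), "level": 12.3, "source": "discrete"},
    {"site_id": "A", "date": date(1985, 3, 2), "level": 8.1, "source": "discrete"},
]
continuous = [
    {"site_id": "A", "date": date(1984, 7, 9), "level": 7.9, "source": "continuous"},
]

merged = combine_and_sort(discrete, continuous)
for row in merged:
    print(row["site_id"], row["date"].isoformat(), row["source"])
```

Keeping a `source` column on each row preserves which retrieval a value came from after the two tables are interleaved by the sort.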

  20. Fleet Level Anomaly Detection of Aviation Safety Data - Dataset - NASA Open...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Fleet Level Anomaly Detection of Aviation Safety Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/fleet-level-anomaly-detection-of-aviation-safety-data
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    For the purposes of this paper, the National Airspace System (NAS) encompasses the operations of all aircraft which are subject to air traffic control procedures. The NAS is a highly complex dynamic system that is sensitive to aeronautical decision-making and risk management skills. To ensure a healthy system with safe flights, a systematic approach to anomaly detection is very important when evaluating a given set of circumstances and determining the best possible course of action. Because the NAS is a vast and loosely integrated network of systems, it requires improved safety assurance capabilities to maintain an extremely low accident rate under increasingly dense operating conditions. Data-mining-based tools and techniques are required to support and aid operators’ (such as pilots, management, or policy makers) overall decision-making capacity. Within the NAS, the ability to analyze fleetwide aircraft data autonomously is still considered a significantly challenging task. For our purposes a fleet is defined as a group of aircraft sharing generally compatible parameter lists. In this effort, we aim to develop a system-level analysis scheme. In this paper we address the capability for detection of fleetwide anomalies as they occur, which itself is an important initiative toward the safety of real-world flight operations. The flight data recorders archive millions of data points with valuable information on flights every day. The operational parameters consist of both continuous and discrete (binary and categorical) data from several critical subsystems and numerous complex procedures. In this paper, we discuss a system-level anomaly detection approach based on the theory of kernel learning to detect potential safety anomalies in a very large database of commercial aircraft. We also demonstrate that the proposed approach uncovers some operationally significant events due to environmental, mechanical, and human factors issues in high-dimensional, multivariate Flight Operations Quality Assurance (FOQA) data. We present the results of our detection algorithms on real FOQA data from a regional carrier.
