100+ datasets found
  1. Mutual Information between Discrete and Continuous Data Sets

    • plos.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Brian C. Ross (2023). Mutual Information between Discrete and Continuous Data Sets [Dataset]. http://doi.org/10.1371/journal.pone.0087357
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Brian C. Ross
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
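
    As a quick illustration of non-binning estimation between one discrete and one continuous variable, scikit-learn's mutual_info_classif implements a nearest-neighbor estimator of this kind (its references include Ross 2014); below is a minimal Python sketch on synthetic data, not the paper's own code.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=1000)            # the discrete data set
    values = rng.normal(loc=labels * 1.5, scale=1.0)  # the continuous data set, shifted by class

    mi = mutual_info_classif(values.reshape(-1, 1), labels, n_neighbors=3, random_state=0)
    print(f"estimated MI: {mi[0]:.3f} nats")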

  2. ACES CONTINUOUS DATA V1

    • access.uat.earthdata.nasa.gov
    • cmr.earthdata.nasa.gov
    • +5more
    pdf
    Updated Aug 16, 2017
    + more versions
    Cite
    (2017). ACES CONTINUOUS DATA V1 [Dataset]. http://doi.org/10.5067/ACES/MULIPLE/DATA101
    Available download formats: pdf
    Dataset updated
    Aug 16, 2017
    Time period covered
    Jul 10, 2002 - Present
    Area covered
    Description

    The ALTUS Cloud Electrification Study (ACES) was based at the Naval Air Facility Key West in Florida. During August 2002, ACES researchers conducted overflights of thunderstorms over the southwestern corner of Florida. For the first time in NASA research, an uninhabited aerial vehicle (UAV) named ALTUS was used to collect cloud electrification data. Carrying field mills, optical sensors, electric field sensors and other instruments, ALTUS allowed scientists to collect cloud electrification data for the first time from above the storm, from its birth through dissipation. This experiment allowed scientists to achieve the dual goals of gathering weather data safely and testing new aircraft technology. This dataset consists of data collected from seven instruments: the Slow/Fast Antenna, Electric Field Mill, Dual Optical Pulse Sensor, Searchcoil Magnetometer, Accelerometers, Gerdien Conductivity Probe, and the Fluxgate Magnetometer. Data consist of sensor reads at 50 Hz throughout the flight from all 64 channels.

  3. Overcoming the pitfalls of categorizing continuous variables in ecology,...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 12, 2024
    Cite
    Roxanne Beltran; Corey Tarwater (2024). Overcoming the pitfalls of categorizing continuous variables in ecology, evolution, and behavior [Dataset]. http://doi.org/10.5061/dryad.5x69p8d9r
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Roxanne Beltran; Corey Tarwater
    Time period covered
    Jan 1, 2023
    Description

    Many variables in biological research - from body size to life history timing to environmental characteristics - are measured continuously (e.g., body mass in kilograms) but analyzed as categories (e.g., large versus small), which can lower statistical power and change interpretation. We conducted a mini-review of 72 recent publications in six popular ecology, evolution, and behavior journals to quantify the prevalence of categorization. We then summarized commonly categorized metrics and simulated a dataset to demonstrate the drawbacks of categorization using common variables and realistic examples. We show that categorizing continuous variables is common (31% of publications reviewed). We also underscore that predictor variables can and should be collected and analyzed continuously. Finally, we provide recommendations on how to keep variables continuous throughout the entire scientific process. Together, these pieces comprise an actionable guide to increasing statistical power and fac...

    # Overcoming the pitfalls of categorizing continuous variables in ecology and evolutionary biology

    https://doi.org/10.5061/dryad.5x69p8d9r

    We simulated data to quantify the detrimental impact of categorizing continuous variables using various statistical breakpoints and sample sizes (details below). To give the example biological relevance, we created a dataset that illustrates the complexity of life history theory and climate change impacts, and contains a predictor variable that is frequently categorized (Table 2) - reproductive timing in one year and its effect on body size in the following year. A reasonable research question would be: How does timing of reproduction in year t influence body mass at the start of the breeding season in year t+1? For illustrative purposes, let’s say we collected data from individually banded penguins in Antarctica. Based on the mechanistic relationships between seasonally available sea ice and food availabi...
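
    A minimal simulation sketch (not the authors' code) of the central point: the same hypothetical timing/mass data analyzed once with the predictor kept continuous and once median-split into "early" and "late" categories, which typically weakens the evidence.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    timing = rng.uniform(0, 60, size=100)                   # day of reproduction in year t
    mass = 10.0 - 0.05 * timing + rng.normal(0, 1.5, 100)   # body mass in year t+1

    cont = stats.linregress(timing, mass)                   # predictor kept continuous
    early = mass[timing <= np.median(timing)]               # median split: "early" breeders
    late = mass[timing > np.median(timing)]                 # median split: "late" breeders
    cat = stats.ttest_ind(early, late)

    print(f"continuous: p = {cont.pvalue:.2e}; categorized: p = {cat.pvalue:.2e}")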

  4. Data from: Imperial Valley Dark Fiber Project Continuous DAS Data

    • catalog.data.gov
    • data.openei.org
    • +3more
    Updated Jan 20, 2025
    Cite
    Lawrence Berkeley National Laboratory (2025). Imperial Valley Dark Fiber Project Continuous DAS Data [Dataset]. https://catalog.data.gov/dataset/imperial-valley-dark-fiber-project-continuous-das-data-a0338
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Lawrence Berkeley National Laboratory
    Description

    The Imperial Valley Dark Fiber Project acquired Distributed Acoustic Sensing (DAS) seismic data on a ~28 km segment of dark fiber between the cities of Calipatria and Imperial in the Imperial Valley, Southern California. Dark fiber refers to unused optical fiber cables in telecommunications networks; it is repurposed in this project for DAS applications. The objective, which is further detailed in the attached journal article from Ajo-Franklin et al., is to demonstrate dark fiber DAS as a tool for basin-scale geothermal exploration and monitoring. The included DAS data were recorded during two days at the beginning of the project. Data are stored in the .h5 (HDF5) file format, readable using various software tools, including the 'h5read' and 'h5info' functions in Matlab. Provided here are examples of methods to read and use the data with the 'h5py' package in Python.
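
    A minimal sketch of reading such a file with h5py; the file name and the commented dataset path are hypothetical, so inspect the file's actual layout first.

    import h5py

    with h5py.File("das_example.h5", "r") as f:     # hypothetical file name
        f.visit(print)                              # list the groups/datasets the file contains
        # assuming a 2-D array of (time samples x channels):
        # traces = f["/data"][:1000, :]             # read the first 1000 samples of every channel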

  5. Example of how to manually extract incubation bouts from interactive plots...

    • figshare.com
    txt
    Updated Jan 22, 2016
    Cite
    Martin Bulla (2016). Example of how to manually extract incubation bouts from interactive plots of raw data - R-CODE and DATA [Dataset]. http://doi.org/10.6084/m9.figshare.2066784.v1
    Available download formats: txt
    Dataset updated
    Jan 22, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Martin Bulla
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General information: The script runs with R (Version 3.1.1; 2014-07-10) and the packages plyr (1.8.1), XLConnect (0.2-9), utilsMPIO (0.0.25), sp (1.0-15), rgdal (0.8-16), tools (3.1.1) and lattice (0.20-29). Questions can be directed to Martin Bulla (bulla.mar@gmail.com). Data collection and how the individual variables were derived are described in Steiger, S.S., et al. (2013), When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds, Proceedings of the Royal Society B: Biological Sciences 280(1764): 20131016, and in Dale, J., et al. (2015), The effects of life history and sexual selection on male and female plumage colouration, Nature. Data are available as an Rdata file. Missing values are NA. For better readability, the subsections of the script can be collapsed.

    Description of the method:

    1. Data are visualized in an interactive actogram with time of day on the x-axis and one panel for each day of data.

    2. A red rectangle indicates the active field; clicking with the mouse on the depicted light signal in that field generates a data point that is automatically saved to the csv file (via a custom-made function). For this data extraction I recommend always clicking on the bottom line of the red rectangle, as data are always available there thanks to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if a greenish vertical bar appears and a new line of data appears in the R console.

    3. To extract incubation bouts, the first click in the new plot has to be the start of incubation, the next click the end of incubation, and a click on the same spot the start of incubation for the other sex. If the end and start of incubation are at different times, the data will still be extracted, but the sex, logger and bird_ID will be wrong; these need to be changed manually in the csv file. Similarly, the first bout for a given plot will always be assigned to the male (if no data are present in the csv file) or based on previous data. Hence, whenever data from a new plot are extracted, it is worth checking at the first mouse click whether the sex, logger and bird_ID information is correct and, if not, adjusting it manually.

    4. Once all information from one day (panel) is extracted, right-click on the plot and choose "stop". This activates the following day (panel) for extraction.

    5. To end the extraction before going through all the rectangles, press "escape".

    Annotations of the data file turnstone_2009_Barrow_nest-t401_transmitter.RData:

    dfr contains raw data on signal strength from the radio tags attached to the rumps of the female and male, plus information about when the birds were captured and the incubation stage of the nest:
    1. who: identifies whether the recording refers to female, male, capture or start of hatching
    2. datetime_: date and time of each recording
    3. logger: unique identity of the radio tag
    4. signal_: signal strength of the radio tag
    5. sex: sex of the bird (f = female, m = male)
    6. nest: unique identity of the nest
    7. day: datetime_ variable truncated to year-month-day format
    8. time: time of day in hours
    9. datetime_utc: date and time of each recording, but in UTC time
    10. cols: colors assigned to "who"

    m contains metadata for a given nest:
    1. sp: identifies species (RUTU = Ruddy turnstone)
    2. nest: unique identity of the nest
    3. year_: year of observation
    4. IDfemale: unique identity of the female
    5. IDmale: unique identity of the male
    6. lat: latitude coordinate of the nest
    7. lon: longitude coordinate of the nest
    8. hatch_start: date and time when the hatching of the eggs started
    9. scinam: scientific name of the species
    10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)
    11. logger: type of device used to record incubation (IT = radio tag)
    12. sampling: mean incubation sampling interval in seconds

    s contains metadata for the incubating parents:
    1. year_: year of capture
    2. species: identifies species (RUTU = Ruddy turnstone)
    3. author: identifies the author who measured the bird
    4. nest: unique identity of the nest
    5. caught_date_time: date and time when the bird was captured
    6. recapture: was the bird captured before? (0 = no, 1 = yes)
    7. sex: sex of the bird (f = female, m = male)
    8. bird_ID: unique identity of the bird
    9. logger: unique identity of the radio tag

  6. The Code and Empirical Data Example for Continuous Bounded Response IRT...

    • scidb.cn
    Updated Jul 11, 2025
    Cite
    Jiang Youxiang; Wen Hongbo (2025). The Code and Empirical Data Example for Continuous Bounded Response IRT Model [Dataset]. http://doi.org/10.57760/sciencedb.psych.00700
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Jiang Youxiang; Wen Hongbo
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the Stan code for the continuous bounded response item response theory (IRT) model, together with an empirical data example.

  7. Continuous Resistivity Profiling, Electrical Resistivity Tomography and...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Continuous Resistivity Profiling, Electrical Resistivity Tomography and Hydrologic Data Collected in 2017 from Indian River Lagoon, Florida [Dataset]. https://catalog.data.gov/dataset/continuous-resistivity-profiling-electrical-resistivity-tomography-and-hydrologic-data-col
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Indian River Lagoon, Florida
    Description

    Extending 200 kilometers (km) along the Atlantic Coast of Central Florida, Indian River Lagoon (IRL) is one of the most biologically diverse estuarine systems in the continental United States. The lagoon is characterized by shallow, brackish waters and a width that varies between 0.5 and 9.0 km; there is significant human development along both shores. Scientists from the U.S. Geological Survey (USGS) St. Petersburg Coastal and Marine Science Center used continuous resistivity profiling (CRP, a towed electrode array), electrical resistivity tomography (ERT), and basic physical water-column properties (for example, depth and temperature) to investigate submarine groundwater discharge at two locations, Eau Gallie North and Riverwalk Park, along the western shore of IRL. Eau Gallie North is near the central section of IRL, and Riverwalk Park is approximately 20 km north of the Eau Gallie site. At each CRP study site, an 11-electrode marine resistivity array was towed over seven north–south, shore-parallel transects (EA–EG and RA–RG, respectively), situated between 75 and 1000 meters offshore and approximately 1.5 km in length. Each transect was mapped three times in an alternating north–south direction to account for data collected by the concurrently operating radon mapping system (Everhart and others, 2018). Repeat streaming resistivity surveys were collected bimonthly along these same tracklines between March and November 2017 to determine seasonal and temporal variability. Since resistivity is a function of both geology and salinity, temporal shifts are assumed to reflect salinity changes, as the underlying geology is presumed to remain constant. ERT study areas consisted of land- and shallow-water-based surveys, in which direct current was injected into the ground via two current electrodes and received by nine potential electrodes. Electrode positions for both sites were recorded along six transects (T01-T06) and are provided in this data release as supplemental information (see the ERT location map files included in ERT_survey_maps.zip).

  8. Data from: Time series methods for the analysis of soundscapes and other...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Nov 25, 2024
    Cite
    Natalie Yoh; Charlotte L. Haley; Zuzana Burivalova (2024). Time series methods for the analysis of soundscapes and other cyclical ecological data [Dataset]. http://doi.org/10.5061/dryad.xpnvx0kn6
    Available download formats: zip
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Argonne National Laboratory
    University of Wisconsin–Madison
    University of Kent
    Authors
    Natalie Yoh; Charlotte L. Haley; Zuzana Burivalova
    License

    CC0 1.0, https://spdx.org/licenses/CC0-1.0.html

    Description

    Biodiversity monitoring has entered an era of ‘big data’, exemplified by a near-continuous collection of sounds, images, chemical and other signals from organisms in diverse ecosystems. Such data streams have the potential to help identify new threats, assess the effectiveness of conservation interventions, as well as generate new ecological insights. However, appropriate analytical methods are often still missing, particularly with respect to characterizing cyclical temporal patterns. Here, we present a framework for characterizing and analysing ecological responses that represent nonstationary, complex temporal patterns and demonstrate the value of using Fourier transforms to decorrelate continuous data points. In our example, we use a framework based on three approaches (spectral analysis, magnitude squared coherence, and principal component analysis) to characterize differences in tropical forest soundscapes within and across sites and seasons in Gabon. By reconstructing the underlying, cyclic behaviour of the soundscape for each site, we show how one can identify circadian patterns in acoustic activity. Soundscapes in the dry season had a complex diel cycle, requiring multiple harmonics to represent daily variation, while in the wet season there was less variance attributable to the daily cyclic patterns. Our framework can be applied to most continuous, or near-continuous ecological data collected at a fine temporal resolution, allowing ecologists to explore patterns of temporal autocorrelation at multiple levels for biologically meaningful trends. Such methods will become indispensable as biological big data are used to understand the impact of anthropogenic pressures on biodiversity and to inform efforts to mitigate them. Methods We used acoustic data collected from eight sites in the Ogooué-Ivindo province of Gabon to demonstrate how time-series approaches can be leveraged to compare cyclical trends within and between groups of sites. All soundscape sampling occurred in closed, Gabonese rainforest with minimal habitat disturbance for at least twenty years. First, we sampled the soundscape in the rainy season at four sites within Ivindo National Park, between February 19th and March 2nd 2021 (referred to as the Ivindo sites). Second, we sampled the soundscape in the dry season at four sites near Massaha between July 17th and July 23rd 2021 (hereafter referred to as the Massaha sites, about 15km from the Ivindo sites). At the time of sampling, the Massaha sites were located within a logging concession but no logging activity had commenced and there was an ongoing petition for the area to be re-designated as a community conservation area. Additionally, we used one site from the Lope National Park. At each site, we deployed one bioacoustic recorder to quantify the soundscape, separating each sampling site by at least 1 km to ensure independence. Sample points were also positioned at least 200 m from roads, trails, and rivers. At each site, we deployed one Bioacoustic Recorder (BAR-LT, Frontier Labs) at 1.8 m above ground, with a single omnidirectional microphone pointing down. Recorders were programmed for continuous, autonomous recording in 30-minute segments for at least six days and set to record at 40 dB gain and a sample rate of 44.1 kHz. Incomplete days, e.g. the day of deployment and collection, were excluded from the analysis, to prevent the inclusion of human sounds and disturbance to the soundscape. 
    To characterize the soundscape, we calculated the soundscape index Power Minus Noise (PMN) for 256 frequency bins between 0–11 kHz (~43 Hz bandwidth each) and for each minute of the day, using 'AnalysisPrograms.exe'. PMN is the difference between the maximum decibel of each frequency bin and the corresponding decibel of the background noise profile for that bin. Therefore, it provides a measure of the sound intensity for each frequency bin absent of background noise and provides a proxy for acoustic activity. For further analyses, we summed all 256 PMN values for each minute of the day, yielding 1440 data points per day per site. We chose the PMN index as an example index in our time series analyses because of its relatively simple interpretation and statistical properties.
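
    As a rough illustration of the Fourier step (not the authors' pipeline), the sketch below recovers the relative power of the first few daily harmonics from a synthetic minute-resolution PMN series of 1440 points.

    import numpy as np

    rng = np.random.default_rng(2)
    minutes = np.arange(1440)
    pmn = (5 + 2 * np.sin(2 * np.pi * minutes / 1440)   # synthetic 24 h cycle
           + np.sin(4 * np.pi * minutes / 1440)         # plus a 12 h harmonic
           + rng.normal(0, 0.5, 1440))                  # plus noise

    spectrum = np.fft.rfft(pmn - pmn.mean())
    power = np.abs(spectrum) ** 2
    for k in range(1, 5):                               # index k = k cycles per day
        print(f"{k} cycle(s)/day: relative power {power[k] / power[1:].sum():.2f}")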

  9. Petre_Slide_CategoricalScatterplotFigShare.pptx

    • figshare.com
    pptx
    Updated Sep 19, 2016
    Cite
    Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
    Available download formats: pptx
    Dataset updated
    Sep 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Benj Petre; Aurore Coince; Sophien Kamoun
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorical scatterplots with R for biologists: a step-by-step guide

    Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

    1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

    Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

    Protocol

    • Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed is indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

    • Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

    • Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.

    Notes

    • Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

    • Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

    # 7 Display the graph in a separate window. Dot colors indicate replicates
    graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

    References

    Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

    Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

    Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

    https://cran.r-project.org/

    http://ggplot2.org/

  10. Data from: How to use discrete choice experiments to capture stakeholder...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Jul 31, 2025
    Cite
    Alan R. Ellis; Qiana R. Cryer-Coupet; Bridget E. Weller; Kirsten Howard; Rakhee Raghunandan; Kathleen C. Thomas (2025). How to use discrete choice experiments to capture stakeholder preferences in social work research [Dataset]. http://doi.org/10.5061/dryad.z612jm6m0
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Alan R. Ellis; Qiana R. Cryer-Coupet; Bridget E. Weller; Kirsten Howard; Rakhee Raghunandan; Kathleen C. Thomas
    Description

    The primary article (cited below under "Related works") introduces social work researchers to discrete choice experiments (DCEs) for studying stakeholder preferences. The article includes an online supplement with a worked example demonstrating DCE design and analysis with realistic simulated data. The worked example focuses on caregivers' priorities in choosing treatment for children with attention deficit hyperactivity disorder. This dataset includes the scripts (and, in some cases, Excel files) that we used to identify appropriate experimental designs, simulate population and sample data, estimate sample size requirements for the multinomial logit (MNL, also known as conditional logit) and random parameter logit (RPL) models, estimate parameters using the MNL and RPL models, and analyze attribute importance, willingness to pay, and predicted uptake. It also includes the associated data files (experimental designs, data generation parameters, simulated population data and parameters, ...).

    In the worked example, we used simulated data to examine caregiver preferences for 7 treatment attributes (medication administration, therapy location, school accommodation, caregiver behavior training, provider communication, provider specialty, and monthly out-of-pocket costs) identified by dosReis and colleagues in a previous DCE. We employed an orthogonal design with 1 continuous variable (cost) and 12 dummy-coded variables (representing the levels of the remaining attributes, which were categorical). Using the parameter estimates published by dosReis et al., with slight adaptations, we simulated utility values for a population of 100,000 people, then selected a sample of 500 for analysis. Relying on random utility theory, we used the mlogit package in R to estimate the MNL and RPL models, using 5,000 Halton draws for simulated maximum likelihood estimation of the RPL model. In addition to estimating the utility parameters, we measured the relative importance of each attribute, esti...

    # Data from: How to Use Discrete Choice Experiments to Capture Stakeholder Preferences in Social Work Research

    Access this dataset on Dryad

    This dataset supports the worked example in:

    Ellis, A. R., Cryer-Coupet, Q. R., Weller, B. E., Howard, K., Raghunandan, R., & Thomas, K. C. (2024). How to use discrete choice experiments to capture stakeholder preferences in social work research. Journal of the Society for Social Work and Research. Advance online publication. https://doi.org/10.1086/731310

    The referenced article introduces social work researchers to discrete choice experiments (DCEs) for studying stakeholder preferences. In a DCE, researchers ask participants to complete a series of choice tasks: hypothetical situations in which each participant is presented with alternative scenarios and selects one or more. For example, social work researchers may want to know how parents and other caregivers pr...
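
    As a rough illustration of the random-utility logic behind the MNL model (the study itself uses the mlogit package in R; the attribute values and coefficients below are invented), a multinomial logit turns linear utilities into choice probabilities via a softmax.

    import numpy as np

    # three hypothetical treatment alternatives x two attributes (e.g., cost, therapy location)
    X = np.array([[10.0, 1.0],
                  [40.0, 0.0],
                  [25.0, 1.0]])
    beta = np.array([-0.05, 0.8])        # illustrative utility coefficients

    v = X @ beta                         # deterministic utility of each alternative
    p = np.exp(v) / np.exp(v).sum()      # MNL choice probabilities
    print(p.round(3))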

  11. RUL Dataset from Continuous Casting Machine

    • kaggle.com
    zip
    Updated Nov 16, 2023
    Cite
    Iurii Katser (2023). RUL Dataset from Continuous Casting Machine [Dataset]. https://www.kaggle.com/datasets/yuriykatser/rul-dataset-from-continuous-casting-machine
    Available download formats: zip (422843 bytes)
    Dataset updated
    Nov 16, 2023
    Authors
    Iurii Katser
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Problem background and equipment description

    A continuous casting machine (hereafter ‘CCM’) is a unit that transforms liquid steel into solid billets of a given section, from which rolled products (for example, rebars) are subsequently produced. The mould sleeve is the most critical and most quickly worn part of the CCM mould. The sleeve is a water-cooled copper pipe with a round or profiled section. The molten metal crystallizes in contact with the sleeve walls, and the primary solid shell of the ingot is formed. The main production issue that comes up during the operation of sleeves is that defects appear on the surface of the copper pipe of the sleeve and distort the profile of its inner cavity. This disrupts the thermal conditions, which in turn affects the quality of the resulting ingots. There can be shape defects (for example, the diagonals of a square ingot become unequal and so-called ‘rhomboidity’ occurs), the dimensions of the sides can come out wrong, and the ingot corners may develop cracks. These defects lead to further problems in rolling: the decreased quality of products and the number of rejects adversely affect the economic efficiency of production. To prevent this, the sleeve dimensions are measured at certain intervals along the entire length. If these dimensions deviate from the design ones, the sleeve is rejected. Another issue is a shorter useful life of the copper sleeves of the mould used in production. This issue is often associated with a change in the operating parameters of the continuous casting machine itself, such as the temperature of the incoming molten metal, the temperature of the cooling water and others. The actual useful life of a mould sleeve is often less than that stated by the manufacturer, which again leads to additional equipment downtime and increases the possibility of accidents and extra production costs. The expected useful life in tons should be as follows:
    - 17,000 tons for 180x180,
    - 13,000 tons for 150x150.

    Data acquisition

    In the course of CCM operation, the automatic control system that runs the process of casting ingots creates a database of casting parameters. The collected parameters are averaged data for all the strands in each cast; the only thing that differs is the resistance of the sleeve for each strand. After removing the mould sleeve for inspection, the initial data on the process parameters of casting, the geometry of the obtained ingots and other attributes can be uploaded from the SCADA system. The data were collected from a real production facility and were then processed, cleaned, aggregated and prepared by the authors to frame the RUL problem.

    RUL column

    This column is derived from the "resistance, tonn" column: for each combination of sleeve, num_crystallizer and num_stream, the current value is subtracted from the highest resistance reached (the moment of breaking); see the sketch below.
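
    A minimal pandas sketch of that construction, assuming the grouping columns are named as in the description (the file name is hypothetical).

    import pandas as pd

    df = pd.read_csv("ccm_data.csv")     # hypothetical file name
    group = ["sleeve", "num_crystallizer", "num_stream"]
    df["RUL"] = (df.groupby(group)["resistance, tonn"].transform("max")
                 - df["resistance, tonn"])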

    Tasks

    The main task, based on the dataset, was to develop a model for determining the remaining useful life in tons, or remaining casts, of the crystallizer sleeve (the ‘RUL problem’). It is recommended to solve the problem for each cast from the first to the last minus one. However, in addition to solving the RUL problem, it is always important for production to determine the main factors that influence the reduction and extension of the remaining useful life. This is relevant because many sleeves fail to reach the expected useful life indicated above. To this end, you may want to set out to solve the following tasks:
    - identify the factors that affect the remaining useful life,
    - develop recommendations on how to increase it,
    - compare the performance of sleeves that have and have not reached the target resistance and determine the parameters that brought this about.

  12. Data from: Quantifying accuracy and precision from continuous response data...

    • fdr.uni-hamburg.de
    csv, r
    Updated May 4, 2022
    Cite
    Bruns, Patrick (2022). Quantifying accuracy and precision from continuous response data in studies of spatial perception and crossmodal recalibration [Dataset]. http://doi.org/10.25592/uhhfdm.10183
    Available download formats: csv, r
    Dataset updated
    May 4, 2022
    Dataset provided by
    Universität Hamburg
    Authors
    Bruns, Patrick
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data and code associated with the study "Quantifying accuracy and precision from continuous response data in studies of spatial perception and crossmodal recalibration" by Patrick Bruns, Caroline Thun, and Brigitte Röder.

    example_code.R contains analysis code that can be used to calculate error-based and regression-based localization performance metrics from single-subject response data, with a working example in R. It requires as inputs a numeric vector containing the stimulus location (true value) in each trial and a numeric vector containing the corresponding localization response (perceived value) in each trial.

    example_data.csv contains the data used in the working example of the analysis code.

    localization.csv contains extracted localization performance metrics from 188 subjects which were analyzed in the study to assess the agreement between error-based and regression-based measures of accuracy and precision. The subjects had all naively performed an azimuthal sound localization task (see related identifiers for the underlying raw data).

    recalibration.csv contains extracted localization performance metrics from a subsample of 57 subjects in whom data from a second sound localization test, performed after exposure to audiovisual stimuli in which the visual stimulus was consistently presented 13.5° to the right of the sound source, were available. The file contains baseline performance (pre) and changes in performance after audiovisual exposure relative to baseline (delta) in each of the localization performance metrics.

    Localization performance metrics were either derived from the single-trial localization errors (error-based approach) or from a linear regression of localization responses on the actual target locations (regression-based approach). The following localization performance metrics were included in the study (a minimal computational sketch follows the list):

    bias: overall bias of localization responses to the left (negative values) or to the right (positive values), equivalent to constant error (CE) in error-based approaches and intercept in regression-based approaches

    absolute constant error (aCE): absolute value of bias (or CE), indicates the amount of bias irrespective of direction

    mean absolute constant error (maCE): mean of the aCE per target location; reflects over- or underestimation of peripheral target locations

    variable error (VE): mean of the standard deviations (SD) of the single-trial localization errors at each target location

    pooled variable error (pVE): SD of the single-trial localization errors pooled across trials from all target locations

    absolute error (AE): mean of the absolute values of the single-trial localization errors, sensitive to both bias and variability of the localization responses

    slope: slope of the regression model function, indicates an overestimation (values > 1) or underestimation (values < 1) of peripheral target locations

    R2: coefficient of determination of the regression model, indicates the goodness of the fit of the localization responses to the regression line
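
    A minimal Python sketch of these definitions on synthetic trials (the dataset's own worked example is example_code.R).

    import numpy as np

    rng = np.random.default_rng(3)
    target = np.array([-20, -10, 0, 10, 20] * 20, dtype=float)     # true locations
    response = 1.1 * target + 2.0 + rng.normal(0, 3, target.size)  # perceived locations

    err = response - target
    bias = err.mean()                                    # constant error (CE)
    aCE = abs(bias)
    VE = np.mean([err[target == t].std(ddof=1) for t in np.unique(target)])
    pVE = err.std(ddof=1)                                # errors pooled across locations
    AE = np.abs(err).mean()
    slope, intercept = np.polyfit(target, response, 1)   # regression-based approach
    r2 = np.corrcoef(target, response)[0, 1] ** 2
    print(f"CE={bias:.2f} aCE={aCE:.2f} VE={VE:.2f} pVE={pVE:.2f} "
          f"AE={AE:.2f} slope={slope:.2f} R2={r2:.2f}")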

  13. Data from: Simultaneous Edit-Imputation for Continuous Microdata

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Nov 7, 2015
    Cite
    Wang, Quanli; Kim, Hang J.; Karr, Alan F.; Cox, Lawrence H.; Reiter, Jerome P. (2015). Simultaneous Edit-Imputation for Continuous Microdata [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001908617
    Dataset updated
    Nov 7, 2015
    Authors
    Wang, Quanli; Kim, Hang J.; Karr, Alan F.; Cox, Lawrence H.; Reiter, Jerome P.
    Description

    Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods to identify the variables to change followed by hot deck imputation. We present an approach that fully integrates editing and imputation for continuous microdata under linear constraints. Our approach relies on a Bayesian hierarchical model that includes (i) a flexible joint probability model for the underlying true values of the data with support only on the set of values that satisfy all editing constraints, (ii) a model for latent indicators of the variables that are in error, and (iii) a model for the reported responses for variables in error. We illustrate the potential advantages of the Bayesian editing approach over existing approaches using simulation studies. We apply the model to edit faulty data from the 2007 U.S. Census of Manufactures. Supplementary materials for this article are available online.

  14. EGS Collab Experiment 2: Continuous Broadband Seismic Waveform Data

    • gdr.openei.org
    • data.openei.org
    • +3more
    image, website
    Updated Sep 12, 2022
    Cite
    Veronica Rodriguez Tribaldos (2022). EGS Collab Experiment 2: Continuous Broadband Seismic Waveform Data [Dataset]. http://doi.org/10.15121/1907655
    Available download formats: website, image
    Dataset updated
    Sep 12, 2022
    Dataset provided by
    Geothermal Data Repository
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Geothermal Technologies Program (EE-4G)
    Lawrence Berkeley National Laboratory
    Authors
    Veronica Rodriguez Tribaldos
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Two broadband seismometers were installed on the 4100 level and recorded for the duration of EGS Collab Experiment #2. Inspired by published data from similar instruments installed in the Aspo Hard Rock Lab, these long-period instruments aimed to measure the tilting of the drift in response to the injection of fluid into the testbed.

    One instrument was installed underneath the wellheads in Site A (aka the "battery" alcove) and the other was installed along the east wall of the drift, south of Site B. Due to the feet of gravel (ballast) laid along the floor of the drift, we were unable to anchor the sensors directly to the rock. As a result, the coupling of the sensors to the experiment rock volume is likely poor. In addition, there are a number of noise sources that complicate the interpretation of the data. For example, sensor BBB is installed adjacent (within 3 ft) to the rail line that runs towards the Ross shaft. Trains (motors) run along this line almost daily and produce a large signal in these data. Careful extraction of periods of interest, as well as filtering for specific signals, is necessary.

    The sensors are Nanometrics Trillium Compact Posthole seismometers, sensitive down to 120 seconds period. They were installed as close to the drift wall and as deep as we could manually excavate (only about 1 ft or so). The holes were leveled with sand and the sensors were placed on a paver before backfilling with sand. The hole was then covered by a bucket filled with insulation to improve the sensor's isolation from daily temperature variations, which are minor but present due to drift ventilation from the surface.

    Data were recorded on Nanometrics Centaur digitizers at 100 Hz. The full response information is available in the StationXML file provided here, or by querying the sensors through the IRIS DMC (see links below). These instruments were provided free of charge through the IRIS PASSCAL instrument center. The network code is XP and the station codes are BBA and BBB. The waveform data can be queried through the IRIS FDSN server using any method the user likes. One convenient option is to use the Obspy python package: https://docs.obspy.org/packages/obspy.clients.fdsn.html
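
    For example, a minimal ObsPy query for these stations might look as follows (the time window is arbitrary and the channel wildcard is an assumption).

    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    client = Client("IRIS")
    t0 = UTCDateTime("2022-04-01T00:00:00")    # arbitrary example window
    st = client.get_waveforms(network="XP", station="BBB", location="*",
                              channel="*", starttime=t0, endtime=t0 + 3600)
    st.plot()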

  15. The Continuous Categorical: An Over- Simplex-Valued Exponential Family

    • resodate.org
    • service.tib.eu
    Updated Jan 3, 2025
    Cite
    Elliott Gordon-Rodriguez; Gabriel Loaiza-Ganem; John P. Cunningham (2025). The Continuous Categorical: An Over- Simplex-Valued Exponential Family [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdGhlLWNvbnRpbnVvdXMtY2F0ZWdvcmljYWwtLWFuLW92ZXItLXNpbXBsZXgtdmFsdWVkLWV4cG9uZW50aWFsLWZhbWlseQ==
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    Leibniz Data Manager
    Authors
    Elliott Gordon-Rodriguez; Gabriel Loaiza-Ganem; John P. Cunningham
    Description

    Simplex-valued data appear throughout statistics and machine learning, for example in the context of transfer learning and compression of deep networks.

  16. Groundwater Level Data: All Historic Data

    • anrgeodata.vermont.gov
    • hub.arcgis.com
    • +1more
    Updated Jul 18, 2022
    Cite
    Idaho Department of Water Resources (2022). Groundwater Level Data: All Historic Data [Dataset]. https://anrgeodata.vermont.gov/documents/f1a190a2077c4b7da87b9cc19d0a316e
    Dataset updated
    Jul 18, 2022
    Dataset authored and provided by
    Idaho Department of Water Resources
    Description

    IDWR maintains a groundwater level database containing data primarily collected by IDWR, but it also includes data gathered by the USGS, USBR, and other public and private entities. Please reach out to these other entities to obtain their complete records, as not all values are present in this database (IDWR can provide a full list of data contributors upon request). IDWR staff manually measure the "depth to water" in wells throughout Idaho. Pressure transducers in many wells provide near-continuous water level measurements. IDWR strives to create complete and accurate data and may revise these data when indicated.

    “Groundwater Level Data: All Historic Data” includes all well data managed in IDWR’s internal database, regardless of current well status. For example, historic data from discontinued, abandoned, or inactive wells are contained in this dataset. IDWR’s water level data are also hosted in the Groundwater Data Portal (https://idwr-groundwater-data.idaho.gov/), which displays only actively monitored wells.

    The three files included in this download are: 1) discrete (manual) depth-to-water measurements, 2) continuous* (pressure transducer) depth-to-water measurements, and 3) the associated well metadata.

    *The continuous measurements data have been condensed to display only the shallowest daily pressure transducer measurements. Complete datasets are available upon request.

  17. Current velocity data from a continuous survey using a towed ADCP in...

    • zenodo.org
    txt
    Updated Jul 13, 2023
    Cite
    Chunyan Li; Chunyan Li (2023). Current velocity data from a continuous survey using a towed ADCP in Wilmington River Estuary, Georgia, USA [Dataset]. http://doi.org/10.5281/zenodo.8140139
    Available download formats: txt
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chunyan Li; Chunyan Li
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Wilmington River
    Description

    Title: Current velocity data from a continuous survey using a towed ADCP in Wilmington River Estuary, Georgia, USA

    Author/Data Collector: Chunyan Li

    Point of Contact, PI, Originator: Chunyan Li (cli@lsu.edu)

    Description:

    These are velocity profile data from a vessel-towed ADCP, obtained in the Wilmington River Estuary during a survey conducted on Sep. 29, 2004, lasting ~11.5 hours. The instrument was an RDI 600 kHz Workhorse ADCP.

    The ADCP was mounted on a sled towed by the boat. The vertical bins were 0.5 m. The surveys were conducted at an average cruise speed of about 2.5–3 m/s, except at the turns, when the vessel had to slow down, and during CTD casts, when the vessel had to stop for a few minutes. A Sea-Bird Electronics SBE 19plus CTD was used to measure the vertical profiles of water temperature, salinity, fluorescence, light attenuation, and dissolved oxygen during the survey. Note that only ADCP data are included in this dataset.

    The data are averaged at about 30-second intervals, excluding bad data. The data presented here are in ASCII, in the generic format produced by RDI's WinRiver II software. There are a total of two data files:

    ADCP_Sep29_2004_WM_000_ASC.TXT

    ADCP_Sep29_2004_WM_001_ASC.TXT

    Here is an example of the data:

    50 50 42 50 1 20 1
    4 9 29 11 32 26 57 468 60 2.361 -0.824 118.753 24.276
    49.04 93.15 -0.17 -0.46 0.00 6.00 0.00 3.08 6.63 6.64 6.68 6.59
    30.53 28.91 26.93 14.22 30.45
    32.00327167 -81.01664167 31.50 92.41 30.4
    -22.0 -7.2 -3.2 -11.2 10.0 -10.8 10.0 1.53 5.53
    50 cm BT dB 0.43 0.073
    1.53 52.35 193.13 -11.9 -51.0 -0.3 3.5 92.1 94.8 94.6 95.6 100 -2.02
    2.03 49.89 189.47 -8.2 -49.2 -0.1 -2.1 97.7 99.9 100.4 100.4 100 -2.28
    2.53 52.68 187.22 -6.6 -52.3 0.7 4.0 99.0 101.3 101.7 101.2 98 -2.84
    3.03 48.92 190.92 -9.3 -48.0 -0.2 5.1 99.2 101.3 101.9 101.6 100 -2.21
    3.53 51.77 185.37 -4.8 -51.5 0.3 5.7 98.6 101.4 102.0 101.2 100 -2.99
    4.03 48.03 186.14 -5.1 -47.8 1.0 1.2 98.5 101.1 101.9 101.0 100 -2.64
    4.53 50.69 190.02 -8.8 -49.9 0.5 3.6 98.5 101.2 101.9 101.2 100 -2.32
    5.03 43.70 186.57 -5.0 -43.4 0.7 4.4 98.3 101.2 101.9 101.2 100 -2.34
    5.53 41.22 186.13 -4.4 -41.0 1.5 -0.3 98.5 101.6 102.0 101.4 83 -2.35
    6.03 -32768 -32768 -32768 -32768 -32768 -32768 255 255 255 255 0 2147483647

    The ADCP data were used in Li et al. (2008).
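
    For readers handling these files programmatically, a generic Python sketch (not tied to the exact WinRiver II field layout, which should be taken from RDI's documentation) is to split the whitespace-delimited fields of the per-bin rows and mask the bad-value sentinels (-32768, 2147483647) visible in the deepest bin above.

    import numpy as np

    BAD = {"-32768", "2147483647"}

    def parse_row(line):
        # replace bad-value sentinels with NaN, convert everything else to float
        return [np.nan if tok in BAD else float(tok) for tok in line.split()]

    row = "6.03 -32768 -32768 -32768 -32768 -32768 -32768 255 255 255 255 0 2147483647"
    print(parse_row(row))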

    Acknowledgements

    I would like to thank Captain Harry Carter who assisted me by driving the boat for the whole day. He also assisted me with CTD casts, deployment, and retrieval of other CTDs. It was a great day working out with him.

    References

    Li, C., C. Chen, D. Guadagnoli, and I. Y. Georgiou (2008). Geometry-induced residual eddies in estuaries with curved channels: Observations and modeling studies, Journal of Geophysical Research, Vol. 113, C01005, doi:10.1029/2006JC004031.

  18. Data from: A continuous morphological approach to study the evolution of...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Dec 6, 2017
    + more versions
    Cite
    Kriebel, Ricardo; Sytsma, Kenneth J.; Khabbazian, Mohammad (2017). A continuous morphological approach to study the evolution of pollen in a phylogenetic context: An example with the order Myrtales [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001779415
    Dataset updated
    Dec 6, 2017
    Authors
    Kriebel, Ricardo; Sytsma, Kenneth J.; Khabbazian, Mohammad
    Description

    The study of pollen morphology has historically allowed evolutionary biologists to assess phylogenetic relationships among Angiosperms, as well as to better understand the fossil record. During this process, pollen has mainly been studied by discretizing some of its main characteristics, such as size, shape, and exine ornamentation. One large plant clade in which pollen has been used this way for phylogenetic inference and character mapping is the order Myrtales, composed of the small families Alzateaceae, Crypteroniaceae, and Penaeaceae (collectively the “CAP clade”), as well as the large families Combretaceae, Lythraceae, Melastomataceae, Myrtaceae, Onagraceae and Vochysiaceae. In this study, we present a novel way to study pollen evolution by using quantitative size and shape variables. We use morphometric and morphospace methods to evaluate pollen change in the order Myrtales using a time-calibrated, supermatrix phylogeny. We then test for conservatism, divergence, and morphological convergence of pollen, and for correlation between the latitudinal gradient and pollen size and shape. To obtain an estimate of shape, Myrtales pollen images were extracted from the literature and their outlines analyzed using elliptic Fourier methods. Shape and size variables were then analyzed in a phylogenetic framework under an Ornstein-Uhlenbeck process to test for shifts in size and shape during the evolutionary history of Myrtales. Few shifts in Myrtales pollen morphology were found, which indicates morphological conservatism. Heterocolpate, small pollen is ancestral, with the largest pollen in Onagraceae. Convergent shifts in shape, but not size, occurred in Myrtaceae and Onagraceae and are correlated with shifts in latitude and biogeography. A quantitative approach was applied for the first time to examine pollen evolution across a large time scale. Using phylogeny-based morphometrics and an OU process, hypotheses of pollen size and shape were tested across Myrtales. Convergent pollen shifts and position in the latitudinal gradient support the selective role of harmomegathy, the mechanism by which pollen grains accommodate their volume in response to water loss.

  19. ckanext-importlib - Extensions - CKAN Ecosystem Catalog Beta

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-importlib - Extensions - CKAN Ecosystem Catalog Beta [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-importlib
    Dataset updated
    Jun 4, 2025
    Description

    The ckanext-importlib extension provides a library to facilitate automated or continuous dataset imports into CKAN using the API. It particularly addresses challenges associated with repetitive imports, such as checking for existing datasets based on a unique ID stored in extras, and managing related resources. The extension offers tools designed to support scenarios where CKAN data is continuously updated from external sources.

    Key Features:
    - Dataset existence checks: facilitates checking for the existence of datasets based on a unique identifier stored as an extra field, allowing for updates rather than duplications during re-imports. It can optionally consider an additional "source" extra field during this check.
    - Resource grouping: provides functionality, exemplified by ResourceSeriesLoader, to manage grouped resources within datasets, specifically designed for handling time series data.
    - Name-clash avoidance: aims to avoid naming conflicts when deriving unique dataset names from titles by implementing mechanisms to prevent clashes.
    - Framework design: while not as flexible as initially intended, the extension architecture provides a framework for handling continuous data imports into CKAN and can be extended based on specific project requirements.
    - Example importer: the extension was designed as a generalized framework, based around the specific implementation for the data.gov.uk ONS importer, which serves as an example of how to use the library.

    Use Cases:
    - Continuous data feeds: organizations that need to continuously import and update data from external sources can use ckanext-importlib to automate the process, ensuring data in CKAN remains current and accurate.
    - Managing time series data: datasets with multiple data files, such as time series, can use the resource management features to ensure that these resources are properly linked and organized within the dataset.
    - Data synchronization: when mirroring or integrating datasets from other systems into CKAN, this extension simplifies the process of checking for existing data and updating it as needed, thereby maintaining data integrity.

    Technical Integration: to get the extension running, install the dependencies mentioned in the source code repository: CKAN and its dependencies, followed by those listed in this extension's pip-requirements.txt.

    Benefits & Impact: ckanext-importlib streamlines the process of continually importing datasets into CKAN and improves system efficiency and data integrity. By providing mechanisms to handle updates and avoid data duplication, it ensures that effort isn't wasted on handling data discrepancies.
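
    A hedged sketch of the check-then-update pattern this extension supports, written against CKAN's generic action API rather than ckanext-importlib's own calls (the site URL, API key and the import_id extra are placeholders).

    import requests

    CKAN = "https://demo.ckan.org"               # placeholder site
    HEADERS = {"Authorization": "YOUR-API-KEY"}  # placeholder API key

    def upsert(dataset, import_id):
        # look for an existing dataset carrying the same unique ID in its extras
        r = requests.get(f"{CKAN}/api/3/action/package_search",
                         params={"fq": f'extras_import_id:"{import_id}"'})
        hits = r.json()["result"]["results"]
        if hits:                                 # update in place instead of duplicating
            dataset["id"] = hits[0]["id"]
            action = "package_update"
        else:
            action = "package_create"
        return requests.post(f"{CKAN}/api/3/action/{action}",
                             json=dataset, headers=HEADERS).json()

    upsert({"name": "ons-series-x", "title": "ONS series X",
            "extras": [{"key": "import_id", "value": "ons-123"}]}, "ons-123")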

  20. MKAD (Open Sourced Code) - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). MKAD (Open Sourced Code) - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/mkad-open-sourced-code
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    MKAD
    Description

    The Multiple Kernel Anomaly Detection (MKAD) algorithm is designed for anomaly detection over a set of files. It combines multiple kernels into a single optimization function using the One-Class Support Vector Machine (OCSVM) framework. Any kernel function can be combined in the algorithm as long as it meets the Mercer conditions; however, for the purposes of this code, the data preformatting and kernel type are specific to Flight Operations Quality Assurance (FOQA) data and have been integrated into the coding steps. For this domain, discrete binary switch sequences are used in the discrete kernel, and discretized continuous parameter features are used to form the continuous kernel. The OCSVM uses a training set of nominal examples (in this case flights) and evaluates test examples to determine whether they are anomalous. After completing this analysis, the algorithm reports the anomalous examples and determines whether there is a contribution from the continuous elements, the discrete elements, or both.
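
    An illustrative sketch of the core idea (not NASA's released MKAD code): combine a continuous and a discrete kernel matrix into a weighted sum and fit a one-class SVM on the precomputed combination, here with scikit-learn and random stand-in features.

    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

    rng = np.random.default_rng(4)
    X_cont = rng.normal(size=(100, 5))           # stand-in for continuous-parameter features
    X_disc = rng.integers(0, 2, (100, 8))        # stand-in for binary switch features

    K = 0.5 * rbf_kernel(X_cont) + 0.5 * linear_kernel(X_disc)   # weighted kernel sum
    ocsvm = OneClassSVM(kernel="precomputed", nu=0.05).fit(K)
    scores = ocsvm.decision_function(K)          # low scores flag anomalous examples
    print(scores[:5].round(3))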
