35 datasets found
  1. Indian food prediction using model of regression

    • kaggle.com
    zip
    Updated Nov 1, 2020
    Cite
    Shivani Jaiswal (2020). Indian food prediction using model of regression [Dataset]. https://www.kaggle.com/datasets/shivijaiswal/indian-food-prediction-using-model-of-regression
    Explore at:
    Available download formats: zip (49807 bytes)
    Dataset updated
    Nov 1, 2020
    Authors
    Shivani Jaiswal
    Description

    Dataset

    This dataset was created by Shivani Jaiswal for decision tree, random forest, and linear regression modelling, and for data visualisation.

  2. Data from: From Black Box to Shining Spotlight: Using Random Forest...

    • tandf.figshare.com
    bin
    Updated Feb 29, 2024
    Cite
    Andrew J. Sage; Yang Liu; Joe Sato (2024). From Black Box to Shining Spotlight: Using Random Forest Prediction Intervals to Illuminate the Impact of Assumptions in Linear Regression [Dataset]. http://doi.org/10.6084/m9.figshare.20402389.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Andrew J. Sage; Yang Liu; Joe Sato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce a pair of Shiny web applications that allow users to visualize random forest prediction intervals alongside those produced by linear regression models. The apps are designed to help undergraduate students deepen their understanding of the role that assumptions play in statistical modeling by comparing and contrasting intervals produced by regression models with those produced by more flexible algorithmic techniques. We describe the mechanics of each approach, illustrate the features of the apps, provide examples highlighting the insights students can gain through their use, and discuss our experience implementing them in an undergraduate class. We argue that, contrary to their reputation as a black box, random forests can be used as a spotlight, for educational purposes, illuminating the role of assumptions in regression models and their impact on the shape, width, and coverage rates of prediction intervals.
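    The contrast the apps visualize can also be sketched outside Shiny. The Python snippet below is an illustration under simplified assumptions, not the authors' Shiny apps: it compares a textbook linear-regression prediction interval with a crude random-forest interval formed from the quantiles of per-tree predictions (a heuristic stand-in for the interval methods discussed in the paper), on synthetic data whose noise violates the constant-variance assumption.

```python
# Sketch: compare a parametric prediction interval from OLS with a crude
# interval built from the spread of individual tree predictions in a forest.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = 2.0 * x + rng.normal(scale=1 + 0.3 * x)          # noise grows with x, violating constant variance
X = x.reshape(-1, 1)

# 95% prediction interval from a linear regression fit
ols = sm.OLS(y, sm.add_constant(x)).fit()
new_point = np.array([[1.0, 5.0]])                   # [intercept, x] for x = 5
ols_frame = ols.get_prediction(new_point).summary_frame(alpha=0.05)
print("OLS interval at x=5:", ols_frame[["obs_ci_lower", "obs_ci_upper"]].values.ravel())

# Heuristic random-forest "interval": 2.5th and 97.5th percentiles of per-tree predictions
rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5, random_state=0).fit(X, y)
tree_preds = np.array([tree.predict([[5.0]])[0] for tree in rf.estimators_])
print("RF tree-quantile interval at x=5:", np.quantile(tree_preds, [0.025, 0.975]))
```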

  3. Data from: Evaluating the Use of Uncertainty Visualisations for Imputations...

    • osf.io
    Updated Aug 26, 2024
    Cite
    Abhraneel Sarma (2024). Evaluating the Use of Uncertainty Visualisations for Imputations of Data Missing At Random in Scatterplots [Dataset]. https://osf.io/q4y5r
    Explore at:
    Dataset updated
    Aug 26, 2024
    Dataset provided by
    Center for Open Science (https://cos.io/)
    Authors
    Abhraneel Sarma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains supplementary materials for the paper, Evaluating the Use of Uncertainty Visualisations for Imputations of Data Missing At Random in Scatterplots

    Abstract: Most real-world datasets contain missing values, yet most exploratory data analysis (EDA) systems only support visualising data points with complete cases. This omission may potentially lead the user to biased analyses and insights. Imputation techniques can help estimate the value of a missing data point, but introduce additional uncertainty. In this work, we investigate the effects of visualising imputed values in charts using different types of uncertainty visualisation techniques: no imputation, mean, 95% confidence intervals, probability density plots, gradient intervals, and hypothetical outcome plots. We focus on scatterplots, a commonly used chart type, and conduct a crowdsourced study with 202 participants. We measure users' bias and precision in performing two tasks (estimating average and detecting trend) and their self-reported confidence in performing these tasks. Our results suggest that, when estimating averages, uncertainty representations may reduce bias but at the cost of decreasing precision. When estimating trend, only hypothetical outcome plots may lead to a small probability of reducing bias while increasing precision. Participants in every uncertainty representation condition were less certain about their responses when compared to the baseline. The findings point towards potential trade-offs in using uncertainty encodings for datasets with a large number of missing values.
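    To make the compared encodings concrete, the Python sketch below (an illustration with synthetic data, not the study's stimulus-generation code) imputes missing y-values from a simple linear fit and derives three of the summaries the abstract mentions: a mean imputation, a 95% interval, and repeated draws in the spirit of hypothetical outcome plots.

```python
# Sketch: impute missing y-values from a linear fit, then summarize the
# imputation three ways: point estimate, 95% interval, and repeated draws.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 100)
y = 1.5 * x + rng.normal(scale=2.0, size=100)
y[::10] = np.nan                                   # every 10th point "missing at random" (stand-in)

observed = ~np.isnan(y)
slope, intercept = np.polyfit(x[observed], y[observed], 1)
resid_sd = np.std(y[observed] - (slope * x[observed] + intercept))

x_missing = x[~observed]
mean_imp = slope * x_missing + intercept                               # mean imputation
ci_lo, ci_hi = mean_imp - 1.96 * resid_sd, mean_imp + 1.96 * resid_sd  # 95% interval per point
hop_draws = mean_imp[:, None] + rng.normal(scale=resid_sd, size=(len(x_missing), 20))  # HOP-style draws

print("first missing point:",
      "mean", round(mean_imp[0], 2),
      "interval", (round(ci_lo[0], 2), round(ci_hi[0], 2)),
      "draws", np.round(hop_draws[0, :3], 2))
```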

  4. Acoustic features as a tool to visualize and explore marine soundscapes:...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Feb 15, 2024
    + more versions
    Cite
    Simone Cominelli; Nicolo' Bellin; Carissa D. Brown; Jack Lawson (2024). Acoustic features as a tool to visualize and explore marine soundscapes: Applications illustrated using marine mammal Passive Acoustic Monitoring datasets [Dataset]. http://doi.org/10.5061/dryad.3bk3j9kn8
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    University of Parma
    Memorial University of Newfoundland
    Fisheries and Oceans Canada
    Authors
    Simone Cominelli; Nicolo' Bellin; Carissa D. Brown; Jack Lawson
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Passive Acoustic Monitoring (PAM) is emerging as a solution for monitoring species and environmental change over large spatial and temporal scales. However, drawing rigorous conclusions based on acoustic recordings is challenging, as there is no consensus over which approaches and indices are best suited for characterizing marine and terrestrial acoustic environments. Here, we describe the application of multiple machine-learning techniques to the analysis of a large PAM dataset. We combine pre-trained acoustic classification models (VGGish, NOAA & Google Humpback Whale Detector), dimensionality reduction (UMAP), and balanced random forest algorithms to demonstrate how machine-learned acoustic features capture different aspects of the marine environment. The UMAP dimensions derived from VGGish acoustic features exhibited good performance in separating marine mammal vocalizations according to species and locations. RF models trained on the acoustic features performed well for labelled sounds in the 8 kHz range; however, low- and high-frequency sounds could not be classified using this approach. The workflow presented here shows how acoustic feature extraction, visualization, and analysis allow for establishing a link between ecologically relevant information and PAM recordings at multiple scales. The datasets and scripts provided in this repository allow replicating the results presented in the publication.

    Methods

    Data acquisition and preparation. We collected all records available in the Watkins Marine Mammal Database website listed under the “all cuts” page. For each audio file in the WMD the associated metadata included a label for the sound sources present in the recording (biological, anthropogenic, and environmental), as well as information related to the location and date of recording. To minimize the presence of unwanted sounds in the samples, we only retained audio files with a single source listed in the metadata. We then labelled the selected audio clips according to taxonomic group (Odontocetae, Mysticetae) and species. We limited the analysis to 12 marine mammal species by discarding data when a species had less than 60 s of audio available, had a vocal repertoire extending beyond the resolution of the acoustic classification model (VGGish), or was recorded in a single country. To determine if a species was suited for analysis using VGGish, we inspected the Mel-spectrograms of 3-s audio samples and only retained species with vocalizations that could be captured in the Mel-spectrogram (Appendix S1). The vocalizations of species that produce very low-frequency or very high-frequency sounds were not captured by the Mel-spectrogram, so we removed them from the analysis. To ensure that records included the vocalizations of multiple individuals for each species, we only considered species with records from two or more different countries. Lastly, to avoid overrepresentation of sperm whale vocalizations, we excluded 30,000 sperm whale recordings collected in the Dominican Republic. The resulting dataset consisted of 19,682 audio clips with a duration of 960 milliseconds each (0.96 s) (Table 1). The Placentia Bay Database (PBD) includes recordings collected by Fisheries and Oceans Canada in Placentia Bay (Newfoundland, Canada) in 2019. The dataset consisted of two months of continuous recordings (1230 hours), starting on July 1st, 2019, and ending on August 31st, 2019. The data was collected using an AMAR G4 hydrophone (sensitivity: -165.02 dB re 1V/µPa at 250 Hz) deployed at 64 m of depth. The hydrophone was set to operate following 15 min cycles, with the first 60 s sampled at 512 kHz and the remaining 14 min sampled at 64 kHz. For the purpose of this study, we limited the analysis to the 64 kHz recordings.

    Acoustic feature extraction. The audio files from the WMD and PBD databases were used as input for VGGish (Abu-El-Haija et al., 2016; Chung et al., 2018), a CNN developed and trained to perform general acoustic classification. VGGish was trained on the Youtube8M dataset, containing more than two million user-labelled audio-video files. Rather than focusing on the final output of the model (i.e., the assigned labels), here the model was used as a feature extractor (Sethi et al., 2020). VGGish converts audio input into a semantically meaningful vector consisting of 128 features. The model returns features at multiple resolutions: ~1 s (960 ms); ~5 s (4800 ms); ~1 min (59’520 ms); ~5 min (299’520 ms). All of the visualizations and results pertaining to the WMD were prepared using the finest feature resolution of ~1 s. The visualizations and results pertaining to the PBD were prepared using the ~5 s features for the humpback whale detection example, and were then averaged to an interval of 30 min in order to match the temporal resolution of the environmental measures available for the area.

    UMAP ordination and visualization. UMAP is a non-linear dimensionality reduction algorithm based on the concept of topological data analysis which, unlike other dimensionality reduction techniques (e.g., tSNE), preserves both the local and global structure of multivariate datasets (McInnes et al., 2018). To allow for data visualization and to reduce the 128 features to two dimensions for further analysis, we applied Uniform Manifold Approximation and Projection (UMAP) to both datasets and inspected the resulting plots. The UMAP algorithm generates a low-dimensional representation of a multivariate dataset while maintaining the relationships between points in the global dataset structure (i.e., the 128 features extracted from VGGish). Each point in a UMAP plot in this paper represents an audio sample with a duration of ~1 second (WMD dataset), ~5 seconds (PBD dataset, humpback whale detections), or 30 minutes (PBD dataset, environmental variables). Each point in the two-dimensional UMAP space also represents a vector of 128 VGGish features. The nearer two points are in the plot space, the nearer they are in the 128-dimensional space, and thus the distance between two points in UMAP reflects the degree of similarity between two audio samples in our datasets. Areas with a high density of samples in UMAP space should, therefore, contain sounds with similar characteristics, and such similarity should decrease with increasing point distance. Previous studies illustrated how VGGish and UMAP can be applied to the analysis of terrestrial acoustic datasets (Heath et al., 2021; Sethi et al., 2020). The visualizations and classification trials presented here illustrate how the two techniques (VGGish and UMAP) can be used together for marine ecoacoustics analysis. UMAP visualizations were prepared using the umap-learn package for the Python programming language (version 3.10). All UMAP visualizations presented in this study were generated using the algorithm’s default parameters.

    Labelling sound sources. The labels for the WMD records (i.e., taxonomic group, species, location) were obtained from the database metadata. For the PBD recordings, we obtained measures of wind speed, surface temperature, and current speed from an oceanographic buoy (Fig. 1) located in proximity to the recorder. We chose these three variables for their different contributions to background noise in marine environments. Wind speed contributes to underwater background noise at multiple frequencies, ranging from 500 Hz to 20 kHz (Hildebrand et al., 2021). Sea surface temperature contributes to background noise at frequencies between 63 Hz and 125 Hz (Ainslie et al., 2021), while ocean currents contribute to ambient noise at frequencies below 50 Hz (Han et al., 2021). Prior to analysis, we categorized the environmental variables and assigned the categories as labels to the acoustic features (Table 2). Humpback whale vocalizations in the PBD recordings were processed using the humpback whale acoustic detector created by NOAA and Google (Allen et al., 2021), providing a model score for every ~5 s sample. This model was trained on a large dataset (14 years and 13 locations) using humpback whale recordings annotated by experts (Allen et al., 2021). The model returns scores ranging from 0 to 1 indicating the confidence in the predicted humpback whale presence. We used the results of this detection model to label the PBD samples according to the presence of humpback whale vocalizations. To verify the model results, we inspected all audio files that contained a 5 s sample with a model score higher than 0.9 for the month of July. If the presence of a humpback whale was confirmed, we labelled the segment as a model detection. We labelled any additional humpback whale vocalization present in the inspected audio files as a visual detection, while we labelled other sources and background noise samples as absences. In total, we labelled 4.6 hours of recordings. We reserved the recordings collected in August to test the precision of the final predictive model.

    Label prediction performance. We used Balanced Random Forest models (BRF) provided in the imbalanced-learn Python package (Lemaître et al., 2017) to predict humpback whale presence and environmental conditions from the acoustic features generated by VGGish. We chose BRF as the algorithm because it is suited to datasets characterized by class imbalance. The BRF algorithm performs undersampling of the majority class prior to prediction, allowing it to overcome class imbalance (Lemaître et al., 2017). For each model run, the PBD dataset was split into training (80%) and testing (20%) sets. The training datasets were used to fine-tune the models through a nested k-fold cross-validation approach with ten folds in the outer loop and five folds in the inner loop. We selected nested cross-validation as it allows optimizing model hyperparameters and performing model evaluation in a single step. We used the default parameters of the BRF algorithm, except for the ‘n_estimators’ hyperparameter, for which we tested
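    As a rough illustration of the pipeline described above (128-dimensional VGGish-style features reduced with UMAP and classified with a balanced random forest), the Python sketch below uses randomly generated stand-in features and labels rather than actual VGGish output, and omits the nested cross-validation used in the study; the package names (umap-learn, imbalanced-learn) follow the description.

```python
# Stand-in pipeline: VGGish-style features -> UMAP embedding -> balanced random forest.
import numpy as np
import umap                                              # umap-learn package
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
features = rng.normal(size=(2000, 128))                  # placeholder for VGGish feature vectors
labels = rng.integers(0, 2, size=2000)                   # placeholder labels (e.g. whale present/absent)

# Two-dimensional ordination for visualization, keeping UMAP's default parameters
embedding = umap.UMAP(random_state=42).fit_transform(features)
print("UMAP embedding shape:", embedding.shape)

# Balanced random forest undersamples the majority class before each tree is grown
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=42)
brf = BalancedRandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
print("held-out accuracy:", brf.score(X_te, y_te))
```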

  5. Presents the detailed results of the experimentation using data imbalance...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Nov 8, 2023
    Cite
    Balew Ayalew Kassie; Geletaw Sahle Tegenaw (2023). Presents the detailed results of the experimentation using data imbalance techniques. [Dataset]. http://doi.org/10.1371/journal.pdig.0000376.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    PLOS Digital Health
    Authors
    Balew Ayalew Kassie; Geletaw Sahle Tegenaw
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Presents the detailed results of the experimentation using data imbalance techniques.

  6. VIPVIZA - VIsualiZation of asymptomatic Atherosclerotic disease for optimum...

    • datacatalogue.cessda.eu
    • snd.se
    • +2 more
    Updated Jun 7, 2024
    + more versions
    Cite
    Norberg, Margareta; Näslund, Ulf; Wennberg, Patrik; Department of Public Health and Clinical Medicine (2024). VIPVIZA - VIsualiZation of asymptomatic Atherosclerotic disease for optimum cardiovascular prevention ─ a randomized controlled trial nested in the Västerbotten Intervention Program [Dataset]. https://datacatalogue.cessda.eu/detail?q=0dbe2c90610e1d86bb82d6804cec55db6df3ce9bbcf808a5a4ae575b4d81bfec
    Explore at:
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    Department of Epidemiology and Global Health, Umeå University
    Umeå University
    Public Health and Clinical Medicine, Umeå University
    Family Medicine, Umeå University
    Authors
    Norberg, Margareta; Näslund, Ulf; Wennberg, Patrik; Department of Public Health and Clinical Medicine
    Time period covered
    1992 - 1998
    Area covered
    Sweden
    Variables measured
    Individual/Patient
    Measurement technique
    VIP historical data 20 years before baseline, Registry extract and/or access to biobank sample, VIP historical data 10 years before baseline, Registry extract and/or access to biobank sample, The Swedish Prescribed Drug Register, Registry extract and/or access to biobank sample, VIP baseline (lifestyle), Registry extract and/or access to biobank sample, Ultrasound baseline, Physical measurements and tests, Psychological (problem management), baseline, Self-administered questionnaire, Lifestyle 1 year, Self-administered questionnaire, Lifestyle 1 year, Biological tests, Clinical CVD risk factors and lifestyle habits 3 year, Self-administered questionnaire, Clinical CVD risk factors and lifestyle habits 3 year, Measurements and tests, Ultrasound 3 year, Physical measurements and tests, Psychological factors and reactions to the VIPVIZA intervention (problem management), after 3 years, Self-administered questionnaire, Included study participants, Military service mustering register, Registry extract and/or access to biobank sample, Included study participants, Biological tests
    Description

    The aim of the project is to develop better methods for prevention of cardiovascular diseases (CVD). It is based on the hypothesis that image-based information on subclinical atherosclerosis (i) increases the precision in the assessment of risk of CVD, (ii) improves communication and understanding of the risk, and as a consequence (iii) increases the motivation for and adherence to evidence-based pharmacological treatment and lifestyle modification. The trial is nested within the Västerbotten Intervention Program and complements its conventional risk factor assessment and CVD prevention.

    3500 healthy participants with low/moderate risk of CVD underwent ultrasound examination of the carotid arteries and were randomized to two groups. In the intervention group, the participants and their doctors received pictorial and graphic information in color about the participant’s subclinical atherosclerosis. No information about the ultrasound results was given to the control group. Follow-up after 1, 3 and 6.5 years includes sampling regarding clinical risk factors, blood for biomarker analyses, extensive questionnaires and interviews.

    At 3 and 6.5 years the ultrasound examination was repeated and all participants and their doctors were informed about the results. The database also includes register data regarding prescriptions of preventive medication, exposure data for air pollutants, data from health examinations within the VIP 10 and 20 years before VIPVIZA, and, for men, conscription data.

    After 10 years, registry data on endpoints, CVD morbidity and mortality will be collected.

    Access to VIPVIZA's data portal and research data from VIPVIZA is possible in collaboration with researchers within the VIPVIZA project. For further information, contact PI Ulf Näslund ulf.naslund@umu.se

  7. Replication data for: Stochastic and Deterministic Modeling of the Future...

    • borealisdata.ca
    • open.library.ubc.ca
    bin, tsv, xls, xlsx
    Updated Feb 27, 2019
    Cite
    Shahram Yarmand; Shahram Yarmand (2019). Replication data for: Stochastic and Deterministic Modeling of the Future Price of Crude oil and Bottled Water [Dataset]. http://doi.org/10.5683/SP2/VPF8J8
    Explore at:
    Available download formats: bin (225284), xls (144896), xlsx (1377738), tsv (2800)
    Dataset updated
    Feb 27, 2019
    Dataset provided by
    Borealis
    Authors
    Shahram Yarmand; Shahram Yarmand
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    Sep 10, 2017 - Dec 17, 2017
    Area covered
    United States, Crude Oil Prices: West Texas Intermediate (WTI) and U.S. bottled water
    Description

    Deterministic and stochastic approaches are two methods for modeling the crude oil and bottled water markets. Forecasting market prices directly affects energy producers and water users. Two software tools, Tableau and Python, are used to model and visualize both markets with the aim of estimating possible future prices. The role of these tools is to provide an optimal alternative with different methods (deterministic versus stochastic). The price prediction in Tableau is deterministic, based on global optimization and time series; in contrast, Monte Carlo simulation, as a stochastic method, is modeled in Python. The purpose of the project is, first, to predict the price of crude oil and bottled water with stochastic (Monte Carlo simulation) and deterministic (Tableau) methods and, second, to compare the prices in a case study of Crude Oil Prices: West Texas Intermediate (WTI) and U.S. bottled water.

    1. Introduction. Predicting stocks and stock price indices is challenging due to the uncertainties involved. The analysis can be approached from different angles: what investors do before investing in a stock, or the evaluation of stocks by studying statistics generated by market activity such as past prices and volumes. The data analysis attempts to identify stock patterns and trends that may help estimate future prices. Initially, classical regression (deterministic) methods were used to predict stock trends; later, uncertainty-based (stochastic) methods were used for forecasting as well. In "Deterministic versus stochastic volatility: implications for option pricing models" (1997), Paul Brockman & Mustafa Chowdhury investigated whether stock return volatility is deterministic or stochastic, reporting that "Results reported herein add support to the growing literature on preference-based stochastic volatility models and generally reject the notion of deterministic volatility" (p. 499). This motivates modeling and forecasting the historical data with the two tools (Tableau and Python). For its forecast feature, Tableau automatically chooses the best of up to eight models, namely the one that generates the highest-quality forecast. According to the Tableau manual, Tableau assesses forecast quality by optimizing the smoothing of each model, and the optimization is global. The core of the feature is a taxonomy of exponential smoothing that evaluates the eight candidate models when enough data are available; the assumed real-world data-generating process is part of the forecast feature and supports the deterministic method. Tableau's forecast feature therefore illustrates the best possible future price deterministically (from the time series of prices). Monte Carlo simulation (MCs), modeled in Python, predicts the fluctuating stock market index. Forecasting the stock market with Monte Carlo means solving problems mathematically by generating suitable random numbers and observing the fraction of those numbers that obeys some property or properties; the method is used to obtain numerical solutions to problems too complicated to solve analytically. It randomly generates thousands of series representing potential outcomes for possible returns. The simulated price is therefore based on random draws from the range of possible spot prices between 2002 and 2016, which constitutes the stochastic method.
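    A minimal Python sketch of the stochastic side described above: bootstrap historical daily returns and generate thousands of Monte Carlo price paths. The synthetic price series and parameter choices are placeholders, not the author's replication script or the actual WTI data.

```python
# Sketch: bootstrap Monte Carlo price simulation. The "historical" series here
# is synthetic; in the replication it would be the 2002-2016 WTI price history.
import numpy as np

rng = np.random.default_rng(1)
log_returns = rng.normal(0.0002, 0.01, 1000)
historical_prices = 50 * np.exp(np.cumsum(log_returns))      # synthetic, strictly positive prices
daily_returns = np.diff(historical_prices) / historical_prices[:-1]

n_paths, horizon = 10_000, 252                               # one trading year ahead
last_price = historical_prices[-1]
sampled = rng.choice(daily_returns, size=(n_paths, horizon), replace=True)
paths = last_price * np.cumprod(1 + sampled, axis=1)         # thousands of possible price series

final_prices = paths[:, -1]
print("median simulated price:", round(np.median(final_prices), 2))
print("5th-95th percentile range:", np.round(np.percentile(final_prices, [5, 95]), 2))
```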

  8. Comparison of R1 and R2 Online Research Data Services

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Szkirpan, Elizabeth (2023). Comparison of R1 and R2 Online Research Data Services [Dataset]. http://doi.org/10.7910/DVN/SHJABB
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Szkirpan, Elizabeth
    Description

    Compiled in mid-2022, this dataset contains the raw data file, randomized ranked lists of R1 and R2 research institutions, and files created to support data visualization for Elizabeth Szkirpan's 2022 study regarding availability of data services and research data information via university libraries for online users. Files are available in Microsoft Excel formats.

  9. LDN Sqr Test Data

    • data.europa.eu
    json
    Updated Dec 31, 2000
    + more versions
    Cite
    Greater London Authority (2000). LDN Sqr Test Data [Dataset]. https://data.europa.eu/data/datasets/ldn-sqr-test-data
    Explore at:
    Available download formats: json
    Dataset updated
    Dec 31, 2000
    Dataset authored and provided by
    Greater London Authority
    Description

    Test data for LDN Sqr D3 component development. Random data.

  10. Data for An Online Application to Explain Community Immunity with...

    • search.dataone.org
    • borealisdata.ca
    Updated Oct 30, 2024
    Cite
    Witteman, Holly; Hakim, Hina (2024). Data for An Online Application to Explain Community Immunity with Personalized Avatars: A Randomized Controlled Trial [Dataset]. http://doi.org/10.5683/SP3/41MWKO
    Explore at:
    Dataset updated
    Oct 30, 2024
    Dataset provided by
    Borealis
    Authors
    Witteman, Holly; Hakim, Hina
    Description

    Data collected as part of a randomized controlled trial of a web application "herdimm" to help people understand how herd immunity (community immunity) works. Lead author Hina Hakim collected and analysed the data. Codebook/data dictionary and all study materials are available at https://osf.io/hkysb/

  11. Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph...

    • darus.uni-stuttgart.de
    Updated Dec 19, 2024
    + more versions
    Cite
    Dimitar Garkov; Tommaso Piselli; Emilio Di Giacomo; Karsten Klein; Giuseppe Liotta; Fabrizio Montecchiani; Falk Schreiber (2024). Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis - Replication data [Dataset]. http://doi.org/10.18419/DARUS-4231
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    DaRUS
    Authors
    Dimitar Garkov; Tommaso Piselli; Emilio Di Giacomo; Karsten Klein; Giuseppe Liotta; Fabrizio Montecchiani; Falk Schreiber
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4231

    Dataset funded by
    DFG
    Description

    This dataset contains the supplementary materials to our publication "Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis", where we report on a study we conducted. Please refer to the publication for more details; the abstract can be found at the end of this description. The dataset contains:
    • The collection of graphs with layout used in the study
    • The final, randomized experiment files used in the study
    • The source code of the study prototype
    • The collected, anonymized data in tabular form
    • The code for the statistical analysis
    • The Supplemental Materials PDF

    Paper abstract: Problem solving is a composite cognitive process, invoking a number of systems and subsystems, such as perception and memory. Individuals may form collectives to solve a given problem together, in collaboration, especially when complexity is thought to be high. To determine if and when collaborative problem solving is desired, we must quantify collaboration first. For this, we investigate the practical virtue of collaborative problem solving. Using visual graph analysis, we perform a study with 72 participants in two countries and three languages. We compare ad hoc pairs to individuals and nominal pairs, solving two different tasks on graphs in visuospatial mixed reality. The average collaborating pair does not outdo its nominal counterpart, but it does have a significant trade-off against the individual: an ad hoc pair uses 1.46 more time to achieve 4.6 higher accuracy. We also use the concept of task instance complexity to quantify differences in complexity. As task instance complexity increases, these differences largely scale, though with two notable exceptions. With this study we show the importance of using nominal groups as a benchmark in collaborative virtual environments research. We conclude that a mixed reality environment does not automatically imply superior collaboration.

  12. stock_TESLA

    • kaggle.com
    Updated Dec 13, 2023
    + more versions
    Cite
    willian oliveira gibin (2023). stock_TESLA [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/stock-tesla
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Dec 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    willian oliveira gibin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The “Tesla Stock Price Data (Last One Year)” dataset is a comprehensive collection of historical stock market information, focusing on Tesla Inc. (TSLA) for the past year. This dataset serves as a valuable resource for financial analysts, investors, researchers, and data enthusiasts who are interested in studying the trends, patterns, and performance of Tesla’s stock in the financial markets. It consists of 9 columns referring to date, high and low prices, open and closing value, volume, cumulative open interest and, of course, change of price. At first glance, in order to better understand the data, we should plot the time series of each attribute. The cumulative Open Interest (OI) is the total open contracts that are being held in a particular Future or Call or Put contract on the Exchange. We can see that the biggest drop of the stock happened in January of 2023, and after 5 to 6 months it regained its stock value around the summer of the same year, with opening and closing prices around 300. As a next step we are going to produce some more plots in order to better understand the relation between our target column (price change) and every other attribute. To interpret the results:

    Linear Regression:
    • Mean Absolute Error (MAE): 6.28. This model, on average, predicts the “Price Change” within approximately 6.28 units of the true value.
    • Mean Squared Error (MSE): 52.97. MSE measures the average of squared differences, and this value suggests some variability in prediction errors.
    • Root Mean Squared Error (RMSE): 7.28. RMSE is the square root of MSE and is in the same units as the target variable; an RMSE of 7.28 indicates the typical prediction error.
    • R-squared (R2): 0.0868. R-squared represents the proportion of the variance in the target variable explained by the model. An R2 of 0.0868 suggests that the model explains only a small portion of the variance, indicating limited predictive power.

    Decision Tree Regression:
    • Mean Absolute Error (MAE): 9.21. This model, on average, predicts the “Price Change” within approximately 9.21 units of the true value, which is higher than the Linear Regression model.
    • Mean Squared Error (MSE): 150.69. The MSE is relatively high, indicating larger prediction errors and more variability.
    • Root Mean Squared Error (RMSE): 12.28. An RMSE of 12.28 is notably higher, suggesting that this model has larger prediction errors.
    • R-squared (R2): -1.598. The negative R-squared value indicates that the model performs worse than a horizontal line as a predictor, indicating a poor fit.

    Random Forest Regression:
    • Mean Absolute Error (MAE): 6.99. This model, on average, predicts the “Price Change” within approximately 6.99 units of the true value, similar to Linear Regression.
    • Mean Squared Error (MSE): 62.79. MSE is lower than the Decision Tree model but higher than Linear Regression, suggesting intermediate prediction accuracy.
    • Root Mean Squared Error (RMSE): 7.92. RMSE is also intermediate, indicating moderate prediction errors.
    • R-squared (R2): -0.0824. The negative R-squared suggests that the Random Forest model does not perform well and has limited predictive power.
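    Metrics of this kind can be reproduced with a few lines of scikit-learn; the sketch below uses synthetic stand-in data (not the actual TSLA file), so the numbers it prints will differ from those reported above.

```python
# Sketch: fit three regressors and report MAE, MSE, RMSE and R-squared for each.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                        # stand-in predictor columns
y = 3 * X[:, 0] + rng.normal(scale=5, size=500)      # stand-in "price change" target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mse = mean_squared_error(y_te, pred)
    print(f"{name}: MAE={mean_absolute_error(y_te, pred):.2f} "
          f"MSE={mse:.2f} RMSE={np.sqrt(mse):.2f} R2={r2_score(y_te, pred):.3f}")
```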

  13. VSRR Provisional County-Level Drug Overdose Death Counts

    • catalog.data.gov
    • healthdata.gov
    • +3 more
    Updated Feb 3, 2025
    + more versions
    Cite
    Centers for Disease Control and Prevention (2025). VSRR Provisional County-Level Drug Overdose Death Counts [Dataset]. https://catalog.data.gov/dataset/vsrr-provisional-county-level-drug-overdose-death-counts-d154f
    Explore at:
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    This data visualization presents county-level provisional counts for drug overdose deaths based on a current flow of mortality data in the National Vital Statistics System. County-level provisional counts include deaths occurring within the 50 states and the District of Columbia, as of the date specified, and may not include all deaths that occurred during a given time period. Provisional counts are often incomplete, and causes of death may be pending investigation, resulting in an underestimate relative to final counts (see Technical Notes). The provisional data presented on the dashboard below include reported 12 month-ending provisional counts of death due to drug overdose by the decedent’s county of residence and the month in which death occurred. Percentages of deaths with a cause of death pending further investigation and a note on historical completeness (e.g. if the percent completeness was under 90% after 6 months) are included to aid in interpretation of provisional data, as these measures are related to the accuracy of provisional counts (see Technical Notes). Counts between 1 and 9 are suppressed in accordance with NCHS confidentiality standards. Provisional data presented on this page will be updated on a quarterly basis as additional records are received.

    Technical Notes

    Nature and Sources of Data. Provisional drug overdose death counts are based on death records received and processed by the National Center for Health Statistics (NCHS) as of a specified cutoff date. The cutoff date is generally the first Sunday of each month. National provisional estimates include deaths occurring within the 50 states and the District of Columbia. NCHS receives the death records from the state vital registration offices through the Vital Statistics Cooperative Program (VSCP). The timeliness of provisional mortality surveillance data in the National Vital Statistics System (NVSS) database varies by cause of death and jurisdiction in which the death occurred. The lag time (i.e., the time between when the death occurred and when the data are available for analysis) is longer for drug overdose deaths compared with other causes of death due to the time often needed to investigate these deaths (1). Thus, provisional estimates of drug overdose deaths are reported 6 months after the date of death. Provisional death counts presented in this data visualization are for “12 month-ending periods,” defined as the number of deaths occurring in the 12 month period ending in the month indicated. For example, the 12 month-ending period in June 2020 would include deaths occurring from July 1, 2019 through June 30, 2020. The 12 month-ending period counts include all seasons of the year and are insensitive to reporting variations by seasonality. These provisional counts of drug overdose deaths and related data quality metrics are provided for public health surveillance and monitoring of emerging trends. Provisional drug overdose death data are often incomplete, and the degree of completeness varies by jurisdiction and 12 month-ending period. Consequently, the numbers of drug overdose deaths are underestimated based on provisional data relative to final data and are subject to random variation.

    Cause of Death Classification and Definition of Drug Deaths. Mortality statistics are compiled in accordance with the World Health Organization's (WHO) regulations specifying that WHO member nations classify and code causes of death with the current revision of the International Statistical Classification of Diseases and Related Health Problems (ICD). ICD provides the basic guidance used in virtually all countries to code and classify causes of death. It provides not only disease, injury, and poisoning categories but also the rules used to select the single underlying cause of death for tabulation from the several diagnoses that may be reported on a single death certificate, as well as definitions, tabulation lists, the format of the death certificate, and regul
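    A minimal pandas sketch of the “12 month-ending period” convention described above: each month's value is the sum of that month's count and the preceding 11 months. The column names and counts are hypothetical, not the NCHS file layout.

```python
# Sketch: compute 12 month-ending counts from monthly provisional counts.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
monthly = pd.DataFrame({
    "month": pd.period_range("2019-01", "2020-12", freq="M"),
    "overdose_deaths": rng.integers(50, 120, size=24),      # hypothetical monthly counts
})

# Rolling 12-month sum: the value labelled June 2020 covers July 2019 through June 2020
monthly["deaths_12mo_ending"] = monthly["overdose_deaths"].rolling(12).sum()
print(monthly.tail(6))
```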

  14. Data Sheet 1_Development and evaluation of a mixed reality music...

    • frontiersin.figshare.com
    • figshare.com
    csv
    Updated Mar 19, 2025
    Cite
    Matthias Erdmann; Markus von Berg; Jochen Steffens (2025). Data Sheet 1_Development and evaluation of a mixed reality music visualization for a live performance based on music information retrieval.csv [Dataset]. http://doi.org/10.3389/frvir.2025.1552321.s003
    Explore at:
    Available download formats: csv
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    Frontiers
    Authors
    Matthias Erdmann; Markus von Berg; Jochen Steffens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The present study explores the development and evaluation of a mixed reality music visualization for a live music performance. Real-time audio analysis and crossmodal correspondences were used as design guidelines for creating the visualization, which was presented through a head-mounted display. To assess the impact of the music visualization on the audience’s aesthetic experience, a baseline visualization was designed, featuring the same visual elements but with random changes of color and movement. The audience’s aesthetic experience of the two conditions (i.e., listening to the same song with different visualizations) was assessed using the Aesthetic Emotions Scale (AESTHEMOS) questionnaire. Additionally, participants answered questions regarding the perceived audiovisual congruence of the stimuli and questionnaires about individual musicality and aesthetic receptivity. The results show that the visualization controlled by real-time audio analysis was associated with a slightly enhanced aesthetic experience of the audiovisual composition compared to the randomized visualization, thereby supporting similar findings reported in the literature. Furthermore, the tested personal characteristics of the participants did not significantly affect aesthetic experience. Significant correlations between these characteristics and the aesthetic experience were observed only when the ratings were averaged across conditions. An open interview provided deeper insights into the participants’ overall experiences of the live music performance. The results of the study offer insights into the development of real-time music visualization in mixed reality, examine how the specific audiovisual stimuli employed influence the aesthetic experience, and provide potential technical guidelines for creating new concert formats.

  15. Data from: Distances and their visualization in studies of spatial-temporal...

    • dataone.org
    • data.niaid.nih.gov
    • +2 more
    Updated Jan 17, 2024
    Cite
    Arthur Georges (2024). Distances and their visualization in studies of spatial-temporal genetic variation using single nucleotide polymorphisms (SNPs) [Dataset]. http://doi.org/10.5061/dryad.4b8gthtkn
    Explore at:
    Dataset updated
    Jan 17, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Arthur Georges
    Time period covered
    Jan 1, 2023
    Description

    Distance measures are widely used for examining genetic structure in datasets that comprise many individuals scored for a very large number of attributes. Genotype datasets composed of single nucleotide polymorphisms (SNPs) typically contain bi-allelic scores for tens of thousands if not hundreds of thousands of loci. We examine the application of distance measures to SNP genotypes and sequence tag presence-absences (SilicoDArT) and use real datasets and simulated data to illustrate pitfalls in the application of genetic distances and their visualization. The datasets used to illustrate points in the associated review are provided here together with the R script used to analyse the data. Data are either simulated internally by this script or are SNP data generated as part of other studies and included as compressed binary files readily accessible by reading into R using the R base function readRDS(). Refer to the analysis script for examples.

    A dataset was constructed from a SNP matrix generated for the freshwater turtles in the genus Emydura, a recent radiation of Chelidae in Australasia. The dataset (SNP_starting_data.Rdata) includes selected populations that vary in level of divergence to encompass variation within species and variation between closely related species. Sampling localities with evidence of admixture between species were removed. Monomorphic loci were removed, and the data were filtered on call rate (>95%), repeatability (>99.5%) and read depth (5x < read depth < 50x). Where there was more than one SNP per sequence tag, only one was retained at random. The resultant dataset had 18,196 SNP loci scored for 381 individuals from 7 sampling localities or populations – Emydura victoriae [Ord River, NT, n=15], E. tanybaraga [Holroyd River, Qld, n=10], E. subglobosa worrelli [Daly River, NT, n=25], E. subglobosa subglobosa [Fly River, PNG, n=55], E. macquarii macquarii [Murray Darling Basin north, NSW/...

    This Dryad entry contains the datafiles and associated R script to generate the analyses presented in the companion review article. They include SNP datasets for Australian Turtles and the Australian Blue Mountains Skink, and the associated SilicoDArT data (null alleles matrix) for the turtles. They are for illustration purposes, and have been modified to meet the requirements of the analysis being presented.

    Description of the data and file structure

    The turtle SNP data comprises a matrix of entities (individuals) versus attributes (loci) taking on the states 0 for homozygous reference allele, 2 for homozygous alternate allele and 1 for the heterozygous state. The data are stored in compressed form as an adegenet genlight object with associated locus metadata (e.g. callrate, reproducibility) and individual metadata (e.g. latitude, longitud...

  16. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Nov 21, 2024
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing
    Only a small percentage of this newly created data is kept though, as just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
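    A quick arithmetic check of the storage-capacity figures quoted above (a restatement of the stated CAGR, not Statista's own projection): compounding the 2020 installed base at 19.2 percent per year for five years implies roughly 16 zettabytes by 2025.

```python
# Compound-annual-growth-rate arithmetic for the quoted storage-capacity figures.
base_2020_zb = 6.7      # installed storage capacity in 2020, zettabytes
cagr = 0.192            # stated compound annual growth rate
print(round(base_2020_zb * (1 + cagr) ** 5, 1))   # -> 16.1 (implied 2025 capacity, ZB)
```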

  17. Dataset for scientific paper "Simulated plant‑mediated oxygen input has...

    • search.dataone.org
    • osti.gov
    Updated Jun 7, 2024
    Cite
    Yongli Zhou; Teri O'Meara; Zoe G. Cardon; Jiaze Wang; Benjamin N. Sulman; Anne E. Giblin; Inke Forbrich (2024). Dataset for scientific paper "Simulated plant‑mediated oxygen input has strong impacts on fine‑scale porewater biogeochemistry and weak impacts on integrated methane fluxes in coastal wetlands", a modeling study based on field observation at the tidal salt marshes of the Parker River Estuary, Massachusetts, United States [Dataset]. http://doi.org/10.15485/2370458
    Explore at:
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    ESS-DIVE
    Authors
    Yongli Zhou; Teri O'Meara; Zoe G. Cardon; Jiaze Wang; Benjamin N. Sulman; Anne E. Giblin; Inke Forbrich
    Area covered
    Description

    This dataset is the raw and processed data for the paper "Simulated plant-mediated oxygen input has strong impacts on fine-scale porewater biogeochemistry and weak impacts on integrated methane fluxes in coastal wetlands". This study investigated, by model simulation, how plant-mediated oxygen input affects subsurface biogeochemical reactions of organic carbon degradation and the resulting methane emissions of coastal wetlands. We used the subsurface geochemical simulator PFLOTRAN for the modeling, which produced the simulated changes in porewater chemical substances and methane emissions over 10 days under different scenarios of plant-mediated oxygen input. Specifically, this dataset contains:
    1) the input files for PFLOTRAN for all simulation runs conducted in this study. These files have the extension ".in" and contain the biogeochemical reaction network (stoichiometry, reaction rates, Monod constants, etc.), the fluid flow rate and the oxygen concentration in the fluid which together simulate the plant-mediated oxygen input, the configuration of artificial reactions that simulate the methane fluxes, etc. The PFLOTRAN input files are text files, which can be opened with NotePad, but running them requires a proper installation of PFLOTRAN (instructions: https://documentation.pflotran.org/user_guide/how_to/installation/installation.html);
    2) the raw and processed model output from PFLOTRAN for all simulation runs, stored in .spydata format, which can be viewed with Python; and
    3) the Python scripts used to process the raw model output, including random allocation of root cells, converting raw data into organized formats, calculating the methane fluxes based on the model output, data visualization, etc.
    This modeling work, in particular the model parameterization of root density and the initial conditions of porewater concentrations of biogeochemical substances, was based on field measurements at the salt marsh of the Upper Parker River Estuary, Massachusetts, United States.

  18. DATS 6401 - Final Project - Yon ho Cheong.zip

    • figshare.com
    zip
    Updated Dec 15, 2018
    Cite
    Yon ho Cheong (2018). DATS 6401 - Final Project - Yon ho Cheong.zip [Dataset]. http://doi.org/10.6084/m9.figshare.7471007.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 15, 2018
    Dataset provided by
    figshare
    Authors
    Yon ho Cheong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract
    The H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process; the U.S. employer must submit a petition for an H1B visa to the US immigration department. This is the most common visa status applied for by international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, foreign applicants and employers' trends for H1B visa applications. Locations, employers, job titles and salary ranges make up most of the H1B petitions, so different visualization tools are used to analyze and interpret the trends of the H1B visa and provide a recommendation to the applicant. This report is the basis of the project for the Visualization of Complex Data class at the George Washington University; some examples in this project analyze the relevant variables (Case Status, Employer Name, SOC Name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) in order to see the H1B visa changes over the past several decades.

    Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js

    Dataset
    The dataset contains 10 columns and covers a total of 3 million records spanning 2011-2016. The relevant columns in the dataset include case status, employer name, SOC name, job title, full time position, prevailing wage, year, worksite, and latitude and longitude information.
    Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
    Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm

    Running the code
    Open Index.html

    Data Processing
    • Do some data preprocessing to transform the raw data into an understandable format.
    • Find and combine any other external datasets to enrich the analysis, such as the FY2017 dataset.
    • To make appropriate visualizations, develop variables and compile them into visualization programs.
    • Draw a geo map and scatter plot to compare the fastest growth in fixed value and in percentages.
    • Extract some aspects and analyze the changes in employers' preferences as well as forecasts for future trends.

    Visualizations
    • Combo chart: shows the overall volume of receipts and the approval rate.
    • Scatter plot: shows the beneficiary country of birth.
    • Geo map: shows all states of H1B petitions filed.
    • Line chart: shows the top 10 states of H1B petitions filed.
    • Pie chart: shows a comparison of education level and occupations for petitions, FY2011 vs FY2017.
    • Tree map: shows the top employers who submit the greatest number of applications.
    • Side-by-side bar chart: shows an overall comparison of Data Scientist and Data Analyst.
    • Highlight table: shows the mean wage of a Data Scientist and Data Analyst with case status certified.
    • Bubble chart: shows the top 10 companies for Data Scientist and Data Analyst.

    Related Research
    • The H-1B Visa Debate, Explained - Harvard Business Review: https://hbr.org/2017/05/the-h-1b-visa-debate-explained
    • Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
    • Key facts about the U.S. H-1B visa program: http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
    • H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
    • H-1B visa - Wikipedia: https://en.wikipedia.org/wiki/H-1B_visa

    Key Findings
    • From the analysis, the government cut down the number of approvals for H1B in 2017.
    • In the past decade, due to the nature of demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India.
    • Technical jobs fill up the majority of the top 10 jobs among foreign workers, such as Computer Systems Analyst and Software Developer.
    • Employers located in metro areas strive to find a foreign workforce who can fill the technical positions in their organizations.
    • States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are prime locations for foreign workers and provide many job opportunities.
    • Top companies such as Infosys, Tata, and IBM India that submit the most H1B visa applications are companies based in India associated with software and IT services.
    • The Data Scientist position has experienced exponential growth in terms of H1B visa applications, and jobs are clustered in the West region with the highest numbers.

    Visualization utilizing programs
    HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
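    As a small illustration of the data-processing step listed above, the sketch below loads the Kaggle H-1B file with pandas and tabulates petitions by worksite state. The filename and column names (WORKSITE) are assumptions based on the description, not verified against the actual file.

```python
# Sketch: count H-1B petitions by worksite state from the Kaggle CSV.
import pandas as pd

df = pd.read_csv("h1b_kaggle.csv")                    # hypothetical local filename
df = df.dropna(subset=["WORKSITE"])                   # column name assumed from the description

# WORKSITE values look like "CITY, STATE"; keep the state part for aggregation
df["STATE"] = df["WORKSITE"].str.split(",").str[-1].str.strip()
petitions_by_state = df.groupby("STATE").size().sort_values(ascending=False)
print(petitions_by_state.head(10))                    # top 10 states by petitions filed
```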

  19. Statistical Test of Distance–Duality Relation with Type Ia Supernovae and...

    • zenodo.org
    • data.niaid.nih.gov
    bin, xz
    Updated Jan 24, 2020
    + more versions
    Cite
    Ma Cong; Ma Cong (2020). Statistical Test of Distance–Duality Relation with Type Ia Supernovae and Baryon Acoustic Oscillations (3rd version) [Dataset]. http://doi.org/10.5281/zenodo.1219473
    Explore at:
    Available download formats: bin, xz
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ma Cong; Ma Cong
    Description

    Summary

    This package contains data and processing tools for replicating the research presented in the paper "Statistical Test of Distance–Duality Relation with Type Ia Supernovae and Baryon Acoustic Oscillations" (2018, ApJ, DOI: 10.3847/1538-4357/aac88f, arXiv:1604.04631).

    The compressed archive file "ddmc-nosample-v3.1.tar.xz" contains only the compressed SNIa data, the BAO measurements, and 3rd-party data files used in this work. The random samples can be re-created by the tools included in the package. This is the file suitable for low-speed download.

    The file "ddmc-v3.1.tar.xz" contains the full set of random sample output files and analysis results in addition to those in the "ddmc-nosample-v3.1.tar.xz" file. This is the archive containing all the data and figure files used directly in the paper.

    To uncompress the files, the XZ Utils software package is required.

    The file "CHECKSUM.asc" is a GPG-clearsigned text file containing the SHA-512 checksum values for file integrity verification. The text file itself is signed with the GPG key 0xE977A6E990102402 available from keyservers.

    Please read the README files in each package for more details and instructions.

    Release notes for version 3.1

    Version 3.1 is a minor revision with the addition of some alternative input parameter distributions.

    Release notes for version 3

    This is the 3rd version, representing a re-written analysis of the distance-duality test. This new version updated and renamed the complementary parameter (CP) sets to match the ones used in the paper. New results concerning the interpretation of results as a diagnostic of distance-measurement systematics are presented. Also included are updated utility scripts, new tests for Gaussian approximation to the results, and new data-visualization scripts.

    Earlier versions

    Earlier versions are available from Zenodo. Links: v1, v2.

  20. Data from: Visualization of early events in acetic acid denaturation of...

    • omicsdi.org
    Updated Jan 1, 2013
    + more versions
    Cite
    (2013). Visualization of early events in acetic acid denaturation of HIV-1 protease: a molecular dynamics study. [Dataset]. https://www.omicsdi.org/dataset/biostudies/S-EPMC3126794
    Explore at:
    Dataset updated
    Jan 1, 2013
    Variables measured
    Unknown
    Description

    Protein denaturation plays a crucial role in cellular processes. In this study, denaturation of HIV-1 Protease (PR) was investigated by all-atom MD simulations in explicit solvent. The PR dimer and monomer were simulated separately in 9 M acetic acid (9 M AcOH) solution and water to study the denaturation process of PR in acetic acid environment. Direct visualization of the denaturation dynamics that is readily available from such simulations has been presented. Our simulations in 9 M AcOH reveal that the PR denaturation begins by separation of dimer into intact monomers and it is only after this separation that the monomer units start denaturing. The denaturation of the monomers is flagged off by the loss of crucial interactions between the α-helix at C-terminal and surrounding β-strands. This causes the structure to transit from the equilibrium dynamics to random non-equilibrating dynamics. Residence time calculations indicate that denaturation occurs via direct interaction of the acetic acid molecules with certain regions of the protein in 9 M AcOH. All these observations have helped to decipher a picture of the early events in acetic acid denaturation of PR and have illustrated that the α-helix and the β-sheet at the C-terminus of a native and functional PR dimer should maintain both the stability and the function of the enzyme and thus present newer targets for blocking PR function.
