96 datasets found
  1. Stroke_Analysis

    • data.mendeley.com
    Updated Dec 2, 2020
    Cite
    Vamsi Bandi (2020). Stroke_Analysis [Dataset]. http://doi.org/10.17632/jpb5tds9f6.1
    Explore at:
    Dataset updated
    Dec 2, 2020
    Authors
    Vamsi Bandi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains the following primary attributes, with their means and standard deviations:

    • Age: mean 47.12, SD 23.69
    • NIHSS: mean 18.12, SD 11.27
    • mRS: mean 3.67, SD 1.87
    • Systolic blood pressure: mean 153.09, SD 24.92
    • Diastolic blood pressure: mean 103.65, SD 18.34
    • Glucose: mean 225.85, SD 56.11
    • Paralysis: mean 1.36, SD 1.106
    • Smoking: mean 0.88, SD 0.9
    • BMI: mean 33.73, SD 6.23
    • Cholesterol: mean 217.53, SD 20.26

  2. Accompanying simulated data for "Go multivariate: a Monte Carlo study of a...

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Mar 25, 2022
    Cite
    Sebastian Mildiner Moraga; Emmeke Aarts (2022). Accompanying simulated data for "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity" [Dataset]. http://doi.org/10.5281/zenodo.6384007
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 25, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sebastian Mildiner Moraga; Emmeke Aarts
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability in a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally reduces the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, and (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript are not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.

  3. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1)...

    • catalog.data.gov
    • gimi9.com
    Updated Feb 4, 2025
    + more versions
    Cite
    U.S. Environmental Protection Agency, Office of Research and Development (ORD), Center for Public Health and Environmental Assessment (CPHEA), Pacific Ecological Systems Division (PESD), (2025). The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: Surficial Lithology in Watershed [Dataset]. https://catalog.data.gov/dataset/the-streamcat-dataset-accumulated-attributes-for-nhdplusv2-version-2-1-catchments-for-the--5783e
    Explore at:
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Contiguous United States, United States
    Description

    This dataset represents the density of 18 USGS lithology classes within individual, local NHDPlusV2 catchments and upstream, contributing watersheds (see Data Sources for links to NHDPlusV2 data and USGS). Attributes were calculated for every local NHDPlusV2 catchment and then accumulated to provide watershed-level metrics for USGS lithology data. This data set is derived from the USGS raster map of 18 lithology classes (categorical data type) for the conterminous USA. The map was produced based on texture, internal structure, thickness, and environment of deposition or formation of materials. These 18 lithology classes were summarized by local catchment and by watershed to produce 18 local catchment-level and watershed-level metrics as a categorical data type.

  4. Summary of variables of the data set included in the analysis.

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Owen Bodger; Aidan Byrne; Philip A. Evans; Sarah Rees; Gwen Jones; Claire Cowell; Mike B. Gravenor; Rhys Williams (2023). Summary of variables of the data set included in the analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0027161.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Owen Bodger; Aidan Byrne; Philip A. Evans; Sarah Rees; Gwen Jones; Claire Cowell; Mike B. Gravenor; Rhys Williams
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Footnote: (f) denotes a categorical variable, (c) a continuous covariate and (n) a nominal variable.

  5. Replication Data for: Nursery Data Set

    • dataverse.harvard.edu
    Updated Apr 5, 2018
    Cite
    Wenjuan Wang (2018). Replication Data for: Nursery Data Set [Dataset]. http://doi.org/10.7910/DVN/MBFQK0
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 5, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Wenjuan Wang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset was downloaded from the UCI repository: https://archive.ics.uci.edu/ml/datasets/nursery. The dataset contains categorical data used to rank nursery school applicants. The original dataset contains 5 classes; these were reorganized so that only two classes remain ("recommended" or "not recommended"), as in the sketch below.
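
    A minimal sketch of that binarization in Python/pandas, assuming the raw UCI file and its documented five class labels; the publisher's exact grouping is not stated in this record, so the mapping below is one plausible reading:

        import pandas as pd

        # Column names from the UCI "nursery" documentation (the raw file has no header).
        cols = ["parents", "has_nurs", "form", "children", "housing",
                "finance", "social", "health", "class"]
        df = pd.read_csv("nursery.data", header=None, names=cols)

        # One plausible binarization: keep "not_recom" as the negative class and
        # collapse the remaining labels ("recommend", "very_recom", "priority",
        # "spec_prior") into "recommended".
        df["class"] = df["class"].map(
            lambda c: "not recommended" if c == "not_recom" else "recommended"
        )
        print(df["class"].value_counts())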

  6. Data from: car sales

    • kaggle.com
    zip
    Updated Oct 30, 2023
    Cite
    sridhar jakkaraju (2023). car sales [Dataset]. https://www.kaggle.com/datasets/sridharjakkaraju/car-sales/code
    Explore at:
    Available download formats: zip (120379 bytes)
    Dataset updated
    Oct 30, 2023
    Authors
    sridhar jakkaraju
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset was created by sridhar jakkaraju and released under CC0: Public Domain.

  7. Bridging the Gap in Hypertension Management: Evaluating Blood Pressure...

    • data.mendeley.com
    Updated Jan 15, 2025
    + more versions
    Cite
    abu sufian (2025). Bridging the Gap in Hypertension Management: Evaluating Blood Pressure Control and Associated Risk Factors in a Resource-Constrained Setting [Dataset]. http://doi.org/10.17632/56jyjndvcr.1
    Explore at:
    Dataset updated
    Jan 15, 2025
    Authors
    abu sufian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Description

    This dataset contains a simulated collection of 100,000 patient records designed to explore hypertension management in resource-constrained settings. It provides comprehensive data for analyzing blood pressure control rates, associated risk factors, and complications. The dataset is suited to predictive modelling, risk analysis, and treatment optimization, offering insights into demographic, clinical, and treatment-related variables.

    Dataset Structure

    1. Dataset Volume

      • Size: 10,000 records.
      • Features: 19 variables, categorized into Sociodemographic, Clinical, Complications, and Treatment/Control groups.

    2. Variables and Categories

    A. Sociodemographic Variables

    1. Age:
    •  Continuous variable in years.
    •  Range: 18–80 years.
    •  Mean ± SD: 49.37 ± 12.81.
    2. Sex:
    •  Categorical variable.
    •  Values: Male, Female.
    3. Education:
    •  Categorical variable.
    •  Values: No Education, Primary, Secondary, Higher Secondary, Graduate, Post-Graduate, Madrasa.
    4. Occupation:
    •  Categorical variable.
    •  Values: Service, Business, Agriculture, Retired, Unemployed, Housewife.
    5. Monthly Income:
    •  Categorical variable in Bangladeshi Taka.
    •  Values: <5000, 5001–10000, 10001–15000, >15000.
    6. Residence:
    •  Categorical variable.
    •  Values: Urban, Sub-urban, Rural.
    

    B. Clinical Variables

    7. Systolic BP:
    •  Continuous variable in mmHg.
    •  Range: 100–200 mmHg.
    •  Mean ± SD: 140 ± 15 mmHg.
    8. Diastolic BP:
    •  Continuous variable in mmHg.
    •  Range: 60–120 mmHg.
    •  Mean ± SD: 90 ± 10 mmHg.
    9. Elevated Creatinine:
    •  Binary variable (≥ 1.4 mg/dL).
    •  Values: Yes, No.
    10. Diabetes Mellitus:
    •  Binary variable.
    •  Values: Yes, No.
    11. Family History of CVD:
    •  Binary variable.
    •  Values: Yes, No.
    12. Elevated Cholesterol:
    •  Binary variable (≥ 200 mg/dL).
    •  Values: Yes, No.
    13. Smoking:
    •  Binary variable.
    •  Values: Yes, No.
    

    C. Complications

    14. LVH (Left Ventricular Hypertrophy):
    •  Binary variable (ECG diagnosis).
    •  Values: Yes, No.
    15. IHD (Ischemic Heart Disease):
    •  Binary variable.
    •  Values: Yes, No.
    16. CVD (Cerebrovascular Disease):
    •  Binary variable.
    •  Values: Yes, No.
    17. Retinopathy:
    •  Binary variable.
    •  Values: Yes, No.
    

    D. Treatment and Control

    18. Treatment:
    •  Categorical variable indicating therapy type.
    •  Values: Single Drug, Combination Drugs.
    19. Control Status:
    •  Binary variable.
    •  Values: Controlled, Uncontrolled.
    

    Dataset Applications

    1. Predictive Modeling:
    •  Develop models to predict blood pressure control status using demographic and clinical data (see the sketch after this list).
    2. Risk Analysis:
    •  Identify significant factors influencing hypertension control and complications.
    3. Severity Scoring:
    •  Quantify hypertension severity for patient risk stratification.
    4. Complications Prediction:
    •  Forecast complications like IHD, LVH, and CVD for early intervention.
    5. Treatment Guidance:
    •  Analyze therapy efficacy to recommend optimal treatment strategies.
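
    As a minimal sketch of the first application above, the snippet below fits a logistic regression for control status with scikit-learn. The file name and column labels are assumptions based on the variable list in this record, not the actual CSV headers:

        import pandas as pd
        from sklearn.compose import ColumnTransformer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import OneHotEncoder, StandardScaler

        df = pd.read_csv("hypertension.csv")  # hypothetical file name

        num_cols = ["Age", "Systolic BP", "Diastolic BP"]          # assumed headers
        cat_cols = ["Sex", "Education", "Occupation", "Monthly Income",
                    "Residence", "Diabetes Mellitus", "Smoking", "Treatment"]

        X = df[num_cols + cat_cols]
        y = (df["Control Status"] == "Controlled").astype(int)

        # Scale continuous variables, one-hot-encode categorical ones.
        pre = ColumnTransformer([
            ("num", StandardScaler(), num_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ])
        model = make_pipeline(pre, LogisticRegression(max_iter=1000))

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        model.fit(X_tr, y_tr)
        print("held-out accuracy:", model.score(X_te, y_te))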
    
  8. Black Friday Sales EDA

    • kaggle.com
    Updated Oct 29, 2022
    + more versions
    Cite
    Rushikesh Konapure (2022). Black Friday Sales EDA [Dataset]. https://www.kaggle.com/datasets/rishikeshkonapure/black-friday-sales-eda
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 29, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rushikesh Konapure
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset History

    A retail company "ABC Private Limited" wants to understand customer purchase behaviour (specifically, purchase amount) across various products of different categories. They have shared purchase summaries of various customers for selected high-volume products from last month. The data set also contains customer demographics (age, gender, marital status, city type, stay in current city), product details (productid and product category) and the total purchase amount from last month.

    Now, they want to build a model to predict the purchase amount of customers against various products which will help them to create a personalized offer for customers against different products.

    Tasks to perform

    The Purchase column is the target variable; perform univariate analysis and bivariate analysis with respect to Purchase.

    "Masked" in a column description means the values have already been converted from categorical to numerical.

    The points below are just to get you started with the dataset; it is not mandatory to follow the same sequence.

    DATA PREPROCESSING

    • Check the basic statistics of the dataset

    • Check for missing values in the data

    • Check for unique values in data

    • Perform EDA

    • Purchase Distribution

    • Check for outliers

    • Analysis by Gender, Marital Status, occupation, occupation vs purchase, purchase by city, purchase by age group, etc

    • Drop unnecessary fields

    • Convert categorical data into integers using the map function (e.g. the 'Gender' column; see the sketch after this list)

    • Missing value treatment

    • Rename columns

    • Fill nan values

    • Map range variables into integers (e.g. the 'Age' column)
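
    A minimal sketch of the two map() conversions above in pandas; the file name is hypothetical, and the value sets follow the standard Black Friday columns:

        import pandas as pd

        df = pd.read_csv("black_friday.csv")  # hypothetical file name

        # Convert a categorical column to integers with map().
        df["Gender"] = df["Gender"].map({"F": 0, "M": 1})

        # Map the binned 'Age' ranges onto ordinal integers.
        age_order = {"0-17": 0, "18-25": 1, "26-35": 2, "36-45": 3,
                     "46-50": 4, "51-55": 5, "55+": 6}
        df["Age"] = df["Age"].map(age_order)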

    Data Visualisation

    • visualize individual column
    • Age vs Purchased
    • Occupation vs Purchased
    • Productcategory1 vs Purchased
    • Productcategory2 vs Purchased
    • Productcategory3 vs Purchased
    • City category pie chart
    • check for more possible plots

    All the Best!!

  9. Drug consumption database: quantified categorical attributes

    • figshare.le.ac.uk
    • figshare.com
    txt
    Updated May 30, 2023
    + more versions
    Cite
    Elaine Fehrman; Vincent Egan; Evgeny Mirkes (2023). Drug consumption database: quantified categorical attributes [Dataset]. http://doi.org/10.25392/leicester.data.7588409.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    University of Leicester
    Authors
    Elaine Fehrman; Vincent Egan; Evgeny Mirkes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Drug consumption database with quantified categorical attributes. DescriptionDB.pdf contains a detailed description of the database.

  10. Accompanying simulated data for "Go multivariate: recommendations on...

    • explore.openaire.eu
    Updated Mar 25, 2022
    + more versions
    Cite
    Sebastian Mildiner Moraga; Emmeke Aarts (2022). Accompanying simulated data for "Go multivariate: recommendations on multilevel hidden Markov models with categorical data of varying complexity" [Dataset]. http://doi.org/10.5281/zenodo.6385196
    Explore at:
    Dataset updated
    Mar 25, 2022
    Authors
    Sebastian Mildiner Moraga; Emmeke Aarts
    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability in a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally reduces the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: recommendations on multilevel hidden Markov models with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, and (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript are not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.

  11. Prostate Cancer - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 7, 2024
    + more versions
    Cite
    (2024). Prostate Cancer - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/prostate-cancer
    Explore at:
    Dataset updated
    Oct 7, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset of 100 patients for implementing machine learning algorithms and interpreting the results. The data set consists of 100 observations and 10 variables (an ID, 8 numeric variables, and one categorical variable, diagnosis_result), which are as follows: Id, 1. Radius, 2. Texture, 3. Perimeter, 4. Area, 5. Smoothness, 6. Compactness, 7. diagnosis_result, 8. Symmetry, 9. Fractal dimension.

  12. Risky Business: Factor Analysis of Survey Data – Assessing the Probability...

    • plos.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Cees van der Eijk; Jonathan Rose (2023). Risky Business: Factor Analysis of Survey Data – Assessing the Probability of Incorrect Dimensionalisation [Dataset]. http://doi.org/10.1371/journal.pone.0118900
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Cees van der Eijk; Jonathan Rose
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper undertakes a systematic assessment of the extent to which factor analysis recovers the correct number of latent dimensions (factors) when applied to ordered-categorical survey items (so-called Likert items). We simulate 2400 data sets of uni-dimensional Likert items that vary systematically over a range of conditions such as the underlying population distribution, the number of items, the level of random error, and characteristics of items and item-sets. Each of these datasets is factor analysed in a variety of ways that are frequently used in the extant literature, or that are recommended in current methodological texts. These include exploratory factor retention heuristics such as Kaiser's criterion, Parallel Analysis and a non-graphical scree test, and (for exploratory and confirmatory analyses) evaluations of model fit. These analyses are conducted on the basis of Pearson and polychoric correlations. We find that, irrespective of the particular mode of analysis, factor analysis applied to ordered-categorical survey data very often leads to over-dimensionalisation. The magnitude of this risk depends on the specific way in which factor analysis is conducted, the number of items, the properties of the set of items, and the underlying population distribution. The paper concludes with a discussion of the consequences of over-dimensionalisation, and a brief mention of alternative modes of analysis that are much less prone to such problems.

  13. The LakeCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1)...

    • catalog.data.gov
    • gimi9.com
    Updated Feb 5, 2025
    + more versions
    Cite
    U.S. Environmental Protection Agency, Office of Research and Development (ORD), Center for Public Health and Environmental Assessment (CPHEA), Pacific Ecological Systems Division (PESD), (2025). The LakeCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: National Land Cover Database [Dataset]. https://catalog.data.gov/dataset/the-lakecat-dataset-accumulated-attributes-for-nhdplusv2-version-2-1-catchments-for-the-co-2c040
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Contiguous United States, United States
    Description

    This dataset represents the land cover data within individual local and accumulated upstream catchments for NHDPlusV2 Waterbodies based on the NLCD. Catchment boundaries in LakeCat are defined in one of two ways, on-network or off-network. The on-network catchment boundaries follow the catchments provided in the NHDPlusV2 and the metrics for these lakes mirror metrics from StreamCat, but will substitute the COMID of the NHDWaterbody for that of the NHDFlowline. The off-network catchment framework uses the NHDPlusV2 flow direction rasters to define non-overlapping lake-catchment boundaries and then links them through an off-network flow table. This data set is derived from the NLCD raster composed of 16 land cover classes (categorical data type) for the conterminous USA. Four classes of the NLCD were excluded as they were specific to Alaska land covers. This raster was produced based on a decision-tree classification of 2001, 2004, 2006, 2008, 2011, 2013, 2016, and 2019 Landsat satellite data. This dataset will include additional years as they become available.

  14. Adult dataset preprocessed

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 1, 2024
    Cite
    Adult dataset preprocessed [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12533513
    Explore at:
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    Schuster, Verena
    Pustozerova, Anastasia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The files "adult_train.csv" and "adult_test.csv" contain preprocessed versions of the Adult dataset from the USI repository.

    The file "adult_preprocessing.ipynb" contains a python notebook file with all the preprocessing steps used to generate "adult_train.csv" and "adult_test.csv" from the original Adult dataset.

    The preprocessing steps include (see the sketch after this list):

    One-hot-encoding of categorical values

    Imputation of missing values using knn-imputer with k=1

    Standard scaling of ordinal attributes

    Note: we assume the scenario in which the test set is available before training (every attribute besides the target, "income"); therefore we combine the train and test sets before preprocessing.
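
    A minimal sketch of these steps with pandas and scikit-learn, assuming hypothetical file names, that both files carry the target column, and an assumed subset of ordinal columns; the bundled notebook defines the actual steps:

        import pandas as pd
        from sklearn.impute import KNNImputer
        from sklearn.preprocessing import StandardScaler

        train = pd.read_csv("adult_train_raw.csv")  # hypothetical paths
        test = pd.read_csv("adult_test_raw.csv")

        # Combine train and test before preprocessing, per the note above.
        full = pd.concat([train, test], keys=["train", "test"])
        y = full.pop("income")  # the target is excluded from preprocessing

        # 1. One-hot-encode the categorical columns.
        cat_cols = full.select_dtypes(include="object").columns
        full = pd.get_dummies(full, columns=list(cat_cols))

        # 2. Impute missing values with a kNN imputer, k=1.
        full = pd.DataFrame(KNNImputer(n_neighbors=1).fit_transform(full),
                            index=full.index, columns=full.columns)

        # 3. Standard-scale the ordinal attributes (assumed subset).
        ord_cols = ["age", "education-num", "hours-per-week"]
        full[ord_cols] = StandardScaler().fit_transform(full[ord_cols])

        # Split back into the published train/test files.
        train_out, test_out = full.loc["train"], full.loc["test"]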

  15. BE-KONFORM data set (Group A) (Bedarfsermittlung im Rahmen der Erstellung...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    Brühmann, Boris. A. (2024). BE-KONFORM data set (Group A) (Bedarfsermittlung im Rahmen der Erstellung des Konzepts für das Forschungsdatenmanagement an der Medizinischen Fakultät der Universität Freiburg) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7390789
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Brühmann, Boris. A.
    Knaus, Jochen
    Binder, Harald
    Fichtner, Urs A.
    Horstmeier, Lukas M.
    Area covered
    Freiburg im Breisgau
    Description

    This dataset was collected within the BE-KONFORM study investigating employees' needs regarding research data management at the Medical Faculty of the University of Freiburg. The full dataset captures 236 complete cases. The study included a randomized module allocating subjects to one of two groups: group A received the information that the data would be published (n=113) and group B did not. Due to data protection law, only data from group A subjects who gave written informed consent (n=112) could be published here. This dataset had to be prepared for publication in order to avoid de-anonymisation of subjects. Therefore, the variable [anzahl_m] was recoded into a categorical variable, and open text answers were changed where a combination of variables might have led to identification of the subjects. Changes are marked with brackets: [changed text]. Information on survey mode and sampling is provided in the data note.

  16. Data from: A TripAdvisor Dataset for Dyadic Context Analysis

    • portalinvestigacion.udc.gal
    • data.niaid.nih.gov
    • +1more
    Updated 2022
    Cite
    López-Riobóo Botana, Iñigo Luis; Alonso-Betanzos, Amparo; Bolón-Canedo, Verónica; Guijarro-Berdiñas, Bertha (2022). A TripAdvisor Dataset for Dyadic Context Analysis [Dataset]. https://portalinvestigacion.udc.gal/documentos/668fc448b9e7c03b01bd8b43
    Explore at:
    Dataset updated
    2022
    Authors
    López-Riobóo Botana, Iñigo Luis; Alonso-Betanzos, Amparo; Bolón-Canedo, Verónica; Guijarro-Berdiñas, Bertha
    Description

    There are many contexts where dyadic data are present. In social networks, users are linked to a variety of items, defining interactions. On the social platform of TripAdvisor, users are linked to restaurants by means of the reviews they post. Using the information in these interactions, we can get valuable insights for forecasting and propose tasks related to recommender systems, sentiment analysis, text-based personalisation or text summarisation, among others. Furthermore, in the context of TripAdvisor there is a scarcity of public datasets and a lack of well-known benchmarks for model assessment. We present six new TripAdvisor datasets from the restaurants of six different cities: London, New York, New Delhi, Paris, Barcelona and Madrid. If you use this data, please cite the following paper under submission process (preprint - arXiv). We exclusively collected the reviews written in English from the restaurants of each city.

    The tabular data is comprised of a set of six different CSV files, containing numerical, categorical and text features:

    • parse_count: numerical (integer), sequence number of the review extracted by the web scraper (auto-incremental)
    • author_id: categorical (string), univocal, incremental and anonymous identifier of the user (UID_XXXXXXXXXX)
    • restaurant_name: categorical (string), name of the restaurant matching the review
    • rating_review: numerical (integer), review score in the range 1-5
    • sample: categorical (string), indicating "positive" sample for scores 4-5 and "negative" for scores 1-3
    • review_id: categorical (string), univocal and internal identifier of the review (review_XXXXXXXXX)
    • title_review: text, review title
    • review_preview: text, preview of the review, truncated on the website when the text is very long
    • review_full: text, complete review
    • date: timestamp, publication date of the review in the format (day, month, year)
    • city: categorical (string), city of the restaurant the review was written for
    • url_restaurant: text, restaurant url
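
    A minimal sketch of loading one city file and re-deriving the sample field from rating_review; the file name is hypothetical, and the field names follow the list above:

        import pandas as pd

        reviews = pd.read_csv("tripadvisor_london.csv")  # hypothetical file name

        # 'sample' encodes polarity: "positive" for scores 4-5, "negative" for 1-3.
        derived = reviews["rating_review"].map(
            lambda r: "positive" if r >= 4 else "negative")
        print((derived == reviews["sample"]).all())  # should print True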

  17. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Nafiz Sadman
    Nishat Anjum
    Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works applying Natural Language Processing to newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types; the purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, little NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NNK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, contributing to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We have named this collection "Covid-News-USA-NNK". We also accumulated 50 online newspaper reports from Bangladesh on the issue and named it "Covid-News-BD-NNK". The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferable to automation in ensuring the news was highly relevant to the subject: the newspaper sites had dynamic content with advertisements in no particular order, so automated scrapers ran a high risk of collecting inaccurate news reports. One challenge in collecting the data was the subscription requirement; each newspaper required $1 per subscription. Some criteria for collecting the news reports, provided as guidelines to the human data-collectors, were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.

    To collect these data we used a Google form for the USA and BD. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows (see the sketch below):

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since altering sentence structures could cause a loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for the presence of the above mentioned criteria.
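
    A minimal sketch of these pre-processing steps using NLTK; this is an illustration under stated assumptions, not the authors' script, and the character filter below is approximate:

        import re
        import nltk
        from nltk.corpus import stopwords
        from nltk.stem import WordNetLemmatizer

        nltk.download("stopwords")
        nltk.download("wordnet")

        STOP = set(stopwords.words("english"))
        LEMMA = WordNetLemmatizer()

        def preprocess(text: str) -> str:
            text = re.sub(r"https?://\S+", " ", text)      # remove hyperlinks
            text = re.sub(r"[^A-Za-z\s]", " ", text)       # drop non-English characters
            tokens = [t.lower() for t in text.split()]
            tokens = [t for t in tokens if t not in STOP]  # remove stop words
            return " ".join(LEMMA.lemmatize(t) for t in tokens)  # lemmatize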

    The primary data statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We regularly update the CSV files and regenerate the JSON using a Python script. We provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12] and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms solely for Natural Language Processing. The two most trending ones are BERT [14], which uses a bidirectional encoder architecture to build a transformer model that can do near-perfect classification tasks and next-word prediction, and the GPT-3 models released by OpenAI [15] that can generate almost human-like text. However, these are all pre-trained models, since they carry a huge computation cost. Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic model is Latent Dirichlet Allocation, or LDA [17].
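
    As an illustration of the LDA approach described above (a sketch with a toy corpus, not the study's code), scikit-learn's implementation clusters words into topic groups like so:

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        docs = [  # toy stand-ins for preprocessed news bodies
            "virus outbreak china spread masks",
            "economy market stocks fall amid virus fears",
            "election campaign president vote november",
            "hospital patients doctors masks virus cases",
        ]

        vec = CountVectorizer(stop_words="english")
        X = vec.fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

        # The top-weighted words per topic give a quick idea of each cluster.
        terms = vec.get_feature_names_out()
        for k, weights in enumerate(lda.components_):
            top = [terms[i] for i in weights.argsort()[-5:][::-1]]
            print(f"topic {k}: {', '.join(top)}")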

    Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and picks the words with the most weight.

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month, from February to May. From Figures 1, 2, and 3, we can note the following:

    In February, both newspapers talked about China and the source of the outbreak.

    StarTribune emphasized Minnesota as the most concerned state, and in April appeared even more concerned.

    Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, markets.

    Washington Post discussed global issues more than StarTribune.

    StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States is more heavily weighted from March through May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords like 'Deaths', 'Infected', 'Died', 'Infections', 'Quarantined', 'Lock-down', 'Diagnosed', etc. from the news reports and built a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, rising gradually from February. Both newspapers clearly show that the rise in cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used VADER sentiment analysis to extract the sentiment of the headlines and the bodies (see the sketch below). On average, the sentiments were from -0.5 to -0.9. The VADER sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can assist us in sorting the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and its serious impact. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: 'China', 'Government', 'Masks', 'Economy', 'Crisis', 'Theft', 'Stock market', 'Jobs', 'Election', 'Missteps', 'Health', 'Response'. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
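
    A minimal sketch of the word-cloud and VADER steps referenced above, using the wordcloud and vaderSentiment packages; the text inputs and output file name are placeholders:

        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        from wordcloud import WordCloud

        analyzer = SentimentIntensityAnalyzer()
        headline = "Virus outbreak deepens economic crisis"  # placeholder headline
        print(analyzer.polarity_scores(headline)["compound"])  # compound score in [-1, 1]

        # Word cloud of a month of (preprocessed) news bodies, as in Figures 1-3.
        wc = WordCloud(width=800, height=400).generate("virus china outbreak economy masks")
        wc.to_file("february_usa.png")  # placeholder output file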

  18. Data from: Las Vegas Strip

    • data.mendeley.com
    Updated Jul 29, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sérgio Moro (2017). Las Vegas Strip [Dataset]. http://doi.org/10.17632/tsf9sjdwh2.1
    Explore at:
    Dataset updated
    Jul 29, 2017
    Authors
    Sérgio Moro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Las Vegas Strip, Las Vegas
    Description

    This dataset includes quantitative and categorical features from online reviews of 21 hotels located on the Las Vegas Strip, extracted from TripAdvisor (http://www.tripadvisor.com). All 504 reviews were collected between January and August of 2016. The dataset contains 504 records and 20 tuned features (those with "status = included" in Table 1 of the article mentioned below), 24 records per hotel (two per month, randomly selected), regarding the year 2015.

  19. Enriched Data of Wind Farms (EDWin)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 23, 2023
    Cite
    Haller, Marina (2023). Enriched Data of Wind Farms (EDWin) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7558884
    Explore at:
    Dataset updated
    Jan 23, 2023
    Dataset authored and provided by
    Haller, Marina
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EDWin (Enriched Data of Wind Farms) is a dataset developed to provide information about global wind farms. The dataset is based on OpenStreetMap (OSM) data and has been enriched with additional variables obtained from various databases. The dataset includes two separate data sets, one for global turbines and one for wind farms. As of September 2022, this dataset contains the most recent information available.

    The datasets have the following structures:

    Wind Turbine data

    The data for wind turbines includes 359,947 entries and 12 columns.

    • id: Key value of the data point
    • lon: Longitude of the location
    • lat: Latitude of the location
    • country: Country where the turbine is located
    • continent: Continent where the turbine is located
    • land cover: The type of land on which the turbine is located
    • landform: The physical features of the land on which the turbine is located
    • elevation: The altitude of the turbine
    • turbine spacing: The distance between turbines in the wind farm
    • WFid: Wind Farm ID
    • number of turbines: The number of turbines in the wind farm
    • shape: The rough shape of the wind farm

    Wind Farm data

    The data for wind farms includes 20,608 entries and 11 columns.

    • WFid: Wind Farm ID
    • lon: Longitude of the location (center of the wind farm)
    • lat: Latitude of the location (center of the wind farm)
    • country: Country where the wind farm is located
    • continent: Continent where the wind farm is located
    • land cover: The modal value of the land cover for the turbines in the wind farm
    • landform: The average value of the landform for the turbines in the wind farm
    • elevation: The average elevation of the turbines in the wind farm
    • turbine spacing: The average turbine spacing for the turbines in the wind farm
    • number of turbines: The number of turbines in the wind farm
    • shape: The rough shape of the wind farm

    Note that the data for "Country", "Continent", "Land Cover", "Landform", "Elevation" and "Turbine spacing" were collected per turbine and later added to the wind farm dataset in aggregated form: for categorical variables, the mode of the respective turbine values was taken, and for numerical variables, the average was calculated (see the sketch below). The two variables number of turbines (i.e. wind farm size) and wind farm shape (i.e. a rough shape of the wind farm) were obtained from the wind farm data and added to the turbine dataset.
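
    A minimal pandas sketch of that aggregation, assuming a hypothetical CSV export of the turbine table above:

        import pandas as pd

        turbines = pd.read_csv("edwin_turbines.csv")  # hypothetical file name

        # Aggregate turbine attributes to the wind-farm level, keyed by WFid:
        # mode for categorical variables, mean for numerical ones.
        farms = turbines.groupby("WFid").agg(
            country=("country", lambda s: s.mode().iat[0]),
            continent=("continent", lambda s: s.mode().iat[0]),
            land_cover=("land cover", lambda s: s.mode().iat[0]),
            elevation=("elevation", "mean"),
            turbine_spacing=("turbine spacing", "mean"),
            number_of_turbines=("id", "size"),
        )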

    Sources

    [1] OpenStreetMap. https://openstreetmap.org/. [Online] Accessed: 2022-10-02.

    [2] Cutler J. Cleveland, Christopher Morris, Dictionary of Energy (Second Edition), Elsevier, 2015, Pages 638-655, ISBN 9780080968117. https://doi.org/10.1016/B978-0-08-096811-7.50023-8.

    [4] Dunnett, S., Sorichetta, A., Taylor, G. et al. Harmonised global datasets of wind and solar farm locations and power. Sci Data 7, 130 (2020). https://doi.org/10.1038/s41597-020-0469-8

    [5] Buchhorn, M.; Lesiv, M.; Tsendbazar, N.-E.; Herold, M.; Bertels, L.; Smets, B. Copernicus Global Land Cover Layers - Collection 2. Remote Sensing 2020, 12, 1044. doi:10.3390/rs12061044

    [6] Theobald, D. M., Harrison-Atlas, D., Monahan, W. B., & Albano, C. M. (2015). Ecologically-relevant maps of landforms and physiographic diversity for climate adaptation planning. PloS one, 10(12), e0143619.

    [7] Global Multi-resolution Terrain Elevation Data 2010, courtesy of the U.S. Geological Survey.

  20. Data from: Login Data Set for Risk-Based Authentication

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jun 30, 2022
    Cite
    Stephan Wiefling; Paul René Jørgensen; Sigurd Thunem; Luigi Lo Iacono (2022). Login Data Set for Risk-Based Authentication [Dataset]. http://doi.org/10.5281/zenodo.6782156
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 30, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stephan Wiefling; Paul René Jørgensen; Sigurd Thunem; Luigi Lo Iacono
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Login Data Set for Risk-Based Authentication

    Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.

    This data set aims to foster research and development of Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.

    The users used this SSO to access sensitive data provided by the online service, e.g., cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce the results obtained on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.

    WARNING: The feature values are plausible, but still totally artificial. Therefore, you should NOT use this data set in productive systems, e.g., intrusion detection systems.

    Overview

    The data set contains the following features related to each login attempt on the SSO:

    • IP Address (String): IP address belonging to the login attempt. Range: 0.0.0.0 - 255.255.255.255
    • Country (String): Country derived from the IP address. Example: US
    • Region (String): Region derived from the IP address. Example: New York
    • City (String): City derived from the IP address. Example: Rochester
    • ASN (Integer): Autonomous system number derived from the IP address. Range: 0 - 600000
    • User Agent String (String): User agent string submitted by the client. Example: Mozilla/5.0 (Windows NT 10.0; Win64; ...
    • OS Name and Version (String): Operating system name and version derived from the user agent string. Example: Windows 10
    • Browser Name and Version (String): Browser name and version derived from the user agent string. Example: Chrome 70.0.3538
    • Device Type (String): Device type derived from the user agent string. Values: (mobile, desktop, tablet, bot, unknown)^1
    • User ID (Integer): Identification number related to the affected user account. [Random pseudonym]
    • Login Timestamp (Integer): Timestamp related to the login attempt. [64 Bit timestamp]
    • Round-Trip Time (RTT) [ms] (Integer): Server-side measured latency between client and server. Range: 1 - 8600000
    • Login Successful (Boolean): True: Login was successful, False: Login failed. Values: (true, false)
    • Is Attack IP (Boolean): IP address was found in known attacker data set. Values: (true, false)
    • Is Account Takeover (Boolean): Login attempt was identified as account takeover by the incident response team of the online service. Values: (true, false)
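
    To make the schema concrete, here is a simplified, illustrative risk score in the spirit of the Freeman et al. (2016) model; this is not the authors' implementation, the file name is hypothetical, and the column names follow the feature list above:

        import pandas as pd

        logins = pd.read_csv("rba_dataset.csv")  # hypothetical file name
        FEATURES = ["Country", "Device Type", "Browser Name and Version"]

        def risk_score(user_id, attempt, smoothing=0.01):
            # Simplified score: product over features of global frequency divided
            # by per-user frequency. Higher values mean the attempt looks less
            # like this user's login history.
            history = logins[logins["User ID"] == user_id]
            score = 1.0
            for f in FEATURES:
                p_global = (logins[f] == attempt[f]).mean() + smoothing
                p_user = (history[f] == attempt[f]).mean() + smoothing
                score *= p_global / p_user
            return score

        example = {"Country": "US", "Device Type": "desktop",
                   "Browser Name and Version": "Chrome 70.0.3538"}
        print(risk_score(user_id=42, attempt=example))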

    Data Creation

    As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All the other data was randomly generated while maintaining logical relations and timely order between the features.

    The timestamps, however, are not identical and contain randomness. The feature values related to IP address and user agent string were randomly generated by publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation. Therefore, the RTT entries were probably in other positions in the original data set.

    • The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.

    • The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical as in the real data set.

    • The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.

    Regarding the Data Values

    Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.

    You can recognize them by the following values:

    • ASNs with values >= 500,000

    • IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)

    Study Reproduction

    Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.

    The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.

    See RESULTS.md for more details.

    Ethics

    By using the SSO service, the users agreed to the data collection and evaluation for research purposes. For study reproduction and to foster RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.

    The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.

    Publication

    You can find more details on our conducted study in the following journal article:

    Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022)
    Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono.
    ACM Transactions on Privacy and Security

    Bibtex

    @article{Wiefling_Pump_2022,
     author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi},
     title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}},
     journal = {{ACM} {Transactions} on {Privacy} and {Security}},
     doi = {10.1145/3546069},
     publisher = {ACM},
     year  = {2022}
    }

    License

    This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:

    Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069

    1. A few (invalid) user agent strings from the original data set could not be parsed, so their device type is empty. Perhaps this parse error is useful information for your studies, so we kept these 1526 entries.
