23 datasets found
  1. Data and Code for: "Universal Adaptive Normalization Scale (AMIS): Integration of Heterogeneous Metrics into a Unified System"

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 15, 2025
    Cite
    Kravtsov, Gennady (2025). Data and Code for: "Universal Adaptive Normalization Scale (AMIS): Integration of Heterogeneous Metrics into a Unified System" [Dataset]. http://doi.org/10.7910/DVN/BISM0N
    Explore at:
    Dataset updated
    Nov 15, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Kravtsov, Gennady
    Description

    Dataset Title: Data and Code for: "Universal Adaptive Normalization Scale (AMIS): Integration of Heterogeneous Metrics into a Unified System"

    Description: This dataset contains source data and processing results for validating the Adaptive Multi-Interval Scale (AMIS) normalization method. Includes educational performance data (student grades), economic statistics (World Bank GDP), and Python implementation of the AMIS algorithm with graphical interface.

    Contents:
    - Source data: educational grades and GDP statistics
    - AMIS normalization results (3, 5, 9, 17-point models)
    - Comparative analysis with linear normalization
    - Ready-to-use Python code for data processing

    Applications:
    - Educational data normalization and analysis
    - Economic indicators comparison
    - Development of unified metric systems
    - Methodology research in data scaling

    Technical info: Python code with pandas, numpy, scipy, matplotlib dependencies. Data in Excel format.
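
    As a point of reference for the comparative analysis mentioned above, here is a minimal sketch of the plain linear (min-max) normalization baseline; it does not reproduce the AMIS method itself, and the grade values are invented for illustration.

    import numpy as np

    # Linear (min-max) normalization baseline; grades are made-up example values.
    grades = np.array([2.0, 3.0, 3.5, 4.0, 5.0])
    normalized = (grades - grades.min()) / (grades.max() - grades.min())
    print(normalized)  # values rescaled to the [0, 1] interval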

  2. Ecommerce Dataset for Data Analysis

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Cite
    Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
    Explore at:
    Available download formats: zip (2028853 bytes)
    Dataset updated
    Sep 19, 2024
    Authors
    Shrishti Manja
    Description

    This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

    About the Dataset:
    - CID (Customer ID): A unique identifier for each customer.
    - TID (Transaction ID): A unique identifier for each transaction.
    - Gender: The gender of the customer, categorized as Male or Female.
    - Age Group: Age group of the customer, divided into several ranges.
    - Purchase Date: The timestamp of when the transaction took place.
    - Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
    - Discount Availed: Indicates whether the customer availed any discount (Yes/No).
    - Discount Name: Name of the discount applied (e.g., FESTIVE50).
    - Discount Amount (INR): The amount of discount availed by the customer.
    - Gross Amount: The total amount before applying any discount.
    - Net Amount: The final amount after applying the discount.
    - Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
    - Location: The city where the purchase took place.

    Use Cases:
    1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
    2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
    3. Data Visualization: Use tools like Python's Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
    4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
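
    A minimal pandas starter for the EDA use case above; the CSV filename is a placeholder, and the column names follow the list above (the actual headers in the file may differ slightly).

    import pandas as pd

    df = pd.read_csv("ecommerce_dataset.csv")  # placeholder filename

    # Average net amount per product category, and discount usage split by gender.
    print(df.groupby("Product Category")["Net Amount"].mean().sort_values())
    print(df.groupby("Gender")["Discount Availed"].value_counts(normalize=True))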

    This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

    This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.

  3. MIMIC-IV Lab Events Subset - Preprocessed for Data Normalization Analysis

    • zenodo.org
    text/x-python
    Updated Oct 5, 2025
    Cite
    ali Azadi (2025). MIMIC-IV Lab Events Subset - Preprocessed for Data Normalization Analysis.xlsx [Dataset]. http://doi.org/10.5281/zenodo.17272946
    Explore at:
    Available download formats: text/x-python
    Dataset updated
    Oct 5, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    ali Azadi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    This updated version includes a Python script (glucose_analysis.py) that performs statistical evaluation of the glucose normalization process described in the associated thesis. The script supports key analyses, including normality assessment (Shapiro–Wilk test), variance homogeneity (Levene’s test), mean comparison (ANOVA), effect size estimation (Cohen’s d), and calculation of confidence intervals for the mean difference. These results validate the impact of Min-Max normalization on clinical data structure and usability within CDSS workflows. The script is designed to be reproducible and complements the processed dataset already included in this repository.
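
    A minimal sketch of the statistical checks listed above, run on synthetic values rather than the MIMIC-IV subset; it assumes NumPy and SciPy are available and is not the glucose_analysis.py script itself.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    raw = rng.normal(loc=110, scale=25, size=500)             # synthetic glucose values
    normalized = (raw - raw.min()) / (raw.max() - raw.min())  # Min-Max normalization

    print(stats.shapiro(raw))               # normality assessment (Shapiro-Wilk)
    print(stats.levene(raw, normalized))    # variance homogeneity (Levene)
    print(stats.f_oneway(raw, normalized))  # mean comparison (one-way ANOVA)

    # Cohen's d for the raw vs. normalized groups, using a pooled standard deviation.
    pooled_sd = np.sqrt((raw.var(ddof=1) + normalized.var(ddof=1)) / 2)
    print((raw.mean() - normalized.mean()) / pooled_sd)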

  4. Additional file 8 of pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Jun 19, 2020
    Cite
    Luo, Zhi-Hui; Chen, Zhen-Xia; Yang, Zhuang; Zhang, Hong-Yu; Shi, Meng-Wei (2020). Additional file 8 of pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000508988
    Explore at:
    Dataset updated
    Jun 19, 2020
    Authors
    Luo, Zhi-Hui; Chen, Zhen-Xia; Yang, Zhuang; Zhang, Hong-Yu; Shi, Meng-Wei
    Description

    Additional file 8: Supplementary Table 6. The number of MeSH terms in each category correctly identified by pyMeSHSim, DNorm, TaggerOne, and Nelson's manual work.

  5. FaceMatch project in python

    • kaggle.com
    zip
    Updated Feb 20, 2023
    Cite
    sadaf koondhar (2023). FaceMatch project in python [Dataset]. https://www.kaggle.com/datasets/sadafkoondhar/facematch-project-in-python
    Explore at:
    zip(634 bytes)Available download formats
    Dataset updated
    Feb 20, 2023
    Authors
    sadaf koondhar
    Description

    Face recognition is a popular computer vision application that allows machines to identify and verify human faces from images or videos. Python is a widely used programming language for implementing face recognition systems due to its simplicity, flexibility, and availability of powerful libraries such as OpenCV, Dlib, and TensorFlow.

    Here's a professional description of a face recognition project in Python:

    Dataset collection: Collect a dataset of facial images to train the model. This can be done using publicly available datasets such as LFW, CelebA, or private data.

    Preprocessing: Preprocess the dataset to improve model accuracy. This includes face detection, alignment, and normalization.

    Feature extraction: Extract features from the preprocessed facial images using a pre-trained deep neural network such as VGG or ResNet. This will transform each face image into a feature vector that represents the unique characteristics of the face.

    Training: Train a machine learning model such as a support vector machine (SVM) or a neural network using the extracted features and corresponding labels. The model should be optimized to minimize false positives and false negatives.

    Testing: Evaluate the trained model on a test dataset to measure its performance. This can be done using metrics such as accuracy, precision, and recall.

    Deployment: Deploy the model to a production environment where it can be used to recognize faces in real-time. This can be done using a web-based interface or a standalone application.

    Improvements: Continuously improve the model by adding new data, refining the preprocessing steps, and tuning the model hyperparameters.
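
    A minimal sketch of the training and testing steps above, assuming face feature vectors have already been extracted by a pretrained network; the embeddings and identity labels below are random placeholders rather than real faces.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(200, 128))   # placeholder face feature vectors
    labels = rng.integers(0, 4, size=200)      # placeholder identity labels

    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.25, random_state=0, stratify=labels)

    clf = SVC(kernel="linear")                 # SVM classifier on the embeddings
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))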

    Some additional advanced techniques that can be used to improve face recognition include:

    Face recognition with deep learning: Use deep learning techniques such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs) to train more accurate models.

    Multi-face recognition: Train models to recognize multiple faces in an image or video stream.

    Face recognition with privacy protection: Incorporate privacy protection techniques such as blurring or anonymization of facial features to protect personal information.

    Overall, a face recognition project in Python involves collecting and preprocessing data, extracting features, training and evaluating machine learning models, deploying the model in a production environment, and continuously improving the accuracy and efficiency of the system.

  6. UCI Automobile Dataset

    • kaggle.com
    Updated Feb 12, 2023
    Cite
    Otrivedi (2023). UCI Automobile Dataset [Dataset]. https://www.kaggle.com/datasets/otrivedi/automobile-data/suggestions
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Otrivedi
    Description

    In this project, I have done exploratory data analysis on the UCI Automobile dataset available at https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

    This dataset consists of data from the 1985 Ward's Automotive Yearbook. Here are the sources:

    1) 1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook.
    2) Personal Auto Manuals, Insurance Services Office, 160 Water Street, New York, NY 10038
    3) Insurance Collision Report, Insurance Institute for Highway Safety, Watergate 600, Washington, DC 20037

    Number of Instances: 398. Number of Attributes: 9, including the class attribute.

    Attribute Information:

    mpg: continuous
    cylinders: multi-valued discrete
    displacement: continuous
    horsepower: continuous
    weight: continuous
    acceleration: continuous
    model year: multi-valued discrete
    origin: multi-valued discrete
    car name: string (unique for each instance)

    This data set consists of three types of entities:

    I - The specification of an auto in terms of various characteristics

    II - Its assigned insurance risk rating. This corresponds to the degree to which the auto is riskier than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is more (or less) risky, this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling".

    III - Its normalized losses in use as compared to other cars. This is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc...), and represents the average loss per car per year.

    The analysis is divided into two parts:

    Data Wrangling

    1. Pre-processing data in Python
    2. Dealing with missing values
    3. Data formatting
    4. Data normalization
    5. Binning

    Exploratory Data Analysis

    6. Descriptive statistics
    7. Groupby
    8. Analysis of variance
    9. Correlation
    10. Correlation stats
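
    A brief pandas sketch of the wrangling steps above applied to imports-85.data; missing values in the raw file are coded as "?", and only the first two column names are spelled out here (the remaining names are placeholders).

    import pandas as pd

    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
    cols = ["symboling", "normalized-losses"] + [f"attr{i}" for i in range(24)]  # placeholder names
    df = pd.read_csv(url, names=cols, na_values="?")

    df["normalized-losses"] = df["normalized-losses"].fillna(df["normalized-losses"].mean())  # missing values
    df["losses-scaled"] = df["normalized-losses"] / df["normalized-losses"].max()             # normalization
    df["losses-bin"] = pd.cut(df["losses-scaled"], bins=3, labels=["low", "medium", "high"])  # binning
    print(df[["normalized-losses", "losses-scaled", "losses-bin"]].head())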

    Acknowledgment: UCI Machine Learning Repository. Data link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

  7. Additional file 4 of pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Jun 19, 2020
    Cite
    Luo, Zhi-Hui; Chen, Zhen-Xia; Zhang, Hong-Yu; Yang, Zhuang; Shi, Meng-Wei (2020). Additional file 4 of pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000508974
    Explore at:
    Dataset updated
    Jun 19, 2020
    Authors
    Luo, Zhi-Hui; Chen, Zhen-Xia; Zhang, Hong-Yu; Yang, Zhuang; Shi, Meng-Wei
    Description

    Additional file 4: Supplementary Table 2. GWAS phenotypes parsed by Nelson's group and by pyMeSHSim, and the semantic similarity between them as calculated by pyMeSHSim and meshes.

  8. Additional file 10 of pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Jun 19, 2020
    Cite
    Shi, Meng-Wei; Chen, Zhen-Xia; Luo, Zhi-Hui; Yang, Zhuang; Zhang, Hong-Yu (2020). Additional file 10 of pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000509009
    Explore at:
    Dataset updated
    Jun 19, 2020
    Authors
    Shi, Meng-Wei; Chen, Zhen-Xia; Luo, Zhi-Hui; Yang, Zhuang; Zhang, Hong-Yu
    Description

    Additional file 10: Supplementary Table 8. MeSH terms that DNorm or TaggerOne recognized perfectly but pyMeSHSim failed to recognize, with the semantic similarity between them calculated by pyMeSHSim. pyMeSHSim_Score is the semantic similarity between Nelson_MeSH_ID and pyMeSHSim_MeSH_ID, taggerOne_score is the semantic similarity between Nelson_MeSH_ID and TaggerOne_MeSH_ID, and DNorm_score is the semantic similarity between Nelson_MeSH_ID and DNorm_MeSH_ID.

  9. AI4PROFHEALTH - Automatic Silver Gazetteer for Named Entity Recognition and Normalization

    • data-staging.niaid.nih.gov
    • nde-dev.biothings.io
    • +1more
    Updated Nov 25, 2024
    Cite
    Becerra-Tomé, Alberto; Rodríguez Miret, Jan; Rodríguez Ortega, Miguel; Marsol Torrent, Sergi; Lima-López, Salvador; Farré-Maduell, Eulàlia; Krallinger, Martin (2024). AI4PROFHEALTH - Automatic Silver Gazetteer for Named Entity Recognition and Normalization [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14210424
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Barcelona Supercomputing Center
    Authors
    Becerra-Tomé, Alberto; Rodríguez Miret, Jan; Rodríguez Ortega, Miguel; Marsol Torrent, Sergi; Lima-López, Salvador; Farré-Maduell, Eulàlia; Krallinger, Martin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises a professions gazetteer generated with automatically extracted terminology from the Mesinesp2 corpus, a manually annotated corpus in which domain experts have labeled a set of scientific literature, clinical trials, and patent abstracts, as well as clinical case reports.

    A silver gazetteer for mention classification and normalization is created by combining the predictions of automatic Named Entity Recognition models with normalization via Entity Linking to three controlled vocabularies: SNOMED CT, NCBI, and ESCO. The sources are 265,025 different documents, of which 249,538 correspond to the MESINESP2 corpora and 15,487 to clinical cases from open clinical journals. From them, 5,682,000 mentions are extracted and 4,909,966 (86.42%) are normalized to at least one of the ontologies: SNOMED CT (4,909,966) for diseases, symptoms, drugs, locations, occupations, procedures and species; ESCO (215,140) for occupations; and NCBI (1,469,256) for species.

    The repository contains a .tsv file with the following columns:

    filenameid: A unique identifier combining the file name and mention span within the text. This ensures each extracted mention is uniquely traceable. Example: biblio-1000005#239#256 refers to a mention spanning characters 239–256 in the file with the name biblio-1000005.

    span: The specific text span (mention) extracted from the document, representing a term or phrase identified in the dataset. Example: centro oncológico.

    source: The origin of the document, indicating the corpus from which the mention was extracted. Possible values: mesinesp2, clinical_cases.

    filename: The name of the file from which the mention was extracted. Example: biblio-1000005.

    mention_class: Categories or semantic tags assigned to the mention, describing its type or context in the text. Example: ['ENFERMEDAD', 'SINTOMA'].

    codes_esco: The normalized ontology codes from the European Skills, Competences, Qualifications, and Occupations (ESCO) vocabulary for the identified mention (if applicable). This field may be empty if no ESCO mapping exists. Example: 30629002.

    terms_esco: The human-readable terms from the ESCO ontology corresponding to the codes_esco. Example: ['responsable de recursos', 'director de recursos', 'directora de recursos'].

    codes_ncbi: The normalized ontology codes from the NCBI Taxonomy vocabulary for species (if applicable). This field may be empty if no NCBI mapping exists.

    terms_ncbi: The human-readable terms from the NCBI Taxonomy vocabulary corresponding to the codes_ncbi. Example: ['Lacandoniaceae', 'Pandanaceae R.Br., 1810', 'Pandanaceae', 'Familia'].

    codes_sct: The normalized ontology codes from SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) vocabulary for diseases, symptoms, drugs, locations, occupations, procedures, and species (if applicable). Example: 22232009.

    terms_sct: The human-readable terms from the SNOMED CT ontology corresponding to the codes_sct. Example: ['adjudicador de regulaciones del seguro nacional'].

    sct_sem_tag: The semantic category tag assigned by SNOMED CT to describe the general classification of the mention. Example: environment.

    Suggestion: if you load the dataset using Python, it is recommended to parse the columns containing lists as follows (the .tsv filename below is a placeholder):

    import ast
    import pandas as pd

    df = pd.read_csv("gazetteer.tsv", sep="\t")  # placeholder name for the .tsv file in this repository
    df["mention_class"] = df["mention_class"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

    License

    This dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). This means you are free to:

    Share: Copy and redistribute the material in any medium or format.

    Adapt: Remix, transform, and build upon the material for any purpose, even commercially.

    Attribution Requirement: Please credit the dataset creators appropriately, provide a link to the license, and indicate if changes were made.

    Contact

    If you have any questions or suggestions, please contact us at:

    Martin Krallinger ()

    Additional resources and corpora

    If you are interested, you might want to check out these corpora and resources:

    MESINESP-2 (Corpus of manually indexed records with DeCS /MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)

    MEDDOPROF corpus

    Codes Reference List (for MEDDOPROF-NORM)

    Annotation Guidelines

    Occupations Gazetteer

  10. Supplementary data - Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition

    • service.tib.eu
    Updated Nov 28, 2024
    Cite
    (2024). Supplementary data - stress-induced changes in magnetite: insights from a numerical analysis of the verwey transition - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-hbwbdgigwbcvtfyc
    Explore at:
    Dataset updated
    Nov 28, 2024
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Abstract: The dataset contains measurements of magnetic susceptibility in dependence of temperature of shocked magnetite and of a natural magnetite single crystal before and after manual crushing. A Python code for evaluation of low-temperature susceptibility curves is included. The data are supplementary to: Fuchs, H., Kontny, A. and Schilling, F.R., 2024. Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition, Geophysical Journal International.

    Technical remarks: The data set contains k-T curves of
    - the initial magnetite ore from Sydvaranger mine (Norway),
    - the same ore after shock at 3, 5, 10, 20 and 30 GPa under laboratory conditions and after subsequent heating to 973 K, and
    - a natural magnetite single crystal (initial and after manual crushing).
    The data set also contains a Python code for evaluation of normalized low-temperature k-T curves. Experimental conditions are described in [1]. The approach for k-T curve evaluation is described in [2].

    [1] Kontny, A., Reznik, B., Boubnov, A., Göttlicher, J. and Steininger, R., 2018. Postshock Thermally Induced Transformations in Experimentally Shocked Magnetite, Geochemistry, Geophysics, Geosystems, Vol. 19, 3, pp. 921–931, doi:10.1002/2017GC007331.
    [2] Fuchs, H., Kontny, A. and Schilling, F.R., 2024. Stress-induced Changes in Magnetite: Insights from a Numerical Analysis of the Verwey Transition, Geophysical Journal International.

  11. Wordle Answer Search Trends Dataset (2021–2025)

    • kaggle.com
    zip
    Updated Jun 26, 2025
    Cite
    Ankush Kamboj (2025). Wordle Answer Search Trends Dataset (2021–2025) [Dataset]. https://www.kaggle.com/datasets/kambojankush/wordle-answer-search-trends-dataset-20212025
    Explore at:
    Available download formats: zip (30419 bytes)
    Dataset updated
    Jun 26, 2025
    Authors
    Ankush Kamboj
    License

    GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.

    It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.

    🔍 Hypothesis

    Each Wordle answer causes a spike in search volume on the day it appears — more so if the word is rare.

    This dataset supports exploration of:

    • Wordle Answers
    • Trends for wordle answers
    • Correlation between wordle answer rarity and search interest

    Columns

    date: Date of the Wordle puzzle
    word: Correct 5-letter Wordle answer
    game: Wordle game number
    wordfreq_commonality: Normalized frequency score using Python's wordfreq library
    subtlex_commonality: Normalized frequency score using the SUBTLEX-US dataset
    trend_day_global: Google search interest on the day (global, all categories)
    trend_avg_200_global: 200-day average search interest (global, all categories)
    trend_day_language: Search interest on Wordle day (Language Resources category)
    trend_avg_200_language: 200-day average search interest (Language Resources category)

    Notes: - All trend values are relative (0–100 scale, per Google Trends)

    🧮 Methodology

    • Wordle answers were scraped from wordfinder.yourdictionary.com
    • Commonality scores were computed using:
      • wordfreq Python library
      • SUBTLEX-US dataset (subtitle frequency, approximating spoken English)
    • Trend data was fetched using Google Trends API via pytrends
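
    For illustration, word frequencies can be queried directly from the wordfreq library as below; the exact normalization used to produce wordfreq_commonality in this dataset is not specified here, so this is only an assumed starting point.

    from wordfreq import word_frequency, zipf_frequency

    for word in ["crane", "epoxy", "soare"]:
        # word_frequency returns a proportion; zipf_frequency returns a log-scaled score.
        print(word, word_frequency(word, "en"), zipf_frequency(word, "en"))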

    📊 Analysis

    Analysis performed using this data can be found in the accompanying blog post.

  12. Code for statistics, NMDS and Heatmap plots

    • figshare.com
    json
    Updated Jul 31, 2025
    Cite
    Vlastimil Novak (2025). Code for statistics, NMDS and Heatmap plots [Dataset]. http://doi.org/10.6084/m9.figshare.29700029.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vlastimil Novak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains R and Jupyter code and data for the statistical analyses and for creating the NMDS and heatmap figures of the EcoFAB 2.0 Ring Trial.
    - "two-way-anova-ecofab-ringtrial-stats.ipynb" generates the statistical analysis utilizing the RawData from the paper.
    - "Heat_KEGG.r" generates the comparative genomics heat map utilizing the normalized KEGG pathway gene abundance in the "Heat_KEGG.xlsx" dataset.
    - "Heat_TM.r" generates the heat map plot for targeted metabolomics utilizing the normalized metabolite intensity in the "Heat_TM.xlsx" dataset.
    - "NMDS_UM.r" generates NMDS plots for untargeted metabolomics utilizing raw peak heights for detected features in the "NMDS_UM.xlsx" and "NMDS_UM1.xlsx" datasets.
    - "NMDS_seq.r" generates NMDS plots for root and media microbiome composition utilizing relative bacterial abundances from 16S rRNA sequencing in the "Seq_media.xlsx" and "Seq_root" datasets.

  13. Metabolomics Data Preprocessing PQN PCA

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Dr. Nagendra (2025). Metabolomics Data Preprocessing PQN PCA [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/metabolomics-data-preprocessing-pqn-pca
    Explore at:
    Available download formats: zip (22763 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset provides a step-by-step pipeline for preprocessing metabolomics data.

    The pipeline implements Probabilistic Quotient Normalization (PQN) to correct dilution effects in metabolomics measurements.

    Includes guidance on handling raw metabolomics datasets obtained from LC-MS or NMR experiments.

    Demonstrates Principal Component Analysis (PCA) for dimensionality reduction and exploratory data analysis.

    Includes data visualization techniques to interpret PCA results effectively.

    Suitable for metabolomics researchers and data scientists working on omics data.

    Enables better reproducibility of preprocessing workflows for metabolomics studies.

    Can be used to normalize data, detect outliers, and identify major patterns in metabolomics datasets.

    Provides a Python-based notebook that is easy to adapt to new datasets.

    Includes example datasets and code snippets for immediate application.

    Helps users understand the impact of normalization on downstream statistical analyses.

    Supports integration with other metabolomics pipelines or machine learning workflows.
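
    A minimal sketch of PQN followed by PCA on a synthetic intensity matrix; it illustrates the general technique, not necessarily the exact notebook shipped with this dataset.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    X = rng.lognormal(mean=2.0, sigma=0.5, size=(20, 100))   # samples x features (synthetic)

    reference = np.median(X, axis=0)                         # reference spectrum (feature-wise median)
    quotients = X / reference                                # quotients of each sample vs. the reference
    dilution = np.median(quotients, axis=1, keepdims=True)   # most probable dilution factor per sample
    X_pqn = X / dilution                                     # PQN-corrected intensities

    scores = PCA(n_components=2).fit_transform(np.log1p(X_pqn))  # 2-component PCA for exploration
    print(scores[:3])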

  14. Task Scheduler Performance Survey Results

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Jakub Beránek; Stanislav Böhm; Vojtěch Cima (2020). Task Scheduler Performance Survey Results [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2630588
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    IT4Innovations
    Authors
    Jakub Beránek; Stanislav Böhm; Vojtěch Cima
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Task scheduler performance survey

    This dataset contains the results of a task graph scheduler performance survey. The results are stored in the following files, which correspond to simulations performed on the elementary, irw and pegasus task graph datasets published at https://doi.org/10.5281/zenodo.2630384.

    elementary-result.zip

    irw-result.zip

    pegasus-result.zip

    The files contain compressed pandas dataframes in CSV format; they can be read with the following Python code:

    import pandas as pd
    frame = pd.read_csv("elementary-result.zip")

    Each row in the frame corresponds to a single instance of a task graph that was simulated with a specific configuration (network model, scheduler etc.). The list below summarizes the meaning of the individual columns.

    graph_name - name of the benchmarked task graph

    graph_set - name of the task graph dataset from which the graph originates

    graph_id - unique ID of the graph

    cluster_name - type of cluster used in this instance; the format is <workers>x<cores>, e.g. 32x16 means 32 workers, each with 16 cores

    bandwidth - network bandwidth [MiB/s]

    netmodel - network model (simple or maxmin)

    scheduler_name - name of the scheduler

    imode - information mode

    min_sched_interval - minimal scheduling delay [s]

    sched_time - duration of each scheduler invocation [s]

    time - simulated makespan of the task graph execution [s]

    execution_time - real duration of all scheduler invocations [s]

    total_transfer - amount of data transferred amongst workers [MiB]

    The file charts.zip contains charts obtained by processing the datasets. On the X axis there is always bandwidth in [MiB/s]. There are the following files:

    [DATASET]-schedulers-time - Absolute makespan produced by schedulers [seconds]

    [DATASET]-schedulers-score - The same as above but normalized with respect to the best schedule (shortest makespan) for the given configuration.

    [DATASET]-schedulers-transfer - Sums of transfers between all workers for a given configuration [MiB]

    [DATASET]-[CLUSTER]-netmodel-time - Comparison of netmodels, absolute times [seconds]

    [DATASET]-[CLUSTER]-netmodel-score - Comparison of netmodels, normalized to the average of model "simple"

    [DATASET]-[CLUSTER]-netmodel-transfer - Comparison of netmodels, sum of transfered data between all workers [MiB]

    [DATASET]-[CLUSTER]-schedtime-time - Comparison of MSD, absolute times [seconds]

    [DATASET]-[CLUSTER]-schedtime-score - Comparison of MSD, normalized to the average of "MSD=0.0" case

    [DATASET]-[CLUSTER]-imode-time - Comparison of Imodes, absolute times [seconds]

    [DATASET]-[CLUSTER]-imode-score - Comparison of Imodes, normalized to the average of "exact" imode
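
    As an assumed illustration of the "score" normalization described above, the makespan of each run can be divided by the best makespan observed for the same graph and configuration; the grouping keys below are an assumption based on the column list, not the exact code used to produce the charts.

    import pandas as pd

    frame = pd.read_csv("elementary-result.zip")

    keys = ["graph_id", "cluster_name", "bandwidth", "netmodel"]   # assumed configuration keys
    frame["score"] = frame["time"] / frame.groupby(keys)["time"].transform("min")
    print(frame[["graph_name", "scheduler_name", "time", "score"]].head())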

    Reproducing the results

    1. Download and install Estee (https://github.com/It4innovations/estee)

    $ git clone https://github.com/It4innovations/estee
    $ cd estee
    $ pip install .

    2. Generate task graphs. You can either use the provided script benchmarks/generate.py to generate graphs from three categories (elementary, irw and pegasus):

    $ cd benchmarks
    $ python generate.py elementary.zip elementary
    $ python generate.py irw.zip irw
    $ python generate.py pegasus.zip pegasus

    or use our task graph dataset that is provided at https://doi.org/10.5281/zenodo.2630384.

    3. Run benchmarks. To run a benchmark suite, you should prepare a JSON file describing the benchmark. The file that was used to run experiments from the paper is provided in benchmark.json. Then you can run the benchmark using this command:

    $ python pbs.py compute benchmark.json

    The benchmark script can be interrupted at any time (for example using Ctrl+C). When interrupted, it will store the computed results to the result file and restore the computation when launched again.

    4. Visualizing results:

    $ python view.py --all

    The resulting plots will appear in a folder called outputs.

  15. TCGA Glioblastoma Multiforme (GBM) Gene Expression

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 27, 2023
    Cite
    Swati Baskiyar (2023). TCGA Glioblastoma Multiforme (GBM) Gene Expression [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8187688
    Explore at:
    Dataset updated
    Jul 27, 2023
    Authors
    Swati Baskiyar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    The Cancer Genome Atlas (TCGA) was a large-scale collaborative project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). It aimed to comprehensively characterize the genomic and molecular landscape of various cancer types. This dataset contains information about GBM, an aggressive and highly malignant brain tumor that arises from glial cells, characterized by rapid growth and infiltrative behavior. The gene expression profile was measured experimentally using the Affymetrix HT Human Genome U133a microarray platform by the Broad Institute of MIT and Harvard University cancer genomic characterization center. The Sample IDs serve as unique identifiers for each sample.

    Inspiration:

    This dataset was uploaded to UBRITE for the GTKB project.

    Instruction:

    The log2(x) normalization was removed, and z-normalization was performed on the dataset using a Python script.
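
    A hypothetical sketch of that preprocessing step (the actual script is not included here): undo the log2 transform and z-normalize each gene across samples. The file name and the orientation (genes as rows, samples as columns) are assumptions.

    import numpy as np
    import pandas as pd

    expr = pd.read_csv("gbm_expression.tsv", sep="\t", index_col=0)   # placeholder file name

    linear = np.power(2.0, expr)                                      # remove the log2(x) normalization
    z = linear.sub(linear.mean(axis=1), axis=0).div(linear.std(axis=1), axis=0)  # per-gene z-scores
    print(z.iloc[:3, :3])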

    Acknowledgments:

    Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

    The Cancer Genome Atlas Research Network., Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764

    U-BRITE last update: 07/13/2023

  16. Zomato Food Delivery Insight Data

    • kaggle.com
    zip
    Updated Jul 14, 2025
    Cite
    I_Vasanth_P (2025). Zomato Food Delivery Insight Data [Dataset]. https://www.kaggle.com/datasets/ivasanthp/zomato-food-delivery-insight-data
    Explore at:
    Available download formats: zip (123449 bytes)
    Dataset updated
    Jul 14, 2025
    Authors
    I_Vasanth_P
    License

    Community Data License Agreement - Sharing, Version 1.0 (CDLA-Sharing-1.0): https://cdla.io/sharing-1-0/

    Description

    Problem Statement:

    Imagine you are working as a data scientist at Zomato. Your goal is to enhance operational efficiency and improve customer satisfaction by analyzing food delivery data. You need to build an interactive Streamlit tool that enables seamless data entry for managing orders, customers, restaurants, and deliveries. The tool should support robust database operations like adding columns or creating new tables dynamically while maintaining compatibility with existing code.

    Business Use Cases:
    - Order Management: Identifying peak ordering times and locations. Tracking delayed and canceled deliveries.
    - Customer Analytics: Analyzing customer preferences and order patterns. Identifying top customers based on order frequency and value.
    - Delivery Optimization: Analyzing delivery times and delays to improve logistics. Tracking delivery personnel performance.
    - Restaurant Insights: Evaluating the most popular restaurants and cuisines. Monitoring order values and frequency by restaurant.

    Approach:
    1) Dataset Creation: Use Python (Faker) to generate synthetic datasets for customers, orders, restaurants, and deliveries. Populate the SQL database with these datasets.
    2) Database Design: Create normalized SQL tables for Customers, Orders, Restaurants, and Deliveries. Ensure compatibility for dynamic schema changes (e.g., adding columns, creating new tables).
    3) Data Entry Tool: Develop a Streamlit app for adding, updating, and deleting records in the SQL database, and for dynamically creating new tables or modifying existing ones.
    4) Data Insights: Use SQL queries and Python to extract insights like peak times, delayed deliveries, and customer trends. Visualize the insights in the Streamlit app (add-on).
    5) OOP Implementation: Encapsulate database operations in Python classes. Implement robust and reusable methods for CRUD (Create, Read, Update, Delete) operations.
    6) Order Management: Identifying peak ordering times and locations. Tracking delayed and canceled deliveries.
    7) Customer Analytics: Analyzing customer preferences and order patterns. Identifying top customers based on order frequency and value.
    8) Delivery Optimization: Analyzing delivery times and delays to improve logistics. Tracking delivery personnel performance.
    9) Restaurant Insights: Evaluating the most popular restaurants and cuisines. Monitoring order values and frequency by restaurant.
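
    A small sketch of step 1 under assumed table and column names, using Faker and SQLite; the actual project may use a different database engine and schema.

    import sqlite3
    from faker import Faker

    fake = Faker()
    conn = sqlite3.connect("zomato.db")                     # placeholder database file
    conn.execute("CREATE TABLE IF NOT EXISTS customers (customer_id TEXT, name TEXT, city TEXT)")

    rows = [(fake.uuid4(), fake.name(), fake.city()) for _ in range(100)]  # synthetic customers
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    print(conn.execute("SELECT COUNT(*) FROM customers").fetchone())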

    Results: By the end of this project, learners will achieve: a fully functional SQL database for managing food delivery data; an interactive Streamlit app for data entry and analysis; 20 SQL queries written for analysis; dynamic compatibility with database schema changes; and comprehensive insights into order trends, delivery performance, and customer behavior.

    Project Evaluation Metrics:
    - Database Design: Proper normalization of tables and relationships between them.
    - Code Quality: Use of OOP principles to ensure modularity and scalability. Robust error handling for database operations.
    - Streamlit App Functionality: Usability of the interface for data entry and insights. Compatibility with schema changes.
    - Data Insights: Use 20 SQL queries for data analysis.
    - Documentation: Clear and comprehensive explanation of the code and approach.

  17. FAERS Drug Event Signal Dataset

    • kaggle.com
    zip
    Updated Aug 17, 2025
    Cite
    Anupam Debnath (2025). FAERS Drug Event Signal Dataset [Dataset]. https://www.kaggle.com/datasets/anurmi/faers-drug-event-signals
    Explore at:
    Available download formats: zip (21083041 bytes)
    Dataset updated
    Aug 17, 2025
    Authors
    Anupam Debnath
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📄 Kaggle Dataset Description (FAERS Signals)

    Title

    FDA FAERS Adverse Drug Event Signals (Processed)

    Subtitle

    Drug–Adverse Event counts and disproportionality metrics (PRR, ROR) from the FDA’s Adverse Event Reporting System (FAERS).

    🧾 Overview

    The FDA Adverse Event Reporting System (FAERS) is a publicly available database of adverse drug event reports, medication error reports, and product quality complaints. This dataset provides processed, analysis-ready FAERS data, focusing on drug–adverse event pairs with quarterly counts and basic signal detection metrics.

    📊 What's Inside?

    faers_drug_event_counts.csv

    Clean, normalized table of drug–event pairs

    Quarterly (QTR) counts of adverse events

    faers_signals_prr_ror.csv

    Proportional Reporting Ratio (PRR) and Reporting Odds Ratio (ROR) for each drug–event pair

    Simple thresholds applied (min. count filter)

    🔍 Potential Use Cases

    Pharmacovigilance signal detection

    Drug safety surveillance

    Predictive modeling of future label changes

    Text/data mining for biomedical research

    Event-driven investment research (biopharma risk signals)

    ⚠️ Limitations

    FAERS is a spontaneous reporting system (subject to underreporting, duplication, and reporting bias).

    Counts do not equal incidence rates.

    Use this data for signal detection, not risk quantification.

    This dataset is processed for Kaggle use and may not contain all FAERS fields.

    📚 Source & License

    Source: FDA FAERS Public Data

    License: US Government Work (Public Domain)

    🔥 This dataset bridges raw FDA data and ML-ready inputs, helping researchers, data scientists, and regulatory experts run faster signal detection workflows.

    This dataset contains processed outputs from the FDA Adverse Event Reporting System (FAERS).
    It provides cleaned quarterly counts of drug–event pairs along with disproportionality metrics such as PRR (Proportional Reporting Ratio) and ROR (Reporting Odds Ratio).

    📊 Contents

    • faers_drug_event_counts.csv
      Raw counts of drug–event pairs per quarter.

    • faers_signals_prr_ror.csv
      Signal detection metrics (PRR, ROR) with thresholds applied.

    🧾 Schema

    faers_drug_event_counts.csv

    DRUGNAME_NORM: Normalized drug name
    QTR: Report quarter (YYYYQn)
    PT_NORM: MedDRA Preferred Term (adverse event)
    n_reports: Number of case reports
    quarter_folder: Source folder of ASCII data

    faers_signals_prr_ror.csv

    DRUGNAME_NORM: Normalized drug name
    PT_NORM: MedDRA Preferred Term
    n_reports: Case counts
    PRR: Proportional Reporting Ratio
    ROR: Reporting Odds Ratio
    PRR_signal: Boolean flag if PRR > threshold
    ROR_signal: Boolean flag if ROR > threshold
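
    For reference, the standard disproportionality formulas behind the PRR and ROR columns can be computed from a 2x2 contingency table as sketched below; the counts are invented and this is not the exact code used to build the file.

    # a: target drug with target event, b: target drug with other events,
    # c: other drugs with target event,  d: other drugs with other events.
    a, b, c, d = 120, 4880, 950, 194050                    # made-up illustration counts

    prr = (a / (a + b)) / (c / (c + d))                    # Proportional Reporting Ratio
    ror = (a / b) / (c / d)                                # Reporting Odds Ratio
    print(f"PRR={prr:.2f}, ROR={ror:.2f}")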

    🚀 Example Usage

    import pandas as pd
    
    # Load drug-event counts
    counts = pd.read_csv("/kaggle/input/faers-signals/faers_drug_event_counts.csv")
    
    # Top 10 drugs by number of reports
    print(counts.groupby("DRUGNAME_NORM")["n_reports"].sum().nlargest(10))
    
    # Load signals
    signals = pd.read_csv("/kaggle/input/faers-signals/faers_signals_prr_ror.csv")
    
    # Find signals for Metformin
    metformin_signals = signals[signals["DRUGNAME_NORM"] == "METFORMIN"]
    print(metformin_signals.head())
    
    
    📌 Citation
    
    If you use this dataset, please cite:
    
    FDA FAERS (2024–2025). Processed by anurmi.
    Data source: U.S. Food & Drug Administration (public domain).
    
    🔖 Tags
    
    pharmacovigilance adverse-events drug-safety FDA healthcare pharmacology signal-detection medical-data public-health time-series
    
  18. Python Energy Microscope: Benchmarking 5 Execution

    • kaggle.com
    zip
    Updated Jun 18, 2025
    Cite
    Md. Fatin Shadab Turja (2025). Python Energy Microscope: Benchmarking 5 Execution [Dataset]. https://www.kaggle.com/datasets/fatinshadab/python-energy-microscope-dataset
    Explore at:
    Available download formats: zip (176065 bytes)
    Dataset updated
    Jun 18, 2025
    Authors
    Md. Fatin Shadab Turja
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Description

    This dataset was created as part of the research project “Python Under the Microscope: A Comparative Energy Analysis of Execution Methods” (2025). The study explores the environmental sustainability of Python software by benchmarking five execution strategies—CPython, PyPy, Cython, ctypes, and py_compile—across 15 classical algorithmic workloads.

    Purpose & Motivation

    With energy and carbon efficiency becoming critical in modern computing, this dataset aims to:

    Quantify execution time, CPU energy usage, and carbon emissions

    Enable reproducible analysis of performance–sustainability trade-offs

    Introduce and validate the GreenScore, a composite metric for sustainability-aware software evaluation

    Data Collection & Tools

    All benchmarks were executed on a controlled laptop environment (Intel Core i5-1235U, Linux 6.8). Energy was measured via Intel RAPL counters using the pyRAPL library. Carbon footprint was estimated using a conversion factor of 0.000475 gCO₂ per joule based on regional electricity intensity.

    Each algorithm–method pair was run 50 times, capturing robust statistics for energy (μJ), time (s), and derived CO₂ emissions.
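
    As a worked illustration of the derived CO₂ values, the stated factor converts a measured energy value to grams of CO₂ as follows (a sketch, not the project's measurement code):

    GRAMS_CO2_PER_JOULE = 0.000475          # regional electricity intensity stated above

    def co2_grams(energy_uj: float) -> float:
        """Convert an energy reading in microjoules to estimated grams of CO2."""
        return (energy_uj / 1_000_000) * GRAMS_CO2_PER_JOULE

    print(co2_grams(2_500_000))             # 2.5 J -> 0.0011875 g CO2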

    Dataset Structure Overview

    Per-method folders (cpython/, pypy/, etc.) contain raw energy/ and time/ CSV files for all 15 benchmarks (50 trials each), as well as mean summaries.

    Aggregate folder includes combined metric comparisons, normalized data, and carbon footprint estimations.

    Analysis folder contains derived datasets: normalized scores, standard deviation, and the final GreenScore rankings used in our paper.

    Usage

    This dataset is ideal for:

    Reproducible software sustainability studies

    Benchmarking Python execution strategies

    Analyzing energy–performance–carbon trade-offs

    Validating green metrics and measurement tools

    Researchers and practitioners are encouraged to use, extend, and cite this dataset in sustainability-aware software design.

  19. GWPZ La Dorada

    • kaggle.com
    zip
    Updated Aug 30, 2025
    Cite
    Lucas Iturriago (2025). GWPZ La Dorada [Dataset]. https://www.kaggle.com/datasets/lucasiturriago/gwpz-la-dorada
    Explore at:
    Available download formats: zip (107332771 bytes)
    Dataset updated
    Aug 30, 2025
    Authors
    Lucas Iturriago
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    La Dorada
    Description

    Dataset Description

    This dataset contains geospatial and remote sensing data for the La Dorada area, Colombia, prepared for deep learning tasks (e.g., water potential mapping using Conv1D/MLP models). The dataset is stored in NPZ format for easy loading with NumPy and TensorFlow.

    Data Structure

    The dataset consists of four main components:

    1. Variables (x_var)

    • Description: Spatially distributed variables representing environmental, geological, and anthropogenic factors.
    • Shape: [3584, 1097, 10] (rows, columns, channels)
    • Channels:

      1. Gravimetry
      2. Drainage
      3. Urban centers
      4. Geological faults
      5. Lakes
      6. Wetlands
      7. Red Edge 1
      8. Red Edge 2
      9. Red Edge 3
      10. NIR (Near Infrared)
    • Normalization: All values are normalized to the range [0, 1].

    2. Images (x_img)

    • Description: True-color Sentinel-2 imagery (RGB).
    • Shape: [3584, 1097, 3, 1] (rows, columns, channels, extra dimension for Conv1D compatibility)
    • Channels: Red, Green, Blue
    • Normalization: [0, 1]

    3. Categorical Layers (x_cat)

    • Description: Categorical data representing vegetation cover.
    • Shape: [3584, 1097, 1] (rows, columns, channels)
    • Channels: Vegetation cover categories (encoded as numeric codes)
    • Data type: int32
    • Normalization: Not normalized (encoded categories)

    4. Target (y)

    • Description: Water points heatmap derived from geospatial measurements.
    • Shape: [3584, 1097, 1] (rows, columns, channels)
    • Normalization: [0, 1]

    NPZ Pipeline

    The NPZ file contains all arrays in their original shapes:

    import numpy as np
    
    data = np.load("dataset_ladorada.npz")
    x_var = data["x_var"]  # shape: (3584, 1097, 10)
    x_img = data["x_img"]  # shape: (3584, 1097, 3, 1)
    x_cat = data["x_cat"]  # shape: (3584, 1097, 1)
    y = data["y"]      # shape: (3584, 1097, 1)
    
    • You can load the arrays directly and feed them into your TensorFlow or PyTorch pipelines.
    • Each array preserves its dtype for compatibility: float32 for continuous variables and target, int32 for categorical data.

    Notes

    • All arrays have been resized to [3584, 1097] for consistency.
    • The dataset is designed for pixel-wise prediction, where each pixel represents a spatial location.
    • Input variables include both continuous (numeric) and categorical features.
    • The extra dimension in x_img ([..., 1]) allows direct usage in Conv1D layers without reshaping.
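
    A short assumed example of preparing pixel-wise samples from the arrays above: flatten the spatial grid so each pixel becomes one feature row paired with its target value.

    import numpy as np

    data = np.load("dataset_ladorada.npz")
    x_var, y = data["x_var"], data["y"]                 # (3584, 1097, 10) and (3584, 1097, 1)

    n_pixels = x_var.shape[0] * x_var.shape[1]
    X = x_var.reshape(n_pixels, -1)                     # (3931648, 10) feature matrix
    t = y.reshape(n_pixels)                             # matching target vector
    print(X.shape, t.shape)
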
  20. Student Performance and Learning Behavior Dataset

    • kaggle.com
    zip
    Updated Sep 4, 2025
    Cite
    Adil Shamim (2025). Student Performance and Learning Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/adilshamim8/student-performance-and-learning-style
    Explore at:
    Available download formats: zip (78897 bytes)
    Dataset updated
    Sep 4, 2025
    Authors
    Adil Shamim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides a comprehensive view of student performance and learning behavior, integrating academic, demographic, behavioral, and psychological factors.

    It was created by merging two publicly available Kaggle datasets, resulting in a unified dataset of 14,003 student records with 16 attributes. All entries are anonymized, with no personally identifiable information.

    Key Features

    • Study behaviors & engagement: StudyHours, Attendance, Extracurricular, AssignmentCompletion, OnlineCourses, Discussions
    • Resources & environment: Resources, Internet, EduTech
    • Motivation & psychology: Motivation, StressLevel
    • Demographics: Gender, Age (18–30 years)
    • Learning preference: LearningStyle
    • Performance indicators: ExamScore, FinalGrade

    Objectives & Use Cases

    The dataset can be used for:

    • Predictive modeling → Regression/classification of student performance (ExamScore, FinalGrade)
    • Clustering analysis → Identifying learning behavior groups with K-Means or other unsupervised methods
    • Educational analytics → Exploring how study habits, stress, and motivation affect outcomes
    • Adaptive learning research → Linking behavioral patterns to personalized learning pathways

    Analysis Pipeline (from original study)

    The dataset was analyzed in Python using:

    • Preprocessing → Encoding, normalization (z-score, Min–Max), deduplication
    • Clustering → K-Means, Elbow Method, Silhouette Score, Davies–Bouldin Index
    • Dimensionality Reduction → PCA (2D/3D visualizations)
    • Statistical Analysis → ANOVA, regression for group differences
    • Interpretation → Mapping clusters to LearningStyle categories & extracting insights for adaptive learning
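
    A compact sketch of that pipeline with scikit-learn; the selected feature columns are taken from the key features listed above and may need adjusting to the actual headers in merged_dataset.csv.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from sklearn.decomposition import PCA

    df = pd.read_csv("merged_dataset.csv")
    X = StandardScaler().fit_transform(df[["StudyHours", "Attendance", "ExamScore"]].dropna())  # z-scores

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("silhouette:", silhouette_score(X, labels))

    coords = PCA(n_components=2).fit_transform(X)       # 2D projection for visualization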

    File

    • merged_dataset.csv → 14,003 rows × 16 columns. Includes student demographics, behaviors, engagement, learning styles, and performance indicators.

    Provenance

    This dataset is an excellent playground for educational data mining — from clustering and behavioral analytics to predictive modeling and personalized learning applications.
