15 datasets found
  1. seaborn_tips_dataset

    • kaggle.com
    Updated Apr 2, 2018
    Cite
    Ranjeet Jain (2018). seaborn_tips_dataset [Dataset]. https://www.kaggle.com/datasets/ranjeetjain3/seaborn-tips-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 2, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ranjeet Jain
    Description

    Dataset

    This dataset was created by Ranjeet Jain


  2. Bank Data Analysis

    • kaggle.com
    Updated Mar 19, 2022
    Cite
    Steve Gallegos (2022). Bank Data Analysis [Dataset]. https://www.kaggle.com/stevegallegos/bank-marketing-data-set/code
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 19, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Steve Gallegos
    Description

    Data Set Information

    The bank.csv dataset describes phone-call contacts between customers and the customer-care staff of a Portuguese banking institution. Each record captures whether the customer took up a product such as a bank term deposit; most fields hold 'yes' or 'no' values.

    Goal

    The main goal is to predict if clients will subscribe to a term deposit or not.

    Attribute Information

    Input Variables

    Bank Client Data:
    • 1 - age: (numeric)
    • 2 - job: type of job (categorical: admin., blue-collar, entrepreneur, housemaid, management, retired, self-employed, services, student, technician, unemployed, unknown)
    • 3 - marital: marital status (categorical: divorced, married, single, unknown; note: divorced means either divorced or widowed)
    • 4 - education: (categorical: basic.4y, basic.6y, basic.9y, high.school, illiterate, professional.course, university.degree, unknown)
    • 5 - default: has credit in default? (categorical: no, yes, unknown)
    • 6 - housing: has housing loan? (categorical: no, yes, unknown)
    • 7 - loan: has personal loan? (categorical: no, yes, unknown)

    Related with the Last Contact of the Current Campaign:
    • 8 - contact: contact communication type (categorical: cellular, telephone)
    • 9 - month: last contact month of year (categorical: jan, feb, mar, ..., nov, dec)
    • 10 - day_of_week: last contact day of the week (categorical: mon, tue, wed, thu, fri)
    • 11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

    Other Attributes:
    • 12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
    • 13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
    • 14 - previous: number of contacts performed before this campaign and for this client (numeric)
    • 15 - poutcome: outcome of the previous marketing campaign (categorical: failure, nonexistent, success)

    Social and Economic Context Attributes:
    • 16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
    • 17 - cons.price.idx: consumer price index - monthly indicator (numeric)
    • 18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
    • 19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
    • 20 - nr.employed: number of employees - quarterly indicator (numeric)

    Output Variable (Desired Target):
    • 21 - y (deposit): has the client subscribed a term deposit? (binary: yes, no). The column title was changed from 'y' to 'deposit'.
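
    As a minimal illustration of the stated goal, the sketch below loads bank.csv with pandas and fits a logistic regression to predict the deposit column. The file path, separator, and preprocessing are assumptions based on the listing above (adjust them to your copy of the data), and duration is dropped as the note on attribute 11 recommends for a realistic model.

    ```python
    # Minimal sketch, assuming the file is available locally as "bank.csv"
    # with the columns listed above and the target column named "deposit".
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("bank.csv")                          # adjust path/separator to your copy
    df = df.drop(columns=["duration"], errors="ignore")   # drop per the note on attribute 11

    X = pd.get_dummies(df.drop(columns=["deposit"]), drop_first=True)
    y = (df["deposit"] == "yes").astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    ```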

    Source

    [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

  3. ‘Waiter's Tips Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Waiter's Tips Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-waiter-s-tips-dataset-b284/7835f609/?iid=004-884&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Waiter's Tips Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/aminizahra/tips-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    One waiter recorded information about each tip he received over a period of a few months working in one restaurant. In all he recorded 244 tips.

    Acknowledgements

    The data was reported in a collection of case studies for business statistics.

    Bryant, P. G. and Smith, M (1995) Practical Data Analysis: Case Studies in Business Statistics. Homewood, IL: Richard D. Irwin Publishing

    The dataset is also available through the Python package Seaborn.

    Hint

    Note that this version of the dataset has additional columns compared to the standard tips dataset.

    Dataset info

    RangeIndex: 244 entries, 0 to 243
    Data columns (total 11 columns):
     #   Column            Non-Null Count  Dtype
     0   total_bill        244 non-null    float64
     1   tip               244 non-null    float64
     2   sex               244 non-null    object
     3   smoker            244 non-null    object
     4   day               244 non-null    object
     5   time              244 non-null    object
     6   size              244 non-null    int64
     7   price_per_person  244 non-null    float64
     8   Payer Name        244 non-null    object
     9   CC Number         244 non-null    int64
     10  Payment ID        244 non-null    object
    dtypes: float64(3), int64(2), object(6)

    Some details

    total_bill a numeric vector, the bill amount (dollars)

    tip a numeric vector, the tip amount (dollars)

    sex a factor with levels Female Male, gender of the payer of the bill

    Smoker a factor with levels No Yes, whether the party included smokers

    day a factor with levels Friday Saturday Sunday Thursday, day of the week

    time a factor with levels Day Night, rough time of day

    size a numeric vector, number of people in the party

    --- Original source retains full ownership of the source dataset ---
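
    As a small sketch of working with this file: the standard seaborn copy (seaborn.load_dataset("tips")) has only the first seven columns, so the extended Kaggle version is read with pandas instead. The file name below is an assumption.

    ```python
    # Minimal sketch, assuming the Kaggle file is saved locally as "tips.csv".
    import pandas as pd

    df = pd.read_csv("tips.csv")
    df.info()   # should report 244 rows and the 11 columns listed above
    print(df[["total_bill", "tip", "size"]].describe())
    ```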

  4. Hepatocellular Carcinoma TCGA-LIHC Mutation Information CSV...

    • figshare.com
    txt
    Updated Oct 17, 2021
    + more versions
    Cite
    Tane Kim (2021). Hepatocellular Carcinoma TCGA-LIHC Mutation Information CSV (TCGA_LIHC_Mutation_Input.csv) [Dataset]. http://doi.org/10.6084/m9.figshare.16822900.v2
    Explore at:
    txt (available download formats)
    Dataset updated
    Oct 17, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tane Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: While the acquired risk factors of liver cancer in Asia are relatively well understood, the underlying genetic background of liver cancer in Asians has not been well established or correlated with clinical outcomes. Objective: To identify gene mutations linked with worse outcomes in Asian patients with hepatocellular carcinoma (HCC). Methods: A total of 347 Asian and Non-Asian patients with HCC were analyzed in this study. TCGA patient mutation and clinical data were downloaded through TCGAbiolinksGUI and analyzed using the Python NumPy, Matplotlib, seaborn, and SciPy libraries. Statistical significance was determined by Welch’s t-test (unequal variances t-test), with P-values < 0.05 considered to be statistically significant. Results: Mutations in five genes (TP53, TTN, OBSCN, MUC5B, CSMD1) were statistically linked with increased mortality in Asians compared to non-Asians, four of which (TTN, OBSCN, MUC5B, CSMD1) were also more prevalent in the Asian population. Within the Asian cohort, two gene mutations (TTN, HMCN1) were statistically linked with worse outcomes. The TP53 mutation predicts worse outcomes within the non-Asian cohort, but not within the Asian cohort. Conclusions: This study identified multiple genetic biomarkers that can aid in the recognition, surveillance, prognosis, and gene therapy of HCC.
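
    For orientation, the statistical comparison described above (Welch's unequal-variances t-test with P < 0.05) can be reproduced in outline with SciPy; the arrays below are placeholders, not the study's data.

    ```python
    # Sketch of a Welch's (unequal-variances) t-test as used in the study; placeholder data only.
    import numpy as np
    from scipy import stats

    group_a = np.array([12.0, 18.5, 7.2, 24.1, 9.8])          # illustrative values
    group_b = np.array([30.2, 27.5, 15.9, 40.3, 22.7, 35.1])  # illustrative values

    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
    ```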

  5. ‘A Waiter's Tips’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘A Waiter's Tips’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-a-waiter-s-tips-8e84/83b2a987/?iid=009-810&v=presentation
    Explore at:
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘A Waiter's Tips’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jsphyg/tipping on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    One waiter recorded information about each tip he received over a period of a few months working in one restaurant. In all he recorded 244 tips.

    Can you predict the tip amount?

    Acknowledgements

    The data was reported in a collection of case studies for business statistics.

    Bryant, P. G. and Smith, M (1995) Practical Data Analysis: Case Studies in Business Statistics. Homewood, IL: Richard D. Irwin Publishing

    The dataset is also available through the Python package Seaborn.

    --- Original source retains full ownership of the source dataset ---

  6. Data from: Code for: Experimental Investigations of the Flow-Following...

    • darus.uni-stuttgart.de
    Updated Mar 13, 2023
    Cite
    Sebastian Hofmann; Ryan Rautenbach (2023). Code for: Experimental Investigations of the Flow-Following Capabilities and Hydrodynamic Characteristics of Lagrangian Sensor Particles With Respect to Their Centre of Mass [Dataset]. http://doi.org/10.18419/DARUS-3314
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 13, 2023
    Dataset provided by
    DaRUS
    Authors
    Sebastian Hofmann; Ryan Rautenbach
    License

    GNU GPL 3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html

    Dataset funded by
    DFG
    Description

    Data for 2D Lagrangian Particle tracking and evaluation of their hydrodynamic characteristics.

    Abstract

    This dataset contains Python code for the fluid-mechanic evaluation of Lagrangian Particles with the "Consensus-Based tracking with Selective Rejection of Tracklets" (CSRT) algorithm in the "OpenCV" library, written by Ryan Rautenbach in the framework of his Master thesis.

    Workflow for Lagrangian Particle tracking and evaluation via OpenCV

    Below is a brief introduction and guide based on the folders in the repository. More code-specific instructions can be found in the respective codes.

    • working_env_RMR.yml --> Contains the entire environment, including software versions (here used with the Spyder IDE and Conda), with which the datasets were evaluated.
    • 01 --> The tracking always begins with the 01_milti[...] folder, in which the Python code with the OpenCV algorithm is located. For the tracking to work, certain directories are required: one in which the raw images are stored (separate from anything else) and one in which the results are saved (not the same directory as the raw data). After tracking is completed for all respective experiments and the results directories are adequately labelled and stored, any of the other code files can be used for the respective analyses. The order of folders beyond the first 01 directory has no relevance to the order of evaluation, but following it can ease the understanding of the evaluated data.
    • 02 --> Evaluation of the number of circulations and the respective circulation time in the experimental vat. (The code can be extended to calculate the circulation time based on the various planes that are artificially set.)
    • 03 --> Code for calculating the number of contacts with the vat floor. The code requires certain visual evaluations based on the LP trajectories, as the plane/barrier for the contact evaluation has to be set manually.
    • 04 --> Contains two codes that can be applied to the results data to combine individual results into larger, more processable arrays within Python.
    • 05 --> Contains the code to plot the trajectory of single experiments of Lagrangian particles based on their positional results and the velocity at the respective position, highlighting the trajectory over the experiment.
    • 06 --> Codes to create 1D histograms based on the probability density distribution and velocity distributions in cumulative experiments.
    • 07 --> Codes for plotting the 2D probability density distribution (2D histograms) of Lagrangian Particles based on the cumulative experiments. The code provides values for the 2D grid; plotting is conducted in Origin Lab or similar graphing tools. Graphing can also be conducted in Python, whereby the seaborn (matplotlib) library is suggested.
    • 08 --> Contains the code for the dimensionless evaluation of the results based on the respective Stokes number approaches and weighted averages. 2D histograms are also vital to this evaluation, whereby the plotting is again conducted in Origin Lab, as values are only calculated in code.
    • 09 --> This directory does not contain any Python codes; instead it contains the respective Origin Lab files for the graphing, plotting, and evaluation of results calculated via Python. Respective tables, histograms, and heat maps are given to be used as templates if necessary.
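
    For orientation, the generic OpenCV CSRT tracking loop that a script like the one in folder 01 typically builds on is sketched below; the video path and initial bounding box are placeholders, and this is not the repository's exact code.

    ```python
    # Generic OpenCV CSRT tracking loop (illustrative only; not the repository's script).
    import cv2

    cap = cv2.VideoCapture("experiment.avi")   # placeholder path to the recorded raw images/video
    ok, frame = cap.read()

    bbox = (100, 100, 40, 40)                  # placeholder (x, y, w, h) around the sensor particle
    tracker = cv2.TrackerCSRT_create()         # cv2.legacy.TrackerCSRT_create() in some OpenCV 4.x builds
    tracker.init(frame, bbox)

    positions = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, bbox = tracker.update(frame)
        if found:
            x, y, w, h = bbox
            positions.append((x + w / 2, y + h / 2))   # particle centre per frame

    cap.release()
    print(f"tracked {len(positions)} frames")
    ```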

  7. Apple Leaf Disease Detection Using Vision Transformer

    • zenodo.org
    text/x-python
    Updated Jun 20, 2025
    Cite
    Amreen Batool (2025). Apple Leaf Disease Detection Using Vision Transformer [Dataset]. http://doi.org/10.5281/zenodo.15702007
    Explore at:
    text/x-python (available download formats)
    Dataset updated
    Jun 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amreen Batool
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a Python script for classifying apple leaf diseases using a Vision Transformer (ViT) model. The dataset used is the Plant Village dataset, which contains images of apple leaves with four classes: Healthy, Apple Scab, Black Rot, and Cedar Apple Rust. The script includes data preprocessing, model training, and evaluation steps.


    Introduction

    The goal of this project is to classify apple leaf diseases using a Vision Transformer (ViT) model. The dataset is divided into four classes: Healthy, Apple Scab, Black Rot, and Cedar Apple Rust. The script includes data preprocessing, model training, and evaluation steps.

    Code Explanation

    1. Importing Libraries

    • The script starts by importing necessary libraries such as matplotlib, seaborn, numpy, pandas, tensorflow, and sklearn. These libraries are used for data visualization, data manipulation, and building/training the deep learning model.

    2. Visualizing the Dataset

    • The walk_through_dir function is used to explore the dataset directory structure and count the number of images in each class.
    • The dataset is divided into Train, Val, and Test directories, each containing subdirectories for the four classes.

    3. Data Augmentation

    • The script uses ImageDataGenerator from Keras to apply data augmentation techniques such as rotation, horizontal flipping, and rescaling to the training data. This helps in improving the model's generalization ability.
    • Separate generators are created for training, validation, and test datasets.
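
    A minimal sketch of such an augmentation setup is shown below; the directory names follow the Train/Val layout described above, while the specific augmentation values are assumptions rather than the script's exact settings.

    ```python
    # Illustrative Keras augmentation setup (parameter values are assumptions).
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(rescale=1.0 / 255,
                                       rotation_range=20,
                                       horizontal_flip=True)
    val_datagen = ImageDataGenerator(rescale=1.0 / 255)   # no augmentation for validation images

    train_gen = train_datagen.flow_from_directory("Train", target_size=(224, 224),
                                                  batch_size=32, class_mode="categorical")
    val_gen = val_datagen.flow_from_directory("Val", target_size=(224, 224),
                                              batch_size=32, class_mode="categorical")
    ```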

    4. Patch Visualization

    • The script defines a Patches layer that extracts patches from the images. This is a crucial step in Vision Transformers, where images are divided into smaller patches that are then processed by the transformer.
    • The script visualizes these patches for different patch sizes (32x32, 16x16, 8x8) to understand how the image is divided.
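
    A common way to implement such a Patches layer uses tf.image.extract_patches, as sketched below; this illustrates the idea rather than reproducing the script's exact class.

    ```python
    # Sketch of a ViT-style patch-extraction layer (illustrative, not the script's exact code).
    import tensorflow as tf

    class Patches(tf.keras.layers.Layer):
        def __init__(self, patch_size):
            super().__init__()
            self.patch_size = patch_size

        def call(self, images):
            batch_size = tf.shape(images)[0]
            patches = tf.image.extract_patches(
                images=images,
                sizes=[1, self.patch_size, self.patch_size, 1],
                strides=[1, self.patch_size, self.patch_size, 1],
                rates=[1, 1, 1, 1],
                padding="VALID",
            )
            # Flatten the spatial grid into a sequence: (batch, num_patches, patch_dim)
            patch_dims = patches.shape[-1]
            return tf.reshape(patches, [batch_size, -1, patch_dims])

    # e.g. a 224x224 RGB image with 16x16 patches -> 196 patches of length 768
    print(Patches(16)(tf.random.uniform((1, 224, 224, 3))).shape)
    ```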

    5. Model Training

    • The script defines a Vision Transformer (ViT) model using TensorFlow and Keras. The model is compiled with the Adam optimizer and categorical cross-entropy loss.
    • The model is trained for a specified number of epochs, and the training history is stored for later analysis.

    6. Model Evaluation

    • After training, the model is evaluated on the test dataset. The script generates a confusion matrix and a classification report to assess the model's performance.
    • The confusion matrix is visualized using seaborn to provide a clear understanding of the model's predictions.

    7. Visualizing Misclassified Images

    • The script includes functionality to visualize misclassified images, which helps in understanding where the model is making errors.

    8. Fine-Tuning and Learning Rate Adjustment

    • The script demonstrates how to fine-tune the model by adjusting the learning rate and re-training the model.

    Steps for Implementation

    1. Dataset Preparation

      • Ensure that the dataset is organized into Train, Val, and Test directories, with each directory containing subdirectories for each class (Healthy, Apple Scab, Black Rot, Cedar Apple Rust).
    2. Install Required Libraries

      • Install the necessary Python libraries using pip:
        pip install tensorflow matplotlib seaborn numpy pandas scikit-learn
    3. Run the Script

      • Execute the script in a Python environment. The script will automatically:
        • Load and preprocess the dataset.
        • Apply data augmentation.
        • Train the Vision Transformer model.
        • Evaluate the model and generate performance metrics.
    4. Analyze Results

      • Review the confusion matrix and classification report to understand the model's performance.
      • Visualize misclassified images to identify potential areas for improvement.
    5. Fine-Tuning

      • Experiment with different patch sizes, learning rates, and data augmentation techniques to improve the model's accuracy.
  8. Replication Kit: "Are Unit and Integration Test Definitions Still Valid for...

    • zenodo.org
    • explore.openaire.eu
    application/gzip, bin
    Updated Jan 24, 2020
    Cite
    Fabian Trautsch; Steffen Herbold; Jens Grabowski (2020). Replication Kit: "Are Unit and Integration Test Definitions Still Valid for Modern Java Projects? An Empirical Study on Open-Source Projects" [Dataset]. http://doi.org/10.5281/zenodo.1415334
    Explore at:
    application/gzip, bin (available download formats)
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fabian Trautsch; Steffen Herbold; Jens Grabowski
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Replication Kit for the Paper "Are Unit and Integration Test Definitions Still Valid for Modern Java Projects? An Empirical Study on Open-Source Projects"
    This additional material shall provide other researchers with the ability to replicate our results. Furthermore, we want to facilitate further insights that might be generated based on our data sets.

    Structure
    The structure of the replication kit is as follows:

    • additional_visualizations: contains additional visualizations (Venn diagrams) for each project and each of the data sets that we used
    • data_analysis: contains two python scripts that we used to analyze our raw data (one for each research question)
    • data_collection_tools: contains all source code used for the data collection, including the used versions of the COMFORT framework, the BugFixClassifier, and the used tools of the SmartSHARK environment;
    • mongodb_no_authors: Archived dump of our MongoDB that we created by executing our data collection tools. The "comfort" database can be restored via the mongorestore command.


    Additional Visualizations
    We provide two additional visualizations for each project:

    1) For each of the data sets there is one visualization per project showing four Venn diagrams, one for each of the different defect types. These Venn diagrams show the number of defects that were detected by either unit or integration tests (or both).

    2) Boxplots for each of the data sets (i.e., ALL and DISJ) showing the scores of unit and integration tests for each defect type.


    Analysis scripts
    Requirements:
    - python3.5
    - tabulate
    - scipy
    - seaborn
    - mongoengine
    - pycoshark
    - pandas
    - matplotlib

    Both python files contain all code for the statistical analysis we performed.

    Data Collection Tools
    We provide all data collection tools that we have implemented and used throughout our paper. Overall it contains six different projects and one python script:

    • BugFixClassifier: Used to classify our defects.
    • comfort-core: Core of the comfort framework. Used to classify our tests into unit and integration tests and calculate different metrics for these tests.
    • comfort-jacoco-listner: Used to intercept the coverage collection process as we were executing the tests of our case study projects.
    • issueSHARK: Used to collect data from the ITSs of the projects.
    • pycoSHARK: Library that contains the models for the ORM mapper used inside the SmartSHARK environment.
    • vcsSHARK: Used to collect data from the VCSs of the projects.

  9. Agent-Based Reinforcement Learning Model of Burglary (ARLMB) datasets for...

    • figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Sedar Olmez (2023). Agent-Based Reinforcement Learning Model of Burglary (ARLMB) datasets for article: Learning the Rational Choice Perspective: A Reinforcement Learning Approach to Simulating Offender Behaviours in Criminological Agent-Based Models [Dataset]. http://doi.org/10.6084/m9.figshare.20418735.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sedar Olmez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data deposit contains synthetically generated crime data from the Agent-Based Reinforcement Learning Model of Burglary developed for the research article: Learning the Rational Choice Perspective: A Reinforcement Learning Approach to Simulating Offender Behaviours in Criminological Agent-Based Models

    The data directory is as follows:

    Model/

    • Data_Analysis_Notebook.ipynb
    • MC1_Data
    • MC2_Data
    • MC3_Data

    The Data_Analysis_Notebook.ipynb is the jupyter notebook used to produce the analysis within the article. This notebook requires python 3.* with packages such as matplotlib, seaborn, numpy, pandas, plotly, scipy to run.

    The MC1, MC2 and MC3 folders contain the .txt files with the data outputs used for analysis in the article, where MC1 corresponds to Experiment Condition 1 in the article.

    Each column of the data is described as follows:

    • AgentID: A unique agent identifier.
    • Action: The current action an agent has chosen; can be one of [OFFEND, DON'T OFFEND, MOVE].
    • Area: The locality in which the above action has taken place.
    • Target_Attractiveness: The target attractiveness value of the property that has been victimised.
    • Target_Reward: The reward at the property that has been victimised.
    • Target_Risk: The risk surrounding the property that has been victimised.
    • Target_Effort: The effort of the property victimised by the specific offender agent.
    • Total_Cumulative_Reward: The total sum of Target_Attractiveness acquired by the offender agent.
    • xAxisPos: The x-axis position of the cell the offender agent is currently in.
    • zAxisPos: The y-axis position of the cell the offender agent is currently in.
    • Zone_Travelled_To: The locality the offender agent is currently travelling towards.
    • Episode: The current episode.
    • Distance_To_Home: The normalised Euclidean distance to the offender agent's home node from the current victimised target.
    • Distance_To_Next_Node: The normalised Euclidean distance to the next routine activity node from the current victimised target.
    • Timestep: The current discrete time point.
    • Target_Cumulative_Reward: The total amount of Target_Attractiveness the offender agent wants to achieve.
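
    As a minimal sketch of working with these outputs, the documented columns can be loaded with pandas and summarised per episode; the file name and delimiter below are assumptions, so adjust them to the actual .txt files in MC1_Data etc.

    ```python
    # Minimal sketch; "MC1_Data/output.txt" and the delimiter are placeholders.
    import pandas as pd

    df = pd.read_csv("MC1_Data/output.txt", sep=",")

    # Count how often each action (OFFEND, DON'T OFFEND, MOVE) occurs per episode
    action_counts = df.groupby(["Episode", "Action"]).size().unstack(fill_value=0)
    print(action_counts.head())
    ```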

  10. Supplementary material: Burial Analysis on the Middle Bronze Age in the...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 4, 2024
    Cite
    Laabs, Julian (2024). Supplementary material: Burial Analysis on the Middle Bronze Age in the Carpathian Basin (dataset and scripts) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7355008
    Explore at:
    Dataset updated
    Dec 4, 2024
    Dataset authored and provided by
    Laabs, Julian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pannonian Basin
    Description

    This is the supplementary material of the paper "Wealth Consumption, Sociopolitical Organization, and Change: A Perspective from Burial Analysis on the Middle Bronze Age in the Carpathian Basin" (accessible via doi: https://doi.org/10.1515/opar-2022-0281). Please consult the publication for an in-depth description of the data and its context, for the methods applied to the data, and for references to primary sources. The data tables comprise the burial data of the Hungarian Middle Bronze Age cemeteries of Dunaújváros-Duna-dűlő, Dömsöd, Adony, Lovasberény, Csanytelek-Palé, Kelebia, Hernádkak, Gelej, Pusztaszikszó and Streda nad Bodrogom. The script "supplementary_material_2_wealth_index_calculation.py" calculates a wealth index, based on grave goods, for the provided data. The script "supplementary_material_3_population_estimation.py" models the living population of Dunaújváros-Duna-dűlő. Both can be run by double-clicking. Requirements for running the scripts: Python 3 (https://www.python.org/) with the packages numpy (https://numpy.org/), pandas (https://pandas.pydata.org/), matplotlib (https://matplotlib.org/), seaborn (https://seaborn.pydata.org/) and scipy (https://scipy.org/); all are included in Anaconda (a Python distribution, https://www.anaconda.com/).

  11. A Waiter's Tips

    • kaggle.com
    Updated Mar 12, 2019
    Cite
    Joe Young (2019). A Waiter's Tips [Dataset]. https://www.kaggle.com/jsphyg/tipping/kernels
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 12, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Joe Young
    Description

    Context

    Ever wonder how much to tip your waiter? One dedicated waiter meticulously recorded information about each tip he received over a few months while working at a restaurant. In total, he documented 244 tips. Now, the challenge is: can you predict the tip amount?

    Stacking and ensembling techniques seem to work wonders with this dataset!
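
    As a hedged starting point for that challenge, the sketch below fits a small stacked ensemble on the seaborn copy of the same 244 records; it is an illustration, not a tuned solution.

    ```python
    # Sketch of a stacked ensemble predicting the tip amount (illustrative, not a tuned solution).
    import pandas as pd
    import seaborn as sns
    from sklearn.ensemble import RandomForestRegressor, StackingRegressor
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    tips = sns.load_dataset("tips")     # the same dataset, bundled with seaborn
    X = pd.get_dummies(tips.drop(columns=["tip"]), drop_first=True)
    y = tips["tip"]

    stack = StackingRegressor(
        estimators=[("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
                    ("ridge", Ridge())],
        final_estimator=Ridge(),
    )
    scores = cross_val_score(stack, X, y, cv=5, scoring="r2")
    print("cross-validated R^2:", round(scores.mean(), 3))
    ```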

    Acknowledgements

    This dataset is a treasure trove of information from a collection of case studies for business statistics. Special thanks to Bryant and Smith for their diligent work:

    Bryant, P. G. and Smith, M (1995) Practical Data Analysis: Case Studies in Business Statistics. Homewood, IL: Richard D. Irwin Publishing.

    You can also access this dataset now through the Python package Seaborn. Happy tipping (prediction)!

  12. Analysis of network performance when confirmed traffic is present in Long...

    • researchdata.up.ac.za
    zip
    Updated Feb 16, 2024
    Cite
    Jaco Marais; Gerhardus Hancke; Adnan Abu Mahfouz (2024). Analysis of network performance when confirmed traffic is present in Long Range Wide Area Networks (LoRaWANs) [Dataset]. http://doi.org/10.25403/UPresearchdata.22113050.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    University of Pretoria
    Authors
    Jaco Marais; Gerhardus Hancke; Adnan Abu Mahfouz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quantitative data for the figures, plus the graphing scripts, from the thesis titled 'Developing a congestion management scheme to reduce the impact of congestion in mixed traffic LoRaWANs'. The files contain the processed output of simulations conducted with a modified version of the ns-3 lorawan plugin; the processed output consists of Pandas dataframes stored in text files. Software used: ns-3 (version 3.30), Jupyter notebooks, and Python with the packages sem, pandas, seaborn, and a modified version of the lorawan module from signetlabdei. The Python scripts refer to Std and Ex: Std refers to the standard LoRaWAN module, and Ex refers to the extended version of the module with the algorithms presented in the thesis. Each text file contains a legend at the top describing all of the fields present in the dataframe.

  13. Data from: Can LLMs Replace Manual Annotation of Software Engineering...

    • zenodo.org
    pdf, text/x-python +1
    Updated Oct 10, 2024
    Cite
    Blinded; Blinded (2024). Can LLMs Replace Manual Annotation of Software Engineering Artifacts? [Dataset]. http://doi.org/10.5281/zenodo.13917054
    Explore at:
    zip, text/x-python, pdf (available download formats)
    Dataset updated
    Oct 10, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Blinded; Blinded
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Required Libraries

    The following libraries are required to run the scripts in this repository. You can install them using `pip`:

    ```bash
    pip install pandas numpy argparse json time random openai copy statistics krippendorff sklearn seaborn matplotlib together anthropic google-generativeai
    ```

    Make sure to also install any other dependencies required by the specific model API if you plan on using models like GPT-4 or Claude:

    • openai
    • anthropic
    • together

    All experiments were run using Python 3.10.11.

    For each dataset, we have a folder that contains process.py, heatmap.py, ira_sample.py. The folder also contains the relevant datasets and plots.

    File Description:

    1. data_result: This folder contains the file with the dataset and few-shot samples. After running process.py, all results are accumulated in the data_result folder. Note that this folder already contains all the data and model-generated results in .jsonl format files; you do not need to run process.py to regenerate them.
    2. Plots: This folder contains the generated plots, which can be reproduced by running heatmap.py and ira_sample.py.
    3. process.py: This file generates the results/annotations from the model based on the given parameters. The commands needed to run it are listed below. Note that you need API keys from different organizations to run the script; however, all model-generated results are already shared in the data_result folder.
    4. heatmap.py: Running this file generates the heatmaps presented in Figures 1-5 in the paper. The generated plots are stored in the "Plots" folder.
    5. ira_sample.py: Running this file generates the plots presented in Figures 7-10 in the paper. The generated plots are stored in the "Plots" folder.

    Commands for datasets (Except Code Summarization):

    Generating samples for different models:

    python process.py --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    For Figure (1-5):

    python heatmap.py

    For Figure (7-10):

    python ira_sample.py

    Commands for datasets (Code Summarization):

    python process.py --what accurate --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    For Figure (1-5):

    python heatmap.py

    For Figure (7-10):

    python ira_sample.py

    The --what parameter can be one of "accurate", "adequate", "concise", or "similarity".

    For Figure 6:

    python scatter.py

    For Figure 12 & 13, please copy majority.py and probability.py outside the shared folders.

    For Figure 12:

    python probability.py

    For Figure 13:

    python majority.py

    We also provided sample prompts from all datasets in Prompts.pdf

  14. Bird Migration Dataset (Data Visualization / EDA)

    • kaggle.com
    Updated Apr 23, 2025
    Cite
    Sahir Maharaj (2025). Bird Migration Dataset (Data Visualization / EDA) [Dataset]. https://www.kaggle.com/datasets/sahirmaharajj/bird-migration-dataset-data-visualization-eda/versions/1
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sahir Maharaj
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains 10,000 synthetic records simulating the migratory behavior of various bird species across global regions. Each entry represents a single bird tagged with a tracking device and includes detailed information such as flight distance, speed, altitude, weather conditions, tagging information, and migration outcomes.

    The data was entirely synthetically generated using randomized yet realistic values based on known ranges from ornithological studies. It is ideal for practicing data analysis and visualization techniques without privacy concerns or real-world data access restrictions. Because it’s artificial, the dataset can be freely used in education, portfolio projects, demo dashboards, machine learning pipelines, or business intelligence training.

    With over 40 columns, this dataset supports a wide array of analysis types. Analysts can explore questions like “Do certain species migrate in larger flocks?”, “How does weather impact nesting success?”, or “What conditions lead to migration interruptions?”. Users can also perform geospatial mapping of start and end locations, cluster birds by behavior, or build time series models based on migration months and environmental factors.

    For data visualization, tools like Power BI, Python (Matplotlib/Seaborn/Plotly), or Excel can be used to create insightful dashboards and interactive charts.
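
    As a small sketch of that kind of exploration in Python, the snippet below plots one distribution with seaborn; the file name and column names (e.g. Species, Flight_Distance_km) are hypothetical, since the exact schema is not listed here, so substitute the real column names from the downloaded CSV.

    ```python
    # Illustrative only: "bird_migration.csv", "Species", and "Flight_Distance_km" are hypothetical names.
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_csv("bird_migration.csv")

    sns.boxplot(data=df, x="Species", y="Flight_Distance_km")
    plt.xticks(rotation=45, ha="right")
    plt.title("Migration distance by species")
    plt.tight_layout()
    plt.show()
    ```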

    Join the Fabric Community DataViz Contest | May 2025: https://community.fabric.microsoft.com/t5/Power-BI-Community-Blog/%EF%B8%8F-Fabric-Community-DataViz-Contest-May-2025/ba-p/4668560

  15. Dataset for "Fmmgen: Automatic Code Generation of Operators for Cartesian...

    • zenodo.org
    • eprints.soton.ac.uk
    application/gzip
    Updated May 25, 2020
    Cite
    Ryan Alexander Pepper; Hans Fangohr (2020). Dataset for "Fmmgen: Automatic Code Generation of Operators for Cartesian Fast Multipole and Barnes-Hut Methods" [Dataset]. http://doi.org/10.5281/zenodo.3842584
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    May 25, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ryan Alexander Pepper; Hans Fangohr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository provides the dataset, build and run scripts for the paper "Fmmgen: Automatic Code Generation of Cartesian Fast Multipole and Barnes-Hut Operators", by Ryan Alexander Pepper and Hans Fangohr.


    Organisation
    The repository is organised as follows:

    sim-scripts

    sim-scripts/ contains the source code, build and run scripts for running the FMM calculations described in the paper. To reproduce the results from the paper, you need as prerequisites:
    * An installation and license of the Intel Compiler (Parallel Studio 2019 Update 3 was used for the paper).
    * An installation of the GNU compiler suite.
    * GNU Make
    * Python 3.6
    * A copy of fmmgen v1.0 (available at https://github.com/rpep/fmmgen or https://zenodo.org/record/3842591)
    * An installation of Fidimag v3.0 (available at http://github.com/computationalmodelling/fidimag or http://dx.doi.org/10.5281/zenodo.3841935)

    With these prerequisites, simply run from the sim-scripts directory:
    ```
    # To build the executables
    make build
    # To run the studies
    make run
    ```

    The four scripts are:
    * run-harmonic-cse-comparison.sh - Runs the Fast Multipole Method for 50000 Coulomb particles, varying the order of expansion, and evaluating the performance benefits of various optimisation strategies introduced in the code generation stage.

    * run-scaling-comparison.sh - Runs comparisons between the Barnes-Hut and Fast Multipole Methods for different expansion orders and values of theta, the 'opening angle' parameter.

    * run-error-comparison.sh - Runs the FMM and Barnes-Hut calculations, saving the fields and performing the direct calculation, allowing evaluation of the errors for the two methods.

    * run-fidimag-tests.sh - Runs the Fidimag scaling tests for a series of magnetic dipoles placed on a lattice.

    results

    results contains the output data from the sim-scripts scripts. Note: running the scripts will overwrite this data!


    figure-scripts

    This contains Python scripts needed to reproduce the figures from the paper. These generated figures are included in the repository for convenience. To run these scripts, you require:

    * Python >= 3.6
    * Matplotlib >= 3.1.1
    * Seaborn >= 0.9.1
    * NumPy >= 1.17.1

    figures

    This contains the output figures included in the paper.

