23 datasets found
  1. Vezora/Tested-188k-Python-Alpaca: Functional

    • kaggle.com
    Updated Nov 30, 2023
    Cite
    The Devastator (2023). Vezora/Tested-188k-Python-Alpaca: Functional [Dataset]. https://www.kaggle.com/datasets/thedevastator/vezora-tested-188k-python-alpaca-functional-pyth
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Vezora/Tested-188k-Python-Alpaca: Functional Python Code Dataset

    188k Functional Python Code Samples

    By Vezora (From Huggingface) [source]

    About this dataset

    The Vezora/Tested-188k-Python-Alpaca dataset is a comprehensive collection of functional Python code samples, specifically designed for training and analysis purposes. With 188,000 samples, this dataset offers an extensive range of examples that cater to the research needs of Python programming enthusiasts.

    This valuable resource consists of various columns, including input, which represents the input or parameters required for executing the Python code sample. The instruction column describes the task or objective that the Python code sample aims to solve. Additionally, there is an output column that showcases the resulting output generated by running the respective Python code.

    By utilizing this dataset, researchers can effectively study and analyze real-world scenarios and applications of Python programming. Whether for educational purposes or development projects, this dataset serves as a reliable reference for individuals seeking practical examples and solutions using Python

    How to use the dataset

    The Vezora/Tested-188k-Python-Alpaca dataset is a comprehensive collection of functional Python code samples, containing 188,000 samples in total. This dataset can be a valuable resource for researchers and programmers interested in exploring various aspects of Python programming.
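    Since the data originates from the Vezora/Tested-188k-Python-Alpaca repository on Hugging Face, it can presumably also be loaded directly with the datasets library rather than from the Kaggle CSV; the snippet below is a minimal sketch that assumes the Hub repository ID matches the dataset title.

    # Sketch only: assumes the Hugging Face repository ID matches the dataset title.
    from datasets import load_dataset

    ds = load_dataset("Vezora/Tested-188k-Python-Alpaca", split="train")
    print(ds.column_names)        # expected: 'input', 'instruction', 'output' per the description
    print(ds[0]["instruction"])   # inspect one sample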

    Contents of the Dataset

    The dataset consists of several columns:

    • output: This column represents the expected output or result that is obtained when executing the corresponding Python code sample.
    • instruction: It provides information about the task or instruction that each Python code sample is intended to solve.
    • input: The input parameters or values required to execute each Python code sample.

    Exploring the Dataset

    To make effective use of this dataset, it is essential to understand its structure and content properly. Here are some steps you can follow:

    • Importing Data: Load the dataset into your preferred environment for data analysis using appropriate tools like pandas in Python.
    import pandas as pd
    
    # Load the dataset
    df = pd.read_csv('train.csv')
    
    • Understanding Column Names: Familiarize yourself with the column names and their meanings by referring to the provided description.
    # Display column names
    print(df.columns)
    
    • Sample Exploration: Get an initial understanding of the data structure by examining a few random samples from different columns.
    # Display random samples from 'output' column
    print(df['output'].sample(5))
    
    • Analyzing Instructions: Analyze different instructions or tasks present in the 'instruction' column to identify specific areas you are interested in studying or learning about.
    # Count unique instructions and display top ones with highest occurrences
    instruction_counts = df['instruction'].value_counts()
    print(instruction_counts.head(10))
    

    Potential Use Cases

    The Vezora/Tested-188k-Python-Alpaca dataset can be utilized in various ways:

    • Code Analysis: Analyze the code samples to understand common programming patterns and best practices.
    • Code Debugging: Use code samples with known outputs to test and debug your own Python programs.
    • Educational Purposes: Utilize the dataset as a teaching tool for Python programming classes or tutorials.
    • Machine Learning Applications: Train machine learning models to predict outputs based on given inputs.

    Remember that this dataset provides a plethora of diverse Python coding examples, allowing you to explore different

    Research Ideas

    • Code analysis: Researchers and developers can use this dataset to analyze various Python code samples and identify patterns, best practices, and common mistakes. This can help in improving code quality and optimizing performance.
    • Language understanding: Natural language processing techniques can be applied to the instruction column of this dataset to develop models that can understand and interpret natural language instructions for programming tasks.
    • Code generation: The input column of this dataset contains the required inputs for executing each Python code sample. Researchers can build models that generate Python code based on specific inputs or task requirements using the examples provided in this dataset. This can be useful in automating repetitive programming tasks o...
  2. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Oct 29, 2024
    + more versions
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 10/29/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis has evolved.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    Code information:

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the
    `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and
    supporting analyses. This script generates the key figures and summary statistics
    used in the study that then get saved in the manuscript_figures folder. Note that all
    maps were produced using Python code found in the "supporting_code"" folder.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  3. ckanext-salford

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-salford [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-salford
    Explore at:
    Dataset updated
    Jun 4, 2025
    Area covered
    Salford
    Description

    The Salford extension for CKAN is designed to enhance CKAN's functionality for specific use cases, particularly involving the management and import of datasets relevant to the Salford City Council. By incorporating custom configurations and an ETL script, this extension streamlines the process of integrating external data sources, especially from data.gov.uk, into a CKAN instance. It also provides a structured approach to configuring CKAN for specific data management needs.

    Key Features:

    • Custom Plugin Integration: Enables the addition of the 'salford' and 'esd' plugins to extend CKAN's core functionality, addressing specific data management requirements.
    • Configurable Licenses Group URL: Allows administrators to specify a licenses group URL in the CKAN configuration, streamlining access to license information pertinent to the dataset.
    • ETL Script for Data.gov.uk Import: Includes a Python script (etl.py) to import datasets specifically from the Salford City Council publisher on data.gov.uk.
    • Non-UKLP Dataset Compatibility: The ETL script is designed to filter and import non-UKLP datasets, excluding INSPIRE datasets from the data.gov.uk import process at this time.
    • Bower Component Installation: Simplifies asset management by providing instructions for installing Bower components.

    Technical Integration: The Salford extension requires modifications to the CKAN configuration file (production.ini). Specifically, it involves adding salford and esd to the ckan.plugins setting, defining the licenses_group_url option, and potentially configuring other custom options. The ETL script leverages the CKAN API (ckanapi) for data import. Additionally, Bower components must be installed.

    Benefits & Impact: Using the Salford CKAN extension, organizations can establish a more streamlined data ingestion process tailored to Salford City Council datasets, enhance data accessibility, improve asset management, and facilitate better data governance aligned with specific licensing requirements. By selectively importing datasets and offering custom plugin support, it caters to specialized data management needs.
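    The extension's etl.py is not reproduced here, but as a rough sketch of how a ckanapi-based import typically looks (the instance URL, API key, and dataset fields below are placeholders, not taken from the extension):

    # Hypothetical sketch of an import via ckanapi; URL, API key, and fields are placeholders.
    from ckanapi import RemoteCKAN

    ckan = RemoteCKAN("https://example-ckan-instance.org", apikey="YOUR-API-KEY")
    ckan.action.package_create(
        name="example-dataset",
        title="Example dataset imported via an ETL script",
        owner_org="example-org",
    )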

  4. Data from: F-DATA: A Fugaku Workload Dataset for Job-centric Predictive...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 10, 2024
    Cite
    Antici, Francesco (2024). F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11467482
    Explore at:
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Yamamoto, Keiji
    Kiziltan, Zeynep
    Domke, Jens
    Bartolini, Andrea
    Antici, Francesco
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    F-DATA is a novel workload dataset containing the data of around 24 million jobs executed on Supercomputer Fugaku over the three years of public system usage (March 2021 to April 2024). Each job record contains an extensive set of features, such as exit code, duration, power consumption, and performance metrics (e.g. #flops, memory bandwidth, operational intensity, and a memory/compute-bound label), which allows for the prediction of a multitude of job characteristics. The full list of features can be found in the file feature_list.csv.

    The sensitive data appears in both anonymized and encoded versions. The encoding is based on a Natural Language Processing model and retains sensitive but useful job information for prediction purposes, without violating data privacy. The scripts used to generate the dataset are available in the F-DATA GitHub repository, along with a series of plots and instructions on how to load the data.

    F-DATA is composed of 38 files, with each YY_MM.parquet file containing the data of the jobs submitted in the month MM of the year YY.

    The files of F-DATA are saved as .parquet files. It is possible to load such files as dataframes by leveraging the pandas APIs, after installing pyarrow (pip install pyarrow). A single file can be read with the following Python instructions:

    # Import the pandas library
    import pandas as pd

    # Read the 21_01.parquet file into a dataframe
    df = pd.read_parquet("21_01.parquet")

    df.head()
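    To read several monthly files at once, one option (not part of the official F-DATA scripts) is to glob the .parquet files and concatenate them with pandas:

    # Sketch: concatenate all monthly parquet files found in the current directory.
    import glob
    import pandas as pd

    files = sorted(glob.glob("*.parquet"))
    df_all = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
    print(df_all.shape)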

  5. Cleaned Contoso Dataset

    • kaggle.com
    Updated Aug 27, 2023
    Cite
    Bhanu (2023). Cleaned Contoso Dataset [Dataset]. https://www.kaggle.com/datasets/bhanuthakurr/cleaned-contoso-dataset/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 27, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bhanu
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data was imported from the BAK file found here into SQL Server, and then individual tables were exported as CSV. The Jupyter Notebook containing the code used to clean the data can be found here.

    Version 6 includes some additional cleaning and structuring that was identified after importing the data into Power BI. Changes were made by adding code to the Python notebook to export a new cleaned dataset, such as adding a MonthNumber column for sorting by month, and similarly a WeekDayNumber column.

    Cleaning was done in Python, with SQL Server used to quickly inspect the data. Headers were added separately, ensuring no data loss. The data was cleaned for NaN and garbage values across columns.
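    As an illustration of the columns mentioned above, here is a pandas sketch that derives MonthNumber and WeekDayNumber from a date column (the file and column names are assumptions, not confirmed by the dataset description):

    # Sketch: derive sorting helpers from a date column; 'sales.csv' and 'Date' are hypothetical names.
    import pandas as pd

    df = pd.read_csv("sales.csv", parse_dates=["Date"])
    df["MonthNumber"] = df["Date"].dt.month        # 1-12, for sorting by month
    df["WeekDayNumber"] = df["Date"].dt.weekday    # 0 = Monday .. 6 = Sunday
    df.to_csv("sales_cleaned.csv", index=False)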

  6. mnist

    • tensorflow.org
    • universe.roboflow.com
    • +3more
    Updated Jun 1, 2024
    Cite
    (2024). mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/mnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    The MNIST database of handwritten digits.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('mnist', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png

  7. fashion_mnist

    • tensorflow.org
    • opendatalab.com
    • +3more
    Updated Jun 1, 2024
    Cite
    (2024). fashion_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/fashion_mnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('fashion_mnist', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png

  8. ckanext-ldap

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-ldap [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-ldap
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The LDAP Authentication extension for CKAN provides a method for authenticating users against an LDAP (Lightweight Directory Access Protocol) server, enhancing security and simplifying user management. This extension allows CKAN to leverage existing LDAP infrastructure for user authentication, account information, and group management, allowing administrators to configure CKAN to authenticate via LDAP and use the credentials of LDAP users.

    Key Features:

    • LDAP Authentication: Enables CKAN to authenticate users against an LDAP server, using usernames and passwords stored within the LDAP directory.
    • User Data Import: Imports user attributes, such as username, full name, email address, and description, from the LDAP directory into the CKAN user profile.
    • Flexible LDAP Search: Supports matching against multiple LDAP fields (e.g., username, full name) using configurable search filters. An alternative search filter can be specified for cases where the primary filter returns no results, allowing for flexible matching.
    • Combined Authentication: Allows combining LDAP authentication with basic CKAN authentication, providing a fallback mechanism for users not present in the LDAP directory.
    • Automatic Organization Assignment: Automatically adds LDAP users to a specified organization within CKAN upon their first login, simplifying organizational role management. The role of the user within the organization can also be specified.
    • Active Directory Support: Compatible with Active Directory, allowing seamless integration with existing Windows-based directory services.
    • Profile Edit Restriction: Provides an option to prevent LDAP users from editing their profiles within CKAN, centralizing user data management in the LDAP directory.
    • Password Reset Functionality: Allows LDAP users to reset their CKAN passwords (not their LDAP passwords), providing a way to recover access to their CKAN accounts. This functionality can be disabled, preventing user-initiated password resets for these accounts.
    • Existing User Migration: Facilitates migration from CKAN authentication to LDAP authentication by mapping any existing CKAN user with the same username to the LDAP login user.
    • Referral Ignore: Ignores any referral results to avoid queries returning more than one result.
    • Debug/Trace Level Option: Sets the debug level of python-ldap and the python-ldap trace level to allow for debugging.

    Technical Integration: The extension integrates with CKAN by adding 'ldap' to the list of plugins in the CKAN configuration file. It overrides the default CKAN login form, redirecting login requests to the LDAP authentication handler. Configuration options can be specified in the CKAN .ini config file, including the LDAP server URI, base DN, search filters, and attribute mappings.

    Benefits & Impact: Implementing the LDAP Authentication extension simplifies user management by centralizing authentication within an LDAP directory, reducing administrative overhead. It enhances security by leveraging existing LDAP security policies and credentials. The extension streamlines user onboarding by automatically importing user data and assigning organizational roles, improving user experience and data consistency.

  9. Gaussian Process Model and Sensor Placement for Detroit Green...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 23, 2023
    Cite
    Mason, Brooke (2023). Gaussian Process Model and Sensor Placement for Detroit Green Infrastructure: Datasets and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7667043
    Explore at:
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    Kerkez, Branko
    Mason, Brooke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Detroit
    Description

    code.zip: Zip folder containing a folder titled "code" which holds:

    • a csv file titled "MonitoredRainGardens.csv" containing the 14 monitored green infrastructure (GI) sites with their design and physiographic features;
    • a csv file titled "storm_constants.csv" which contains the computed decay constants for every storm in every GI during the measurement period;
    • a csv file titled "newGIsites_AllData.csv" which contains the other 130 GI sites in Detroit and their design and physiographic features;
    • a csv file titled "Detroit_Data_MeanDesignFeatures.csv" which contains the design and physiographic features for all of Detroit;
    • a Jupyter notebook titled "GI_GP_SensorPlacement.ipynb" which provides the code for training the GP models and displaying the sensor placement results;
    • a folder titled "MATLAB" which contains the following:
      • a folder titled "SFO" which contains the SFO toolbox for the sensor placement work;
      • a file titled "sensor_placement.mlx" that contains the code for the sensor placement work;
      • several .mat files created in Python for importing into Matlab for the sensor placement work: "constants_sigma.mat", "constants_coords.mat", "GInew_sigma.mat", "GInew_coords.mat", and "R1_sensor.mat" through "R6_sensor.mat";
      • several .mat files created in Matlab for importing into Python for visualizing the results: "MI_DETselectedGI.mat" and "DETselectedGI.mat".
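    The .mat files listed above can be written and read from Python with scipy.io; the following is a generic sketch (the array contents are placeholders), not the project's own code:

    # Sketch: exchanging .mat files between Python and Matlab with scipy.io; contents are placeholders.
    import numpy as np
    from scipy.io import savemat, loadmat

    savemat("constants_sigma.mat", {"sigma": np.eye(3)})   # write an example array for Matlab
    mat = loadmat("MI_DETselectedGI.mat")                   # read results back as a dict of variables
    print(mat.keys())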

  10. Research Data Supporting "Nuclear Wavefunctions of Dispersion Bound Systems:...

    • repository.cam.ac.uk
    Updated Jan 27, 2025
    Cite
    Panchagnula, Kripa; Graf, Daniel; Johnson, Erin (2025). Research Data Supporting "Nuclear Wavefunctions of Dispersion Bound Systems: Endohedral Eigenstates of Endofullerenes" [Dataset]. http://doi.org/10.17863/CAM.114388
    Explore at:
    Dataset updated
    Jan 27, 2025
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Panchagnula, Kripa; Graf, Daniel; Johnson, Erin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the root directory containing data files, bash scripts, and Python scripts to generate the
    data for the tables and figures in my PhD thesis, titled "Nuclear Wavefunctions of Dispersion
    Bound Systems: Endohedral Eigenstates of Endofullerenes". This thesis was submitted in September
    2024, with corrections (no additional calculations) approved in December 2024. The electronic
    structure data is provided raw, as outputs from FermiONs++ and FHI-aims. The machine learned PESs
    are constructed from Python scripts. These are then used to calculate the nuclear eigenstates, which is achieved using a self-written library, "EPEE", available on GitLab at
    https://gitlab.developers.cam.ac.uk/ksp31/epee.

    Author: Kripa Panchagnula
    Date: January 2025

    To run the machine learning, nuclear diagonalisation, and plotting scripts, the "thesis_calcs" branch (commit SHA: 100d79600aae7668d4ceaeafc6274a89f019283c) or "main" branch (commit SHA: 4e4d677f609028710fbc8e4f48dc4895543340db) of EPEE is required, alongside NumPy, SciPy, scikit-learn, matplotlib, and the "development" branch of QSym2 from https://qsym2.dev/. Any Python script importing from src is referring to the EPEE library. Each Python script must be run from within its containing directory.

    The data is separated into the following folders:
    - background/
    This folder contains a Python script to generate figures for Chapters 1-3.
    - He@C60/
    This folder contains electronic structure data from FermiONs++ with Python scripts
    to generate data for Chapter 4.
    - X@C70/
    This folder contains Python scripts to generate data for Chapter 5.
    - Ne@C70/
    This folder contains electronic structure data from FermiONs++ and FHI-aims with
    Python scripts to generate data for Chapter 6.
    - H2@C70/
    This folder contains Python scripts to generate data for Chapter 7.
    - peapods/
    This folder contains Python scripts to generate data for Chapter 8.

    Each folder contains its own README, with more details about its structure. File types include text files (.txt, .dat, .cube), scripts (.bash, .py) and NumPy compressed data files (.npz).
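    The NumPy compressed data files (.npz) can be inspected with numpy.load; a minimal sketch (the file name is a placeholder):

    # Sketch: list and inspect the arrays stored in a .npz archive; "example.npz" is a placeholder name.
    import numpy as np

    with np.load("example.npz") as data:
        for name in data.files:
            print(name, data[name].shape, data[name].dtype)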

  11. HURRECON Model for Estimating Hurricane Wind Speed, Direction, and Damage (R...

    • portal.edirepository.org
    • search.dataone.org
    zip
    Updated Feb 14, 2024
    + more versions
    Cite
    Emery Boose (2024). HURRECON Model for Estimating Hurricane Wind Speed, Direction, and Damage (R and Python) [Dataset]. http://doi.org/10.6073/pasta/1416f19219e3824de0c372a143a73f3e
    Explore at:
    Available download formats: zip (45124 bytes)
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    EDI
    Authors
    Emery Boose
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0

    Area covered
    Earth
    Description

    The HURRECON model estimates wind speed, wind direction, enhanced Fujita scale wind damage, and duration of EF0 to EF5 winds as a function of hurricane location and maximum sustained wind speed. Results may be generated for a single site or an entire region. Hurricane track and intensity data may be imported directly from the US National Hurricane Center's HURDAT2 database. HURRECON is available in R and Python. The R version is available on CRAN as HurreconR. The model is an updated version of the original HURRECON model written in Borland Pascal for use with Idrisi (see HF025). New features include support for: (1) estimating wind damage on the enhanced Fujita scale, (2) importing hurricane track and intensity data directly from HURDAT2, (3) creating a land-water file with user-selected geographic coordinates and spatial resolution, and (4) creating plots of site and regional results. The model equations for estimating wind speed and direction, including parameter values for inflow angle, friction factor, and wind gust factor (over land and water), are unchanged from the original HURRECON model. For more details and sample datasets, see the project website on GitHub (https://github.com/hurrecon-model).

  12. Linux Kernel binary size

    • data.niaid.nih.gov
    Updated Jun 14, 2021
    Cite
    Mathieu ACHER (2021). Linux Kernel binary size [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4943883
    Explore at:
    Dataset updated
    Jun 14, 2021
    Dataset provided by
    Mathieu ACHER
    Hugo MARTIN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset containing measurements of Linux kernel binary size after compilation. The reported size, in the column "perf", is the size in bytes of the vmlinux file. It also contains a column "active_options" reporting the number of activated options (set to "y"). All other columns, listed in the file "Linux_options.json", are Linux kernel options. The sampling was done using randconfig. The version of Linux used is 4.13.3.

    Not all available options are present. First, the dataset only contains options for the x86, 64-bit version. Then, all non-tristate options were ignored. Finally, options that do not take multiple values across the whole dataset, due to insufficient variability in the sampling, were ignored. All options are encoded as 0 for the "n" and "m" values and 1 for "y".

    In Python, importing the dataset with pandas will assign all columns the int64 dtype, which leads to very high memory consumption (~50 GB). The following snippet imports it using less than 1 GB of memory by setting the option columns to int8.

    import pandas as pd
    import json
    import numpy

    with open("Linux_options.json", "r") as f:
        linux_options = json.load(f)

    # Load the csv, setting the option columns to int8 to save a lot of memory
    df = pd.read_csv("Linux.csv", dtype={opt: numpy.int8 for opt in linux_options})

  13. tiny_shakespeare

    • tensorflow.org
    • huggingface.co
    Updated Feb 11, 2023
    Cite
    (2023). tiny_shakespeare [Dataset]. https://www.tensorflow.org/datasets/catalog/tiny_shakespeare
    Explore at:
    Dataset updated
    Feb 11, 2023
    Description

    40,000 lines of Shakespeare from a variety of Shakespeare's plays. Featured in Andrej Karpathy's blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks': http://karpathy.github.io/2015/05/21/rnn-effectiveness/.

    To use for e.g. character modelling:

    import tensorflow as tf
    import tensorflow_datasets as tfds

    d = tfds.load(name='tiny_shakespeare')['train']
    d = d.map(lambda x: tf.strings.unicode_split(x['text'], 'UTF-8'))
    # train split includes vocabulary for other splits
    vocabulary = sorted(set(next(iter(d)).numpy()))
    d = d.map(lambda x: {'cur_char': x[:-1], 'next_char': x[1:]})
    d = d.unbatch()
    seq_len = 100
    batch_size = 2
    d = d.batch(seq_len)
    d = d.batch(batch_size)
    

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('tiny_shakespeare', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  14. Data from: A large synthetic dataset for machine learning applications in...

    • zenodo.org
    csv, json, png, zip
    Updated Mar 25, 2025
    Cite
    Marc Gillioz; Marc Gillioz; Guillaume Dubuis; Philippe Jacquod; Philippe Jacquod; Guillaume Dubuis (2025). A large synthetic dataset for machine learning applications in power transmission grids [Dataset]. http://doi.org/10.5281/zenodo.13378476
    Explore at:
    Available download formats: zip, png, csv, json
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marc Gillioz; Marc Gillioz; Guillaume Dubuis; Philippe Jacquod; Philippe Jacquod; Guillaume Dubuis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the ongoing energy transition, power grids are evolving fast. They operate more and more often close to their technical limit, under more and more volatile conditions. Fast, essentially real-time computational approaches to evaluate their operational safety, stability and reliability are therefore highly desirable. Machine Learning methods have been advocated to solve this challenge, however they are heavy consumers of training and testing data, while historical operational data for real-world power grids are hard if not impossible to access.

    This dataset contains long time series for production, consumption, and line flows, amounting to 20 years of data with a time resolution of one hour, for several thousands of loads and several hundreds of generators of various types representing the ultra-high-voltage transmission grid of continental Europe. The synthetic time series have been statistically validated against real-world data.

    Data generation algorithm

    The algorithm is described in a Nature Scientific Data paper. It relies on the PanTaGruEl model of the European transmission network -- the admittance of its lines as well as the location, type and capacity of its power generators -- and aggregated data gathered from the ENTSO-E transparency platform, such as power consumption aggregated at the national level.

    Network

    The network information is encoded in the file europe_network.json. It is given in PowerModels format, which is itself derived from MatPower and compatible with PandaPower. The network features 7822 power lines and 553 transformers connecting 4097 buses, to which are attached 815 generators of various types.
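    For instance, the network file can be parsed with the standard json module; the keys used below ("bus", "branch", "gen") follow the usual PowerModels layout and should be checked against the actual file:

    # Sketch: count network elements in europe_network.json, assuming PowerModels-style keys.
    import json

    with open("europe_network.json") as f:
        network = json.load(f)

    print(len(network["bus"]), "buses")
    print(len(network["branch"]), "branches (lines and transformers)")
    print(len(network["gen"]), "generators")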

    Time series

    The time series forming the core of this dataset are given in CSV format. Each CSV file is a table with 8736 rows, one for each hourly time step of a 364-day year. All years are truncated to exactly 52 weeks of 7 days, and start on a Monday (the load profiles are typically different during weekdays and weekends). The number of columns depends on the type of table: there are 4097 columns in load files, 815 for generators, and 8375 for lines (including transformers). Each column is described by a header corresponding to the element identifier in the network file. All values are given in per-unit, both in the model file and in the tables, i.e. they are multiples of a base unit taken to be 100 MW.

    There are 20 tables of each type, labeled with a reference year (2016 to 2020) and an index (1 to 4), zipped into archive files arranged by year. This amounts to a total of 20 years of synthetic data. When using loads, generators, and lines profiles together, it is important to use the same label: for instance, the files loads_2020_1.csv, gens_2020_1.csv, and lines_2020_1.csv represent the same year of the dataset, whereas gens_2020_2.csv is unrelated (it actually shares some features, such as nuclear profiles, but it is based on a dispatch with distinct loads).

    Usage

    The time series can be used without a reference to the network file, simply using all or a selection of columns of the CSV files, depending on the needs. We show below how to select series from a particular country, or how to aggregate hourly time steps into days or weeks. These examples use Python and the data analysis library pandas, but other frameworks can be used as well (Matlab, Julia). Since all the yearly time series are periodic, it is always possible to define a coherent time window modulo the length of the series.

    Selecting a particular country

    This example illustrates how to select generation data for Switzerland in Python. This can be done without parsing the network file, but using instead gens_by_country.csv, which contains a list of all generators for any country in the network. We start by importing the pandas library, and read the column of the file corresponding to Switzerland (country code CH):

    import pandas as pd
    CH_gens = pd.read_csv('gens_by_country.csv', usecols=['CH'], dtype=str)

    The object created in this way is a DataFrame with some null values (not all countries have the same number of generators). It can be turned into a list with:

    CH_gens_list = CH_gens.dropna().squeeze().to_list()

    Finally, we can import all the time series of Swiss generators from a given data table with

    pd.read_csv('gens_2016_1.csv', usecols=CH_gens_list)

    The same procedure can be applied to loads using the list contained in the file loads_by_country.csv.

    Averaging over time

    This second example shows how to change the time resolution of the series. Suppose that we are interested in all the loads from a given table, which are given by default with a one-hour resolution:

    hourly_loads = pd.read_csv('loads_2018_3.csv')

    To get a daily average of the loads, we can use:

    daily_loads = hourly_loads.groupby([t // 24 for t in range(24 * 364)]).mean()

    This results in series of length 364. To average further over entire weeks and get series of length 52, we use:

    weekly_loads = hourly_loads.groupby([t // (24 * 7) for t in range(24 * 364)]).mean()

    Source code

    The code used to generate the dataset is freely available at https://github.com/GeeeHesso/PowerData. It consists of two packages and several documentation notebooks. The first package, written in Python, provides functions to handle the data and to generate synthetic series based on historical data. The second package, written in Julia, is used to perform the optimal power flow. The documentation, in the form of Jupyter notebooks, contains numerous examples of how to use both packages. The entire workflow used to create this dataset is also provided, starting from raw ENTSO-E data files and ending with the synthetic dataset given in the repository.

    Funding

    This work was supported by the Cyber-Defence Campus of armasuisse and by an internal research grant of the Engineering and Architecture domain of HES-SO.

  15. Dioptra Test Platform

    • catalog.data.gov
    • data.nist.gov
    Updated Sep 11, 2024
    Cite
    National Institute of Standards and Technology (2024). Dioptra Test Platform [Dataset]. https://catalog.data.gov/dataset/dioptra-test-platform
    Explore at:
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Source code, documentation, and examples of use of the source code for the Dioptra Test Platform.

    Dioptra is a software test platform for assessing the trustworthy characteristics of artificial intelligence (AI). Trustworthy AI is: valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair, with harmful bias managed. Dioptra supports the Measure function of the NIST AI Risk Management Framework by providing functionality to assess, analyze, and track identified AI risks.

    Dioptra provides a REST API, which can be controlled via an intuitive web interface, a Python client, or any REST client library of the user's choice for designing, managing, executing, and tracking experiments. Details are available in the project documentation at https://pages.nist.gov/dioptra/.

    Use Cases. We envision the following primary use cases for Dioptra:
    • Model Testing: 1st party, assess AI models throughout the development lifecycle; 2nd party, assess AI models during acquisition or in an evaluation lab environment; 3rd party, assess AI models during auditing or compliance activities.
    • Research: Aid trustworthy AI researchers in tracking experiments.
    • Evaluations and Challenges: Provide a common platform and resources for participants.
    • Red-Teaming: Expose models and resources to a red team in a controlled environment.

    Key Properties. Dioptra strives for the following key properties:
    • Reproducible: Dioptra automatically creates snapshots of resources so experiments can be reproduced and validated.
    • Traceable: The full history of experiments and their inputs are tracked.
    • Extensible: Support for expanding functionality and importing existing Python packages via a plugin system.
    • Interoperable: A type system promotes interoperability between plugins.
    • Modular: New experiments can be composed from modular components in a simple yaml file.
    • Secure: Dioptra provides user authentication, with access controls coming soon.
    • Interactive: Users can interact with Dioptra via an intuitive web interface.
    • Shareable and Reusable: Dioptra can be deployed in a multi-tenant environment so users can share and reuse components.

  16. IN2020_E01 Tasmanian Coast Bathymetry 10m - 210m Multi-resolution AusSeabed...

    • researchdata.edu.au
    • data.csiro.au
    datadownload
    Updated Feb 9, 2024
    Cite
    Davina Gifford; Bernadette Heaney; Cisco Navidad; Phil Vandenbossche; Chris Berry; Amy Nau; Jason Fazey (2024). IN2020_E01 Tasmanian Coast Bathymetry 10m - 210m Multi-resolution AusSeabed products [Dataset]. http://doi.org/10.25919/17WM-DK49
    Explore at:
    Available download formats: data download
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Davina Gifford; Bernadette Heaney; Cisco Navidad; Phil Vandenbossche; Chris Berry; Amy Nau; Jason Fazey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 29, 2020 - Aug 6, 2020
    Area covered
    Description

    This layer group describes multibeam echosounder data collected on RV Investigator voyage IN2020_E01 titled "Trials and Calibration". The voyage took place between July 29 and August 6, 2020, departing from Hobart (TAS) and arriving in Hobart (TAS).

    The purpose of this voyage was to undertake post port-period equipment calibrations and commissioning, sea trials as well as personnel training.

    This dataset is published with the permission of CSIRO. Not to be used for navigational purposes.

    The dataset contains bathymetry grids of 10m to 210m resolution of the Tasmanian Coast produced from the processed EM122 and EM710 bathymetry data. Lineage: Multibeam data was logged from the EMs in Kongsberg's proprietary *.all format and was converted for processing within CARIS HIPS and SIPS version 10.4. Initial data conversion and processing was performed using the GSM Python batch utility, with the manual method used for importing patch test data. Once the raw files were converted into the HIPS and SIPS format, the data was analysed for noise. With the exception of EM710 reference surface lines (files 0083, 0085, 0087, 0090, 0092, 0096, 0098, 0100 & 0103) and EM710 patch test lines (files 0031, 0033, 0035, 0037, 0040, 0042, 0044), which had GPS tide applied, no tide was applied to the remaining lines. All lines were merged using the vessel file appropriate for either the EM122 or EM710. Because the angular offsets were zeroed in SIS prior to the EM710 and EM122 patch tests, the vessel files for each were edited to apply the calibration values to those lines.

    The data was then gridded at the highest resolution possible and further inspected for outliers.

    The data was then gridded at multiple resolutions in a Python CARIS batch script, using a depth-versus-resolution filter guideline derived from the AusSeabed Multibeam Guidelines v2, and further inspected for outliers. Final raster products are available in the L3 folder of this collection. Final processed data were also exported per line in GSF and ASCII format and are available in the L2 folder of this collection.

  17. qm9

    • tensorflow.org
    Updated Dec 11, 2024
    Cite
    (2024). qm9 [Dataset]. http://doi.org/10.6084/m9.figshare.c.978904.v5
    Explore at:
    Dataset updated
    Dec 11, 2024
    Description

    QM9 consists of computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of C, H, O, N, and F. As usual, we remove the uncharacterized molecules and provide the remaining 130,831.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('qm9', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  18. Korean Translation Dataset for NLP Models

    • kaggle.com
    Updated Nov 30, 2023
    Cite
    The Devastator (2023). Korean Translation Dataset for NLP Models [Dataset]. https://www.kaggle.com/datasets/thedevastator/korean-translation-dataset-for-nlp-models
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Korean Translation Dataset for NLP Models

    Translated Instructions and Input-Output Pairs in Korean

    By nlpai-lab (From Huggingface) [source]

    About this dataset

    This dataset provides a collection of translations from English to Korean for NLP models such as GPT4ALL, Dolly, and Vicuna Data. The translations were generated using the DeepL API. It contains three columns: instruction represents the instruction given to the model for the translation task, input is the input text that needs to be translated from English to Korean, and output is the corresponding translated text in Korean. The dataset aims to facilitate research and development in natural language processing tasks by providing a reliable source of translated data

    How to use the dataset

    This dataset contains Korean translations of instructions, inputs, and outputs for various NLP models including GPT4ALL, Dolly, and Vicuna Data. The translations were generated using the DeepL API.

    Description of Columns

    The dataset consists of the following columns:

    • instruction: This column contains the original instruction given to the model for the translation task.

    • input: This column contains the input text in English that needs to be translated to Korean.

    • output: This column contains the translated text in Korean.

    How to Utilize this Dataset

    You can use this dataset for various natural language processing (NLP) tasks such as machine translation or training language models specifically focused on English-Korean translation.

    Here are a few steps on how you can utilize this dataset effectively:

    • Importing Data: Load or import the provided train.csv file into your Python environment or preferred programming language (a minimal loading sketch follows this list).

    • Data Preprocessing: Clean and preprocess both input and output texts if needed. You may consider tokenization, removing stopwords, or any other preprocessing techniques that align with your specific task requirements.

    • Model Training: Utilize deep learning frameworks like PyTorch or TensorFlow to develop your NLP model focused on English-Korean translation using this prepared dataset as training data.

    • Evaluation & Fine-tuning: Evaluate your trained model's performance using suitable metrics such as BLEU score or perplexity measurement techniques specific to machine translation tasks. Fine-tune your model by iterating over different architectures and hyperparameters based on evaluation results until desired performance is achieved.

    • Inference & Deployment: Once you are satisfied with your trained model's performance, use it for making predictions on unseen English texts which need translation into Korean within any application where it can provide meaningful value.

    Remember that this dataset was translated using DeepL API; thus, you can leverage these translations as a starting point for your NLP projects. However, it is essential to validate and further refine the translations according to your specific use case or domain requirements.

    Good luck with your NLP projects using this Korean Translation Dataset!

    Research Ideas

    • Training and evaluating machine translation models: This dataset can be used to train and evaluate machine translation models for translating English text to Korean. The instruction column provides specific instructions given to the model, while the input column contains the English text that needs to be translated. The output column contains the corresponding translations in Korean.
    • Language learning and practice: This dataset can be used by language learners who want to practice translating English text into Korean. Users can compare their own translations with the provided translations in the output column to improve their language skills.
    • Benchmarking different translation APIs or models: The dataset includes translations generated using the DeepL API, but it can also be used as a benchmark for comparing other translation APIs or models. By comparing the performance of different systems on this dataset, researchers and developers can gain insights into the strengths and weaknesses of different translation approaches

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](https:/...

  19. curated_breast_imaging_ddsm

    • tensorflow.org
    Updated Jun 1, 2024
    + more versions
    Cite
    (2024). curated_breast_imaging_ddsm [Dataset]. https://www.tensorflow.org/datasets/catalog/curated_breast_imaging_ddsm
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    The CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information.

    The default config is made of patches extracted from the original mammograms, following the description from (http://arxiv.org/abs/1708.09427), in order to frame the task to solve in a traditional image classification setting.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('curated_breast_imaging_ddsm', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/curated_breast_imaging_ddsm-patches-3.0.0.png

  20. Galaxy Training Material for the 'Use Jupyter notebooks in Galaxy' tutorial

    • zenodo.org
    csv
    Updated Apr 22, 2025
    Cite
    Delphine Lariviere; Delphine Lariviere; Teresa Müller; Teresa Müller (2025). Galaxy Training Material for the 'Use Jupyter notebooks in Galaxy' tutorial [Dataset]. http://doi.org/10.5281/zenodo.15263830
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Delphine Lariviere; Delphine Lariviere; Teresa Müller; Teresa Müller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."

    This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."
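    Inside the notebook, the GDP tables can be read with pandas; the file name below is a placeholder for one of the six CSV files in this record:

    # Sketch: load one of the Gapminder GDP tables; the file name and index column are assumptions.
    import pandas as pd

    gdp = pd.read_csv("gapminder_gdp_europe.csv", index_col="country")
    print(gdp.head())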
