52 datasets found
  1. CPU and GPU Stats

    • kaggle.com
    zip
    Updated Jan 10, 2023
    Cite
    Baraa Zaid (2023). CPU and GPU Stats [Dataset]. https://www.kaggle.com/datasets/baraazaid/cpu-and-gpu-stats
    Available download formats: zip (81304 bytes)
    Dataset updated
    Jan 10, 2023
    Authors
    Baraa Zaid
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Techpowerup datasets

    The dataset consists of two tables, cpus and gpus, scraped from https://www.techpowerup.com/ using Python's Scrapy framework.

    CPU Dataset

    The CPU dataset contains information about various CPU models and their specifications. The dataset includes the following columns:

    • Name: The name of the CPU model.
    • Codename: The codename used by the manufacturer for the CPU model.
    • Cores: The number of cores in the CPU.
    • Clock: The base clock speed of the CPU, measured in GHz.
    • Socket: The socket type that the CPU is compatible with.
    • Process: The manufacturing process used to create the CPU, measured in nanometers.
    • L3 Cache: The size of the L3 cache in the CPU, measured in MB.
    • TDP: The thermal design power of the CPU, measured in watts.
    • Released: The release date of the CPU.

    GPU Dataset

    The GPU dataset contains information about various GPU models and their specifications. The dataset includes the following columns:

    • Product_Name: The name of the GPU model.
    • GPU_Chip: The GPU chip used in the GPU model.
    • Released: The release date of the GPU.
    • Bus: The bus width of the GPU.
    • Memory: The memory capacity of the GPU, measured in GB.
    • GPU_clock: The base clock speed of the GPU, measured in MHz.
    • Memory_clock: The memory clock speed of the GPU, measured in MHz.
    • Shaders_TMUs_ROPs: The number of shaders, texture mapping units, and raster operations pipelines in the GPU.

    Both datasets are useful for comparing the performance and features of different CPU and GPU models, and they can support a variety of applications such as gaming, content creation, AI, machine learning, and more. Researchers could use them to study the evolution of the technology over a given period and to make predictions about future advancements, and professionals in the tech industry could use them to make informed decisions when choosing components for a build or a system.

    The code for the scraper can be found here
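
    To get started, the two tables can be loaded with pandas. A minimal sketch, assuming the files are named cpus.csv and gpus.csv (check the actual file names in the download):

    import pandas as pd

    cpus = pd.read_csv("cpus.csv")
    gpus = pd.read_csv("gpus.csv")

    # Example: the ten most recently released CPUs, using the columns listed above.
    cpus["Released"] = pd.to_datetime(cpus["Released"], errors="coerce")
    print(cpus.sort_values("Released", ascending=False)[["Name", "Cores", "TDP"]].head(10))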

  2. UNLIMITED GPU

    • kaggle.com
    zip
    Updated Apr 7, 2021
    Cite
    gpulab (2021). UNLIMITED GPU [Dataset]. https://www.kaggle.com/datasets/jamessteinman/unlimited-gpu
    Available download formats: zip (9627 bytes)
    Dataset updated
    Apr 7, 2021
    Authors
    gpulab
    Description

    Dataset

    This dataset was created by gpulab

    Contents

  3. 🟩NVIDIA & AMD🟥 GPUs Full Specs💠

    • kaggle.com
    zip
    Updated Aug 27, 2024
    Cite
    💥Alien💥 (2024). 🟩NVIDIA & AMD🟥 GPUs Full Specs💠 [Dataset]. https://www.kaggle.com/datasets/alanjo/graphics-card-full-specs/code
    Available download formats: zip (70870 bytes)
    Dataset updated
    Aug 27, 2024
    Authors
    💥Alien💥
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Related Dataset: GPU Benchmarks Compilation

    Context

    A graphics card is nothing more than another processor that is specially designed and built to handle graphics. These are referred to as Graphics Processing Units (GPUs). Adding one of these to your computer takes the load of processing graphics away from your CPU, allowing your CPU to handle other tasks. Due to the detail and sheer amount of graphics in modern games, a GPU is a must to play these games smoothly.

    Content

    When choosing a GPU, it's important to take note of individual specs and also to make sure that the other components in your build are compatible; a short sketch after this list shows one way to filter cards by spec.

    • Bus Interface
    • Memory Size (VRAM)
    • Memory Bus Width
    • Memory Type
    • GPU Clock Speed
    • Memory Clock Speed
    • Unified Shaders
    • Texture Mapping Units
    • Render Output Units
    • and more!
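
    A hedged sketch of shortlisting cards by spec with pandas; the file and column names here (gpu_specs.csv, "Memory Size (GB)", "GPU Clock (MHz)") are assumptions, so check the actual CSV header before running:

    import pandas as pd

    gpus = pd.read_csv("gpu_specs.csv")  # file name is an assumption

    # Keep cards with at least 8 GB of VRAM and a 1500 MHz GPU clock.
    shortlist = gpus[(gpus["Memory Size (GB)"] >= 8) & (gpus["GPU Clock (MHz)"] >= 1500)]
    print(shortlist.head())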

    Acknowledgements

    Web scraped from TechPowerUp; sourced from NVIDIA, AMD, and Intel official websites.

    If you enjoyed this dataset, here are some similar datasets you may like 😎

  4. lightgbm420-cuda

    • kaggle.com
    zip
    Updated Jan 23, 2024
    Cite
    Mikhail Golubchik (2024). lightgbm420-cuda [Dataset]. https://www.kaggle.com/datasets/mikhailgolubchik/lightgbm420-cuda
    Available download formats: zip (56261508 bytes)
    Dataset updated
    Jan 23, 2024
    Authors
    Mikhail Golubchik
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    LightGBM 4.2.0 compiled with CUDA can be installed this way:

    !pip uninstall -y lightgbm
    !pip install /kaggle/input/lightgbm420-cuda/lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl
    import lightgbm as lgb
    

    lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl was compiled with CUDA for Kaggle as follows:

    !pip uninstall -y lightgbm
    !pip install \
      --no-binary lightgbm \
      --config-settings=cmake.define.USE_CUDA=ON \
      lightgbm
    

    Then save the compiled lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl to this dataset, for faster installation and use without internet access.

    The CUDA build of LightGBM works more smoothly with parallelization and multiple subprocesses: there is no need to restrict n_jobs, and n_jobs can be set to None. With the regular GPU build of LightGBM on Kaggle, if the total n_jobs exceeds 4, training runs several times slower.

    In short, LightGBM 4.2.0 compiled with CUDA is faster when using parallelization and multiple subprocesses.
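
    As a minimal sketch of using the CUDA build (synthetic data, only to illustrate passing device="cuda" in the parameters):

    import numpy as np
    import lightgbm as lgb

    # Synthetic binary-classification data, just to exercise the build.
    X = np.random.rand(1000, 10)
    y = np.random.randint(0, 2, 1000)

    params = {"objective": "binary", "device": "cuda", "verbose": -1}
    booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)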

  5. Multi-Resolution Frames with Rendering Information

    • kaggle.com
    zip
    Updated Feb 11, 2025
    Cite
    Anonymous (2025). Multi-Resolution Frames with Rendering Information [Dataset]. https://www.kaggle.com/datasets/uhecoms/super-resolution-for-real-time-computer-graphics
    Available download formats: zip (11017136044 bytes)
    Dataset updated
    Feb 11, 2025
    Authors
    Anonymous
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Usage Instructions

    The dataset is available at: https://kaggle.com/datasets/19684b7cee0ea0e51589d1a064446c2ac72e5167a3da9732f082463e2da84821

    This dataset is organized into multiple traces directories, each containing data at various resolutions. Supported resolutions include 320, 640, 1280, and 1600. The dataset is provided as a compressed .zip file for ease of distribution. Detailed descriptions are provided below.

    Directory Structure

    The dataset is structured as follows:

    Dataset
    │
    ├── traces1/
    │   ├── 320/
    │   │   ├── feature001.bmp
    │   │   ├── frame001.sim.ppm
    │   │   └── ...
    │   ├── 640/
    │   ├── 1280/
    │   └── 1600/
    │
    ├── traces2/
    │   ├── 320/
    │   ├── 640/
    │   ├── 1280/
    │   └── 1600/
    │
    └── ...

    • Each traces directory corresponds to a unique trace.
    • Subdirectories represent the resolution, with 320, 640, 1280, and 1600 pixels being the supported sizes.
    • Inside each resolution folder:
      • feature+number.bmp files store rendering information.
      • frame+number.sim.ppm files store corresponding simulation frame data.
    • File naming conventions ensure pixel-level alignment between rendering information and frame data.

    File Naming Convention

    • Rendering Information:

      • Stored in files named in the format: feature+number.bmp.
      • Example: feature001.bmp, feature002.bmp, etc.
      • Contains the encoded information in RGB channels.
    • Frame Files:

      • Stored in files named in the format: frame+number.sim.ppm.
      • Example: frame001.sim.ppm, frame002.sim.ppm, etc.
      • Represents the raw simulation frame data.
    • Alignment:

      • Each feature+number.bmp file corresponds to a frame+number.sim.ppm file.
      • The number in both filenames must match to ensure pixel-level alignment.

    Channel Description

    • R-channel: Represents object edges.

      • Object edges are encoded as boolean values.
      • A pixel is marked as an edge pixel if it meets the specified edge criteria.
    • G-channel: Encodes depth information.

      • Depth values are normalized to the range [0, 1].
    • B-channel: Contains normal vectors.

      • The values represent the angle of the normal vector relative to the camera.
      • The camera-facing direction (0 degrees) is mapped to 0.
      • The side-facing direction (90 degrees) is mapped to 1.
      • All values are normalized accordingly.
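
    A minimal sketch of decoding one rendering-information file according to the channel layout above, assuming 8-bit channels and using Pillow and NumPy:

    import numpy as np
    from PIL import Image

    img = np.asarray(Image.open("traces1/320/feature001.bmp").convert("RGB"))

    edges = img[..., 0] > 0                  # R-channel: boolean object edges
    depth = img[..., 1] / 255.0              # G-channel: depth normalized to [0, 1]
    normal_deg = img[..., 2] / 255.0 * 90.0  # B-channel: angle to camera, 0..90 degrees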

    Raw Data Processing

    In the raw version of the feature files, there may be some extra edge information originating from the rendering process, which renders in tiles. If you want to remove these extra edges and retain only the object boundaries, you can use the provided dataenhance.py script. Note: the algorithm is not yet perfect. We are actively working on optimizing it to achieve more accurate boundary cleaning and improved overall performance. Your feedback and suggestions are valuable as we continue to refine this process.

    Usage

    The dataenhance.py script processes the feature files.
    It cleans up and retains only the object boundaries, removing unwanted edges present due to rendering tiles.

    Command Example

    python dataenhance.py --input_path ./traces1/320/ --output_path ./traces1/320_clean/
    
  6. tensorflow-gpu-2.6.0

    • kaggle.com
    zip
    Updated Sep 30, 2021
    Cite
    Joni Juvonen (2021). tensorflow-gpu-2.6.0 [Dataset]. https://www.kaggle.com/datasets/qitvision/tensorflowgpu260
    Available download formats: zip (465382232 bytes)
    Dataset updated
    Sep 30, 2021
    Authors
    Joni Juvonen
    Description

    Dataset

    This dataset was created by Joni Juvonen

    Contents

  7. Predict Student Performance: XGB + KMeans Cluster

    • kaggle.com
    zip
    Updated Apr 10, 2023
    Cite
    Carlos A. S. de Souza (2023). Predict Student Performance: XGB + KMeans Cluster [Dataset]. https://www.kaggle.com/datasets/carlosasdesouza/kmeansxgbpredictstudentperformance/code
    Available download formats: zip (2306452 bytes)
    Dataset updated
    Apr 10, 2023
    Authors
    Carlos A. S. de Souza
    Description

    Dataset

    This dataset was created by Carlos A. S. de Souza

    Contents

  8. 💥GPU - CUDA, Metal, OpenCL, Vulkan Scores📊

    • kaggle.com
    zip
    Updated Aug 27, 2024
    Cite
    💥Alien💥 (2024). 💥GPU - CUDA, Metal, OpenCL, Vulkan Scores📊 [Dataset]. https://www.kaggle.com/datasets/alanjo/gpu-scores-with-cuda-metal-opencl-vulkan/discussion
    Available download formats: zip (28855 bytes)
    Dataset updated
    Aug 27, 2024
    Authors
    💥Alien💥
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Related Dataset: GPU Benchmarks Compilation

    Context

    Graphics APIs such as CUDA, Metal, OpenCL, and Vulkan are converging on a model similar to the way GPUs are currently built. Graphics Processing Units (GPUs) are asynchronous compute units that can handle large quantities of data, such as complex mesh geometry, image textures, output frame buffers, transformation matrices, or anything else you want computed. Benchmarks allow for easy comparison between multiple graphics cards by scoring their performance on a standardized series of tests, and they are useful in many instances, such as buying or building a new PC.

    Content

    Newest data as of Aug 27th, 2024. This dataset contains benchmarks of GPUs.

    The scores in the dataset were calculated from the average of all Geekbench 5 results users have uploaded to the Geekbench Browser. To make sure the results accurately reflect the average performance of each GPU, the dataset only includes GPUs with at least five unique results in the Geekbench Browser.
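
    A hedged sketch of that aggregation rule; this is only an illustration, since the per-user raw results are not part of this dataset, and the file and column names here are hypothetical:

    import pandas as pd

    raw = pd.read_csv("geekbench_raw_results.csv")  # hypothetical raw results file
    stats = raw.groupby("device")["score"].agg(["mean", "nunique"])
    scores = stats.loc[stats["nunique"] >= 5, "mean"]  # keep GPUs with >= 5 unique results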

    Article about Modern Graphics APIs: https://macfinder.co.uk/blog/2020-gpgpu-roundup-metal-vs-cuda-vs-opencl-amd-vs-nvidia/

    Article contents:

    AMD vs. Nvidia in 2022

    Back in 2015, there was a huge performance gap between Nvidia and AMD. If you read our previous article, our recommendation was: “In our view, Nvidia GPUs (especially newer ones) are usually the best choice for users, with built-in CUDA support as well as strong OpenCL performance for when CUDA is not supported. The only situation in which we would recommend an AMD GPU to professionals is when they are exclusively using apps that support OpenCL and have no CUDA option”. Nowadays, whilst AMD is still ever so slightly behind when it comes to raw GPU power, the two are now much more closely aligned. So, what was once an easy decision has been made a little more difficult. Fortunately (or in some cases, unfortunately) for us, Nvidia has made this decision a little easier by cutting support for their cards in newer versions of macOS. This means that for most, the choice is between AMD, with its ease of use and Metal prowess, or figuring out whether the hoops you’re required to jump through make the potential benefits of using an Nvidia card worth it. Let’s take a look at the current strengths of each GPGPU framework to see what factors might impact your choice of GPU.

    CUDA/Nvidia

    CUDA, despite not currently being supported in macOS, is as strong as ever. The Nvidia cards that support it are powerful, and CUDA is supported by the widest variety of applications. Something to keep in mind is that CUDA, unlike OpenCL, is Nvidia’s own proprietary framework. This means that unlike other open-source frameworks, CUDA is constantly being worked on by its own team, and Nvidia is constantly providing resources to further this development. Having this consistent and well-resourced team is certainly positive for CUDA. So which users should go for Nvidia cards? In our opinion, due to compatibility issues, we would only recommend Nvidia cards to users who use applications that support CUDA exclusively. Some popular apps and plugins that only support CUDA are: Adobe SpeedGrade, Avid Media Composer & Motion Graphics, RED Giant Effects Suite & Magic Bullet Looks, The Foundry HIERO, NUKE, NUKEX & Mari, as well as industry favourite OTOY Octane Render.

    OpenCL

    OpenCL, open-source, now widely supported, and bolstered by the great line-up of AMD cards currently available, is currently a very compatible and powerful GPGPU framework. OpenCL is available to both AMD and Nvidia GPUs. Unlike CUDA, the fact that OpenCL is open-source means it doesn’t necessarily have the same consistent development team or funding as CUDA, but with this in mind, it has certainly achieved a lot with what it does have at its disposal. It would be remiss of us to neglect to mention that Metal has in many ways rendered OpenCL a little irrelevant. Metal is supported by the same AMD cards that OpenCL performs best on, and in most cases, when both frameworks are supported, Metal is the best option. However, there are a few select apps, such as Capture One, which support only OpenCL, so the framework does have a little life in it still.

    Metal

    The new kid on the block, but certainly not one to underestimate, Metal has been the rising star of the GPGPU scene in the last few years. Metal has sought to combine OpenCL and OpenGL in a single low-level API. As Metal is embedded within macOS at the lowest level, it’s super-efficient and provides huge performance benefits. Like CUDA, Metal has its own consistent development team and, as part of Apple, has access to huge resources; this means steady updates and more great things to come in the future. Currently, you’ll need an AMD card to take advantage of Metal in macOS. This i...

  9. AI Platform Performance Dataset

    • kaggle.com
    zip
    Updated Sep 20, 2024
    Cite
    Satya Prakash Swain (2024). AI Platform Performance Dataset [Dataset]. https://www.kaggle.com/datasets/satyaprakashswain/ai-platform-performance-dataset
    Available download formats: zip (8734 bytes)
    Dataset updated
    Sep 20, 2024
    Authors
    Satya Prakash Swain
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset compares the performance of various AI platforms across different tasks and metrics. It is designed for use in Kaggle competitions and analysis.

    Columns

    • Platform Name: Name of the AI platform or framework
    • Task Type: Type of AI task (e.g., Image Classification, Natural Language Processing, Object Detection)
    • Dataset: Name of the benchmark dataset used
    • Model Architecture: The specific model architecture used for the task
    • Accuracy: Accuracy score for the given task (percentage)
    • Training Time: Time taken to train the model (in hours)
    • Inference Time: Time taken for inference (in milliseconds)
    • GPU Memory Usage: GPU memory consumed during training (in GB)
    • Energy Consumption: Energy consumed during training (in kWh)
    • Date: Date of the performance measurement

    Notes

    • This dataset is synthetic and for demonstration purposes. Real-world performance may vary.
    • Performance metrics are collected under standardized conditions, but may not reflect all use cases.
    • Regular updates are recommended to keep the dataset current with the latest AI advancements.

    Potential Uses

    • Comparing AI platform performance across different tasks
    • Analyzing trade-offs between accuracy, speed, and resource consumption
    • Tracking improvements in AI platforms over time
    • Helping data scientists choose the most suitable platform for their specific needs
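
    For example, a hedged sketch of a trade-off analysis with pandas, assuming the CSV headers match the column names listed above (the file name is an assumption):

    import pandas as pd

    df = pd.read_csv("ai_platform_performance.csv")  # file name is an assumption

    # Accuracy gained per kWh of training energy, averaged by platform.
    df["accuracy_per_kwh"] = df["Accuracy"] / df["Energy Consumption"]
    print(df.groupby("Platform Name")["accuracy_per_kwh"].mean().sort_values())
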
  10. Laptop-Price-In-India

    • kaggle.com
    zip
    Updated Oct 14, 2023
    Cite
    Mohammad Kaif Tahir (2023). Laptop-Price-In-India [Dataset]. https://www.kaggle.com/datasets/mohammadkaiftahir/laptop-price-in-india
    Available download formats: zip (24972 bytes)
    Dataset updated
    Oct 14, 2023
    Authors
    Mohammad Kaif Tahir
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    India
    Description

    Here's a brief description of each column in the laptop dataset:

    • Company: The manufacturer or brand name of the laptop. Example values: Dell, HP, Lenovo, Apple, Acer, Asus, etc.
    • TypeName: The general type or category of the laptop. Example values: Ultrabook, Notebook, Gaming, Netbook, etc.
    • Inches: The size of the laptop screen in inches. Example values: 13.3, 15.6, 17.3, etc.
    • ScreenResolution: The display resolution of the laptop. Example values: Full HD, 4K Ultra HD, HD, etc.
    • Cpu: The central processing unit (CPU) or processor of the laptop. Example values: Intel Core i5, AMD, Intel Core i7, etc.
    • Ram: The random access memory (RAM) size of the laptop. Example values: 4GB, 8GB, 16GB, etc.
    • Memory: The storage capacity of the laptop, usually referring to the hard disk drive (HDD) or solid-state drive (SSD). Example values: 256GB SSD, 1TB HDD, 512GB SSD, etc.
    • Gpu: The graphics processing unit (GPU) or graphics card of the laptop. Example values: NVIDIA, AMD, Intel HD Graphics 620, etc.
    • OpSys: The operating system installed on the laptop. Example values: Windows 10, macOS, Linux, etc.
    • Weight: The weight of the laptop, often in kilograms. Example values: 1.5 kg, 2.2 kg, 1.8 kg, etc.
    • Price: The price of the laptop. Example values: $80,000, €30,000, RS10,00,00, etc.

    These columns provide a comprehensive overview of the key specifications and characteristics of each laptop in the dataset, enabling detailed analysis and comparison.
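
    A hedged sketch of turning two of the string columns above into numeric features (the file name is an assumption):

    import pandas as pd

    df = pd.read_csv("laptop_price_in_india.csv")  # file name is an assumption

    # "8GB" -> 8.0 ; "1.5 kg" -> 1.5
    df["Ram_GB"] = df["Ram"].str.extract(r"(\d+)", expand=False).astype(float)
    df["Weight_kg"] = df["Weight"].str.extract(r"([\d.]+)", expand=False).astype(float)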

  11. Computer Hardware Dataset

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Cite
    Dilshaan Sandhu (2023). Computer Hardware Dataset [Dataset]. https://www.kaggle.com/datasets/dilshaansandhu/general-computer-hardware-dataset
    Available download formats: zip (273153 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    Dilshaan Sandhu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains information about various computer hardware components and their specs. This dataset is a work in progress and will be improved in the upcoming months.

    Please leave a comment if you would like to suggest a way to improve the quality of the data or would like to assist in the process of collecting this information.

  12. llama-server-slim-GPU

    • kaggle.com
    zip
    Updated Apr 22, 2025
    Cite
    Joshua Gompert (2025). llama-server-slim-GPU [Dataset]. https://www.kaggle.com/datasets/joshuagompert/llama-cpp-bin/discussion
    Available download formats: zip (28023120 bytes)
    Dataset updated
    Apr 22, 2025
    Authors
    Joshua Gompert
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    ⚙️ llama-cpp GPU Server + CLI (Slim Build)

    Contents:
    - llama-server: GPU-accelerated HTTP inference server
    - llama-cli: Lightweight terminal interface for prompt execution
    - Shared libraries: libggml*.so, libllama.so

    📦 Description

    This dataset contains a minimal build of the llama.cpp project, compiled with CUDA acceleration and optimized for use in GPU-based Kaggle notebooks or offline environments. It includes both the llama-server binary for persistent inference and the llama-cli binary for simple one-off prompt execution.

    All unnecessary examples, tests, and dev dependencies have been stripped to reduce size and loading time.

    This dataset does not include any model weights. Use with a separate GGUF-formatted model file (e.g. LLaMA 4 Scout 17B 16E) mounted via another dataset.

    🛠️ Usage Example (Server)

    /kaggle/input/llama-cpp-server-build/llama-server \
     --model /kaggle/input/llama-models/Llama-4-Scout-17B.gguf \
     --ctx-size 4096 --n-gpu-layers 40 --port 8080
    

    Then POST to http://localhost:8080/completion.
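
    For example, a minimal sketch of calling the endpoint from Python (the prompt and n_predict values are illustrative):

    import requests

    resp = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": "Write a haiku about GPU memory", "n_predict": 64},
    )
    print(resp.json()["content"])  # generated text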

    💻 Usage Example (CLI)

    /kaggle/input/llama-cpp-server-build/llama-cli \
     -m /kaggle/input/llama-models/Llama-4-Scout-17B.gguf \
     -p "Write a haiku about GPU memory" --ctx-size 4096 --n-gpu-layers 40
    

    📁 Included Files

    llama-server
    llama-cli
    libggml-base.so
    libggml-cpu.so
    libggml-cuda.so
    libllama.so
    

    🔄 Notes

    • Built on Kaggle with CUDA 12.5 and T4 GPU support
    • Works out-of-the-box in both CLI and HTTP server mode
    • Pair with a quantized .gguf model for inference
  13. IceVision for CUDA11

    • kaggle.com
    zip
    Updated Dec 23, 2021
    Cite
    Aaron B. (2021). IceVision for CUDA11 [Dataset]. https://www.kaggle.com/abee82/icevision
    Available download formats: zip (4278676246 bytes)
    Dataset updated
    Dec 23, 2021
    Authors
    Aaron B.
    Description

    Dataset

    This dataset was created by Aaron B.

    Contents

  14. TabPFN

    • kaggle.com
    zip
    Updated Jun 14, 2023
    Cite
    Mark Inzhirov (2023). TabPFN [Dataset]. https://www.kaggle.com/datasets/neutrino404/tabpfn
    Available download formats: zip (95945799 bytes)
    Dataset updated
    Jun 14, 2023
    Authors
    Mark Inzhirov
    Description

    Use this dataset when submitting code offline for competitions; otherwise, just use !pip install tabpfn for online use. Usage for offline code submissions within Kaggle notebooks is as follows:

    1. First, add the dataset by selecting "add data", searching for this dataset, and adding it to your input.

    2. Next, add the following code to a code block in your notebook:

    !pip install tabpfn --no-index --find-links=file:///kaggle/input/tabpfn
    !mkdir -p /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff
    !cp /kaggle/input/tabpfn/prior_diff_real_checkpoint_n_0_epoch_100.cpkt /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff/

    3. Import:

    from tabpfn import TabPFNClassifier

    4. Now you are all set: you can create a classifier and run it offline for submission in offline Kaggle code competitions:

    classifier = TabPFNClassifier(device='cpu', N_ensemble_configurations=64)
    classifier.fit(X_train, Y_train)
    y_eval, p_eval = classifier.predict(X_cv, return_winning_probability=True)

    If you want to use TabPFN with GPU, use the following code when you create the model:

    classifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=32)

    You can find documentation for this package on GitHub: https://github.com/automl/TabPFN.git

    The original paper on TabPFN can be found at: https://arxiv.org/abs/2207.01848

    License

    Copyright 2022 Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

  15. BUTTER-E: Energy Data for Deep Learning Models

    • kaggle.com
    zip
    Updated Jan 11, 2025
    Cite
    Pavan Kumar S (2025). BUTTER-E: Energy Data for Deep Learning Models [Dataset]. https://www.kaggle.com/datasets/pavankumar4757/butter-e-energy-data-for-deep-learning-models
    Available download formats: zip (2940491 bytes)
    Dataset updated
    Jan 11, 2025
    Authors
    Pavan Kumar S
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The BUTTER-E - Energy Consumption Data for the BUTTER Empirical Deep Learning Dataset provides node-level energy consumption data collected via watt-meters, complementing the primary BUTTER dataset. This dataset records energy consumption and performance metrics for 1,059,206 experimental runs across diverse configurations of fully connected neural networks. Key attributes include:

    1. timestamp: The precise time of the energy consumption measurement.
    2. node: The hardware node identifier (e.g., r103u05) where the experiment was conducted.
    3. watts: The energy consumption (in watts) recorded for the corresponding node at the given timestamp.

    Highlights

    The data spans 30,582 distinct configurations, including variations across 13 datasets, 20 network sizes, 8 network shapes, and 14 depths. Measurements were taken on CPU and GPU hardware, offering insights into the relationship between neural network parameters and energy consumption. The dataset provides valuable information for analyzing the energy efficiency of deep learning models, particularly in relation to cache effects, dataset size, and network architecture.

    Use Cases

    This dataset is ideal for:

    • Energy-efficient AI research: understanding how energy consumption scales with model size, dataset properties, and network configurations.
    • Performance optimization: identifying configurations with optimal trade-offs between performance and energy usage.
    • Sustainability analysis: evaluating the carbon footprint of training and deploying deep learning models.
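
    A hedged sketch of estimating per-node energy from the three attributes above, by integrating watts over time (the file name is an assumption):

    import pandas as pd

    df = pd.read_csv("butter_e.csv", parse_dates=["timestamp"])  # file name is an assumption

    def node_kwh(group):
        # Time deltas between successive readings, in hours.
        hours = group["timestamp"].diff().dt.total_seconds().fillna(0) / 3600.0
        return (group["watts"] * hours).sum() / 1000.0  # watt-hours -> kWh

    print(df.sort_values("timestamp").groupby("node").apply(node_kwh))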

  16. Alibaba GPU Cluster Spot Resource Dataset

    • kaggle.com
    zip
    Updated Aug 13, 2025
    Cite
    Sultanul Ovi (2025). Alibaba GPU Cluster Spot Resource Dataset [Dataset]. https://www.kaggle.com/datasets/mdsultanulislamovi/alibaba-gpu-cluster-spot-resource-dataset
    Available download formats: zip (5189979 bytes)
    Dataset updated
    Aug 13, 2025
    Authors
    Sultanul Ovi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset provides a comprehensive trace of AI workloads running on a large-scale GPU cluster with spot resource provisioning capabilities. It captures real-world operational characteristics from a production environment, managing both high-priority workloads with strict Service Level Objectives (SLOs) and opportunistic spot workloads.

    Key Characteristics

    • Infrastructure Scale: 4,278 GPU nodes with 6 different GPU card types
    • Workload Volume: 466,867 job submissions tracked
    • Organization Diversity: 119 unique organizations/departments
    • Workload Types: Mixed high-priority (HP) and spot instance workloads

    🔬 Research Applications

    This dataset is valuable for:

    1. Scheduling Algorithm Development

      • Spot instance prediction models
      • Multi-resource scheduling optimization
      • SLO-aware preemption strategies
    2. Cluster Design Studies

      • GPU provisioning optimization
      • Heterogeneous resource planning
      • Cost-performance trade-off analysis
    3. Workload Characterization

      • AI/ML job pattern analysis
      • Organization behavior modeling
      • Resource demand forecasting
    4. Economic Analysis

      • Spot pricing strategies
      • Resource allocation fairness
      • Cost optimization for mixed workloads

    📝 Dataset Limitations and Considerations

    1. Temporal Coverage: Observation period spans approximately 113 days
    2. Anonymization: Organization and GPU model names are partially anonymized
    3. Missing Metrics: No information on job success/failure rates, actual vs requested resources, or pricing
    4. Static Infrastructure: Node configuration assumed constant throughout observation period

    🎯 Recommended Analysis Extensions

    1. Temporal Analysis: Job arrival patterns, peak usage periods, seasonal trends
    2. Failure Analysis: Spot preemption impact on job completion
    3. Efficiency Metrics: Resource waste, fragmentation, and utilization rates
    4. Predictive Modeling: Spot availability forecasting, job duration prediction
    5. Fair Sharing: Organization-level resource allocation and priority analysis

    This dataset represents a significant contribution to the understanding of large-scale GPU cluster operations and spot resource management in production AI/ML environments.
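
    A heavily hedged sketch of the temporal analysis suggested above; the file and column names (job_trace.csv, submit_time) are hypothetical and must be checked against the actual trace schema:

    import pandas as pd

    jobs = pd.read_csv("job_trace.csv", parse_dates=["submit_time"])  # hypothetical schema
    hourly = jobs.set_index("submit_time").resample("1h").size()
    print(hourly.describe())  # job arrival rate per hour over the ~113-day window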

  17. faiss-gpu 1.7.3 python3.10

    • kaggle.com
    zip
    Updated May 8, 2023
    Cite
    Tomoki Hirose (2023). faiss-gpu 1.7.3 python3.10 [Dataset]. https://www.kaggle.com/datasets/tomokihirose/faiss-gpu-173-python310
    Available download formats: zip (85342564 bytes)
    Dataset updated
    May 8, 2023
    Authors
    Tomoki Hirose
    Description

    About

    This dataset is for using Faiss in an offline kernel. Packages compatible with the Python 3.10 kernel are available as a dataset.

    Faiss is developed by Meta and released under the MIT License: https://github.com/facebookresearch/faiss/blob/main/LICENSE

    Usage

    !pip install -U /kaggle/input/faiss-gpu-173-python310/faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
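
    Once installed, usage follows the standard Faiss API; a minimal sketch, assuming a GPU session:

    import numpy as np
    import faiss

    d = 64
    xb = np.random.rand(10000, d).astype("float32")  # database vectors
    xq = np.random.rand(5, d).astype("float32")      # query vectors

    index = faiss.IndexFlatL2(d)                       # exact L2 search
    res = faiss.StandardGpuResources()
    gpu_index = faiss.index_cpu_to_gpu(res, 0, index)  # move the index to GPU 0
    gpu_index.add(xb)
    distances, ids = gpu_index.search(xq, 5)           # 5 nearest neighbours per query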

  18. Data Scientists vs Size of Datasets

    • kaggle.com
    zip
    Updated Oct 18, 2016
    Cite
    Laurae (2016). Data Scientists vs Size of Datasets [Dataset]. https://www.kaggle.com/laurae2/data-scientists-vs-size-of-datasets
    Available download formats: zip (1191 bytes)
    Dataset updated
    Oct 18, 2016
    Authors
    Laurae
    Description

    This research study was conducted to analyze the (potential) relationship between hardware and data set sizes. 100 data scientists from France were interviewed between Jan-2016 and Aug-2016 in order to gather exploitable data. Therefore, this sample might not be representative of the true population.

    What can you do with the data?

    • Look up whether Kagglers have "stronger" hardware than non-Kagglers
    • Whether there is a correlation between a preferred data set size and hardware
    • Is proficiency a predictor of specific preferences?
    • Are data scientists more Intel or AMD?
    • How widespread is GPU computing, and is there any relationship with Kaggling?
    • Are you able to predict the amount of euros a data scientist might invest, provided their current workstation details?

    I did not find any past research on a similar scale. You are free to play with this data set. For re-usage of this data set out of Kaggle, please contact the author directly on Kaggle (use "Contact User"). Please mention:

    • Your intended usage (research? business use? blogging?...)
    • Your first/last name

    Arbitrarily, we chose characteristics to describe Data Scientists and data set sizes.

    Data set size:

    • Small: under 1 million values
    • Medium: between 1 million and 1 billion values
    • Large: over 1 billion values

    For the data, it uses the following fields (DS = Data Scientist, W = Workstation):

    • DS_1 = Are you working with "large" data sets at work? (large = over 1 billion values) => Yes or No
    • DS_2 = Do you enjoy working with large data sets? => Yes or No
    • DS_3 = Would you rather have small, medium, or large data sets for work? => Small, Medium, or Large
    • DS_4 = Do you have any presence at Kaggle or any other Data Science platforms? => Yes or No
    • DS_5 = Do you view yourself proficient at working in Data Science? => Yes, A bit, or No
    • W_1 = What is your CPU brand? => Intel or AMD
    • W_2 = Do you have access to a remote server to perform large workloads? => Yes or No
    • W_3 = How many euros would you invest in brand new Data Science hardware? => numeric output, rounded to the nearest 100
    • W_4 = How many cores do you have to work with data sets? => numeric output
    • W_5 = How much RAM (in GB) do you have to work with data sets? => numeric output
    • W_6 = Do you do GPU computing? => Yes or No
    • W_7 = What programming languages do you use for Data Science? => R or Python (any other answer accepted)
    • W_8 = What programming languages do you use for pure statistical analysis? => R or Python (any other answer accepted)
    • W_9 = What programming languages do you use for training models? => R or Python (any other answer accepted)

    You should expect potential noise in the data set. It might not be "free" of internal contradictions, as with all research.
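
    As one example, a hedged sketch of the GPU-computing vs. Kaggle-presence question above, using the field names from the list (the CSV file name is an assumption):

    import pandas as pd

    df = pd.read_csv("data_scientists_vs_size_of_datasets.csv")  # file name is an assumption
    # Share of GPU users among respondents with and without a Kaggle presence.
    print(pd.crosstab(df["DS_4"], df["W_6"], normalize="index"))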

  19. Computer Parts Sales Dataset for demand forcasting

    • kaggle.com
    zip
    Updated Jul 19, 2025
    Cite
    Al Maruf Bin Alam (2025). Computer Parts Sales Dataset for demand forcasting [Dataset]. https://www.kaggle.com/datasets/maruf99alam/computer-parts-sales-dataset-for-demand-forcasting
    Available download formats: zip (564 bytes)
    Dataset updated
    Jul 19, 2025
    Authors
    Al Maruf Bin Alam
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    "Computer Parts Sales Dataset for demand forcasting", is an educational custom dataset. The purpose of this dataset is to provide a realistic, synthetic sample of computer hardware part sales across different regions in Bangladesh. It is designed to help data scientists, students, and analysts:

    1. Build and train supervised machine learning models (especially linear regression)
    2. Predict future demand for specific hardware parts such as CPUs, GPUs, RAM, etc.
    3. Analyze sales trends across time and regions
    4. Make stock planning decisions based on sales behavior
    5. Practice feature engineering and demand forecasting with a structured dataset

    This dataset is especially useful for educational purposes, time-series regression tasks, and retail demand modeling experiments. You will also see, in the accompanying notebook, the results of linear regression and how I showed that linear regression is not a good option for randomized data.
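
    A hedged sketch of the kind of regression experiment described above; the file and column names here are hypothetical, since the schema is not listed in this description:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("computer_parts_sales.csv")  # hypothetical file name
    X = df[["month_index"]]                       # hypothetical time feature
    y = df["units_sold"]                          # hypothetical demand target

    model = LinearRegression().fit(X, y)
    print(model.score(X, y))  # a low R^2 would support the author's point about randomized data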

  20. Algerian Laptop Market Dataset (cleaned)

    • kaggle.com
    zip
    Updated Oct 25, 2025
    Cite
    Kadouci Abdelhak (2025). Algerian Laptop Market Dataset (cleaned) [Dataset]. https://www.kaggle.com/datasets/kadouciabdelhak/algeria-laptop-price-prediction-dataset-cleaned
    Available download formats: zip (5437880 bytes)
    Dataset updated
    Oct 25, 2025
    Authors
    Kadouci Abdelhak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Algeria
    Description

    Algerian Laptop Market Dataset

    Note: this description applies only to the laptop_price_prediction_cleaned.csv file. There are also other versions you can check: the raw data, if you want to extract features yourself, and another version that keeps the nonsensical price listings.

    Overview

    This dataset contains information about laptop listings from the Algerian marketplace, providing insights into the local laptop market. The data was scraped from Ouedkniss, Algeria's leading classifieds platform, and cleaned through a mix of automated (LLM-assisted) and manual work over one month.

    Data Source & Processing

    Data Cleaning

    • Normalized price formats: listings use different units (dinars, centimes, million centimes), and 23000 DA is sometimes written as just 23, meaning 23×1000 DA. Listings containing nonsensical prices (0, 1, 123, 1111111, ...) were removed.
    • Extracted missing specifications (CPU, GPU, RAM, etc.) from titles and descriptions
    • Standardized specifications
    • Created new columns such as LAPTOP_MODEL, SCREEN_FREQUENCY, SCREEN_RESOLUTION, and RAM_TYPE, extracted from titles and descriptions

    Cleaning Methodology

    • The dataset underwent automated processing to obtain laptop_brand and laptop_model (using regular expressions), followed by the GPT-OSS-120B model to extract standardized laptop specifications (CPU, GPU, RAM, storage, display features) from unstructured product descriptions and titles. Final manual cleaning was performed to remove non-laptop listings and filter out entries with unreliable pricing information.

    Notes:

    • I deleted the "m" suffix from dedicated GPU and CPU names, because it usually just means that the CPU/GPU is a laptop (mobile) part.
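
    A hedged sketch of the price-expansion rule mentioned above (an illustration only, not the author's actual cleaning script):

    def normalize_price(value: float) -> float:
        # Listings sometimes write 23000 DA as just "23", meaning 23 * 1000 DA.
        return value * 1000 if value < 1000 else value

    assert normalize_price(23) == 23000
    assert normalize_price(85000) == 85000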