52 datasets found
  1. CPU and GPU Stats

    • kaggle.com
    zip
    Updated Jan 10, 2023
    Cite
    Baraa Zaid (2023). CPU and GPU Stats [Dataset]. https://www.kaggle.com/datasets/baraazaid/cpu-and-gpu-stats
    Available download formats: zip (81304 bytes)
    Dataset updated
    Jan 10, 2023
    Authors
    Baraa Zaid
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Techpowerup datasets

    The dataset consists of two tables, cpus and gpus, scraped from https://www.techpowerup.com/ using Python's Scrapy framework.

    CPU Dataset

    The CPU dataset contains information about various CPU models and their specifications. The dataset includes the following columns:

    • Name: The name of the CPU model.
    • Codename: The codename used by the manufacturer for the CPU model.
    • Cores: The number of cores in the CPU.
    • Clock: The base clock speed of the CPU, measured in GHz.
    • Socket: The socket type that the CPU is compatible with.
    • Process: The manufacturing process used to create the CPU, measured in nanometers.
    • L3 Cache: The size of the L3 cache in the CPU, measured in MB.
    • TDP: The thermal design power of the CPU, measured in watts.
    • Released: The release date of the CPU.

    GPU Dataset

    The GPU dataset contains information about various GPU models and their specifications. The dataset includes the following columns:

    • Product_Name: The name of the GPU model.
    • GPU_Chip: The GPU chip used in the GPU model.
    • Released: The release date of the GPU.
    • Bus: The bus width of the GPU.
    • Memory: The memory capacity of the GPU, measured in GB.
    • GPU_clock: The base clock speed of the GPU, measured in MHz.
    • Memory_clock: The memory clock speed of the GPU, measured in MHz.
    • Shaders_TMUs_ROPs: The number of shaders, texture mapping units, and raster operations pipelines in the GPU.

    Both datasets are useful for comparing the performance and features of different CPU and GPU models, and they can support a variety of applications such as gaming, content creation, AI, machine learning, and more. Researchers could use them to study the evolution of the technology over a given period and to make predictions about future advancements, and professionals in the tech industry could use them to make informed decisions when choosing components for a build or a system.

    The code for the scraper can be found here
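
    To get started, the two tables can be loaded with pandas. A minimal sketch, assuming the files are named cpus.csv and gpus.csv (check the actual file names in the download):

    import pandas as pd

    cpus = pd.read_csv("cpus.csv")
    gpus = pd.read_csv("gpus.csv")

    # Example: the ten most recently released CPUs, using the columns listed above.
    cpus["Released"] = pd.to_datetime(cpus["Released"], errors="coerce")
    print(cpus.sort_values("Released", ascending=False)[["Name", "Cores", "TDP"]].head(10))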

  2. UNLIMITED GPU

    • kaggle.com
    zip
    Updated Apr 7, 2021
    Cite
    gpulab (2021). UNLIMITED GPU [Dataset]. https://www.kaggle.com/datasets/jamessteinman/unlimited-gpu
    Available download formats: zip (9627 bytes)
    Dataset updated
    Apr 7, 2021
    Authors
    gpulab
    Description

    Dataset

    This dataset was created by gpulab

    Contents

  3. 🟩NVIDIA & AMD🟥 GPUs Full Specs💠

    • kaggle.com
    zip
    Updated Aug 27, 2024
    Cite
    💥Alien💥 (2024). 🟩NVIDIA & AMD🟥 GPUs Full Specs💠 [Dataset]. https://www.kaggle.com/datasets/alanjo/graphics-card-full-specs/code
    Available download formats: zip (70870 bytes)
    Dataset updated
    Aug 27, 2024
    Authors
    💥Alien💥
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Related Dataset: GPU Benchmarks Compilation

    Context

    A graphics card is nothing more than another processor that is specially designed and built to handle graphics. These are referred to as Graphics Processing Units (GPUs). Adding one of these to your computer takes the load of processing graphics away from your CPU, allowing your CPU to handle other tasks. Due to the detail and sheer amount of graphics in modern games, a GPU is a must to play these games smoothly.

    Content

    When choosing a GPU, it's important to take note of individual specs and also to make sure that the other components in your build are compatible; a short sketch after this list shows one way to filter cards by spec.

    • Bus Interface
    • Memory Size (VRAM)
    • Memory Bus Width
    • Memory Type
    • GPU Clock Speed
    • Memory Clock Speed
    • Unified Shaders
    • Texture Mapping Units
    • Render Output Units
    • and more!
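
    A hedged sketch of shortlisting cards by spec with pandas; the file and column names here (gpu_specs.csv, "Memory Size (GB)", "GPU Clock (MHz)") are assumptions, so check the actual CSV header before running:

    import pandas as pd

    gpus = pd.read_csv("gpu_specs.csv")  # file name is an assumption

    # Keep cards with at least 8 GB of VRAM and a 1500 MHz GPU clock.
    shortlist = gpus[(gpus["Memory Size (GB)"] >= 8) & (gpus["GPU Clock (MHz)"] >= 1500)]
    print(shortlist.head())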

    Acknowledgements

    Web scraped from TechPowerUp; sourced from NVIDIA, AMD, and Intel official websites.

    If you enjoyed this dataset, here are some similar datasets you may like 😎

  4. lightgbm420-cuda

    • kaggle.com
    zip
    Updated Jan 23, 2024
    Cite
    Mikhail Golubchik (2024). lightgbm420-cuda [Dataset]. https://www.kaggle.com/datasets/mikhailgolubchik/lightgbm420-cuda
    Available download formats: zip (56261508 bytes)
    Dataset updated
    Jan 23, 2024
    Authors
    Mikhail Golubchik
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    LightGBM 4.2.0 compiled with CUDA can be installed this way:

    !pip uninstall -y lightgbm
    !pip install /kaggle/input/lightgbm420-cuda/lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl
    import lightgbm as lgb
    

    lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl was compiled with CUDA for Kaggle as follows:

    !pip uninstall -y lightgbm
    !pip install \
      --no-binary lightgbm \
      --config-settings=cmake.define.USE_CUDA=ON \
      lightgbm
    

    Then save the compiled lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl to this dataset, for faster installation and use without internet access.

    The CUDA build of LightGBM works more smoothly with parallelization and multiple subprocesses: there is no need to restrict n_jobs, and n_jobs can be set to None. With the regular GPU build of LightGBM on Kaggle, if the total n_jobs exceeds 4, training runs several times slower.

    In short, LightGBM 4.2.0 compiled with CUDA is faster when using parallelization and multiple subprocesses.
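
    As a minimal sketch of using the CUDA build (synthetic data, only to illustrate passing device="cuda" in the parameters):

    import numpy as np
    import lightgbm as lgb

    # Synthetic binary-classification data, just to exercise the build.
    X = np.random.rand(1000, 10)
    y = np.random.randint(0, 2, 1000)

    params = {"objective": "binary", "device": "cuda", "verbose": -1}
    booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)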

  5. Multi-Resolution Frames with Rendering Information

    • kaggle.com
    zip
    Updated Feb 11, 2025
    Cite
    Anonymous (2025). Multi-Resolution Frames with Rendering Information [Dataset]. https://www.kaggle.com/datasets/uhecoms/super-resolution-for-real-time-computer-graphics
    Available download formats: zip (11017136044 bytes)
    Dataset updated
    Feb 11, 2025
    Authors
    Anonymous
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Usage Instructions

    The dataset is available at: https://kaggle.com/datasets/19684b7cee0ea0e51589d1a064446c2ac72e5167a3da9732f082463e2da84821

    This dataset is organized into multiple traces directories, each containing data at various resolutions. Supported resolutions include 320, 640, 1280, and 1600. The dataset is provided as a compressed .zip file for ease of distribution. Detailed descriptions are provided below.

    Directory Structure

    The dataset is structured as follows:

    Dataset
    │
    ├── traces1/
    │   ├── 320/
    │   │   ├── feature001.bmp
    │   │   ├── frame001.sim.ppm
    │   │   └── ...
    │   ├── 640/
    │   ├── 1280/
    │   └── 1600/
    │
    ├── traces2/
    │   ├── 320/
    │   ├── 640/
    │   ├── 1280/
    │   └── 1600/
    │
    └── ...

    • Each traces directory corresponds to a unique trace.
    • Subdirectories represent the resolution, with 320, 640, 1280, and 1600 pixels being the supported sizes.
    • Inside each resolution folder:
      • feature+number.bmp files store rendering information.
      • frame+number.sim.ppm files store corresponding simulation frame data.
    • File naming conventions ensure pixel-level alignment between rendering information and frame data.

    File Naming Convention

    • Rendering Information:

      • Stored in files named in the format: feature+number.bmp.
      • Example: feature001.bmp, feature002.bmp, etc.
      • Contains the encoded information in RGB channels.
    • Frame Files:

      • Stored in files named in the format: frame+number.sim.ppm.
      • Example: frame001.sim.ppm, frame002.sim.ppm, etc.
      • Represents the raw simulation frame data.
    • Alignment:

      • Each feature+number.bmp file corresponds to a frame+number.sim.ppm file.
      • The number in both filenames must match to ensure pixel-level alignment.

    Channel Description

    • R-channel: Represents object edges.

      • Object edges are encoded as boolean values.
      • A pixel is marked as an edge pixel if it meets the specified edge criteria.
    • G-channel: Encodes depth information.

      • Depth values are normalized to the range [0, 1].
    • B-channel: Contains normal vectors.

      • The values represent the angle of the normal vector relative to the camera.
      • The camera-facing direction (0 degrees) is mapped to 0.
      • The side-facing direction (90 degrees) is mapped to 1.
      • All values are normalized accordingly.
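
    A minimal sketch of decoding one rendering-information file according to the channel layout above, assuming 8-bit channels and using Pillow and NumPy:

    import numpy as np
    from PIL import Image

    img = np.asarray(Image.open("traces1/320/feature001.bmp").convert("RGB"))

    edges = img[..., 0] > 0                  # R-channel: boolean object edges
    depth = img[..., 1] / 255.0              # G-channel: depth normalized to [0, 1]
    normal_deg = img[..., 2] / 255.0 * 90.0  # B-channel: angle to camera, 0..90 degrees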

    Raw Data Processing

    In the raw version of the feature files, there may be some extra edge information originating from the rendering process, which renders in tiles. If you want to remove these extra edges and retain only the object boundaries, you can use the provided dataenhance.py script. Note: the algorithm is not yet perfect. We are actively working on optimizing it to achieve more accurate boundary cleaning and improved overall performance. Your feedback and suggestions are valuable as we continue to refine this process.

    Usage

    The dataenhance.py script processes the feature files.
    It cleans up and retains only the object boundaries, removing unwanted edges present due to rendering tiles.

    Command Example

    python dataenhance.py --input_path ./traces1/320/ --output_path ./traces1/320_clean/
    
  6. tensorflow-gpu-2.6.0

    • kaggle.com
    zip
    Updated Sep 30, 2021
    Cite
    Joni Juvonen (2021). tensorflow-gpu-2.6.0 [Dataset]. https://www.kaggle.com/datasets/qitvision/tensorflowgpu260
    Available download formats: zip (465382232 bytes)
    Dataset updated
    Sep 30, 2021
    Authors
    Joni Juvonen
    Description

    Dataset

    This dataset was created by Joni Juvonen

    Contents

  7. Predict Student Performance: XGB + KMeans Cluster

    • kaggle.com
    zip
    Updated Apr 10, 2023
    Cite
    Carlos A. S. de Souza (2023). Predict Student Performance: XGB + KMeans Cluster [Dataset]. https://www.kaggle.com/datasets/carlosasdesouza/kmeansxgbpredictstudentperformance/code
    Available download formats: zip (2306452 bytes)
    Dataset updated
    Apr 10, 2023
    Authors
    Carlos A. S. de Souza
    Description

    Dataset

    This dataset was created by Carlos A. S. de Souza

    Contents

  8. 💥GPU - CUDA, Metal, OpenCL, Vulkan Scores📊

    • kaggle.com
    zip
    Updated Aug 27, 2024
    Cite
    💥Alien💥 (2024). 💥GPU - CUDA, Metal, OpenCL, Vulkan Scores📊 [Dataset]. https://www.kaggle.com/datasets/alanjo/gpu-scores-with-cuda-metal-opencl-vulkan/discussion
    Available download formats: zip (28855 bytes)
    Dataset updated
    Aug 27, 2024
    Authors
    💥Alien💥
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Related Dataset: GPU Benchmarks Compilation

    Context

    Graphics APIs such as CUDA, Metal, OpenCL, and Vulkan are converging on a model similar to the way GPUs are currently built. Graphics Processing Units (GPUs) are asynchronous compute units that can handle large quantities of data, such as complex mesh geometry, image textures, output frame buffers, transformation matrices, or anything else you want computed. Benchmarks allow for easy comparison between multiple graphics cards by scoring their performance on a standardized series of tests, and they are useful in many instances, such as buying or building a new PC.

    Content

    Newest data as of Aug 27th, 2024. This dataset contains benchmarks of GPUs.

    The scores in the dataset were calculated from the average of all Geekbench 5 results users have uploaded to the Geekbench Browser. To make sure the results accurately reflect the average performance of each GPU, the dataset only includes GPUs with at least five unique results in the Geekbench Browser.
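
    A hedged sketch of that aggregation rule; this is only an illustration, since the per-user raw results are not part of this dataset, and the file and column names here are hypothetical:

    import pandas as pd

    raw = pd.read_csv("geekbench_raw_results.csv")  # hypothetical raw results file
    stats = raw.groupby("device")["score"].agg(["mean", "nunique"])
    scores = stats.loc[stats["nunique"] >= 5, "mean"]  # keep GPUs with >= 5 unique results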

    Article about Modern Graphics APIs: https://macfinder.co.uk/blog/2020-gpgpu-roundup-metal-vs-cuda-vs-opencl-amd-vs-nvidia/

    Article contents:

    AMD vs. Nvidia in 2022

    Back in 2015, there was a huge performance gap between Nvidia and AMD. If you read our previous article, our recommendation was: “In our view, Nvidia GPUs (especially newer ones) are usually the best choice for users, with built-in CUDA support as well as strong OpenCL performance for when CUDA is not supported. The only situation in which we would recommend an AMD GPU to professionals is when they are exclusively using apps that support OpenCL and have no CUDA option”. Nowadays, whilst AMD is still ever so slightly behind when it comes to raw GPU power, the two are now much more closely aligned. So, what was once an easy decision has been made a little more difficult. Fortunately (or in some cases, unfortunately) for us, Nvidia has made this decision a little easier by cutting support for their cards in newer versions of macOS. This means that for most, the choice is between AMD, with its ease of use and Metal prowess, or figuring out whether the hoops you’re required to jump through make the potential benefits of using an Nvidia card worth it. Let’s take a look at the current strengths of each GPGPU framework to see what factors might impact your choice of GPU.

    CUDA/Nvidia

    CUDA, despite not currently being supported in macOS, is as strong as ever. The Nvidia cards that support it are powerful, and CUDA is supported by the widest variety of applications. Something to keep in mind is that CUDA, unlike OpenCL, is Nvidia’s own proprietary framework. This means that unlike other open-source frameworks, CUDA is constantly being worked on by its own team, and Nvidia is constantly providing resources to further this development. Having this consistent and well-resourced team is certainly positive for CUDA. So which users should go for Nvidia cards? In our opinion, due to compatibility issues, we would only recommend Nvidia cards to users who use applications that support CUDA exclusively. Some popular apps and plugins that only support CUDA are: Adobe SpeedGrade, Avid Media Composer & Motion Graphics, RED Giant Effects Suite & Magic Bullet Looks, The Foundry HIERO, NUKE, NUKEX & Mari, as well as industry favourite OTOY Octane Render.

    OpenCL

    OpenCL, open-source, now widely supported, and bolstered by the great line-up of AMD cards currently available, is currently a very compatible and powerful GPGPU framework. OpenCL is available to both AMD and Nvidia GPUs. Unlike CUDA, the fact that OpenCL is open-source means it doesn’t necessarily have the same consistent development team or funding as CUDA, but with this in mind, it has certainly achieved a lot with what it does have at its disposal. It would be remiss of us to neglect to mention that Metal has in many ways rendered OpenCL a little irrelevant. Metal is supported by the same AMD cards that OpenCL performs best on, and in most cases, when both frameworks are supported, Metal is the best option. However, there are a few select apps, such as Capture One, which support only OpenCL, so the framework does have a little life in it still.

    Metal

    The new kid on the block, but certainly not one to underestimate, Metal has been the rising star of the GPGPU scene in the last few years. Metal has sought to combine OpenCL and OpenGL in a single low-level API. As Metal is embedded within macOS at the lowest level, it’s super-efficient and provides huge performance benefits. Like CUDA, Metal has its own consistent development team and, as part of Apple, has access to huge resources; this means steady updates and more great things to come in the future. Currently, you’ll need an AMD card to take advantage of Metal in macOS. This i...

  9. AI Platform Performance Dataset

    • kaggle.com
    zip
    Updated Sep 20, 2024
    Cite
    Satya Prakash Swain (2024). AI Platform Performance Dataset [Dataset]. https://www.kaggle.com/datasets/satyaprakashswain/ai-platform-performance-dataset
    Available download formats: zip (8734 bytes)
    Dataset updated
    Sep 20, 2024
    Authors
    Satya Prakash Swain
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset compares the performance of various AI platforms across different tasks and metrics. It is designed for use in Kaggle competitions and analysis.

    Columns

    • Platform Name: Name of the AI platform or framework
    • Task Type: Type of AI task (e.g., Image Classification, Natural Language Processing, Object Detection)
    • Dataset: Name of the benchmark dataset used
    • Model Architecture: The specific model architecture used for the task
    • Accuracy: Accuracy score for the given task (percentage)
    • Training Time: Time taken to train the model (in hours)
    • Inference Time: Time taken for inference (in milliseconds)
    • GPU Memory Usage: GPU memory consumed during training (in GB)
    • Energy Consumption: Energy consumed during training (in kWh)
    • Date: Date of the performance measurement

    Notes

    • This dataset is synthetic and for demonstration purposes. Real-world performance may vary.
    • Performance metrics are collected under standardized conditions, but may not reflect all use cases.
    • Regular updates are recommended to keep the dataset current with the latest AI advancements.

    Potential Uses

    • Comparing AI platform performance across different tasks
    • Analyzing trade-offs between accuracy, speed, and resource consumption
    • Tracking improvements in AI platforms over time
    • Helping data scientists choose the most suitable platform for their specific needs
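
    For example, a hedged sketch of a trade-off analysis with pandas, assuming the CSV headers match the column names listed above (the file name is an assumption):

    import pandas as pd

    df = pd.read_csv("ai_platform_performance.csv")  # file name is an assumption

    # Accuracy gained per kWh of training energy, averaged by platform.
    df["accuracy_per_kwh"] = df["Accuracy"] / df["Energy Consumption"]
    print(df.groupby("Platform Name")["accuracy_per_kwh"].mean().sort_values())
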
  10. Laptop-Price-In-India

    • kaggle.com
    zip
    Updated Oct 14, 2023
    Cite
    Mohammad Kaif Tahir (2023). Laptop-Price-In-India [Dataset]. https://www.kaggle.com/datasets/mohammadkaiftahir/laptop-price-in-india
    Available download formats: zip (24972 bytes)
    Dataset updated
    Oct 14, 2023
    Authors
    Mohammad Kaif Tahir
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    India
    Description

    Here's a brief description of each column in the laptop dataset:

    • Company: The manufacturer or brand name of the laptop. Example values: Dell, HP, Lenovo, Apple, Acer, Asus, etc.
    • TypeName: The general type or category of the laptop. Example values: Ultrabook, Notebook, Gaming, Netbook, etc.
    • Inches: The size of the laptop screen in inches. Example values: 13.3, 15.6, 17.3, etc.
    • ScreenResolution: The display resolution of the laptop. Example values: Full HD, 4K Ultra HD, HD, etc.
    • Cpu: The central processing unit (CPU) or processor of the laptop. Example values: Intel Core i5, AMD, Intel Core i7, etc.
    • Ram: The random access memory (RAM) size of the laptop. Example values: 4GB, 8GB, 16GB, etc.
    • Memory: The storage capacity of the laptop, usually referring to the hard disk drive (HDD) or solid-state drive (SSD). Example values: 256GB SSD, 1TB HDD, 512GB SSD, etc.
    • Gpu: The graphics processing unit (GPU) or graphics card of the laptop. Example values: NVIDIA, AMD, Intel HD Graphics 620, etc.
    • OpSys: The operating system installed on the laptop. Example values: Windows 10, macOS, Linux, etc.
    • Weight: The weight of the laptop, often in kilograms. Example values: 1.5 kg, 2.2 kg, 1.8 kg, etc.
    • Price: The price of the laptop. Example values: $80,000, €30,000, RS10,00,00, etc.

    These columns provide a comprehensive overview of the key specifications and characteristics of each laptop in the dataset, enabling detailed analysis and comparison.
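
    A hedged sketch of turning two of the string columns above into numeric features (the file name is an assumption):

    import pandas as pd

    df = pd.read_csv("laptop_price_in_india.csv")  # file name is an assumption

    # "8GB" -> 8.0 ; "1.5 kg" -> 1.5
    df["Ram_GB"] = df["Ram"].str.extract(r"(\d+)", expand=False).astype(float)
    df["Weight_kg"] = df["Weight"].str.extract(r"([\d.]+)", expand=False).astype(float)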

  11. Computer Hardware Dataset

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Cite
    Dilshaan Sandhu (2023). Computer Hardware Dataset [Dataset]. https://www.kaggle.com/datasets/dilshaansandhu/general-computer-hardware-dataset
    Available download formats: zip (273153 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    Dilshaan Sandhu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains information about various computer hardware components and their specs. This dataset is a work in progress and will be improved in the upcoming months.

    Please leave a comment if you would like to suggest a way to improve the quality of the data or would like to assist in the process of collecting this information.

  12. llama-server-slim-GPU

    • kaggle.com
    zip
    Updated Apr 22, 2025
    Cite
    Joshua Gompert (2025). llama-server-slim-GPU [Dataset]. https://www.kaggle.com/datasets/joshuagompert/llama-cpp-bin/discussion
    Available download formats: zip (28023120 bytes)
    Dataset updated
    Apr 22, 2025
    Authors
    Joshua Gompert
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    ⚙️ llama-cpp GPU Server + CLI (Slim Build)

    Contents:
    - llama-server: GPU-accelerated HTTP inference server
    - llama-cli: Lightweight terminal interface for prompt execution
    - Shared libraries: libggml*.so, libllama.so

    📦 Description

    This dataset contains a minimal build of the llama.cpp project, compiled with CUDA acceleration and optimized for use in GPU-based Kaggle notebooks or offline environments. It includes both the llama-server binary for persistent inference and the llama-cli binary for simple one-off prompt execution.

    All unnecessary examples, tests, and dev dependencies have been stripped to reduce size and loading time.

    This dataset does not include any model weights. Use with a separate GGUF-formatted model file (e.g. LLaMA 4 Scout 17B 16E) mounted via another dataset.

    🛠️ Usage Example (Server)

    /kaggle/input/llama-cpp-server-build/llama-server \
     --model /kaggle/input/llama-models/Llama-4-Scout-17B.gguf \
     --ctx-size 4096 --n-gpu-layers 40 --port 8080
    

    Then POST to http://localhost:8080/completion.
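
    For example, a minimal sketch of calling the endpoint from Python (the prompt and n_predict values are illustrative):

    import requests

    resp = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": "Write a haiku about GPU memory", "n_predict": 64},
    )
    print(resp.json()["content"])  # generated text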

    💻 Usage Example (CLI)

    /kaggle/input/llama-cpp-server-build/llama-cli \
     -m /kaggle/input/llama-models/Llama-4-Scout-17B.gguf \
     -p "Write a haiku about GPU memory" --ctx-size 4096 --n-gpu-layers 40
    

    📁 Included Files

    llama-server
    llama-cli
    libggml-base.so
    libggml-cpu.so
    libggml-cuda.so
    libllama.so
    

    🔄 Notes

    • Built on Kaggle with CUDA 12.5 and T4 GPU support
    • Works out-of-the-box in both CLI and HTTP server mode
    • Pair with a quantized .gguf model for inference
  13. IceVision for CUDA11

    • kaggle.com
    zip
    Updated Dec 23, 2021
    Cite
    Aaron B. (2021). IceVision for CUDA11 [Dataset]. https://www.kaggle.com/abee82/icevision
    Available download formats: zip (4278676246 bytes)
    Dataset updated
    Dec 23, 2021
    Authors
    Aaron B.
    Description

    Dataset

    This dataset was created by Aaron B.

    Contents

  14. TabPFN

    • kaggle.com
    zip
    Updated Jun 14, 2023
    Cite
    Mark Inzhirov (2023). TabPFN [Dataset]. https://www.kaggle.com/datasets/neutrino404/tabpfn
    Available download formats: zip (95945799 bytes)
    Dataset updated
    Jun 14, 2023
    Authors
    Mark Inzhirov
    Description

    Use this dataset when submitting code offline for competitions; otherwise, just use !pip install tabpfn for online use. Usage for offline code submissions within Kaggle notebooks is as follows:

    1. First, add the dataset by selecting "add data", searching for this dataset, and adding it to your input.

    2. Next, add the following code to a code block in your notebook:

    !pip install tabpfn --no-index --find-links=file:///kaggle/input/tabpfn
    !mkdir -p /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff
    !cp /kaggle/input/tabpfn/prior_diff_real_checkpoint_n_0_epoch_100.cpkt /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff/

    3. Import:

    from tabpfn import TabPFNClassifier

    4. Now you are all set: you can create a classifier and run it offline for submission in offline Kaggle code competitions:

    classifier = TabPFNClassifier(device='cpu', N_ensemble_configurations=64)
    classifier.fit(X_train, Y_train)
    y_eval, p_eval = classifier.predict(X_cv, return_winning_probability=True)

    If you want to use TabPFN with GPU, use the following code when you create the model:

    classifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=32)

    You can find documentation for this package on GitHub: https://github.com/automl/TabPFN.git

    The original paper on TabPFN can be found at: https://arxiv.org/abs/2207.01848

    License

    Copyright 2022 Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

  15. BUTTER-E: Energy Data for Deep Learning Models

    • kaggle.com
    zip
    Updated Jan 11, 2025
    Cite
    Pavan Kumar S (2025). BUTTER-E: Energy Data for Deep Learning Models [Dataset]. https://www.kaggle.com/datasets/pavankumar4757/butter-e-energy-data-for-deep-learning-models
    Available download formats: zip (2940491 bytes)
    Dataset updated
    Jan 11, 2025
    Authors
    Pavan Kumar S
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The BUTTER-E - Energy Consumption Data for the BUTTER Empirical Deep Learning Dataset provides node-level energy consumption data collected via watt-meters, complementing the primary BUTTER dataset. This dataset records energy consumption and performance metrics for 1,059,206 experimental runs across diverse configurations of fully connected neural networks. Key attributes include:

    1. timestamp: The precise time of the energy consumption measurement.
    2. node: The hardware node identifier (e.g., r103u05) where the experiment was conducted.
    3. watts: The energy consumption (in watts) recorded for the corresponding node at the given timestamp.

    Highlights

    The data spans 30,582 distinct configurations, including variations across 13 datasets, 20 network sizes, 8 network shapes, and 14 depths. Measurements were taken on CPU and GPU hardware, offering insights into the relationship between neural network parameters and energy consumption. The dataset provides valuable information for analyzing the energy efficiency of deep learning models, particularly in relation to cache effects, dataset size, and network architecture.

    Use Cases

    This dataset is ideal for:

    • Energy-efficient AI research: understanding how energy consumption scales with model size, dataset properties, and network configurations.
    • Performance optimization: identifying configurations with optimal trade-offs between performance and energy usage.
    • Sustainability analysis: evaluating the carbon footprint of training and deploying deep learning models.
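
    A hedged sketch of estimating per-node energy from the three attributes above, by integrating watts over time (the file name is an assumption):

    import pandas as pd

    df = pd.read_csv("butter_e.csv", parse_dates=["timestamp"])  # file name is an assumption

    def node_kwh(group):
        # Time deltas between successive readings, in hours.
        hours = group["timestamp"].diff().dt.total_seconds().fillna(0) / 3600.0
        return (group["watts"] * hours).sum() / 1000.0  # watt-hours -> kWh

    print(df.sort_values("timestamp").groupby("node").apply(node_kwh))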

  16. Alibaba GPU Cluster Spot Resource Dataset

    • kaggle.com
    zip
    Updated Aug 13, 2025
    Cite
    Sultanul Ovi (2025). Alibaba GPU Cluster Spot Resource Dataset [Dataset]. https://www.kaggle.com/datasets/mdsultanulislamovi/alibaba-gpu-cluster-spot-resource-dataset
    Available download formats: zip (5189979 bytes)
    Dataset updated
    Aug 13, 2025
    Authors
    Sultanul Ovi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset provides a comprehensive trace of AI workloads running on a large-scale GPU cluster with spot resource provisioning capabilities. It captures real-world operational characteristics from a production environment, managing both high-priority workloads with strict Service Level Objectives (SLOs) and opportunistic spot workloads.

    Key Characteristics

    • Infrastructure Scale: 4,278 GPU nodes with 6 different GPU card types
    • Workload Volume: 466,867 job submissions tracked
    • Organization Diversity: 119 unique organizations/departments
    • Workload Types: Mixed high-priority (HP) and spot instance workloads

    🔬 Research Applications

    This dataset is valuable for:

    1. Scheduling Algorithm Development

      • Spot instance prediction models
      • Multi-resource scheduling optimization
      • SLO-aware preemption strategies
    2. Cluster Design Studies

      • GPU provisioning optimization
      • Heterogeneous resource planning
      • Cost-performance trade-off analysis
    3. Workload Characterization

      • AI/ML job pattern analysis
      • Organization behavior modeling
      • Resource demand forecasting
    4. Economic Analysis

      • Spot pricing strategies
      • Resource allocation fairness
      • Cost optimization for mixed workloads

    📝 Dataset Limitations and Considerations

    1. Temporal Coverage: Observation period spans approximately 113 days
    2. Anonymization: Organization and GPU model names are partially anonymized
    3. Missing Metrics: No information on job success/failure rates, actual vs requested resources, or pricing
    4. Static Infrastructure: Node configuration assumed constant throughout observation period

    🎯 Recommended Analysis Extensions

    1. Temporal Analysis: Job arrival patterns, peak usage periods, seasonal trends
    2. Failure Analysis: Spot preemption impact on job completion
    3. Efficiency Metrics: Resource waste, fragmentation, and utilization rates
    4. Predictive Modeling: Spot availability forecasting, job duration prediction
    5. Fair Sharing: Organization-level resource allocation and priority analysis

    This dataset represents a significant contribution to the understanding of large-scale GPU cluster operations and spot resource management in production AI/ML environments.
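
    A heavily hedged sketch of the temporal analysis suggested above; the file and column names (job_trace.csv, submit_time) are hypothetical and must be checked against the actual trace schema:

    import pandas as pd

    jobs = pd.read_csv("job_trace.csv", parse_dates=["submit_time"])  # hypothetical schema
    hourly = jobs.set_index("submit_time").resample("1h").size()
    print(hourly.describe())  # job arrival rate per hour over the ~113-day window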

  17. faiss-gpu 1.7.3 python3.10

    • kaggle.com
    zip
    Updated May 8, 2023
    Cite
    Tomoki Hirose (2023). faiss-gpu 1.7.3 python3.10 [Dataset]. https://www.kaggle.com/datasets/tomokihirose/faiss-gpu-173-python310
    Available download formats: zip (85342564 bytes)
    Dataset updated
    May 8, 2023
    Authors
    Tomoki Hirose
    Description

    About

    This dataset is for using Faiss in an offline kernel. Packages compatible with the Python 3.10 kernel are available as a dataset.

    Faiss is developed by Meta and released under the MIT License: https://github.com/facebookresearch/faiss/blob/main/LICENSE

    Usage

    !pip install -U /kaggle/input/faiss-gpu-173-python310/faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
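
    Once installed, usage follows the standard Faiss API; a minimal sketch, assuming a GPU session:

    import numpy as np
    import faiss

    d = 64
    xb = np.random.rand(10000, d).astype("float32")  # database vectors
    xq = np.random.rand(5, d).astype("float32")      # query vectors

    index = faiss.IndexFlatL2(d)                       # exact L2 search
    res = faiss.StandardGpuResources()
    gpu_index = faiss.index_cpu_to_gpu(res, 0, index)  # move the index to GPU 0
    gpu_index.add(xb)
    distances, ids = gpu_index.search(xq, 5)           # 5 nearest neighbours per query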

  18. Data Scientists vs Size of Datasets

    • kaggle.com
    zip
    Updated Oct 18, 2016
    Cite
    Laurae (2016). Data Scientists vs Size of Datasets [Dataset]. https://www.kaggle.com/laurae2/data-scientists-vs-size-of-datasets
    Available download formats: zip (1191 bytes)
    Dataset updated
    Oct 18, 2016
    Authors
    Laurae
    Description

    This research study was conducted to analyze the (potential) relationship between hardware and data set sizes. 100 data scientists from France were interviewed between Jan-2016 and Aug-2016 in order to gather exploitable data. Therefore, this sample might not be representative of the true population.

    What can you do with the data?

    • Look up whether Kagglers have "stronger" hardware than non-Kagglers
    • Whether there is a correlation between a preferred data set size and hardware
    • Is proficiency a predictor of specific preferences?
    • Are data scientists more Intel or AMD?
    • How widespread is GPU computing, and is there any relationship with Kaggling?
    • Are you able to predict the amount of euros a data scientist might invest, provided their current workstation details?

    I did not find any past research on a similar scale. You are free to play with this data set. For re-usage of this data set out of Kaggle, please contact the author directly on Kaggle (use "Contact User"). Please mention:

    • Your intended usage (research? business use? blogging?...)
    • Your first/last name

    Arbitrarily, we chose characteristics to describe Data Scientists and data set sizes.

    Data set size:

    • Small: under 1 million values
    • Medium: between 1 million and 1 billion values
    • Large: over 1 billion values

    For the data, it uses the following fields (DS = Data Scientist, W = Workstation):

    • DS_1 = Are you working with "large" data sets at work? (large = over 1 billion values) => Yes or No
    • DS_2 = Do you enjoy working with large data sets? => Yes or No
    • DS_3 = Would you rather have small, medium, or large data sets for work? => Small, Medium, or Large
    • DS_4 = Do you have any presence at Kaggle or any other Data Science platforms? => Yes or No
    • DS_5 = Do you view yourself proficient at working in Data Science? => Yes, A bit, or No
    • W_1 = What is your CPU brand? => Intel or AMD
    • W_2 = Do you have access to a remote server to perform large workloads? => Yes or No
    • W_3 = How many euros would you invest in brand new Data Science hardware? => numeric output, rounded to the nearest 100
    • W_4 = How many cores do you have to work with data sets? => numeric output
    • W_5 = How much RAM (in GB) do you have to work with data sets? => numeric output
    • W_6 = Do you do GPU computing? => Yes or No
    • W_7 = What programming languages do you use for Data Science? => R or Python (any other answer accepted)
    • W_8 = What programming languages do you use for pure statistical analysis? => R or Python (any other answer accepted)
    • W_9 = What programming languages do you use for training models? => R or Python (any other answer accepted)

    You should expect potential noise in the data set. It might not be "free" of internal contradictions, as with all research.
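
    As one example, a hedged sketch of the GPU-computing vs. Kaggle-presence question above, using the field names from the list (the CSV file name is an assumption):

    import pandas as pd

    df = pd.read_csv("data_scientists_vs_size_of_datasets.csv")  # file name is an assumption
    # Share of GPU users among respondents with and without a Kaggle presence.
    print(pd.crosstab(df["DS_4"], df["W_6"], normalize="index"))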

  19. Computer Parts Sales Dataset for demand forcasting

    • kaggle.com
    zip
    Updated Jul 19, 2025
    Cite
    Al Maruf Bin Alam (2025). Computer Parts Sales Dataset for demand forcasting [Dataset]. https://www.kaggle.com/datasets/maruf99alam/computer-parts-sales-dataset-for-demand-forcasting
    Available download formats: zip (564 bytes)
    Dataset updated
    Jul 19, 2025
    Authors
    Al Maruf Bin Alam
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    "Computer Parts Sales Dataset for demand forcasting", is an educational custom dataset. The purpose of this dataset is to provide a realistic, synthetic sample of computer hardware part sales across different regions in Bangladesh. It is designed to help data scientists, students, and analysts:

    1. Build and train supervised machine learning models (especially linear regression)
    2. Predict future demand for specific hardware parts such as CPUs, GPUs, RAM, etc.
    3. Analyze sales trends across time and regions
    4. Make stock planning decisions based on sales behavior
    5. Practice feature engineering and demand forecasting with a structured dataset

    This dataset is especially useful for educational purposes, time-series regression tasks, and retail demand modeling experiments. You will also see, in the accompanying notebook, the results of linear regression and how I showed that linear regression is not a good option for randomized data.
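
    A hedged sketch of the kind of regression experiment described above; the file and column names here are hypothetical, since the schema is not listed in this description:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("computer_parts_sales.csv")  # hypothetical file name
    X = df[["month_index"]]                       # hypothetical time feature
    y = df["units_sold"]                          # hypothetical demand target

    model = LinearRegression().fit(X, y)
    print(model.score(X, y))  # a low R^2 would support the author's point about randomized data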

  20. Algerian Laptop Market Dataset (cleaned)

    • kaggle.com
    zip
    Updated Oct 25, 2025
    Cite
    Kadouci Abdelhak (2025). Algerian Laptop Market Dataset (cleaned) [Dataset]. https://www.kaggle.com/datasets/kadouciabdelhak/algeria-laptop-price-prediction-dataset-cleaned
    Available download formats: zip (5437880 bytes)
    Dataset updated
    Oct 25, 2025
    Authors
    Kadouci Abdelhak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Algeria
    Description

    Algerian Laptop Market Dataset

    Note: this description applies only to the laptop_price_prediction_cleaned.csv file. There are also other versions you can check: the raw data, if you want to extract features yourself, and another version that keeps the nonsensical price listings.

    Overview

    This dataset contains information about laptop listings from the Algerian marketplace, providing insights into the local laptop market. The data was scraped from Ouedkniss, Algeria's leading classifieds platform, and cleaned through a mix of automated (LLM-assisted) and manual work over one month.

    Data Source & Processing

    Data Cleaning

    • Normalized price formats: listings use different units (dinars, centimes, million centimes), and 23000 DA is sometimes written as just 23, meaning 23×1000 DA. Listings containing nonsensical prices (0, 1, 123, 1111111, ...) were removed.
    • Extracted missing specifications (CPU, GPU, RAM, etc.) from titles and descriptions
    • Standardized specifications
    • Created new columns such as LAPTOP_MODEL, SCREEN_FREQUENCY, SCREEN_RESOLUTION, and RAM_TYPE, extracted from titles and descriptions

    Cleaning Methodology

    • The dataset underwent automated processing to obtain laptop_brand and laptop_model (using regular expressions), followed by the GPT-OSS-120B model to extract standardized laptop specifications (CPU, GPU, RAM, storage, display features) from unstructured product descriptions and titles. Final manual cleaning was performed to remove non-laptop listings and filter out entries with unreliable pricing information.

    Notes:

    • I deleted the "m" suffix from dedicated GPU and CPU names, because it usually just means that the CPU/GPU is a laptop (mobile) part.
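
    A hedged sketch of the price-expansion rule mentioned above (an illustration only, not the author's actual cleaning script):

    def normalize_price(value: float) -> float:
        # Listings sometimes write 23000 DA as just "23", meaning 23 * 1000 DA.
        return value * 1000 if value < 1000 else value

    assert normalize_price(23) == 23000
    assert normalize_price(85000) == 85000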