License: CC0 1.0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of two datasets, cpus and gpus, scraped with Python's Scrapy framework from https://www.techpowerup.com/.
The CPU dataset contains information about various CPU models and their specifications. The dataset includes the following columns:
The GPU dataset contains information about various GPU models and their specifications. The dataset includes the following columns:
Both datasets are useful for comparing the performance and features of different CPU and GPU models. They can be used for a variety of applications such as gaming, content creation, AI, machine learning, and more. Researchers could use them to study the evolution of the technology over a specific period and make predictions about future advancements, and professionals in the tech industry could use them to make informed decisions when choosing components for a build or a system.
The code for the scraper can be found here
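For readers unfamiliar with Scrapy, here is a minimal sketch of the kind of spider that could collect such specs. It is not the author's actual scraper; the start URL, CSS selectors, and field names are illustrative assumptions.

```python
import scrapy


class CpuSpecSpider(scrapy.Spider):
    """Hypothetical spider for a TechPowerUp-style spec listing."""

    name = "cpu_specs"
    start_urls = ["https://www.techpowerup.com/cpu-specs/"]  # assumed entry point

    def parse(self, response):
        # Follow links from the listing table to individual CPU pages
        # (the selector is a guess at the page structure).
        for href in response.css("table a::attr(href)").getall():
            yield response.follow(href, callback=self.parse_cpu)

    def parse_cpu(self, response):
        # Spec pages typically list name/value pairs; field names are assumptions.
        yield {
            "name": response.css("h1::text").get(),
            "specs": dict(
                zip(
                    response.css("table th::text").getall(),
                    response.css("table td::text").getall(),
                )
            ),
        }
```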
This dataset was created by gpulab.
License: CC0 1.0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/
A graphics card is essentially another processor that is specially designed and built to handle graphics; it is referred to as a Graphics Processing Unit (GPU). Adding one to your computer takes the load of processing graphics off your CPU, freeing the CPU to handle other tasks. Given the detail and sheer amount of graphics in modern games, a GPU is a must for playing them smoothly.
When choosing a GPU, it’s important to take note of individual specs and to also make sure that the other components in your build are compatible.
Web scraped from TechPowerUp, sourced from NVIDIA, AMD and Intel official websites
License: MIT, https://opensource.org/licenses/MIT
License information was derived automatically
LightGBM 4.2.0 compiled with CUDA can be installed this way:
!pip uninstall -y lightgbm
!pip install /kaggle/input/lightgbm420-cuda/lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl
import lightgbm as lgb
The wheel lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl was compiled with CUDA for Kaggle:
!pip uninstall -y lightgbm
!pip install \
--no-binary lightgbm \
--config-settings=cmake.define.USE_CUDA=ON \
lightgbm
Then save the compiled lightgbm-4.2.0-py3-none-manylinux_2_35_x86_64.whl to this dataset, for faster installation and use without internet.
The CUDA build of LightGBM runs more smoothly with parallelization and multiple subprocesses: there is no need to restrict n_jobs, which can simply be left as None. With the GPU (OpenCL) build on Kaggle, if the combined n_jobs exceeds 4, training runs several times slower. In short, LightGBM 4.2.0 compiled with CUDA works faster under parallelization and multiple subprocesses.
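As a quick illustration, here is a minimal training sketch against the CUDA build; the synthetic data and parameter choices are assumptions for demonstration only.

```python
import numpy as np
import lightgbm as lgb

# Synthetic binary-classification data, just to exercise the build.
rng = np.random.default_rng(0)
X = rng.random((10_000, 20))
y = (X[:, 0] + rng.random(10_000) > 1.0).astype(int)

params = {
    "objective": "binary",
    "device_type": "cuda",  # requires the CUDA-compiled wheel installed above
    "verbose": -1,
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
```

With the scikit-learn wrapper (lgb.LGBMClassifier), n_jobs can likewise be left as None, as noted above.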
License: MIT, https://opensource.org/licenses/MIT
License information was derived automatically
The dataset is available at: https://kaggle.com/datasets/19684b7cee0ea0e51589d1a064446c2ac72e5167a3da9732f082463e2da84821
This dataset is organized into multiple traces directories, each containing data at various resolutions. Supported resolutions include 320, 640, 1280, and 1600. The dataset is provided as a compressed .zip file for ease of distribution. Detailed descriptions are provided below.
The dataset is structured as follows:
Dataset
│
├── traces1/
│ ├── 320/
│ │ ├── feature001.bmp
│ │ ├── frame001.sim.ppm
│ │ └── ...
│ ├── 640/
│ ├── 1280/
│ └── 1600/
│
├── traces2/
│ ├── 320/
│ ├── 640/
│ ├── 1280/
│ └── 1600/
│
└── ...
Each traces directory corresponds to a unique trace, with 320, 640, 1280, and 1600 pixels being the supported sizes. feature+number.bmp files store rendering information; frame+number.sim.ppm files store the corresponding simulation frame data.

Rendering information: stored in feature+number.bmp files (feature001.bmp, feature002.bmp, etc.).
Frame files: stored in frame+number.sim.ppm files (frame001.sim.ppm, frame002.sim.ppm, etc.).
Alignment: each feature+number.bmp file corresponds to a frame+number.sim.ppm file; the number in both filenames must match to ensure pixel-level alignment.

Feature file channels:
- R-channel: represents object edges.
- G-channel: encodes depth information, normalized to [0, 1].
- B-channel: contains normal vectors.

In the raw version of the feature files, there may be some edge information originating from the rendering process, which uses tiles. If you want to remove these extra edges and retain only the object boundaries, you can use the provided dataenhance.py script. Note: the algorithm is not yet perfect. We are actively working on optimizing it to achieve more accurate boundary cleaning and improved overall performance. Your feedback and suggestions are valuable as we continue to refine this process.
The dataenhance.py script processes the feature files.
It cleans up and retains only the object boundaries, removing unwanted edges present due to rendering tiles.
python dataenhance.py --input_path ./traces1/320/ --output_path ./traces1/320_clean/
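To show how an aligned feature/frame pair might be consumed, here is a small loading sketch using Pillow and NumPy; the paths are examples following the layout above, and the depth rescaling assumes 8-bit channel values.

```python
import numpy as np
from PIL import Image

# Load one aligned feature/frame pair (matching numbers => pixel alignment).
feature = np.asarray(Image.open("traces1/320/feature001.bmp").convert("RGB"))
frame = np.asarray(Image.open("traces1/320/frame001.sim.ppm").convert("RGB"))
assert feature.shape == frame.shape

edges = feature[..., 0]          # R-channel: object edges
depth = feature[..., 1] / 255.0  # G-channel: depth, rescaled to [0, 1]
normals = feature[..., 2]        # B-channel: encoded normal vectors
```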
This dataset was created by Joni Juvonen.
This dataset was created by Carlos A. S. de Souza.
License: CC0 1.0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/
Graphics APIs such as CUDA, Metal, OpenCL, and Vulkan are converging to a model similar to the way GPUs are currently built. Graphics Processing Units (GPUs) are asynchronous compute units that can handle large quantities of data, such as complex mesh geometry, image textures, output frame buffers, transformation matrices, or anything you want computed. Benchmarks allow for easy comparison between multiple graphics cards by scoring their performance on a standardized series of tests, and they are useful in many instances, such as buying or building a new PC.
Newest data as of Aug 27th, 2024. This dataset contains benchmarks of GPUs.
The scores in the dataset were calculated from the average of all Geekbench 5 results users have uploaded to the Geekbench Browser. To make sure the results accurately reflect the average performance of each GPU, the dataset only includes GPUs with at least five unique results in the Geekbench Browser.
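The aggregation rule described above can be illustrated in a few lines of pandas; the raw-uploads filename and column names here are hypothetical, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical file of raw user-uploaded Geekbench 5 results.
uploads = pd.read_csv("geekbench_uploads.csv")

# Average score per GPU, keeping only GPUs with at least five unique results.
agg = uploads.groupby("gpu")["score"].agg(["mean", "nunique"])
scores = agg.loc[agg["nunique"] >= 5, "mean"].sort_values(ascending=False)
```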
Back in 2015, there was a huge performance gap between Nvidia and AMD. If you read our previous article, our recommendation was: “In our view, Nvidia GPUs (especially newer ones) are usually the best choice for users, with built-in CUDA support as well as strong OpenCL performance for when CUDA is not supported. The only situation in which we would recommend an AMD GPU to professionals is when they are exclusively using apps that support OpenCL and have no CUDA option”. Nowadays, whilst AMD is still ever so slightly behind when it comes to raw GPU power, the two are much more closely aligned. So, what was once an easy decision has been made a little more difficult. Fortunately (or in some cases, unfortunately) for us, Nvidia has made this decision a little easier by cutting support for their cards in newer versions of macOS. This means that for most, the choice is between AMD, with its ease of use and Metal prowess, or figuring out whether the potential benefits of using an Nvidia card are worth the hoops you’re required to jump through. Let’s take a look at the current strengths of each GPGPU framework to see what factors might impact your choice of GPU.
CUDA, despite not currently being supported in macOS, is as strong as ever. The Nvidia cards that support it are powerful, and CUDA is supported by the widest variety of applications. Something to note is that CUDA, unlike OpenCL, is Nvidia’s own proprietary framework. This means that, unlike open-source frameworks, CUDA is constantly being worked on by its own team, and Nvidia is constantly providing resources to further this development. Having this consistent and well-resourced team is certainly positive for CUDA. So which users should go for Nvidia cards? In our opinion, due to compatibility issues, we would only recommend Nvidia cards to users who use applications that support CUDA exclusively. Some popular apps and plugins that only support CUDA are: Adobe SpeedGrade, Avid Media Composer & Motion Graphics, RED Giant Effects Suite & Magic Bullet Looks, The Foundry HIERO, NUKE, NUKEX & Mari, as well as industry favourite OTOY Octane Render.
OpenCL, open-source, now widely supported, and bolstered by the great line-up of AMD cards currently available, is a very compatible and powerful GPGPU framework. OpenCL is available to both AMD and Nvidia GPUs. Unlike CUDA, the fact that OpenCL is open-source means it doesn’t necessarily have the same consistent development team or funding, but with this in mind, it has certainly achieved a lot with what it does have at its disposal. It would be remiss of us not to mention that Metal has in many ways rendered OpenCL a little irrelevant. Metal is supported by the same AMD cards that OpenCL performs best on, and in most cases, when both frameworks are supported, Metal is the better option. However, a few select apps, such as Capture One, support only OpenCL, so the framework still has a little life in it.
The new kid on the block, but certainly not one to underestimate, Metal has been the rising star of the GPGPU scene in the last few years. Metal seeks to combine OpenCL and OpenGL in a single low-level API. As Metal is embedded within macOS at the lowest level, it is super-efficient and provides huge performance benefits. Like CUDA, Metal has its own consistent development team and, as part of Apple, has access to huge resources; this means steady updates and more great things to come in the future. Currently, you’ll need an AMD card to take advantage of Metal in macOS. This i...
License: MIT, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset compares the performance of various AI platforms across different tasks and metrics. It is designed for use in Kaggle competitions and analysis.
License: CC0 1.0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/
Here's a brief description of each column in the laptop dataset:
Company: Description: The manufacturer or brand name of the laptop. Example Values: Dell, HP, Lenovo, Apple, Acer, Asus, etc.
TypeName: Description: The general type or category of the laptop. Example Values: Ultrabook, Notebook, Gaming, Netbook, etc.
Inches: Description: The size of the laptop screen in inches. Example Values: 13.3, 15.6, 17.3, etc.
ScreenResolution: Description: The display resolution of the laptop. Example Values: Full HD, 4K Ultra HD, HD, etc.
Cpu: Description: The central processing unit (CPU) or processor of the laptop. Example Values: Intel Core i5, AMD, Intel Core i7, etc.
Ram: Description: The random access memory (RAM) size of the laptop. Example Values: 4GB, 8GB, 16GB, etc.
Memory: Description: The storage capacity of the laptop, usually referring to the hard disk drive (HDD) or solid-state drive (SSD). Example Values: 256GB SSD, 1TB HDD, 512GB SSD, etc.
Gpu: Description: The graphics processing unit (GPU) or graphics card of the laptop. Example Values: NVIDIA, AMD, Intel HD Graphics 620, etc.
OpSys: Description: The operating system installed on the laptop. Example Values: Windows 10, macOS, Linux, etc.
Weight: Description: The weight of the laptop, often in kilograms. Example Values: 1.5 kg, 2.2 kg, 1.8 kg, etc.
Price: Description: The price of the laptop. Example Values: $80,000, €30,000, RS10,00,00, etc.
These columns provide a comprehensive overview of the key specifications and characteristics of each laptop in the dataset, enabling detailed analysis and comparison.
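As a brief example of working with these columns, the sketch below loads the data and converts Ram into a numeric feature; the filename laptop_data.csv is an assumption.

```python
import pandas as pd

df = pd.read_csv("laptop_data.csv")  # hypothetical filename

# "8GB" -> 8: strip the unit so Ram can be used as a numeric feature.
df["Ram_GB"] = df["Ram"].str.replace("GB", "", regex=False).astype(int)
print(df.groupby("Company")["Ram_GB"].mean().round(1))
```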
License: CC0 1.0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about various computer hardware components and their specs. This dataset is a work in progress and will be improved in the upcoming months.
Please leave a comment if you would like to suggest a way to improve the quality of the data or would like to assist in the process of collecting this information.
License: MIT, https://opensource.org/licenses/MIT
License information was derived automatically
Contents:
- llama-server: GPU-accelerated HTTP inference server
- llama-cli: Lightweight terminal interface for prompt execution
- Shared libraries: libggml*.so, libllama.so
This dataset contains a minimal build of the llama.cpp project, compiled with CUDA acceleration and optimized for use in GPU-based Kaggle notebooks or offline environments. It includes both the llama-server binary for persistent inference and the llama-cli binary for simple one-off prompt execution.
All unnecessary examples, tests, and dev dependencies have been stripped to reduce size and loading time.
This dataset does not include any model weights. Use with a separate GGUF-formatted model file (e.g. LLaMA 4 Scout 17B 16E) mounted via another dataset.
/kaggle/input/llama-cpp-server-build/llama-server \
--model /kaggle/input/llama-models/Llama-4-Scout-17B.gguf \
--ctx-size 4096 --n-gpu-layers 40 --port 8080
Then POST to http://localhost:8080/completion.
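For example, a completion request could look like the following sketch; the prompt and n_predict value are illustrative.

```python
import requests

# POST a prompt to the llama.cpp server started above.
resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "Write a haiku about GPU memory", "n_predict": 64},
)
print(resp.json()["content"])
```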
/kaggle/input/llama-cpp-server-build/llama-cli \
-m /kaggle/input/llama-models/Llama-4-Scout-17B.gguf \
-p "Write a haiku about GPU memory" --ctx-size 4096 --n-gpu-layers 40
Included files:
- llama-server
- llama-cli
- libggml-base.so
- libggml-cpu.so
- libggml-cuda.so
- libllama.so

A .gguf model is required separately for inference.
This dataset was created by Aaron B.
Use this dataset when submitting code offline for competitions; otherwise, just use !pip install tabpfn for online use. Usage for offline code submissions within Kaggle notebooks is as follows:
1. **First, add the dataset by selecting "Add Data", searching for this dataset, and adding it to your input.**
2. **Next, add the following code to a code block in your notebook:**
!pip install tabpfn --no-index --find-links=file:///kaggle/input/tabpfn
!mkdir -p /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff
!cp /kaggle/input/tabpfn/prior_diff_real_checkpoint_n_0_epoch_100.cpkt /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff/
3. **Import:**
from tabpfn import TabPFNClassifier
4. **Now you are all set; you can create a classifier and run it offline for submission in offline Kaggle code competitions:**
classifier = TabPFNClassifier(device='cpu', N_ensemble_configurations=64)
classifier.fit(X_train, Y_train)
y_eval, p_eval = classifier.predict(X_cv, return_winning_probability=True)
If you want to use TabPFN with a GPU, use the following code when you create the model:
classifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=32)
You can find documentation for this package on GitHub: https://github.com/automl/TabPFN.git. The original paper on TabPFN can be found at https://arxiv.org/abs/2207.01848.
License: Copyright 2022 Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
License: MIT, https://opensource.org/licenses/MIT
License information was derived automatically
The BUTTER-E - Energy Consumption Data for the BUTTER Empirical Deep Learning Dataset provides node-level energy consumption data collected via watt-meters, complementing the primary BUTTER dataset. This dataset records energy consumption and performance metrics for 1,059,206 experimental runs across diverse configurations of fully connected neural networks. Key attributes include:
1. timestamp: The precise time of the energy consumption measurement.
2. node: The hardware node identifier (e.g., r103u05) where the experiment was conducted.
3. watts: The energy consumption (in watts) recorded for the corresponding node at the given timestamp.
Highlights: The data spans 30,582 distinct configurations, including variations across 13 datasets, 20 network sizes, 8 network shapes, and 14 depths. Measurements were taken on CPU and GPU hardware, offering insights into the relationship between neural network parameters and energy consumption. The dataset provides valuable information for analyzing the energy efficiency of deep learning models, particularly in relation to cache effects, dataset size, and network architecture.
Use Cases: This dataset is ideal for (see the sketch after this list):
- Energy-efficient AI research: understanding how energy consumption scales with model size, dataset properties, and network configurations.
- Performance optimization: identifying configurations with optimal trade-offs between performance and energy usage.
- Sustainability analysis: evaluating the carbon footprint of training and deploying deep learning models.
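To give a sense of how the three attributes above might be used, here is a small aggregation sketch; the CSV filename is an assumption.

```python
import pandas as pd

# Hypothetical filename; columns follow the attributes described above.
df = pd.read_csv("butter-e.csv", parse_dates=["timestamp"])

# Mean power draw (watts) per hardware node across all measurements.
per_node = df.groupby("node")["watts"].mean().sort_values(ascending=False)
print(per_node.head())
```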
License: MIT, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides a comprehensive trace of AI workloads running on a large-scale GPU cluster with spot resource provisioning capabilities. It captures real-world operational characteristics from a production environment, managing both high-priority workloads with strict Service Level Objectives (SLOs) and opportunistic spot workloads.
This dataset is valuable for:
Scheduling Algorithm Development
Cluster Design Studies
Workload Characterization
Economic Analysis
This dataset represents a significant contribution to the understanding of large-scale GPU cluster operations and spot resource management in production AI/ML environments.
This dataset is for using Faiss in an offline kernel. Packages compatible with the Python 3.10 kernel are provided as a dataset.
Faiss is maintained by Meta and released under the MIT License: https://github.com/facebookresearch/faiss/blob/main/LICENSE
!pip install -U /kaggle/input/faiss-gpu-173-python310/faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
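A quick smoke test for the offline install might look like this minimal sketch: a brute-force L2 index over random vectors.

```python
import faiss
import numpy as np

d = 128                                            # vector dimensionality
xb = np.random.rand(10_000, d).astype("float32")   # database vectors
xq = np.random.rand(5, d).astype("float32")        # query vectors

index = faiss.IndexFlatL2(d)   # exact (brute-force) L2 search
index.add(xb)
distances, ids = index.search(xq, 4)  # 4 nearest neighbours per query
print(ids)
```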
This research study was conducted to analyze the (potential) relationship between hardware and data set sizes. One hundred data scientists from France were interviewed between Jan 2016 and Aug 2016 in order to obtain exploitable data. Therefore, this sample might not be representative of the true population.
What can you do with the data?
I did not find any past research on a similar scale. You are free to play with this data set. For reuse of this data set outside of Kaggle, please contact the author directly on Kaggle (use "Contact User"). Please mention:
We arbitrarily chose characteristics to describe data scientists and data set sizes.
Data set size:
For the data, it uses the following fields (DS = Data Scientist, W = Workstation):
You should expect potential noise in the data set. It might not be "free" of internal contradictions, as with all research.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
"Computer Parts Sales Dataset for Demand Forecasting" is an educational custom dataset. The purpose of this dataset is to provide a realistic, synthetic sample of computer hardware part sales across different regions in Bangladesh. It is designed to help data scientists, students, and analysts:
This dataset is especially useful for educational purposes, time-series regression tasks, and retail demand modeling experiments. In the accompanying notebook you will also see the results of linear regression, and how I showed that linear regression is not a good option for randomized data; a sketch of that kind of check follows.
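The check might look like the following; the file and column names are hypothetical stand-ins for the dataset's actual schema.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("computer_parts_sales.csv")  # hypothetical filename
X = df[["unit_price", "month"]]               # hypothetical numeric feature columns
y = df["units_sold"]                          # hypothetical target column

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
# A near-zero R^2 on held-out data indicates a poor linear fit.
print("R^2:", r2_score(y_te, model.predict(X_te)))
```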
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: this description only applies to the laptop_price_prediction_cleaned.csv file. There are also other versions you can check: the raw data, if you want to extract features yourself, and another version in which the nonsensical price listings were not removed.
This dataset contains information about laptop listings from the Algerian marketplace, providing insights into the local laptop market. The data was scraped from Ouedkniss, Algeria's leading classifieds platform, and cleaned through a mix of automated (LLM-assisted) and manual work over one month.
LAPTOP_MODEL, SCREEN_FREQUENCY, SCREEN_RESOLUTION, and RAM_TYPE were extracted from titles and descriptions.
## Cleaning Methodology