MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset records latency-sensitive inference instances for GPU-disaggregated serving of deep learning recommendation models. It contains per-instance resource reservations and life cycle timestamps for scheduling analysis and capacity planning.
This dataset represents a groundbreaking trace collection from production GPU-disaggregated serving systems for Deep Learning Recommendation Models (DLRMs), accompanying the NSDI'25 paper on GPU-disaggregated serving at scale. The dataset captures real-world operational characteristics of inference services in a large-scale production environment, providing invaluable insights into resource allocation patterns, temporal dynamics, and system behavior for latency-sensitive ML workloads.
Each record includes:
- Instance ID
- Role
- Application group
- Requests and limits for CPU, GPU, RDMA, memory, and disk
- Density cap per node
- Creation, scheduling, and deletion timestamps relative to the trace start
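As a rough illustration of the kind of scheduling analysis these fields support, here is a minimal pandas sketch; the file name and column names (creation_time, scheduled_time) are assumptions, not the trace's actual schema.

```python
import pandas as pd

# Hypothetical file and column names -- adjust to the trace's actual schema.
df = pd.read_csv("dlrm_instances.csv")

# Scheduling delay per instance: time from creation to scheduling
# (timestamps are relative to the trace start).
df["scheduling_delay"] = df["scheduled_time"] - df["creation_time"]
print(df["scheduling_delay"].describe())
```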
This dataset enables research in scheduling, resource allocation, and capacity planning for disaggregated inference serving. It also represents one of the first publicly available production traces for GPU-disaggregated DLRM serving.
This dataset provides a unique window into production GPU-disaggregated systems, offering researchers and practitioners valuable insights for advancing the field of large-scale ML serving infrastructure.
This dataset was created by Alexis T.
This dataset was created by Aaron B.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The Accident Detection Model is built using YOLOv8, Google Colab, Python, Roboflow, deep learning, OpenCV, machine learning, and artificial intelligence. It can detect an accident from a live camera feed, or from any image or video provided. The model is trained on a dataset of 3,200+ images annotated on Roboflow.
Survey image: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png
Use this dataset when submitting code offline for competitions; otherwise just use !pip install tabpfn for online use. Usage for offline code submissions within Kaggle notebooks is as follows:
1. **First, add the dataset by selecting "Add Data", searching for this dataset, and adding it to your input.**
2. **Next, add the following code to a code block in your notebook:**
```python
!pip install tabpfn --no-index --find-links=file:///kaggle/input/tabpfn
!mkdir -p /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff
!cp /kaggle/input/tabpfn/prior_diff_real_checkpoint_n_0_epoch_100.cpkt /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff/
```
3. **Import:**
```python
from tabpfn import TabPFNClassifier
```
4. **Now you are all set; you can create a classifier and run it offline for submission in offline Kaggle code competitions:**
```python
classifier = TabPFNClassifier(device='cpu', N_ensemble_configurations=64)
classifier.fit(X_train, Y_train)
y_eval, p_eval = classifier.predict(X_cv, return_winning_probability=True)
```
If you want to use TabPFN with a GPU, use the following code when you create the model:
```python
classifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=32)
```
You can find documentation for this package on GitHub: https://github.com/automl/TabPFN.git
The original paper on TabPFN can be found at https://arxiv.org/abs/2207.01848

License

Copyright 2022 Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Complete llama.cpp build folder compiled with CUDA support for compute capability 7.5 (Turing architecture GPUs: RTX 2060/2070/2080, Tesla T4, Quadro RTX).
Build Configuration:
- CUDA 12.5 (compatibility with CUDA 12.x)
- Compute Capability: SM_75
- Optimized for Nvidia Turing GPUs
- Complete build directory with CMake files
Contents:
- Complete build/ directory:
  - build/bin/ - All compiled executables and shared libraries
  - build/CMakeCache.txt - CMake configuration
  - build/compile_commands.json - Compilation database
  - All build artifacts and intermediate files
Usage:
1. Extract the build folder
2. Ensure the CUDA 12.x runtime is installed on the target system
3. Set LD_LIBRARY_PATH to include the build/bin directory
4. Run executables from build/bin/ (a minimal launcher sketch follows the requirements list below)
Requirements on the target machine:
- Nvidia GPU with compute capability 7.5
- CUDA 12.x runtime libraries
- Linux x86_64
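As a minimal illustration of usage steps 3 and 4, the following Python sketch sets LD_LIBRARY_PATH and launches one of the binaries; the extraction path and the llama-cli executable name are assumptions -- use whichever binaries are actually present in build/bin/.

```python
import os
import subprocess

# Hypothetical extraction path -- adjust to where you unpacked the archive.
build_bin = "/path/to/build/bin"

# Make the bundled shared libraries visible to the dynamic linker.
env = os.environ.copy()
env["LD_LIBRARY_PATH"] = build_bin + ":" + env.get("LD_LIBRARY_PATH", "")

# Run one of the compiled executables (name assumed; check build/bin/).
subprocess.run([os.path.join(build_bin, "llama-cli"), "--version"], env=env, check=True)
```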
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
# Installation
```python
!pip install -qq --no-index --find-links=../input/torch-tensorrt-v2-2-0 torch-tensorrt==2.2.0
```
```python
import torch
import torch_tensorrt

trt_model_fp16 = torch_tensorrt.compile(
    mlm_model,
    inputs=[
        torch_tensorrt.Input(shape=[batch_size, 1024], dtype=torch.int32),  # input_ids
        torch_tensorrt.Input(shape=[batch_size, 1024], dtype=torch.int32),  # attention_mask
    ],
    enabled_precisions={torch.float32},  # Run with 32-bit precision
    workspace_size=2000000000,
    truncate_long_and_double=True,
)
torch.jit.save(trt_model_fp16, 'kaggle-mlm_model-1024-gpu-aug0-01-swa.trt_fp16.ts')
```
```python
# Load the compiled TorchScript model and run inference.
trt_model_fp16 = torch.jit.load(model_path)
...
inputs = {k: v.type(torch.int32).cuda() for k, v in inputs.items()}
output_trt = trt_model_fp16(inputs['input_ids'], inputs['attention_mask'])
output_trt
```
Nemotron-3-8B-Base-4k Model Overview

License
The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement.

Description
Nemotron-3-8B-Base-4k is a large language foundation model for enterprises to build custom LLMs. This foundation model has 8 billion parameters and supports a context length of 4,096 tokens. Nemotron-3-8B-Base-4k is part of Nemotron-3, a family of enterprise-ready generative text models compatible with the NVIDIA NeMo Framework. For other models in this collection, see the collections page.
NVIDIA NeMo is an end-to-end, cloud-native platform to build, customize, and deploy generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI. To get access to NeMo Framework, please sign up at this link.

References
Announcement Blog

Model Architecture
Architecture Type: Transformer
Network Architecture: Generative Pre-Trained Transformer (GPT-3)

Software Integration
Runtime Engine(s): NVIDIA AI Enterprise
Toolkit: NeMo Framework
To get access to NeMo Framework, please sign up at this link. See the NeMo inference container documentation for details on how to set up and deploy an inference server with NeMo.
Sample Inference Code:
```python
from nemo.deploy import NemoQuery

nq = NemoQuery(url="localhost:8000", model_name="Nemotron-3-8B-4K")
output = nq.query_llm(prompts=["The meaning of life is"], max_output_token=200, top_k=1, top_p=0.0, temperature=0.1)
print(output)
```
Supported Hardware:
H100
A100 80GB, A100 40GB
Model Version(s)
Nemotron-3-8B-base-4k-BF16-1

Dataset & Training
The model uses a learning rate of 3e-4 with a warm-up period of 500M tokens and a cosine learning rate annealing schedule for 95% of the total training tokens. The decay stops at a minimum learning rate of 3e-5. The model is trained with a sequence length of 4096 and uses FlashAttention’s Multi-Head Attention implementation. 1,024 A100s were used for 19 days to train the model.
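For intuition, here is a minimal Python sketch of the schedule as described; treating the warm-up as part of the 95% decay window is an assumption, not a detail stated by NVIDIA.

```python
import math

def lr_at(tokens, total_tokens, peak_lr=3e-4, min_lr=3e-5,
          warmup_tokens=500e6, decay_frac=0.95):
    """Linear warm-up over 500M tokens, then cosine annealing to min_lr
    over 95% of the total training tokens; constant min_lr afterwards."""
    if tokens < warmup_tokens:
        return peak_lr * tokens / warmup_tokens
    decay_tokens = decay_frac * total_tokens
    progress = min((tokens - warmup_tokens) / (decay_tokens - warmup_tokens), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```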
NVIDIA models are trained on a diverse set of public and proprietary datasets. This model was trained on a dataset containing 3.8 trillion tokens of text. The dataset contains 53 different human languages (including English, German, Russian, Spanish, French, Japanese, Chinese, Italian, and Dutch) and 37 programming languages. The model also uses the training subsets of downstream academic benchmarks from sources like FLANv2, P3, and NaturalInstructions v2. NVIDIA is committed to the responsible development of large language models and conducts reviews of all datasets included in training.

Evaluation

| Task | Num-shot | Score |
|---|---|---|
| MMLU* | 5 | 54.4 |
| WinoGrande | 0 | 70.9 |
| Hellaswag | 0 | 76.4 |
| ARC Easy | 0 | 72.9 |
| TyDiQA-GoldP** | 1 | 49.2 |
| Lambada | 0 | 70.6 |
| WebQS | 0 | 22.9 |
| PiQA | 0 | 80.4 |
| GSM8K | 8-shot w/ maj@8 | 39.4 |
** The languages used are Arabic, Bangla, Finnish, Indonesian, Korean, Russian, and Swahili.

Intended use
This is a completion model. For best performance, users are encouraged to customize it using the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA) and SFT/RLHF. For chat use cases, please consider the Nemotron-3-8B chat variants.

Ethical use
Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions by following the guidelines in the NVIDIA AI Foundation Models Community License Agreement.

Limitations
The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts.
The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and it may produce socially unacceptable or undesirable output even if the prompt itself does not include anything explicitly offensive.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
The BUTTER-E - Energy Consumption Data for the BUTTER Empirical Deep Learning Dataset provides node-level energy consumption data collected via watt-meters, complementing the primary BUTTER dataset. This dataset records energy consumption and performance metrics for 1,059,206 experimental runs across diverse configurations of fully connected neural networks. Key attributes include:
1. timestamp: The precise time of the energy consumption measurement.
2. node: The hardware node identifier (e.g., r103u05) where the experiment was conducted.
3. watts: The power draw (in watts) recorded for the corresponding node at the given timestamp.
Highlights

Data spans 30,582 distinct configurations, including variations across 13 datasets, 20 network sizes, 8 network shapes, and 14 depths. Measurements were taken on CPU and GPU hardware, offering insights into the relationship between neural network parameters and energy consumption. The dataset provides valuable information for analyzing the energy efficiency of deep learning models, particularly in relation to cache effects, dataset size, and network architecture.
Use Cases

This dataset is ideal for:
- Energy-efficient AI research: Understanding how energy consumption scales with model size, dataset properties, and network configurations.
- Performance optimization: Identifying configurations with optimal trade-offs between performance and energy usage.
- Sustainability analysis: Evaluating the carbon footprint of training and deploying deep learning models.
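As a rough sketch of how the three attributes above can be turned into per-node energy totals, the following pandas snippet integrates power over time using a simple rectangle-rule approximation; the file name is a hypothetical placeholder.

```python
import pandas as pd

# Hypothetical file name; the three columns match the attributes above.
df = pd.read_csv("butter_e_energy.csv", parse_dates=["timestamp"])
df = df.sort_values(["node", "timestamp"])

# Rectangle-rule approximation: watts times seconds since the previous
# sample on the same node gives joules for that interval.
df["dt_s"] = df.groupby("node")["timestamp"].diff().dt.total_seconds()
df["joules"] = df["watts"] * df["dt_s"]

# Total energy per node, converted from joules to kWh.
print((df.groupby("node")["joules"].sum() / 3.6e6).head())
```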
CC0 1.0 Universal (Public Domain) (https://creativecommons.org/publicdomain/zero/1.0/)
This is a beginner-friendly SQLite database designed to help users practice SQL and relational database concepts. The dataset represents a basic business model inspired by NVIDIA and includes interconnected tables covering essential aspects like products, customers, sales, suppliers, employees, and projects. It's perfect for anyone new to SQL or data analytics who wants to learn and experiment with structured data.
- products: Details of 15 products (e.g., GPUs, AI accelerators). Attributes: product_id, product_name, category, release_date, price.
- customers: 20 fictional customers with their industry and contact information. Attributes: customer_id, customer_name, industry, contact_email, contact_phone.
- sales: 100 sales records tied to products and customers. Attributes: sale_id, product_id, customer_id, sale_date, region, quantity_sold, revenue.
- suppliers: 50 suppliers and the materials they provide. Attributes: supplier_id, supplier_name, material_supplied, contact_email.
- supply_chain: Materials supplied to produce products, proportional to sales. Attributes: supply_chain_id, supplier_id, product_id, supply_date, quantity_supplied.
- departments: 5 departments within the business. Attributes: department_id, department_name, location.
- employees: Data on 30 employees and their roles in different departments. Attributes: employee_id, first_name, last_name, department_id, hire_date, salary.
- projects: 10 projects handled by different departments. Attributes: project_id, project_name, department_id, start_date, end_date, budget.
Number of Tables: 8
Total Rows: Around 230 across all tables, ensuring quick queries and easy exploration.
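Here is a minimal sqlite3 example of the kind of practice query this database supports; the database file name is hypothetical, and the table names follow the attribute prefixes listed above, so the actual schema may differ slightly.

```python
import sqlite3

# Hypothetical file name -- use the name of the downloaded .db file.
conn = sqlite3.connect("nvidia_business.db")

# Revenue per product category: join the sales and products tables.
query = """
SELECT p.category, SUM(s.revenue) AS total_revenue
FROM sales AS s
JOIN products AS p ON p.product_id = s.product_id
GROUP BY p.category
ORDER BY total_revenue DESC;
"""
for category, total_revenue in conn.execute(query):
    print(category, total_revenue)
conn.close()
```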
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Learn: train_cube3_qtm_MLP2RB_04M_1735350897.csv (4 h 19 min), train_cube3_qtm_MLP2RB_04M_1735378399.csv (4 h 19 min)
1729775349
1729775893
1729776146
1729776388
1729776568
47/69 optimal
subprocess.run([ "nvcc", "--version" ])
subprocess.run( [ "nvidia-smi" ])
subprocess.run([ "cat" , "/etc/os-release" ])
subprocess.run([ "uname" , "-srm" ])
subprocess.run([ "cat" , "/proc/version" ])
subprocess.run([ "lspci" , "-k" ])
subprocess.run([ "cat" , "/proc/cpuinfo" ])
subprocess.run([ "nvidia-smi" , "-q" ])
subprocess.run([ "arch" ])
This Dockerfile creates an environment that meets all the dependencies of the example submission file in the Image Matching Challenge 2023 (including COLMAP and pycolmap).
Warning: You should replace YOURIMAGENAME with the name you desire for your environment in the later steps.
Edit or create /etc/docker/daemon.json with the following content:
```json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
```
Just go to the root folder of the repository and execute the following lines in your terminal.
DOCKER_BUILDKIT=0 docker build -f Dockerfile -t YOURIMAGENAME .
You can simply run your environment with:
docker run --gpus all -it --rm YOURIMAGENAME
You can add these lines to your ~/.bashrc:
```bash
djupyter() {
  docker run -v $PWD:/tmp/working -v ${HOME}/.cache:/container_cache \
    -w=/tmp/working -e "XDG_CACHE_HOME=/container_cache" -p 8888:8888 \
    --gpus all --rm -it YOURIMAGENAME \
    jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working --allow-root
}
```
After resetting all your terminals you can simply open a new terminal then type:
djupyter
and voila! You've just started your Docker container, mounted your current folder, and launched a Jupyter server!
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset provides a comprehensive trace of AI workloads running on a large-scale GPU cluster with spot resource provisioning capabilities. It captures real-world operational characteristics from a production environment, managing both high-priority workloads with strict Service Level Objectives (SLOs) and opportunistic spot workloads.
This dataset is valuable for:
- Scheduling Algorithm Development
- Cluster Design Studies
- Workload Characterization
- Economic Analysis
This dataset represents a significant contribution to the understanding of large-scale GPU cluster operations and spot resource management in production AI/ML environments.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
This dataset provides detailed metrics on the performance and power consumption of various GPUs when running different cryptocurrency mining algorithms. It includes information about hash rates, power usage, and other technical specifications for multiple GPUs. The data was scraped from the Hashrate.no GPU Calculator.
The dataset has 111 columns, providing extensive insights into GPU performance. Key features include:
kwh ($/kWh): Cost of electricity.

This dataset is useful for:
- Comparing GPU efficiency and profitability in cryptocurrency mining.
- Analyzing power consumption for various algorithms.
- Building predictive models for mining profitability.
- Optimizing hardware selection for miners.
Here’s a preview of the dataset:
| Name | AbelHashPower (Watt) | AbelHash (Mh/s) | zkSNARKPower (Watt) | zkSNARK (Mproof/s) |
|---|---|---|---|---|
| 4090 | 249.0 | 124.70 | 310.0 | 1.43 |
| 4080S | 168.0 | 87.95 | 219.0 | 0.92 |
| 4070TI | 108.0 | 64.54 | 141.0 | 0.64 |
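As a quick example of the efficiency comparisons this dataset supports, here is a minimal pandas sketch; the file name is hypothetical and the column names follow the preview above, so they may differ slightly in the actual CSV.

```python
import pandas as pd

# Hypothetical file name; column names follow the preview table above.
df = pd.read_csv("hashrate_gpus.csv")

# AbelHash efficiency in Mh/s per watt of board power; higher is better.
df["abelhash_mh_per_w"] = df["AbelHash (Mh/s)"] / df["AbelHashPower (Watt)"]
print(df[["Name", "abelhash_mh_per_w"]]
      .sort_values("abelhash_mh_per_w", ascending=False)
      .head())
```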
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This repository contains the code and documentation for our ACSAC 2023 paper "From Attachments to SEO: Click Here to Learn More about Clickbait PDFs!". With this artifact, we hope to foster future research on this subject.
We provide the screenshots and file hashes of the PDFs in our dataset, allowing inspection of the images and download (from external sources, e.g. VirusTotal) of the same files. Moreover, we also share the URLs contained in the PDFs and the code to reproduce most of the findings of our paper (we are not allowed to share VirusTotal data due to their Terms of Service).
We recommend inspecting our code from the Kaggle platform, as this does not involve any setup or downloading of the data. Nonetheless, all our code can be executed on a regular laptop. We used Ubuntu 18.04 and Python 3.6.9. The dependencies for this code are minimal: Pandas 1.3.3 or higher, NumPy 1.21.2, and Matplotlib 3.4.3.
Part of our experiments involve developing and training a deep learning model (based on DeepCluster). We created an additional GitHub repository containing the scripts that can help reproduce the clustering procedure. This code was developed using Ubuntu 19.0 and run on a TITAN RTX GPU. To support future research, we have shared the input and output data used in the clustering process. We have also provided the pairwise distances of the embeddings used in the second clustering step (input to DBSCAN), uploaded on Kaggle due to file size restrictions on GitHub. The results of this specific experiment cannot be repeated exactly due to manual analysis checks, but we have shared the input, output, and code to make it as reproducible as possible.
Please feel free to leave a comment or reach out in case of any questions or issues :)
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
If you want to import Unsloth while turning off the internet:
```python
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline torch torchvision torchaudio
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline xformers
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline unsloth
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline bitsandbytes
```
Then you can follow the standard notebook in the Unsloth documentation to fine-tune your model.
Pipeline / model-split loading is also allowed, so if you do not have enough VRAM on one GPU to load, say, Llama 70B, no worries - we will split the model for you across GPUs! To enable this, use the device_map = "balanced" flag:
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Llama-3.3-70B-Instruct",
    load_in_4bit = True,
    device_map = "balanced",
)
```
Contributors have also created repos that enable or improve multi-GPU support with Unsloth. If you want to use OpenSloth while the internet is turned off, run the following code step by step:
```
import tarfile
import os
source_dir = "/kaggle/input/unsloth-for-offline/fire-0.7.0/fire-0.7.0"
output_path = "/kaggle/working/fire-0.7.0.tar.gz"  # You can change this path
with tarfile.open(output_path, "w:gz") as tar:
tar.add(source_dir, arcname=os.path.basename(source_dir))
print(f"Created: {output_path}")
!pip install --no-index --find-links=/kaggle/working/ fire
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline opensloth==0.1.7
```
When I owned my Zotac GEFORCE RTX 3070 TWIN EDGE OC 8GB, I was curious about power tuning under Ubuntu. After several manual iterations, I decided to create this set of tests, which allows running load tests at different power levels. As a result, CSV files are generated that can then be analyzed to find the best performance/consumption ratio.
The script generates files that, after converting to CSV, represent the performance of the card in different test scenarios and power levels (which depend on each model).
Two CSV files have been added: one with the raw data and the other with additional features that allow analyzing the performance obtained.
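As a sketch of the kind of analysis these CSVs enable (the file and column names are assumptions -- adapt them to the generated files), one can look for the power level that maximizes throughput per watt:

```python
import pandas as pd

# Hypothetical file and column names -- adapt to the generated CSVs.
df = pd.read_csv("rtx3070_power_tests.csv")

# Throughput per watt at each tested power level; higher is better.
df["perf_per_watt"] = df["images_per_sec"] / df["power_limit_w"]

# Best power level for each benchmark model.
best = df.loc[df.groupby("model")["perf_per_watt"].idxmax()]
print(best[["model", "power_limit_w", "perf_per_watt"]])
```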
My thanks to the guys at Lambda Labs, whose benchmarks form the basis of these tests. They can be found at https://github.com/lambdal/lambda-tensorflow-benchmark/tree/tf2
The interesting thing for this dataset would be to have much more data, both from the same card models and from other models, to generate a reliable knowledge base on the information generated.
It would be interesting to obtain the best power levels depending on the Tensorflow models to run.
CC0 1.0 Universal (Public Domain) (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains information on the PC requirements for a variety of different games; each row represents a single game.
This dataset can be used to help determine whether a PC meets the requirements to run a particular game. It is important to note that meeting the minimum requirements does not guarantee optimal performance and that higher specifications may be needed for the best gaming experience.
The code used to scrape the data can be found here.
SGEMM GPU kernel performance Dataset
The dataset covers SGEMM GPU kernel performance and consists of 14 features and 241,600 records. It measures the running time of a matrix-matrix product A*B = C, where all matrices have size 2048 x 2048, using a parameterizable SGEMM GPU kernel with 241,600 possible parameter combinations. Of the 14 features, the first 10 are ordinal and can each take only up to 4 different power-of-two values, while the last 4 are binary.
https://archive.ics.uci.edu/ml/datasets/SGEMM+GPU+kernel+performance
CC0 1.0 Universal (Public Domain) (https://creativecommons.org/publicdomain/zero/1.0/)
The released trace contains a hybrid of training and inference jobs running state-of-the-art ML algorithms. It was collected from a large production cluster with over 6,500 GPUs (on ~1,800 machines) in Alibaba PAI (Platform for Artificial Intelligence), spanning July and August of 2020. We also include a Jupyter notebook that parses the trace and highlights some of the main characteristics (see Section 3, Demo of Data Analysis).
We also present a characterization study of the trace in a paper, "MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters", published in NSDI ’22.
https://github.com/alibaba/clusterdata/tree/master/cluster-trace-gpu-v2020