Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
Python International Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
https://choosealicense.com/licenses/other/
Python Copilot Audio Training using Imports with Knowledge Graphs
This dataset is a subset of the matlok python copilot datasets. Please refer to the Multimodal Python Copilot Training Overview for more details on how to use this dataset.
Details
Each imported module for each unique class in each module file has question and answer mp3s: one voice reads the question and another reads the answer. Both mp3s are stored in the parquet dbytes column and the… See the full description on the dataset page: https://huggingface.co/datasets/matlok/python-audio-copilot-training-using-import-knowledge-graphs.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
✅ Step 1: Mount to Dataset
Search for my dataset pytorch-models and add it — this will mount it at:
/kaggle/input/pytorch-models/
✅ Step 2: Check file paths
Once mounted, the four files will be available at:
/kaggle/input/pytorch-models/base_models.py
/kaggle/input/pytorch-models/ext_base_models.py
/kaggle/input/pytorch-models/ext_hybrid_models.py
/kaggle/input/pytorch-models/hybrid_models.py
✅ Step 3: Copy files to working directory
To make them importable, copy the .py files to your notebook’s working directory (/kaggle/working/):
import shutil
shutil.copy('/kaggle/input/pytorch-models/base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_hybrid_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/hybrid_models.py', '/kaggle/working/')
✅ Step 4: Import your modules
Now that they are in the working directory, you can import them like normal:
import base_models
import ext_base_models
import ext_hybrid_models
import hybrid_models
Or, if you only want to import specific classes or functions:
from base_models import YourModelClass
from ext_base_models import AnotherModelClass
✅ Step 5: Use the models
You can now initialize and use the models/classes/functions defined inside each file:
model = base_models.YourModelClass()
output = model(input_data)
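As a lightweight alternative to Step 3 (a sketch, not part of the original walkthrough), you can leave the files in the read-only input directory and make them importable by adding that directory to sys.path:
import sys

# Assumes the default mount path from Step 1
dataset_dir = '/kaggle/input/pytorch-models'
if dataset_dir not in sys.path:
    sys.path.insert(0, dataset_dir)

import base_models  # imports now resolve directly from the mounted dataset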
https://choosealicense.com/licenses/other/
Python Copilot Image Training using Import Knowledge Graphs
This dataset is a subset of the matlok python copilot datasets. Please refer to the Multimodal Python Copilot Training Overview for more details on how to use this dataset.
Details
Each row contains a png file in the dbytes column.
Rows: 216642
Size: 211.2 GB
Data type: png
Format: Knowledge graph using NetworkX with alpaca text box
Schema
The png is in the dbytes column: { "dbytes": "binary"… See the full description on the dataset page: https://huggingface.co/datasets/matlok/python-image-copilot-training-using-import-knowledge-graphs.
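As a rough, hedged sketch (the split name and row indexing are assumptions; only the binary dbytes column is documented above), one way to decode a single knowledge-graph png with Pillow:
import io
from datasets import load_dataset
from PIL import Image

# Stream to avoid downloading the full 211.2 GB; the "train" split name is an assumption
ds = load_dataset("matlok/python-image-copilot-training-using-import-knowledge-graphs", split="train", streaming=True)
row = next(iter(ds))
img = Image.open(io.BytesIO(row["dbytes"]))  # decode the stored png bytes
img.save("import_knowledge_graph_example.png")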
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 5/5/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably in this project.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available is separate .csv files for each land cover type. Oterh data can be found for the minimum, monthly, and seasonal transit time found in their respective folders. These data were produced using the python code found in the "supporting_code" folder given the ease of working with .nc and EASE grid in the xarray python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a particular function:
01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.
02_functions.R: This script contains the custom functions for this analysis, primarily to work with importing the seasonal transit data. Load this using the source() function in the 01_start.R script.
03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.
04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each landcover type. You load in these data from the 01_start.R script using the source() function.
05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each landcover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.
06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Enriching the Meta-Kaggle dataset using the Meta Kaggle Code to extract all Imports (for both R and Python) and Method Calls (only Python) as lists, which are then added to the KernelVersions.csv file as the columns Imports and MethodCalls.
[Charts: Most Imported R Packages and Most Imported Python Packages]
We perform this extraction using the following three regex patterns:
PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
PYTHON_METHOD_REGEX = *I wish I could add the regex here but kaggle kinda breaks if I do lol*
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')
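As a quick illustration (the sample code strings below are hypothetical; the two patterns are the ones shown above, and the method-call pattern is omitted as noted), the extraction boils down to re.findall:
import re

PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')

python_src = "import numpy as np\nfrom sklearn.linear_model import LogisticRegression"
r_src = 'library(ggplot2)\nrequire("dplyr")'

# Each Python match is a (from_module, import_module) tuple; keep whichever group matched
python_imports = [a or b for a, b in PYTHON_IMPORT_REGEX.findall(python_src)]
r_imports = R_IMPORT_REGEX.findall(r_src)

print(python_imports)  # ['numpy', 'sklearn.linear_model']
print(r_imports)       # ['ggplot2', 'dplyr']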
This dataset was created on 06-06-2025. Since the computation required for this process is very resource-intensive and cannot be run on a Kaggle kernel, it is not scheduled. A notebook demonstrating how to create this dataset and what insights it provides can be found here.
https://creativecommons.org/publicdomain/zero/1.0/
By Vezora (From Huggingface) [source]
The Vezora/Tested-188k-Python-Alpaca dataset is a comprehensive collection of functional Python code samples, specifically designed for training and analysis purposes. With 188,000 samples, this dataset offers an extensive range of examples that cater to the research needs of Python programming enthusiasts.
This valuable resource consists of various columns, including input, which represents the input or parameters required for executing the Python code sample. The instruction column describes the task or objective that the Python code sample aims to solve. Additionally, there is an output column that showcases the resulting output generated by running the respective Python code.
By utilizing this dataset, researchers can effectively study and analyze real-world scenarios and applications of Python programming. Whether for educational purposes or development projects, this dataset serves as a reliable reference for individuals seeking practical examples and solutions using Python.
The Vezora/Tested-188k-Python-Alpaca dataset is a comprehensive collection of functional Python code samples, containing 188,000 samples in total. This dataset can be a valuable resource for researchers and programmers interested in exploring various aspects of Python programming.
Contents of the Dataset
The dataset consists of several columns:
- output: This column represents the expected output or result that is obtained when executing the corresponding Python code sample.
- instruction: It provides information about the task or instruction that each Python code sample is intended to solve.
- input: The input parameters or values required to execute each Python code sample.
Exploring the Dataset
To make effective use of this dataset, it is essential to understand its structure and content properly. Here are some steps you can follow:
- Importing Data: Load the dataset into your preferred environment for data analysis using appropriate tools like pandas in Python.
import pandas as pd

# Load the dataset
df = pd.read_csv('train.csv')
- Understanding Column Names: Familiarize yourself with the column names and their meanings by referring to the provided description.
# Display column names
print(df.columns)
- Sample Exploration: Get an initial understanding of the data structure by examining a few random samples from different columns.
# Display random samples from 'output' column
print(df['output'].sample(5))
- Analyzing Instructions: Analyze different instructions or tasks present in the 'instruction' column to identify specific areas you are interested in studying or learning about.
# Count unique instructions and display top ones with highest occurrences
instruction_counts = df['instruction'].value_counts()
print(instruction_counts.head(10))

Potential Use Cases
The Vezora/Tested-188k-Python-Alpaca dataset can be utilized in various ways:
- Code Analysis: Analyze the code samples to understand common programming patterns and best practices.
- Code Debugging: Use code samples with known outputs to test and debug your own Python programs.
- Educational Purposes: Utilize the dataset as a teaching tool for Python programming classes or tutorials.
- Machine Learning Applications: Train machine learning models to predict outputs based on given inputs.
Remember that this dataset provides a plethora of diverse Python coding examples, allowing you to explore different applications, for example:
- Code analysis: Researchers and developers can use this dataset to analyze various Python code samples and identify patterns, best practices, and common mistakes. This can help in improving code quality and optimizing performance.
- Language understanding: Natural language processing techniques can be applied to the instruction column of this dataset to develop models that can understand and interpret natural language instructions for programming tasks.
- Code generation: The input column of this dataset contains the required inputs for executing each Python code sample. Researchers can build models that generate Python code based on specific inputs or task requirements using the examples provided in this dataset. This can be useful in automating repetitive programming tasks o...
India is one of the fastest developing nations in the world, and trade between nations is a major component of any developing nation. This dataset includes the trade data for India for commodities in the HS2 basket.
For more, visit GitHub
The dataset consists of trade values for export and import of commodities in million US$. The dataset is tidy and each row consists of a single observation.
The data were requested from the Department of Commerce, Government of India, using Python.
A few questions that can be answered using this dataset are:
1. What did India export the most in any given year?
2. Which commodity forms a major chunk of trade? Does it conform to theories of international trade?
3. How has the trade between India and any given country grown over time?
A visualization of this dataset would be a great way to explore more such questions.
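A minimal pandas sketch of that kind of exploration; the file name and column names below are assumptions and should be replaced with the dataset's actual schema:
import pandas as pd

# Hypothetical file/column names for illustration only
trade = pd.read_csv("india_trade_hs2.csv")
exports = trade[trade["trade_type"] == "export"]

# Question 1: what did India export the most in a given year?
top_2018 = (exports[exports["year"] == 2018]
            .groupby("commodity")["value_million_usd"]
            .sum()
            .sort_values(ascending=False)
            .head(5))
print(top_2018)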
Dataset Card for Python-DPO
This dataset is the larger version of the Python-DPO dataset and has been created using Argilla.
Load with datasets
To load this dataset with datasets, you'll just need to install datasets (pip install datasets --upgrade) and then use the following code:
from datasets import load_dataset
ds = load_dataset("NextWealth/Python-DPO")
Data Fields
Each data instance contains:
instruction: The problem description/requirements
chosen_code: … See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO-Large.
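A small follow-up sketch (the "train" split name is an assumption; only the instruction and chosen_code fields are visible in the truncated card):
from datasets import load_dataset

ds = load_dataset("NextWealth/Python-DPO")
example = ds["train"][0]        # assumed split name
print(example["instruction"])   # problem description/requirements
print(example["chosen_code"])   # preferred code response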
Dataset Card for Census Income (Adult)
This dataset is a precise version of Adult or Census Income. This dataset from UCI somehow happens to occupy two links, but we checked and confirmed that they are identical. We used the following Python script to create this Hugging Face dataset.
import pandas as pd
from datasets import Dataset, DatasetDict, Features, Value, ClassLabel
url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
url2 = … See the full description on the dataset page: https://huggingface.co/datasets/cestwc/census-income.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('spoc_robot', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Dataset Card for "Magicoder-Evol-Instruct-110K-python"
from datasets import load_dataset
dataset = load_dataset("pxyyy/Magicoder-Evol-Instruct-110K", split="train") # Replace with your dataset and split
def contains_python(entry):
    for c in entry["messages"]:
        if "python" in c['content'].lower():
            return True
    # return "python" in entry["messages"].lower()  # Replace 'column_name' with the column to search
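The card stops before the filter is applied; a minimal sketch of the presumable next step, reusing the dataset and contains_python defined above:
# Keep only conversations whose messages mention "python" (assumed usage of the predicate above)
python_only = dataset.filter(contains_python)
print(python_only)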
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('cifar10', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
![Visualization](https://storage.googleapis.com/tfds-data/visualization/fig/cifar10-3.0.2.png)
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.
Perfect for:
- Object detection model training
- Computer vision research
- Retro gaming AI projects
- YOLO algorithm benchmarking
- Educational purposes
| Metric | Value |
|---|---|
| Total Images | 1,004 |
| Dataset Size | 12 MB |
| Image Format | PNG |
| Annotation Format | YOLO (.txt) |
| Classes | 4 |
| Train/Val Split | 711/260 (73%/27%) |
| Class ID | Class Name | Count | Description |
|---|---|---|---|
| 0 | dog | 252 | The hunting dog in various poses (jumping, laughing, sniffing, etc.) |
| 1 | duck_dead | 256 | Dead ducks (both black and red variants) |
| 2 | duck_shot | 248 | Ducks in the moment of being shot |
| 3 | duck_flying | 248 | Flying ducks in all directions (left, right, diagonal) |
yolo_dataset_augmented/
├── images/
│ ├── train/ # 711 training images
│ └── val/ # 260 validation images
├── labels/
│ ├── train/ # 711 YOLO annotation files
│ └── val/ # 260 YOLO annotation files
├── classes.txt # Class names mapping
├── dataset.yaml # YOLO configuration file
└── augmented_dataset_stats.json # Detailed statistics
The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:
{
'rotation_range': (-15, 15), # Small rotations for game sprites
'brightness_range': (0.7, 1.3), # Brightness variations
'contrast_range': (0.8, 1.2), # Contrast adjustments
'saturation_range': (0.8, 1.2), # Color saturation
'noise_intensity': 0.02, # Gaussian noise
'horizontal_flip_prob': 0.5, # 50% chance horizontal flip
'scaling_range': (0.8, 1.2), # Scale variations
}
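A rough sketch (not the actual pipeline used to build this dataset) of how ranges like these could be applied to a sprite with Pillow; note that in a real detection pipeline the YOLO boxes would need matching geometric transforms:
import random
from PIL import Image, ImageEnhance, ImageOps

def augment(img):
    # Small rotation plus brightness/contrast/saturation jitter and an optional horizontal flip
    img = img.rotate(random.uniform(-15, 15), expand=True)
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.8, 1.2))
    img = ImageEnhance.Color(img).enhance(random.uniform(0.8, 1.2))
    if random.random() < 0.5:
        img = ImageOps.mirror(img)
    return img

sprite = Image.open("images/train/example.png").convert("RGB")  # hypothetical file name
augment(sprite).save("augmented_example.png")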
from ultralytics import YOLO
# Load and train
model = YOLO('yolov8n.pt') # Load pretrained model
results = model.train(data='dataset.yaml', epochs=100, imgsz=640)
# Validate
metrics = model.val()
# Predict
results = model('path/to/test/image.png')
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import os
class DuckHuntDataset(Dataset):
    def __init__(self, images_dir, labels_dir, transform=None):
        self.images_dir = images_dir
        self.labels_dir = labels_dir
        self.transform = transform
        self.images = os.listdir(images_dir)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.images_dir, self.images[idx])
        label_path = os.path.join(self.labels_dir,
                                  self.images[idx].replace('.png', '.txt'))
        image = Image.open(img_path)

        # Load YOLO annotations
        with open(label_path, 'r') as f:
            labels = f.readlines()

        if self.transform:
            image = self.transform(image)

        return image, labels

# Usage
dataset = DuckHuntDataset('images/train', 'labels/train')
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
Each .txt file contains one line per object:
class_id center_x center_y width height
Example annotation:
0 0.492 0.403 0.212 0.315
Where values are normalized (0-1) relative to image dimensions.
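For clarity, a short helper (the 256x240 image size is just an example value) that converts one normalized YOLO line back into pixel box corners:
def yolo_to_pixels(line, img_w, img_h):
    # "class_id cx cy w h", with cx/cy/w/h normalized to [0, 1]
    class_id, cx, cy, w, h = line.split()
    cx, cy, w, h = float(cx) * img_w, float(cy) * img_h, float(w) * img_w, float(h) * img_h
    return int(class_id), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

print(yolo_to_pixels("0 0.492 0.403 0.212 0.315", 256, 240))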
This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:
Ballroom Python South Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Quickstart Usage
This dataset can be loaded into Python using the Hugging Face datasets library. First, install the datasets library via the command line:
$ pip install datasets
With datasets installed, the user should then import it into their Python script / environment:
import datasets
The user can then load the CF-MS_Homo_sapiens_PPI dataset using datasets.load_dataset(...). There are two configurations, or 'views' for the set. The user can choose between them via the name… See the full description on the dataset page: https://huggingface.co/datasets/viridono/CF-MS_Homo_sapiens_PPI.