100+ datasets found
  1. Python Import Data India – Buyers & Importers List

    • seair.co.in
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Seair Exim, Python Import Data India – Buyers & Importers List [Dataset]. https://www.seair.co.in
    Explore at:
    .bin, .xml, .csv, .xlsAvailable download formats
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    India
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  2. Python Import Data in March - Seair.co.in

    • seair.co.in
    Updated Mar 30, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Seair Exim (2016). Python Import Data in March - Seair.co.in [Dataset]. https://www.seair.co.in
    Explore at:
    .bin, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Mar 30, 2016
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    Tuvalu, Cyprus, Chad, French Polynesia, Israel, Tanzania, Germany, Bermuda, Maldives, Lao People's Democratic Republic
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  3. Vezora/Tested-188k-Python-Alpaca: Functional

    • kaggle.com
    Updated Nov 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Vezora/Tested-188k-Python-Alpaca: Functional [Dataset]. https://www.kaggle.com/datasets/thedevastator/vezora-tested-188k-python-alpaca-functional-pyth
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Vezora/Tested-188k-Python-Alpaca: Functional Python Code Dataset

    188k Functional Python Code Samples

    By Vezora (From Huggingface) [source]

    About this dataset

    The Vezora/Tested-188k-Python-Alpaca dataset is a comprehensive collection of functional Python code samples, specifically designed for training and analysis purposes. With 188,000 samples, this dataset offers an extensive range of examples that cater to the research needs of Python programming enthusiasts.

    This valuable resource consists of various columns, including input, which represents the input or parameters required for executing the Python code sample. The instruction column describes the task or objective that the Python code sample aims to solve. Additionally, there is an output column that showcases the resulting output generated by running the respective Python code.

    By utilizing this dataset, researchers can effectively study and analyze real-world scenarios and applications of Python programming. Whether for educational purposes or development projects, this dataset serves as a reliable reference for individuals seeking practical examples and solutions using Python

    How to use the dataset

    The Vezora/Tested-188k-Python-Alpaca dataset is a comprehensive collection of functional Python code samples, containing 188,000 samples in total. This dataset can be a valuable resource for researchers and programmers interested in exploring various aspects of Python programming.

    Contents of the Dataset

    The dataset consists of several columns:

    • output: This column represents the expected output or result that is obtained when executing the corresponding Python code sample.
    • instruction: It provides information about the task or instruction that each Python code sample is intended to solve.
    • input: The input parameters or values required to execute each Python code sample.

    Exploring the Dataset

    To make effective use of this dataset, it is essential to understand its structure and content properly. Here are some steps you can follow:

    • Importing Data: Load the dataset into your preferred environment for data analysis using appropriate tools like pandas in Python.
    import pandas as pd
    
    # Load the dataset
    df = pd.read_csv('train.csv')
    
    • Understanding Column Names: Familiarize yourself with the column names and their meanings by referring to the provided description.
    # Display column names
    print(df.columns)
    
    • Sample Exploration: Get an initial understanding of the data structure by examining a few random samples from different columns.
    # Display random samples from 'output' column
    print(df['output'].sample(5))
    
    • Analyzing Instructions: Analyze different instructions or tasks present in the 'instruction' column to identify specific areas you are interested in studying or learning about.
    # Count unique instructions and display top ones with highest occurrences
    instruction_counts = df['instruction'].value_counts()
    print(instruction_counts.head(10))
    

    Potential Use Cases

    The Vezora/Tested-188k-Python-Alpaca dataset can be utilized in various ways:

    • Code Analysis: Analyze the code samples to understand common programming patterns and best practices.
    • Code Debugging: Use code samples with known outputs to test and debug your own Python programs.
    • Educational Purposes: Utilize the dataset as a teaching tool for Python programming classes or tutorials.
    • Machine Learning Applications: Train machine learning models to predict outputs based on given inputs.

    Remember that this dataset provides a plethora of diverse Python coding examples, allowing you to explore different

    Research Ideas

    • Code analysis: Researchers and developers can use this dataset to analyze various Python code samples and identify patterns, best practices, and common mistakes. This can help in improving code quality and optimizing performance.
    • Language understanding: Natural language processing techniques can be applied to the instruction column of this dataset to develop models that can understand and interpret natural language instructions for programming tasks.
    • Code generation: The input column of this dataset contains the required inputs for executing each Python code sample. Researchers can build models that generate Python code based on specific inputs or task requirements using the examples provided in this dataset. This can be useful in automating repetitive programming tasks o...
  4. n

    Storage and Transit Time Data and Code

    • data.niaid.nih.gov
    Updated Jun 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8136816
    Explore at:
    Dataset updated
    Jun 12, 2024
    Dataset authored and provided by
    Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. FeltonDate: 5/5/2024

    This R project contains the primary code and data (following pre-processing in python) used for data production, manipulation, visualization, and analysis and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably in this project.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available is separate .csv files for each land cover type. Oterh data can be found for the minimum, monthly, and seasonal transit time found in their respective folders. These data were produced using the python code found in the "supporting_code" folder given the ease of working with .nc and EASE grid in the xarray python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.

    Code information

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a particular function:

    01_start.R: This script loads the R packages used in the analysis, sets thedirectory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.

    02_functions.R: This script contains the custom function for this analysis, primarily to work with importing the seasonal transit data. Load this using the source() function in the 01_start.R script.

    03_generate_data.R: This script is not necessary to run and is primarilyfor documentation. The main role of this code was to import and wranglethe data needed to calculate ground-based estimates of aboveground water storage.

    04_annual_turnover_storage_import.R: This script imports the annual turnover andstorage data for each landcover type. You load in these data from the 01_start.R scriptusing the source() function.

    05_minimum_turnover_storage_import.R: This script imports the minimum turnover andstorage data for each landcover type. Minimum is defined as the lowest monthlyestimate.You load in these data from the 01_start.R scriptusing the source() function.

    06_figures_tables.R: This is the main workhouse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study that then get saved in the manuscript_figures folder. Note that allmaps were produced using Python code found in the "supporting_code"" folder.

  5. Python Import Data in January - Seair.co.in

    • seair.co.in
    Updated Jan 29, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Seair Exim (2016). Python Import Data in January - Seair.co.in [Dataset]. https://www.seair.co.in
    Explore at:
    .bin, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Jan 29, 2016
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    Iceland, Ecuador, Equatorial Guinea, Bosnia and Herzegovina, Chile, Marshall Islands, Indonesia, Congo, Bahrain, Vietnam
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  6. e

    Eximpedia Export Import Trade

    • eximpedia.app
    Updated Feb 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
    Explore at:
    .bin, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    Curaçao, Bosnia and Herzegovina, Antarctica, United States Minor Outlying Islands, Germany, Romania, Djibouti, Argentina, Eritrea, Malta
    Description

    Eximpedia Export import trade data lets you search trade data and active Exporters, Importers, Buyers, Suppliers, manufacturers exporters from over 209 countries

  7. Python Import Data in December - Seair.co.in

    • seair.co.in
    Updated Dec 31, 2015
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Seair Exim (2015). Python Import Data in December - Seair.co.in [Dataset]. https://www.seair.co.in
    Explore at:
    .bin, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Dec 31, 2015
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    Korea (Democratic People's Republic of), Guinea, Bulgaria, Mauritius, Bhutan, Nicaragua, French Guiana, Tonga, Palau, Pitcairn
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  8. S

    FLUNT simulated trapezoidal PCHE flow and heat transfer data set

    • scidb.cn
    Updated Sep 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    zhao zi yan (2024). FLUNT simulated trapezoidal PCHE flow and heat transfer data set [Dataset]. http://doi.org/10.57760/sciencedb.hjs.00106
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    Science Data Bank
    Authors
    zhao zi yan
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Build a trapezoidal PCHE two channel model using FLUNT software as a unit for processing, simulate by changing different input conditions, obtain corresponding results, export them as CSV files, use Python for data processing, remove unnecessary information columns, and combine certain information from each file to form a snapshot matrix CSV file. After processing the snapshot matrix CSV file in Python, import it into MATLAB for prediction, and finally export the MATLAB results as a result CSV file.

  9. h

    Python-DPO-Large

    • huggingface.co
    Updated Mar 15, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NextWealth Entrepreneurs Private Limited (2023). Python-DPO-Large [Dataset]. https://huggingface.co/datasets/NextWealth/Python-DPO-Large
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 15, 2023
    Dataset authored and provided by
    NextWealth Entrepreneurs Private Limited
    Description

    Dataset Card for Python-DPO

    This dataset is the larger version of Python-DPO dataset and has been created using Argilla.

      Load with datasets
    

    To load this dataset with datasets, you'll just need to install datasets as pip install datasets --upgrade and then use the following code: from datasets import load_dataset

    ds = load_dataset("NextWealth/Python-DPO")

      Data Fields
    

    Each data instance contains:

    instruction: The problem description/requirements chosen_code:… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO-Large.

  10. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Oct 29, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 10/29/2024

    This R project contains the primary code and data (following pre-processing in python) used for data production, manipulation, visualization, and analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis has updated.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    #Code information

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the
    `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and
    supporting analyses. This script generates the key figures and summary statistics
    used in the study that then get saved in the manuscript_figures folder. Note that all
    maps were produced using Python code found in the "supporting_code"" folder.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  11. Z

    polyOne Data Set - 100 million hypothetical polymers including 29 properties...

    • data.niaid.nih.gov
    Updated Mar 24, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rampi Ramprasad (2023). polyOne Data Set - 100 million hypothetical polymers including 29 properties [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7124187
    Explore at:
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    Christopher Kuenneth
    Rampi Ramprasad
    Description

    polyOne Data Set

    The data set contains 100 million hypothetical polymers each with 29 predicted properties using machine learning models. We use PSMILES strings to represent polymer structures, see here and here. The polymers are generated by decomposing previously synthesized polymers into unique chemical fragments. Random and enumerative compositions of these fragments yield 100 million hypothetical PSMILES strings. All PSMILES strings are chemically valid polymers but, mostly, have never been synthesized before. More information can be found in the paper. Please note the license agreement in the LICENSE file.

    Full data set including the properties

    The data files are in Apache Parquet format. The files start with polyOne_*.parquet.

    I recommend using dask (pip install dask) to load and process the data set. Pandas also works but is slower.

    Load sharded data set with dask python import dask.dataframe as dd ddf = dd.read_parquet("*.parquet", engine="pyarrow")

    For example, compute the description of data set ```python df_describe = ddf.describe().compute() df_describe

    
    
    PSMILES strings only
    
    
    
      
    generated_polymer_smiles_train.txt - 80 million PSMILES strings for training polyBERT. One string per line.
      
    generated_polymer_smiles_dev.txt - 20 million PSMILES strings for testing polyBERT. One string per line.
    
  12. T

    Data from: dolma

    • tensorflow.org
    Updated Mar 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). dolma [Dataset]. https://www.tensorflow.org/datasets/catalog/dolma
    Explore at:
    Dataset updated
    Mar 14, 2025
    Description

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('dolma', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more informations on tensorflow_datasets.

  13. o

    Population Distribution Workflow using Census API in Jupyter Notebook:...

    • openicpsr.org
    delimited
    Updated Jul 23, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cooper Goodman; Nathanael Rosenheim; Wayne Day; Donghwan Gu; Jayasaree Korukonda (2020). Population Distribution Workflow using Census API in Jupyter Notebook: Dynamic Map of Census Tracts in Boone County, KY, 2000 [Dataset]. http://doi.org/10.3886/E120382V1
    Explore at:
    delimitedAvailable download formats
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    Texas A&M University
    Authors
    Cooper Goodman; Nathanael Rosenheim; Wayne Day; Donghwan Gu; Jayasaree Korukonda
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2000
    Area covered
    Boone County, Kentucky
    Description

    This archive reproduces a figure titled "Figure 3.2 Boone County population distribution" from Wang and vom Hofe (2007, p.60). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the figure, and ensure reproducibility for anyone accessing this archive.The Python code was developed in Google Colaboratory, or Google Colab for short, which is an Integrated Development Environment (IDE) of JupyterLab and streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 2000 Decennial Census (Summary File 1, 100% data). Shapefiles are downloaded from the TIGER/Line FTP Server. All downloaded data are maintained in the notebook's temporary working directory while in use. The data and shapefiles are stored separately with this archive. The final map is also stored as an HTML file.The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in a PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code that performs the following functions:install/import necessary Python packagesdownload the Census Tract shapefile from the TIGER/Line FTP Serverdownload Census data via CensusAPI manipulate Census tabular data merge Census data with TIGER/Line shapefileapply a coordinate reference systemcalculate land area and population densitymap and export the map to HTMLexport the map to ESRI shapefileexport the table to CSVThe notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the TIGER/Line shapefile and Census API downloads. The notebook can be adapted for use in other environments (i.e., Jupyter Notebook) as well as reading and writing files to a local or shared drive, or cloud drive (i.e., Google Drive).

  14. H

    Hydroinformatics Instruction Module Example Code: Programmatic Data Access...

    • hydroshare.org
    • search.dataone.org
    zip
    Updated Mar 3, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amber Spackman Jones; Jeffery S. Horsburgh (2022). Hydroinformatics Instruction Module Example Code: Programmatic Data Access with USGS Data Retrieval [Dataset]. https://www.hydroshare.org/resource/a58b5d522d7f4ab08c15cd05f3fd2ad3
    Explore at:
    zip(34.5 KB)Available download formats
    Dataset updated
    Mar 3, 2022
    Dataset provided by
    HydroShare
    Authors
    Amber Spackman Jones; Jeffery S. Horsburgh
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This resource contains Jupyter Notebooks with examples for accessing USGS NWIS data via web services and performing subsequent analysis related to drought with particular focus on sites in Utah and the southwestern United States (could be modified to any USGS sites). The code uses the Python DataRetrieval package. The resource is part of set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.

    This resources consists of 6 example notebooks: 1. Example 1: Import and plot daily flow data 2. Example 2: Import and plot instantaneous flow data for multiple sites 3. Example 3: Perform analyses with USGS annual statistics data 4. Example 4: Retrieve data and find daily flow percentiles 3. Example 5: Further examination of drought year flows 6. Coding challenge: Assess drought severity

  15. T

    mimic_play

    • tensorflow.org
    Updated May 31, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). mimic_play [Dataset]. https://www.tensorflow.org/datasets/catalog/mimic_play
    Explore at:
    Dataset updated
    May 31, 2024
    Description

    Real dataset of 14 long horizon manipulation tasks. A mix of human play data and single robot arm data performing the same tasks.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('mimic_play', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more informations on tensorflow_datasets.

  16. 3xm 80 160 (RGB-D Instance Seg. for bin-picking)

    • kaggle.com
    Updated Nov 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tobia Ippolito (2024). 3xm 80 160 (RGB-D Instance Seg. for bin-picking) [Dataset]. https://www.kaggle.com/datasets/tobiaippolito/3xm-80-160/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Tobia Ippolito
    License

    https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html

    Description

    In short

    This dataset used to investigate the influence of the unique amount of 3D-Models (Shapes) and Materials (Textures) towards the shape-textures bias, performance and generalization of deep neural network instance segmentation in my bachelor exam.

    • one of nine datasets created in Unreal Engine 5 with an NVIDIA RTX A4500
    • It uses 80 unique shapes and 160 unique textures
    • RGB, depth and solution masks are available
    • 20.000 Scenes
    • Ready to use Dataloader, training and inference -> see next section

    Usage

    You can load the images like:

    import cv2
    
    image = cv2.imread(img_path)
    if image is None:
      raise FileNotFoundError(f"Error during data loading: there is no '{img_path}'")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
    if len(depth.shape) > 2:
      _, depth, _, _ = cv2.split(depth)
          
    mask = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)  # cv2.IMREAD_GRAYSCALE)
    

    For easy use I recommend to use my own code. You can directly use it to train Mask R-CNN or just use the dataloader. Both are shown now:

    First: Clone my torch github project into your project terminal cd ./path/to/your/project git clone https://github.com/xXAI-botXx/torch-mask-rcnn-instance-segmentation.git Second: Install the anaconda env (optional) terminal cd ./path/to/your/project cd ./torch-mask-rcnn-instance-segmentation conda env create -f conda_env.yml Third: You are ready to use

    Using only the dataloader for your custom project: ```python import os import numpy as np import matplotlib.pyplot as plt import cv2 from torch.utils.data import DataLoader

    import sys sys.path.append("./torch-mask-rcnn-instance-segmentation")

    from maskrcnn_toolkit import DATA_LOADING_MODE, Dual_Dir_Dataset, collate_fn, extract_and_visualize_mask

    data_mode = DATA_LOADING_MODE.ALL

    dataset = Dual_Dir_Dataset(img_dir="/path/to/rgb-folder", depth_dir="/path/to/depth-folder", mask_dir="/path/to/mask-folder", transform=None, amount=1, start_idx=0, end_idx=0, image_name="...", data_mode=data_mode, use_mask=True, use_depth=False, log_path="./logs", width=1920, height=1080, should_log=True, should_print=True, should_verify=False) data_loader = DataLoader(dataset, batch_size=5, shuffle=True, num_workers=4, collate_fn=collate_fn)

    plot

    for data in data_loader: for batch_idx in range(len(data[0])): if len(data) == 3: image = data[0][batch_idx].cpu().unsqueeze(0) masks = data[1][batch_idx]["masks"] masks = masks.cpu() name = data[2][batch_idx] else: image = data[0][batch_idx].cpu().unsqueeze(0) name = data[1][batch_idx]

      image = image.cpu().numpy().squeeze(0)
      image = np.transpose(image, (1, 2, 0)) # Convert to HWC
    
      # Remove 4.th channel if existing
      if image.shape[2] == 4:
        depth = image[:, :, 3]
        image = image[:, :, :3]
      else:
        depth = None
    
      masks_gt = masks.cpu().numpy()
      masks_gt = np.transpose(masks_gt, (1, 2, 0))
      mask = extract_and_visualize_mask(masks_gt, image=None, ax=None, visualize=False, color_map=None, soft_join=False)
    
      # plot
      cols = 1
      if depth is not None:
        cols += 1
      if mask is not None:
        cols += 1
    
      fig, ax = plt.subplots(nrows=1, ncols=cols, figsize=(20, 15*cols))
      fig.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.05, hspace=0.05)
    
      plot_idx = 0
      ax[plot_idx].imshow(image)
      ax[plot_idx].set_title("RGB Input Image")
      ax[plot_idx].axis("off")
    
      if depth is not None:
        plot_idx += 1
        ax[plot_idx].imshow(depth, cmap="gray")
        ax[plot_idx].set_title("Depth Input Image")
        ax[plot_idx].axis("off")
    
      if mask is not None:
        plot_idx += 1
        ax[plot_idx].imshow(mask)
        ax[plot_idx].set_title("Mask Ground Truth")
        ax[plot_idx].axis("off")
    
      plt.show()
    
    
    **Using the whole Mask R-CNN training pipeline:**
    ```python
    import sys
    sys.path.append("./torch-mask-rcnn-instance-segmentation")
    
    from maskrcnn_toolkit import DATA_LOADING_MODE, train
    
    
    # set the vars as you need
    
    WEIGHTS_PATH = None   # Path to the model weights file
    USE_DEPTH = False      # Whether to include depth information -> as rgb and depth on green channel
    VERIFY_DATA = False     # True is recommended
    
    GROUND_PATH = "D:/3xM"  
    DATASET_NAME = "3xM_Dataset_80_160"
    IMG_DIR = os.path.join(G...
    
  17. R

    Custom Yolov7 On Kaggle On Custom Dataset

    • universe.roboflow.com
    zip
    Updated Jan 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Owais Ahmad (2023). Custom Yolov7 On Kaggle On Custom Dataset [Dataset]. https://universe.roboflow.com/owais-ahmad/custom-yolov7-on-kaggle-on-custom-dataset-rakiq/dataset/2
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 29, 2023
    Dataset authored and provided by
    Owais Ahmad
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person Car Bounding Boxes
    Description

    Custom Training with YOLOv7 🔥

    Some Important links

    Contact Information

    Objective

    To Showcase custom Object Detection on the Given Dataset to train and Infer the Model using newly launched YoloV7.

    Data Acquisition

    The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.

    from IPython.display import Markdown, display
    
    display(Markdown("../input/Car-Person-v2-Roboflow/README.roboflow.txt"))
    

    Custom Training with YOLOv7 🔥

    In this Notebook, I have processed the images with RoboFlow because in COCO formatted dataset was having different dimensions of image and Also data set was not splitted into different Format. To train a custom YOLOv7 model we need to recognize the objects in the dataset. To do so I have taken the following steps:

    • Export the dataset to YOLOv7
    • Train YOLOv7 to recognize the objects in our dataset
    • Evaluate our YOLOv7 model's performance
    • Run test inference to view performance of YOLOv7 model at work

    📦 YOLOv7

    https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/car-person-2.PNG" width=800>

    Image Credit - jinfagang

    Step 1: Install Requirements

    !git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
    %cd yolov7
    !pip install -qr requirements.txt
    !pip install -q roboflow
    

    Downloading YOLOV7 starting checkpoint

    !wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
    
    import os
    import glob
    import wandb
    import torch
    from roboflow import Roboflow
    from kaggle_secrets import UserSecretsClient
    from IPython.display import Image, clear_output, display # to display images
    
    
    
    print(f"Setup complete. Using torch {torch._version_} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
    

    https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67">

    I will be integrating W&B for visualizations and logging artifacts and comparisons of different models!

    YOLOv7-Car-Person-Custom

    try:
      user_secrets = UserSecretsClient()
      wandb_api_key = user_secrets.get_secret("wandb_api")
      wandb.login(key=wandb_api_key)
      anonymous = None
    except:
      wandb.login(anonymous='must')
      print('To use your W&B account,
    Go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as WANDB. 
    Get your W&B access token from here: https://wandb.ai/authorize')
      
      
      
    wandb.init(project="YOLOvR",name=f"7. YOLOv7-Car-Person-Custom-Run-7")
    

    Step 2: Assemble Our Dataset

    https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png" alt="">

    In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.

    In Roboflow, We can choose between two paths:

    Version v2 Aug 12, 2022 Looks like this.

    https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG" alt="">

    user_secrets = UserSecretsClient()
    roboflow_api_key = user_secrets.get_secret("roboflow_api")
    
    rf = Roboflow(api_key=roboflow_api_key)
    project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
    dataset = project.version(2).download("yolov7")
    

    Step 3: Training Custom pretrained YOLOv7 model

    Here, I am able to pass a number of arguments: - img: define input image size - batch: determine

  18. Python Import Data in August - Seair.co.in

    • seair.co.in
    Updated Aug 20, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Seair Exim (2016). Python Import Data in August - Seair.co.in [Dataset]. https://www.seair.co.in
    Explore at:
    .bin, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Aug 20, 2016
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    Belgium, Lebanon, Virgin Islands (U.S.), Gambia, Saint Pierre and Miquelon, Falkland Islands (Malvinas), Nepal, Christmas Island, South Africa, Ecuador
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  19. EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution) compressed for...

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ben Mather; Ben Mather (2020). EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution) compressed for NumPy [Dataset]. http://doi.org/10.5281/zenodo.3245551
    Explore at:
    binAvailable download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Ben Mather; Ben Mather
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Earth
    Description

    A compressed NumPy version of the EMAG2 (v3) global Earth Magnetic anomaly grid compiled from satellite, ship, and airborne magnetic measurements. The original CSV data was imported, transformed, and saved to a compressed NumPy archive as follows:

    import numpy as np
    
    mag_data = np.loadtxt('EMAG2_V3_20170530.csv', delimiter=',', usecols=(2,3,4,5,7))
    lon_mask = mag_data[:,0] > 180.0
    mag_data[lon_mask,0] -= 360.0
    np.savez_compressed('EMAG2_V3_20170530.npz', data=mag_data.astype(np.float32))

    The NumPy archive (contained in this repository) can be efficiently loaded in Python workflows. It contains the following columns:

    1. Longitude - geographic longitudinal coordinates in decimal degrees (WGS84)
    2. Latitude - geographic latitudinal coordinates in decimal degrees (WGS84)
    3. SeaLevel - magnetic anomaly value at sea level (nT)
    4. UpCont - magnetic anomaly value at continuous 4km altitude (nT)
    5. Error - Error estimate (nT)

    Code 888 is assigned in certain cells on grid edges where the data source is ambiguous and assigned an error of -888 nT.
    Code 999 is assigned in cells where no data is reported with the anomaly value assigned 99999 nT and an error of -999 nT.

    Reference

    Brian Meyer, Richard Saltus, Arnaud Chulliat (2017): EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution) Version 3. National Centers for Environmental Information, NOAA. Model. doi:10.7289/V5H70CVX

  20. T

    Data from: dices

    • tensorflow.org
    Updated Sep 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). dices [Dataset]. https://www.tensorflow.org/datasets/catalog/dices
    Explore at:
    Dataset updated
    Sep 3, 2024
    Description

    The Diversity in Conversational AI Evaluation for Safety (DICES) dataset

    Machine learning approaches are often trained and evaluated with datasets that require a clear separation between positive and negative examples. This approach overly simplifies the natural subjectivity present in many tasks and content items. It also obscures the inherent diversity in human perceptions and opinions. Often tasks that attempt to preserve the variance in content and diversity in humans are quite expensive and laborious. To fill in this gap and facilitate more in-depth model performance analyses we propose the DICES dataset - a unique dataset with diverse perspectives on safety of AI generated conversations. We focus on the task of safety evaluation of conversational AI systems. The DICES dataset contains detailed demographics information about each rater, extremely high replication of unique ratings per conversation to ensure statistical significance of further analyses and encodes rater votes as distributions across different demographics to allow for in-depth explorations of different rating aggregation strategies.

    This dataset is well suited to observe and measure variance, ambiguity and diversity in the context of safety of conversational AI. The dataset is accompanied by a paper describing a set of metrics that show how rater diversity influences the safety perception of raters from different geographic regions, ethnicity groups, age groups and genders. The goal of the DICES dataset is to be used as a shared benchmark for safety evaluation of conversational AI systems.

    CONTENT WARNING: This dataset contains adversarial examples of conversations that may be offensive.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('dices', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more informations on tensorflow_datasets.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Seair Exim, Python Import Data India – Buyers & Importers List [Dataset]. https://www.seair.co.in
Organization logo

Python Import Data India – Buyers & Importers List

Seair Exim Solutions

Seair Info Solutions PVT LTD

Explore at:
21 scholarly articles cite this dataset (View in Google Scholar)
.bin, .xml, .csv, .xlsAvailable download formats
Dataset provided by
Seair Exim Solutions
Authors
Seair Exim
Area covered
India
Description

Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

Search
Clear search
Close search
Google apps
Main menu