100+ datasets found
  1. Python Import Data India – Buyers & Importers List

    • seair.co.in
    Cite
    Seair Exim, Python Import Data India – Buyers & Importers List [Dataset]. https://www.seair.co.in
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Seair Info Solutions PVT LTD
    Authors
    Seair Exim
    Area covered
    India
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  2. MNIST - HDF5

    • kaggle.com
    zip
    Updated Feb 28, 2020
    Cite
    Benedict Wilkins (2020). MNIST - HDF5 [Dataset]. https://www.kaggle.com/benedictwilkinsai/mnist-hd5f
    Explore at:
    Available download formats: zip (12272528 bytes)
    Dataset updated
    Feb 28, 2020
    Authors
    Benedict Wilkins
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The MNIST dataset in HDF5 format.

    Data can be loaded with the h5py package (pip install h5py); see the demo.
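    A minimal, hedged sketch of that loading step (not the bundled demo; the file name and internal dataset paths are assumptions):

    ```python
    # Minimal sketch, not the author's demo: inspect the HDF5 file with h5py.
    # "mnist.h5" and any dataset paths are assumptions -- list the keys to see what the file actually contains.
    import h5py

    with h5py.File("mnist.h5", "r") as f:
        f.visit(print)                     # print every group/dataset path in the file
        # once you know a dataset's path, read it into a NumPy array, e.g.:
        # images = f["train/images"][...]  # hypothetical path
    ```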

  3. IoT Sports Training Load Dataset

    • kaggle.com
    zip
    Updated Oct 9, 2025
    Cite
    Python Developer (2025). IoT Sports Training Load Dataset [Dataset]. https://www.kaggle.com/datasets/programmer3/iot-sports-training-load-dataset
    Explore at:
    Available download formats: zip (226978 bytes)
    Dataset updated
    Oct 9, 2025
    Authors
    Python Developer
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains 2300 multimodal IoT sensor recordings collected from athletes during traditional sports training sessions, including basketball, soccer, running, and other athletic activities. The dataset includes heart rate, acceleration (X, Y, Z), gyroscope readings (X, Y, Z), speed, step count, jump height, and training load. It is designed to facilitate analysis of athlete performance, training load monitoring, and predictive modeling for sports science applications.
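    A hedged sketch of a first look at the recordings with pandas (the CSV file name is hypothetical; use whatever file the Kaggle archive actually contains):

    ```python
    import pandas as pd

    # Hypothetical file name -- replace with the CSV shipped in the Kaggle zip.
    df = pd.read_csv("iot_sports_training_load.csv")
    print(df.shape)              # around 2300 rows according to the description
    print(df.columns.tolist())   # heart rate, acceleration/gyroscope axes, speed, steps, jump height, training load
    print(df.describe())
    ```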

  4. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Oct 29, 2024
    + more versions
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 10/29/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has evolved.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    Code information:

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the
    `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and
    supporting analyses. This script generates the key figures and summary statistics
    used in the study that then get saved in the manuscript_figures folder. Note that all
    maps were produced using Python code found in the "supporting_code"" folder.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  5. Turkish_Basketball_Super_League_Dataset

    • huggingface.co
    Updated Aug 10, 2025
    Cite
    Muhammed Onur ulu (2025). Turkish_Basketball_Super_League_Dataset [Dataset]. https://huggingface.co/datasets/onurulu17/Turkish_Basketball_Super_League_Dataset
    Explore at:
    Dataset updated
    Aug 10, 2025
    Authors
    Muhammed Onur ulu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    📥 Load Dataset in Python

    To load this dataset in Google Colab or any Python environment:
    !pip install huggingface_hub pandas openpyxl

    from huggingface_hub import hf_hub_download
    import pandas as pd

    repo_id = "onurulu17/Turkish_Basketball_Super_League_Dataset"

    files = ["leaderboard.xlsx", "player_data.xlsx", "team_data.xlsx",
             "team_matches.xlsx", "player_statistics.xlsx", "technic_roster.xlsx"]

    datasets = {}

    for f in files:
        path = …

    See the full description on the dataset page: https://huggingface.co/datasets/onurulu17/Turkish_Basketball_Super_League_Dataset.
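    A hedged guess at how the truncated loop likely continues (not the author's exact code): hf_hub_download fetches each Excel file from the dataset repo and pandas/openpyxl reads it.

    ```python
    from huggingface_hub import hf_hub_download
    import pandas as pd

    repo_id = "onurulu17/Turkish_Basketball_Super_League_Dataset"
    files = ["leaderboard.xlsx", "player_data.xlsx", "team_data.xlsx",
             "team_matches.xlsx", "player_statistics.xlsx", "technic_roster.xlsx"]

    datasets = {}
    for f in files:
        # repo_type="dataset" is needed because this is a dataset repo, not a model repo
        path = hf_hub_download(repo_id=repo_id, filename=f, repo_type="dataset")
        datasets[f] = pd.read_excel(path)

    print(datasets["leaderboard.xlsx"].head())
    ```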

  6. Python-DPO

    • huggingface.co
    Updated Jul 18, 2024
    + more versions
    Cite
    NextWealth Entrepreneurs Private Limited (2024). Python-DPO [Dataset]. https://huggingface.co/datasets/NextWealth/Python-DPO
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 18, 2024
    Dataset authored and provided by
    NextWealth Entrepreneurs Private Limited
    Description

    Dataset Card for Python-DPO

    This dataset is the smaller version of the Python-DPO-Large dataset and has been created using Argilla.

      Load with datasets
    

    To load this dataset with datasets, you'll just need to install datasets as pip install datasets --upgrade and then use the following code:

    from datasets import load_dataset

    ds = load_dataset("NextWealth/Python-DPO")

      Data Fields
    

    Each data instance contains:

    instruction: The problem description/requirements… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO.

  7. Linux Terminal Commands Dataset

    • kaggle.com
    zip
    Updated May 21, 2025
    + more versions
    Cite
    SUNNY THAKUR (2025). Linux Terminal Commands Dataset [Dataset]. https://www.kaggle.com/datasets/cyberprince/linux-terminal-commands-dataset
    Explore at:
    Available download formats: zip (32599 bytes)
    Dataset updated
    May 21, 2025
    Authors
    SUNNY THAKUR
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Linux Terminal Commands Dataset

    Overview

    The Linux Terminal Commands Dataset is a comprehensive collection of 600 unique Linux terminal commands (cmd-001 to cmd-600), curated for cybersecurity professionals, system administrators, data scientists, and machine learning engineers. This dataset is designed to support advanced use cases such as penetration testing, system administration, forensic analysis, and training machine learning models for command-line automation and anomaly detection. The commands span 10 categories: Navigation, File Management, Viewing, System Info, Permissions, Package Management, Networking, User Management, Process, and Editor. Each entry includes a command, its category, a description, an example output, and a reference to the relevant manual page, ensuring usability for both human users and automated systems.

    Key Features

    • Uniqueness: 600 distinct commands with no overlap, covering basic to unconventional tools.
    • Sophistication: Includes advanced commands for SELinux, eBPF tracing, network forensics, and filesystem debugging.
    • Unconventional Tools: Features obscure utilities like bpftrace, tcpflow, zstd, and aa-status for red teaming and system tinkering.
    • ML-Ready: Structured in JSON Lines (.jsonl) format for easy parsing and integration into machine learning pipelines.
    • Professional Focus: Tailored for cybersecurity (e.g., auditing, hardening), system administration (e.g., performance tuning), and data science (e.g., log analysis).

    Dataset Structure The dataset is stored in a JSON Lines file (linux_terminal_commands_dataset.jsonl), where each line represents a single command with the following fields:

    • id: Unique identifier (e.g., cmd-001 to cmd-600).
    • command: The Linux terminal command (e.g., setfacl -m u:user:rw file.txt).
    • category: One of 10 categories (e.g., Permissions, Networking).
    • description: A concise explanation of the command's purpose and functionality.
    • example_output: Sample output or expected behavior (e.g., [No output if successful]).
    • man_reference: URL to the official manual page (e.g., https://man7.org/linux/man-pages/...).

    Category Distribution

    • Navigation: 11
    • File Management: 56
    • Viewing: 35
    • System Info: 51
    • Permissions: 28
    • Package Management: 12
    • Networking: 56
    • User Management: 19
    • Process: 42
    • Editor: 10

    Usage Prerequisites

    • Python 3.6+: For parsing and analyzing the dataset.
    • Linux Environment: Most commands require a Linux system (e.g., Ubuntu, CentOS, Fedora) for execution.
    • Optional Tools: Install tools like pandas for data analysis or jq for JSON processing.

    Loading the Dataset

    Use Python to load and explore the dataset:

    ```python
    import json
    import pandas as pd

    # Load dataset
    dataset = []
    with open("linux_terminal_commands_dataset.jsonl", "r") as file:
        for line in file:
            dataset.append(json.loads(line))

    # Convert to DataFrame
    df = pd.DataFrame(dataset)

    # Example: View category distribution
    print(df.groupby("category").size())

    # Example: Filter Networking commands
    networking_cmds = df[df["category"] == "Networking"]
    print(networking_cmds[["id", "command", "description"]])
    ```

    Example Applications

    • Cybersecurity: Use bpftrace or tcpdump commands for real-time system and network monitoring. Audit permissions with setfacl, chcon, or aa-status for system hardening.
    • System Administration: Monitor performance with slabtop, pidstat, or systemd-analyze. Manage filesystems with btrfs, xfs_repair, or cryptsetup.
    • Machine Learning: Train NLP models to predict command categories or generate command sequences. Use example outputs for anomaly detection in system logs.
    • Pentesting: Leverage nping, tcpflow, or ngrep for network reconnaissance. Explore find / -perm /u+s to identify potential privilege escalation vectors.

    Executing Commands

    Warning: Some commands (e.g., mkfs.btrfs, fuser -k, cryptsetup) can modify or destroy data. Always test in a sandboxed environment. To execute a command:

    # Example: List SELinux file contexts
    semanage fcontext -l

    Installation

    1. Clone the repository:
       git clone https://github.com/sunnythakur25/linux-terminal-commands-dataset.git
       cd linux-terminal-commands-dataset
    2. Ensure the dataset file (linux_terminal_commands_dataset.jsonl) is in the project directory.
    3. Install dependencies for analysis (optional):
       pip install pandas

    Contribution Guidelines

    We welcome contributions to expand the dataset or improve its documentation. To contribute:

    • Fork the Repository: Create a fork on GitHub.
    • Add Commands: Ensure new commands are unique, unconventional, and include all required fields (id, command, category, etc.).
    • Test Commands: Verify commands work on a Linux system and provide accurate example outputs.
    • Submit a Pull Request: Include a clear description of your changes and their purpose.
    • Follow Standards: Use JSON Lines format. Reference man7.org for manual pages. Categorize c...

  8. Data from: Duck Hunt

    • kaggle.com
    zip
    Updated Jul 26, 2025
    Cite
    Hugo Zanini (2025). Duck Hunt [Dataset]. https://www.kaggle.com/datasets/hugozanini1/duck-hunt
    Explore at:
    Available download formats: zip (7379197 bytes)
    Dataset updated
    Jul 26, 2025
    Authors
    Hugo Zanini
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Duck Hunt Object Detection Dataset

    This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.

    Perfect for: - Object detection model training - Computer vision research - Retro gaming AI projects - YOLO algorithm benchmarking - Educational purposes

    🎯 Dataset Statistics

    • Total Images: 1,004
    • Dataset Size: 12 MB
    • Image Format: PNG
    • Annotation Format: YOLO (.txt)
    • Classes: 4
    • Train/Val Split: 711/260 (73%/27%)

    Class Distribution

    • 0 (dog), 252 images: The hunting dog in various poses (jumping, laughing, sniffing, etc.)
    • 1 (duck_dead), 256 images: Dead ducks (both black and red variants)
    • 2 (duck_shot), 248 images: Ducks in the moment of being shot
    • 3 (duck_flying), 248 images: Flying ducks in all directions (left, right, diagonal)

    📁 Dataset Structure

    yolo_dataset_augmented/
    ├── images/
    │  ├── train/      # 711 training images
    │  └── val/       # 260 validation images
    ├── labels/
    │  ├── train/      # 711 YOLO annotation files
    │  └── val/       # 260 YOLO annotation files
    ├── classes.txt     # Class names mapping
    ├── dataset.yaml     # YOLO configuration file
    └── augmented_dataset_stats.json # Detailed statistics
    

    🔧 Data Augmentation Details

    The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:

    Augmentation Techniques Applied:

    • Geometric Transformations: Rotation (±15°), horizontal/vertical flipping, scaling (0.8-1.2x), translation
    • Color Adjustments: Brightness (0.7-1.3x), contrast (0.8-1.2x), saturation (0.8-1.2x)
    • Quality Variations: Gaussian noise, slight blur for robustness
    • Advanced Techniques: Mosaic augmentation (YOLO-style 4-image combination)

    Augmentation Parameters:

    {
      'rotation_range': (-15, 15),    # Small rotations for game sprites
      'brightness_range': (0.7, 1.3),  # Brightness variations
      'contrast_range': (0.8, 1.2),   # Contrast adjustments
      'saturation_range': (0.8, 1.2),  # Color saturation
      'noise_intensity': 0.02,      # Gaussian noise
      'horizontal_flip_prob': 0.5,    # 50% chance horizontal flip
      'scaling_range': (0.8, 1.2),    # Scale variations
    }
    

    🚀 Usage Examples

    Loading with YOLOv8 (Ultralytics)

    from ultralytics import YOLO
    
    # Load and train
    model = YOLO('yolov8n.pt') # Load pretrained model
    results = model.train(data='dataset.yaml', epochs=100, imgsz=640)
    
    # Validate
    metrics = model.val()
    
    # Predict
    results = model('path/to/test/image.png')
    

    Loading with PyTorch

    import torch
    from torch.utils.data import Dataset, DataLoader
    from PIL import Image
    import os
    
    class DuckHuntDataset(Dataset):
        def __init__(self, images_dir, labels_dir, transform=None):
            self.images_dir = images_dir
            self.labels_dir = labels_dir
            self.transform = transform
            self.images = os.listdir(images_dir)

        def __len__(self):
            return len(self.images)

        def __getitem__(self, idx):
            img_path = os.path.join(self.images_dir, self.images[idx])
            label_path = os.path.join(self.labels_dir,
                                      self.images[idx].replace('.png', '.txt'))

            image = Image.open(img_path)
            # Load YOLO annotations
            with open(label_path, 'r') as f:
                labels = f.readlines()

            if self.transform:
                image = self.transform(image)

            return image, labels
    
    # Usage
    dataset = DuckHuntDataset('images/train', 'labels/train')
    dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
    

    YOLO Annotation Format

    Each .txt file contains one line per object: class_id center_x center_y width height

    Example annotation: 0 0.492 0.403 0.212 0.315, where all values are normalized (0-1) relative to the image dimensions.
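    A small sketch of parsing that annotation format and converting the normalized boxes to pixel coordinates (the label path and image size below are placeholders, not files from this dataset):

    ```python
    def load_yolo_boxes(label_path, img_w, img_h):
        """Read one YOLO .txt label file and return (class_id, x1, y1, x2, y2) pixel boxes."""
        boxes = []
        with open(label_path) as f:
            for line in f:
                cls, cx, cy, w, h = (float(v) for v in line.split())
                x1 = (cx - w / 2) * img_w   # left edge in pixels
                y1 = (cy - h / 2) * img_h   # top edge in pixels
                boxes.append((int(cls), x1, y1, x1 + w * img_w, y1 + h * img_h))
        return boxes

    print(load_yolo_boxes("labels/train/example.txt", img_w=256, img_h=240))  # placeholder path and size
    ```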

    📊 Technical Specifications

    • Image Dimensions: Variable (original sprite sizes preserved)
    • Color Channels: RGB (3 channels)
    • Annotation Precision: Float32 (normalized coordinates)
    • File Naming: Descriptive names indicating class and augmentation type
    • Quality: High-resolution pixel art sprites

    🎮 Dataset Context

    This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:

    • The Dog: Your hunting companion who retrieves ducks and ...
  9. Caltech-101

    • datasets.activeloop.ai
    • huggingface.co
    deeplake
    Updated Feb 3, 2022
    Cite
    Caltech (2022). Caltech-101 [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/caltech-101-dataset/
    Explore at:
    Available download formats: deeplake
    Dataset updated
    Feb 3, 2022
    Dataset authored and provided by
    Caltech
    License

    Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
    License information was derived automatically

    Dataset funded by
    National Science Foundation
    Description

    The Caltech-101 dataset is a dataset of 101 categories of objects, each with about 30 to 800 images. The images vary in size (roughly 300 x 200 pixels) and are mostly in colour. The dataset is used to train and evaluate machine learning models for the task of object recognition.
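    Since the download format is deeplake, a hedged loading sketch (the hub path is an assumption based on the usual Activeloop naming; check the dataset page for the exact path):

    ```python
    import deeplake

    # Assumed hub path -- verify it on the dataset page before relying on it.
    ds = deeplake.load("hub://activeloop/caltech101")
    print(ds.tensors)   # typically an images tensor and a labels tensor
    ```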

  10. spoc_robot

    • tensorflow.org
    Updated Dec 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). spoc_robot [Dataset]. https://www.tensorflow.org/datasets/catalog/spoc_robot
    Explore at:
    Dataset updated
    Dec 11, 2024
    Description

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('spoc_robot', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  11. Auditory cortex single unit population activity during natural sound...

    • data.niaid.nih.gov
    Updated Jun 15, 2023
    Cite
    Pennington, Jacob; David, Stephen (2023). Auditory cortex single unit population activity during natural sound presentation -- dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7796573
    Explore at:
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Oregon Health & Science University
    Washington State University, Vancouver
    Authors
    Pennington, Jacob; David, Stephen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    High-density multi-channel neurophysiology data were collected from primary (A1) and secondary (PEG) fields of auditory cortex of passively listening ferrets during presentation of a large natural sound library. Single unit spikes were sorted using Kilosort. This dataset includes spike times for 849 A1 units and 398 PEG units. Stimulus waveforms were transformed to log-spaced spectrograms for analysis (18 channels, 10 ms time bins). Data set includes raw sound waveforms as well.

    The authors request that any publication using this data cite the following work: https://www.biorxiv.org/content/10.1101/2022.06.10.495698v2

    Data format/description

    Neural data are stored in two files. All recordings were performed during presentation of the same natural sound library.

    recordings/A1_NAT4_ozgf.fs100.ch18.tgz - data from 849 A1 single units and log spectrogram of stimuli aligned with spike times.

    recordings/PEG_NAT4_ozgf.fs100.ch18.tgz - data from 398 PEG single units and log spectrogram of stimuli aligned with spike times.

    wav.zip - raw wav files. Note: only the first second of each wav file was presented during the experiments; the recordings themselves are longer.
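    Since only the first second of each wav was presented, a hedged sketch of trimming a stimulus waveform accordingly (the file name is hypothetical):

    ```python
    from scipy.io import wavfile

    # "stim_001.wav" is a hypothetical name for a file extracted from wav.zip.
    fs, waveform = wavfile.read("stim_001.wav")
    first_second = waveform[:fs]   # fs samples = the 1 s actually presented
    print(fs, waveform.shape, first_second.shape)
    ```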

    Example scripts

    Python scripts included with this dataset demonstrate how to load the neural data and perform a CNN model fit. Running the scripts requires the NEMS0 python library, which is available open source at https://github.com/lbhb/NEMS0.

    Quick install

    Create and activate a new conda environment:

    conda create -n NEMS0 python=3.7
    conda activate NEMS0

    Download NEMS0:

    git clone https://github.com/lbhb/NEMS0

    Install NEMS0:

    pip install -e NEMS0

    Detailed instructions for installing NEMS0 are available in the Github repository (https://github.com/lbhb/NEMS0).

    Demo scripts

    Once NEMS0 is installed and the data are downloaded, move to the directory where the data and demo scripts are stored and run them in a NEMS0 environment.

    pop_cnn_load.py - Load the A1 data and compare predictions for two neurons (Fig 3) by two population models (stage 1 fit complete). Illustrates how to load the data using Python.

    pop_cnn_fit.py - Load a pre-fit A1 population model (stage 1) and complete stage 2 fit (refinement) for a single neuron. Illustrates use of NEMS0 for CNN model fitting.

    Funding

    Data collection, software development and processing were supported by funding from the NIH (R01DC014950, R01EB028155).

  12. Open data: Frequency mismatch negativity and visual load

    • su.figshare.com
    • researchdata.se
    • +1more
    pdf
    Updated Feb 23, 2021
    Cite
    Stefan Wiens; Erik van Berlekom; Malina Szychowska; Rasmus Eklund (2021). Open data: Frequency mismatch negativity and visual load [Dataset]. http://doi.org/10.17045/sthlmuni.7016369.v2
    Explore at:
    Available download formats: pdf
    Dataset updated
    Feb 23, 2021
    Dataset provided by
    Stockholm University
    Authors
    Stefan Wiens; Erik van Berlekom; Malina Szychowska; Rasmus Eklund
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wiens, S., van Berlekom, E., Szychowska, M., & Eklund, R. (2019). Visual Perceptual Load Does Not Affect the Frequency Mismatch Negativity. Frontiers in Psychology, 10(1970). doi:10.3389/fpsyg.2019.01970

    We manipulated visual perceptual load (high and low load) while we recorded electroencephalography. Event-related potentials (ERPs) were computed from these data.

    • OSF_*.pdf contains the preregistration at the Open Science Framework (OSF): https://doi.org/10.17605/OSF.IO/EWG9X
    • ERP_2019_rawdata_bdf.zip contains the raw EEG data files that were recorded with a BioSemi system (www.biosemi.com). The files can be opened in MATLAB with the FieldTrip toolbox: https://www.mathworks.com/products/matlab.html, http://www.fieldtriptoolbox.org/
    • ERP_2019_visual_load_fieldtrip_scripts.zip contains all the MATLAB scripts that were used to process the ERP data with the FieldTrip toolbox: http://www.fieldtriptoolbox.org/
    • ERP_2019_fieldtrip_mat_*.zip contain the final, preprocessed individual data files. They can be opened with MATLAB.
    • ERP_2019_visual_load_python_scripts.zip contains the Python scripts for the main task. They need Python (https://www.python.org/) and PsychoPy (http://www.psychopy.org/).
    • ERP_2019_visual_load_wmc_R_scripts.zip contains the R scripts to process the working memory capacity (wmc) data: https://www.r-project.org/
    • ERP_2019_visual_load_R_scripts.zip contains the R scripts to analyze the data and the output files with figures (e.g., scatterplots): https://www.r-project.org/

  13. 3D skeletons UP-Fall Dataset

    • data.niaid.nih.gov
    Updated Jul 20, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KOFFI, Tresor (2024). 3D skeletons UP-Fall Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12773012
    Explore at:
    Dataset updated
    Jul 20, 2024
    Dataset provided by
    CESI LINEACT
    Authors
    KOFFI, Tresor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    3D skeletons UP-Fall Dataset

    Difference between fall detection and impact detection
    

    Overview

    This dataset aims to facilitate research in fall detection, particularly focusing on the precise detection of impact moments within fall events. The accuracy and comprehensiveness of the 3D skeleton data make it a valuable resource for developing and benchmarking fall detection algorithms. The dataset contains 3D skeletal data extracted from fall events and daily activities of 5 subjects performing fall scenarios.

    Data Collection

    The skeletal data was extracted using a pose estimation algorithm, which processes image frames to determine the 3D coordinates of each joint. Sequences with fewer than 100 frames of extracted data were excluded to ensure the quality and reliability of the dataset. As a result, some subjects may have fewer CSV files.

    CSV Structure

    The data is organized by subjects, and each subject contains CSV files named according to the pattern C1S1A1T1, where:

    C: Camera (1 or 2)

    S: Subject (1 to 5)

    A: Activity (1 to N, representing different activities)

    T: Trial (1 to 3)

    subject1/: Contains CSV files for Subject 1.

    C1S1A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 1

    C1S1A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 1

    C1S1A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 1

    C2S1A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 1

    C2S1A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 1

    C2S1A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 1

    subject2/: Contains CSV files for Subject 2.

    C1S2A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 2

    C1S2A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 2

    C1S2A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 2

    C2S2A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 2

    C2S2A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 2

    C2S2A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 2

    subject3/, subject4/, subject5/: Similar structure as above, but may contain fewer CSV files due to the data extraction criteria mentioned above.

    Column Descriptions

    Each CSV file contains the following columns representing different skeletal joints and their respective coordinates in 3D space:

    • joint_1_x: X coordinate of joint 1
    • joint_1_y: Y coordinate of joint 1
    • joint_1_z: Z coordinate of joint 1
    • joint_2_x: X coordinate of joint 2
    • joint_2_y: Y coordinate of joint 2
    • joint_2_z: Z coordinate of joint 2
    • ...
    • joint_n_x: X coordinate of joint n
    • joint_n_y: Y coordinate of joint n
    • joint_n_z: Z coordinate of joint n
    • LABEL: Label indicating impact (1) or non-impact (0)

    Example

    Here is an example of what a row in one of the CSV files might look like:

    joint_1_x | joint_1_y | joint_1_z | joint_2_x | joint_2_y | joint_2_z | ... | joint_n_x | joint_n_y | joint_n_z | LABEL
    0.123     | 0.456     | 0.789     | 0.234     | 0.567     | 0.890     | ... | 0.345     | 0.678     | 0.901     | 0

    Usage

    This data can be used for developing and benchmarking impact fall detection algorithms. It provides detailed information on human posture and movement during falls, making it suitable for machine learning and deep learning applications in impact fall detection and prevention.

    Using GitHub

    1. Clone the repository:

       git clone https://github.com/Tresor-Koffi/3D_skeletons-UP-Fall-Dataset

    2. Navigate to the directory:

       cd 3D_skeletons-UP-Fall-Dataset

    Examples

    Here's a simple example of how to load and inspect a sample data file using Python:

    ```python
    import pandas as pd

    # Load a sample data file for Subject 1, Camera 1, Activity 1, Trial 1
    data = pd.read_csv('subject1/C1S1A1T1.csv')
    print(data.head())
    ```
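    A short, hedged sketch building on the example above: parse the C(camera)S(subject)A(activity)T(trial) naming pattern described earlier and summarise the LABEL column of one file.

    ```python
    import re
    import pandas as pd

    def parse_name(filename):
        """Parse e.g. 'C1S1A1T1.csv' into its camera/subject/activity/trial numbers."""
        m = re.match(r"C(\d+)S(\d+)A(\d+)T(\d+)\.csv$", filename)
        return dict(zip(["camera", "subject", "activity", "trial"], map(int, m.groups())))

    print(parse_name("C1S1A1T1.csv"))     # {'camera': 1, 'subject': 1, 'activity': 1, 'trial': 1}

    data = pd.read_csv("subject1/C1S1A1T1.csv")
    print(data["LABEL"].value_counts())   # impact (1) vs non-impact (0) frames
    ```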

  14. Three Annotated Anomaly Detection Datasets for Line-Scan Algorithms

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 29, 2024
    Cite
    Garske, Samuel; Mao, Yiwei (2024). Three Annotated Anomaly Detection Datasets for Line-Scan Algorithms [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13370799
    Explore at:
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    University of Sydney
    Authors
    Garske, Samuel; Mao, Yiwei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset contains two hyperspectral anomaly detection images and one multispectral anomaly detection image, together with their corresponding binary pixel masks. They were initially used for real-time anomaly detection in line-scanning, but they can be used for any anomaly detection task.

    They are in .npy file format (tiff or geotiff variants will be added in the future), with the image datasets ordered as (height, width, channels). The SNP dataset was collected using sentinelhub, and the Synthetic dataset was collected from AVIRIS. The Python code used to analyse these datasets can be found at: https://github.com/WiseGamgee/HyperAD

    How to Get Started

    All that is needed to load these datasets is Python (preferably 3.8+) and the NumPy package. Example code for loading the Beach Dataset if you put it in a folder called "data" with the python script is:

    import numpy as np

    # Load image file
    hsi_array = np.load("data/beach_hsi.npy")
    n_pixels, n_lines, n_bands = hsi_array.shape
    print(f"This dataset has {n_pixels} pixels, {n_lines} lines, and {n_bands} bands.")

    # Load image mask
    mask_array = np.load("data/beach_mask.npy")
    m_pixels, m_lines = mask_array.shape
    print(f"The corresponding anomaly mask is {m_pixels} pixels by {m_lines} lines.")

    Citing the Datasets

    If you use any of these datasets, please cite the following paper:

    @article{garske2024erx,
      title={ERX - a Fast Real-Time Anomaly Detection Algorithm for Hyperspectral Line-Scanning},
      author={Garske, Samuel and Evans, Bradley and Artlett, Christopher and Wong, KC},
      journal={arXiv preprint arXiv:2408.14947},
      year={2024},
    }

    If you use the beach dataset please cite the following paper as well (original source):

    @article{mao2022openhsi,
      title={OpenHSI: A complete open-source hyperspectral imaging solution for everyone},
      author={Mao, Yiwei and Betters, Christopher H and Evans, Bradley and Artlett, Christopher P and Leon-Saval, Sergio G and Garske, Samuel and Cairns, Iver H and Cocks, Terry and Winter, Robert and Dell, Timothy},
      journal={Remote Sensing},
      volume={14},
      number={9},
      pages={2244},
      year={2022},
      publisher={MDPI}
    }

  15. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    bz2
    Updated Mar 15, 2021
    + more versions
    Cite
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
    Explore at:
    Available download formats: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty. The notebook analyses/N12.To.Paper.ipynb moves data to it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.11
    Python 3.7.2
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-03-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
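    As a hedged aside (not part of the replication package), the connection string set above is consumed through sqlalchemy roughly like this:

    ```python
    # Hedged sketch: open the database using the JUP_DB_CONNECTION variable configured above.
    import os
    from sqlalchemy import create_engine, text

    engine = create_engine(os.environ["JUP_DB_CONNECTION"])
    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())   # simple connectivity check
    ```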

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.7:

    conda create -n analyses python=3.7
    conda activate analyses

    Go to the analyses folder and install all the dependencies from requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    For reproducing the analyses, run jupyter on this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • Index.ipynb
    • N0.Repository.ipynb
    • N1.Skip.Notebook.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.Repository.With.Notebook.Restriction.ipynb
    • N12.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it in blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it in blank
    export JUP_WITH_EXECUTION="1"; # run execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts in blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 anaconda environments, one of each for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found on the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7


  16. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Cite
    Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    Available download formats: bin, application/gzip, zip, text/x-python
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)
    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on the level of detail (see Step 2 for more details):
    - up to 2Tb of disk space (see Step 2 detail levels)
    - at least 16Gb of RAM (64Gb preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt` . 
    - disable all APIs except GitHub (Bitbucket and Gitlab support were
     not yet implemented when this study was in progress): edit
     `scraper/__init__.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speedup
    the process:
    
    #### Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    #### Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15..30 minutes.
    
    - create a folder `
  17. booksum

    • tensorflow.org
    • opendatalab.com
    Updated Dec 6, 2022
    Cite
    (2022). booksum [Dataset]. https://www.tensorflow.org/datasets/catalog/booksum
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    BookSum: A Collection of Datasets for Long-form Narrative Summarization

    This implementation currently only supports book and chapter summaries.

    GitHub: https://github.com/salesforce/booksum

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('booksum', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  18. original : CIFAR 100

    • kaggle.com
    zip
    Updated Dec 28, 2024
    Cite
    Shashwat Pandey (2024). original : CIFAR 100 [Dataset]. https://www.kaggle.com/datasets/shashwat90/original-cifar-100
    Explore at:
    Available download formats: zip (168517945 bytes)
    Dataset updated
    Dec 28, 2024
    Authors
    Shashwat Pandey
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)

    The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

    The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

    The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

    Baseline results You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.

    Other results Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website; click here to view.

    Dataset layout

    Python / Matlab versions: I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.

    The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary:

        def unpickle(file):
            import cPickle
            with open(file, 'rb') as fo:
                dict = cPickle.load(fo)
            return dict

    And a python3 version:

        def unpickle(file):
            import pickle
            with open(file, 'rb') as fo:
                dict = pickle.load(fo, encoding='bytes')
            return dict

    Loaded in this way, each of the batch files contains a dictionary with the following elements:

    • data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
    • labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
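    As an illustration of the row layout just described, a short python3 sketch (it reuses the python3 unpickle() above; keys are byte strings because of encoding='bytes', and data_batch_1 is assumed to be in the working directory):

    ```python
    import numpy as np

    batch = unpickle("data_batch_1")       # python3 unpickle() defined above
    data = np.asarray(batch[b"data"])      # shape (10000, 3072), dtype uint8
    labels = batch[b"labels"]              # list of 10000 ints in 0-9

    # Each row holds the red plane, then green, then blue, each in row-major order.
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)   # (10000, 32, 32, 3)
    print(images.shape, labels[0])
    ```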

    The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries: label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.

    Binary version

    The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows: <1 x label><3072 x pixel> ... <1 x label><3072 x pixel> In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.

    Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.

    There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.

    The CIFAR-100 dataset This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...

  19. tidybot

    • tensorflow.org
    Updated Dec 11, 2024
    + more versions
    Cite
    (2024). tidybot [Dataset]. https://www.tensorflow.org/datasets/catalog/tidybot
    Explore at:
    Dataset updated
    Dec 11, 2024
    Description

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('tidybot', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  20. Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, zip
    Updated Dec 24, 2022
    Cite
    Alexander R. Hartloper; Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos; Dimitrios G. Lignos; Selimcan Ozden; Albano de Castro e Sousa (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. http://doi.org/10.5281/zenodo.6965147
    Explore at:
    bin, zip, csvAvailable download formats
    Dataset updated
    Dec 24, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Alexander R. Hartloper; Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos; Dimitrios G. Lignos; Selimcan Ozden; Albano de Castro e Sousa
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    Background

    This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape-memory alloy is also included. Summary files provide an overview of the database, and the data from each individual experiment are included as well.

    The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

    Usage

    • The data is licensed under the Creative Commons Attribution 4.0 International license.
    • If you have used our data and are publishing your work, we ask that you please reference both:
      1. this database through its DOI, and
      2. any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

    Included Files

    • Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.
    • Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.
    • Unreduced_Data-#_v1-0-0.zip: contains the original (not downsampled) data
      • Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.
      • We recommend un-zipping all the archives and placing their contents in a single "Unreduced_Data" directory, similar to the "Clean_Data" directory.
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Clean_Data_v1-0-0.zip: contains all the downsampled data
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Database_References_v1-0-0.bib
      • Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

    File Format: Downsampled Data

    These are the "LP_

    • The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data
    • Time[s]: time in seconds since the start of the test
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: the surface temperature in degC

    These data files can be easily loaded using the pandas library in Python through:

    import pandas
    data = pandas.read_csv(data_file, index_col=0)

    The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
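    As a small follow-up sketch (the file name here is hypothetical; the column names are those listed above), a downsampled file can be plotted directly as a true stress-strain curve:

    import pandas
    import matplotlib.pyplot as plt

    data = pandas.read_csv('LP1_specimen1.csv', index_col=0)  # hypothetical file name
    plt.plot(data['e_true'], data['Sigma_true'])
    plt.xlabel('True strain')
    plt.ylabel('True stress [MPa]')
    plt.show()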

    File Format: Unreduced Data

    These are the "LP_

    • The first column is the index of each data point
    • S/No: sample number recorded by the DAQ
    • System Date: Date and time of sample
    • Time[s]: time in seconds since the start of the test
    • C_1_Force[kN]: load cell force
    • C_1_Déform1[mm]: extensometer displacement
    • C_1_Déplacement[mm]: cross-head displacement
    • Eng_Stress[MPa]: engineering stress
    • Eng_Strain[]: engineering strain
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: specimen surface temperature in degC

    The data can be loaded and used similarly to the downsampled data.

    File Format: Overall_Summary

    The overall summary file provides data on all the test specimens in the database. The columns are listed below; a short loading sketch follows the list:

    • hidden_index: internal reference ID
    • grade: material grade
    • spec: specifications for the material
    • source: base material for the test specimen
    • id: internal name for the specimen
    • lp: load protocol
    • size: type of specimen (M8, M12, M20)
    • gage_length_mm_: unreduced section length in mm
    • avg_reduced_dia_mm_: average measured diameter for the reduced section in mm
    • avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm
    • avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm
    • fy_n_mpa_: nominal yield stress
    • fu_n_mpa_: nominal ultimate stress
    • t_a_deg_c_: ambient temperature in degC
    • date: date of test
    • investigator: person(s) who conducted the test
    • location: laboratory where test was conducted
    • machine: setup used to conduct test
    • pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control
    • pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control
    • pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control
    • citekey: reference corresponding to the Database_References.bib file
    • yield_stress_mpa_: computed yield stress in MPa
    • elastic_modulus_mpa_: computed elastic modulus in MPa
    • fracture_strain: computed average true strain across the fracture surface
    • c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass
    • file: file name of corresponding clean (downsampled) stress-strain data
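    A minimal loading sketch, assuming the column names listed above (the grade value used in the filter is only an illustration):

    import pandas as pd

    summary = pd.read_csv('Overall_Summary_2022-08-25_v1-0-0.csv')
    # e.g., keep one material grade and inspect the computed properties
    subset = summary[summary['grade'] == 'S355J2']  # illustrative grade value
    print(subset[['id', 'lp', 'yield_stress_mpa_', 'elastic_modulus_mpa_']].head())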

    File Format: Summarized_Mechanical_Props_Campaign

    Meant to be loaded in Python as a pandas DataFrame with multi-indexing (a column-access sketch follows the list below), e.g.,

    import pandas as pd

    # date and version are placeholders that together match the file name above,
    # e.g. date = '2022-08-25' and version = '_v1-0-0'.
    tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
              index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
              keep_default_na=False, na_values='')
    • citekey: reference in "Campaign_References.bib".
    • Grade: material grade.
    • Spec.: specifications (e.g., J2+N).
    • Yield Stress [MPa]: initial yield stress in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
    • Elastic Modulus [MPa]: initial elastic modulus in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
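    A minimal access sketch, assuming the two-level column labels listed above (outer label, then statistic; the exact label strings are an assumption):

    # Continues from the read_csv call above: columns form a two-level index
    # such as ('Yield Stress [MPa]', 'mean').
    yield_means = tab1[('Yield Stress [MPa]', 'mean')]
    modulus_cv = tab1[('Elastic Modulus [MPa]', 'coefvar')]
    print(yield_means.head(), modulus_cv.head())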

    Caveats

    • The specimens in the following directories were tested before the protocol was established. Therefore, only the true stress-strain data are available for each:
      • A500
      • A992_Gr50
      • BCP325
      • BCR295
      • HYP400
      • S460NL
      • S690QL/25mm
      • S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm