100+ datasets found
  1. python-github-code

    • huggingface.co
    Updated Mar 31, 2023
    + more versions
    Cite
    Angelica Chen (2023). python-github-code [Dataset]. https://huggingface.co/datasets/angie-chen55/python-github-code
    Explore at:
    30 scholarly articles cite this dataset (View in Google Scholar)
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 31, 2023
    Authors
    Angelica Chen
    Description

    The angie-chen55/python-github-code dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  2. CPRD codes: ICD-10 equivalent code lists for dementia subtypes

    • data.bris.ac.uk
    Updated Dec 11, 2017
    + more versions
    Cite
    (2017). CPRD codes: ICD-10 equivalent code lists for dementia subtypes - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/2h4rmk9v7pw2k23h7vgf9tx1ea
    Explore at:
    Dataset updated
    Dec 11, 2017
    Description

    This dataset contains the ICD-10 code lists used to test the sensitivity and specificity of the Clinical Practice Research Datalink (CPRD) medical code lists for dementia subtypes. The provided code lists are used to define dementia subtypes in linked data from the Hospital Episode Statistics (HES) inpatient dataset and the Office for National Statistics (ONS) death registry, which are then used as the 'gold standard' for comparison against dementia subtypes defined using the CPRD medical code lists. The CPRD medical code lists used in this comparison are available here: Venexia Walker, Neil Davies, Patrick Kehoe, Richard Martin (2017): CPRD codes: neurodegenerative diseases and commonly prescribed drugs. https://doi.org/10.5523/bris.1plm8il42rmlo2a2fqwslwckm2

    Complete download (zip, 3.9 KiB)
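
    For context, sensitivity and specificity in this kind of validation reduce to the standard confusion-matrix formulas. A minimal illustrative sketch (not part of the deposited materials), treating the CPRD-derived and gold-standard subtype assignments as boolean per-patient flags:

    def sensitivity_specificity(predicted, gold):
        # predicted: CPRD-derived flags; gold: HES/ONS-derived 'gold standard' flags
        tp = sum(p and g for p, g in zip(predicted, gold))
        tn = sum(not p and not g for p, g in zip(predicted, gold))
        fp = sum(p and not g for p, g in zip(predicted, gold))
        fn = sum(not p and g for p, g in zip(predicted, gold))
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        return sensitivity, specificity

    # Toy example: 5 patients
    pred = [True, True, False, False, True]
    gold = [True, False, False, False, True]
    print(sensitivity_specificity(pred, gold))  # (1.0, 0.666...)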

  3. instructional_code-search-net-java

    • huggingface.co
    Updated May 24, 2023
    + more versions
    Cite
    Fernando Tarin Morales (2023). instructional_code-search-net-java [Dataset]. https://huggingface.co/datasets/Nan-Do/instructional_code-search-net-java
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 24, 2023
    Authors
    Fernando Tarin Morales
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for "instructional_code-search-net-java"

      Dataset Summary
    

    This is an instructional dataset for Java. The dataset contains two different kinds of tasks:

    Given a piece of code, generate a description of what it does. Given a description, generate a piece of code that fulfils it.

      Languages
    

    The dataset is in English.

      Data Splits
    

    There are no splits.

      Dataset Creation
    

    May 2023

      Curation Rationale
    

    This dataset… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/instructional_code-search-net-java.
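
    As a quick-start sketch (not from the dataset card), the dataset can presumably be loaded with the standard Hugging Face datasets API; since no splits are declared, everything is assumed to land in a single train split:

    from datasets import load_dataset

    # No declared splits, so everything is assumed to be under "train".
    ds = load_dataset("Nan-Do/instructional_code-search-net-java", split="train")
    print(ds)     # column names and row count
    print(ds[0])  # first record: one instruction-style task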

  4. ARC Code TI: QuIP

    • catalog.data.gov
    • datasets.ai
    • +4more
    Updated Apr 10, 2025
    Cite
    Ames Research Center (2025). ARC Code TI: QuIP [Dataset]. https://catalog.data.gov/dataset/arc-code-ti-quip
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Ames Research Center (https://nasa.gov/ames/)
    Description

    QuIP (QUick Image Processing) is an interpreter for image processing, graphics, psychophysical experimentation and general scientific computing.

  5. VegeNet - Image datasets and Codes

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 27, 2022
    Cite
    Jo Yen Tan (2022). VegeNet - Image datasets and Codes [Dataset]. http://doi.org/10.5281/zenodo.7254508
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jo Yen Tan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Compilation of Python code for data preprocessing and VegeNet building, as well as image datasets (zip files).

    Image datasets:

    1. vege_original : Images of vegetables captured manually in data acquisition stage
    2. vege_cropped_renamed : Images in (1) cropped to remove background areas and image labels renamed
    3. non-vege images : Images of non-vegetable foods for CNN network to recognize other-than-vegetable foods
    4. food_image_dataset : Complete set of vege (2) and non-vege (3) images for architecture building.
    5. food_image_dataset_split : Image dataset (4) split into train and test sets
    6. process : Images created when cropping (pre-processing step) to create dataset (2).
  6. Form-Based Code Pilot Neighborhoods

    • data.clevelandohio.gov
    • hub.arcgis.com
    • +1more
    Updated Jul 1, 2024
    Cite
    Cleveland | GIS (2024). Form-Based Code Pilot Neighborhoods [Dataset]. https://data.clevelandohio.gov/datasets/form-based-code-pilot-neighborhoods
    Explore at:
    Dataset updated
    Jul 1, 2024
    Dataset authored and provided by
    Cleveland | GIS
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset outlines specific zones or regions designated under a Form-Based Code (FBC) framework. Unlike traditional zoning, form-based codes emphasize the physical form of buildings and public spaces over land use. These zones guide community design through parameters such as building height, setbacks, and architectural styles. The dataset provides a spatial reference for planning, zoning, and development decisions aligned with form-based design principles. The data was created by digitizing PDFs of approved Form-Based Code plans, accessible via links listed in the Ordinance Link column of the dataset.

    Applications Featuring This Dataset: Form-Based Code Explorer

    Data Glossary: See the Attributes section below for details about each column in this dataset.

    Update Frequency: When FBC neighborhood regions change.

    Contact: City Planning Commission – Zoning and Technology
    
  7. India Export Data of HS Code 87081090 | Singapore | ZETTALIX.COM

    • zettalix.com
    Updated Dec 19, 2024
    Cite
    Zettalix (2024). India Export Data of HS Code 87081090 | Singapore | ZETTALIX.COM [Dataset]. https://www.zettalix.com/
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Dec 19, 2024
    Dataset authored and provided by
    Zettalix
    Area covered
    India, Singapore
    Description

    Subscribers can access export and import data for 80 countries using HS codes or product names, ideal for informed market analysis.

  8. Modeling Attack Resistant Strong PUF Exploiting Stagewise Obfuscated Interconnections With Improved Reliability [DATASET and Source Code]

    • zenodo.org
    bin, zip
    Updated Jul 31, 2024
    Cite
    Chongyao Xu; Man-kay Law (2024). Modeling Attack Resistant Strong PUF Exploiting Stagewise Obfuscated Interconnections With Improved Reliability [DATASET and Source Code] [Dataset]. http://doi.org/10.5281/zenodo.7995053
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chongyao Xu; Man-kay Law
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Thanks for your interest in our work!

    To facilitate your assessment and replication, we provide the dataset and source code (Verilog, Python models, MATLAB) of our work (OIPUF) here.

    By the way, our latest work (SOI PUF and cSOI PUF), published in IEEE TIFS (2024), is based on OIPUF.

    If you have any questions, please feel free to contact us: chongyaoxu@126.com / mklaw@um.edu.mo

    The full text on OIPUF can be downloaded from https://ieeexplore.ieee.org/document/10103139

    The full text on SOI PUF and cSOI PUF can be downloaded from https://ieeexplore.ieee.org/document/10458688

    The source code and FPGA project for SOI PUF and cSOI PUF can be downloaded from https://github.com/yg99992/SOI_PUF.

    MATLAB code

    matlab/Generate_OI_block.m
    A MATLAB script used to generate the Verilog code of a random OI block.

    matlab/OIPUF_64x4_placement.m
    A MATLAB function used to generate the XDC file constraining the placement of a (64,4)-OI block.

    matlab/OIPUF_64x8_placement.m
    A MATLAB function used to generate the XDC file constraining the placement of a (64,8)-OI block.

    matlab/OIPUF_placement_example.m
    An example script demonstrating the usage of OIPUF_64x4_placement.m and OIPUF_64x8_placement.m.

    Python code

    python/puf_models.py
    The Python models of XOR PUFs and OIPUFs, which can be used to generate CRPs.

    For example:

    from puf_models import oi_puf
    
    # generate a (64,4)-OIPUF and further use the generated OIPUF to generate 1M CRPs
    crps, puf_instance = oi_puf.gen_CRPs_PUF(64, 4, 1_000_000) 
    

    python/attack_pypuf.py
    A script used to conduct an ANN attack on XOR PUFs and OIPUFs (the 'pypuf' package must be installed).

    Verilog code

    verilog/OIPUF_64_4/
    All the Verilog files of the (64,4)-OIPUF

    verilog/OIPUF_64_8/
    All the Verilog files of the (64,8)-OIPUF

    CRP datasets extracted from FPGA

    It consists of 13 CRP files (all CRPs were extracted from the FPGA):

    FPGA_CRPs/FPGA3_CHAL_100M.csv
    The 100 million 64-bit challenges

    FPGA_CRPs/FPGA3_k4_PUF0.csv
    The 100 million 1-bit responses extracted from (64,4)-OIPUF0

    FPGA_CRPs/FPGA3_k4_PUF1.csv
    The 100 million 1-bit responses extracted from (64,4)-OIPUF1

    FPGA_CRPs/FPGA3_k4_PUF2.csv
    The 100 million 1-bit responses extracted from (64,4)-OIPUF2

    FPGA_CRPs/FPGA3_k4_PUF3.csv
    The 100 million 1-bit responses extracted from (64,4)-OIPUF3

    FPGA_CRPs/FPGA3_k4_PUF4.csv
    The 100 million 1-bit responses extracted from (64,4)-OIPUF4

    FPGA_CRPs/FPGA3_k4_PUF5.csv
    The 100 million 1-bit responses extracted from (64,4)-OIPUF5

    FPGA_CRPs/FPGA3_k8_PUF0.csv
    The 100 million 1-bit responses extracted from (64,8)-OIPUF0

    FPGA_CRPs/FPGA3_k8_PUF1.csv
    The 100 million 1-bit responses extracted from (64,8)-OIPUF1

    FPGA_CRPs/FPGA3_k8_PUF2.csv
    The 100 million 1-bit responses extracted from (64,8)-OIPUF2

    FPGA_CRPs/FPGA3_k8_PUF3.csv
    The 100 million 1-bit responses extracted from (64,8)-OIPUF3

    FPGA_CRPs/FPGA3_k8_PUF4.csv
    The 100 million 1-bit responses extracted from (64,8)-OIPUF4

    FPGA_CRPs/FPGA3_k8_PUF5.csv
    The 100 million 1-bit responses extracted from (64,8)-OIPUF5
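
    As a hedged loading sketch (the CSV layout is an assumption: one comma-separated 64-bit challenge per row, one response bit per row, no header), reading only a subsample given the 100M-row file sizes:

    import numpy as np

    # Read the first 100k of the 100M rows; adjust max_rows as needed.
    challenges = np.loadtxt("FPGA_CRPs/FPGA3_CHAL_100M.csv", delimiter=",",
                            dtype=np.int8, max_rows=100_000)
    responses = np.loadtxt("FPGA_CRPs/FPGA3_k4_PUF0.csv",
                           dtype=np.int8, max_rows=100_000)
    print(challenges.shape, responses.shape)  # expected: (100000, 64) and (100000,)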

  9. Replication package for DRAGON: Robust Classification for Very Large Collections of Software Repositories

    • zenodo.org
    bin, zip
    Updated May 15, 2025
    Cite
    Anonymous (2025). Replication package for DRAGON: Robust Classification for Very Large Collections of Software Repositories [Dataset]. http://doi.org/10.5281/zenodo.15424419
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    May 15, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DRAGON: Multi-Label Classification

    This archive contains the replication package for the DRAGON multi-label classification models, which leverage BERT-based architectures. The package includes scripts for repository mining, dataset creation, data processing, model training, and evaluation. The two main models used are DRAGON and LEGION.

    Key Components:

    • Repository Mining: Scripts to extract repositories for dataset creation.
    • Dataset Preparation: Jupyter notebooks for cleaning and transforming data.
    • Data Processing: Conversion into a Hugging Face dataset format.
    • Model Training: Training scripts for DRAGON and LEGION, with configurable preprocessing options.
    • Evaluation: Threshold tuning and performance assessment.

    Setup

    Before running any commands, ensure you have the necessary dependencies installed. It is recommended to use a virtual environment:

    python3 -m venv venv
    source venv/bin/activate # On Windows use `venv\Scripts\activate`
    pip install -r requirements.txt
    

    Project Structure

    • repository_mining/: Contains scripts for mining the initial set of repositories.
      • repository_mining/doc/: Includes documentation with the necessary information for repository mining.
    • dataset_creation/: Contains all the notebooks, to be run sequentially, to prepare the dataset.
    • multilabel_class/: Contains scripts for classification, threshold tuning, and evaluation.
      • multilabel_class/model_output/: Trained models, organized first by dataset, then by model variant.
    • data/: Contains the Hugging Face datasets (our dataset and the LEGION dataset) ready for training/evaluation.

    1️⃣ Data Mining

    To mine the initial set of repositories from Software Heritage, use the scripts available in the repository_mining/ folder. Detailed information and steps for repository mining can be found in:

    repository_mining/doc/
    

    2️⃣ Dataset Creation

    After mining the repositories, prepare the dataset by running the Jupyter notebooks inside the dataset_creation/ folder in sequence. These notebooks handle the data cleaning, transformation, and formatting necessary for model training. Each notebook contains the documentation needed, explaining every step.

    3️⃣ Data Processing

    Once the dataset is prepared, convert it into a Hugging Face dataset using:

    python3 multilabel_class/create_dataset.py --file_path data/02_processed_datasets/2024-05-22/origin-metadata-readme_names-900000dataset_forks-cleaned.csv
    

    4️⃣ Classification / Training

    Train the DRAGON Model

    After processing the dataset, train the DRAGON model with the following command:

    python3 multilabel_class/tune_thresholds.py --model_type bert --model_variant focal --dataset_path data/03_huggingaceV_datasets/2024-05-22/origin-metadata-readme_names-900000dataset_forks-cleaned/dataset
    

    Ensure Configuration is Set Correctly

    Modify the configuration file multilabel_class/utils/config.py to set the following parameter to True:

    DEFAULT_PREPROCESSING_PARAMS = { 
      'use_sentence_pairs': True # If True, process as (text1, text2); if False, concatenate texts
    }
    

    Training DRAGON Without Sentence Pairs

    To train DRAGON without using sentence pairs, use the same command but set use_sentence_pairs to False in the config file:

    DEFAULT_PREPROCESSING_PARAMS = { 
      'use_sentence_pairs': False
    }
    

    Train DRAGON on a Benchmark Dataset

    To train DRAGON on a benchmark dataset, use:

    python3 multilabel_class/tune_thresholds.py --model_type bert --model_variant focal --dataset_path data/03_huggingaceV_datasets/LEGION/dataset
    

    Ensure the use_sentence_pairs parameter is set to True in config.py.

    Train LEGION on the DRAGON Dataset

    To train LEGION on the DRAGON dataset, use:

    python3 multilabel_class/tune_thresholds.py --model_type bert --model_variant db --dataset_path data/03_huggingaceV_datasets/2024-05-22/origin-metadata-readme_names-900000dataset_forks-cleaned/dataset
    

    Ensure the use_sentence_pairs parameter is set to False in config.py:

    DEFAULT_PREPROCESSING_PARAMS = { 
      'use_sentence_pairs': False
    }
    

    Train LEGION on a Baseline Dataset

    To train LEGION on a baseline dataset, run:

    python3 multilabel_class/tune_thresholds.py --model_type bert --model_variant db --dataset_path data/03_huggingaceV_datasets/LEGION/dataset
    

    5️⃣ Model Evaluation

    Once thresholds are tuned, you can evaluate the model using:

    python3 multilabel_class/evaluation.py --model_type bert --model_variant focal --dataset_path data/03_huggingaceV_datasets/2024-05-22/origin-metadata-readme_names-900000dataset_forks-cleaned/dataset
    

    This evaluation script computes standard multi-label classification metrics, including:

    • Micro and macro F1@k for k = 1..5
    • Precision@k and recall@k for k = 1..5

    Ensure that the model variant and dataset path correspond to the previously trained model.
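
    For reference, here is a minimal sketch of how precision@k and recall@k are commonly computed for multi-label predictions (F1@k is their harmonic mean). This is an illustration only, not the package's evaluation.py; the array shapes and names are assumptions.

    import numpy as np

    def precision_recall_at_k(scores, labels, k):
        # scores: (n_samples, n_labels) predicted scores
        # labels: (n_samples, n_labels) binary ground-truth matrix
        topk = np.argsort(-scores, axis=1)[:, :k]        # indices of the k highest-scored labels
        hits = np.take_along_axis(labels, topk, axis=1)  # 1 where a top-k label is truly relevant
        precision = hits.sum(axis=1) / k
        recall = hits.sum(axis=1) / np.maximum(labels.sum(axis=1), 1)
        return precision.mean(), recall.mean()

    # Toy usage: random scores for 4 samples over 6 labels
    rng = np.random.default_rng(0)
    scores = rng.random((4, 6))
    labels = (rng.random((4, 6)) > 0.7).astype(int)
    for k in range(1, 6):
        p, r = precision_recall_at_k(scores, labels, k)
        print(f"k={k}: precision@k={p:.3f} recall@k={r:.3f}")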

    Recommended: Evaluation via Notebooks

    For an interactive and visual analysis of model performance, you can also use the provided Jupyter notebooks located in:

    DRAGON_replication/multilabel_class/notebooks/
    

    These notebooks reproduce the complete evaluation pipeline and generate additional visualizations and metrics discussed in the associated paper.

    Both command-line and notebook-based evaluations ensure reproducibility and offer complementary insights into model behavior.

    Instructions for Unzipping Files

    Several folders in this replication package have been compressed into .zip files to reduce package size. Before running any code, you must unzip all the provided .zip files in place; that is, extract each archive into the same directory as the .zip file, using the same name as the zip file (without the .zip extension).

    For example:

    DRAGON_replication\data\02_processed_dataset\2024-05-22.zip
    

    should be extracted to:

    DRAGON_replication\data\02_processed_dataset\2024-05-22\
    

    List of .zip files to extract

    • DRAGON_replication\data\02_processed_dataset\2024-05-22.zip
    • DRAGON_replication\data\03_huggingaceV_datasets\2024-05-22.zip
    • DRAGON_replication\data\03_huggingaceV_datasets\LEGION.zip
    • DRAGON_replication\dataset_creation\data.zip
    • DRAGON_replication\multilabel_class\model_output\2024-05-22.zip
    • DRAGON_replication\multilabel_class\model_output\LEGION.zip

    Make sure that after extraction, each corresponding folder exists and contains the expected files. Do not change the folder names or directory structure after unzipping.
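
    As a convenience, here is a small sketch (not part of the package) of the in-place extraction described above, assuming it is run from the directory containing DRAGON_replication and that each archive holds loose files rather than an already-named top-level folder:

    import zipfile
    from pathlib import Path

    ZIPS = [
        "DRAGON_replication/data/02_processed_dataset/2024-05-22.zip",
        "DRAGON_replication/data/03_huggingaceV_datasets/2024-05-22.zip",
        "DRAGON_replication/data/03_huggingaceV_datasets/LEGION.zip",
        "DRAGON_replication/dataset_creation/data.zip",
        "DRAGON_replication/multilabel_class/model_output/2024-05-22.zip",
        "DRAGON_replication/multilabel_class/model_output/LEGION.zip",
    ]

    for name in ZIPS:
        archive = Path(name)
        target = archive.with_suffix("")  # same directory, same name, without .zip
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        print(f"extracted {archive} -> {target}")

    If an archive already contains its named top-level folder, extract to archive.parent instead so the folders are not nested twice.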

    This README provides an overview of the essential steps for repository mining, dataset preparation, processing, model training, and evaluation. For further customization, refer to the configuration files and experiment with different preprocessing settings.

  10. code-readability-merged

    • huggingface.co
    Updated Mar 26, 2025
    Cite
    Chair of Software Engineering II, Uni Passau (2025). code-readability-merged [Dataset]. https://huggingface.co/datasets/se2p/code-readability-merged
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 26, 2025
    Dataset authored and provided by
    Chair of Software Engineering II, Uni Passau
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Java Code Readability Merged Dataset

    This dataset contains 421 Java code snippets along with a readability score, aggregated from several scientific papers [1, 2, 3]. You can download the dataset using Hugging Face:

    from datasets import load_dataset
    ds = load_dataset("se2p/code-readability-merged")

    The snippets are not split into train, test, and validation sets; thus, the whole dataset is in the train split:

    ds = ds['train']
    ds_as_list = ds.to_list()  # Convert the dataset to…

    See the full description on the dataset page: https://huggingface.co/datasets/se2p/code-readability-merged.

  11. code_x_glue_cc_code_completion_token

    • huggingface.co
    Updated Aug 20, 2021
    Cite
    Google (2021). code_x_glue_cc_code_completion_token [Dataset]. https://huggingface.co/datasets/google/code_x_glue_cc_code_completion_token
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Aug 20, 2021
    Dataset authored and provided by
    Google (http://google.com/)
    License

    https://choosealicense.com/licenses/c-uda/

    Description

    Dataset Card for "code_x_glue_cc_code_completion_token"

      Dataset Summary
    

    CodeXGLUE CodeCompletion-token dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/CodeCompletion-token. The task is to predict the next code token given the context of previous tokens; models are evaluated by token-level accuracy. Code completion is one of the most widely used features in software development through IDEs. An effective code completion tool could improve software… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_cc_code_completion_token.
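
    A hedged loading sketch: CodeXGLUE datasets on the Hub are organized by language configuration, and the configuration name "java" below is an assumption based on the CodeXGLUE task list (javaCorpus; py150 for Python):

    from datasets import load_dataset

    # Configuration name "java" is an assumption (CodeXGLUE javaCorpus task).
    ds = load_dataset("google/code_x_glue_cc_code_completion_token", "java")
    print(ds)              # available splits and sizes
    print(ds["train"][0])  # one tokenized code sample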

  12. Medicaid Claims by Recipient Zip Code

    • hub.mph.in.gov
    Updated Sep 14, 2017
    Cite
    (2017). Medicaid Claims by Recipient Zip Code [Dataset]. https://hub.mph.in.gov/dataset/medicaid-claims-by-recipient-zip-code
    Explore at:
    Dataset updated
    Sep 14, 2017
    Description

    Archived as of 6/26/2025: the datasets will no longer receive updates, but the historical data will continue to be available for download. This dataset provides information on the services provided to recipients enrolled in Medicaid. It contains the total number of recipients, total number of claims, and total dollar amount, by recipient zip code. Restricted to claims with a service date from 01/2012 to 12/2017, and to patients with a Medicaid claim during this period. This data is for research purposes and is not intended to be used for reporting. Due to differences in geographic aggregation, time period considerations, and units of analysis, these numbers may differ from those reported by FSSA.

  13. code-search-net-go

    • huggingface.co
    Updated May 18, 2023
    Cite
    Fernando Tarin Morales (2023). code-search-net-go [Dataset]. https://huggingface.co/datasets/Nan-Do/code-search-net-go
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 18, 2023
    Authors
    Fernando Tarin Morales
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for "code-search-net-go"

      Dataset Summary
    

    This dataset is the Go portion of CodeSearchNet, annotated with a summary column. The code-search-net dataset includes open-source functions with comments, collected from GitHub. The summary is a short description of what the function does.

      Languages
    

    The dataset's comments are in English and the functions are written in Go.

      Data Splits
    

    Train, test, validation labels are included in the dataset as… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/code-search-net-go.
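
    A hedged loading sketch; the card says train/test/validation splits are included, and the summary column name is taken from the summary above, so exact names should be checked on the dataset page:

    from datasets import load_dataset

    ds = load_dataset("Nan-Do/code-search-net-go")
    for split in ds:                    # e.g. train / test / validation
        print(split, ds[split].num_rows)
    print(ds["train"][0]["summary"])    # short English description of the function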

  14. 3.01 Property Code Enforcement (dashboard)

    • catalog.data.gov
    • data-catalog-tempegov.hub.arcgis.com
    Updated Mar 18, 2023
    Cite
    City of Tempe (2023). 3.01 Property Code Enforcement (dashboard) [Dataset]. https://catalog.data.gov/dataset/3-01-property-code-enforcement-dashboard-0ee61
    Explore at:
    Dataset updated
    Mar 18, 2023
    Dataset provided by
    City of Tempe
    Description

    This operations dashboard shows historic and current data related to this performance measure. The performance measure dashboard is available at 3.01 Property Code Enforcement. See also: Data Dictionary.

  15. Data from: Housing Code Enforcement

    • catalog.data.gov
    • data.wu.ac.at
    Updated Aug 26, 2023
    Cite
    data.montgomerycountymd.gov (2023). Housing Code Enforcement [Dataset]. https://catalog.data.gov/dataset/housing-code-enforcement-181fe
    Explore at:
    Dataset updated
    Aug 26, 2023
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    Housing code enforcement activities, including inspections and violations.

  16. NODC Standard Product: NODC Taxonomic Code on CD-ROM (NCEI Accession 0050418)

    • catalog.data.gov
    Updated Jul 1, 2025
    + more versions
    Cite
    (Point of Contact) (2025). NODC Standard Product: NODC Taxonomic Code on CD-ROM (NCEI Accession 0050418) [Dataset]. https://catalog.data.gov/dataset/nodc-standard-product-nodc-taxonomic-code-on-cd-rom-ncei-accession-0050418
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    (Point of Contact)
    Description

    The content of the NODC Taxonomic Code, Version 8 CD-ROM (CD-ROM NODC-68) distributed by NODC is archived in this accession. Version 7 of the NODC Taxonomic Code (CD-ROM NODC-35), which does not include Integrated Taxonomic Information System (ITIS) Taxonomic Serial Numbers (TSNs), is also archived in this NODC accession. Prior to 1996, the NODC Taxonomic Code was the largest, most flexible, and most widely used of the various coding schemes which adapted the Linnean system of biological nomenclature to modern methods of data storage and retrieval. It was based on a system of code numbers that reflected taxonomic relationships. Hundreds of historic data collections archived at NODC use the NODC Taxonomic Code to encode species identification. With the development and release of ITIS in 1996, NODC published the final version (Version 8) of the NODC Taxonomic Code on CD-ROM. This CD-ROM provides NODC taxonomic codes along with the equivalent ITIS Taxonomic Serial Numbers to facilitate the transition to the new Integrated Taxonomic Information System (ITIS, http://www.itis.gov/). With the publication of NODC Taxonomic Code Version 8, the NODC code was frozen and discontinued. ITIS assumed responsibility for assigning new TSN codes and for verifying accepted scientific names and synonyms. More information about the Integrated Taxonomic Information System is available at http://www.itis.gov.

  17. Curated Email-Based Code Reviews Datasets

    • figshare.com
    bin
    Updated Feb 7, 2024
    Cite
    Mingzhao Liang; Ping Charoenwet; Patanamon Thongtanunam (2024). Curated Email-Based Code Reviews Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.24679656.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mingzhao Liang; Ping Charoenwet; Patanamon Thongtanunam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research has focused on tool-based code reviews (e.g. the Gerrit code review tool, GitHub), many traditional open-source software (OSS) projects still conduct code reviews through emails. However, due to the unstructured nature of email-based data, it can be challenging to mine email-based code reviews, hindering researchers from delving into the code review practice of such long-standing OSS projects. Therefore, this paper presents large-scale datasets of email-based code reviews of 167 projects across three OSS communities (i.e. Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated the data by grouping a submitted patch and its revised versions and grouping email aliases. Our datasets include a total of 4.2M patches with 2.1M patch groups and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly delve into email-based code review practices of large OSS projects without additional effort in data collection and curation.

  18. Data and code files for co-occurrence modeling project

    • catalog.data.gov
    • datadiscoverystudio.org
    • +2more
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Data and code files for co-occurrence modeling project [Dataset]. https://catalog.data.gov/dataset/data-and-code-files-for-co-occurrence-modeling-project
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Files included are original data inputs on stream fishes (fish_data_OEPA_2012.csv), water chemistry (OEPA_WATER_2012.csv), geographic data (NHD_Plus_StreamCat); modeling files for generating predictions from the original data, including the R code (MVP_R_Final.txt) and Stan code (MV_Probit_Stan_Final.txt); and the model output file containing predictions for all NHDPlus catchments in the East Fork Little Miami River watershed (MVP_EFLMR_cooc_Final). This dataset is associated with the following publication: Martin, R., E. Waits, and C. Nietch. Empirically-based modeling and mapping to consider the co-occurrence of ecological receptors and stressors. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 613(614): 1228-1239, (2018).

  19. QR Codes Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 12, 2025
    + more versions
    Cite
    Data Insights Market (2025). QR Codes Market Report [Dataset]. https://www.datainsightsmarket.com/reports/qr-codes-market-20882
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    Mar 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The QR Code market is experiencing robust growth, projected to reach a market size of $10.5 billion in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 16.67% from 2025 to 2033. This expansion is driven by several key factors. The increasing adoption of smartphones and mobile payment systems globally fuels the demand for QR codes in diverse applications, from marketing campaigns and contactless payments to information sharing and customer engagement initiatives. The shift towards digitalization across various industries, coupled with the convenience and cost-effectiveness of QR codes, contributes significantly to market growth. The dynamic nature of QR codes, allowing for updates and tracking of performance, adds to their appeal over static alternatives. Furthermore, the diversification of QR code formats, catering to different use cases like website links, menus, file downloads, and social media integration, expands the market's reach across various sectors.

    The market segmentation reveals a diverse landscape. Dynamic QR codes, offering greater flexibility and analytics capabilities, are gaining traction over their static counterparts. Among end-user applications, marketing and advertising dominate, leveraging QR codes for campaigns and promotions. However, significant growth is expected in payments and transactions, driven by the rising popularity of mobile wallets and contactless payment methods. Geographically, North America and Europe are anticipated to hold substantial market shares, but Asia-Pacific is poised for rapid expansion due to its burgeoning digital economy and large smartphone user base. Competition among key players, including Uniqode Phygital Inc, QR TIGER PTE LTD, and Flowcode, is intense, fostering innovation and driving down costs, further boosting market accessibility. While challenges like security concerns and potential misuse exist, technological advancements and increased awareness about secure QR code implementation are mitigating these risks. The overall outlook for the QR code market remains highly positive, indicating a sustained period of growth and innovation driven by the evolving digital landscape.

    Recent developments include:

    • July 2024: Bandhan Bank launched its latest payment solution through the Bharat QR Code for its Current account and Savings account customers. The bank claimed that the solution will simplify how these self-employed segment customers make payments at any merchant outlet. An instant notification will also be received on every payment through a small speaker.
    • June 2024: Flowcode, a marketing technology platform, unveiled a reimagined product designed for marketing and analytics teams at F1000 companies focused on measuring and maximizing offline conversions. Flowcode integrates seamlessly with data feeds, such as product catalogs, MLS listings, and more, to automate the creation of personalized, QR-enabled user journeys. This empowers brands to deliver unique, tailored consumer experiences, significantly increasing conversion rates.

    Key drivers for this market are: Increased Smartphone Penetration, Growing Demand for Contactless Solutions; Increasing need for Security and Fraud Prevention.

    Potential restraints include: Increased Smartphone Penetration, Growing Demand for Contactless Solutions; Increasing need for Security and Fraud Prevention.

    Notable trends are: The Payments and Transactions Segment is Anticipated to Witness a Significant Growth.

  20. R code that determines buying and selling of water by public-supply water service areas

    • s.cnmilf.com
    • data.usgs.gov
    • +1more
    Updated Aug 29, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). R code that determines buying and selling of water by public-supply water service areas [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/r-code-that-determines-buying-and-selling-of-water-by-public-supply-water-service-areas
    Explore at:
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    This child item describes R code used to determine whether public-supply water systems buy water, sell water, both buy and sell water, or are neutral (meaning the system has only local water supplies), using water source information from a proprietary dataset from the U.S. Environmental Protection Agency. This information was needed to better understand public-supply water use and where water buying and selling were likely to occur. Buying or selling of water may result in per capita rates that are not representative of the population within the water service area. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Output from this code was used as an input feature variable in the public supply water use machine learning model. This page includes the following files:

    • ID_WSA_04062022_Buyers_Sellers_DR.R - an R script used to determine whether a public-supply water service area buys water, sells water, or is neutral
    • BuySell_readme.txt - a README text file describing the script
