100+ datasets found
  1. python-github-code

    • huggingface.co
    Updated Mar 31, 2023
    + more versions
    Cite
    Angelica Chen (2023). python-github-code [Dataset]. https://huggingface.co/datasets/angie-chen55/python-github-code
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 31, 2023
    Authors
    Angelica Chen
    Description

    angie-chen55/python-github-code dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. instructional_code-search-net-java

    • huggingface.co
    Updated May 24, 2023
    + more versions
    Cite
    Fernando Tarin Morales (2023). instructional_code-search-net-java [Dataset]. https://huggingface.co/datasets/Nan-Do/instructional_code-search-net-java
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 24, 2023
    Authors
    Fernando Tarin Morales
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for "instructional_code-search-net-java"

      Dataset Summary
    

    This is an instructional dataset for Java. The dataset contains two different kinds of tasks:

    1. Given a piece of code, generate a description of what it does.
    2. Given a description, generate a piece of code that fulfils the description.

      Languages
    

    The dataset is in English.

      Data Splits
    

    There are no splits.

      Dataset Creation
    

    May of 2023

      Curation Rationale
    

    This dataset… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/instructional_code-search-net-java.

  3. CPRD codes: ICD-10 equivalent code lists for dementia subtypes

    • data.bris.ac.uk
    Updated Dec 11, 2017
    + more versions
    Cite
    (2017). CPRD codes: ICD-10 equivalent code lists for dementia subtypes - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/2h4rmk9v7pw2k23h7vgf9tx1ea
    Explore at:
    Dataset updated
    Dec 11, 2017
    Description

    This dataset contains the ICD-10 code lists used to test the sensitivity and specificity of the Clinical Practice Research Datalink (CPRD) medical code lists for dementia subtypes. The provided code lists are used to define dementia subtypes in linked data from the Hospital Episode Statistics (HES) inpatient dataset and the Office for National Statistics (ONS) death registry, which are then used as the 'gold standard' for comparison against dementia subtypes defined using the CPRD medical code lists. The CPRD medical code lists used in this comparison are available here: Venexia Walker, Neil Davies, Patrick Kehoe, Richard Martin (2017): CPRD codes: neurodegenerative diseases and commonly prescribed drugs. https://doi.org/10.5523/bris.1plm8il42rmlo2a2fqwslwckm2

  4. ARC Code TI: QuIP

    • catalog.data.gov
    • datasets.ai
    • +4 more
    Updated Apr 10, 2025
    Cite
    Ames Research Center (2025). ARC Code TI: QuIP [Dataset]. https://catalog.data.gov/dataset/arc-code-ti-quip
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Ames Research Center (https://nasa.gov/ames/)
    Description

    QuIP (QUick Image Processing) is an interpreter for image processing, graphics, psychophysical experimentation and general scientific computing.

  5. VegeNet - Image datasets and Codes

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 27, 2022
    Cite
    Jo Yen Tan; Jo Yen Tan (2022). VegeNet - Image datasets and Codes [Dataset]. http://doi.org/10.5281/zenodo.7254508
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jo Yen Tan; Jo Yen Tan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Compilation of python codes for data preprocessing and VegeNet building, as well as image datasets (zip files).

    Image datasets:

    1. vege_original : Images of vegetables captured manually in data acquisition stage
    2. vege_cropped_renamed : Images in (1) cropped to remove background areas and image labels renamed
    3. non-vege images : Images of non-vegetable foods for CNN network to recognize other-than-vegetable foods
    4. food_image_dataset : Complete set of vege (2) and non-vege (3) images for architecture building.
    5. food_image_dataset_split : Image dataset (4) split into train and test sets
    6. process : Images created when cropping (pre-processing step) to create dataset (2).
  6. Form-Based Code Pilot Neighborhoods

    • data.clevelandohio.gov
    • hub.arcgis.com
    • +1 more
    Updated Jul 1, 2024
    Cite
    Cleveland | GIS (2024). Form-Based Code Pilot Neighborhoods [Dataset]. https://data.clevelandohio.gov/datasets/form-based-code-pilot-neighborhoods
    Explore at:
    Dataset updated
    Jul 1, 2024
    Dataset authored and provided by
    Cleveland | GIS
    License

    Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Area covered
    Description

    This dataset outlines specific zones or regions designated under a Form-Based Code (FBC) framework. Unlike traditional zoning, form-based codes emphasize the physical form of buildings and public spaces over land use. These zones guide community design through parameters such as building height, setbacks, and architectural styles. The dataset provides a spatial reference for planning, zoning, and development decisions aligned with form-based design principles. The data was created by digitizing PDFs of approved Form-Based Code plans, accessible via links listed in the Ordinance Link column of the dataset.

    Applications Featuring This Dataset: Form-Based Code Explorer

    Data Glossary: See the Attributes section below for details about each column in this dataset.

    Update Frequency: When FBC neighborhood regions change.

    Contact: City Planning Commission – Zoning and Technology
    
  7. Replication package for DRAGON: Robust Classification for Very Large...

    • zenodo.org
    bin, zip
    Updated May 15, 2025
    Cite
    Anonymous; Anonymous (2025). Replication package for DRAGON: Robust Classification for Very Large Collections of Software Repositories [Dataset]. http://doi.org/10.5281/zenodo.15424419
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    May 15, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DRAGON: Multi-Label Classification

    This archive contains the replication package for the DRAGON multi-label classification models, which leverage BERT-based architectures. The package includes scripts for repository mining, dataset creation, data processing, model training, and evaluation. The two main models used are DRAGON and LEGION.

    Key Components:

    • Repository Mining: Scripts to extract repositories for dataset creation.
    • Dataset Preparation: Jupyter notebooks for cleaning and transforming data.
    • Data Processing: Conversion into a Hugging Face dataset format.
    • Model Training: Training scripts for DRAGON and LEGION, with configurable preprocessing options.
    • Evaluation: Threshold tuning and performance assessment.

    Setup

    Before running any commands, ensure you have the necessary dependencies installed. It is recommended to use a virtual environment:

    python3 -m venv venv
    source venv/bin/activate # On Windows use `venv\Scripts\activate`
    pip install -r requirements.txt
    

    Project Structure

    • repository_mining/: Contains scripts for mining the initial set of repositories.
      • repository_mining/doc/: Includes documentation with the necessary information for repository mining.
    • dataset_creation/: Contains all the notebooks to be run sequentially to prepare the dataset.
    • multilabel_class/: Contains scripts for classification, threshold tuning, and evaluation.
      • multilabel_class/model_output/: Trained models, organized first by dataset and then by model variant.
    • data/: Contains the Hugging Face datasets (our dataset and the LEGION dataset) ready for training/evaluation.

    1️⃣ Data Mining

    To mine the initial set of repositories from Software Heritage, use the scripts available in the repository_mining/ folder. Detailed information and steps for repository mining can be found in:

    repository_mining/doc/
    

    2️⃣ Dataset Creation

    After mining the repositories, prepare the dataset by running the Jupyter notebooks inside the dataset_creation/ folder in sequence. These notebooks handle data cleaning, transformation, and formatting necessary for model training. All the documentation needed is inside each notebook explaining every step.

    3️⃣ Data Processing

    Once the dataset is prepared, convert it into a Hugging Face dataset using:

    python3 multilabel_class/create_dataset.py --file_path data/02_processed_datasets/2024-05-22/origin-metadata-readme_names-900000dataset_forks-cleaned.csv
    

    4️⃣ Classification / Training

    Train the DRAGON Model

    After processing the dataset, train the DRAGON model with the following command:

    python3 multilabel_class/tune_thresholds.py --model_type bert --model_variant focal --dataset_path data/03_huggingaceV_datasets/2024-05-22/origin-metadata-readme_names-900000dataset_forks-cleaned/dataset
    

    Ensure Configuration is Set Correctly

    Modify the configuration file multilabel_class/utils/config.py to set the following parameter to True:

    DEFAULT_PREPROCESSING_PARAMS = { 
      'use_sentence_pairs': True # If True, process as (text1, text2); if False, concatenate texts
    }
    

    Training DRAGON Without Sentence Pairs

    To train DRAGON without using sentence pairs, use the same command but set use_sentence_pairs to False in the config file:

    DEFAULT_PREPROCESSING_PARAMS = { 
      'use_sentence_pairs': False
    }
    

    Train DRAGON on a Benchmark Dataset

    To train DRAGON on a benchmark dataset, use:

    python3 multilabel_class/tune_thresholds.py --model_type bert --model_variant focal --dataset_path data/03_huggingaceV_datasets/LEGION/dataset
    

    Ensure the use_sentence_pairs parameter is set to True in config.py.

    Train LEGION on the DRAGON Dataset

    To train LEGION on the DRAGON dataset, use:

    python3 multilabel_class/tune_thresholds.py --model_type bert --model_variant db --dataset_path data/03_huggingaceV_datasets/2024-05-22/origin-metadata-readme_names-900000dataset_forks-cleaned/dataset
    

    Ensure the use_sentence_pairs parameter is set to False in config.py:

    DEFAULT_PREPROCESSING_PARAMS = { 
      'use_sentence_pairs': False
    }
    

    Train LEGION on a Baseline Dataset

    To train LEGION on a baseline dataset, run:

    python3 multilabel_class/tune_thresholds.py --model_type bert --model_variant db --dataset_path data/03_huggingaceV_datasets/LEGION/dataset
    

    5️⃣ Model Evaluation

    Once thresholds are tuned, you can evaluate the model using:

    python3 multilabel_class/evaluation.py --model_type bert --model_variant focal --dataset_path data/03_huggingaceV_datasets/2024-05-22/origin-metadata-readme_names-900000dataset_forks-cleaned/dataset
    

    This evaluation script computes standard multi-label classification metrics including:

    • Micro and macro F1@k (k = 1..5)
    • Precision@k and Recall@k (k = 1..5)

    Ensure that the model variant and dataset path correspond to the previously trained model.
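    These top-k metrics can be sketched in a few lines. This is an illustrative implementation, not code from the replication package; the function name and toy scores are made up:

```python
import numpy as np

def precision_recall_at_k(scores, labels, k):
    """Precision@k and Recall@k for one multi-label example.

    scores: predicted score per candidate label (1-D array)
    labels: binary ground-truth vector (1-D array)
    """
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k highest-scoring labels
    hits = labels[top_k].sum()             # how many of those are true labels
    precision = hits / k
    recall = hits / max(labels.sum(), 1)   # guard against examples with no labels
    return precision, recall

# Toy example: 5 candidate labels, 2 of them correct.
scores = np.array([0.9, 0.1, 0.8, 0.3, 0.2])
labels = np.array([1,   0,   1,   0,   0])
p, r = precision_recall_at_k(scores, labels, k=2)
# top-2 predictions are labels 0 and 2, both correct -> precision 1.0, recall 1.0
```

    Micro/macro F1@k then aggregates such per-example counts across the dataset, either globally (micro) or per label (macro).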

    Recommended: Evaluation via Notebooks

    For an interactive and visual analysis of model performance, you can also use the provided Jupyter notebooks located in:

    DRAGON_replication/multilabel_class/notebooks/
    

    These notebooks reproduce the complete evaluation pipeline and generate additional visualizations and metrics discussed in the associated paper.

    Both command-line and notebook-based evaluations ensure reproducibility and offer complementary insights into model behavior.

    Instructions for Unzipping Files

    Several folders in this replication package have been compressed into .zip files to reduce package size. Before running any code, you must unzip all the provided .zip files in-place—that is, extract each archive into the same directory as the .zip file, using the same name as the zip file (without the .zip extension).

    For example:

    DRAGON_replication\data\02_processed_dataset\2024-05-22.zip
    

    should be extracted to:

    DRAGON_replication\data\02_processed_dataset\2024-05-22\
    

    List of .zip files to extract

    • DRAGON_replication\data\02_processed_dataset\2024-05-22.zip
    • DRAGON_replication\data\03_huggingaceV_datasets\2024-05-22.zip
    • DRAGON_replication\data\03_huggingaceV_datasets\LEGION.zip
    • DRAGON_replication\dataset_creation\data.zip
    • DRAGON_replication\multilabel_class\model_output\2024-05-22.zip
    • DRAGON_replication\multilabel_class\model_output\LEGION.zip

    Make sure that after extraction, each corresponding folder exists and contains the expected files. Do not change the folder names or directory structure after unzipping.
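    The in-place extraction described above can be scripted. The helper below is an illustrative sketch, assuming each archive's contents belong directly inside a folder named after the zip file; if an archive already contains its own top-level folder, extract into the parent directory instead:

```python
import zipfile
from pathlib import Path

def unzip_in_place(archive: str) -> Path:
    """Extract an archive next to itself, into a folder named after the
    zip file without the .zip extension (illustrative helper)."""
    src = Path(archive)
    target = src.with_suffix("")          # e.g. .../2024-05-22.zip -> .../2024-05-22
    target.mkdir(exist_ok=True)
    with zipfile.ZipFile(src) as zf:
        zf.extractall(target)
    return target
```

    After running it over the listed archives, verify each folder exists and contains the expected files before changing anything else.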

    This README provides an overview of the essential steps for repository mining, dataset preparation, processing, model training, and evaluation. For further customization, refer to the configuration files and experiment with different preprocessing settings.

  8. code_x_glue_cc_code_completion_token

    • huggingface.co
    Updated Aug 20, 2021
    Cite
    Google (2021). code_x_glue_cc_code_completion_token [Dataset]. https://huggingface.co/datasets/google/code_x_glue_cc_code_completion_token
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 20, 2021
    Dataset authored and provided by
    Google (http://google.com/)
    License

    https://choosealicense.com/licenses/c-uda/

    Description

    Dataset Card for "code_x_glue_cc_code_completion_token"

      Dataset Summary
    

    CodeXGLUE CodeCompletion-token dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/CodeCompletion-token. The task is to predict the next code token given the context of previous tokens; models are evaluated by token-level accuracy. Code completion is one of the most widely used features in software development through IDEs. An effective code completion tool could improve software… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_cc_code_completion_token.
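    Token-level accuracy, the evaluation metric named above, is simply the fraction of positions where the predicted next token matches the reference token. A minimal illustration (the function name and tokens are made up, not from CodeXGLUE):

```python
def token_level_accuracy(predicted, reference):
    # Fraction of positions where the predicted token equals the reference token.
    assert len(predicted) == len(reference)
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

pred = ["def", "foo", "(", ")", ":"]
ref  = ["def", "bar", "(", ")", ":"]
acc = token_level_accuracy(pred, ref)  # 4 of 5 tokens match -> 0.8
```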

  9. code-readability-merged

    • huggingface.co
    Updated Mar 26, 2025
    Cite
    Chair of Software Engineering II, Uni Passau (2025). code-readability-merged [Dataset]. https://huggingface.co/datasets/se2p/code-readability-merged
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 26, 2025
    Dataset authored and provided by
    Chair of Software Engineering II, Uni Passau
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Java Code Readability Merged Dataset

    This dataset contains 421 Java code snippets along with a readability score, aggregated from several scientific papers [1, 2, 3]. You can download the dataset using Hugging Face:

    from datasets import load_dataset
    ds = load_dataset("se2p/code-readability-merged")

    The snippets are not split into train, test, and validation sets; the whole dataset is in the train split:

    ds = ds['train']
    ds_as_list = ds.to_list()  # Convert the dataset to… See the full description on the dataset page: https://huggingface.co/datasets/se2p/code-readability-merged.

  10. Medicaid Claims by Recipient Zip Code

    • hub.mph.in.gov
    Updated Sep 14, 2017
    Cite
    (2017). Medicaid Claims by Recipient Zip Code [Dataset]. https://hub.mph.in.gov/dataset/medicaid-claims-by-recipient-zip-code
    Explore at:
    Dataset updated
    Sep 14, 2017
    Description

    Archived as of 6/26/2025: The datasets will no longer receive updates, but the historical data will continue to be available for download. This dataset provides information on services provided to recipients enrolled in Medicaid: the total number of recipients, total number of claims, and total dollar amount, by recipient zip code. Restricted to claims with service dates between 01/2012 and 12/2017, and to patients with a Medicaid claim during this period. This data is for research purposes and is not intended to be used for reporting. Due to differences in geographic aggregation, time period considerations, and units of analysis, these numbers may differ from those reported by FSSA.

  11. 3.01 Property Code Enforcement (dashboard)

    • catalog.data.gov
    Updated Mar 18, 2023
    Cite
    City of Tempe (2023). 3.01 Property Code Enforcement (dashboard) [Dataset]. https://catalog.data.gov/dataset/3-01-property-code-enforcement-dashboard-0ee61
    Explore at:
    Dataset updated
    Mar 18, 2023
    Dataset provided by
    City of Tempe
    Description

    This operations dashboard shows historic and current data related to this performance measure. The performance measure dashboard is available at 3.01 Property Code Enforcement. Data Dictionary

  12. code-search-net-go

    • huggingface.co
    Updated May 18, 2023
    Cite
    Fernando Tarin Morales (2023). code-search-net-go [Dataset]. https://huggingface.co/datasets/Nan-Do/code-search-net-go
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 18, 2023
    Authors
    Fernando Tarin Morales
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for "code-search-net-go"

      Dataset Summary
    

    This dataset is the Go portion of CodeSearchNet, annotated with a summary column. The code-search-net dataset includes open-source functions with comments found on GitHub. The summary is a short description of what the function does.

      Languages
    

    The dataset's comments are in English and the functions are written in Go.

      Data Splits
    

    Train, test, validation labels are included in the dataset as… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/code-search-net-go.

  13. Data from: Housing Code Enforcement

    • catalog.data.gov
    • data.wu.ac.at
    Updated Aug 26, 2023
    Cite
    data.montgomerycountymd.gov (2023). Housing Code Enforcement [Dataset]. https://catalog.data.gov/dataset/housing-code-enforcement-181fe
    Explore at:
    Dataset updated
    Aug 26, 2023
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    Housing code enforcement activities, including inspections and violations.

  14. Curated Email-Based Code Reviews Datasets

    • figshare.com
    bin
    Updated Feb 7, 2024
    Cite
    Mingzhao Liang; Ping Charoenwet; Patanamon Thongtanunam (2024). Curated Email-Based Code Reviews Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.24679656.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mingzhao Liang; Ping Charoenwet; Patanamon Thongtanunam
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research focused on tool-based code reviews (e.g. a Gerrit code review tool, GitHub), many traditional open-source software (OSS) projects still conduct code reviews through emails. However, due to the nature of unstructured email-based data, it can be challenging to mine email-based code reviews, hindering researchers from delving into the code review practice of such long-standing OSS projects. Therefore, this paper presents large-scale datasets of email-based code reviews of 167 projects across three OSS communities (i.e. Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated the data by grouping a submitted patch and its revised versions and grouping email aliases. Our datasets include a total of 4.2M patches with 2.1M patch groups and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly delve into an email-based code review practice of large OSS projects without additional effort in data collection and curation.

  15. Data and code files for co-occurrence modeling project

    • catalog.data.gov
    • datadiscoverystudio.org
    • +2 more
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Data and code files for co-occurrence modeling project [Dataset]. https://catalog.data.gov/dataset/data-and-code-files-for-co-occurrence-modeling-project
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    Files included are original data inputs on stream fishes (fish_data_OEPA_2012.csv), water chemistry (OEPA_WATER_2012.csv), geographic data (NHD_Plus_StreamCat); modeling files for generating predictions from the original data, including the R code (MVP_R_Final.txt) and Stan code (MV_Probit_Stan_Final.txt); and the model output file containing predictions for all NHDPlus catchments in the East Fork Little Miami River watershed (MVP_EFLMR_cooc_Final). This dataset is associated with the following publication: Martin, R., E. Waits, and C. Nietch. Empirically-based modeling and mapping to consider the co-occurrence of ecological receptors and stressors. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 613(614): 1228-1239, (2018).

  16. QR Codes Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 12, 2025
    + more versions
    Cite
    Data Insights Market (2025). QR Codes Market Report [Dataset]. https://www.datainsightsmarket.com/reports/qr-codes-market-20882
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    Mar 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The QR Code market is experiencing robust growth, projected to reach a market size of $10.5 billion in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 16.67% from 2025 to 2033. This expansion is driven by several key factors. The increasing adoption of smartphones and mobile payment systems globally fuels the demand for QR codes in diverse applications, from marketing campaigns and contactless payments to information sharing and customer engagement initiatives. The shift towards digitalization across various industries, coupled with the convenience and cost-effectiveness of QR codes, contributes significantly to market growth. The dynamic nature of QR codes, allowing for updates and tracking of performance, adds to their appeal over static alternatives. Furthermore, the diversification of QR code formats, catering to different use cases like website links, menus, file downloads, and social media integration, expands the market's reach across various sectors. The market segmentation reveals a diverse landscape. Dynamic QR codes, offering greater flexibility and analytics capabilities, are gaining traction over their static counterparts. Among end-user applications, marketing and advertising dominate, leveraging QR codes for campaigns and promotions. However, significant growth is expected in payments and transactions, driven by the rising popularity of mobile wallets and contactless payment methods. Geographically, North America and Europe are anticipated to hold substantial market shares, but Asia-Pacific is poised for rapid expansion due to its burgeoning digital economy and large smartphone user base. Competition among key players, including Uniqode Phygital Inc, QR TIGER PTE LTD, and Flowcode, is intense, fostering innovation and driving down costs, further boosting market accessibility. 
While challenges like security concerns and potential misuse exist, technological advancements and increased awareness about secure QR code implementation are mitigating these risks. The overall outlook for the QR code market remains highly positive, indicating a sustained period of growth and innovation driven by the evolving digital landscape. Recent developments include: July 2024: Bandhan Bank launched its latest payment solution through the Bharat QR Code for its Current account and Savings account customers. The bank claimed that the solution will simplify how these self-employed segment customers make payments at any merchant outlet. An instant notification will also be received on every payment through a small speaker. June 2024: Flowcode, a marketing technology platform, unveiled a reimagined product designed for marketing and analytics teams at F1000 companies focused on measuring and maximizing offline conversions. Flowcode integrates seamlessly with data feeds, such as product catalogs, MLS listings, and more, to automate the creation of personalized, QR-enabled user journeys. This empowers brands to deliver unique, tailored consumer experiences, significantly increasing conversion rates. Key drivers for this market are: Increased Smartphone Penetration, Growing Demand for Contactless Solutions; Increasing need for Security and Fraud Prevention. Potential restraints include: Increased Smartphone Penetration, Growing Demand for Contactless Solutions; Increasing need for Security and Fraud Prevention. Notable trends are: The Payments and Transactions Segment is Anticipated to Witness a Significant Growth.

  17. The Stack Dataset

    • paperswithcode.com
    Updated Oct 28, 2022
    Cite
    Denis Kocetkov; Raymond Li; Loubna Ben allal; Jia Li; Chenghao Mou; Carlos Muñoz Ferrandis; Yacine Jernite; Margaret Mitchell; Sean Hughes; Thomas Wolf; Dzmitry Bahdanau; Leandro von Werra; Harm de Vries (2022). The Stack Dataset [Dataset]. https://paperswithcode.com/dataset/the-stack
    Explore at:
    Dataset updated
    Oct 28, 2022
    Authors
    Denis Kocetkov; Raymond Li; Loubna Ben allal; Jia Li; Chenghao Mou; Carlos Muñoz Ferrandis; Yacine Jernite; Margaret Mitchell; Sean Hughes; Thomas Wolf; Dzmitry Bahdanau; Leandro von Werra; Harm de Vries
    Description

    The Stack contains over 3TB of permissively-licensed source code files covering 30 programming languages crawled from GitHub. The dataset was created as part of the BigCode Project, an open scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs).

  18. mala-code-reasoning-v2

    • huggingface.co
    Updated Jun 9, 2025
    Cite
    mala-code-reasoning-v2 [Dataset]. https://huggingface.co/datasets/MaLA-LM/mala-code-reasoning-v2
    Explore at:
    Dataset updated
    Jun 9, 2025
    Dataset authored and provided by
    MaLA-LM
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    MaLA Corpus: Massive Language Adaptation Corpus

    This MaLA code and reasoning dataset (V2) is used for training EMMA-500 Llama 3(.1) Mono/Bi model series.

    🤗MaLA-LM/emma-500-llama3-8b-mono: CPT model trained on monolingual data mix in 500+ languages
    🤗MaLA-LM/emma-500-llama3-8b-bi: CPT model trained on monolingual data mix in 500+ languages + bilingual translation data in 2,500+ language pairs
    🤗MaLA-LM/emma-500-llama3.1-8b-mono: CPT model trained on monolingual data mix in… See the full description on the dataset page: https://huggingface.co/datasets/MaLA-LM/mala-code-reasoning-v2.

  19. data and code

    • figshare.com
    txt
    Updated Mar 23, 2017
    Cite
    Maria Voukelatou (2017). data and code [Dataset]. http://doi.org/10.6084/m9.figshare.4780312.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Mar 23, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maria Voukelatou
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    (raw data for use with the accompanying R script)

  20. DHCS County Code Reference Table

    • healthdata.gov
    • data.chhs.ca.gov
    • +4 more
    application/rdfxml +5
    Updated May 13, 2025
    + more versions
    Cite
    chhs.data.ca.gov (2025). DHCS County Code Reference Table [Dataset]. https://healthdata.gov/State/DHCS-County-Code-Reference-Table/xzk9-w5kz/data
    Explore at:
    Available download formats: csv, tsv, application/rdfxml, xml, application/rssxml, json
    Dataset updated
    May 13, 2025
    Dataset provided by
    chhs.data.ca.gov
    Description

    This reference table contains data elements for the 58 counties in California that can be used to join to other datasets. It includes the following fields:


    DHCS County Code
    County Name
    County Region Code
    County Region Description
    FIPS County Code (xxx)
    FIPS State Code + FIPS County Code (06xxx)
    North/South Indicator
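The join described above can be sketched with pandas. The rows and column names below are illustrative stand-ins, not the table's exact field headers:

```python
import pandas as pd

# Illustrative subset of the county reference table.
counties = pd.DataFrame({
    "dhcs_county_code": ["01", "19"],
    "county_name": ["Alameda", "Los Angeles"],
    "north_south": ["N", "S"],
})

# Some other dataset keyed by the same county code.
claims = pd.DataFrame({
    "dhcs_county_code": ["19", "19", "01"],
    "claim_count": [120, 80, 45],
})

# Left-join on the shared code to attach county names and regions.
merged = claims.merge(counties, on="dhcs_county_code", how="left")
```

A left join keeps every row of the fact table and fills in the reference columns wherever the county code matches.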
