94 datasets found
  1. javascript-dataset

    • huggingface.co
    Updated Sep 3, 2024
    Akshay Nambiar (2024). javascript-dataset [Dataset]. https://huggingface.co/datasets/axay/javascript-dataset
    Authors
    Akshay Nambiar
    Description

    axay/javascript-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Developer Expertise Dataset on JavaScript Libraries

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Montandon, João Eduardo (2020). Developer Expertise Dataset on JavaScript Libraries [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1484497
    Dataset provided by
    Valente, Marco Tulio
    Montandon, João Eduardo
    Silva, Luciana Lourdes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains an anonymized list of surveyed developers who provided their expertise level on three popular JavaScript libraries:

    ReactJS, a library for building enriched web interfaces

    MongoDB, a driver for accessing MongoDB databases

    Socket.IO, a library for realtime communication

  3. axay-javascript-dataset-pn

    • huggingface.co
    Updated Oct 4, 2024
    Israel Antonio Rosales Laguan (2024). axay-javascript-dataset-pn [Dataset]. https://huggingface.co/datasets/israellaguan/axay-javascript-dataset-pn
    Authors
    Israel Antonio Rosales Laguan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    DPO JavaScript Dataset

    This repository contains a modified version of the JavaScript dataset originally sourced from axay/javascript-dataset-pn. The dataset has been adapted to the DPO (Direct Preference Optimization) format, making it compatible with the LLaMA-Factory project.

      License
    

    This dataset is licensed under the Apache 2.0 License.

      Dataset Overview
    

    The dataset consists of JavaScript code snippets that have been restructured and enhanced for use in… See the full description on the dataset page: https://huggingface.co/datasets/israellaguan/axay-javascript-dataset-pn.

  4. Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 21, 2020
    Hegedűs, Péter (2020). Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics (Training Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4281475
    Dataset provided by
    Antal, Gábor
    Hegedűs, Péter
    Ferenc, Rudolf
    Tóth, Zoltán Gábor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of multiple files which contain bug prediction training data.

    The entries in the dataset are JavaScript functions, each labeled as buggy or non-buggy. Bug-related information was obtained from the ESLint project contained in BugsJS (https://github.com/BugsJS/eslint). The buggy instances were collected throughout the lifetime of the project, while the non-buggy entries were added from the latest version, which is tagged as fix (entries that had previously been included as buggy were not included again as non-buggy).

    The dataset is based on hybrid call graphs constructed by https://github.com/sed-szeged/hcg-js-framework. This tool produces a call graph in which each edge is associated with a confidence level indicating how likely it is that the edge is a valid call edge.

    We used several threshold values, at or above which edges were considered valid. The following threshold values were used:

    0.00

    0.05

    0.20

    0.30

    The prefix of each dataset file name indicates the threshold used. The datasets include the coupling metrics NII (Number of Incoming Invocations) and NOI (Number of Outgoing Invocations), calculated by the static source code analyzer SourceMeter. Their hybrid counterparts (HNII and HNOI) are based on the given threshold values.
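The threshold filtering described above can be sketched in Python as follows. This is a minimal illustration, not the hcg-js-framework's implementation: it assumes each edge is a hypothetical `(caller, callee, confidence)` tuple, that an edge is valid when its confidence is at or above the threshold, and that HNII/HNOI simply count the valid incoming and outgoing edges of each function.

```python
from collections import Counter
from typing import Iterable

# Hypothetical edge representation: (caller, callee, confidence in [0, 1]).
Edge = tuple[str, str, float]


def hybrid_invocation_counts(
    edges: Iterable[Edge], threshold: float
) -> tuple[dict[str, int], dict[str, int]]:
    """Return (HNII, HNOI) per function for one threshold.

    Assumption: an edge is kept when confidence >= threshold.
    """
    incoming: Counter = Counter()
    outgoing: Counter = Counter()
    for caller, callee, confidence in edges:
        if confidence >= threshold:
            outgoing[caller] += 1  # one more outgoing invocation for the caller
            incoming[callee] += 1  # one more incoming invocation for the callee
    return dict(incoming), dict(outgoing)


# Toy call graph with made-up confidence values.
edges = [
    ("main", "parse", 1.0),
    ("main", "lint", 0.25),
    ("parse", "lint", 0.04),
]
```

In this toy graph, raising the threshold from 0.00 to 0.05 drops the low-confidence `parse -> lint` edge, and at 0.30 `lint` has no incoming invocations left.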

    Each of these datasets comes in four variants:

    Both static (NII, NOI) and hybrid (HNII, HNOI) coupling metrics are included, together with additional static source code metrics and information about the entries (file without any postfix). Columns contained only in this variant are:

    ID

    Name

    Longname

    Parent ID

    Component ID

    Path

    Line

    Column

    EndLine

    EndColumn

    Both static (NII, NOI) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h+s' postfix)

    Only static (NII, NOI) coupling metrics are included with additional static source code metrics (file with '_s' postfix)

    Only hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h' postfix)
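The four variants can be summarized as a lookup from file-name postfix to the coupling and entry columns each file should contain. The sketch below assumes the postfix is the last part of the file stem and ignores the threshold prefix, whose exact format is not specified; `coupling_columns` and the example stems are hypothetical.

```python
# Coupling metrics as named in the dataset description.
STATIC = ["NII", "NOI"]
HYBRID = ["HNII", "HNOI"]
# Entry-information columns listed only for the no-postfix variant.
ENTRY_INFO = ["ID", "Name", "Longname", "Parent ID", "Component ID",
              "Path", "Line", "Column", "EndLine", "EndColumn"]

# Checked in insertion order; "_h+s" comes first so it is matched as a whole.
VARIANTS = {
    "_h+s": STATIC + HYBRID,
    "_s": STATIC,
    "_h": HYBRID,
}


def coupling_columns(stem: str) -> list[str]:
    """Return the coupling (and entry) columns expected for a file stem.

    `stem` is a file name without extension. The threshold prefix is
    ignored because its exact format is an assumption here.
    """
    for postfix, columns in VARIANTS.items():
        if stem.endswith(postfix):
            return list(columns)
    # No postfix: static + hybrid metrics plus entry information.
    return STATIC + HYBRID + ENTRY_INFO
```

The static source code metrics listed next are common to all four variants, so they are left out of this lookup.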

    The static source code metrics contained in all datasets are the following:

    McCC - McCabe Cyclomatic Complexity

    NL - Nesting Level

    NLE - Nesting Level Else If

    CD - Comment Density

    CLOC - Comment Lines of Code

    DLOC - Documentation Lines of Code

    TCD - Total Comment Density (Comment Lines in an embedded function will be also considered)

    TCLOC - Total Comment Lines of Code (Comment Lines in an embedded function will be also considered)

    LLOC - Logical Lines of Code (Comment and empty lines not counted)

    LOC - Lines of Code (Comment and empty lines are counted)

    NOS - Number of Statements

    NUMPAR - Number of Parameters

    TLLOC - Total Logical Lines of Code (Lines in embedded functions are also counted)

    TLOC - Lines of Code (Lines in embedded functions are also counted)

    TNOS - Total Number of Statements (Statements in embedded functions are also counted)

  5. CodeSearchNet Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Dec 30, 2024
    Hamel Husain; Ho-Hsiang Wu; Tiferet Gazit; Miltiadis Allamanis; Marc Brockschmidt (2024). CodeSearchNet Dataset [Dataset]. https://paperswithcode.com/dataset/codesearchnet
    Authors
    Hamel Husain; Ho-Hsiang Wu; Tiferet Gazit; Miltiadis Allamanis; Marc Brockschmidt
    Description

    The CodeSearchNet Corpus is a large dataset of functions with associated documentation written in Go, Java, JavaScript, PHP, Python, and Ruby from open source projects on GitHub. The CodeSearchNet Corpus includes:

    * Six million methods overall
    * Two million of which have associated documentation (docstrings, JavaDoc, and more)
    * Metadata that indicates the original location (repository or line number, for example) where the data was found

  6. Dataset Collected by JSObserver

    • zenodo.org
    zip
    Updated Jun 4, 2020
    Mingxue Zhang; Wei Meng (2020). Dataset Collected by JSObserver [Dataset]. http://doi.org/10.5281/zenodo.3874944
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mingxue Zhang; Wei Meng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a sampled dataset collected by JSObserver on the Alexa top 100K websites. We analyze the log files to identify JavaScript global identifier conflicts, i.e., variable value conflicts, variable type conflicts, and function definition conflicts.

    We release the log files for websites on which we detect the above conflicts, and split the whole dataset into 10 subsets, i.e., 1-50K-0.zip ~ 50K-100K-4.zip.

    The writes to a memory location in JavaScript are saved in [rank].[main/sub].[frame_cnt].asg (e.g., 1.main.0.asg) files.

    JavaScript global function definitions are saved in [rank].[main/sub].[frame_cnt].func (e.g., 1.main.0.func) files.

    The maps from script IDs to script URLs are saved in [rank].[main/sub].[frame_cnt].id2url (e.g., 1.main.0.id2url) files.

    The source code of scripts is saved in [rank].[main/sub].[frame_cnt].[script_ID].script (e.g., 1.main.0.17.script) files.

    We also sampled 100 websites on which we did not detect any conflicts. The log files collected on those websites are available in sampled_no_conflict.zip.
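The JSObserver naming scheme above is regular enough to parse with one regular expression. A minimal sketch follows; `parse_log_name` and `LogFile` are hypothetical helpers, and it assumes frame counts and script IDs are plain decimal integers, as in the examples.

```python
import re
from typing import NamedTuple, Optional


class LogFile(NamedTuple):
    rank: int                 # site rank from the file name
    frame_kind: str           # "main" or "sub"
    frame_cnt: int
    script_id: Optional[int]  # only set for *.script files
    kind: str                 # "asg", "func", "id2url", or "script"


# [rank].[main/sub].[frame_cnt].(asg|func|id2url)
# [rank].[main/sub].[frame_cnt].[script_ID].script
_PATTERN = re.compile(
    r"^(?P<rank>\d+)\.(?P<frame>main|sub)\.(?P<cnt>\d+)\."
    r"(?:(?P<sid>\d+)\.(?P<script>script)|(?P<kind>asg|func|id2url))$"
)


def parse_log_name(name: str) -> Optional[LogFile]:
    """Parse a JSObserver log file name; return None if it does not match."""
    m = _PATTERN.match(name)
    if m is None:
        return None
    kind = m["script"] or m["kind"]
    sid = int(m["sid"]) if m["sid"] is not None else None
    return LogFile(int(m["rank"]), m["frame"], int(m["cnt"]), sid, kind)
```

Grouping the parsed tuples by `(rank, frame_kind, frame_cnt)` brings together the writes, function definitions, ID-to-URL map, and script sources of one frame.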

  7. Dataset of books called Reliable JavaScript

    • workwithdata.com
    Updated Apr 17, 2025
    Work With Data (2025). Dataset of books called Reliable JavaScript [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Reliable+JavaScript
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Reliable JavaScript. It features 7 columns including author, publication date, language, and book publisher.

  8. Data from: Towards a Prototype Based Explainable JavaScript Vulnerability...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated May 7, 2021
    Balázs Mosolygó; Norbert Vándor; Gábor Antal; Péter Hegedűs; Rudolf Ferenc (2021). Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model [Dataset]. http://doi.org/10.5281/zenodo.4742161
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Balázs Mosolygó; Norbert Vándor; Gábor Antal; Péter Hegedűs; Rudolf Ferenc
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset we used in our paper entitled "Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model". The manually validated dataset contains several static source code metrics along with the hashes of the vulnerability-fixing commits for numerous vulnerabilities. For more details, see the paper.

    Security has become a central and unavoidable aspect of today’s software development. Practitioners and researchers have proposed many code analysis tools and techniques to mitigate security risks. These tools apply static and dynamic analysis or, more recently, machine learning. Machine learning models can achieve impressive results in finding and forecasting possible security issues in programs. However, there are at least two areas where most current approaches fall short of developer demands: explainability and granularity of predictions. In this paper, we propose a novel, simple, yet promising approach to identify potentially vulnerable source code in JavaScript programs. The model improves the state of the art in terms of explainability and prediction granularity, as it gives results at the level of individual source code lines, which is fine-grained enough for developers to take immediate action. Additionally, the model explains each predicted line (i.e., provides the most similar vulnerable line from the training set) using a prototype-based approach. In a study of 186 real-world, confirmed JavaScript vulnerability fixes from 91 projects, the approach could flag 60% of the known vulnerable lines on average by marking only 10% of the code base, and in certain cases the model identified 100% of the vulnerable code lines while flagging only 8.72% of the code base.
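The prototype-based explanation (returning the most similar vulnerable line from the training set) can be illustrated as a nearest-neighbour lookup. This is only a sketch under stated assumptions: each line is assumed to be encoded as a fixed-length numeric vector, and Euclidean distance stands in for the similarity measure. The paper's actual features and distance may differ, and `nearest_prototype` is a hypothetical name.

```python
import math
from typing import Sequence


def nearest_prototype(
    line_features: Sequence[float],
    prototypes: Sequence[tuple[str, Sequence[float]]],
) -> tuple[str, float]:
    """Return (prototype_id, distance) of the closest training line.

    Assumes every line is already a fixed-length numeric vector and
    uses Euclidean distance (math.dist) as the similarity measure.
    """
    if not prototypes:
        raise ValueError("need at least one prototype")
    # Pick the training line whose vector is closest to the query line.
    best_id, best_vec = min(
        prototypes, key=lambda p: math.dist(line_features, p[1])
    )
    return best_id, math.dist(line_features, best_vec)
```

The returned identifier points a developer at a concrete known-vulnerable line, which is what makes the prediction explainable in this style of model.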

    If you wish to use our dataset, please cite this dataset, or the corresponding paper:

    @inproceedings{mosolygo2021towards,
     title={Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model},
     author={Mosolyg{\'o}, Bal{\'a}zs and V{\'a}ndor, Norbert and Antal, G{\'a}bor and Heged{\H{u}}s, P{\'e}ter and Ferenc, Rudolf},
     booktitle={2021 International Conference on Code Quality (ICCQ)},
     pages={15--25},
     year={2021},
     organization={IEEE}
    }

  9. code-text-javascript

    • huggingface.co
    Updated Jul 18, 2023
    Semeru Lab (2023). code-text-javascript [Dataset]. https://huggingface.co/datasets/semeru/code-text-javascript
    Dataset authored and provided by
    Semeru Lab
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset is imported from CodeXGLUE and pre-processed using CodeXGLUE's script.

      Where to find in Semeru:

    The dataset can be found at /nfs/semeru/semeru_datasets/code_xglue/code-to-text/javascript in Semeru.

      CodeXGLUE -- Code-To-Text

      Task Definition

    The task is to generate natural-language comments for code, evaluated by the smoothed BLEU-4 score.

      Dataset

    The dataset we use comes from CodeSearchNet and we filter the dataset as the following:… See the full description on the dataset page: https://huggingface.co/datasets/semeru/code-text-javascript.
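For reference, a sentence-level smoothed BLEU-4 can be sketched as below. It applies add-one smoothing to the 2- to 4-gram precisions in the style of Lin and Och (2004) and assumes pre-tokenized input. CodeXGLUE's own evaluation script may differ in tokenization and other details, so it remains the source for official numbers.

```python
import math
from collections import Counter


def _ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu4(reference: list[str], hypothesis: list[str]) -> float:
    """Sentence-level BLEU-4 with add-one smoothing for n = 2..4."""
    if not hypothesis:
        return 0.0
    log_precision = 0.0
    for n in range(1, 5):
        hyp = _ngrams(hypothesis, n)
        ref = _ngrams(reference, n)
        # Clipped matches: a hypothesis n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref[g]) for g, c in hyp.items())
        total = max(len(hypothesis) - n + 1, 0)
        smooth = 1 if n > 1 else 0
        if matches + smooth == 0:
            return 0.0  # no unigram overlap at all
        log_precision += math.log((matches + smooth) / (total + smooth))
    # Brevity penalty (in log space) for hypotheses shorter than the reference.
    bp = min(0.0, 1 - len(reference) / len(hypothesis))
    return math.exp(log_precision / 4 + bp)
```

The smoothing keeps short generated comments, which often have no 3- or 4-gram matches, from receiving a score of exactly zero.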

  10. Data from: Mining Rule Violations in JavaScript Code Snippets

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Jan 24, 2020
    Smethurst, Guilherme (2020). Mining Rule Violations in JavaScript Code Snippets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2593817
    Dataset provided by
    Moraes, João Pedro
    Bonifácio, Rodrigo
    Smethurst, Guilherme
    Ferreira Campos, Uriel
    Pinto, Gustavo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Content of this repository

    This is the repository that contains the scripts and dataset for the MSR 2019 mining challenge.

    GitHub repository with the software used: here.

    DATASET

    The dataset was retrieved using Google BigQuery and dumped to a CSV file for further processing. This original, untreated file is called jsanswers.csv and contains the following information:
    1. The Id of the question (PostId)
    2. The content (in this case, the code block)
    3. The length of the code block
    4. The line count of the code block
    5. The score of the post
    6. The title

    A quick look at this file shows that a PostId can have multiple rows related to it; that is how multiple code blocks are saved in the database.

    Filtered Dataset:

    Extracting code from CSV

    We used a Python script called "ExtractCodeFromCSV.py" to extract the code from the original CSV and merge all the code blocks of each post into one JavaScript file named after its PostId, which resulted in 336 thousand files.
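The grouping step can be sketched with the standard library. This assumes the rows come from `csv.DictReader` with columns named `PostId` and `Content` (the actual jsanswers.csv header may differ) and that the code blocks of one post are joined with a blank line; `group_code_blocks` and `write_js_files` are hypothetical names, not the contents of ExtractCodeFromCSV.py.

```python
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Mapping


def group_code_blocks(rows: Iterable[Mapping[str, str]]) -> dict[str, str]:
    """Merge all code blocks sharing a PostId, keeping their file order.

    Assumes column names "PostId" and "Content".
    """
    blocks: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        blocks[row["PostId"]].append(row["Content"])
    # Separate blocks with a blank line so statements do not run together.
    return {post_id: "\n\n".join(parts) for post_id, parts in blocks.items()}


def write_js_files(groups: dict[str, str], out_dir: Path) -> int:
    """Write one <PostId>.js file per post and return the number written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for post_id, code in groups.items():
        (out_dir / f"{post_id}.js").write_text(code, encoding="utf-8")
    return len(groups)
```

Typical use would pass `csv.DictReader(f)` over the opened CSV file to `group_code_blocks`, then hand the result to `write_js_files` with an output directory.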

    Running ESLint

    Because ESLint is single-threaded and running it on 336 thousand files took a huge toll on the machine, we created a script named "ESlintRunnerScript.py" to run it. The script splits the files into 20 evenly distributed parts and runs 20 ESLint processes to generate the reports, producing 20 JSON files.
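A minimal sketch of the splitting and parallel-run idea, assuming ESLint is invoked through `npx` with its `--format json` and `--output-file` options. `split_evenly`, `eslint_command`, and `run_parallel` are hypothetical helpers, not the contents of ESlintRunnerScript.py.

```python
import subprocess
from typing import Sequence


def split_evenly(items: Sequence[str], parts: int) -> list[list[str]]:
    """Split items into `parts` chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def eslint_command(files: Sequence[str], report: str) -> list[str]:
    """Build an argv that writes one JSON report for the given files."""
    return ["npx", "eslint", "--format", "json", "--output-file", report, *files]


def run_parallel(files: Sequence[str], parts: int = 20) -> list[int]:
    """Start one ESLint process per non-empty chunk and wait for all."""
    procs = [
        subprocess.Popen(eslint_command(chunk, f"report-{i}.json"))
        for i, chunk in enumerate(split_evenly(files, parts))
        if chunk
    ]
    return [p.wait() for p in procs]  # ESLint exits non-zero when it reports errors
```

Passing roughly 16,800 paths per process (336,000 / 20) on one command line may exceed argument-length limits on some systems; giving each process a directory instead is one way around that.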

    Number of Violations per Rule

    This information was extracted using the script named "parser.py", which generated the file "NumberofViolationsPerRule.csv" containing the number of violations per rule used in the linter configuration.

    Number of violations per Category

    To compute relevant statistics for the dataset, we counted the number of violations per rule category as defined on the ESLint website; this information was extracted using the same "parser.py" script.

    Individual Reports

    This information was extracted from the JSON reports: a CSV file with the PostID and the violations per rule.

    Rules

    The file Rules with categories contains all the rules used and their categories.
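Counting violations per rule and per category from ESLint's JSON output can be sketched as follows. The JSON formatter emits a list of per-file results, each with a `messages` array whose entries carry a `ruleId`. The rule-to-category mapping is assumed to come from the "Rules with categories" file, and the helper names are hypothetical rather than taken from parser.py.

```python
import json
from collections import Counter
from typing import Iterable, Mapping


def violations_per_rule(reports: Iterable[str]) -> Counter:
    """Count messages per ruleId across `eslint --format json` outputs.

    Messages with a null ruleId (for example, parsing errors) are
    counted under "(no rule)".
    """
    counts: Counter = Counter()
    for text in reports:
        for file_result in json.loads(text):
            for msg in file_result.get("messages", []):
                counts[msg.get("ruleId") or "(no rule)"] += 1
    return counts


def violations_per_category(per_rule: Counter, categories: Mapping[str, str]) -> Counter:
    """Aggregate per-rule counts with a rule -> category mapping."""
    totals: Counter = Counter()
    for rule, n in per_rule.items():
        totals[categories.get(rule, "Uncategorized")] += n
    return totals
```

Feeding the 20 report files' contents to `violations_per_rule` yields the per-rule totals; the category totals follow from the same counts without re-reading the reports.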

  11. javascript-github-code

    • huggingface.co
    Updated Dec 13, 2022
    Angelica Chen (2022). javascript-github-code [Dataset]. https://huggingface.co/datasets/angie-chen55/javascript-github-code
    Authors
    Angelica Chen
    Description

    angie-chen55/javascript-github-code dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. TFix's Code Patches Data Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jul 17, 2021
    Berkay Berabi; Jingxuan He; Veselin Raychev; Martin Vechev (2021). TFix's Code Patches Data Dataset [Dataset]. https://paperswithcode.com/dataset/tfix-s-code-patch-data
    Authors
    Berkay Berabi; Jingxuan He; Veselin Raychev; Martin Vechev
    Description

    The dataset contains more than 100k code patch pairs extracted from open source projects on GitHub. Each pair comes with the erroneous and the fixed version of the corresponding code snippet. Instead of the whole file, the code snippets are extracted to focus on the problematic region (error line + other lines around it). For each sample, the repository name, the commit id, and the file names are provided so that one can access the complete files in case of interest.

    The dataset contains only JavaScript programs, and the errors are detected by the popular static code analyzer ESLint. The dataset can be used in program repair, code generation, bug finding, transfer learning, and many other areas of machine learning for code.
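The snippet extraction (error line plus surrounding lines) can be sketched as a window clipped at the file boundaries. The number of context lines TFix actually uses is not given here, so `context` is a parameter with an arbitrary default, and `extract_snippet` is a hypothetical name.

```python
from typing import Sequence


def extract_snippet(
    lines: Sequence[str], error_line: int, context: int = 2
) -> tuple[int, list[str]]:
    """Return (first_line_number, snippet) around a 1-based error line.

    Keeps `context` lines on each side, clipped at file boundaries.
    The default of 2 is an arbitrary example value.
    """
    if not 1 <= error_line <= len(lines):
        raise ValueError("error_line out of range")
    start = max(1, error_line - context)
    end = min(len(lines), error_line + context)
    return start, list(lines[start - 1:end])
```

Returning the first line number alongside the snippet makes it possible to map a fixed snippet back into the complete file, which the repository name, commit id, and file names allow one to retrieve.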

  13. Data from: Dynamic Security Analysis of JavaScript: Are We There Yet?:...

    • zenodo.org
    bin
    Updated Feb 3, 2025
    Stefano Calzavara; Samuele Casarin; Riccardo Focardi (2025). Dynamic Security Analysis of JavaScript: Are We There Yet?: Dataset [Dataset]. http://doi.org/10.5281/zenodo.14774184
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stefano Calzavara; Samuele Casarin; Riccardo Focardi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    This dataset was employed to systematically evaluate dynamic security analysis tools for JavaScript. It includes compatibility data for deployed scripts, as well as various details about their execution on both the analysis tools and regular browsers. Data collection was conducted on the top 10k domains from the Tranco ranking generated on September 27, 2024.

  14. Dataset of books called The joy of JavaScript

    • workwithdata.com
    Updated Apr 17, 2025
    Work With Data (2025). Dataset of books called The joy of JavaScript [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+joy+of+JavaScript
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is The joy of JavaScript. It features 7 columns including author, publication date, language, and book publisher.

  15. Dataset collected by JSIsolate

    • zenodo.org
    zip
    Updated Aug 26, 2021
    Mingxue Zhang; Wei Meng (2021). Dataset collected by JSIsolate [Dataset]. http://doi.org/10.5281/zenodo.5242976
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mingxue Zhang; Wei Meng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 1) the object access logs, 2) the script isolation policies, and 3) the script write conflicts collected by JSIsolate on the Alexa top 1K websites. We analyze the access logs to generate the conflict summary files and the script isolation policies that assign static scripts to an execution context.

    We split the whole dataset of object access logs into 10 subsets, i.e., access-0.zip ~ access-9.zip.

    The isolation policies are released in url-level-policies.zip and domain-level-policies.zip.

    The object accesses (i.e., reads and writes) are saved in [rank].[main/sub].[frame_cnt].access (e.g., 1.main.0.access) files.

    The URLs of frames (i.e., main frames and iframes) are saved in [rank].[main/sub].[frame_cnt].frame (e.g., 1.main.0.frame) files.

    The maps from script IDs to script URLs are saved in [rank].[main/sub].[frame_cnt].id2url (e.g., 1.main.0.id2url) files.

    The maps from script IDs to their parent script (script that includes it,

    The source code of scripts is saved in [rank].[main/sub].[frame_cnt].[script_ID].script (e.g., 1.main.0.17.script) files.

    Note that we perform monkey testing during the data collection, which may cause the page to navigate to a different URL. Therefore, there could be multiple main frame files.

    The conflicts are dumped to [rank].conflicts (e.g., 1.conflicts) files.

    The isolation policies are dumped to [rank].configs (e.g., 1.configs) and [rank].configs-simple (e.g., 1.configs-simple) files.

    Note that the *.configs files also include the read/write operations that cause JSIsolate to assign a script from a third-party domain to the first-party context.
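Because the per-frame and per-site file names above all start with the site's rank, the files for one site can be bucketed together. The sketch below assumes the naming patterns listed above; `group_by_site` and `main_frames` are hypothetical helpers, and the latter reflects the note that monkey testing can leave several main-frame files per site.

```python
import re
from collections import defaultdict
from typing import Iterable

# Per-frame logs: [rank].[main/sub].[frame_cnt].<ext> (plus a script ID for *.script)
_FRAME = re.compile(r"^(\d+)\.(main|sub)\.(\d+)\.")
# Per-site outputs: [rank].conflicts, [rank].configs, [rank].configs-simple
_SITE = re.compile(r"^(\d+)\.(conflicts|configs|configs-simple)$")


def group_by_site(names: Iterable[str]) -> dict[int, dict[str, list[str]]]:
    """Bucket JSIsolate file names by site rank into 'frames' and 'site' lists."""
    sites: dict[int, dict[str, list[str]]] = defaultdict(
        lambda: {"frames": [], "site": []}
    )
    for name in names:
        if m := _SITE.match(name):
            sites[int(m.group(1))]["site"].append(name)
        elif m := _FRAME.match(name):
            sites[int(m.group(1))]["frames"].append(name)
        # Anything else (e.g. the zip archives) is ignored.
    return dict(sites)


def main_frames(files: list[str]) -> list[str]:
    """Main-frame URL files; monkey testing can yield more than one per site."""
    return [f for f in files if ".main." in f and f.endswith(".frame")]
```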

  16. HumanEval-X Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Mar 31, 2023
    Qinkai Zheng; Xiao Xia; Xu Zou; Yuxiao Dong; Shan Wang; Yufei Xue; Zihan Wang; Lei Shen; Andi Wang; Yang Li; Teng Su; Zhilin Yang; Jie Tang (2023). HumanEval-X Dataset [Dataset]. https://paperswithcode.com/dataset/humaneval-x
    Authors
    Qinkai Zheng; Xiao Xia; Xu Zou; Yuxiao Dong; Shan Wang; Yufei Xue; Zihan Wang; Lei Shen; Andi Wang; Yang Li; Teng Su; Zhilin Yang; Jie Tang
    Description

    HumanEval-X is a benchmark for evaluating the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks, such as code generation and translation.

  17. Data from: BreCaHAD: A Dataset for Breast Cancer Histopathological...

    • figshare.com
    png
    Updated Jan 28, 2019
    Alper Aksac; Douglas J. Demetrick; Tansel Özyer; Reda Alhajj (2019). BreCaHAD: A Dataset for Breast Cancer Histopathological Annotation and Diagnosis [Dataset]. http://doi.org/10.6084/m9.figshare.7379186.v3
    Dataset provided by
    figshare
    Authors
    Alper Aksac; Douglas J. Demetrick; Tansel Özyer; Reda Alhajj
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of 1 .xlsx file, 2 .png files, 1 .json file, and 1 .zip file:

    annotation_details.xlsx: The distribution of annotations in the six classes mentioned previously (mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, and non-tubule), presented in an Excel spreadsheet.

    original.png: The input image.

    annotated.png: An example from the dataset. In the annotated image, blue circles indicate tumor nuclei; pink circles show non-tumor nuclei such as blood cells, stroma nuclei, and lymphocytes; orange and green circles are mitosis and apoptosis, respectively; light blue circles are true lumen for tubules; and yellow circles represent white regions (non-lumen) such as fat, blood vessels, and broken tissue.

    data.json: The annotations for the BreCaHAD dataset, provided in JSON (JavaScript Object Notation) format. In the given example, the JSON file (ground truth) contains two mitosis annotations and one tumor nucleus annotation. Here, x and y are the coordinates of the centroid of the annotated object, with values between 0 and 1.

    BreCaHAD.zip: An archive file containing the dataset. Three folders are included: images (original images), groundTruth (JSON files), and groundTruth_display (ground truth applied to the original images).
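Because the centroids in data.json are normalized to the range 0 to 1, placing them on an image means scaling x by the image width and y by the image height. The sketch assumes a hypothetical layout of `{class_name: [{"x": ..., "y": ...}, ...]}` and uses an example image size; the real data.json structure should be checked before relying on it, and `to_pixels` is a hypothetical name.

```python
def to_pixels(
    annotations: dict[str, list[dict[str, float]]], width: int, height: int
) -> dict[str, list[tuple[int, int]]]:
    """Convert normalized centroids (x, y in [0, 1]) to pixel coordinates.

    Assumes {class_name: [{"x": ..., "y": ...}, ...]}; the actual
    data.json layout may differ.
    """
    out: dict[str, list[tuple[int, int]]] = {}
    for cls, points in annotations.items():
        # x scales with width, y with height; clamp 1.0 to the last pixel.
        out[cls] = [
            (min(round(p["x"] * width), width - 1),
             min(round(p["y"] * height), height - 1))
            for p in points
        ]
    return out
```

Loading the file with `json.load` and passing the result, together with the size of the matching image, gives centroids that can be drawn in the way annotated.png shows.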

  18. Dataset of books called D3.js 4.x data visualization : learn to visualize...

    • workwithdata.com
    Updated Apr 17, 2025
    Work With Data (2025). Dataset of books called D3.js 4.x data visualization : learn to visualize your data with JavaScript [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=D3.js+4.x+data+visualization+%3A+learn+to+visualize+your+data+with+JavaScript
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is D3.js 4.x data visualization : learn to visualize your data with JavaScript. It features 7 columns including author, publication date, language, and book publisher.

  19. CodeQA Dataset

    • paperswithcode.com
    Updated Dec 29, 2023
    Chenxiao Liu; Xiaojun Wan (2023). CodeQA Dataset [Dataset]. https://paperswithcode.com/dataset/codeqa
    Authors
    Chenxiao Liu; Xiaojun Wan
    Description

    CodeQA is a free-form question answering dataset for the purpose of source code comprehension: given a code snippet and a question, a textual answer is required to be generated. CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs.

    Description from: CodeQA: A Question Answering Dataset for Source Code Comprehension

  20. The Klarna Product-Page Dataset

    • data.niaid.nih.gov
    • researchdata.se
    • +1 more
    Updated Nov 7, 2024
    Moradi, Aref (2024). The Klarna Product-Page Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12605479
    Dataset provided by
    Risuleo, Riccardo Sven
    Magureanu, Stefan
    Moradi, Aref
    Lagergren, Jens
    Hotti, Alexandra
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The Klarna Product Page Dataset is a dataset of publicly available pages corresponding to products sold online on various e-commerce websites. The dataset contains offline snapshots of 51,701 product pages collected from 8,175 distinct merchants across 8 different markets (US, GB, SE, NL, FI, NO, DE, AT) between 2018 and 2019. On each page, analysts labelled 5 elements of interest: the price of the product, its image, its name and the add-to-cart and go-to-cart buttons (if found). These labels are present in the HTML code as an attribute called klarna-ai-label taking one of the values: Price, Name, Main picture, Add to cart and Cart.
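Since the labels live in a `klarna-ai-label` attribute, a minimal extractor can be built on the standard library's `html.parser`. This assumes plain HTML input (for example, the rendered DOM in a WTL snapshot); MHTML files are MIME-encoded and would need to be unpacked first. `KlarnaLabelFinder` and `find_labels` are hypothetical helpers.

```python
from html.parser import HTMLParser

# The five label values named in the dataset description.
LABELS = {"Price", "Name", "Main picture", "Add to cart", "Cart"}


class KlarnaLabelFinder(HTMLParser):
    """Collect (tag, label) pairs for elements carrying klarna-ai-label."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lower-cased.
        for name, value in attrs:
            if name == "klarna-ai-label" and value in LABELS:
                self.found.append((tag, value))


def find_labels(html: str) -> list[tuple[str, str]]:
    """Return labeled elements in document order."""
    parser = KlarnaLabelFinder()
    parser.feed(html)
    parser.close()
    return parser.found
```

Self-closing tags such as `<img ... />` are also covered, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.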

    The snapshots are available in 3 formats: as MHTML files (~24GB), as WebTraversalLibrary (WTL) snapshots (~7.4GB), and as screenshots (~8.9GB). The MHTML format is less lossy; a browser can render these pages, although any JavaScript on the page is lost. The WTL snapshots are produced by loading the MHTML pages into a Chromium-based browser. To keep the WTL dataset compact, the screenshots of the rendered MHTML are provided separately; the WTL snapshots contain the HTML of the rendered DOM tree and additional page and element metadata with rendering information (bounding boxes of elements, font sizes, etc.). The folder structure of the screenshot dataset is identical to that of the WTL dataset and can be used to complete the WTL snapshots with image information. For convenience, the datasets are provided with a train/test split in which no merchant in the test set is present in the training set.
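The merchant-disjoint property of the provided split can be illustrated with a group-based split, in which merchants rather than pages are assigned to train or test. The dataset already ships its own split; this sketch, with hypothetical `(merchant, page_id)` pairs, test fraction, and seed, only shows how such a split keeps merchants from leaking between sets. `split_by_merchant` is a hypothetical name.

```python
import random
from typing import Sequence


def split_by_merchant(
    pages: Sequence[tuple[str, str]], test_fraction: float = 0.2, seed: int = 0
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split (merchant, page_id) pairs so no merchant is in both sets.

    `test_fraction` is the share of merchants (not pages) sent to test.
    """
    merchants = sorted({m for m, _ in pages})  # sort for reproducibility
    rng = random.Random(seed)
    rng.shuffle(merchants)
    n_test = round(len(merchants) * test_fraction)
    test_merchants = set(merchants[:n_test])
    train = [p for p in pages if p[0] not in test_merchants]
    test = [p for p in pages if p[0] in test_merchants]
    return train, test
```

Splitting by merchant means the page-level test share varies with how many pages each merchant has, which is the price of guaranteeing that test merchants are unseen during training.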

    Corresponding Publication

    For more information about the contents of the datasets (statistics etc.) please refer to the following TMLR paper.

    GitHub Repository

    The code needed to re-run the experiments in the publication accompanying the dataset can be accessed here.

    Citing

    If you found this dataset useful in your research, please cite the paper as follows:

    @article{hotti2024the, title={The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models}, author={Alexandra Hotti and Riccardo Sven Risuleo and Stefan Magureanu and Aref Moradi and Jens Lagergren}, journal={Transactions on Machine Learning Research}, issn={2835-8856}, year={2024}, url={https://openreview.net/forum?id=zz6FesdDbB}, note={} }

Selected dataset: javascript-dataset (axay/javascript-dataset). 145 scholarly articles cite this dataset, according to Google Scholar.