saurabh5/rlvr-code-data-JavaScript dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data and analysis from an empirical study investigating the adoption trends of modern JavaScript features introduced with ECMAScript 6 (ES6) and beyond. By mining the source code history of 158 open-source JavaScript projects, the study identifies efforts to rejuvenate legacy code by replacing outdated constructs with modern ones. The findings highlight the extensive use of modern features, their widespread adoption within one to two years after ES6's release, and ongoing trends in the rejuvenation of JavaScript codebases.
scripts.zip: Contains Python scripts used to analyze data and generate the graphs presented in the study's results.
jsminer-tool.zip: Includes the tool developed to analyze GitHub repository history and collect metrics on the adoption of modern JavaScript features.
jsminer_database_backup.zip: Provides a PostgreSQL database dump containing all code review comments from the repositories analyzed in the study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of multiple files which contain bug prediction training data.
Each entry in the dataset is a JavaScript function labeled as either buggy or non-buggy. Bug-related information was obtained from the ESLint project contained in BugsJS (https://github.com/BugsJS/eslint). The buggy instances were collected throughout the lifetime of the project; the non-buggy entries were added from the latest version that is tagged as a fix (entries that were previously included as buggy were not included again as non-buggy).
The dataset is based on hybrid call graphs which are constructed by https://github.com/sed-szeged/hcg-js-framework. The result of this tool is a call graph where the edges are associated with a confidence level which shows how likely the given edge is a valid call edge.
We used several confidence thresholds, starting from which an edge was considered valid. The following threshold values were used:
0.00
0.05
0.20
0.30
The prefix in each dataset file name comes from the threshold used. The datasets include the coupling metrics NII (Number of Incoming Invocations) and NOI (Number of Outgoing Invocations), which were calculated by a static source code analyzer called SourceMeter. Hybrid counterparts of these metrics (HNII and HNOI) are based on the given threshold values.
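To make the role of the threshold concrete, the following is a minimal sketch of how hybrid coupling counts could be derived from a confidence-weighted call graph. The edge representation (caller, callee, confidence), the function name hybrid_coupling, the inclusive comparison, and the choice to count distinct callers and callees are all assumptions made for illustration; they are not taken from hcg-js-framework or SourceMeter.

```python
from collections import defaultdict

# Hypothetical call-graph edges: (caller, callee, confidence).
EDGES = [
    ("a", "b", 1.00),
    ("a", "c", 0.10),
    ("b", "c", 0.25),
    ("c", "a", 0.04),
]


def hybrid_coupling(edges, threshold):
    """Return {function: (HNII, HNOI)} using only edges that reach the threshold.

    Assumption: an edge is valid when confidence >= threshold, and each
    distinct caller or callee is counted once.
    """
    callers = defaultdict(set)  # callee -> set of functions that call it
    callees = defaultdict(set)  # caller -> set of functions it calls
    for caller, callee, confidence in edges:
        if confidence >= threshold:
            callees[caller].add(callee)
            callers[callee].add(caller)
    functions = {f for caller, callee, _ in edges for f in (caller, callee)}
    return {f: (len(callers[f]), len(callees[f])) for f in functions}


if __name__ == "__main__":
    for threshold in (0.00, 0.05, 0.20, 0.30):
        print(threshold, sorted(hybrid_coupling(EDGES, threshold).items()))
```

Raising the threshold can only remove edges, so the hybrid counts for a function never increase from one threshold to the next.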
There are four variants for all of these datasets:
Both static (NII, NOI) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics and information about the entries (file without any postfix). Columns contained only in this dataset are:
ID
Name
Longname
Parent ID
Component ID
Path
Line
Column
EndLine
EndColumn
Both static (NII, NOI) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h+s' postfix)
Only static (NII, NOI) coupling metrics are included with additional static source code metrics (file with '_s' postfix)
Only hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h' postfix)
The static source code metrics contained in all datasets are the following:
McCC - McCabe Cyclomatic Complexity
NL - Nesting Level
NLE - Nesting Level Else If
CD - Comment Density
CLOC - Comment Lines of Code
DLOC - Documentation Lines of Code
TCD - Total Comment Density (Comment lines in embedded functions are also considered)
TCLOC - Total Comment Lines of Code (Comment lines in embedded functions are also considered)
LLOC - Logical Lines of Code (Comment and empty lines not counted)
LOC - Lines of Code (Comment and empty lines are counted)
NOS - Number of Statements
NUMPAR - Number of Parameters
TLLOC - Total Logical Lines of Code (Lines in embedded functions are also counted)
TLOC - Total Lines of Code (Lines in embedded functions are also counted)
TNOS - Total Number of Statements (Statements in embedded functions are also counted)
AutoTrain Dataset for project: javascript-traing-1
Dataset Description
This dataset has been automatically processed by AutoTrain for project javascript-traing-1.
Languages
The BCP-47 code for the dataset's language is unk.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: [ { "target": "test/NavbarSpec.js", "feat_repo_name": "aabenoja/react-bootstrap", "text": "import React from 'react'; import… See the full description on the dataset page: https://huggingface.co/datasets/ars-1/autotrain-data-javascript-traing-1.
This dataset contains the predicted prices of the asset JavaScript over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
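As a rough illustration of the projection described above, the sketch below applies a fixed annual growth rate over 16 years. The description does not say how growth is applied, so annual compounding is an assumption, as are the function name project_prices and the starting price used in the example.

```python
def project_prices(current_price, years=16, annual_growth=0.05):
    """Return one projected price per year, assuming annual compounding.

    The growth rate is clamped to the adjustable range stated in the
    description: -100 percent to +100 percent.
    """
    rate = max(-1.0, min(1.0, annual_growth))
    return [current_price * (1 + rate) ** year for year in range(1, years + 1)]


if __name__ == "__main__":
    # Hypothetical starting price of 100 with the default 5 percent rate.
    for year, price in enumerate(project_prices(100.0), start=1):
        print(year, round(price, 2))
```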
Traffic analytics, rankings, and competitive metrics for javascript.com as of September 2025
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Laboratory testing tasks. The data contains the task identifier and the instructions given to the participants to complete the task. (CSV 618 kb)
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
After scenario questionnaire results. The data contains the results of the After Scenario Questionnaire answered by 14 participants. (CSV 149 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results from laboratory testing. The data contains the task identifier, the average time to completion, the number of times the task was successfully completed, and the total number of errors. (CSV 209 kb)
This dataset was created by Margaritelli
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Content of this repository
This repository contains the scripts and dataset for the MSR 2019 mining challenge.
GitHub repository with the software used: here.
=======
DATASET
The dataset was retrieved using Google BigQuery and dumped to a CSV file for further processing. This original, untreated file is called jsanswers.csv and contains the following information:
1. The ID of the question (PostId)
2. The content (in this case, the code block)
3. The length of the code block
4. The line count of the code block
5. The score of the post
6. The title
A quick look at this file shows that one PostId can have multiple rows related to it; this is how multiple code blocks are saved in the database.
Filtered Dataset:
Extracting code from CSV
We used a Python script called "ExtractCodeFromCSV.py" to extract the code from the original CSV and merge all the code blocks into their respective JavaScript files, named after the PostId. This resulted in 336 thousand files.
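The extraction step can be sketched as follows. This is not the contents of "ExtractCodeFromCSV.py"; it is a minimal illustration under stated assumptions: the column names PostId and Content, the newline used to join blocks, and the helper names merge_code_blocks and write_js_files are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path

# Hypothetical excerpt in the shape described for jsanswers.csv.
SAMPLE_CSV = "PostId,Content\n1,var a = 1;\n2,alert(2);\n1,var b = 2;\n"


def merge_code_blocks(csv_file, id_column="PostId", code_column="Content"):
    """Group code blocks by post ID and join them in their original order."""
    merged = defaultdict(list)
    for row in csv.DictReader(csv_file):
        merged[row[id_column]].append(row[code_column])
    return {post_id: "\n".join(blocks) for post_id, blocks in merged.items()}


def write_js_files(merged, out_dir):
    """Write one <PostId>.js file per post into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for post_id, code in merged.items():
        (out / f"{post_id}.js").write_text(code, encoding="utf-8")


if __name__ == "__main__":
    merged = merge_code_blocks(io.StringIO(SAMPLE_CSV))
    print(merged)
```

Grouping in memory before writing keeps each output file to a single write, at the cost of holding all code blocks in memory at once.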
Running ESLint
Because ESLint is single-threaded, running it on 336 thousand files took a huge toll on the machine, so we created a script named "ESlintRunnerScript.py" to run it. The script splits the files into 20 evenly distributed parts and runs 20 ESLint processes to generate the reports, producing 20 JSON files.
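The splitting strategy can be sketched as below. This is an illustration rather than the actual "ESlintRunnerScript.py": the helper names and the report file names are hypothetical, and the sketch assumes an eslint executable is available on the PATH. The --format and --output-file options are standard ESLint CLI flags.

```python
import subprocess


def split_evenly(items, parts=20):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def eslint_command(files, report_path):
    """Build an ESLint invocation that writes a JSON report for the given files."""
    return ["eslint", "--format", "json", "--output-file", report_path, *files]


def run_in_parallel(files, parts=20):
    """Start one ESLint process per non-empty chunk and wait for all of them."""
    procs = [
        subprocess.Popen(eslint_command(chunk, f"report_{i}.json"))
        for i, chunk in enumerate(split_evenly(files, parts))
        if chunk
    ]
    return [proc.wait() for proc in procs]
```

With several thousand files per chunk, the argument list may exceed operating-system limits, so a real run might instead place each chunk in its own directory and pass the directory to ESLint.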
Number of Violations per Rule
This information was extracted using the script named "parser.py", which generated the file "NumberofViolationsPerRule.csv" containing the number of violations for each rule used in the linter configuration.
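Counting violations per rule from the JSON reports can be sketched as follows. This is not the actual "parser.py"; it relies only on the structure of ESLint's JSON formatter output (a list of per-file results, each with a messages list whose items carry a ruleId). The function name, the sample report, and the "(no rule)" label for messages without a rule, such as parse errors, are assumptions for illustration.

```python
import json
from collections import Counter

# Hypothetical report in the shape produced by ESLint's JSON formatter.
SAMPLE_REPORT = json.dumps([
    {"filePath": "1.js", "messages": [{"ruleId": "no-undef"}, {"ruleId": "semi"}]},
    {"filePath": "2.js", "messages": [{"ruleId": "no-undef"}, {"ruleId": None}]},
    {"filePath": "3.js", "messages": []},
])


def violations_per_rule(report_texts):
    """Count messages per rule across any number of JSON report strings."""
    counts = Counter()
    for text in report_texts:
        for file_result in json.loads(text):
            for message in file_result.get("messages", []):
                # Messages without a rule (for example parse errors) have a null ruleId.
                counts[message.get("ruleId") or "(no rule)"] += 1
    return counts


if __name__ == "__main__":
    for rule, count in violations_per_rule([SAMPLE_REPORT]).most_common():
        print(rule, count)
```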
Number of violations per Category
To produce relevant statistics for the dataset, we generated the number of violations per rule category, as defined on the ESLint website. This information was extracted using the same "parser.py" script.
Individual Reports
This information was extracted from the JSON reports; it is a CSV file with the PostID and the violations per rule.
Rules
The file Rules with categories contains all the rules used and their categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data comes from an effort to render the top 1M domains on the web in a scripted browser and record performance metrics for each page. These metrics are published here in NumPy format. See the starter notebook for an example showing how to use the data and what the columns contain. See the following posts for more in-depth write-ups:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nielsen’s Heuristic Evaluation. The data contains the results from Nielsen’s Heuristic Evaluation conducted by three usability experts. (CSV 116 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Latest release of source code. A zip file of the source code from release 0.1.8. Accessed 4 May 2018. (ZIP 3513 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 28 verified Js locations in Brazil with complete contact information, ratings, reviews, and location data.
This dataset provides information about the number of properties, residents, and average property values for Js Waters School Road cross streets in Goldston, NC.
js-hyun/preprocess-videomme-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Leon
Released under Apache 2.0
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.