2 datasets found

Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based...
data.niaid.nih.gov
zenodo.org
Updated Nov 21, 2020
Cite
Antal, Gábor (2020). Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics (Training Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4281475
Dataset updated
Nov 21, 2020
Dataset provided by
Ferenc, Rudolf
Hegedűs, Péter
Antal, Gábor
Tóth, Zoltán Gábor
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists of multiple files which contain bug prediction training data.

The entries in the dataset are JavaScript functions labeled as either buggy or non-buggy. Bug-related information was obtained from the ESLint project contained in BugsJS (https://github.com/BugsJS/eslint). The buggy instances were collected throughout the lifetime of the project; non-buggy entries were added from the latest version that is tagged as a fix (entries that had previously been included as buggy were not included again as non-buggy).

The dataset is based on hybrid call graphs which are constructed by https://github.com/sed-szeged/hcg-js-framework. The result of this tool is a call graph where the edges are associated with a confidence level which shows how likely the given edge is a valid call edge.

We used different threshold values from which we considered the edges to be valid. The following threshold values were used:

0.00

0.05

0.20

0.30

The prefix in each dataset file name comes from the threshold used. The datasets include the coupling metrics NII (Number of Incoming Invocations) and NOI (Number of Outgoing Invocations), which were calculated by a static source code analyzer called SourceMeter. The hybrid counterparts of these metrics (HNII and HNOI) are based on the given threshold values.
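
The thresholding described above can be illustrated with a minimal sketch. It assumes that an edge counts as valid when its confidence is greater than or equal to the threshold, and that HNII and HNOI count distinct callers and callees; neither detail is confirmed by the description. The function name `hybrid_invocation_counts` and the sample edges are hypothetical and are not taken from hcg-js-framework or SourceMeter.

```python
from collections import defaultdict


def hybrid_invocation_counts(edges, threshold):
    """Return (HNII, HNOI) dictionaries for a weighted call graph.

    edges: iterable of (caller, callee, confidence) tuples.
    Assumption: an edge is valid when confidence >= threshold.
    Assumption: the metrics count distinct callers / callees.
    """
    callers_of = defaultdict(set)   # callee -> set of distinct callers
    callees_of = defaultdict(set)   # caller -> set of distinct callees
    for caller, callee, confidence in edges:
        if confidence >= threshold:
            callers_of[callee].add(caller)
            callees_of[caller].add(callee)
    hnii = {name: len(callers) for name, callers in callers_of.items()}
    hnoi = {name: len(callees) for name, callees in callees_of.items()}
    return hnii, hnoi


# Hypothetical edges: (caller, callee, confidence)
sample_edges = [
    ("a", "b", 1.00),
    ("a", "c", 0.10),
    ("b", "c", 0.25),
    ("c", "a", 0.02),
]

for threshold in (0.00, 0.05, 0.20, 0.30):
    hnii, hnoi = hybrid_invocation_counts(sample_edges, threshold)
    print(f"{threshold:.2f} HNII={hnii} HNOI={hnoi}")
```

Raising the threshold drops low-confidence edges, so the hybrid counts can only stay the same or shrink as the threshold grows.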

There are four variants for all of these datasets:

Both static (NII, NOI) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics and information about the entries (file without any postfix). Columns contained only in this variant are:

ID

Name

Longname

Parent ID

Component ID

Path

Line

Column

EndLine

EndColumn

Both static (NII, NOI) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h+s' postfix)

Only static (NII, NOI) coupling metrics are included with additional static source code metrics (file with '_s' postfix)

Only hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h' postfix)
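
The four variants above can be summarized as a lookup from file postfix to the columns that distinguish each variant. This is only a sketch: the column names are assumed to match the abbreviations used in the description, the actual file headers may differ, and `variant_specific_columns` is a hypothetical helper.

```python
# Assumption: column headers match the abbreviations in the description.
STATIC_COUPLING = ["NII", "NOI"]
HYBRID_COUPLING = ["HNII", "HNOI"]
ENTRY_INFO = [
    "ID", "Name", "Longname", "Parent ID", "Component ID",
    "Path", "Line", "Column", "EndLine", "EndColumn",
]

# Postfix of the file name -> columns beyond the shared static source metrics.
VARIANT_COLUMNS = {
    "": ENTRY_INFO + STATIC_COUPLING + HYBRID_COUPLING,
    "_h+s": STATIC_COUPLING + HYBRID_COUPLING,
    "_s": STATIC_COUPLING,
    "_h": HYBRID_COUPLING,
}


def variant_specific_columns(postfix):
    """Return the columns specific to the variant with the given postfix."""
    if postfix not in VARIANT_COLUMNS:
        raise ValueError(f"unknown postfix: {postfix!r}")
    return list(VARIANT_COLUMNS[postfix])


print(variant_specific_columns("_h"))
```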

Static source code metrics which are contained in all datasets are the following:

McCC - McCabe Cyclomatic Complexity

NL - Nesting Level

NLE - Nesting Level Else If

CD - Comment Density

CLOC - Comment Lines of Code

DLOC - Documentation Lines of Code

TCD - Total Comment Density (Comment lines in an embedded function are also considered)

TCLOC - Total Comment Lines of Code (Comment lines in an embedded function are also considered)

LLOC - Logical Lines of Code (Comment and empty lines not counted)

LOC - Lines of Code (Comment and empty lines are counted)

NOS - Number of Statements

NUMPAR - Number of Parameters

TLLOC - Total Logical Lines of Code (Lines in embedded functions are also counted)

TLOC - Total Lines of Code (Lines in embedded functions are also counted)

TNOS - Total Number of Statements (Statements in embedded functions are also counted)
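
As an illustration of how the density metrics relate to the line counts, the sketch below assumes the SourceMeter-style definition in which comment density is the ratio of comment lines to the sum of comment lines and logical lines of code. The description does not state this formula, so treat it as an assumption; returning 0.0 for an empty function is also a choice made only for this example.

```python
def comment_density(comment_lines, logical_lines):
    """Ratio of comment lines to comment lines plus logical lines.

    Assumed definitions: CD = CLOC / (CLOC + LLOC) and
    TCD = TCLOC / (TCLOC + TLLOC).
    Returns 0.0 when both counts are zero (a choice made for this sketch).
    """
    total = comment_lines + logical_lines
    return comment_lines / total if total else 0.0


# Hypothetical function with 2 comment lines and 6 logical lines.
cd = comment_density(2, 6)
# The same helper applied to the totals that include embedded functions.
tcd = comment_density(5, 15)
print(cd, tcd)
```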
DATS 6401 - Final Project - Yon ho Cheong.zip
figshare.com
zip
Updated Dec 15, 2018
Cite
Yon ho Cheong (2018). DATS 6401 - Final Project - Yon ho Cheong.zip [Dataset]. http://doi.org/10.6084/m9.figshare.7471007.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.7471007.v1
Dataset updated
Dec 15, 2018
Dataset provided by
figshare
Authors
Yon ho Cheong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

The H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process; the U.S. employer must submit the petition for an H1B visa to the US immigration department. This is the most common visa status applied for by international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, and trends among foreign applicants and employers in H1B visa applications. Locations, employers, job titles, and salary ranges make up most of the H1B petition data, so several visualization tools are used to analyze and interpret H1B visa trends and to provide recommendations to applicants. This report is the basis of the project for the Visualization of Complex Data class at the George Washington University. Some examples in this project analyze the relevant variables (Case Status, Employer Name, SOC name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) in order to see how the H1B visa has changed in the past several decades.

Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js

Dataset

The dataset contains 10 columns and covers a total of 3 million records spanning 2011-2016. The relevant columns in the dataset include case status, employer name, SOC name, job title, full time position, prevailing wage, year, worksite, and latitude and longitude information.

Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm

Running the code

Open Index.html

Data Processing

Preprocess the raw data to transform it into an understandable format.
Find and combine other external datasets, such as the FY2017 dataset, to enrich the analysis.
Develop the variables and compile them into the visualization programs to produce appropriate visualizations.
Draw a geo map and scatter plot to compare the fastest growth in fixed value and in percentages.
Extract some aspects and analyze the changes in employers’ preferences as well as forecasts of future trends.

Visualizations

Combo chart: shows the overall volume of receipts and the approval rate.
Scatter plot: shows the beneficiary country of birth.
Geo map: shows H1B petitions filed across all states.
Line chart: shows the top 10 states by H1B petitions filed.
Pie chart: shows a comparison of education level and occupations for petitions, FY2011 vs FY2017.
Tree map: shows the overall top employers who submit the greatest number of applications.
Side-by-side bar chart: shows an overall comparison of Data Scientist and Data Analyst.
Highlight table: shows the mean wage of a Data Scientist and Data Analyst with case status certified.
Bubble chart: shows the top 10 companies for Data Scientist and Data Analyst.

Related Research

The H-1B Visa Debate, Explained - Harvard Business Review: https://hbr.org/2017/05/the-h-1b-visa-debate-explained
Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
Key facts about the U.S. H-1B visa program: http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
H-1B visa - Wikipedia: https://en.wikipedia.org/wiki/H-1B_visa

Key Findings

From the analysis, the government is cutting down the number of H1B approvals in 2017.
In the past decade, due to the nature of demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India.
Technical jobs, such as Computer Systems Analyst and Software Developer, make up the majority of the top 10 jobs among foreign workers.
Employers located in metro areas strive to find foreign workers who can fill the technical positions in their organizations.
States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are the prime locations for foreign workers and provide many job opportunities.
Top companies that submit the most H1B visa applications, such as Infosys, Tata, and IBM India, are companies based in India associated with software and IT services.
The Data Scientist position has experienced exponential growth in H1B visa applications, and these jobs are clustered in the West region, which has the highest number.

Visualization programs used

HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau