Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of multiple files which contain bug prediction training data.
The entries in the dataset are JavaScript functions either being buggy or non-buggy. Bug related information was obtained from the project EsLint contained in BugsJS (https://github.com/BugsJS/eslint). The buggy instances were collected throughout the lifetime of the project, however we added non-buggy entries from the latest version which is tagged as fix (entries which were previously included as buggy were not included as non-buggy later on).
The dataset is based on hybrid call graphs which are constructed by https://github.com/sed-szeged/hcg-js-framework. The result of this tool is a call graph where the edges are associated with a confidence level which shows how likely the given edge is a valid call edge.
We used different threshold values from which we considered the edges to be valid. The following threshold values were used:
0.00
0.05
0.20
0.30
The prefix in the dataset file names are coming from the used threshold. The the datasets include coupling metrics NII (Nubmer of Incoming Invocations) and NOI (Number of Outgoing Invocations) which were calculated by a static source code analyzer called SourceMeter. Hybrid counterparts of these metrics (HNII and HNOI) are based on the given threshold values.
There are four variants for all of these datasets:
Both static (NII, NOi) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics and information about the entries (file without any postfix). Column contained only in this dataset are:
ID
Name
Longname
Parent ID
Component ID
Path
Line
Column
EndLine
EndColumn
Both static (NII, NOi) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h+s' postfix)
Only static (NII, NOI) coupling metrics are included with additional static source code metrics (file with '_s' postfix)
Only hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h' postfix)
Static source code metrics which are contained in all dataset are the following:
McCC - McCabe Cyclomatic Complexity
NL - Nesting Level
NLE - Nesting Level Else If
CD - Comment Density
CLOC - Comment Lines of Code
DLOC - Documentation Lines of Code
TCD - Total Comment Density (Comment Lines in an emedded function will be also considered)
TCLOC - Total Comment Lines of Code (Comment Lines in an emedded function will be also considered)
LLOC - Logical Lines of Code (Comment and empty lines not counted)
LOC - Lines of Code (Comment and empty lines are counted)
NOS - Number of Statements
NUMPAR - Number of Parameters
TLLOC - Logical Lines of Code (Lines in embedded functions are also counted)
TLOC - Lines of Code (Lines in embedded functions are also counted)
TNOS - Total Number of Statements (Statements in embedded functions are also counted)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AbstractThe H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process and the U.S. employer must submit a petition for an H1B visa to the US immigration department. This is the most common visa status applied to international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, foreign applicants and employers' trends for H1B visa application. According to locations, employers, job titles and salary range make up most of the H1B petitions, so different visualization utilizing tools will be used in order to analyze and interpreted in relation to the trends of the H1B visa to provide a recommendation to the applicant. This report is the base of the project for Visualization of Complex Data class at the George Washington University, some examples in this project has an analysis for the different relevant variables (Case Status, Employer Name, SOC name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and Office of Foreign Labor Certification(OFLC) in order to see the H1B visa changes in the past several decades. Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.jsDatasetThe dataset contains 10 columns and covers a total of 3 million records spanning from 2011-2016. The relevant columns in the dataset include case status, employer name, SOC name, jobe title, full time position, prevailing wage, year, worksite, and latitude and longitude information.Link to dataset: https://www.kaggle.com/nsharan/h-1b-visaLink to dataset(FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfmRunning the codeOpen Index.htmlData ProcessingDoing some data preprocessing to transform the raw data into an understandable format.Find and combine any other external datasets to enrich the analysis such as dataset of FY2017.To make appropriated Visualizations, variables should be Developed and compiled into visualization programs.Draw a geo map and scatter plot to compare the fastest growth in fixed value and in percentages.Extract some aspects and analyze the changes in employers’ preference as well as forecasts for the future trends.VisualizationsCombo chart: this chart shows the overall volume of receipts and approvals rate.Scatter plot: scatter plot shows the beneficiary country of birth.Geo map: this map shows All States of H1B petitions filed.Line chart: this chart shows top10 states of H1B petitions filed. Pie chart: this chart shows comparison of Education level and occupations for petitions FY2011 vs FY2017.Tree map: tree map shows overall top employers who submit the greatest number of applications.Side-by-side bar chart: this chart shows overall comparison of Data Scientist and Data Analyst.Highlight table: this table shows mean wage of a Data Scientist and Data Analyst with case status certified.Bubble chart: this chart shows top10 companies for Data Scientist and Data Analyst.Related ResearchThe H-1B Visa Debate, Explained - Harvard Business Reviewhttps://hbr.org/2017/05/the-h-1b-visa-debate-explainedForeign Labor Certification Data Centerhttps://www.foreignlaborcert.doleta.govKey facts about the U.S. H-1B visa programhttp://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/H1B visa News and Updates from The Economic Timeshttps://economictimes.indiatimes.com/topic/H1B-visa/newsH-1B visa - Wikipediahttps://en.wikipedia.org/wiki/H-1B_visaKey FindingsFrom the analysis, the government is cutting down the number of approvals for H1B on 2017.In the past decade, due to the nature of demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India.Technical Jobs fill up the majority of Top 10 Jobs among foreign workers such as Computer Systems Analyst and Software Developers.The employers located in the metro areas thrive to find foreign workforce who can fill the technical position that they have in their organization.States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are the prime location for foreign workers and provide many job opportunities. Top Companies such Infosys, Tata, IBM India that submit most H1B Visa Applications are companies based in India associated with software and IT services.Data Scientist position has experienced an exponential growth in terms of H1B visa applications and jobs are clustered in West region with the highest number.Visualization utilizing programsHTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of multiple files which contain bug prediction training data.
The entries in the dataset are JavaScript functions either being buggy or non-buggy. Bug related information was obtained from the project EsLint contained in BugsJS (https://github.com/BugsJS/eslint). The buggy instances were collected throughout the lifetime of the project, however we added non-buggy entries from the latest version which is tagged as fix (entries which were previously included as buggy were not included as non-buggy later on).
The dataset is based on hybrid call graphs which are constructed by https://github.com/sed-szeged/hcg-js-framework. The result of this tool is a call graph where the edges are associated with a confidence level which shows how likely the given edge is a valid call edge.
We used different threshold values from which we considered the edges to be valid. The following threshold values were used:
0.00
0.05
0.20
0.30
The prefix in the dataset file names are coming from the used threshold. The the datasets include coupling metrics NII (Nubmer of Incoming Invocations) and NOI (Number of Outgoing Invocations) which were calculated by a static source code analyzer called SourceMeter. Hybrid counterparts of these metrics (HNII and HNOI) are based on the given threshold values.
There are four variants for all of these datasets:
Both static (NII, NOi) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics and information about the entries (file without any postfix). Column contained only in this dataset are:
ID
Name
Longname
Parent ID
Component ID
Path
Line
Column
EndLine
EndColumn
Both static (NII, NOi) and hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h+s' postfix)
Only static (NII, NOI) coupling metrics are included with additional static source code metrics (file with '_s' postfix)
Only hybrid (HNII, HNOI) coupling metrics are included with additional static source code metrics (file with '_h' postfix)
Static source code metrics which are contained in all dataset are the following:
McCC - McCabe Cyclomatic Complexity
NL - Nesting Level
NLE - Nesting Level Else If
CD - Comment Density
CLOC - Comment Lines of Code
DLOC - Documentation Lines of Code
TCD - Total Comment Density (Comment Lines in an emedded function will be also considered)
TCLOC - Total Comment Lines of Code (Comment Lines in an emedded function will be also considered)
LLOC - Logical Lines of Code (Comment and empty lines not counted)
LOC - Lines of Code (Comment and empty lines are counted)
NOS - Number of Statements
NUMPAR - Number of Parameters
TLLOC - Logical Lines of Code (Lines in embedded functions are also counted)
TLOC - Lines of Code (Lines in embedded functions are also counted)
TNOS - Total Number of Statements (Statements in embedded functions are also counted)