Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data and analysis from an empirical study investigating the adoption trends of modern JavaScript features introduced with ECMAScript 6 (ES6) and beyond. By mining the source code history of 158 open-source JavaScript projects, the study identifies efforts to rejuvenate legacy code by replacing outdated constructs with modern ones. The findings highlight the extensive use of modern features, their widespread adoption within one to two years after ES6's release, and ongoing trends in the rejuvenation of JavaScript codebases.
scripts.zip: Contains Python scripts used to analyze data and generate the graphs presented in the study's results.
jsminer-tool.zip: Includes the tool developed to analyze GitHub repository history and collect metrics on the adoption of modern JavaScript features.
jsminer_database_backup.zip: Provides a PostgreSQL database dump containing all code review comments from the repositories analyzed in the study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scraped data from top 1 million domains as reported by Majestic 1 Million on June 5th, 2022. The homepage of each domain is scraped and all encountered javascript script source URLs are extracted.
You can find the source code at github.com/get-set-fetch/scraper and detailed documentation at getsetfetch.org.
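The scraper itself is the get-set-fetch tool linked above; purely as an illustration of the extraction step it describes (pulling every `<script src=...>` URL out of a homepage), here is a minimal stdlib-Python sketch:

```python
from html.parser import HTMLParser

class ScriptSrcParser(HTMLParser):
    """Collect the src attribute of every <script> tag on a page."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                # Inline scripts have no src and are skipped.
                self.srcs.append(src)

page = ('<html><head><script src="https://cdn.example.com/app.js"></script>'
        '<script>inline();</script></head></html>')
parser = ScriptSrcParser()
parser.feed(page)
# parser.srcs → ['https://cdn.example.com/app.js']
```

The example HTML and URL are placeholders; in the dataset each domain's homepage was fetched and processed this way.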
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantifying Security Issues in Reusable JavaScript Actions in GitHub Workflows
Overview
This replication package contains all the material required to replicate the analyses from our paper entitled Quantifying Security Issues in Reusable JavaScript Actions in GitHub Workflows, accepted for publication at MSR 2024 (the 21st International Conference on Mining Software Repositories). The materials provided here will guide you through the process of replicating our research findings.
This research is supported by the Fonds de la Recherche Scientifique - FNRS under grant numbers T.0149.22, F.4515.23, and J.0147.24.
Requirements
Before you proceed with replicating our analysis, ensure that you have the following prerequisites installed on your system:
Python 3.8 or higher
Dependencies listed in the requirements.txt file
Getting Started
To begin replicating our analysis, follow these steps:
Clone this repository to your local machine:
Navigate to the cloned directory:
Set up a Jupyter Lab environment to execute the provided notebooks.
Install the required dependencies using the requirements.txt file:
pip install -r requirements.txt
Data Replication
The data-raw folder contains all the data required to replicate the analysis. These data were obtained by running various notebooks. Here is a list of the notebooks and their resulting CSV files:
Extract Actions - actions.csv
Extract Releases - releases.csv
Extract Actions Type - types.csv
Check Manifests and Extract Dependencies - lock_dependencies.csv
Check Vulnerabilities - vulnerabilities.csv
Extract JS Entry Points and CodeQL Results - codeql_results_raw.csv, codeql_queries.csv
Extract Dependents - dependents.csv
Research Questions and Analysis
The data folder contains all the data required to replicate the paper-story notebook and the research questions. The research and analysis presented in the paper are based on two final datasets created from the data-raw files as follows:
Vulnerabilities in Dependency Network of Actions - actions_dependencies_vulnerabilities.parquet
Security Weaknesses in JavaScript Code of Actions - actions_code_vulnerabilities.parquet
An npm package for getting data from the Lëtzebuerger Online Dictionnaire (LOD) via data.public.lu.
Repo on GitHub: https://github.com/robertoentringer/lod-opendata
npm package: https://www.npmjs.com/package/lod-opendata
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Content of this repository
This is the repository that contains the scripts and dataset for the MSR 2019 mining challenge.
GitHub repository with the software used: here.
DATASET
=======
The dataset was retrieved using Google BigQuery and dumped to a CSV
file for further processing. This original, untreated file is called jsanswers.csv, and it contains the following information:
1. The Id of the question (PostId)
2. The content (in this case, the code block)
3. The length of the code block
4. The line count of the code block
5. The score of the post
6. The title
A quick look at this file shows that a PostId can have multiple rows related to it; that is how multiple code blocks are stored in the database.
Filtered Dataset:
Extracting code from CSV
We used a Python script called "ExtractCodeFromCSV.py" to extract the code from the original CSV and merge all the code blocks into one JavaScript file per post, named after the PostId; this resulted in roughly 336 thousand files.
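A minimal sketch of what such a merge script might look like; the column names "PostId" and "Content" are assumed from the description above, and the real jsanswers.csv header may differ:

```python
import csv
from collections import defaultdict
from pathlib import Path

def merge_code_blocks(csv_path, out_dir):
    """Group every code block belonging to the same PostId and write
    them into a single <PostId>.js file. Returns the number of files."""
    blocks = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            blocks[row["PostId"]].append(row["Content"])
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for post_id, parts in blocks.items():
        # Multiple rows per PostId become one file, blocks joined in order.
        (out / f"{post_id}.js").write_text("\n".join(parts), encoding="utf-8")
    return len(blocks)
```

Run against the full CSV, this produces one .js file per distinct PostId.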
Running ESlint
Due to the single-threaded nature of ESLint, running it directly over 336 thousand files took a huge toll on the machine, so we created a script named "ESlintRunnerScript.py" to drive it. The script splits the files into 20 evenly distributed parts and runs 20 ESLint processes to generate the reports, producing 20 JSON files.
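The even split described above can be sketched as follows; the exact ESLint invocation used by "ESlintRunnerScript.py" is not shown in this repository description, so the commented command is only an assumption:

```python
import subprocess  # used when actually launching the linter processes

def split_evenly(files, n=20):
    """Split the file list into n parts whose sizes differ by at most one."""
    q, r = divmod(len(files), n)
    parts, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)
        parts.append(files[start:start + size])
        start += size
    return parts

# Each part would then be linted by its own process, e.g.:
# procs = [subprocess.Popen(["eslint", "-f", "json", *part, "-o", f"report{i}.json"])
#          for i, part in enumerate(split_evenly(all_files))]
# for p in procs:
#     p.wait()
```

Launching one OS process per chunk sidesteps ESLint's single-threaded design at the cost of 20 separate JSON reports to merge afterwards.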
Number of Violations per Rule
This information was extracted using the script named "parser.py", which generated the file "NumberofViolationsPerRule.csv" containing the number of violations per rule used in the linter configuration.
Number of Violations per Category
To produce relevant statistics for the dataset, we generated the number of violations per rule category as defined on the ESLint website; this information was extracted using the same "parser.py" script.
Individual Reports
This information was extracted from the JSON reports; it is a CSV file with the PostId and the violations per rule.
Rules
The file "Rules with categories" contains all the rules used and their categories.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process; the U.S. employer must submit a petition for an H1B visa to the US immigration department. This is the most common visa status applied for by international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, and trends among foreign applicants and employers for H1B visa applications. Because locations, employers, job titles, and salary ranges make up most of the H1B petitions, different visualization tools are used to analyze and interpret trends in the H1B visa and provide recommendations to applicants. This report is the basis of the project for the Visualization of Complex Data class at the George Washington University; the project analyzes the relevant variables (Case Status, Employer Name, SOC Name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) in order to see how the H1B visa has changed over the past several decades.
Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js
Dataset
The dataset contains 10 columns and covers a total of 3 million records spanning 2011-2016. The relevant columns include case status, employer name, SOC name, job title, full-time position, prevailing wage, year, worksite, and latitude and longitude information.
Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm
Running the code
Open Index.html
Data Processing
Perform data preprocessing to transform the raw data into an understandable format. Find and combine external datasets to enrich the analysis, such as the FY2017 dataset. Develop the variables and compile them into visualization programs to produce appropriate visualizations. Draw a geo map and scatter plot to compare the fastest growth in fixed value and in percentages. Extract relevant aspects and analyze the changes in employers' preferences as well as forecasts for future trends.
Visualizations
Combo chart: shows the overall volume of receipts and the approval rate.
Scatter plot: shows the beneficiary country of birth.
Geo map: shows all states of H1B petitions filed.
Line chart: shows the top 10 states of H1B petitions filed.
Pie chart: compares education level and occupations for petitions, FY2011 vs FY2017.
Tree map: shows the top employers who submit the greatest number of applications.
Side-by-side bar chart: shows an overall comparison of Data Scientist and Data Analyst.
Highlight table: shows the mean wage of a Data Scientist and Data Analyst with case status certified.
Bubble chart: shows the top 10 companies for Data Scientist and Data Analyst.
Related Research
The H-1B Visa Debate, Explained - Harvard Business Review: https://hbr.org/2017/05/the-h-1b-visa-debate-explained
Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
Key facts about the U.S. H-1B visa program: http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
H-1B visa - Wikipedia: https://en.wikipedia.org/wiki/H-1B_visa
Key Findings
From the analysis, the government cut down the number of approvals for H1B in 2017. In the past decade, owing to the demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India. Technical jobs, such as Computer Systems Analyst and Software Developer, make up the majority of the top 10 jobs among foreign workers. Employers located in metro areas strive to find a foreign workforce that can fill the technical positions in their organizations. States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are prime locations for foreign workers and provide many job opportunities. Top companies that submit the most H1B visa applications, such as Infosys, Tata, and IBM India, are companies based in India associated with software and IT services. The Data Scientist position has experienced exponential growth in H1B visa applications, and those jobs are clustered most heavily in the West region.
Visualization programs
HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
According to our latest research, the global Web Skimming JavaScript Protections market size reached USD 1.22 billion in 2024, demonstrating robust demand for advanced security solutions that safeguard digital assets. The market is expanding at a CAGR of 17.6% and is forecasted to attain a value of USD 5.19 billion by 2033. This growth is primarily fueled by the escalating threat landscape, with sophisticated web skimming attacks targeting sensitive payment data and personal information across online platforms. The increasing adoption of digital transactions and the proliferation of e-commerce have further intensified the need for comprehensive JavaScript protection solutions, making this sector one of the fastest-growing segments in the cybersecurity industry as of 2025.
The primary growth driver for the Web Skimming JavaScript Protections market is the exponential rise in web-based attacks, particularly those exploiting JavaScript vulnerabilities. Cybercriminals have become increasingly adept at injecting malicious scripts into legitimate websites, enabling them to steal sensitive payment card information and personal data. This trend has compelled organizations, especially in sectors like e-commerce and BFSI, to prioritize the deployment of advanced security solutions capable of detecting and neutralizing such threats in real time. The growing awareness of regulatory requirements, such as GDPR, PCI DSS, and other data protection mandates, is also pushing businesses to invest in robust JavaScript security frameworks, bolstering overall market growth.
Another significant factor propelling the market is the rapid digital transformation across industries. As organizations accelerate their migration to cloud-based infrastructure and adopt digital channels to enhance customer engagement, their attack surface expands, making them more susceptible to web skimming attacks. This shift has heightened the demand for scalable and adaptable security solutions that can be seamlessly integrated into dynamic web environments. The emergence of advanced threat detection technologies, such as AI-driven anomaly detection and behavior analytics, has further enhanced the effectiveness of JavaScript protection solutions, enabling organizations to proactively identify and mitigate evolving threats. These technological advancements are expected to play a critical role in sustaining the market’s upward trajectory over the forecast period.
The increasing complexity of the threat landscape, coupled with the rise of sophisticated attack vectors such as Magecart and formjacking, has led to a surge in demand for multi-layered security solutions. Enterprises are now seeking comprehensive protection strategies that encompass web application firewalls, content security policy management, and bot mitigation, among other measures. The convergence of these solutions within integrated security platforms offers organizations a holistic approach to web skimming protection, minimizing the risk of data breaches and financial losses. Furthermore, the growing adoption of managed security services by small and medium enterprises (SMEs) is democratizing access to advanced JavaScript protection tools, thereby expanding the addressable market and driving overall growth.
Regionally, North America continues to dominate the Web Skimming JavaScript Protections market, accounting for the largest share due to the high concentration of e-commerce platforms and stringent regulatory frameworks. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitization, increasing internet penetration, and a burgeoning online retail sector. Europe also represents a significant market, driven by robust data protection laws and the growing sophistication of cyber threats. Latin America and the Middle East & Africa are witnessing steady growth, with governments and enterprises increasingly recognizing the importance of web security. This global expansion underscores the critical need for effective JavaScript protection solutions across diverse industries and geographies.
Use of a persistent identifier for access to journal articles (the DOI) is now almost universal amongst researchers. It directs to the journal landing page, where the human has to take over navigation (or payment). Recently, the deposition of data into open access repositories and the resulting assignment of a data-DOI to the data or fileset has started to be increasingly adopted, and in the near future will probably be mandated by funders. Unfortunately, mechanisms for the retrieval and application of the data from such sources are still inherited from those developed for journal articles. We argue these mechanisms are not fit for (data) purpose. In these three demonstrations, we show how existing standards can be used to automate the data retrieval process, starting purely from the DOI assigned to the objects. The first of these utilises the 10320/loc method (see doi:10.1021/ci500302p), which is flexible and efficient but is not supported by the DataCite registry. The next two schemes were developed to achieve such interoperability, the first using the DataCite Media API and the second exploiting added metadata such as relatedMetadataScheme = ORE to use the repository ORE resource map. We have embedded these methods into a JavaScript-based data viewing demonstrator (JSmol), which is designed to display molecular information. Handlers for other types of data could be readily incorporated, and the system could also be exploited for data mining. Examples of recently published journal articles which use such data-DOI handling will be cited.
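As a minimal sketch of "starting purely from the DOI", the request below asks the DOI resolver for DataCite metadata via content negotiation instead of the human landing page. This illustrates only the general mechanism, not the 10320/loc or ORE schemes described above, and the DOI shown is a placeholder:

```python
from urllib.request import Request

# DataCite's registered media type for machine-readable DOI metadata.
DATACITE_JSON = "application/vnd.datacite.datacite+json"

def doi_metadata_request(doi: str) -> Request:
    """Build a content-negotiation request that asks the DOI resolver
    for DataCite JSON metadata rather than the journal landing page."""
    return Request(f"https://doi.org/{doi}", headers={"Accept": DATACITE_JSON})

req = doi_metadata_request("10.5281/zenodo.1234567")  # placeholder DOI
# Sending it (urllib.request.urlopen(req)) would return DataCite JSON,
# from which media/content URLs for the deposited data can be extracted.
```

The same request with a different Accept header (e.g. application/vnd.citationstyles.csl+json) yields other metadata serialisations, which is what makes DOI-driven automation possible.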
Download or connect to open data endpoints.
Get data: Download data as spreadsheet, KML, shapefile, or connect to service APIs to stay up to date.
Create maps: Create maps, analyse and discover trends. Watch video instructions.
Code apps: Make applications with our data using the ArcGIS API for JavaScript.
Categories: City Council; Assets, amenities and public space; Council services and facilities; Culture, leisure and sport; Economy and business; Environment and climate; Planning; Transport and access.
Terms: Unless otherwise stated, data products available from the data hub are published under Creative Commons licences. For terms of use and more information, see the site Disclaimer.
Contact: If you have questions, comments, or requests for interactive maps and data, we would love to hear from you.
Council business: For information on rates, development applications, strategies, reports and other council business, see the City of Sydney's main website.
As of 2025, JavaScript and HTML/CSS are the most commonly used programming languages among software developers around the world, with more than 66 percent of respondents stating that they used JavaScript and around 61.9 percent using HTML/CSS. Python, SQL, and Bash/Shell rounded out the top five most widely used programming languages around the world. Programming languages At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today’s society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers are constantly seeking new languages and resources to learn to add to their skills. Furthermore, programming knowledge is becoming an important skill to possess within various industries throughout the business world. Job seekers with skills in Python, R, and SQL will find their knowledge to be among the most highly desirable data science skills and likely assist in their search for employment.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Understanding JSON Data Extraction:
Have you ever wondered how datasets are prepared from JSON after calling their APIs? This repository aims to demystify this process by providing five JSON files for exploration. Each file represents a snapshot of data obtained from different API endpoints.
Dataset Overview:
Data Source: API endpoints providing JSON data.
File Format: JSON (JavaScript Object Notation).
Number of Files: 5
Total Records: Varies across files.
Data Exploration:
Each JSON file contains structured data representing various aspects of the dataset. Explore different attributes and nested structures within the JSON files. Understand how to navigate and extract relevant information using programming languages like Python.
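A minimal Python sketch of navigating such nested structures; the record shape below is hypothetical (the real files' keys may differ), it only illustrates pulling flat rows out of nested JSON:

```python
import json

# Hypothetical record shaped like nested restaurant JSON from an API;
# the actual files in this dataset may use different keys.
raw = '''
{
  "restaurants": [
    {"restaurant": {"name": "Cafe One",
                    "location": {"city": "Bangalore"},
                    "user_rating": {"aggregate_rating": "4.1"}}}
  ]
}
'''

def extract_rows(payload: str) -> list:
    """Walk the nested structure and pull out one flat row per restaurant."""
    data = json.loads(payload)
    rows = []
    for entry in data.get("restaurants", []):
        r = entry["restaurant"]
        rows.append({
            "name": r["name"],
            "city": r["location"]["city"],
            "rating": float(r["user_rating"]["aggregate_rating"]),
        })
    return rows

rows = extract_rows(raw)
# → [{'name': 'Cafe One', 'city': 'Bangalore', 'rating': 4.1}]
```

Flat rows like these are what end up in a consolidated CSV such as the final dataset below.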
Included Files:
file1.json file2.json file3.json file4.json file5.json
Final Dataset: Zomato_Final_Data.csv
After extracting and preprocessing data from the five JSON files, a consolidated data frame has been created. The data frame provides a unified view of the data, facilitating analysis and modeling tasks.
Contribute Your Version: Feel free to contribute your code snippets for data extraction. Share your insights and techniques with the community to foster learning and collaboration.
Acknowledgements: Special thanks to Krish Naik and Zomato for providing the data used in this repository.
Feedback and Support: For any questions, feedback, or assistance, please reach out via [contact information].
According to our latest research, the global JavaScript Integrity Monitoring Network market size reached USD 1.24 billion in 2024, and is projected to grow at a robust CAGR of 14.2% from 2025 to 2033. By the end of 2033, the market is forecasted to attain a value of USD 4.03 billion. This significant growth is being driven by the increasing prevalence of cyber threats targeting web applications, the rising complexity of JavaScript-based attacks, and the growing emphasis on regulatory compliance and data protection across diverse industries.
The primary growth factor propelling the JavaScript Integrity Monitoring Network market is the exponential rise in sophisticated cyberattacks, particularly those exploiting JavaScript vulnerabilities in web applications. As organizations increasingly rely on dynamic, client-side scripts to deliver seamless digital experiences, the attack surface for malicious actors has expanded considerably. High-profile incidents involving formjacking, supply chain attacks, and cross-site scripting have underscored the urgent need for advanced integrity monitoring solutions that can detect unauthorized code changes in real time. Enterprises are now prioritizing proactive security measures, integrating JavaScript integrity monitoring tools to safeguard customer data, maintain brand reputation, and ensure business continuity in an evolving threat landscape.
Another critical driver is the tightening regulatory environment, with governments and industry bodies imposing stringent standards for data privacy and security. Regulations such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and various sector-specific mandates require organizations to implement robust security controls for web applications. JavaScript integrity monitoring solutions play a pivotal role in achieving compliance by providing continuous oversight, detailed audit trails, and automated alerts for suspicious activities. As penalties for non-compliance grow more severe, organizations across BFSI, healthcare, retail, and other sectors are accelerating investments in comprehensive web security frameworks, further fueling market growth.
The surge in digital transformation initiatives across industries is also contributing to the expansion of the JavaScript Integrity Monitoring Network market. Businesses are embracing cloud-native architectures, microservices, and third-party integrations to enhance agility and customer engagement. However, these advancements introduce new complexities and dependencies in the web application ecosystem, making it increasingly challenging to manage and monitor code integrity. JavaScript integrity monitoring networks offer scalable, automated, and real-time protection against emerging threats, enabling organizations to innovate securely without compromising on user experience or operational efficiency.
Regionally, North America continues to dominate the JavaScript Integrity Monitoring Network market, accounting for the largest revenue share in 2024, followed by Europe and Asia Pacific. The high adoption rate of advanced cybersecurity solutions, the presence of leading technology providers, and the early implementation of regulatory standards have positioned North America as a frontrunner in this domain. Meanwhile, Asia Pacific is witnessing the fastest growth, driven by rapid digitalization, increasing cybercrime incidents, and rising awareness of web security best practices among enterprises and government agencies.
The JavaScript Integrity Monitoring Network market is segmented by component into software, hardware, and services. The software segment currently holds the largest share, owing to the critical role of specialized applications in detecting, analyzing, and mitigating JavaScript-based threats. These solutions are designed to provide real-time monitoring, anomaly detection, and automated response capabilities, making them indispensable for organizations seeking to protect their web applications.
With the exception of metabolic simulations performed using TIMES (version 2.31.2.82), all work was performed using Python (version 3.10.4) run with IPython (version 8.4.0) in JupyterLab (version 3.3.2). The Toolbox API (OECD Toolbox version 4.5 with Service Pack 1 update, API version 6), and BioTransformer (Wishart Lab, version 3.0, executable Java Archive, June 15, 2022 release) were used for automated metabolic simulations. Efficient batch execution of metabolism simulations was handled via parallel processing multiple individual calls to either BioTransformer or the Toolbox API via the “multiprocess” package. The command line interface (CLI) calls needed to interact with BioTransformer were executed via the “subprocess” package, and the Toolbox API was queried via its Swagger user interface hosted on a locally running Windows Desktop instance of the Toolbox Server. The data generated from the MetSim hierarchical schema were translated into JavaScript Object Notation (JSON) format using Python. The resulting data were inserted into a Mongo Database (MongoDB) using the “pymongo” package for efficient storage and retrieval. The code repository including all Jupyter Notebooks documenting the analysis performed and the MetSim framework are available at https://github.com/patlewig/metsim. Data files needed to reproduce the analysis are provided at https://doi.org/10.23645/epacomptox.25463926 and as Supporting Information. This dataset is associated with the following publication: Groff, L., A. Williams, I. Shah, and G. Patlewicz. MetSim: Integrated Programmatic Access and Pathway Management for Xenobiotic Metabolism Simulators. CHEMICAL RESEARCH IN TOXICOLOGY. American Chemical Society, Washington, DC, USA, 37(5): 685-697, (2024).
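A minimal sketch of driving a Java-archive CLI via the "subprocess" package, as described above; the flag names and jar filename here are illustrative assumptions, not BioTransformer's documented options:

```python
import subprocess

def biotransformer_cmd(smiles: str, out_csv: str,
                       jar: str = "BioTransformer3.0.jar") -> list:
    """Assemble a CLI invocation as an argument list. The -ismi/-ocsv
    flag names are placeholders standing in for the tool's real options."""
    return ["java", "-jar", jar, "-ismi", smiles, "-ocsv", out_csv]

cmd = biotransformer_cmd("CCO", "ethanol_metabolites.csv")
# subprocess.run(cmd, check=True, capture_output=True)  # runs the simulator
```

In the batch setting described above, many such calls would be dispatched in parallel (the paper used the "multiprocess" package), with each call's CSV output parsed back into the MetSim JSON schema.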
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided comes from the Local Government Association of South Australia's statewide Metabase, which is part of its Electronic Services Program initiative. This data is used to support the Local Government Association's My Local Services App (http://www.lga.sa.gov.au/mylocal) initiative. The API covers the following statewide Local Government datasets: Elected Members (Mayors and Councillors), Events, Libraries, Parks and Councils. The following SDKs are available for developers to access data stored in Parse: iOS, OSX, Android, JavaScript, .NET, and the REST API.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The popularity and wide adoption of JavaScript both at the client and server-side makes its code analysis more essential than ever before. Most of the algorithms for vulnerability analysis, coding issue detection, or type inference rely on the call graph representation of the underlying program. Luckily, there are quite a few tools to get this job done already. However, their performance in vitro and especially in vivo has not yet been extensively compared and evaluated.
In this paper, we systematically compare five static and two dynamic approaches for building JavaScript call graphs on 26 WebKit SunSpider benchmark programs, and two static and two dynamic methods on 12 real-world Node.js modules. The tools under examination using static techniques were npm call graph, IBM WALA, Google Closure Compiler, Approximate Call Graph, and Type Analyzer for JavaScript (TAJS). We performed dynamic analyses relying on the nodejs-cg tool (a customized Node.js runtime) and the NodeProf instrumentation and profiling framework.
We provide a quantitative evaluation of the results, and a result quality analysis based on 941 manually validated call edges. On the SunSpider programs, which do not take any inputs, so dynamic extraction could be complete, all the static tools also performed well. For example, TAJS found 93% of all edges while having a 97% precision compared to the precise dynamic call graph. When it comes to real-world Node.js modules, our evaluation shows that static tools struggle with parsing the code and fail to detect a significant amount of call edges that dynamic approaches can capture. Nonetheless, a significant number of edges not detected by dynamic approaches are also reported. Among these, however, there are also edges that are real, but for some reason the unit tests did not execute the branches in which these calls were included.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and its value chain, as published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published in the ACM International Conference Proceeding Series (ICPS).

Using the well-documented scripts, it is straightforward to reproduce our findings. Duplicating our key findings takes an estimated 1 hour of human time and 3 hours of computing time with MalwareInfectionSet, around one hour with VictimAccessSet, and minutes for the price calculations with AccountAccessSet. See the included README.md files and Python scripts.

We represent each victim by a single JavaScript Object Notation (JSON) data file. From the source sets of victim JSON files we extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

1. MalwareInfectionSet: We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs, which are available for free. We utilise 245 malware log dumps from 2019 and 2020, originating from 14 malware networks. The dataset contains 1.8 million victim files and is 15 GB in size.

2. VictimAccessSet: We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to a victim's online accounts, including passwords and usernames, but also a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure prices on Genesis Market and how compromised-device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files and is 3.5 GB in size.

3. AccountAccessSet: The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoin. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We collected data from Database Market, where vendors sell online credentials, and investigated it similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files and is 400 MB in size.

Credits
Authors: Billy Bob Brumley (Tampere University, Tampere, Finland), Juha Nurmi (Tampere University, Tampere, Finland), Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding: This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
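Because each victim is a single JSON file, the price calculations can be reproduced with a short script. The sketch below is illustrative only: the field name `price_usd` is an assumption for demonstration, not the actual dataset schema, which is documented in the included README.md files.

```python
import json
import statistics
from pathlib import Path

def summarize_prices(victim_dir):
    """Aggregate listing prices across per-victim JSON files.

    NOTE: 'price_usd' is a hypothetical field name used for
    illustration; consult the dataset's README.md for the schema.
    """
    prices = []
    for path in Path(victim_dir).glob("*.json"):
        record = json.loads(path.read_text(encoding="utf-8"))
        price = record.get("price_usd")
        if isinstance(price, (int, float)):
            prices.append(price)
    if not prices:
        return {"count": 0, "mean": None, "median": None}
    return {
        "count": len(prices),
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
    }
```

The one-file-per-victim layout keeps this kind of aggregation embarrassingly simple: no joins, just a glob over the directory.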
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset we used in our paper entitled "Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model". The manually validated dataset contains various static source code metrics along with the vulnerability-fixing commit hashes for numerous vulnerabilities. For more details, see the paper.
Security has become a central and unavoidable aspect of today's software development. Practitioners and researchers have proposed many code analysis tools and techniques to mitigate security risks. These tools apply static and dynamic analysis or, more recently, machine learning. Machine learning models can achieve impressive results in finding and forecasting possible security issues in programs. However, there are at least two areas where most of the current approaches fall short of developer demands: explainability and granularity of predictions. In this paper, we propose a novel, simple, yet promising approach to identify potentially vulnerable source code in JavaScript programs. The model improves the state of the art in terms of explainability and prediction granularity as it gives results at the level of individual source code lines, which is fine-grained enough for developers to take immediate action. Additionally, the model explains each predicted line (i.e., provides the most similar vulnerable line from the training set) using a prototype-based approach. In a study of 186 real-world and confirmed JavaScript vulnerability fixes in 91 projects, the approach could flag 60% of the known vulnerable lines on average by marking only 10% of the code base, and in certain cases the model identified 100% of the vulnerable code lines while flagging only 8.72% of the code base.
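The prototype-based explanation idea can be sketched in a few lines: score each line of new code against the known-vulnerable lines from the training set and surface the closest match as the explanation. This is a toy approximation using token-set Jaccard similarity; the paper defines its own line representation and similarity measure.

```python
def jaccard(a_tokens, b_tokens):
    """Jaccard similarity between two token sets."""
    if not a_tokens and not b_tokens:
        return 0.0
    return len(a_tokens & b_tokens) / len(a_tokens | b_tokens)

def explain_line(line, prototypes):
    """Return the most similar known-vulnerable line (the 'prototype')
    and its similarity score. `prototypes` is a list of source lines
    taken from confirmed vulnerability fixes in the training set."""
    tokens = set(line.split())
    best = max(prototypes, key=lambda p: jaccard(tokens, set(p.split())))
    return best, jaccard(tokens, set(best.split()))
```

Returning the matched prototype, rather than just a score, is what makes the prediction actionable: the developer sees a concrete historical vulnerability that resembles their line.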
If you wish to use our dataset, please cite this dataset, or the corresponding paper:
@inproceedings{mosolygo2021towards,
  title={Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model},
  author={Mosolyg{\'o}, Bal{\'a}zs and V{\'a}ndor, Norbert and Antal, G{\'a}bor and Heged{\H{u}}s, P{\'e}ter and Ferenc, Rudolf},
  booktitle={2021 International Conference on Code Quality (ICCQ)},
  pages={15--25},
  year={2021},
  organization={IEEE}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is the result of three crawls of the web performed in May 2018. The data contains raw crawl data and instrumentation captured by OpenWPM-Mobile, as well as analysis that identifies which scripts access mobile sensors, which ones perform some form of browser fingerprinting, and clustering of scripts based on their intended use. The dataset is described in the included README.md file; more details about the methodology can be found in our ACM CCS'18 paper: Anupam Das, Gunes Acar, Nikita Borisov, Amogh Pradeep. The Web's Sixth Sense: A Study of Scripts Accessing Smartphone Sensors. In Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS), Toronto, Canada, October 15–19, 2018. (Forthcoming)
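A first-pass version of the "which scripts access sensors" question can be approximated with a lexical scan for sensor event APIs. OpenWPM-Mobile instruments the actual JavaScript property accesses at runtime, so the pattern list below is only an illustrative subset, not the study's methodology.

```python
import re

# A few of the W3C sensor-related APIs a script might reference.
# Illustrative subset only; the study's instrumentation records
# actual runtime accesses rather than scanning source text.
SENSOR_PATTERNS = re.compile(
    r"devicemotion|deviceorientation|DeviceMotionEvent|"
    r"DeviceOrientationEvent|AmbientLightSensor|Gyroscope|Accelerometer"
)

def accesses_sensors(js_source: str) -> bool:
    """Heuristically flag a script that references sensor APIs."""
    return SENSOR_PATTERNS.search(js_source) is not None
```

A static scan like this misses obfuscated or dynamically constructed accesses, which is precisely why the dataset relies on runtime instrumentation instead.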
These layers are used in the U.S. Vessel Traffic application, a web-based visualization and data-access utility created by Esri. Explore U.S. maritime activity, look for patterns of vessel activity such as around ports and fishing grounds, or download manageable subsets of this massive data set. Vessel traffic data are an invaluable resource made available to our community by the US Coast Guard, NOAA, and BOEM through Marine Cadastre. This information can help marine spatial planners better understand users of ocean space and identify potential space-use conflicts. To download this data for your own analysis, explore the Download Options, navigate to a NOAA Electronic Navigation Chart area of interest, and make your selection. This data was sourced from the Automatic Identification System (AIS) provided by USCG, NOAA, and BOEM through Marine Cadastre and aggregated for visualization and sharing in ArcGIS Pro. This application was built with the ArcGIS API for JavaScript. Access this data as an ArcGIS Online collection here. Learn more about AIS tracking here. Find more ocean and maritime resources in Living Atlas. Inquiries can be sent to Keith VanGraafeiland.
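Once a subset of the AIS data is downloaded, filtering vessel positions to an area of interest is straightforward. The record layout below (plain `lat`/`lon` keys) is a hypothetical simplification for illustration, not the exact column names of the Marine Cadastre export.

```python
def in_bbox(lat, lon, bbox):
    """True if (lat, lon) falls inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def filter_positions(records, bbox):
    """Keep AIS position records inside the bounding box.
    The 'lat'/'lon' keys are illustrative; map them to the
    columns of the actual AIS export you downloaded."""
    return [r for r in records if in_bbox(r["lat"], r["lon"], bbox)]
```

For the full national dataset this per-record approach is too slow; the application aggregates in ArcGIS Pro precisely so users can pull down manageable chart-area subsets first.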