Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.
Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.
Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.
Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.
Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.
Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with JavaScript enabled, which excludes web crawlers and API calls.
These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset we used in our paper entitled "Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model". The manually validated dataset contains several static source code metrics along with vulnerability-fixing hashes for numerous vulnerabilities. For more details, please refer to the paper cited below.
Security has become a central and unavoidable aspect of today’s software development. Practitioners and researchers have proposed many code analysis tools and techniques to mitigate security risks. These tools apply static and dynamic analysis or, more recently, machine learning. Machine learning models can achieve impressive results in finding and forecasting possible security issues in programs. However, there are at least two areas where most of the current approaches fall short of developer demands: explainability and granularity of predictions. In this paper, we propose a novel, simple, yet promising approach to identify potentially vulnerable source code in JavaScript programs. The model improves on the state of the art in terms of explainability and prediction granularity, as it gives results at the level of individual source code lines, which is fine-grained enough for developers to take immediate action. Additionally, the model explains each predicted line (i.e., provides the most similar vulnerable line from the training set) using a prototype-based approach. In a study of 186 real-world and confirmed JavaScript vulnerability fixes from 91 projects, the approach could flag 60% of the known vulnerable lines on average by marking only 10% of the codebase, and in certain cases the model identified 100% of the vulnerable code lines while flagging only 8.72% of the codebase.
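As a rough illustration of the prototype-based idea (not the authors' implementation), the sketch below ranks the lines of a target file by their distance to known vulnerable lines from a training set. The per-line metric vectors and all names are assumptions made purely for illustration.

import numpy as np

# Hypothetical per-line metric vectors (placeholders, not the paper's feature set).
train_vulnerable_lines = np.array([[3.0, 1.0, 2.0], [5.0, 2.0, 0.0]])   # known vulnerable prototypes
target_file_lines = np.array([[2.5, 1.0, 2.0], [0.0, 0.0, 1.0], [5.0, 2.0, 1.0]])

def nearest_prototype(line_vec, prototypes):
    """Return (index, distance) of the most similar vulnerable prototype."""
    dists = np.linalg.norm(prototypes - line_vec, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

# Rank target lines by how close they are to any vulnerable prototype;
# the nearest prototype doubles as the "explanation" for the prediction.
ranked = sorted(
    (nearest_prototype(vec, train_vulnerable_lines) + (line_no,)
     for line_no, vec in enumerate(target_file_lines, start=1)),
    key=lambda t: t[1],
)
for proto_idx, dist, line_no in ranked:
    print(f"line {line_no}: distance {dist:.2f} to vulnerable prototype #{proto_idx}")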
If you wish to use our dataset, please cite this dataset, or the corresponding paper:
@inproceedings{mosolygo2021towards,
title={Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model},
author={Mosolyg{\'o}, Bal{\'a}zs and V{\'a}ndor, Norbert and Antal, G{\'a}bor and Heged{\H{u}}s, P{\'e}ter and Ferenc, Rudolf},
booktitle={2021 International Conference on Code Quality (ICCQ)},
pages={15--25},
year={2021},
organization={IEEE}
}
The datatablesview extension for CKAN enhances the display of tabular datasets within CKAN by integrating the DataTables JavaScript library. As a fork of a previous DataTables CKAN plugin, this extension aims to provide improved functionality and maintainability for presenting data in a user-friendly and interactive tabular format. This tool focuses on making data more accessible and easier to explore directly within the CKAN interface.

Key Features:
- Enhanced Data Visualization: Transforms standard CKAN dataset views into interactive tables using the DataTables library, providing a more engaging user experience compared to plain HTML tables.
- Interactive Table Functionality: Includes features such as sorting, filtering, and pagination within the data table, allowing users to easily navigate and analyze large datasets directly in the browser.
- Improved Data Accessibility: Makes tabular data more accessible to a wider range of users by providing intuitive tools to explore and understand the information.
- Presumed Customizable Appearance: Given that it is based on DataTables, users will likely be able to customize the look and feel of the tables through DataTables configuration options (note: this is an assumption based on standard DataTables usage and may require coding).

Use Cases (based on typical DataTables applications):
- Government Data Portals: Display complex government datasets in a format that is easy for citizens to search, filter, and understand, enhancing transparency and promoting data-driven decision-making. For example, presenting financial data, population statistics, or environmental monitoring results.
- Research Data Repositories: Allow researchers to quickly explore and analyze large scientific datasets directly within the CKAN interface, facilitating data discovery and collaboration.
- Corporate Data Catalogs: Enable business users to easily access and manipulate tabular data relevant to their roles, improving data literacy and enabling data-informed business strategies.

Technical Integration (inferred from CKAN extension structure): The extension likely operates by leveraging CKAN's plugin architecture to override the default dataset view for tabular data. Its implementation likely uses CKAN's templating system to render datasets with DataTables' JavaScript and CSS, enhancing the data-viewing experience.

Benefits & Impact: By implementing the datatablesview extension, organizations can improve the user experience when accessing and exploring tabular datasets within their CKAN instances. The enhanced interactivity and data exploration features can lead to increased data utilization, improved data literacy, and more effective data-driven decision-making within organizations and communities.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Language | Number of Samples
---|---
Java | 153,119
Ruby | 233,710
Go | 137,998
JavaScript | 373,598
Python | 472,469
PHP | 294,394
This dataset shows whether each dataset on data.maryland.gov has been updated recently enough. For example, datasets containing weekly data should be updated at least every 7 days. Datasets containing monthly data should be updated at least every 31 days. This dataset also shows a compendium of metadata from all data.maryland.gov datasets.
This report was created by the Department of Information Technology (DoIT) on August 12, 2015. New reports will be uploaded daily (this report is itself included in the report, so that users can see whether new reports are consistently being uploaded). Generation of this report uses the Socrata Open Data API to retrieve metadata on the date of last data update and the update frequency. Analysis and formatting of the metadata use JavaScript, jQuery, and AJAX.
This report will be used during meetings of the Maryland Open Data Council to curate datasets for maintenance and make sure the Open Data Portal's data stays up to date.
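For orientation, a minimal sketch of the kind of freshness check described above, assuming the portal's Socrata views listing endpoint (/api/views.json) and its rowsUpdatedAt field; the field names and threshold are assumptions, not the DoIT implementation.

import datetime
import requests

# Fetch metadata for all views on the portal (assumed Socrata endpoint).
views = requests.get("https://data.maryland.gov/api/views.json", timeout=30).json()

STALE_AFTER_DAYS = 31  # e.g., monthly datasets should be no older than 31 days
now = datetime.datetime.now(datetime.timezone.utc)

for view in views:
    updated_epoch = view.get("rowsUpdatedAt")  # assumed field: last data update (epoch seconds)
    if updated_epoch is None:
        continue
    age_days = (now - datetime.datetime.fromtimestamp(updated_epoch, datetime.timezone.utc)).days
    if age_days > STALE_AFTER_DAYS:
        print(f"{view.get('name')}: last updated {age_days} days ago")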
This data set contains individual search sessions from the transaction log of the academic search engine sowiport (www.sowiport.de). The data was collected over a period of one year (between 2nd April 2014 and 2nd April 2015). The web server log files and specific JavaScript-based logging techniques were used to capture the usage behaviour within the system. All activities are mapped to a list of 58 actions. This list covers all types of activities and pages that can be carried out/visited within the system (e.g., typing a query, visiting a document, selecting a facet, etc.). For each action, a session id, the date stamp and additional information (e.g., queries, document ids, and result lists) are stored. The session id is assigned via a browser cookie and allows tracking user behaviour over multiple searches. Based on the session id and date stamp, the step in which an action is conducted and the length of the action are included in the data set as well. The data set contains 558,008 individual search sessions and a total of 7,982,427 log entries. The average number of actions per search session is 7.
This work was funded by Deutsche Forschungsgemeinschaft (DFG), grant no. MA 3964/5-1; the AMUR project at GESIS.
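To give a sense of how the log entries described above can be aggregated, here is a small sketch that groups entries by session id and computes actions per session; the file name and column name are assumed for illustration and may differ from the actual export.

import csv
from collections import defaultdict

actions_per_session = defaultdict(int)

# Assumed: a CSV export with one log entry per row and a "session_id" column.
with open("sowiport_log.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        actions_per_session[row["session_id"]] += 1

sessions = len(actions_per_session)
total_actions = sum(actions_per_session.values())
print(f"{sessions} sessions, {total_actions} log entries, "
      f"{total_actions / sessions:.1f} actions per session on average")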
The CodeSearchNet Corpus is a large dataset of functions with associated documentation written in Go, Java, JavaScript, PHP, Python, and Ruby from open source projects on GitHub. The CodeSearchNet Corpus includes:
* Six million methods overall
* Two million of which have associated documentation (docstrings, JavaDoc, and more)
* Metadata that indicates the original location (repository or line number, for example) where the data was found
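For orientation, a small sketch of reading one of the corpus' JSON-lines shards and keeping only documented functions; the file name and field names ("docstring", "repo", "func_name") are assumptions based on common CodeSearchNet distributions.

import json

documented = []

# Assumed: one function record per line in a JSON-lines shard.
with open("javascript_train_0.jsonl", encoding="utf-8") as fh:
    for line in fh:
        record = json.loads(line)
        if record.get("docstring"):          # keep only functions with documentation
            documented.append((record.get("repo"), record.get("func_name")))

print(f"{len(documented)} documented functions in this shard")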
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceedings Series (ICPS).
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
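As a sketch of how the per-victim JSON files can be processed (the directory layout and field name below are hypothetical; consult the included README.md files for the actual schema):

import json
from pathlib import Path

victims = 0
networks = set()

# Assumed layout: one JSON file per victim under a dataset directory.
for path in Path("MalwareInfectionSet").rglob("*.json"):
    with path.open(encoding="utf-8") as fh:
        victim = json.load(fh)
    victims += 1
    networks.add(victim.get("malware_network", "unknown"))  # hypothetical field

print(f"{victims} victim files from {len(networks)} malware networks")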
MalwareInfectionSet: We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs, which are available for free. We utilise 245 malware log dumps from 2019 and 2020, originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
VictimAccessSet: We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
AccountAccessSet: The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We collect data from the Database marketplace, where vendors sell online credentials, and investigate it in a similar way. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
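A correspondingly small sketch of the kind of price aggregation mentioned above, again with a hypothetical field name ("price_usd") rather than the dataset's real schema:

import json
import statistics
from pathlib import Path

prices = []
for path in Path("AccountAccessSet").rglob("*.json"):
    listing = json.loads(path.read_text(encoding="utf-8"))
    price = listing.get("price_usd")          # hypothetical field
    if isinstance(price, (int, float)):
        prices.append(price)

if prices:
    print(f"{len(prices)} listings, median price {statistics.median(prices):.2f} USD")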
Credits

Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results.

The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100-point subset of pixels within the GRCA-PJ ecosystem.

The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE JavaScript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'.

The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry, demonstrated both for a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle. Boxplots show interquartile ranges; whiskers extend to the 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833).

The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (e.g., the folders Rproj.user and packrat, and the files .RData and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the R-processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to the packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may use the descriptive documentation, the phenopix package documentation, and the description/references provided in the associated journal article to process the data and achieve the same results using newer packages or other software programs.
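To illustrate the two models named above, the sketch below fits them with ordinary least squares in Python (the published workflow uses R); the column names "ndvi", "solar_to_sensor_angle", "sensor_zenith_angle", and the "scatter" flag are assumptions about the data files, not their documented schema.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumed columns; the actual CSV/GEE exports may use different names.
df = pd.read_csv("SolarSensorAngles.csv")
df["log_ndvi"] = np.log(df["ndvi"])

# Forward-scatter model: log(NDVI) ~ solar-to-sensor angle
forward = smf.ols("log_ndvi ~ solar_to_sensor_angle",
                  data=df[df["scatter"] == "forward"]).fit()

# Back-scatter model: log(NDVI) ~ solar-to-sensor angle + sensor zenith angle
back = smf.ols("log_ndvi ~ solar_to_sensor_angle + sensor_zenith_angle",
               data=df[df["scatter"] == "back"]).fit()

print(f"forward-scatter R^2: {forward.rsquared:.3f}")
print(f"back-scatter R^2: {back.rsquared:.3f}")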
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In recent years, browsers have reduced the identifying information in user-agent strings to enhance user privacy. However, Chrome has also introduced high-entropy user-agent client hints (UA-CH) and a new JavaScript API to provide access to specific browser details. The study assesses the impact of these changes on the top 100,000 websites by using an instrumented crawler to measure access to high-entropy browser features via UA-CH HTTP headers and the JavaScript API. It also investigates whether tracking, advertising, and browser fingerprinting scripts have started using these new client hints and the JavaScript API.
By Asuman Senol and Gunes Acar. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society.
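As a small illustration of the header-based mechanism the study measures (not the authors' crawler), the sketch below fetches a page and reports which client hints the server requests via the Accept-CH response header; detecting access through the JavaScript API requires the instrumented-browser approach described above.

import requests

# Servers opt in to high-entropy client hints by listing them in Accept-CH.
resp = requests.get("https://example.com", timeout=30)
accept_ch = resp.headers.get("Accept-CH", "")

requested_hints = [h.strip() for h in accept_ch.split(",") if h.strip()]
high_entropy = [h for h in requested_hints
                if h.lower() in {"sec-ch-ua-full-version-list", "sec-ch-ua-full-version",
                                 "sec-ch-ua-model", "sec-ch-ua-platform-version",
                                 "sec-ch-ua-arch", "sec-ch-ua-bitness"}]

print("requested hints:", requested_hints or "none")
print("high-entropy hints:", high_entropy or "none")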
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SIMPATICO logs for the user evaluation of Trento in project iterations 1 and 2
This package contains the interaction log data captured during the two Trento evaluations of the results of the H2020 project SIMPATICO, which were undertaken from September 2017 to January 2019. The data is exported from the Elasticsearch instance that was used to log all of the interaction data. The data model for this can be found in project deliverable "D3.3 Advanced Methods And Tools For User Interaction Automation". For more information about the setup for conducting the tests and the results achieved, please consult project deliverable "D6.6 SIMPATICO Evaluation Report v2". All project deliverables, except where noted, are public and are available in the Zenodo community at https://zenodo.org/communities/h2020-simpatico-692819.
The following caveats need to be highlighted for this data set: - The format is JSON (JavaScript objects), as provided by Elasticsearch.
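A short sketch of reading such an Elasticsearch export in Python; the top-level "hits"/"_source" layout and the "eventType" field are assumptions about how the dump was produced, so adjust them to the actual export.

import json
from collections import Counter

with open("simpatico_logs.json", encoding="utf-8") as fh:
    dump = json.load(fh)

# Assumed Elasticsearch-style layout: {"hits": {"hits": [{"_source": {...}}, ...]}}
events = [hit["_source"] for hit in dump.get("hits", {}).get("hits", [])]
event_types = Counter(event.get("eventType", "unknown") for event in events)  # hypothetical field

print(f"{len(events)} interaction events")
for event_type, count in event_types.most_common(5):
    print(f"  {event_type}: {count}")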
https://spectrum.library.concordia.ca/policies.html#TermsOfAccesshttps://spectrum.library.concordia.ca/policies.html#TermsOfAccess
This book is the result of teaching the laboratory component of an introductory course in Database Systems in the Department of Computer Science & Software Engineering, Concordia University, Montreal. The intent of this part of the course was to have the students create a practical web-based application wherein the database forms the dynamic component of a real-life application, using a web browser as the user interface.
It was decided to use all open source software, namely the Apache web server, PHP, JavaScript and HTML, along with the open source database that started as MySQL and has since migrated to MariaDB.
The examples given in this book have been run successfully both using MySQL on a Windows platform and MariaDB on a Linux platform without any changes. However, the code may need to be updated as the underlying software systems evolve with time, as functions are deprecated and replaced by others. Hence the user is responsible for making any required changes to any code given in this book.
The readers are also warned of the changing privacy and data usage policies of most web sites. They should be aware that most web sites collect and mine users’ data for private profit.
The authors wish to acknowledge the contribution of many students in the introductory database course over the years, whose needs, together with the involvement of one of the authors in the early days of the web, prompted the start of this project in the late part of the 20th century. This was the era of the dot-com bubble.
The User Tracking extension for CKAN provides an interface within the CKAN admin panel to monitor user activity and engagement. It aims to facilitate data-driven insights into how users interact with the CKAN platform by displaying user, organizational, and individual page engagement metrics. Furthermore, it includes a command-line interface (CLI) tool for exporting user activity data.

Key Features:
- Activity Tracking Tab: Adds a dedicated "Activity tracking" tab to the CKAN admin page to display user engagement and activity data.
- Engagement Data: Presents data related to user, organizational, and individual page engagement.
- Database Tracking: Relies on the creation of a useractivitytracker database table, which is updated via JavaScript embedded within CKAN pages using POST requests.
- Data Export: Offers a CLI command for exporting user activity data from the useractivitytracker table into a CSV file, covering a specified number of past days.
- MVC Structure: The tracked table data is displayed in an MVC format.

Technical Integration: The extension integrates with CKAN by adding a new plugin that creates a middleware component. This middleware collects data via JavaScript embedded throughout CKAN pages, sending POST requests to update the useractivitytracker table. Additional configuration may be required to activate the plugin through the ckan.plugins setting in the CKAN configuration file (ckan.ini).

Benefits & Impact: By implementing the User Tracking extension, CKAN administrators can gain insights into user behavior and platform usage patterns. This data can inform decisions related to content optimization, user support, and overall platform improvement. Note that the extension is explicitly mentioned as compatible with CKAN 2.9, but compatibility with more recent versions remains untested or unmentioned.
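For illustration only, a sketch of summarising the exported CSV by day; the export file name and the "timestamp" column are placeholders, since the actual useractivitytracker schema is not described here.

import csv
from collections import Counter

events_per_day = Counter()

with open("user_activity_export.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        day = row.get("timestamp", "")[:10]   # placeholder column; keep the YYYY-MM-DD part
        events_per_day[day] += 1

for day, count in sorted(events_per_day.items()):
    print(f"{day}: {count} tracked events")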
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present GHTraffic, a dataset of significant size comprising HTTP transactions extracted from GitHub data (i.e., from the 04 August 2015 GHTorrent issues snapshot) and augmented with synthetic transaction data. This dataset facilitates reproducible research on many aspects of service-oriented computing.
The GHTraffic dataset comprises three different editions: Small (S), Medium (M) and Large (L). The S dataset includes HTTP transaction records created from the google/guava repository. Guava is a popular Java library containing utilities and data structures. The M dataset includes records from the npm/npm project; npm is the popular de-facto standard package manager for JavaScript. The L dataset contains data created by selecting eight repositories hosting large and very active projects, including twbs/bootstrap, symfony/symfony, docker/docker, Homebrew/homebrew, rust-lang/rust, kubernetes/kubernetes, rails/rails, and angular/angular.js.
We also provide access to the scripts used to generate GHTraffic. Using these scripts, users can modify the configuration properties in the config.properties file in order to create a customised version of GHTraffic datasets for their own use. The readme.md file included in the distribution provides further information on how to build the code and run the scripts.
The GHTraffic scripts can be accessed by downloading the pre-configured VirtualBox image or by cloning the repository.
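To give a feel for working with the transaction records, one might tally HTTP methods as in the sketch below; the file name and the "method" field are guesses made for illustration, so consult the distribution's readme.md for the real schema.

import json
from collections import Counter

methods = Counter()

# Assumed: one JSON object per HTTP transaction, one per line.
with open("ghtraffic-S.jsonl", encoding="utf-8") as fh:
    for line in fh:
        transaction = json.loads(line)
        methods[transaction.get("method", "UNKNOWN")] += 1   # hypothetical field

for method, count in methods.most_common():
    print(f"{method}: {count}")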
New Feed Enhancements - August 2022

The new road restrictions feed is available with links listed as Version 3. We will be retiring Versions 1 and 2 on October 1st, 2022. Version 3 includes both Versions 1 & 2. We have moved the Road Restrictions feed to a RESTful service, allowing us to improve the delivery of the information. The benefits of this change include:
- The information is published faster.
- The URL is mirrored in both HTTP and HTTPS environments.
- More data formats are available.

For compatibility with existing customer applications, the original feed is available in the same formats as before.

Format | Description
---|---
XML | Extensible Markup Language (XML) is a common data interchange format.
XSD | XML Schema Document (XSD) contains information about the XML data structure.
JSON | JavaScript Object Notation (JSON) is used to load easily into JavaScript-enabled web pages.
JSONP (.json) | Packaged JSON (JSONP) is JSON wrapped in a JavaScript function. JSONP can be reloaded without reloading the page, using the JSON (.json) data extension.
Timestamp JSON | The Timestamp file is a small file that contains only the timestamp of the most recent update time in the dataset, in JSON format. This file can be read quickly to determine if the main data file has changed.
Timestamp JSONP (.json) | The Timestamp file in JSONP (.json) format.
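The Timestamp JSON file supports a simple polling pattern: read the small timestamp file frequently and download the full feed only when the timestamp changes. A minimal sketch of that pattern follows, with placeholder URLs standing in for the actual Version 3 links.

import time
import requests

# Placeholder URLs; substitute the actual Version 3 feed and timestamp links.
TIMESTAMP_URL = "https://example.gov/roadrestrictions/v3/timestamp.json"
FEED_URL = "https://example.gov/roadrestrictions/v3/restrictions.json"

last_stamp = None
while True:
    stamp = requests.get(TIMESTAMP_URL, timeout=30).json()        # tiny file: just the last update time
    if stamp != last_stamp:
        restrictions = requests.get(FEED_URL, timeout=60).json()  # fetch the full feed only when it changed
        last_stamp = stamp
        print("feed refreshed at", stamp)
    time.sleep(300)  # poll the timestamp every five minutes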
PredictLeads Global Technographic Dataset delivers in-depth insights into technology adoption across millions of companies worldwide. Our dataset, sourced from HTML, JavaScript, and job postings, enables B2B sales, marketing, and data enrichment teams to refine targeting, enhance lead scoring, and optimize outreach strategies. By tracking 25,000+ technologies across 92M+ websites, businesses can uncover market trends, assess competitor technology stacks, and personalize their approach.
Use Cases:
✅ Enhance CRM Data – Enrich company records with detailed real-time technology insights.
✅ Targeted Sales Outreach – Identify prospects based on their tech stack and personalize outreach.
✅ Competitor & Market Analysis – Gain insights into competitor technology adoption and industry trends.
✅ Lead Scoring & Prioritization – Rank potential customers based on adopted technologies.
✅ Personalized Marketing – Craft highly relevant campaigns based on technology adoption trends.
API Attributes & Structure:
📌 PredictLeads Technographic Data is trusted by enterprises and B2B professionals for accurate, real-time technology intelligence, enabling smarter prospecting, data-driven marketing, and competitive analysis.
PredictLeads Technology Detections Dataset https://docs.predictleads.com/v3/guide/technology_detections_dataset
HumanEval-X is a benchmark for evaluating the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks, such as code generation and translation.
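If the benchmark is consumed through the Hugging Face datasets library (one common distribution), loading a language split looks roughly like the sketch below; the dataset id, config name, and field names are assumptions, so check the benchmark's documentation for the exact values.

from datasets import load_dataset

# Assumed dataset id and config name for the JavaScript split.
humaneval_x = load_dataset("THUDM/humaneval-x", "js", split="test")

sample = humaneval_x[0]
print(sample["task_id"])        # assumed field: task identifier
print(sample["prompt"][:200])   # assumed field: function signature and docstring to complete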