Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.
Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.
Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.
Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.
Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.
Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with JavaScript enabled, which excludes web crawlers and API calls.
These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset we used in our paper entitled "Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model". The manually validated dataset contains several static source code metrics along with vulnerability-fixing hashes for numerous vulnerabilities. For more details, please refer to the paper cited below.
Security has become a central and unavoidable aspect of today’s software development. Practitioners and researchers have proposed many code analysis tools and techniques to mitigate security risks. These tools apply static and dynamic analysis or, more recently, machine learning. Machine learning models can achieve impressive results in finding and forecasting possible security issues in programs. However, there are at least two areas where most of the current approaches fall short of developer demands: explainability and granularity of predictions. In this paper, we propose a novel, simple, yet promising approach to identify potentially vulnerable source code in JavaScript programs. The model improves on the state of the art in terms of explainability and prediction granularity, as it gives results at the level of individual source code lines, which is fine-grained enough for developers to take immediate action. Additionally, the model explains each predicted line (i.e., provides the most similar vulnerable line from the training set) using a prototype-based approach. In a study of 186 real-world and confirmed JavaScript vulnerability fixes from 91 projects, the approach could flag 60% of the known vulnerable lines on average by marking only 10% of the codebase, and in certain cases the model identified 100% of the vulnerable code lines while flagging only 8.72% of the codebase.
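As a rough illustration of the prototype-based idea (not the authors' implementation), the sketch below ranks the lines of a target file by their distance to known vulnerable lines from a training set. The per-line metric vectors and all names are assumptions made purely for illustration.

import numpy as np

# Hypothetical per-line metric vectors (placeholders, not the paper's feature set).
train_vulnerable_lines = np.array([[3.0, 1.0, 2.0], [5.0, 2.0, 0.0]])   # known vulnerable prototypes
target_file_lines = np.array([[2.5, 1.0, 2.0], [0.0, 0.0, 1.0], [5.0, 2.0, 1.0]])

def nearest_prototype(line_vec, prototypes):
    """Return (index, distance) of the most similar vulnerable prototype."""
    dists = np.linalg.norm(prototypes - line_vec, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

# Rank target lines by how close they are to any vulnerable prototype;
# the nearest prototype doubles as the "explanation" for the prediction.
ranked = sorted(
    (nearest_prototype(vec, train_vulnerable_lines) + (line_no,)
     for line_no, vec in enumerate(target_file_lines, start=1)),
    key=lambda t: t[1],
)
for proto_idx, dist, line_no in ranked:
    print(f"line {line_no}: distance {dist:.2f} to vulnerable prototype #{proto_idx}")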
If you wish to use our dataset, please cite this dataset, or the corresponding paper:
@inproceedings{mosolygo2021towards,
title={Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model},
author={Mosolyg{\'o}, Bal{\'a}zs and V{\'a}ndor, Norbert and Antal, G{\'a}bor and Heged{\H{u}}s, P{\'e}ter and Ferenc, Rudolf},
booktitle={2021 International Conference on Code Quality (ICCQ)},
pages={15--25},
year={2021},
organization={IEEE}
}
The datatablesview extension for CKAN enhances the display of tabular datasets within CKAN by integrating the DataTables JavaScript library. As a fork of a previous DataTables CKAN plugin, this extension aims to provide improved functionality and maintainability for presenting data in a user-friendly and interactive tabular format. This tool focuses on making data more accessible and easier to explore directly within the CKAN interface.

Key Features:
- Enhanced Data Visualization: Transforms standard CKAN dataset views into interactive tables using the DataTables library, providing a more engaging user experience compared to plain HTML tables.
- Interactive Table Functionality: Includes features such as sorting, filtering, and pagination within the data table, allowing users to easily navigate and analyze large datasets directly in the browser.
- Improved Data Accessibility: Makes tabular data more accessible to a wider range of users by providing intuitive tools to explore and understand the information.
- Presumed Customizable Appearance: Given that it is based on DataTables, users will likely be able to customize the look and feel of the tables through DataTables configuration options (note: this is an assumption based on standard DataTables usage and may require coding).

Use Cases (based on typical DataTables applications):
- Government Data Portals: Display complex government datasets in a format that is easy for citizens to search, filter, and understand, enhancing transparency and promoting data-driven decision-making. For example, presenting financial data, population statistics, or environmental monitoring results.
- Research Data Repositories: Allow researchers to quickly explore and analyze large scientific datasets directly within the CKAN interface, facilitating data discovery and collaboration.
- Corporate Data Catalogs: Enable business users to easily access and manipulate tabular data relevant to their roles, improving data literacy and enabling data-informed business strategies.

Technical Integration (inferred from CKAN extension structure): The extension likely operates by leveraging CKAN's plugin architecture to override the default dataset view for tabular data. Its implementation likely uses CKAN's templating system to render datasets with DataTables' JavaScript and CSS, enhancing the data-viewing experience.

Benefits & Impact: By implementing the datatablesview extension, organizations can improve the user experience when accessing and exploring tabular datasets within their CKAN instances. The enhanced interactivity and data exploration features can lead to increased data utilization, improved data literacy, and more effective data-driven decision-making within organizations and communities.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Language | Number of Samples
---|---
Java | 153,119
Ruby | 233,710
Go | 137,998
JavaScript | 373,598
Python | 472,469
PHP | 294,394
This dataset shows whether each dataset on data.maryland.gov has been updated recently enough. For example, datasets containing weekly data should be updated at least every 7 days. Datasets containing monthly data should be updated at least every 31 days. This dataset also shows a compendium of metadata from all data.maryland.gov datasets.
This report was created by the Department of Information Technology (DoIT) on August 12, 2015. New reports will be uploaded daily (this report is itself included in the report, so that users can see whether new reports are consistently being uploaded). Generation of this report uses the Socrata Open Data API to retrieve metadata on the date of last data update and the update frequency. Analysis and formatting of the metadata use JavaScript, jQuery, and AJAX.
This report will be used during meetings of the Maryland Open Data Council to curate datasets for maintenance and make sure the Open Data Portal's data stays up to date.
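For orientation, a minimal sketch of the kind of freshness check described above, assuming the portal's Socrata views listing endpoint (/api/views.json) and its rowsUpdatedAt field; the field names and threshold are assumptions, not the DoIT implementation.

import datetime
import requests

# Fetch metadata for all views on the portal (assumed Socrata endpoint).
views = requests.get("https://data.maryland.gov/api/views.json", timeout=30).json()

STALE_AFTER_DAYS = 31  # e.g., monthly datasets should be no older than 31 days
now = datetime.datetime.now(datetime.timezone.utc)

for view in views:
    updated_epoch = view.get("rowsUpdatedAt")  # assumed field: last data update (epoch seconds)
    if updated_epoch is None:
        continue
    age_days = (now - datetime.datetime.fromtimestamp(updated_epoch, datetime.timezone.utc)).days
    if age_days > STALE_AFTER_DAYS:
        print(f"{view.get('name')}: last updated {age_days} days ago")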
This data set contains individual search sessions from the transaction log of the academic search engine sowiport (www.sowiport.de). The data was collected over a period of one year (between 2nd April 2014 and 2nd April 2015). The web server log files and specific JavaScript-based logging techniques were used to capture the usage behaviour within the system. All activities are mapped to a list of 58 actions. This list covers all types of activities and pages that can be carried out/visited within the system (e.g., typing a query, visiting a document, selecting a facet, etc.). For each action, a session id, the date stamp and additional information (e.g., queries, document ids, and result lists) are stored. The session id is assigned via a browser cookie and allows tracking user behaviour over multiple searches. Based on the session id and date stamp, the step in which an action is conducted and the length of the action are included in the data set as well. The data set contains 558,008 individual search sessions and a total of 7,982,427 log entries. The average number of actions per search session is 7.
This work was funded by Deutsche Forschungsgemeinschaft (DFG), grant no. MA 3964/5-1; the AMUR project at GESIS.
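To give a sense of how the log entries described above can be aggregated, here is a small sketch that groups entries by session id and computes actions per session; the file name and column name are assumed for illustration and may differ from the actual export.

import csv
from collections import defaultdict

actions_per_session = defaultdict(int)

# Assumed: a CSV export with one log entry per row and a "session_id" column.
with open("sowiport_log.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        actions_per_session[row["session_id"]] += 1

sessions = len(actions_per_session)
total_actions = sum(actions_per_session.values())
print(f"{sessions} sessions, {total_actions} log entries, "
      f"{total_actions / sessions:.1f} actions per session on average")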
The CodeSearchNet Corpus is a large dataset of functions with associated documentation written in Go, Java, JavaScript, PHP, Python, and Ruby from open source projects on GitHub. The CodeSearchNet Corpus includes:
* Six million methods overall
* Two million of which have associated documentation (docstrings, JavaDoc, and more)
* Metadata that indicates the original location (repository or line number, for example) where the data was found
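For orientation, a small sketch of reading one of the corpus' JSON-lines shards and keeping only documented functions; the file name and field names ("docstring", "repo", "func_name") are assumptions based on common CodeSearchNet distributions.

import json

documented = []

# Assumed: one function record per line in a JSON-lines shard.
with open("javascript_train_0.jsonl", encoding="utf-8") as fh:
    for line in fh:
        record = json.loads(line)
        if record.get("docstring"):          # keep only functions with documentation
            documented.append((record.get("repo"), record.get("func_name")))

print(f"{len(documented)} documented functions in this shard")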
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceedings Series (ICPS).
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
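As a sketch of how the per-victim JSON files can be processed (the directory layout and field name below are hypothetical; consult the included README.md files for the actual schema):

import json
from pathlib import Path

victims = 0
networks = set()

# Assumed layout: one JSON file per victim under a dataset directory.
for path in Path("MalwareInfectionSet").rglob("*.json"):
    with path.open(encoding="utf-8") as fh:
        victim = json.load(fh)
    victims += 1
    networks.add(victim.get("malware_network", "unknown"))  # hypothetical field

print(f"{victims} victim files from {len(networks)} malware networks")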
MalwareInfectionSet: We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs, which are available for free. We utilise 245 malware log dumps from 2019 and 2020, originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
VictimAccessSet: We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
AccountAccessSet: The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We collect data from the Database marketplace, where vendors sell online credentials, and investigate it in a similar way. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
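A correspondingly small sketch of the kind of price aggregation mentioned above, again with a hypothetical field name ("price_usd") rather than the dataset's real schema:

import json
import statistics
from pathlib import Path

prices = []
for path in Path("AccountAccessSet").rglob("*.json"):
    listing = json.loads(path.read_text(encoding="utf-8"))
    price = listing.get("price_usd")          # hypothetical field
    if isinstance(price, (int, float)):
        prices.append(price)

if prices:
    print(f"{len(prices)} listings, median price {statistics.median(prices):.2f} USD")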
Credits

Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results.

The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100-point subset of pixels within the GRCA-PJ ecosystem.

The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE JavaScript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'.

The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry, demonstrated both for a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle. Boxplots show interquartile ranges; whiskers extend to the 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833).

The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (e.g., the folders Rproj.user and packrat, and the files .RData and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the R-processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to the packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may use the descriptive documentation, the phenopix package documentation, and the description/references provided in the associated journal article to process the data and achieve the same results using newer packages or other software programs.
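To illustrate the two models named above, the sketch below fits them with ordinary least squares in Python (the published workflow uses R); the column names "ndvi", "solar_to_sensor_angle", "sensor_zenith_angle", and the "scatter" flag are assumptions about the data files, not their documented schema.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumed columns; the actual CSV/GEE exports may use different names.
df = pd.read_csv("SolarSensorAngles.csv")
df["log_ndvi"] = np.log(df["ndvi"])

# Forward-scatter model: log(NDVI) ~ solar-to-sensor angle
forward = smf.ols("log_ndvi ~ solar_to_sensor_angle",
                  data=df[df["scatter"] == "forward"]).fit()

# Back-scatter model: log(NDVI) ~ solar-to-sensor angle + sensor zenith angle
back = smf.ols("log_ndvi ~ solar_to_sensor_angle + sensor_zenith_angle",
               data=df[df["scatter"] == "back"]).fit()

print(f"forward-scatter R^2: {forward.rsquared:.3f}")
print(f"back-scatter R^2: {back.rsquared:.3f}")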
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In recent years, browsers have reduced the identifying information in user-agent strings to enhance user privacy. However, Chrome has also introduced high-entropy user-agent client hints (UA-CH) and a new JavaScript API to provide access to specific browser details. The study assesses the impact of these changes on the top 100,000 websites by using an instrumented crawler to measure access to high-entropy browser features via UA-CH HTTP headers and the JavaScript API. It also investigates whether tracking, advertising, and browser fingerprinting scripts have started using these new client hints and the JavaScript API.
By Asuman Senol and Gunes Acar. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society.
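As a small illustration of the header-based mechanism the study measures (not the authors' crawler), the sketch below fetches a page and reports which client hints the server requests via the Accept-CH response header; detecting access through the JavaScript API requires the instrumented-browser approach described above.

import requests

# Servers opt in to high-entropy client hints by listing them in Accept-CH.
resp = requests.get("https://example.com", timeout=30)
accept_ch = resp.headers.get("Accept-CH", "")

requested_hints = [h.strip() for h in accept_ch.split(",") if h.strip()]
high_entropy = [h for h in requested_hints
                if h.lower() in {"sec-ch-ua-full-version-list", "sec-ch-ua-full-version",
                                 "sec-ch-ua-model", "sec-ch-ua-platform-version",
                                 "sec-ch-ua-arch", "sec-ch-ua-bitness"}]

print("requested hints:", requested_hints or "none")
print("high-entropy hints:", high_entropy or "none")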
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SIMPATICO logs for the user evaluation of Trento in project iterations 1 and 2
This package contains the interaction log data captured during the two Trento evaluations of the results of the H2020 project SIMPATICO, which were undertaken from September 2017 to January 2019. The data is exported from the Elasticsearch instance that was used to log all of the interaction data. The data model for this can be found in project deliverable "D3.3 Advanced Methods And Tools For User Interaction Automation". For more information about the setup for conducting the tests and the results achieved, please consult project deliverable "D6.6 SIMPATICO Evaluation Report v2". All project deliverables, except where noted, are public and are available in the Zenodo community at https://zenodo.org/communities/h2020-simpatico-692819.
The following caveats need to be highlighted for this data set: - The format is JSON (JavaScript objects), as provided by Elasticsearch.
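A short sketch of reading such an Elasticsearch export in Python; the top-level "hits"/"_source" layout and the "eventType" field are assumptions about how the dump was produced, so adjust them to the actual export.

import json
from collections import Counter

with open("simpatico_logs.json", encoding="utf-8") as fh:
    dump = json.load(fh)

# Assumed Elasticsearch-style layout: {"hits": {"hits": [{"_source": {...}}, ...]}}
events = [hit["_source"] for hit in dump.get("hits", {}).get("hits", [])]
event_types = Counter(event.get("eventType", "unknown") for event in events)  # hypothetical field

print(f"{len(events)} interaction events")
for event_type, count in event_types.most_common(5):
    print(f"  {event_type}: {count}")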
https://spectrum.library.concordia.ca/policies.html#TermsOfAccesshttps://spectrum.library.concordia.ca/policies.html#TermsOfAccess
This book is the result of teaching the laboratory component of an introductory course in Database Systems in the Department of Computer Science & Software Engineering, Concordia University, Montreal. The intent of this part of the course was to have the students create a practical web-based application wherein the database forms the dynamic component of a real-life application, using a web browser as the user interface.
It was decided to use all open source software, namely the Apache web server, PHP, JavaScript and HTML, along with the open source database that started as MySQL and has since migrated to MariaDB.
The examples given in this book have been run successfully both using MySQL on a Windows platform and MariaDB on a Linux platform without any changes. However, the code may need to be updated as the underlying software systems evolve with time, as functions are deprecated and replaced by others. Hence the user is responsible for making any required changes to any code given in this book.
The readers are also warned of the changing privacy and data usage policies of most web sites. They should be aware that most web sites collect and mine users’ data for private profit.
The authors wish to acknowledge the contribution of many students in the introductory database course over the years, whose needs, together with the involvement of one of the authors in the early days of the web, prompted the start of this project in the late part of the 20th century. This was the era of the dot-com bubble.
The User Tracking extension for CKAN provides an interface within the CKAN admin panel to monitor user activity and engagement. It aims to facilitate data-driven insights into how users interact with the CKAN platform by displaying user, organizational, and individual page engagement metrics. Furthermore, it includes a command-line interface (CLI) tool for exporting user activity data.

Key Features:
- Activity Tracking Tab: Adds a dedicated "Activity tracking" tab to the CKAN admin page to display user engagement and activity data.
- Engagement Data: Presents data related to user, organizational, and individual page engagement.
- Database Tracking: Relies on the creation of a useractivitytracker database table, which is updated via JavaScript embedded within CKAN pages using POST requests.
- Data Export: Offers a CLI command for exporting user activity data from the useractivitytracker table into a CSV file, covering a specified number of past days.
- MVC Structure: The tracked table data is displayed in an MVC format.

Technical Integration: The extension integrates with CKAN by adding a new plugin that creates a middleware component. This middleware collects data via JavaScript embedded throughout CKAN pages, sending POST requests to update the useractivitytracker table. Additional configuration may be required to activate the plugin through the ckan.plugins setting in the CKAN configuration file (ckan.ini).

Benefits & Impact: By implementing the User Tracking extension, CKAN administrators can gain insights into user behavior and platform usage patterns. This data can inform decisions related to content optimization, user support, and overall platform improvement. Note that the extension is explicitly mentioned as compatible with CKAN 2.9, but compatibility with more recent versions remains untested or unmentioned.
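For illustration only, a sketch of summarising the exported CSV by day; the export file name and the "timestamp" column are placeholders, since the actual useractivitytracker schema is not described here.

import csv
from collections import Counter

events_per_day = Counter()

with open("user_activity_export.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        day = row.get("timestamp", "")[:10]   # placeholder column; keep the YYYY-MM-DD part
        events_per_day[day] += 1

for day, count in sorted(events_per_day.items()):
    print(f"{day}: {count} tracked events")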
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present GHTraffic, a dataset of significant size comprising HTTP transactions extracted from GitHub data (i.e., from the 04 August 2015 GHTorrent issues snapshot) and augmented with synthetic transaction data. This dataset facilitates reproducible research on many aspects of service-oriented computing.
The GHTraffic dataset comprises three different editions: Small (S), Medium (M) and Large (L). The S dataset includes HTTP transaction records created from the google/guava repository. Guava is a popular Java library containing utilities and data structures. The M dataset includes records from the npm/npm project; npm is the popular de-facto standard package manager for JavaScript. The L dataset contains data created by selecting eight repositories hosting large and very active projects, including twbs/bootstrap, symfony/symfony, docker/docker, Homebrew/homebrew, rust-lang/rust, kubernetes/kubernetes, rails/rails, and angular/angular.js.
We also provide access to the scripts used to generate GHTraffic. Using these scripts, users can modify the configuration properties in the config.properties file in order to create a customised version of GHTraffic datasets for their own use. The readme.md file included in the distribution provides further information on how to build the code and run the scripts.
The GHTraffic scripts can be accessed by downloading the pre-configured VirtualBox image or by cloning the repository.
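To give a feel for working with the transaction records, one might tally HTTP methods as in the sketch below; the file name and the "method" field are guesses made for illustration, so consult the distribution's readme.md for the real schema.

import json
from collections import Counter

methods = Counter()

# Assumed: one JSON object per HTTP transaction, one per line.
with open("ghtraffic-S.jsonl", encoding="utf-8") as fh:
    for line in fh:
        transaction = json.loads(line)
        methods[transaction.get("method", "UNKNOWN")] += 1   # hypothetical field

for method, count in methods.most_common():
    print(f"{method}: {count}")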
New Feed Enhancements - August 2022

The new road restrictions feed is available with links listed as Version 3. We will be retiring Versions 1 and 2 on October 1st, 2022. Version 3 includes both Versions 1 & 2. We have moved the Road Restrictions feed to a RESTful service, allowing us to improve the delivery of the information. The benefits of this change include:
- The information is published faster.
- The URL is mirrored in both HTTP and HTTPS environments.
- More data formats are available.

For compatibility with existing customer applications, the original feed is available in the same formats as before.

Format | Description
---|---
XML | Extensible Markup Language (XML) is a common data interchange format.
XSD | XML Schema Document (XSD) contains information about the XML data structure.
JSON | JavaScript Object Notation (JSON) is used to load easily into JavaScript-enabled web pages.
JSONP (.json) | Packaged JSON (JSONP) is JSON wrapped in a JavaScript function. JSONP can be reloaded without reloading the page, using the JSON (.json) data extension.
Timestamp JSON | The Timestamp file is a small file that contains only the timestamp of the most recent update time in the dataset, in JSON format. This file can be read quickly to determine if the main data file has changed.
Timestamp JSONP (.json) | The Timestamp file in JSONP (.json) format.
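The Timestamp JSON file supports a simple polling pattern: read the small timestamp file frequently and download the full feed only when the timestamp changes. A minimal sketch of that pattern follows, with placeholder URLs standing in for the actual Version 3 links.

import time
import requests

# Placeholder URLs; substitute the actual Version 3 feed and timestamp links.
TIMESTAMP_URL = "https://example.gov/roadrestrictions/v3/timestamp.json"
FEED_URL = "https://example.gov/roadrestrictions/v3/restrictions.json"

last_stamp = None
while True:
    stamp = requests.get(TIMESTAMP_URL, timeout=30).json()        # tiny file: just the last update time
    if stamp != last_stamp:
        restrictions = requests.get(FEED_URL, timeout=60).json()  # fetch the full feed only when it changed
        last_stamp = stamp
        print("feed refreshed at", stamp)
    time.sleep(300)  # poll the timestamp every five minutes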
PredictLeads Global Technographic Dataset delivers in-depth insights into technology adoption across millions of companies worldwide. Our dataset, sourced from HTML, JavaScript, and job postings, enables B2B sales, marketing, and data enrichment teams to refine targeting, enhance lead scoring, and optimize outreach strategies. By tracking 25,000+ technologies across 92M+ websites, businesses can uncover market trends, assess competitor technology stacks, and personalize their approach.
Use Cases:
✅ Enhance CRM Data – Enrich company records with detailed real-time technology insights.
✅ Targeted Sales Outreach – Identify prospects based on their tech stack and personalize outreach.
✅ Competitor & Market Analysis – Gain insights into competitor technology adoption and industry trends.
✅ Lead Scoring & Prioritization – Rank potential customers based on adopted technologies.
✅ Personalized Marketing – Craft highly relevant campaigns based on technology adoption trends.
API Attributes & Structure:
📌 PredictLeads Technographic Data is trusted by enterprises and B2B professionals for accurate, real-time technology intelligence, enabling smarter prospecting, data-driven marketing, and competitive analysis.
PredictLeads Technology Detections Dataset https://docs.predictleads.com/v3/guide/technology_detections_dataset
HumanEval-X is a benchmark for evaluating the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks, such as code generation and translation.
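If the benchmark is consumed through the Hugging Face datasets library (one common distribution), loading a language split looks roughly like the sketch below; the dataset id, config name, and field names are assumptions, so check the benchmark's documentation for the exact values.

from datasets import load_dataset

# Assumed dataset id and config name for the JavaScript split.
humaneval_x = load_dataset("THUDM/humaneval-x", "js", split="test")

sample = humaneval_x[0]
print(sample["task_id"])        # assumed field: task identifier
print(sample["prompt"][:200])   # assumed field: function signature and docstring to complete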