Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Content of this repository This repository contains the scripts and dataset for the MSR 2019 mining challenge.
DATASET The dataset was retrieved using Google BigQuery and dumped to a CSV file for further processing. This original, untreated file is called jsanswers.csv, and it contains the following information:
1. The ID of the question (PostId)
2. The content (in this case, the code block)
3. The length of the code block
4. The line count of the code block
5. The score of the post
6. The title
A quick look at this file shows that a PostId can have multiple rows related to it; that is how multiple code blocks are saved in the database.
Filtered Dataset:
Extracting code from CSV We used a Python script called "ExtractCodeFromCSV.py" to extract the code from the original CSV and merge all the code blocks of each post into a JavaScript file named after its PostId; this resulted in 336 thousand files.
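The repository's own script is the authoritative version; purely as an illustration, a minimal sketch of that grouping-and-merging step could look as follows (the column names "PostId" and "Content" and the output directory are assumptions, not taken from ExtractCodeFromCSV.py):

import csv
from collections import defaultdict
from pathlib import Path

# Group every code block by its PostId, then write one .js file per post.
# Column names and paths are assumptions; adjust them to the real CSV header.
blocks = defaultdict(list)
with open("jsanswers.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        blocks[row["PostId"]].append(row["Content"])

out_dir = Path("extracted_js")
out_dir.mkdir(exist_ok=True)
for post_id, code_blocks in blocks.items():
    (out_dir / f"{post_id}.js").write_text("\n\n".join(code_blocks), encoding="utf-8")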
Running ESLint Because ESLint is single-threaded, running it directly on 336 thousand files took a huge toll on the machine, so we created a script named "ESlintRunnerScript.py": it splits the files into 20 evenly distributed parts and runs 20 ESLint processes to generate the reports, producing 20 JSON files.
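A rough sketch of that fan-out approach, assuming ESLint's standard command-line interface and JSON formatter are used (this is not the authors' exact script; directory and report names are placeholders):

import subprocess
from pathlib import Path

# Split the extracted .js files into 20 roughly equal chunks and lint each
# chunk in its own ESLint process, writing one JSON report per chunk.
# In practice, chunks this large may need to be passed as directories
# rather than as one long argument list.
files = sorted(Path("extracted_js").glob("*.js"))
n_chunks = 20
chunks = [files[i::n_chunks] for i in range(n_chunks)]

procs = []
for i, chunk in enumerate(chunks):
    cmd = ["eslint", "--format", "json", "--output-file", f"report_{i}.json"]
    cmd += [str(p) for p in chunk]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()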
Number of Violations per Rule This information was extracted using the script named "parser.py", which generated the file "NumberofViolationsPerRule.csv" containing the number of violations for each rule of the linter configuration across the dataset.
Number of Violations per Category To produce relevant statistics about the dataset, we also generated the number of violations per rule category as defined on the ESLint website; this information was extracted with the same "parser.py" script.
Individual Reports This information was extracted from the JSON reports; it is a CSV file with the PostId and the violations per rule.
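As an illustration of the per-rule aggregation (not the repository's parser.py itself), the following sketch reads ESLint's standard JSON report format and writes a per-rule count; the report file names are assumptions:

import csv
import glob
import json
from collections import Counter

# Aggregate the ESLint JSON reports into a per-rule violation count.
# Each report is a list of {"filePath": ..., "messages": [...]} objects,
# as produced by ESLint's built-in JSON formatter.
per_rule = Counter()
for report_path in glob.glob("report_*.json"):
    with open(report_path, encoding="utf-8") as f:
        for file_result in json.load(f):
            for message in file_result.get("messages", []):
                per_rule[message.get("ruleId") or "parsing-error"] += 1

with open("NumberofViolationsPerRule.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["rule", "violations"])
    writer.writerows(per_rule.most_common())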
Rules The file "Rules with categories" contains all the rules used and their categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains: 1) the object access logs, 2) script isolation policies and 3) script write conflicts collected by JSIsolate on Alexa top 1K websites. We analyze the access logs to generate the conflict summary files and script isolation policies that assign static scripts to an execution context.
We split the whole dataset of object access logs into 10 subsets, i.e., access-0.zip ~ access-9.zip.
The isolation policies are released in url-level-policies.zip and domain-level-policies.zip.
The object accesses (i.e., reads and writes) are saved in [rank].[main/sub].[frame_cnt].access (e.g., 1.main.0.access) files.
The URLs of frames (i.e., main frames and iframes) are saved in [rank].[main/sub].[frame_cnt].frame (e.g., 1.main.0.frame) files.
The maps from script IDs to script URLs are saved in [rank].[main/sub].[frame_cnt].id2url (e.g., 1.main.0.id2url) files.
The maps from script IDs to their parent scripts (i.e., the script that includes them) are saved in files following the same naming pattern.
The source code of each script is saved in [rank].[main/sub].[frame_cnt].[script_ID].script (e.g., 1.main.0.17.script) files.
Note that we perform monkey testing during the data collection, which may cause the page to navigate to a different URL. Therefore, there could be multiple main frame files.
The conflicts are dumped to [rank].conflicts (e.g., 1.conflicts) files.
The isolation policies are dumped to [rank].configs (e.g., 1.configs) and [rank].configs-simple (e.g., 1.configs-simple) files.
Note that the *.configs files also include the read/write operations that cause JSIsolate to assign a script from third-party domain to the first-party context.
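To give a sense of the naming scheme, here is a small sketch that indexes the per-frame access files after unzipping one of the subsets; the directory name "access-0" is an assumption about where the archive was extracted:

from pathlib import Path

# Index the per-frame .access files by (rank, main/sub, frame count),
# following the [rank].[main/sub].[frame_cnt].access naming scheme.
def index_access_files(root="access-0"):
    frames = {}
    for path in Path(root).glob("*.access"):
        rank, frame_kind, frame_cnt, _suffix = path.name.split(".", 3)
        frames[(int(rank), frame_kind, int(frame_cnt))] = path
    return frames

for key, path in sorted(index_access_files().items())[:5]:
    print(key, path)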
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset is imported from CodeXGLUE and pre-processed using their script.
Where to find in Semeru:
The dataset can be found at /nfs/semeru/semeru_datasets/code_xglue/code-to-text/javascript in Semeru
CodeXGLUE -- Code-To-Text
Task Definition
The task is to generate natural-language comments for code, evaluated by the smoothed BLEU-4 score.
Dataset
The dataset we use comes from CodeSearchNet, and we filter it as follows: … See the full description on the dataset page: https://huggingface.co/datasets/semeru/code-text-javascript.
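Assuming the standard Hugging Face datasets library, the data can be pulled straight from the Hub; the split and column names are assumptions and should be checked against what load_dataset actually returns:

from datasets import load_dataset

# Load the pre-processed JavaScript code-to-text data from the Hugging Face Hub.
ds = load_dataset("semeru/code-text-javascript")
print(ds)  # shows the available splits and column names

# Inspect one example; take the exact field names (e.g., code vs. docstring)
# from the printout above rather than assuming them.
first_split = next(iter(ds.values()))
print(first_split[0])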
The MNIST database of handwritten digits.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split and print a few example records.
ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png
Download or connect to open data endpoints. Get data: download data as spreadsheet, KML, or shapefile, or connect to service APIs to stay up to date. Create maps: create maps, analyse and discover trends; watch the video instructions. Code apps: make applications with our data using the ArcGIS API for JavaScript. Categories: City Council; Assets, amenities and public space; Council services and facilities; Culture, leisure and sport; Economy and business; Environment and climate; Planning; Transport and access. Terms: unless otherwise stated, data products available from the data hub are published under Creative Commons licences; for terms of use and more information, see the site Disclaimer. Contact: if you have a question, comments, or requests for interactive maps and data, we would love to hear from you. Council business: for information on rates, development applications, strategies, reports and other council business, see the City of Sydney's main website.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
MalwareInfectionSet We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
VictimAccessSet We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
AccountAccessSet The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
Python wheel for Polars.
For use especially in code competitions where Internet access is restricted.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided comes from the Local Government Association of South Australia's statewide Metabase, which is part of its Electronic Services Program initiative; see the Electronic Services Program for more information. This data is used to support the Local Government Association's My Local Services App initiative (http://www.lga.sa.gov.au/mylocal). An API is provided for the following statewide Local Government datasets: Elected Members (Mayors and Councillors), Events, Libraries, Parks and Councils. The following SDKs are available for developers to access data stored in Parse: iOS, OSX, Android, JavaScript, .NET, and the REST API.
This dataset shows whether each dataset on data.maryland.gov has been updated recently enough. For example, datasets containing weekly data should be updated at least every 7 days. Datasets containing monthly data should be updated at least every 31 days. This dataset also shows a compendium of metadata from all data.maryland.gov datasets.
This report was created by the Department of Information Technology (DoIT) on August 12, 2015. New reports will be uploaded daily (this report is itself included, so that users can see whether new reports are consistently being uploaded). Generation of this report uses the Socrata Open Data API to retrieve metadata on the date of the last data update and the update frequency. Analysis and formatting of the metadata use JavaScript, jQuery, and AJAX.
This report will be used during meetings of the Maryland Open Data Council to curate datasets for maintenance and make sure the Open Data Portal's data stays up to date.
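The report itself is built with JavaScript, jQuery, and AJAX; purely as an illustration of the underlying freshness check, here is a Python sketch against Socrata's views metadata endpoint (the endpoint path, the rowsUpdatedAt field, and the dataset identifier are assumptions to verify):

import time
import requests

# Sketch of a freshness check: how many days since a dataset's data was updated.
# "abcd-1234" is a placeholder dataset identifier.
def days_since_update(dataset_id, domain="data.maryland.gov"):
    meta = requests.get(f"https://{domain}/api/views/{dataset_id}.json", timeout=30).json()
    last_update = meta.get("rowsUpdatedAt")  # Unix timestamp of the last data update
    if last_update is None:
        return None
    return (time.time() - last_update) / 86400

# Example: flag a weekly dataset as stale if it has not been updated in 7 days.
age = days_since_update("abcd-1234")
if age is not None and age > 7:
    print(f"Stale: last updated {age:.1f} days ago")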
JSFakes (Dr. Tristan Behrens).
This is a tokenized version of the JS-Fakes dataset by Omar Peracha. The original dataset can be found here: js-fakes.git. The representation is four tracks with four bars per track.
Purpose.
This dataset is a good starting point for Music Generation. You could train GPT-2 on the samples to compose music.
Contact.
Find me on LinkedIn and say hello. If you find an issue or have a feature request, please contact me. Please be so… See the full description on the dataset page: https://huggingface.co/datasets/TristanBehrens/js-fakes-4bars.
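As a hedged starting point, the tokenized samples can be loaded from the Hugging Face Hub with the datasets library and dumped into a plain-text corpus before wiring up a GPT-2-style training script; the split name ("train") and text column ("text") are assumptions to check against the printout:

from datasets import load_dataset

# Load the tokenized 4-bar JS-Fakes samples and write them to a plain-text
# corpus file, one sample per line, ready for a language-model training script.
ds = load_dataset("TristanBehrens/js-fakes-4bars")
print(ds)  # check the real split and column names here

with open("jsfakes_corpus.txt", "w", encoding="utf-8") as f:
    for sample in ds["train"]:
        # Fall back to the whole record if there is no "text" column.
        f.write(str(sample.get("text", sample)) + "\n")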
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the complete catalog of datasets and publications reviewed in: Di Mauro A., Cominola A., Castelletti A., Di Nardo A. Urban Water Consumption at Multiple Spatial and Temporal Scales: A Review of Existing Datasets. Water 2021. The complete catalog contains:
92 state-of-the-art water demand datasets identified at the district, household, and end use scales;
120 related peer-reviewed publications;
57 additional datasets with electricity demand data at the end use and household scales.
The following metadata are reported for each dataset:
Authors
Year
Location
Dataset Size
Time Series Length
Time Sampling Resolution
Access Policy.
The following metadata are reported for each publication:
Authors
Year
Journal
Title
Spatial Scale
Type of Study: Survey (S) / Dataset (D)
Domain: Water (W)/Electricity (E)
Time Sampling Resolution
Access Policy
Dataset Size
Time Series Length
Location
Authors: Anna Di Mauro - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | anna.dimauro@unicampania.it; Andrea Cominola - Chair of Smart Water Networks | Technische Universität Berlin - Einstein Center Digital Future (Germany) | andrea.cominola@tu-berlin.de; Andrea Castelletti - Department of Electronics, Information and Bioengineering | Politecnico di Milano (Italy) | andrea.castelletti@polimi.it; Armando Di Nardo - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | armando.dinardo@unicampania.it
Citation and reference:
If you use this database, please consider citing our paper
Di Mauro, A., Cominola, A., Castelletti, A., & Di Nardo, A. (2021). Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets. Water, 13(1), 36, https://doi.org/10.3390/w13010036
Updates and Contributions:
The catalogue stored in this public repository can be collaboratively updated as more datasets become available. The authors will periodically update it to a new version.
New requests can be submitted to the authors, so that the dataset collection can be improved by different contributors. Contributors will be cited, step by step, in the updated versions of the dataset catalogue.
Updates history:
March 1st, 2021 - Pacheco, C.J.B., Horsburgh, J.S., Tracy, J.R. (Utah State University, Logan, UT - USA) --- The dataset associated with the paper Bastidas Pacheco, C.J.; Horsburgh, J.S.; Tracy, R.J. A Low-Cost, Open Source Monitoring System for Collecting High Temporal Resolution Water Use Data on Magnetically Driven Residential Water Meters. Sensors 2020, 20, 3655, is published in the HydroShare repository, where it is available as an OPEN dataset. Data can be found here: https://doi.org/10.4211/hs.4de42db6485f47b290bd9e17b017bb51
At the end of 2022, there were approximately *** million JavaScript open source projects in the Maven Central Repository and around ** million JavaScript project versions worldwide. While JavaScript is the largest ecosystem in the Maven Central Repository, Java, Python, and .NET also have thousands of available open source projects.
Our advanced data extraction tool is designed to empower businesses, researchers, and developers by providing an efficient and reliable way to collect and organize information from any online source. Whether you're gathering market insights, monitoring competitors, tracking trends, or building data-driven applications, our platform offers a perfect solution for automating the extraction and processing of structured data from websites. With seamless integration of AI, our tool takes the process a step further, enabling smarter, more refined data extraction that adapts to your needs over time.
In a digital world where information is continuously updated, timely access to data is critical. Our tool allows you to set up automated data extraction schedules, ensuring that you always have access to the most current information. Whether you're tracking stock prices, monitoring social media trends, or gathering product information, you can configure extraction schedules to suit your needs. Our AI-powered system also allows the tool to learn and optimize based on the data it collects, improving efficiency and accuracy with repeated use. From frequent updates by the minute to less frequent daily, weekly, or monthly collections, our platform handles it all seamlessly.
Our tool doesn’t just gather data—it organizes it. The extracted information is automatically structured into easily usable formats like CSV, JSON, or XML, making it ready for immediate use in applications, databases, or reports. We offer flexibility in the output format to ensure smooth integration with your existing tools and workflows. With AI-enhanced data parsing, the system recognizes and categorizes information more effectively, providing higher quality data for analysis, visualization, or importing into third-party systems.
Whether you’re collecting data from a handful of pages or millions, our system is built to scale. We can handle both small and large-scale extraction tasks with high reliability and performance. Our infrastructure ensures fast, efficient processing, even for the most demanding tasks. With parallel extraction capabilities, you can gather data from multiple sources simultaneously, reducing the time it takes to compile large datasets. AI-powered optimization further improves performance, making the extraction process faster and more adaptive to fluctuating data volumes.
Our tool doesn’t stop at extraction. We provide options for enriching the data by cross-referencing it with other sources or applying custom rules to transform raw information into more meaningful insights. This leads to a more insightful and actionable dataset, giving you a competitive edge through superior data-driven decision-making.
Modern websites often use dynamic content generated by JavaScript, which can be challenging to extract. Our tool, enhanced with AI, is designed to handle even the most complex web architectures, including dynamic loading, infinite scrolling, and paginated content.
Finally, our platform provides detailed logs of all extraction activities, giving you full visibility into the process. With built-in analytics, AI-powered insights help you monitor progress and identify issues.
In today’s fast-paced digital world, access to accurate, real-time data is critical for success. Our AI-integrated data extraction tool offers a reliable, flexible, and scalable solution to help you gather and organize the information you need with minimal effort. Whether you’re looking to gain a competitive edge, conduct in-depth research, or build sophisticated applications, our platform is designed to meet your needs and exceed expectations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The presented data set, inspired by the SophiaBeads Dataset Project for X-ray Computed Tomography, is collected for studies involving sparsity-regularised reconstruction. The aim is to provide tomographic data for various samples where the sparsity in the image varies.
This dataset is made available as part of the publication
"SparseBeads Data: Benchmarking Sparsity-Regularized Computed Tomography", Jakob S Jørgensen et al, 2017. Meas. Sci. Technol. 28 124005.
Direct link: https://doi.org/10.1088/1361-6501/aa8c29.
This manuscript is published as part of Special Feature on Advanced X-ray Tomography (open access). We refer the users to this publication for an extensive detail in the experimental planning and data acquisition.
Each zipped data folder includes:
The metadata for data acquisition and geometry parameters of the scan (.xtekct and .ctprofile.xml);
A sinogram of the central slice (CentreSlice > Sinograms > .tif) along with metadata for the 2D slice (.xtek2dct and .ct2dprofile.xml);
A list of projection angles (.ang);
A 2D FDK reconstruction using the CTPro reconstruction suite (RECON2D > .vol) with volume visualisation parameters (.vgi), added as a reference.
We also include an extra script for those who wish to use the SophiaBeads Dataset Project Codes; it essentially replaces the main script provided, sophiaBeads.m (visit https://zenodo.org/record/16539). Please note that the sparseBeads.m script will have to be placed in the same folder as the project codes. The latest version of this script can be found here: https://github.com/jakobsj/SparseBeads_code
For more information, please contact
OpenStreetMap (openstreetmap.org) is a global collaborative mapping project, which offers maps and map data released with an open license, encouraging free re-use and re-distribution. The data is created by a large community of volunteers who use a variety of simple on-the-ground surveying techniques and wiki-style editing tools to collaborate as they create the maps, in a process which is open to everyone. The project originated in London, and an active community of mappers and developers are based here. Mapping work in London is ongoing (and you can help!) but the coverage is already good enough for many uses.
Browse the map of London on OpenStreetMap.org
The whole of England updated daily:
For more details of downloads available from OpenStreetMap, including downloading the whole planet, see 'planet.osm' on the wiki.
Download small areas of the map by bounding-box. For example this URL requests the data around Trafalgar Square:
http://api.openstreetmap.org/api/0.6/map?bbox=-0.13062,51.5065,-0.12557,51.50969
Data filtered by "tag". For example this URL returns all elements in London tagged shop=supermarket:
http://www.informationfreeway.org/api/0.6/*[shop=supermarket][bbox=-0.48,51.30,0.21,51.70]
The format of the data is a raw XML representation of all the elements making up the map. OpenStreetMap is composed of interconnected "nodes" and "ways" (and sometimes "relations"), each with a set of name=value pairs called "tags". These classify and describe properties of the elements, and ultimately influence how they get drawn on the map. To understand more about tags, and different ways of working with this data format, refer to the following pages on the OpenStreetMap wiki.
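For example, the bounding-box request above can be fetched and inspected with a few lines of Python; this is only a sketch of working with the raw XML, whose element and attribute names follow the OSM XML format:

import xml.etree.ElementTree as ET
import requests

# Fetch the raw OSM XML for the Trafalgar Square bounding box shown above
# and count the nodes, ways, and relations it contains.
url = "https://api.openstreetmap.org/api/0.6/map"
bbox = "-0.13062,51.5065,-0.12557,51.50969"
xml_text = requests.get(url, params={"bbox": bbox}, timeout=60).text

root = ET.fromstring(xml_text)
for kind in ("node", "way", "relation"):
    print(kind, len(root.findall(kind)))

# Tags are <tag k="..." v="..."/> children of nodes, ways, and relations.
supermarkets = [t for t in root.iter("tag")
                if t.get("k") == "shop" and t.get("v") == "supermarket"]
print("shop=supermarket tags:", len(supermarkets))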
Rather than working with raw map data, you may prefer to embed maps from OpenStreetMap on your website with a simple bit of JavaScript. You can also present overlays of other data, in a manner very similar to working with Google Maps. In fact, you can even use the Google Maps API to do this. See OSM on your own website for details and links to various JavaScript map libraries.
The OpenStreetMap project aims to attract large numbers of contributors who all chip in a little bit to help build the map. Although the map editing tools take a little while to learn, they are designed to be as simple as possible, so that everyone can get involved. This project offers an exciting means of allowing local London communities to take ownership of their part of the map.
Read about how to Get Involved and see the London page for details of OpenStreetMap community events.
In today's digital landscape, data transparency and compliance are paramount. Organizations across industries are striving to maintain trust and adhere to regulations governing data privacy and security. To support these efforts, we present our comprehensive Ads.txt and App-Ads.txt dataset.
Key Benefits of Our Dataset:
The Power of Ads.txt & App-Ads.txt: Ads.txt (Authorized Digital Sellers) and App-Ads.txt (Authorized Sellers for Apps) are industry standards developed by the Interactive Advertising Bureau (IAB) to increase transparency and combat ad fraud. These files specify which companies are authorized to sell digital advertising inventory on a publisher's website or app. Understanding and maintaining these files is essential for data compliance and the prevention of unauthorized ad sales.
How Can You Benefit? - Data Compliance: Ensure that your organization adheres to industry standards and regulations by monitoring Ads.txt and App-Ads.txt files effectively. - Ad Fraud Prevention: Identify unauthorized sellers and take action to prevent ad fraud, ultimately protecting your revenue and brand reputation. - Strategic Insights: Leverage the data in these files to gain insights into your competitors, partners, and the broader digital advertising landscape. - Enhanced Decision-Making: Make data-driven decisions with confidence, armed with accurate and up-to-date information about your advertising partners. - Global Reach: If your operations span the globe, our dataset provides insights into the Ads.txt and App-Ads.txt files of publishers worldwide.
Multiple Data Formats for Your Convenience: - CSV (Comma-Separated Values): A widely used format for easy data manipulation and analysis in spreadsheets and databases. - JSON (JavaScript Object Notation): Ideal for structured data and compatibility with web applications and APIs. - Other Formats: We understand that different organizations have different preferences and requirements. Please inquire about additional format options tailored to your needs.
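Since ads.txt files follow the IAB's simple line-based format (exchange domain, seller account ID, relationship, optional certification authority ID), a small parser is easy to sketch; the publisher domain below is a placeholder:

import csv
from io import StringIO
import requests

# Fetch a publisher's ads.txt and parse each record into its fields, per the
# IAB ads.txt specification. "example.com" is a placeholder publisher domain.
def parse_ads_txt(publisher_domain):
    text = requests.get(f"https://{publisher_domain}/ads.txt", timeout=30).text
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()           # drop comments
        if not line or "=" in line.split(",", 1)[0]:   # skip blanks and variables like CONTACT=
            continue
        fields = [f.strip() for f in next(csv.reader(StringIO(line)))]
        if len(fields) >= 3:                           # domain, seller ID, relationship[, cert ID]
            records.append(fields[:4])
    return records

for record in parse_ads_txt("example.com")[:10]:
    print(record)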
Data That You Can Trust:
We take data quality seriously. Our team of experts curates and updates the dataset regularly to ensure that you receive the most accurate and reliable information available. Your confidence in the data is our top priority.
Seamless Integration:
Integrate our Ads.txt and App-Ads.txt dataset effortlessly into your existing systems and processes. Our goal is to enhance your compliance efforts without causing disruptions to your workflow.
In Conclusion:
Transparency and compliance are non-negotiable in today's data-driven world. Our Ads.txt and App-Ads.txt dataset empowers you with the knowledge and tools to navigate the complexities of the digital advertising ecosystem while ensuring data compliance and integrity. Whether you're a Data Protection Officer, a data compliance professional, or a business leader, our dataset is your trusted resource for maintaining data transparency and safeguarding your organization's reputation and revenue.
Get Started Today:
Don't miss out on the opportunity to unlock the power of data transparency and compliance. Contact us today to learn more about our Ads.txt and App-Ads.txt dataset, available in multiple formats and tailored to your specific needs. Join the ranks of organizations worldwide that trust our dataset for a compliant and transparent future.
The datatablesview extension for CKAN enhances the display of tabular datasets within CKAN by integrating the DataTables JavaScript library. As a fork of a previous DataTables CKAN plugin, this extension aims to provide improved functionality and maintainability for presenting data in a user-friendly and interactive tabular format. This tool focuses on making data more accessible and easier to explore directly within the CKAN interface.
Key Features: Enhanced Data Visualization: Transforms standard CKAN dataset views into interactive tables using the DataTables library, providing a more engaging user experience compared to plain HTML tables. Interactive Table Functionality: Includes features such as sorting, filtering, and pagination within the data table, allowing users to easily navigate and analyze large datasets directly in the browser. Improved Data Accessibility: Makes tabular data more accessible to a wider range of users by providing intuitive tools to explore and understand the information. Presumed Customizable Appearance: Given that it is based on DataTables, users will likely be able to customize the look and feel of the tables through DataTables configuration options (note: this is an assumption based on standard DataTables usage and may require coding).
Use Cases (based on typical DataTables applications): Government Data Portals: Display complex government datasets in a format that is easy for citizens to search, filter, and understand, enhancing transparency and promoting data-driven decision-making, for example, presenting financial data, population statistics, or environmental monitoring results. Research Data Repositories: Allow researchers to quickly explore and analyze large scientific datasets directly within the CKAN interface, facilitating data discovery and collaboration. Corporate Data Catalogs: Enable business users to easily access and manipulate tabular data relevant to their roles, improving data literacy and enabling data-informed business strategies.
Technical Integration (inferred from CKAN extension structure): The extension likely operates by leveraging CKAN's plugin architecture to override the default dataset view for tabular data. Its implementation likely uses CKAN's templating system to render datasets using DataTables' JavaScript and CSS, enhancing the data-viewing experience.
Benefits & Impact: By implementing the datatablesview extension, organizations can improve the user experience when accessing and exploring tabular datasets within their CKAN instances. The enhanced interactivity and data exploration features can lead to increased data utilization, improved data literacy, and more effective data-driven decision-making within organizations and communities.
The U.S. Geological Survey Oregon Water Science Center, in cooperation with The Klamath Tribes initiated a project to understand changes in the surface-water extent of Klamath Marsh, Oregon and changes in groundwater levels within and surrounding the marsh. The initial phase of the study focused on developing datasets needed for future interpretive phases of the investigation. This data release documents the creation of a geospatial dataset of January through May maximum surface-water extent based on a model developed by John Jones (2015; 2019) to detect surface-water inundation within vegetated areas from satellite imagery. The Dynamic Surface Water Extent (DSWE) model uses Landsat at-surface reflectance imagery paired with a digital elevation model to classify pixels within a Landsat scene as one of the following types: “not water”, “water – high confidence”, “water – moderate confidence”, “wetland – moderate confidence”, “wetland – low confidence”, and “cloud/shadow/snow” (Jones, 2015; Walker and others, 2020). The model has been replicated by Walker and others (2020) for use within the Google Earth Engine (GEE, https://code.earthengine.google.com/) online geospatial processing platform. The GEE version of the DSWE model enables users who have limited computer processing power to access DSWE datasets. The JavaScript-based interface enables the selection of specific timeframes for analyzing surface water extent as well as creating composite scenes of maximum surface water extent (MSWE) over a specified timeframe. The GEE platform was used to create MSWE datasets showing maximum surface water inundation within the Klamath Marsh for the month of January through May during 1985 – 2021. The dataset presented here includes a summary file of maps and figures (.pdf), surface area calculations of January through May MSWE in tabular (.csv) format, study area polygon in vector (.shp) format, and 37 January through May MSWE scenes in raster (.tif) and vector (.shp) format. References Cited Jones, J.W., 2015, Efficient Wetland Surface Water Detection and Monitoring via Landsat: Comparison with in situ Data from the Everglades Depth Estimation Network. Remote Sensing, 7, 12503–12538. Jones, J.W., 2019, Improved Automated Detection of Subpixel-Scale Inundation—Revised Dynamic Surface Water Extent (DSWE) Partial Surface Water Tests. Remote Sensing, 11, 374. https://doi.org/10.3390/rs11040374 Walker, J.J., Petrakis, R.E., and Soulard, C.E., 2020, Implementation of a Surface Water Extent Model using Cloud-Based Remote Sensing - Code and Maps: U.S. Geological Survey data release, https://doi.org/10.5066/P9LH9YYF.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In recent years, browsers have reduced the identifying information in user-agent strings to enhance user privacy. However, Chrome has also introduced high-entropy user-agent client hints (UA-CH) and a new JavaScript API to provide access to specific browser details. The study assesses the impact of these changes on the top 100,000 websites by using an instrumented crawler to measure access to high-entropy browser features via UA-CH HTTP headers and the JavaScript API. It also investigates whether tracking, advertising, and browser fingerprinting scripts have started using these new client hints and the JavaScript API.
By Asuman Senol and Gunes Acar. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society.
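As a loose illustration of the header-based side of such a measurement (not the authors' crawler), one can flag high-entropy UA-CH headers in recorded requests; the header list follows the UA-CH specification, and the recorded-request structure here is hypothetical:

# Flag high-entropy user-agent client hint headers in a recorded request.
# The header names follow the UA-CH specification; the example request
# below is hypothetical.
HIGH_ENTROPY_HINTS = {
    "sec-ch-ua-arch",
    "sec-ch-ua-bitness",
    "sec-ch-ua-full-version",
    "sec-ch-ua-full-version-list",
    "sec-ch-ua-model",
    "sec-ch-ua-platform-version",
}

def high_entropy_hints_sent(request_headers):
    """Return the high-entropy UA-CH headers present in one recorded request."""
    return sorted(h for h in request_headers if h.lower() in HIGH_ENTROPY_HINTS)

example = {"Sec-CH-UA-Platform-Version": "13.0.0", "Sec-CH-UA-Model": "Pixel 7"}
print(high_entropy_hints_sent(example))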