Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Content of this repository This repository contains the scripts and dataset for the MSR 2019 mining challenge.
DATASET The dataset was retrieved using Google BigQuery and dumped to a CSV file for further processing. This original, untreated file is called jsanswers.csv, and it contains the following information:
1. The ID of the question (PostId)
2. The content (in this case, the code block)
3. The length of the code block
4. The line count of the code block
5. The score of the post
6. The title
A quick look at this file shows that a PostId can have multiple rows related to it; that is how multiple code blocks are saved in the database.
Filtered Dataset:
Extracting code from CSV We used a Python script called "ExtractCodeFromCSV.py" to extract the code from the original CSV and merge all the code blocks of each post into a JavaScript file named after its PostId; this resulted in 336 thousand files.
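The repository's own script is the authoritative version; purely as an illustration, a minimal sketch of that grouping-and-merging step could look as follows (the column names "PostId" and "Content" and the output directory are assumptions, not taken from ExtractCodeFromCSV.py):

import csv
from collections import defaultdict
from pathlib import Path

# Group every code block by its PostId, then write one .js file per post.
# Column names and paths are assumptions; adjust them to the real CSV header.
blocks = defaultdict(list)
with open("jsanswers.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        blocks[row["PostId"]].append(row["Content"])

out_dir = Path("extracted_js")
out_dir.mkdir(exist_ok=True)
for post_id, code_blocks in blocks.items():
    (out_dir / f"{post_id}.js").write_text("\n\n".join(code_blocks), encoding="utf-8")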
Running ESLint Because ESLint is single-threaded, running it directly on 336 thousand files took a huge toll on the machine, so we created a script named "ESlintRunnerScript.py": it splits the files into 20 evenly distributed parts and runs 20 ESLint processes to generate the reports, producing 20 JSON files.
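A rough sketch of that fan-out approach, assuming ESLint's standard command-line interface and JSON formatter are used (this is not the authors' exact script; directory and report names are placeholders):

import subprocess
from pathlib import Path

# Split the extracted .js files into 20 roughly equal chunks and lint each
# chunk in its own ESLint process, writing one JSON report per chunk.
# In practice, chunks this large may need to be passed as directories
# rather than as one long argument list.
files = sorted(Path("extracted_js").glob("*.js"))
n_chunks = 20
chunks = [files[i::n_chunks] for i in range(n_chunks)]

procs = []
for i, chunk in enumerate(chunks):
    cmd = ["eslint", "--format", "json", "--output-file", f"report_{i}.json"]
    cmd += [str(p) for p in chunk]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()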
Number of Violations per Rule This information was extracted using the script named "parser.py", which generated the file "NumberofViolationsPerRule.csv" containing the number of violations for each rule of the linter configuration across the dataset.
Number of Violations per Category To produce relevant statistics about the dataset, we also generated the number of violations per rule category as defined on the ESLint website; this information was extracted with the same "parser.py" script.
Individual Reports This information was extracted from the JSON reports; it is a CSV file with the PostId and the violations per rule.
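As an illustration of the per-rule aggregation (not the repository's parser.py itself), the following sketch reads ESLint's standard JSON report format and writes a per-rule count; the report file names are assumptions:

import csv
import glob
import json
from collections import Counter

# Aggregate the ESLint JSON reports into a per-rule violation count.
# Each report is a list of {"filePath": ..., "messages": [...]} objects,
# as produced by ESLint's built-in JSON formatter.
per_rule = Counter()
for report_path in glob.glob("report_*.json"):
    with open(report_path, encoding="utf-8") as f:
        for file_result in json.load(f):
            for message in file_result.get("messages", []):
                per_rule[message.get("ruleId") or "parsing-error"] += 1

with open("NumberofViolationsPerRule.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["rule", "violations"])
    writer.writerows(per_rule.most_common())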
Rules The file "Rules with categories" contains all the rules used and their categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains: 1) the object access logs, 2) script isolation policies and 3) script write conflicts collected by JSIsolate on Alexa top 1K websites. We analyze the access logs to generate the conflict summary files and script isolation policies that assign static scripts to an execution context.
We split the whole dataset of object access logs into 10 subsets, i.e., access-0.zip ~ access-9.zip.
The isolation policies are released in url-level-policies.zip and domain-level-policies.zip.
The object accesses (i.e., reads and writes) are saved in [rank].[main/sub].[frame_cnt].access (e.g., 1.main.0.access) files.
The URLs of frames (i.e., main frames and iframes) are saved in [rank].[main/sub].[frame_cnt].frame (e.g., 1.main.0.frame) files.
The maps from script IDs to script URLs are saved in [rank].[main/sub].[frame_cnt].id2url (e.g., 1.main.0.id2url) files.
The maps from script IDs to their parent scripts (i.e., the script that includes them) are saved in files following the same naming pattern.
The source code of each script is saved in [rank].[main/sub].[frame_cnt].[script_ID].script (e.g., 1.main.0.17.script) files.
Note that we perform monkey testing during the data collection, which may cause the page to navigate to a different URL. Therefore, there could be multiple main frame files.
The conflicts are dumped to [rank].conflicts (e.g., 1.conflicts) files.
The isolation policies are dumped to [rank].configs (e.g., 1.configs) and [rank].configs-simple (e.g., 1.configs-simple) files.
Note that the *.configs files also include the read/write operations that cause JSIsolate to assign a script from third-party domain to the first-party context.
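To give a sense of the naming scheme, here is a small sketch that indexes the per-frame access files after unzipping one of the subsets; the directory name "access-0" is an assumption about where the archive was extracted:

from pathlib import Path

# Index the per-frame .access files by (rank, main/sub, frame count),
# following the [rank].[main/sub].[frame_cnt].access naming scheme.
def index_access_files(root="access-0"):
    frames = {}
    for path in Path(root).glob("*.access"):
        rank, frame_kind, frame_cnt, _suffix = path.name.split(".", 3)
        frames[(int(rank), frame_kind, int(frame_cnt))] = path
    return frames

for key, path in sorted(index_access_files().items())[:5]:
    print(key, path)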
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset is imported from CodeXGLUE and pre-processed using their script.
Where to find in Semeru:
The dataset can be found at /nfs/semeru/semeru_datasets/code_xglue/code-to-text/javascript in Semeru
CodeXGLUE -- Code-To-Text
Task Definition
The task is to generate natural-language comments for code, evaluated by the smoothed BLEU-4 score.
Dataset
The dataset we use comes from CodeSearchNet, and we filter it as follows: … See the full description on the dataset page: https://huggingface.co/datasets/semeru/code-text-javascript.
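Assuming the standard Hugging Face datasets library, the data can be pulled straight from the Hub; the split and column names are assumptions and should be checked against what load_dataset actually returns:

from datasets import load_dataset

# Load the pre-processed JavaScript code-to-text data from the Hugging Face Hub.
ds = load_dataset("semeru/code-text-javascript")
print(ds)  # shows the available splits and column names

# Inspect one example; take the exact field names (e.g., code vs. docstring)
# from the printout above rather than assuming them.
first_split = next(iter(ds.values()))
print(first_split[0])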
The MNIST database of handwritten digits.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split and print a few example records.
ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png
Download or connect to open data endpoints. Get data: download data as spreadsheet, KML, or shapefile, or connect to service APIs to stay up to date. Create maps: create maps, analyse and discover trends; watch the video instructions. Code apps: make applications with our data using the ArcGIS API for JavaScript. Categories: City Council; Assets, amenities and public space; Council services and facilities; Culture, leisure and sport; Economy and business; Environment and climate; Planning; Transport and access. Terms: unless otherwise stated, data products available from the data hub are published under Creative Commons licences; for terms of use and more information, see the site Disclaimer. Contact: if you have a question, comments, or requests for interactive maps and data, we would love to hear from you. Council business: for information on rates, development applications, strategies, reports and other council business, see the City of Sydney's main website.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
MalwareInfectionSet We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
VictimAccessSet We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
AccountAccessSet The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
Python wheel for Polars.
For use especially in code competitions where Internet access is restricted.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided comes from the Local Government Association of South Australia's statewide Metabase, which is part of its Electronic Services Program initiative; see the Electronic Services Program for more information. This data is used to support the Local Government Association's My Local Services App initiative (http://www.lga.sa.gov.au/mylocal). An API is provided for the following statewide Local Government datasets: Elected Members (Mayors and Councillors), Events, Libraries, Parks and Councils. The following SDKs are available for developers to access data stored in Parse: iOS, OSX, Android, JavaScript, .NET, and the REST API.
This dataset shows whether each dataset on data.maryland.gov has been updated recently enough. For example, datasets containing weekly data should be updated at least every 7 days. Datasets containing monthly data should be updated at least every 31 days. This dataset also shows a compendium of metadata from all data.maryland.gov datasets.
This report was created by the Department of Information Technology (DoIT) on August 12, 2015. New reports will be uploaded daily (this report is itself included, so that users can see whether new reports are consistently being uploaded). Generation of this report uses the Socrata Open Data API to retrieve metadata on the date of the last data update and the update frequency. Analysis and formatting of the metadata use JavaScript, jQuery, and AJAX.
This report will be used during meetings of the Maryland Open Data Council to curate datasets for maintenance and make sure the Open Data Portal's data stays up to date.
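The report itself is built with JavaScript, jQuery, and AJAX; purely as an illustration of the underlying freshness check, here is a Python sketch against Socrata's views metadata endpoint (the endpoint path, the rowsUpdatedAt field, and the dataset identifier are assumptions to verify):

import time
import requests

# Sketch of a freshness check: how many days since a dataset's data was updated.
# "abcd-1234" is a placeholder dataset identifier.
def days_since_update(dataset_id, domain="data.maryland.gov"):
    meta = requests.get(f"https://{domain}/api/views/{dataset_id}.json", timeout=30).json()
    last_update = meta.get("rowsUpdatedAt")  # Unix timestamp of the last data update
    if last_update is None:
        return None
    return (time.time() - last_update) / 86400

# Example: flag a weekly dataset as stale if it has not been updated in 7 days.
age = days_since_update("abcd-1234")
if age is not None and age > 7:
    print(f"Stale: last updated {age:.1f} days ago")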
JSFakes (Dr. Tristan Behrens).
This is a tokenized version of the JS-Fakes dataset by Omar Peracha. The original dataset can be found here: js-fakes.git. The representation is four tracks with four bars per track.
Purpose.
This dataset is a good starting point for Music Generation. You could train GPT-2 on the samples to compose music.
Contact.
Find me on LinkedIn and say hello. If you find an issue or have a feature request, please contact me. Please be so… See the full description on the dataset page: https://huggingface.co/datasets/TristanBehrens/js-fakes-4bars.
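As a hedged starting point, the tokenized samples can be loaded from the Hugging Face Hub with the datasets library and dumped into a plain-text corpus before wiring up a GPT-2-style training script; the split name ("train") and text column ("text") are assumptions to check against the printout:

from datasets import load_dataset

# Load the tokenized 4-bar JS-Fakes samples and write them to a plain-text
# corpus file, one sample per line, ready for a language-model training script.
ds = load_dataset("TristanBehrens/js-fakes-4bars")
print(ds)  # check the real split and column names here

with open("jsfakes_corpus.txt", "w", encoding="utf-8") as f:
    for sample in ds["train"]:
        # Fall back to the whole record if there is no "text" column.
        f.write(str(sample.get("text", sample)) + "\n")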
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the complete catalog of datasets and publications reviewed in: Di Mauro A., Cominola A., Castelletti A., Di Nardo A. Urban Water Consumption at Multiple Spatial and Temporal Scales: A Review of Existing Datasets. Water 2021. The complete catalog contains:
92 state-of-the-art water demand datasets identified at the district, household, and end use scales;
120 related peer-reviewed publications;
57 additional datasets with electricity demand data at the end use and household scales.
The following metadata are reported for each dataset:
Authors
Year
Location
Dataset Size
Time Series Length
Time Sampling Resolution
Access Policy.
The following metadata are reported for each publication:
Authors
Year
Journal
Title
Spatial Scale
Type of Study: Survey (S) / Dataset (D)
Domain: Water (W)/Electricity (E)
Time Sampling Resolution
Access Policy
Dataset Size
Time Series Length
Location
Authors: Anna Di Mauro - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | anna.dimauro@unicampania.it; Andrea Cominola - Chair of Smart Water Networks | Technische Universität Berlin - Einstein Center Digital Future (Germany) | andrea.cominola@tu-berlin.de; Andrea Castelletti - Department of Electronics, Information and Bioengineering | Politecnico di Milano (Italy) | andrea.castelletti@polimi.it; Armando Di Nardo - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | armando.dinardo@unicampania.it
Citation and reference:
If you use this database, please consider citing our paper
Di Mauro, A., Cominola, A., Castelletti, A., & Di Nardo, A. (2021). Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets. Water, 13(1), 36, https://doi.org/10.3390/w13010036
Updates and Contributions:
The catalogue stored in this public repository can be collaboratively updated as more datasets become available. The authors will periodically update it to a new version.
New requests can be submitted to the authors, so that the dataset collection can be improved by different contributors. Contributors will be cited, step by step, in the updated versions of the dataset catalogue.
Updates history:
March 1st, 2021 - Pacheco, C.J.B., Horsburgh, J.S., Tracy, J.R. (Utah State University, Logan, UT - USA) --- The dataset associated with the paper Bastidas Pacheco, C.J.; Horsburgh, J.S.; Tracy, R.J. A Low-Cost, Open Source Monitoring System for Collecting High Temporal Resolution Water Use Data on Magnetically Driven Residential Water Meters. Sensors 2020, 20, 3655, is published in the HydroShare repository, where it is available as an OPEN dataset. Data can be found here: https://doi.org/10.4211/hs.4de42db6485f47b290bd9e17b017bb51
At the end of 2022, there were approximately *** million JavaScript open source projects in the Maven Central Repository and around ** million JavaScript project versions worldwide. While JavaScript is the largest ecosystem in the Maven Central Repository, Java, Python, and .NET also have thousands of available open source projects.
Our advanced data extraction tool is designed to empower businesses, researchers, and developers by providing an efficient and reliable way to collect and organize information from any online source. Whether you're gathering market insights, monitoring competitors, tracking trends, or building data-driven applications, our platform offers a perfect solution for automating the extraction and processing of structured data from websites. With seamless integration of AI, our tool takes the process a step further, enabling smarter, more refined data extraction that adapts to your needs over time.
In a digital world where information is continuously updated, timely access to data is critical. Our tool allows you to set up automated data extraction schedules, ensuring that you always have access to the most current information. Whether you're tracking stock prices, monitoring social media trends, or gathering product information, you can configure extraction schedules to suit your needs. Our AI-powered system also allows the tool to learn and optimize based on the data it collects, improving efficiency and accuracy with repeated use. From frequent updates by the minute to less frequent daily, weekly, or monthly collections, our platform handles it all seamlessly.
Our tool doesn’t just gather data—it organizes it. The extracted information is automatically structured into easily usable formats like CSV, JSON, or XML, making it ready for immediate use in applications, databases, or reports. We offer flexibility in the output format to ensure smooth integration with your existing tools and workflows. With AI-enhanced data parsing, the system recognizes and categorizes information more effectively, providing higher quality data for analysis, visualization, or importing into third-party systems.
Whether you’re collecting data from a handful of pages or millions, our system is built to scale. We can handle both small and large-scale extraction tasks with high reliability and performance. Our infrastructure ensures fast, efficient processing, even for the most demanding tasks. With parallel extraction capabilities, you can gather data from multiple sources simultaneously, reducing the time it takes to compile large datasets. AI-powered optimization further improves performance, making the extraction process faster and more adaptive to fluctuating data volumes.
Our tool doesn’t stop at extraction. We provide options for enriching the data by cross-referencing it with other sources or applying custom rules to transform raw information into more meaningful insights. This leads to a more insightful and actionable dataset, giving you a competitive edge through superior data-driven decision-making.
Modern websites often use dynamic content generated by JavaScript, which can be challenging to extract. Our tool, enhanced with AI, is designed to handle even the most complex web architectures, including dynamic loading, infinite scrolling, and paginated content.
Finally, our platform provides detailed logs of all extraction activities, giving you full visibility into the process. With built-in analytics, AI-powered insights help you monitor progress and identify issues.
In today’s fast-paced digital world, access to accurate, real-time data is critical for success. Our AI-integrated data extraction tool offers a reliable, flexible, and scalable solution to help you gather and organize the information you need with minimal effort. Whether you’re looking to gain a competitive edge, conduct in-depth research, or build sophisticated applications, our platform is designed to meet your needs and exceed expectations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The presented data set, inspired by the SophiaBeads Dataset Project for X-ray Computed Tomography, is collected for studies involving sparsity-regularised reconstruction. The aim is to provide tomographic data for various samples where the sparsity in the image varies.
This dataset is made available as part of the publication
"SparseBeads Data: Benchmarking Sparsity-Regularized Computed Tomography", Jakob S Jørgensen et al, 2017. Meas. Sci. Technol. 28 124005.
Direct link: https://doi.org/10.1088/1361-6501/aa8c29.
This manuscript is published as part of Special Feature on Advanced X-ray Tomography (open access). We refer the users to this publication for an extensive detail in the experimental planning and data acquisition.
Each zipped data folder includes:
The metadata for data acquisition and geometry parameters of the scan (.xtekct and .ctprofile.xml);
A sinogram of the central slice (CentreSlice > Sinograms > .tif) along with metadata for the 2D slice (.xtek2dct and .ct2dprofile.xml);
A list of projection angles (.ang);
A 2D FDK reconstruction using the CTPro reconstruction suite (RECON2D > .vol) with volume visualisation parameters (.vgi), added as a reference.
We also include an extra script for those who wish to use the SophiaBeads Dataset Project Codes; it essentially replaces the main script provided, sophiaBeads.m (visit https://zenodo.org/record/16539). Please note that the sparseBeads.m script will have to be placed in the same folder as the project codes. The latest version of this script can be found here: https://github.com/jakobsj/SparseBeads_code
For more information, please contact
OpenStreetMap (openstreetmap.org) is a global collaborative mapping project, which offers maps and map data released with an open license, encouraging free re-use and re-distribution. The data is created by a large community of volunteers who use a variety of simple on-the-ground surveying techniques and wiki-style editing tools to collaborate as they create the maps, in a process which is open to everyone. The project originated in London, and an active community of mappers and developers are based here. Mapping work in London is ongoing (and you can help!) but the coverage is already good enough for many uses.
Browse the map of London on OpenStreetMap.org
The whole of England updated daily:
For more details of downloads available from OpenStreetMap, including downloading the whole planet, see 'planet.osm' on the wiki.
Download small areas of the map by bounding-box. For example this URL requests the data around Trafalgar Square:
http://api.openstreetmap.org/api/0.6/map?bbox=-0.13062,51.5065,-0.12557,51.50969
Data filtered by "tag". For example this URL returns all elements in London tagged shop=supermarket:
http://www.informationfreeway.org/api/0.6/*[shop=supermarket][bbox=-0.48,51.30,0.21,51.70]
The format of the data is a raw XML representation of all the elements making up the map. OpenStreetMap is composed of interconnected "nodes" and "ways" (and sometimes "relations"), each with a set of name=value pairs called "tags". These classify and describe properties of the elements, and ultimately influence how they get drawn on the map. To understand more about tags, and different ways of working with this data format, refer to the following pages on the OpenStreetMap wiki.
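For example, the bounding-box request above can be fetched and inspected with a few lines of Python; this is only a sketch of working with the raw XML, whose element and attribute names follow the OSM XML format:

import xml.etree.ElementTree as ET
import requests

# Fetch the raw OSM XML for the Trafalgar Square bounding box shown above
# and count the nodes, ways, and relations it contains.
url = "https://api.openstreetmap.org/api/0.6/map"
bbox = "-0.13062,51.5065,-0.12557,51.50969"
xml_text = requests.get(url, params={"bbox": bbox}, timeout=60).text

root = ET.fromstring(xml_text)
for kind in ("node", "way", "relation"):
    print(kind, len(root.findall(kind)))

# Tags are <tag k="..." v="..."/> children of nodes, ways, and relations.
supermarkets = [t for t in root.iter("tag")
                if t.get("k") == "shop" and t.get("v") == "supermarket"]
print("shop=supermarket tags:", len(supermarkets))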
Rather than working with raw map data, you may prefer to embed maps from OpenStreetMap on your website with a simple bit of JavaScript. You can also present overlays of other data, in a manner very similar to working with Google Maps. In fact, you can even use the Google Maps API to do this. See OSM on your own website for details and links to various JavaScript map libraries.
The OpenStreetMap project aims to attract large numbers of contributors who all chip in a little bit to help build the map. Although the map editing tools take a little while to learn, they are designed to be as simple as possible, so that everyone can get involved. This project offers an exciting means of allowing local London communities to take ownership of their part of the map.
Read about how to Get Involved and see the London page for details of OpenStreetMap community events.
In today's digital landscape, data transparency and compliance are paramount. Organizations across industries are striving to maintain trust and adhere to regulations governing data privacy and security. To support these efforts, we present our comprehensive Ads.txt and App-Ads.txt dataset.
Key Benefits of Our Dataset:
The Power of Ads.txt & App-Ads.txt: Ads.txt (Authorized Digital Sellers) and App-Ads.txt (Authorized Sellers for Apps) are industry standards developed by the Interactive Advertising Bureau (IAB) to increase transparency and combat ad fraud. These files specify which companies are authorized to sell digital advertising inventory on a publisher's website or app. Understanding and maintaining these files is essential for data compliance and the prevention of unauthorized ad sales.
How Can You Benefit? - Data Compliance: Ensure that your organization adheres to industry standards and regulations by monitoring Ads.txt and App-Ads.txt files effectively. - Ad Fraud Prevention: Identify unauthorized sellers and take action to prevent ad fraud, ultimately protecting your revenue and brand reputation. - Strategic Insights: Leverage the data in these files to gain insights into your competitors, partners, and the broader digital advertising landscape. - Enhanced Decision-Making: Make data-driven decisions with confidence, armed with accurate and up-to-date information about your advertising partners. - Global Reach: If your operations span the globe, our dataset provides insights into the Ads.txt and App-Ads.txt files of publishers worldwide.
Multiple Data Formats for Your Convenience: - CSV (Comma-Separated Values): A widely used format for easy data manipulation and analysis in spreadsheets and databases. - JSON (JavaScript Object Notation): Ideal for structured data and compatibility with web applications and APIs. - Other Formats: We understand that different organizations have different preferences and requirements. Please inquire about additional format options tailored to your needs.
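Since ads.txt files follow the IAB's simple line-based format (exchange domain, seller account ID, relationship, optional certification authority ID), a small parser is easy to sketch; the publisher domain below is a placeholder:

import csv
from io import StringIO
import requests

# Fetch a publisher's ads.txt and parse each record into its fields, per the
# IAB ads.txt specification. "example.com" is a placeholder publisher domain.
def parse_ads_txt(publisher_domain):
    text = requests.get(f"https://{publisher_domain}/ads.txt", timeout=30).text
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()           # drop comments
        if not line or "=" in line.split(",", 1)[0]:   # skip blanks and variables like CONTACT=
            continue
        fields = [f.strip() for f in next(csv.reader(StringIO(line)))]
        if len(fields) >= 3:                           # domain, seller ID, relationship[, cert ID]
            records.append(fields[:4])
    return records

for record in parse_ads_txt("example.com")[:10]:
    print(record)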
Data That You Can Trust:
We take data quality seriously. Our team of experts curates and updates the dataset regularly to ensure that you receive the most accurate and reliable information available. Your confidence in the data is our top priority.
Seamless Integration:
Integrate our Ads.txt and App-Ads.txt dataset effortlessly into your existing systems and processes. Our goal is to enhance your compliance efforts without causing disruptions to your workflow.
In Conclusion:
Transparency and compliance are non-negotiable in today's data-driven world. Our Ads.txt and App-Ads.txt dataset empowers you with the knowledge and tools to navigate the complexities of the digital advertising ecosystem while ensuring data compliance and integrity. Whether you're a Data Protection Officer, a data compliance professional, or a business leader, our dataset is your trusted resource for maintaining data transparency and safeguarding your organization's reputation and revenue.
Get Started Today:
Don't miss out on the opportunity to unlock the power of data transparency and compliance. Contact us today to learn more about our Ads.txt and App-Ads.txt dataset, available in multiple formats and tailored to your specific needs. Join the ranks of organizations worldwide that trust our dataset for a compliant and transparent future.
The datatablesview extension for CKAN enhances the display of tabular datasets within CKAN by integrating the DataTables JavaScript library. As a fork of a previous DataTables CKAN plugin, this extension aims to provide improved functionality and maintainability for presenting data in a user-friendly and interactive tabular format. This tool focuses on making data more accessible and easier to explore directly within the CKAN interface.
Key Features: Enhanced Data Visualization: Transforms standard CKAN dataset views into interactive tables using the DataTables library, providing a more engaging user experience compared to plain HTML tables. Interactive Table Functionality: Includes features such as sorting, filtering, and pagination within the data table, allowing users to easily navigate and analyze large datasets directly in the browser. Improved Data Accessibility: Makes tabular data more accessible to a wider range of users by providing intuitive tools to explore and understand the information. Presumed Customizable Appearance: Given that it is based on DataTables, users will likely be able to customize the look and feel of the tables through DataTables configuration options (note: this is an assumption based on standard DataTables usage and may require coding).
Use Cases (based on typical DataTables applications): Government Data Portals: Display complex government datasets in a format that is easy for citizens to search, filter, and understand, enhancing transparency and promoting data-driven decision-making, for example, presenting financial data, population statistics, or environmental monitoring results. Research Data Repositories: Allow researchers to quickly explore and analyze large scientific datasets directly within the CKAN interface, facilitating data discovery and collaboration. Corporate Data Catalogs: Enable business users to easily access and manipulate tabular data relevant to their roles, improving data literacy and enabling data-informed business strategies.
Technical Integration (inferred from CKAN extension structure): The extension likely operates by leveraging CKAN's plugin architecture to override the default dataset view for tabular data. Its implementation likely uses CKAN's templating system to render datasets using DataTables' JavaScript and CSS, enhancing the data-viewing experience.
Benefits & Impact: By implementing the datatablesview extension, organizations can improve the user experience when accessing and exploring tabular datasets within their CKAN instances. The enhanced interactivity and data exploration features can lead to increased data utilization, improved data literacy, and more effective data-driven decision-making within organizations and communities.
The U.S. Geological Survey Oregon Water Science Center, in cooperation with The Klamath Tribes initiated a project to understand changes in the surface-water extent of Klamath Marsh, Oregon and changes in groundwater levels within and surrounding the marsh. The initial phase of the study focused on developing datasets needed for future interpretive phases of the investigation. This data release documents the creation of a geospatial dataset of January through May maximum surface-water extent based on a model developed by John Jones (2015; 2019) to detect surface-water inundation within vegetated areas from satellite imagery. The Dynamic Surface Water Extent (DSWE) model uses Landsat at-surface reflectance imagery paired with a digital elevation model to classify pixels within a Landsat scene as one of the following types: “not water”, “water – high confidence”, “water – moderate confidence”, “wetland – moderate confidence”, “wetland – low confidence”, and “cloud/shadow/snow” (Jones, 2015; Walker and others, 2020). The model has been replicated by Walker and others (2020) for use within the Google Earth Engine (GEE, https://code.earthengine.google.com/) online geospatial processing platform. The GEE version of the DSWE model enables users who have limited computer processing power to access DSWE datasets. The JavaScript-based interface enables the selection of specific timeframes for analyzing surface water extent as well as creating composite scenes of maximum surface water extent (MSWE) over a specified timeframe. The GEE platform was used to create MSWE datasets showing maximum surface water inundation within the Klamath Marsh for the month of January through May during 1985 – 2021. The dataset presented here includes a summary file of maps and figures (.pdf), surface area calculations of January through May MSWE in tabular (.csv) format, study area polygon in vector (.shp) format, and 37 January through May MSWE scenes in raster (.tif) and vector (.shp) format. References Cited Jones, J.W., 2015, Efficient Wetland Surface Water Detection and Monitoring via Landsat: Comparison with in situ Data from the Everglades Depth Estimation Network. Remote Sensing, 7, 12503–12538. Jones, J.W., 2019, Improved Automated Detection of Subpixel-Scale Inundation—Revised Dynamic Surface Water Extent (DSWE) Partial Surface Water Tests. Remote Sensing, 11, 374. https://doi.org/10.3390/rs11040374 Walker, J.J., Petrakis, R.E., and Soulard, C.E., 2020, Implementation of a Surface Water Extent Model using Cloud-Based Remote Sensing - Code and Maps: U.S. Geological Survey data release, https://doi.org/10.5066/P9LH9YYF.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In recent years, browsers have reduced the identifying information in user-agent strings to enhance user privacy. However, Chrome has also introduced high-entropy user-agent client hints (UA-CH) and a new JavaScript API to provide access to specific browser details. The study assesses the impact of these changes on the top 100,000 websites by using an instrumented crawler to measure access to high-entropy browser features via UA-CH HTTP headers and the JavaScript API. It also investigates whether tracking, advertising, and browser fingerprinting scripts have started using these new client hints and the JavaScript API.
By Asuman Senol and Gunes Acar. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society.
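As a loose illustration of the header-based side of such a measurement (not the authors' crawler), one can flag high-entropy UA-CH headers in recorded requests; the header list follows the UA-CH specification, and the recorded-request structure here is hypothetical:

# Flag high-entropy user-agent client hint headers in a recorded request.
# The header names follow the UA-CH specification; the example request
# below is hypothetical.
HIGH_ENTROPY_HINTS = {
    "sec-ch-ua-arch",
    "sec-ch-ua-bitness",
    "sec-ch-ua-full-version",
    "sec-ch-ua-full-version-list",
    "sec-ch-ua-model",
    "sec-ch-ua-platform-version",
}

def high_entropy_hints_sent(request_headers):
    """Return the high-entropy UA-CH headers present in one recorded request."""
    return sorted(h for h in request_headers if h.lower() in HIGH_ENTROPY_HINTS)

example = {"Sec-CH-UA-Platform-Version": "13.0.0", "Sec-CH-UA-Model": "Pixel 7"}
print(high_entropy_hints_sent(example))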