License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The appendix of our ICSE 2018 paper "Search-Based Test Data Generation for SQL Queries".
The appendix contains:
License: Attribution-NonCommercial-NoDerivs 2.5 (CC BY-NC-ND 2.5), https://creativecommons.org/licenses/by-nc-nd/2.5/
License information was derived automatically
NADA (Not-A-Database) is an easy-to-use geometric shape data generator that allows users to define non-uniform multivariate parameter distributions to test novel methodologies. The full open-source package is provided at GIT:NA_DAtabase. See the Technical Report for details on how to use the provided package.
This database includes 3 repositories:
NADA_Dis: Is the model able to correctly characterize/disentangle a complex latent space? The repository contains 3x100,000 synthetic black-and-white images to test the ability of models (e.g., autoencoders) to correctly define a proper latent space and disentangle it. The first 100,000 images contain 4 shapes and uniform parameter-space distributions, while the other images have a more complex underlying distribution (truncated Gaussians and correlated marginal variables).
NADA_OOD: Does the model identify Out-Of-Distribution images? The repository contains 100,000 training images (4 different shapes in 3 possible colors, located in the upper left corner of the canvas) and 6x100,000 increasingly different sets of images (changing the color class balance, reducing the radius of the shape, moving the shape to the lower left corner), providing increasingly challenging out-of-distribution images. This can be used to test not only the capability of a model, but also methods that produce reliability estimates, which should classify OOD elements as "unreliable" since they are far from the original distributions.
NADA_AlEp: Does the model distinguish between different types (Aleatoric/Epistemic) of uncertainty? The repository contains 5x100,000 images with different types of noise/uncertainty (a sketch of the white-noise case follows this list):
NADA_AlEp_0_Clean: Noise-free dataset to use as a possible training set.
NADA_AlEp_1_White_Noise: Epistemic white noise dataset. Each image is perturbed with an amount of white noise randomly sampled from 0% to 90%.
NADA_AlEp_2_Deformation: Dataset with epistemic deformation noise. Each image is deformed by a random amount uniformly sampled between 0% and 90%; 0% corresponds to the original image, while 100% is a full deformation to the circumscribing circle.
NADA_AlEp_3_Label: Dataset with label noise. Formally, 20% of Triangles of a given color are misclassified as a Square with a random color (among Blue, Orange, and Brown) and vice versa (Squares to Triangles). Label noise introduces aleatoric uncertainty because it is inherent in the data and cannot be reduced.
NADA_AlEp_4_Combined: Combined dataset with all previous sources of uncertainty.
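For illustration, here is a minimal Python sketch of the white-noise perturbation described for NADA_AlEp_1. The linear blending scheme, image size, and function name are assumptions for illustration; the actual open-source generator may implement the perturbation differently.

```python
import numpy as np

def add_white_noise(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Perturb a grayscale image with a random amount of white noise,
    sampled uniformly from 0% to 90% (as in NADA_AlEp_1). The linear
    blend below is an assumed implementation, not the package's own."""
    amount = rng.uniform(0.0, 0.9)                 # noise fraction for this image
    noise = rng.uniform(0, 255, size=image.shape)  # white noise, same shape
    noisy = (1.0 - amount) * image + amount * noise
    return noisy.astype(np.uint8)

rng = np.random.default_rng(seed=0)
img = np.zeros((64, 64), dtype=np.uint8)  # placeholder black-and-white canvas
noisy_img = add_white_noise(img, rng)
```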
Each image can be used for classification (shape/color) or regression (radius/area) tasks.
All datasets can be modified and adapted to the user's research question using the included open source data generator.
DESCRIPTION
The TAU Spatial Room Impulse Response Database (TAU-SRIR DB) contains spatial room impulse responses (SRIRs) captured in various spaces of Tampere University (TAU), Finland, for a fixed receiver position and multiple source positions per room, along with separate recordings of spatial ambient noise captured at the same recording point. The dataset is intended for emulating spatial multichannel recordings for evaluation and/or training of multichannel processing algorithms in realistic reverberant conditions and over multiple rooms. The major distinguishing properties of the database compared to other databases of room impulse responses are:
Capturing in a high-resolution multichannel format (32 channels), from which multiple more limited application-specific formats can be derived (e.g. tetrahedral array, circular array, first-order Ambisonics, higher-order Ambisonics, binaural).
Extraction of densely spaced SRIRs along measurement trajectories, allowing emulation of moving source scenarios.
Multiple source distances, azimuths, and elevations from the receiver per room, allowing emulation of complex configurations for multi-source methods.
Multiple rooms, allowing evaluation of methods under various acoustic conditions, and training of methods that aim to generalize to different rooms.
The RIRs were collected by staff of TAU between 12/2017 and 06/2018, and between 11/2019 and 01/2020. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.
NOTE: This database is a work-in-progress. We intend to publish additional rooms, additional formats, and potentially higher-fidelity versions of the captured responses in the near future, as new versions of the database in this repository.
REPORT AND REFERENCE
A compact description of the dataset, recording setup, recording procedure, and extraction can be found in:
Politis, Archontis, Adavanne, Sharath, & Virtanen, Tuomas (2020). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan.
available here. A more detailed report specifically focusing on the dataset collection and properties will follow.
AIM
The dataset can be used for generating multichannel or monophonic mixtures for testing or training of methods under realistic reverberation conditions, related to e.g. multichannel speech enhancement, acoustic scene analysis, and machine listening, among others. It is especially suitable for the following application scenarios:
monophonic and multichannel single- or multi-source speech in multi-room reverberant conditions
monophonic and multichannel polyphonic sound events in multi-room reverberant conditions
single-source and multi-source localization in multi-room reverberant conditions, in static or dynamic scenarios
single-source and multi-source tracking in multi-room reverberant conditions, in static or dynamic scenarios
sound event localization and detection in multi-room reverberant conditions, in static or dynamic scenarios
SPECIFICATIONS
The SRIRs were captured using an Eigenmike spherical microphone array. A Genelec G Three loudspeaker was used to play back a maximum length sequence (MLS) around the Eigenmike. The SRIRs were obtained in the STFT domain using a least-squares regression between the known measurement signal (MLS) and the far-field recording, independently at each frequency. In this version of the dataset the SRIRs and ambient noise are downsampled to 24 kHz for compactness.
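A minimal sketch of such a per-frequency least-squares estimate is given below. It recovers a single complex gain per STFT bin from one microphone channel, whereas the actual extraction likely fits a longer filter per bin; the STFT parameters and function name are assumptions.

```python
import numpy as np
from scipy.signal import stft

def estimate_tf_least_squares(mls: np.ndarray, recording: np.ndarray,
                              fs: int = 24000, nperseg: int = 1024) -> np.ndarray:
    """Least-squares transfer-function estimate, independently per frequency:
    H(f) = sum_t Y(f,t) X*(f,t) / sum_t |X(f,t)|^2."""
    _, _, X = stft(mls, fs=fs, nperseg=nperseg)        # known excitation (MLS)
    _, _, Y = stft(recording, fs=fs, nperseg=nperseg)  # far-field recording
    frames = min(X.shape[1], Y.shape[1])
    X, Y = X[:, :frames], Y[:, :frames]
    # Regularized per-bin least squares over the time frames.
    return (Y * X.conj()).sum(axis=1) / ((np.abs(X) ** 2).sum(axis=1) + 1e-12)
```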
The currently published SRIR set was recorded at nine different indoor locations on the Tampere University campus in Hervanta, Finland. Additionally, 30 minutes of ambient noise recordings were collected at the same locations with the IR recording setup unchanged. SRIR directions and distances vary from room to room. Possible azimuths span the whole range φ ∈ [-180°, 180°), while elevations span approximately the range θ ∈ [-45°, 45°]. The currently shared measured spaces are as follows:
Large open space in underground bomb shelter, with plastic-coated floor and rock walls. Ventilation noise. Circular source trajectory.
Large open gym space. Ambience of people using weights and gym equipment in adjacent rooms. Circular source trajectory.
Small classroom (PB132) with group work tables and carpet flooring. Ventilation noise. Circular source trajectory.
Meeting room (PC226) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory.
Lecture hall (SA203) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory.
Small classroom (SC203) with group work tables and carpet flooring. Ventilation noise. Linear source trajectory.
Large classroom (SE203) with hard floor and rows of desks. Ventilation noise. Linear source trajectory.
Lecture hall (TB103) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory.
Meeting room (TC352) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory.
The measurement trajectories were organised in groups, with each group specified by a circular or linear trace on the floor at a certain distance from the z-axis of the microphone. For circular trajectories two ranges were measured, a close and a far one, except in room TC352, where the same range was measured twice but with a different furniture configuration and open or closed doors. For linear trajectories two ranges were likewise measured, close and far, but with linear paths on either side of the array, resulting in 4 unique trajectory groups; the exception is room SA203, where 3 ranges were measured, resulting in 6 trajectory groups. Within a room, linear trajectory groups are always parallel to each other.
Each trajectory group had multiple measurement trajectories, following the same floor path, but with the source at different heights.
The SRIRs are extracted from noise recordings of the slowly moving source along those trajectories, at an angular spacing of approximately 1 degree as seen from the microphone. Instead of extracting SRIRs at equally spaced points along the path (e.g. every 20 cm), this extraction scheme was found more practical for synthesis purposes, as it makes emulating moving sources at an approximately constant angular speed easier.
More details on the trajectory geometries can be found in the README file and the measinfo.mat file.
RECORDING FORMATS
As with the DCASE2019-2021 datasets, the database is currently provided in two formats, first-order Ambisonics and a tetrahedral microphone array, both derived from the Eigenmike 32-channel recordings. For more details on the format specifications, check the README.
We intend to add additional formats of the database, of both higher resolution (e.g. higher-order Ambisonics) and lower resolution (e.g. binaural).
REFERENCE DOAs
Each RIR extracted along a measurement trajectory has an associated direction-of-arrival (DOA), which can be used as the reference direction for a sound source spatialized using this RIR, for training or evaluation purposes. The DOAs were determined acoustically from the extracted RIRs by windowing the direct-sound part and applying a broadband version of the MUSIC localization algorithm to the windowed multichannel signal.
The DOAs are provided as Cartesian components [x, y, z] of unit length vectors.
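For reference, converting an azimuth/elevation pair to this unit-vector form is straightforward. The sketch below assumes the common right-handed convention (x to the front, y to the left, z up); verify the actual convention against the README.

```python
import numpy as np

def doa_to_unit_vector(azi_deg: float, ele_deg: float) -> np.ndarray:
    """Map an (azimuth, elevation) pair in degrees to a unit vector [x, y, z].
    The axis convention is an assumption -- check the README."""
    azi, ele = np.radians(azi_deg), np.radians(ele_deg)
    return np.array([np.cos(ele) * np.cos(azi),
                     np.cos(ele) * np.sin(azi),
                     np.sin(ele)])
```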
SCENE GENERATOR
A set of routines is shared, here termed scene generator, that can spatialize a bank of sound samples using the SRIRs and noise recordings of this library, to emulate scenes for the two target formats. The code is similar to the one used to generate the TAU-NIGENS Spatial Sound Events 2021 dataset, and has been ported to Python from the original version written in Matlab.
The generator can be found here, along with more details on its use.
The generator at the moment is set to work with the NIGENS sound event sample database, and the FSD50K sound event database, but additional sample banks can be added with small modifications.
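At its core, spatializing a sample amounts to convolving it with a multichannel SRIR, as in the hedged sketch below; the actual generator does more, e.g. stepping through the densely spaced SRIRs to emulate moving sources and mixing in the ambient noise recordings.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(sample: np.ndarray, srir: np.ndarray) -> np.ndarray:
    """Convolve a mono sound sample with each channel of an SRIR.
    sample: shape (num_samples,); srir: shape (num_taps, num_channels),
    e.g. 4 channels for the FOA or tetrahedral-array (MIC) formats."""
    return np.stack([fftconvolve(sample, srir[:, ch])
                     for ch in range(srir.shape[1])], axis=1)
```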
The dataset together with the generator has been used by the authors in the following public challenges:
DCASE 2019 Challenge Task 3, to generate the TAU Spatial Sound Events 2019 dataset (development/evaluation)
DCASE 2020 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2020 dataset
DCASE 2021 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2021 dataset
DCASE 2022 Challenge Task 3, to generate additional SELD synthetic mixtures for training the task baseline
NOTE: The current version of the generator is work-in-progress, with some code being quite "rough". If something does not work as intended or it is not clear what certain parts do, please contact us.
DATASET STRUCTURE
The dataset contains a folder with the SRIRs (TAU-SRIR_DB), with all the SRIRs per room in a single MAT file. The file rirdata.mat contains general information such as the sample rate and format specifications, and most importantly the DOAs of every extracted SRIR. The file measinfo.mat contains measurement and recording information for each room. Finally, the dataset contains a folder of spatial ambient noise recordings (TAU-SNoise_DB), with one subfolder per room holding two audio recordings of the spatial ambience, one for each format, FOA or MIC. For more information on how the SRIRs and DOAs are organized, check the README.
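A minimal sketch for inspecting the metadata files from Python; the folder layout follows the description above, but the field names should be taken from the README, and if the MAT files are stored in the v7.3 format, h5py is needed instead of scipy.

```python
from scipy.io import loadmat

# rirdata.mat: sample rate, format specifications, and the DOAs of every
# extracted SRIR; measinfo.mat: per-room measurement/recording information.
rirdata = loadmat("TAU-SRIR_DB/rirdata.mat")
measinfo = loadmat("TAU-SRIR_DB/measinfo.mat")
print(rirdata.keys(), measinfo.keys())  # inspect the available fields
```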
DOWNLOAD
The files TAU-SRIR_DB.z01, ..., TAU-SRIR_DB.zip contain the SRIRs and measurement info files.
The files TAU-SNoise_DB.z01, ..., TAU-SNoise_DB.zip contain the spatial ambient noise recordings.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TESTAR-extracted State Model datasets, generated with the TESTAR tool using the MyThaiStar web application as the System Under Test (SUT). These State Models were generated as an example to be automatically created and introduced locally into the DECODER PKM, from the H2020 DECODER Project. The TESTAR tool is an open-source tool (www.testar.org) for automated testing through graphical user interfaces (GUIs), currently being developed by the Universitat Politecnica de Valencia and the Open University of the Netherlands. MyThaiStar (github.com/devonfw/my-thai-star) is the reference application that Capgemini uses internally to promote best programming practices and the correct use of the latest technologies. It is developed with the Devon Framework, the standard tool for development at the company. The PKM is the Persistent Knowledge Monitor developed as the main infrastructure of the H2020 DECODER Project (www.decoder-project.eu) under grant agreement number 824231. As TESTAR explores the SUT automatically, it uses the Document Object Model (DOM) information extracted from the MyThaiStar SUT to generate and save a TESTAR State Model in the OrientDB graph database. This model contains information about the Widgets, States, and Actions that were found in the SUT.
- MyThaiStar.json.gz: JSON file exported from OrientDB that contains a database with the TESTAR State Model. It can be imported into OrientDB using the TESTAR tool, to analyze and interact with the State Model (see the sketch below for quick inspection outside OrientDB).
- ArtefactStateModel_MyThaiStar_2020.1_zpnffj5c3407972370_2020-06-15_12h14m24s: for DECODER project purposes, the knowledge extracted with TESTAR during the generation of the State Model has been summarized and referenced in an artifact JSON file, adapted to the PKM input requirements.
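The intended workflow is to import the export back into OrientDB via TESTAR, but for a quick look at the raw State Model the gzipped JSON can also be read directly. A minimal sketch, assuming the export parses as a single JSON document:

```python
import gzip
import json

# MyThaiStar.json.gz: gzipped OrientDB export holding the TESTAR State Model
# (Widgets, States, Actions). That it is one JSON document is an assumption;
# the canonical route is importing it through the TESTAR tool.
with gzip.open("MyThaiStar.json.gz", "rt", encoding="utf-8") as f:
    state_model = json.load(f)
```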
Success.ai SaaS Platform: Revolutionizing B2B Lead Generation & Email Outreach
Pricing
Success.ai offers unparalleled value with a transparent pricing model. Start for free and explore the platform’s robust features, including unlimited access to 700M+ verified B2B leads. Affordable upgrade plans ensure you get the best value for your business growth.
Login
Easily log in to your account and access your personalized dashboard. Seamlessly manage your leads, campaigns, and outreach strategies all in one place.
Get Started for FREE
Success.ai allows you to begin your journey at no cost. Test the platform’s powerful capabilities with no credit card required. Experience features like AI-driven lead search, email crafting, and outreach optimization before committing to a plan.
Book a Demo
Curious about how Success.ai can transform your business? Book a demo to see the platform in action. Learn how to streamline your lead generation process, maximize ROI, and scale your outreach efforts with ease.
Why Success.ai?
700M+ Professionals
Success.ai provides access to the largest verified database of over 700 million global professional contacts. Every lead is rigorously verified to ensure accuracy, enabling you to target decision-makers with precision.
Find and Win Your Ideal Customers
The platform’s advanced search features let you locate prospects by name, company, or email. Whether you're targeting CEOs, sales managers, or industry-specific professionals, Success.ai helps you find and connect with your ideal audience.
AI-Powered Capabilities
Success.ai leverages AI to enhance every aspect of your sales process. From crafting hyper-personalized cold emails to filtering leads by industry, revenue, or company size, the platform ensures your outreach efforts are efficient and effective.
Solutions for Every Business Need
Sales Leaders
Accelerate your sales cycle with tools designed to seamlessly book new deals and drive revenue growth.
Startups
Find, contact, and win clients globally with the power of Success.ai. Tailored tools help startups scale quickly with minimal resources.
Marketing Agencies
Grow your client base and enhance your campaigns with targeted lead generation and cold email strategies.
Lead Generation Agencies
Unlock the potential of your campaigns with access to the world’s largest verified B2B database. Drive conversions and client satisfaction with precision-targeted outreach.
Unmatched Features for Growth
Unlimited B2B Leads: Access 700M+ verified contacts to fuel your pipeline.
AI-Powered Writer: Craft personalized emails effortlessly, improving engagement and response rates.
Unlimited Email Warmup: Ensure your emails land in inboxes, avoiding spam folders.
Unified CRM: Manage leads, campaigns, and responses in one streamlined platform.
24/7 Live Support: Dedicated support ensures your success at every step.
What Users Say
Success.ai has received glowing reviews from over 10,000 satisfied companies. From startups to established enterprises, users praise the platform’s ease of use, robust features, and significant ROI.
For example, Muhammad Sulaiman says, “This tool has made filling our sales pipeline easier than ever. The AI writer and extensive database have been game-changers.”
Get Started Today
Join the ranks of businesses achieving hypergrowth with Success.ai. With unlimited access to the largest verified B2B database, advanced AI tools, and unmatched affordability, Success.ai is the ultimate platform for sales success.
Start for FREE or Book a Demo today to see how Success.ai can transform your lead generation efforts!
Register on the platform: app.success.ai
See our prices: https://www.success.ai/pricing
Book a demo: https://calendly.com/d/cmh7-chj-pcz/success-ai-demo-session?
MNIST is a subset of a larger set available from NIST (it is copied from http://yann.lecun.com/exdb/mnist/).
The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. Four files are available: train-images-idx3-ubyte.gz (training images), train-labels-idx1-ubyte.gz (training labels), t10k-images-idx3-ubyte.gz (test images), and t10k-labels-idx1-ubyte.gz (test labels).
Many methods have been tested with this training set and test set (see http://yann.lecun.com/exdb/mnist/ for more details).
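The files use the simple IDX binary format described on the MNIST page: a big-endian header (magic number plus dimensions) followed by raw uint8 values. A minimal Python sketch for the image files:

```python
import gzip
import numpy as np

def load_idx_images(path: str) -> np.ndarray:
    """Parse a gzipped IDX3 image file: a 16-byte header (magic, count,
    rows, cols as big-endian uint32) followed by uint8 pixel values."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    count, rows, cols = (int.from_bytes(data[o:o + 4], "big") for o in (4, 8, 12))
    return np.frombuffer(data, dtype=np.uint8, offset=16).reshape(count, rows, cols)

images = load_idx_images("train-images-idx3-ubyte.gz")  # -> shape (60000, 28, 28)
```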
License: Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
The invoice dataset provided is a mock dataset generated using the Python Faker library. It has been designed to mimic the format of data collected from an online store. The dataset contains various fields, including first name, last name, email, product ID, quantity, amount, invoice date, address, city, and stock code. All of the data in the dataset is randomly generated and does not represent actual individuals or products. The dataset can be used for various purposes, including testing algorithms or models related to invoice management, e-commerce, or customer behavior analysis. The data in this dataset can be used to identify trends, patterns, or anomalies in online shopping behavior, which can help businesses to optimize their online sales strategies.
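For instance, a similar record could be produced with a few lines of Faker; the exact field formats and value ranges below are illustrative assumptions, not the generator actually used for this dataset:

```python
import random
from faker import Faker

fake = Faker()

def mock_invoice() -> dict:
    """One mock invoice record with the fields listed above."""
    return {
        "first_name": fake.first_name(),
        "last_name": fake.last_name(),
        "email": fake.email(),
        "product_id": random.randint(1000, 9999),
        "quantity": random.randint(1, 10),
        "amount": round(random.uniform(5.0, 500.0), 2),
        "invoice_date": fake.date_between(start_date="-1y", end_date="today"),
        "address": fake.address(),
        "city": fake.city(),
        "stock_code": fake.bothify(text="??####"),
    }

invoices = [mock_invoice() for _ in range(100)]
```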
License: Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
# Analysis and Figure Generation Scripts for the Paper "Differences between Neurodivergent and Neurotypical Software Engineers: Analyzing the 2022 Stack Overflow Survey"
This repository contains the necessary R scripts and session data to re-run all tests reported in the paper, as well as generate the figures we use (and a few others we did not use).
## Files
- SO_analysis_EASE25.R: The main R script executing all other parts. Running this script will calculate the p-values of all tests, sorted by condition (lines 10 to 602). These are the same as reported in the paper in Tables 1 to 7. Additionally, figures will be generated (lines 603 onwards).
- SO_data_filtered_sampled.RData: An RData file containing the filtered and sampled data and the functions necessary to run the tests/generate the figures.
- 01_SO_preprocessing.R and 02_SO_random_sampling.R: Scripts containing the filtering and sampling logic. If these files are executed within SO_analysis_EASE25.R instead of loading the RData file (lines 5 to 8), the filtering and sampling are re-run. Note that this results in different random samples, and therefore different results from those in our paper. Also, the effect size calculations then have to be adjusted to the results that are significant (lines 596 to 601).
- generatedGraphs.zip: The graph files generated by the main R script.
- paperFigures.zip: The graphs we used in the paper. These are the generated graphs, but edited for readability.
## License
The Public 2022 Stack Overflow Developer Survey Results is made available under the Open Database License (ODbL): http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/.
We hereby attribute Stack Overflow as the source of these results. All data we make available (i.e., the RData file) is a derivative work, and as such is also shared under the ODbL.
Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found under the title "Online Retail".
"This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."
Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.
Analyses for this dataset could include time series, clustering, classification and more.
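As a starting point, the data can be loaded with pandas. The column names below follow the UCI distribution, while the file name is an assumption to adjust to your local copy:

```python
import pandas as pd

# Column names follow the UCI "Online Retail" sheet; the file name is an
# assumption -- point it at your local copy of the download.
df = pd.read_excel("Online Retail.xlsx")
df["Revenue"] = df["Quantity"] * df["UnitPrice"]
# Example: monthly revenue, a natural starting point for time-series analysis.
monthly_revenue = df.groupby(df["InvoiceDate"].dt.to_period("M"))["Revenue"].sum()
print(monthly_revenue.head())
```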
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
If this dataset is useful, an upvote is appreciated. This dataset covers student achievement in secondary education at two Portuguese schools. The data attributes include student grades, demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final-year grade (issued in the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see the source paper for more details).
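The correlation is easy to verify once the data is loaded. The file name and semicolon separator below follow the usual UCI distribution, but are assumptions to check against your copy:

```python
import pandas as pd

mat = pd.read_csv("student-mat.csv", sep=";")  # Mathematics subset
# G1/G2 (period grades) correlate strongly with the final grade G3.
print(mat[["G1", "G2", "G3"]].corr())
```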