https://whoisdatacenter.com/terms-of-use/
Explore historical ownership and registration records by performing a reverse Whois lookup for the email address data-generator.com@contactprivacy.com.
https://whoisdatacenter.com/terms-of-use/
Explore the historical Whois records related to data-generator.com (Domain). Get insights into ownership history and changes over time.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Most scientists consider randomized experiments to be the best method available to establish causality. On the Internet, during the past twenty-five years, randomized experiments have become common, often referred to as A/B testing. For practical reasons, much A/B testing does not use pseudo-random number generators to implement randomization. Instead, hash functions are used to transform the distribution of identifiers of experimental units into a uniform distribution. Using two large, industry data sets, I demonstrate that the success of hash-based quasi-randomization strategies depends greatly on the hash function used: MD5 yielded good results, while SHA512 yielded less impressive ones.
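The hash-based quasi-randomization described above is simple to sketch. A minimal illustration (assuming string identifiers, a two-arm experiment, and an invented salt; this is not the pipeline used on the industry data sets):

import hashlib

def assign_bucket(unit_id: str, experiment: str = "exp42", n_buckets: int = 2) -> int:
    """Hash an experimental-unit identifier and reduce it to a bucket index.

    If the hash output is close to uniform, so is the bucket assignment.
    The experiment name acts as a salt so different experiments get
    independent assignments; "exp42" is purely illustrative.
    """
    digest = hashlib.md5(f"{experiment}:{unit_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

# Example: map user identifiers to arms A and B.
for uid in ["user-001", "user-002", "user-003"]:
    print(uid, "A" if assign_bucket(uid) == 0 else "B")

Swapping hashlib.md5 for hashlib.sha512 gives the alternative assignment scheme the abstract compares against.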
The Synthea generated data is provided here as 1,000 person (1k), 100,000 person (100k), and 2,800,000 person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government (although a citation would be appreciated). You can read our first academic paper here: https://doi.org/10.1093/jamia/ocx079
https://www.marketresearchforecast.com/privacy-policy
The Terms of Use Generator market is projected to grow from USD 3.42 billion in 2025 to USD 13.14 billion by 2033, with a CAGR of 18.5% during the forecast period. The increasing need for online platforms and applications, along with growing concerns about data privacy and security, is fueling the market growth. Moreover, the escalating adoption of mobile applications and e-commerce platforms has created a further need for clear and comprehensive Terms of Use agreements. North America is expected to dominate the Terms of Use Generator market. The region's developed digital infrastructure, widespread adoption of mobile devices, and stringent data privacy regulations are contributing to its dominance. The Asia Pacific region is projected to witness significant growth in the coming years, driven by the region's expanding internet penetration, rising smartphone usage, and increasing awareness about online data protection.
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey. Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response (a worked example follows the resource list below). For Big Data users we used the actual reported values or estimated likely values.
Resources in this dataset:
Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv. Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
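As a worked illustration of the per-person calculation described above (the numbers are invented, not taken from the survey responses):

def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """High end of the reported storage range divided by the number of people
    covered by the response: 1 for an individual response, G for a group response."""
    return range_high_tb / group_size

# Hypothetical examples: an individual reporting the "1-10 TB" range,
# and a four-person group reporting the "10-100 TB" range.
print(per_person_storage_tb(10))                 # 10.0 TB per person
print(per_person_storage_tb(100, group_size=4))  # 25.0 TB per person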
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Testing web APIs automatically requires generating input data values such as addresses, coordinates or country codes. Generating meaningful values for these types of parameters randomly is rarely feasible, which is a major obstacle for current test case generation approaches. In this paper, we present ARTE, the first semantic-based approach for the Automated generation of Realistic TEst inputs for web APIs. Specifically, ARTE leverages the specification of the API under test to extract semantically related values for every parameter by applying knowledge extraction techniques. Our approach has been integrated into RESTest, a state-of-the-art tool for API testing, achieving an unprecedented level of automation which allows it to generate up to 100% more valid API calls than existing fuzzing techniques (30% on average). Evaluation results on a set of 26 real-world APIs show that ARTE can generate realistic inputs for 7 out of every 10 parameters, outperforming the results obtained by related approaches.
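As a rough illustration of why semantic lookup beats purely random generation, the sketch below maps parameter names to pools of plausible values and falls back to a random string otherwise. The parameter names, value pools, and helper function are invented for this example and do not reproduce ARTE's knowledge-extraction implementation:

import random
import string

# Hypothetical pools of semantically meaningful values per parameter name.
REALISTIC_VALUES = {
    "countryCode": ["US", "ES", "DE", "JP", "BR"],
    "city": ["Seville", "Boston", "Nairobi"],
    "coordinates": ["37.3891,-5.9845", "42.3601,-71.0589"],
}

def test_input(param_name: str) -> str:
    """Return a plausible value for a known parameter name, or a random
    string (typical fuzzing behaviour) when nothing is known about it."""
    pool = REALISTIC_VALUES.get(param_name)
    if pool:
        return random.choice(pool)
    return "".join(random.choices(string.ascii_lowercase, k=8))

print(test_input("countryCode"))   # e.g. "ES" -> likely accepted by the API
print(test_input("unknownParam"))  # e.g. "qwkjzhtr" -> likely rejected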
The Weather Generator Gridded Data consists of two products:
[1] statistically perturbed gridded 100-year historic daily weather data, including precipitation (in mm) and detrended maximum and minimum temperature (in degrees Celsius), and
[2] stochastically generated and statistically perturbed gridded 1000-year daily weather data, including precipitation (in mm), maximum temperature (in degrees Celsius), and minimum temperature (in degrees Celsius).
The base climate of this dataset is a combination of historically observed gridded data, including Livneh Unsplit 1915-2018 (Pierce et al. 2021), Livneh 1915-2015 (Livneh et al. 2013) and PRISM 2016-2018 (PRISM Climate Group, 2014). Daily precipitation is from Livneh Unsplit 1915-2018; daily temperature is from Livneh 2013 spanning 1915-2015 and was extended to 2018 with daily 4 km PRISM data rescaled to the Livneh grid resolution (1/16 deg). The Livneh temperature was bias corrected by month to the corresponding monthly PRISM climate over the same period. Baseline temperature was then detrended by month over the entire time series based on the average monthly temperature from 1991-2020. Statistical perturbations and stochastic generation of the time series were performed by the Weather Generator (Najibi et al. 2024a and Najibi et al. 2024b).
The repository consists of 30 climate perturbation scenarios that range from -25 to +25% change in mean precipitation and from 0 to +5 degrees Celsius change in mean temperature. Changes in thermodynamics represent scaling of precipitation during extreme events by a scaling factor per degree Celsius increase in mean temperature, primarily 7%/degree Celsius, with 14%/degree Celsius as a sensitivity perturbation. Further insight into thermodynamic scaling can be found in the full report linked below or in Najibi et al. 2024a and Najibi et al. 2024b.
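A minimal sketch of how such a thermodynamic scaling factor could be computed, assuming the per-degree rate compounds multiplicatively (the exact functional form used by the Weather Generator should be taken from the report and papers cited here):

def extreme_precip_scaling(delta_t_c: float, rate_per_degree: float = 0.07) -> float:
    """Multiplicative factor applied to extreme-event precipitation for a given
    increase in mean temperature: 0.07 for the primary 7%/degree-Celsius scaling,
    0.14 for the sensitivity perturbation."""
    return (1.0 + rate_per_degree) ** delta_t_c

# Illustrative factors across the dataset's 0 to +5 degree Celsius range.
for dt in range(6):
    print(dt, round(extreme_precip_scaling(dt), 3), round(extreme_precip_scaling(dt, 0.14), 3))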
The data presented here was created by the Weather Generator, which was developed by Dr. Scott Steinschneider and Dr. Nasser Najibi (Cornell University). If a separate weather generator product is desired apart from this gridded climate dataset, the weather generator code can be adapted to suit the specific needs of the user. The weather generator code and supporting information can be found here: https://github.com/nassernajibi/WGEN-v2.0/tree/main. The full report for the model and performance can be found here: https://water.ca.gov/-/media/DWR-Website/Web-Pages/Programs/All-Programs/Climate-Change-Program/Resources-for-Water-Managers/Files/WGENCalifornia_Final_Report_final_20230808.pdf
NOTES:
1. Please use this link to leave the data view and see the full description: https://data.ct.gov/Environment-and-Natural-Resources/Hazardous-Waste-Manifest-Data-CT-1984-2008/h6d8-qiar
2. Please use ALL CAPS when searching text with the "Filter" function, such as: LITCHFIELD. This is not needed for the upper right corner "Find in this Dataset" search, where, for example, "Litchfield" can be used.
Dataset Description: We know there are errors in the data although we strive to minimize them. Examples include:
• Manifests completed incorrectly by the generator or the transporter – data was entered based on the incorrect information. We can only enter the information we receive.
• Data entry errors – we now have QA/QC procedures in place to prevent or catch and fix a lot of these.
• Historically there are multiple records of the same generator – each variation in spelling of a name or address generated a separate handler record. We have worked to minimize these but many remain. The good news is that as long as they all have the same EPA ID they will all show up in your search results.
• Handlers provide erroneous data to obtain an EPA ID – data entry was based on erroneous information. Examples include incorrect or bogus addresses and names. There are also a lot of MISSPELLED NAMES AND ADDRESSES!
• Missing manifests – not every required manifest gets submitted to DEEP. Also, of the more than 100,000 paper manifests we receive each year, some were incorrectly handled and never entered.
• Missing data – we know that the records for approximately 25 boxes of manifests, mostly prior to 1985, were lost from the database in the 1980s.
• Translation errors – the data has been migrated to newer data platforms numerous times, and each time there have been errors and data losses.
• Wastes incorrectly entered – mostly due to complex names that were difficult to spell, or typos in quantities or units of measure.
Since Summer 2019, scanned images of manifest hardcopies may be viewed at the DEEP Document Online Search Portal: https://filings.deep.ct.gov/DEEPDocumentSearchPortal/
https://www.datainsightsmarket.com/privacy-policy
Fake Email Address Generator Market Analysis
The global market for Fake Email Address Generators is expected to reach a value of XXX million by 2033, growing at a CAGR of XX% from 2025 to 2033. Key drivers of this growth include the increasing demand for privacy and anonymity online, the growing prevalence of spam and phishing attacks, and the proliferation of digital marketing campaigns. Additionally, the adoption of cloud-based solutions and the emergence of new technologies, such as artificial intelligence (AI), are further fueling market expansion. Key trends in the Fake Email Address Generator market include the growing popularity of enterprise-grade solutions, the emergence of disposable email services, and the increasing integration with other online tools. Restraints to market growth include concerns over security and data protection, as well as the availability of free or low-cost alternatives. The market is dominated by a few major players, including Burnermail, TrashMail, and Guerrilla Mail, but a growing number of smaller vendors are emerging with innovative solutions. Geographically, North America and Europe are the largest markets, followed by the Asia Pacific region.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recommendations for identifier lifecycle management.
Note: Please use the following view to be able to see the entire Dataset Description: https://data.ct.gov/Environment-and-Natural-Resources/Hazardous-Waste-Portal-Manifest-Metadata/x2z6-swxe
Dataset Description Outline (5 sections)
• INTRODUCTION
• WHY USE THE CONNECTICUT OPEN DATA PORTAL MANIFEST METADATA DATASET INSTEAD OF THE DEEP DOCUMENT ONLINE SEARCH PORTAL ITSELF?
• WHAT MANIFESTS ARE INCLUDED IN DEEP’S MANIFEST PERMANENT RECORDS ARE ALSO AVAILABLE VIA THE DEEP DOCUMENT SEARCH PORTAL AND CT OPEN DATA?
• HOW DOES THE PORTAL MANIFEST METADATA DATASET RELATE TO THE OTHER TWO MANIFEST DATASETS PUBLISHED IN CT OPEN DATA?
• IMPORTANT NOTES
INTRODUCTION
• All of DEEP’s paper hazardous waste manifest records were recently scanned and “indexed”.
• Indexing consisted of 6 basic pieces of information or “metadata” taken from each manifest about the Generator and stored with the scanned image. The metadata enables searches by: Site Town, Site Address, Generator Name, Generator ID Number, Manifest ID Number and Date of Shipment.
• All of the metadata and scanned images are available electronically via DEEP’s Document Online Search Portal at: https://filings.deep.ct.gov/DEEPDocumentSearchPortal/
• Therefore, it is no longer necessary to visit the DEEP Records Center in Hartford for manifest records or information.
• This CT Data dataset “Hazardous Waste Portal Manifest Metadata” (or “Portal Manifest Metadata”) was copied from the DEEP Document Online Search Portal, and includes only the metadata – no images.
WHY USE THE CONNECTICUT OPEN DATA PORTAL MANIFEST METADATA DATASET INSTEAD OF THE DEEP DOCUMENT ONLINE SEARCH PORTAL ITSELF?
The Portal Manifest Metadata is a good search tool to use along with the Portal. Searching the Portal Manifest Metadata can provide the following advantages over searching the Portal:
• faster searches, especially for “large searches” – those with a large number of search returns;
• an unlimited number of search returns (the Portal is limited to 500);
• a larger display of search returns;
• search returns can be sorted and filtered online in CT Data;
• search returns and the entire dataset can be downloaded from CT Data and used offline (e.g. download to Excel format); and
• metadata from searches can be copied from CT Data and pasted into the Portal search fields to quickly find single scanned images.
The main advantages of the Portal are:
• it provides access to scanned images of manifest documents (CT Data does not); and
• images can be downloaded one or multiple at a time.
WHAT MANIFESTS ARE INCLUDED IN DEEP’S MANIFEST PERMANENT RECORDS ARE ALSO AVAILABLE VIA THE DEEP DOCUMENT SEARCH PORTAL AND CT OPEN DATA?
All hazardous waste manifest records received and maintained by the DEEP Manifest Program, including:
• manifests originating from a Connecticut Generator or sent to a Connecticut Destination Facility, including manifests accompanying an exported shipment;
• manifests with RCRA hazardous waste listed on them (such manifests may also have non-RCRA hazardous waste listed);
• manifests from a Generator with a Connecticut Generator ID number (permanent or temporary number);
• manifests with sufficient quantities of RCRA hazardous waste listed for DEEP to consider the Generator to be a Small or Large Quantity Generator;
• manifests with PCBs listed on them from 2016 to 6-29-2018.
• Note: manifests sent to a CT Destination Facility were indexed by the Connecticut or Out of State Generator. Searches by CT Designated Facility are not possible unless such facility is the Generator for the purposes of manifesting.
All other manifests were considered “non-hazardous” manifests and were not scanned. They were discarded after 2 years in accord with the DEEP records retention schedule. Non-hazardous manifests include:
• manifests with only non-RCRA hazardous waste listed;
• manifests from generators that did not have a permanent or temporary Generator ID number;
• Sometimes non-hazardous manifests were considered “Hazar
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The DIgSILENT simulation file is uploaded: IEEE 39-bus system
This dataset was created by Owen Tamuno Gilbert
https://www.datainsightsmarket.com/privacy-policy
Market Overview and Growth Prospects
The global domain name generator market is projected to witness steady growth, with a market size valued at XXX million in 2025. Driven by the surging adoption of online businesses, the increasing demand for unique and memorable domain names, and the rise of AI-powered domain name generation tools, the market is expected to expand at a CAGR of XX% during the forecast period from 2025 to 2033.
Market Dynamics and Segmentation
Key trends shaping the market include the proliferation of cloud-based services, the growing preference for enterprise-level solutions, and the emergence of AI and machine learning capabilities. The market is segmented by application into personal and enterprise use cases. Cloud-based solutions dominate the market, while on-premises deployments are gaining traction in large organizations with specific security and compliance requirements. Prominent companies operating in the domain name generator space include DomainsBot, Panabee, DomainWheel, and others. The market is largely concentrated in North America and Europe, but emerging economies in Asia Pacific and the Middle East & Africa are expected to contribute to future growth.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Imports - Generators, Transformers & Accessories (Census) in the United States decreased to 3331.03 USD Million in February from 3351.49 USD Million in January of 2024. This dataset includes a chart with historical data for the United States Imports of Generators, Transformers & Accessories.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union query SQL injection and Blind SQL injection. To perform the attacks, the SQLMAP tool was used.
NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
Datasets
The first dataset was collected to train the detection models (D1), and the other was collected using different attacks from those used in training, in order to test the models and ensure their generalization (D2).
The datasets contain both benign and malicious traffic. All collected datasets are balanced.
The version of NetFlow used to build the datasets is 5.
Dataset   Aim        Samples    Benign-malicious traffic ratio
D1        Training   400,003    50%
D2        Test       57,239     50%
Infrastructure and implementation
Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator node with the ipt_netflow sensor installed. The sensor is a Linux kernel module that uses iptables to process the packets and convert them into NetFlow flows.
DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. These tasks run as Python scripts; users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it sends them to a NetFlow data generation node (packets received from the Internet are handled in the same way).
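For illustration, one such benign-traffic task could look like the sketch below; the URLs, pacing, and function name are invented here and are not the actual DOROTHEA scripts:

import random
import time
import urllib.request

# Hypothetical stand-in for a benign-traffic task: it only issues periodic
# web requests, whereas the real scripts also send emails and open SSH sessions.
PAGES = [
    "https://www.wikipedia.org",
    "https://www.python.org",
    "https://duckduckgo.com",
]

def browse_forever(min_wait: float = 2.0, max_wait: float = 10.0) -> None:
    """Fetch random pages with human-like pauses so the gateway sees
    ordinary-looking outbound traffic."""
    while True:
        try:
            urllib.request.urlopen(random.choice(PAGES), timeout=10).read(1024)
        except OSError:
            pass  # ignore transient network errors and keep generating traffic
        time.sleep(random.uniform(min_wait, max_wait))

if __name__ == "__main__":
    browse_forever()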
The malicious traffic collected (SQLI attacks) was performed using SQLMAP. SQLMAP is a penetration tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.
The attacks were executed from 16 nodes, which launched SQLMAP with the parameters listed in the following table.
Parameters and their descriptions:
• '--banner', '--current-user', '--current-db', '--hostname', '--is-dba', '--users', '--passwords', '--privileges', '--roles', '--dbs', '--tables', '--columns', '--schema', '--count', '--dump', '--comments': Enumerate users, password hashes, privileges, roles, databases, tables and columns
• --level=5: Increase the probability of a false positive identification
• --risk=3: Increase the probability of extracting data
• --random-agent: Select the User-Agent randomly
• --batch: Never ask for user input, use the default behavior
• --answers="follow=Y": Predefined answers to yes
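For illustration only, an attack node might combine the flags above into a single invocation along the lines of the following sketch; the target URL and the subprocess wrapper are hypothetical, since the dataset description does not specify how the command was assembled:

import subprocess

# Hypothetical victim endpoint; in the experiment each attacker targeted 200 victim nodes.
TARGET = "http://192.0.2.10/form.php?id=1"

cmd = [
    "sqlmap", "-u", TARGET,
    "--batch", "--random-agent", "--level=5", "--risk=3", "--answers=follow=Y",
    "--banner", "--current-user", "--current-db", "--hostname", "--is-dba",
    "--users", "--passwords", "--privileges", "--roles",
    "--dbs", "--tables", "--columns", "--schema", "--count", "--dump", "--comments",
]
subprocess.run(cmd, check=False)  # run one enumeration pass against the target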
Every node executed SQLIAs on 200 victim nodes. The victim nodes deployed a web form vulnerable to Union-type injection attacks, connected to either a MySQL or a SQL Server database engine (50% of the victim nodes deployed MySQL and the other 50% deployed SQL Server).
The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.
However, for D2, Blind SQL injection attacks were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic-generating nodes and 140.30.20.1/24 for victim nodes.
For the MySQL server we ran MariaDB version 10.4.12. Microsoft SQL Server 2017 Express and PostgreSQL version 13 were also used.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the datasets collected and used in the research project:
O. Mikkonen, A. Wright, E. Moliner and V. Välimäki, “Neural Modeling Of Magnetic Tape Recorders,” in Proceedings of the International Conference on Digital Audio Effects (DAFx), Copenhagen, Denmark, 4-7 September 2023.
A preprint of the article is available on arXiv. The code is open source and published on GitHub. The accompanying web page can be found here.
Overview
The data is divided into various subsets, stored in separate directories. The data contains both toy data generated using a software emulation of a reel-to-reel tape recorder, as well as real data collected from a physical device. The various subsets can be used for training, validating, and testing neural network behavior, similarly as was done in the research article.
Toy and Real Data
The toy data was generated using CHOWTape, a physically modeled reel-to-reel tape recorder. The subsets generated with the software emulation are denoted with the string CHOWTAPE. Two variants of the toy data were produced: in the first variant, the fluctuating delay produced by the simulated tape transport was disabled, and in the second, the delay was enabled. The latter variants are denoted with the string WOWFLUTTER.
The real data was collected using an Akai 4000D reel-to-reel tape recorder. The corresponding subsets are denoted with the string AKAI. Two tape speeds were used during the recording: 3 3/4 IPS (inches per second) and 7 1/2 IPS, with the corresponding subsets denoted with '3.75IPS' and '7.5IPS' respectively. On top of this, two different brands of magnetic tape were used for capturing the datasets with different tape speeds: Maxell and Scotch, with the corresponding subsets denoted with 'MAXELL' and 'SCOTCH' respectively.
Directories
For training the models, a fraction of the inputs from the SignalTrain LA2A Dataset was used. The training, validation, and testing can be replicated using the subsets:
ReelToReel_Dataset_MiniPulse100_AKAI_*/ (hysteretic nonlinearity, real data)
ReelToReel_Dataset_Mini192kHzPulse100_AKAI_*/ (delay generator, real data)
Silence_AKAI_*/ (noise generator, real data)
ReelToReel_Dataset_MiniPulse100_CHOWTAPE*/ (hysteretic nonlinearity, toy data)
ReelToReel_Dataset_MiniPulse100_CHOWTAPE_F[0.6]_SL[60]_TRAJECTORIES/ (delay generator, toy data)
For visualizing the model behavior, the following subsets can be used:
LogSweepsContinuousPulse100_*/ (nonlinear magnitude responses)
SinesFadedShortContinuousPulse100*/ (magnetic hysteresis curves)
Directory structure
Each directory/subset is made up of further subdirectories that are most often used to separate the training, validation and test sets from each other. Thus, a typical directory will look like the following:
[DIRECTORY_NAME]
├── Train
│   ├── input_x_.wav
│   ...
│   ├── target_x_.wav
│   ...
├── Val
│   ├── input_y_.wav
│   ...
│   ├── target_y_.wav
│   ...
└── Test
    ├── input_z_.wav
    ...
    ├── target_z_.wav
    ...
While not all of the audio is used for training purposes, all of the subsets share part of this structure to make the corresponding datasets compatible with the dataloader that was used.
The input and target files denoted with the same number x, e.g. input_100_.wav and target_100_.wav, make up a pair, such that the target audio is the input audio processed with one of the used effects. In some of the cases, a third file named trajectory_x_.npy can be found, which consists of the corresponding pre-extracted delay trajectory in the NumPy binary file format.
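A minimal sketch of loading one such pair and its optional trajectory; the helper below is illustrative rather than part of the published code, and soundfile is just one possible WAV reader:

from pathlib import Path

import numpy as np
import soundfile as sf  # assumption: any WAV reader would do

def load_pair(subset_dir: str, split: str, index: int):
    """Load an (input, target) audio pair and, if present, the pre-extracted
    delay trajectory, following the naming pattern described above."""
    root = Path(subset_dir) / split
    x, sr = sf.read(str(root / f"input_{index}_.wav"))
    y, _ = sf.read(str(root / f"target_{index}_.wav"))
    traj_path = root / f"trajectory_{index}_.npy"
    traj = np.load(traj_path) if traj_path.exists() else None
    return x, y, sr, traj

# Example (the directory name is a placeholder for one of the subsets listed above):
# x, y, sr, traj = load_pair("ReelToReel_Dataset_MiniPulse100_AKAI_...", "Train", 100)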
Revision History
Version 1.1.0
Added high-resolution (192kHz) dataset for configuration (SCOTCH, 3.75 IPS)
Version 1.0.0
Initial publish