Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of three folders:
- Systems: three public event logs: Sepsis Cases, RTFMS, and BPIC 2012.
- Logs: clean and noisy logs derived from the base systems. From each base log, we created samples of seven sizes (1000, 2000, 4000, 10000, 20000, 40000, 100000 traces) using sampling with replacement, yielding 21 clean logs. Noise was then added using $\snip$ across seven intensity levels (0.1%, 0.2%, 0.4%, 1.0%, 2.0%, 4.0%, 10.0%) and five noise types (absence, insertion, ordering, substitution, mixed); percentages refer to the number of trace-level injections. Each configuration was repeated five times, producing 3,675 noisy logs and a total of 3,696 logs.
- Models: discovered models for all clean logs and a random subset of noisy logs (incomplete), using the Alpha, Heuristics, and Inductive miners.
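Trace-level noise injection of the kind described above can be sketched as follows. This is a hypothetical Python illustration, not the authors' tooling: the function name, signature, and sampling details are our own; `intensity` is the fraction of traces receiving one injection.

```python
import random

NOISE_TYPES = ("absence", "insertion", "ordering", "substitution")

def inject_noise(log, intensity, noise_type, alphabet, seed=0):
    """Return a copy of `log` (a list of traces, each a list of activity
    labels) with one trace-level injection applied to a random
    `intensity` fraction of the traces."""
    rng = random.Random(seed)
    noisy = [list(t) for t in log]
    n_inject = round(intensity * len(noisy))
    for idx in rng.sample(range(len(noisy)), n_inject):
        t = noisy[idx]
        kind = rng.choice(NOISE_TYPES) if noise_type == "mixed" else noise_type
        pos = rng.randrange(len(t))
        if kind == "absence":          # drop one event
            del t[pos]
        elif kind == "insertion":      # insert a random activity
            t.insert(pos, rng.choice(alphabet))
        elif kind == "ordering":       # swap two adjacent events
            if len(t) > 1:
                j = min(pos, len(t) - 2)
                t[j], t[j + 1] = t[j + 1], t[j]
        elif kind == "substitution":   # replace one event
            t[pos] = rng.choice(alphabet)
    return noisy
```

At a 10% intensity on a 1,000-trace sample, this injects noise into exactly 100 distinct traces, matching the trace-level interpretation of the percentages above.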
Terms of use: https://www.icpsr.umich.edu/web/ICPSR/studies/38777/terms
The 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement Files are an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al. [2022], and implemented in https://github.com/uscensusbureau/DAS_2020_Redistricting_Production_Code). The NMF was produced using the official "production settings," the final set of algorithmic parameters and privacy-loss budget allocations that were used to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File and the 2020 Census Demographic and Housing Characteristics File. The NMF consists of the full set of privacy-protected statistical queries (counts of individuals or housing units with particular combinations of characteristics) of confidential 2010 Census data relating to the redistricting data portion of the 2010 Demonstration Data Products Suite - Redistricting and Demographic and Housing Characteristics File - Production Settings (2023-04-03). These statistical queries, called "noisy measurements," were produced under the zero-Concentrated Differential Privacy framework (Bun, M. and Steinke, T. [2016]; see also Dwork, C. and Roth, A. [2014]) implemented via the discrete Gaussian mechanism (Canonne, C., et al. [2023]), which added positive or negative integer-valued noise to each of the resulting counts. The noisy measurements are an intermediate stage of the TDA, prior to the post-processing the TDA then performs to ensure internal and hierarchical consistency within the resulting tables. The Census Bureau has released these 2010 Census demonstration data to enable data users to evaluate the expected impact of disclosure avoidance variability on 2020 Census data. The 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement Files (2023-04-03) have been cleared for public dissemination by the Census Bureau Disclosure Review Board (CBDRB-FY22-DSEP-004).
The data include zero-Concentrated Differentially Private (zCDP) (Bun, M. and Steinke, T. [2016]) noisy measurements, implemented via the discrete Gaussian mechanism. These are estimated counts of individuals and housing units included in the 2010 Census Edited File (CEF), which includes confidential data initially collected in the 2010 Census of Population and Housing. The noisy measurements included in this file were subsequently post-processed by the TopDown Algorithm (TDA) to produce the 2010 Census Production Settings Privacy-Protected Microdata File - Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File (2023-04-03) (https://www2.census.gov/programs-surveys/decennial/2020/program-management/data-product-planning/2010-demonstration-data-products/04 Demonstration_Data_Products_Suite/2023-04-03/). As these 2010 Census demonstration data are intended to support study of the design and expected impacts of the 2020 Disclosure Avoidance System, the 2010 CEF records were pre-processed before application of the zCDP framework. This pre-processing converted the 2010 CEF records into the input-file format, response codes, and tabulation categories used for the 2020 Census, which differ in substantive ways from the format, response codes, and tabulation categories originally used for the 2010 Census. The NMF provides estimates of counts of persons in the CEF by various characteristics and combinations of characteristics, including their reported race and ethnicity, whether they were of voting age, whether they resided in a housing unit or one of seven group quarters types, and their census block of residence, after the addition of discrete Gaussian noise (with the scale parameter determined by the privacy-loss budget allocation for that particular query under zCDP). Noisy measurements of the counts of occupied and vacant housing units by census block are also included.
Lastly, data on constraints--information into which no noise was infused by the Disclosure Avoidance System (DAS) and which the TDA used to post-process the noisy measurements into the 2010 Census Production Settings Privacy-Protected Microdata File - Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File (2023-04-03)--are provided. These data are available for download (i.e., not restricted access). Due to their size, they must be downloaded through the link on this page.
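The core mechanism described above, adding integer-valued discrete Gaussian noise to counts under a zCDP budget, can be illustrated with a toy sketch. This is not the production TDA code: the sampler below simply draws from probabilities proportional to exp(-n²/2σ²) over a truncated integer support, and the budget-to-scale rule σ² = Δ²/(2ρ) is the standard zCDP calibration for the Gaussian mechanism.

```python
import numpy as np

def discrete_gaussian_noise(sigma, size, rng):
    """Sample integer noise with P(n) proportional to exp(-n^2 / (2 sigma^2)).
    Support is truncated at +/- 10 sigma; the truncated mass is negligible."""
    half = int(10 * sigma) + 1
    support = np.arange(-half, half + 1)
    probs = np.exp(-support.astype(float) ** 2 / (2 * sigma ** 2))
    probs /= probs.sum()
    return rng.choice(support, size=size, p=probs)

def noisy_measurements(true_counts, rho, sensitivity=1.0, rng=None):
    """Add discrete Gaussian noise calibrated to a zCDP budget rho:
    sigma^2 = sensitivity^2 / (2 * rho). Results can be negative."""
    rng = rng or np.random.default_rng(0)
    sigma = sensitivity / np.sqrt(2 * rho)
    counts = np.asarray(true_counts)
    return counts + discrete_gaussian_noise(sigma, counts.shape, rng)
```

Note that the noisy counts remain integers but may be negative or inconsistent across tables; it is precisely this that the TDA's post-processing step repairs.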
Licence: https://opendata.vancouver.ca/pages/licence/
This dataset contains the boundaries of areas where noise levels are limited by City bylaws.
Data currency: The extract for this dataset is updated weekly. There may be no change in data content from one week to the next if there is no change in the source data. Priorities and resources also determine how quickly a change in reality is reflected in the database.
Data accuracy: These boundaries follow street and/or lane centrelines, so their placement in the street right-of-way is approximate.
Websites for further information: Manage noise
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
DNA-encoded library (DEL) is a powerful ligand discovery technology that has been widely adopted in the pharmaceutical industry. DEL selections are typically performed with a purified protein target immobilized on a matrix or in solution phase. Recently, DELs have also been used to interrogate targets in complex biological environments, such as membrane proteins on live cells. However, due to the complex landscape of the cell surface, the selection inevitably involves significant nonspecific interactions, and the selection data are much noisier than those from purified proteins, making reliable hit identification highly challenging. Researchers have developed several approaches to denoise DEL datasets, but it remains unclear whether they are suitable for cell-based DEL selections. Here, we report a proof of principle for a new machine-learning (ML)-based approach to process cell-based DEL selection datasets by using a Maximum A Posteriori (MAP) estimation loss function, a probabilistic framework that can account for and quantify uncertainties of noisy data. We applied the approach to a DEL selection dataset, where a library of 7,721,415 compounds was selected against purified carbonic anhydrase 2 (CA-2) and a cell line expressing the membrane protein carbonic anhydrase 12 (CA-12). The extended-connectivity fingerprint (ECFP)-based regression model using the MAP loss function was able to identify true binders and also reliable structure–activity relationships (SAR) from the noisy cell-based selection datasets. In addition, the regularized enrichment metric (known as MAP enrichment) can be calculated directly without involving the specific machine-learning model, effectively suppressing low-confidence outliers and enhancing the signal-to-noise ratio. Future applications of this method will focus on de novo ligand discovery from cell-based DEL selections.
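The idea behind a MAP-regularized enrichment metric can be illustrated with a toy Gamma–Poisson model. This is not the paper's exact formulation: we assume post-selection counts are Poisson with rate equal to the enrichment times the pre-selection count, place a Gamma prior on the enrichment, and report the posterior mode (the MAP estimate).

```python
import numpy as np

def map_enrichment(post_counts, pre_counts, alpha=1.0, beta=1.0):
    """MAP estimate of per-compound enrichment under a toy model:
    post_i ~ Poisson(e_i * pre_i), with a Gamma(alpha, beta) prior on e_i.
    The posterior is Gamma(alpha + post_i, beta + pre_i), whose mode is
    (alpha + post_i - 1) / (beta + pre_i). Low-count compounds are pulled
    toward the prior, suppressing low-confidence outliers."""
    post = np.asarray(post_counts, dtype=float)
    pre = np.asarray(pre_counts, dtype=float)
    return np.maximum(alpha + post - 1.0, 0.0) / (beta + pre)
```

A compound with counts 10/1 (a likely sequencing fluke) is shrunk from a raw ratio of 10 down to 5, while a well-supported compound with counts 1000/100 stays close to its raw ratio of 10; this is the outlier-suppression behavior the abstract describes.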
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Urban noise can interfere with avian communication through masking, but birds can reduce this interference by altering their vocalizations. Although several experimental studies indicate that birds can rapidly change their vocalizations in response to sudden increases in ambient noise, none have investigated whether this is a learned response that depends on previous exposure. Black-capped chickadees (Poecile atricapillus) change the frequency of their songs in response to both fluctuating traffic noise and experimental noise. We investigated whether these responses to fluctuating noise depend on familiarity with noise. We confirmed that males in noisy areas sang higher-frequency songs than those in quiet areas, but found that only males in already-noisy territories shifted songs upwards in immediate response to experimental noise. Unexpectedly, males in quieter territories shifted songs downwards in response to experimental noise. These results suggest that chickadees may require prior experience with fluctuating noise to adjust vocalizations in such a way as to minimize masking. Thus, learning to cope may be an important part of adjusting to acoustic life in the city.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustering is a widely used unsupervised learning technique that groups data into homogeneous clusters. However, when dealing with real-world data that contain categorical values, existing algorithms can be computationally costly in high dimensions and can struggle with noisy data that has missing values. Furthermore, except for one algorithm, no others provide theoretical guarantees of clustering accuracy. In this article, we propose a general categorical data encoding method and a computationally efficient spectral-based algorithm to cluster high-dimensional noisy categorical data (nominal or ordinal). Under a statistical model for data on m attributes from n subjects in r clusters with missing probability ϵ, we show that our algorithm exactly recovers the true clusters with high probability when mn(1−ϵ) ≥ CMr² log³ M, with M = max(n, m) and a fixed constant C. In addition, we show that mn(1−ϵ)² ≥ rδ/2 with 0
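The encode-then-cluster pipeline described above can be sketched with a minimal implementation. This is an illustration under our own assumptions, not the paper's algorithm: categories are one-hot encoded (missing entries become all-zero blocks), rows are projected onto the top-r left singular vectors, and a small k-means runs in the spectral space.

```python
import numpy as np

def one_hot_encode(X, n_levels):
    """Encode an (n, m) matrix of integer category codes 0..n_levels-1
    (missing = -1) as an n x (m * n_levels) binary matrix; missing
    entries become all-zero blocks."""
    n, m = X.shape
    E = np.zeros((n, m * n_levels))
    rows, cols = np.nonzero(X >= 0)
    E[rows, cols * n_levels + X[rows, cols]] = 1.0
    return E

def spectral_cluster(E, r, n_iter=50):
    """Project rows onto the top-r left singular vectors, then run
    Lloyd's k-means with farthest-point initialization."""
    U, _, _ = np.linalg.svd(E, full_matrices=False)
    Z = U[:, :r]
    centers = [Z[0]]                      # farthest-point seeding
    for _ in range(1, r):
        d = np.min([((Z - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):               # Lloyd iterations
        d = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(r):
            if (labels == k).any():
                centers[k] = Z[labels == k].mean(0)
    return labels
```

The spectral projection is what makes the method tolerant of missing entries: a few zeroed-out blocks perturb a row's embedding only slightly, so it stays near its cluster's centroid.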
ELRA VAR licence: https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
ELRA END USER licence: https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
This database has been collected and packaged under the auspices of the IST-EU STREP project HIWIRE (Human Input that Works In Real Environments). The database was designed as a tool for the development and testing of speech processing and recognition techniques dealing with robust non-native speech recognition.

The database contains 8,099 English utterances pronounced by non-native speakers (31 French, 20 Greek, 20 Italian, and 10 Spanish speakers). The collected utterances correspond to human input in a command-and-control aeronautics application. The data was recorded in a studio with a close-talking microphone, and real noise recorded in an airplane cockpit was artificially added to the data. The signals are provided in clean (studio recordings with close-talking microphone), low-, mid- and high-noise conditions. The three noise levels correspond approximately to signal-to-noise ratios of 10 dB, 5 dB and −5 dB respectively.

Clean audio data was recorded in different office rooms using a close-talking microphone (Plantronics USB-45) to minimize ambient acoustic effects. The sampling frequency is 16 kHz and the data is stored in Windows PCM WAV 16-bit mono format.

Recordings correspond to prompts extracted from an aeronautic command-and-control application. A total of 8,099 utterances were recorded, corresponding to 81 speakers each pronouncing about 100 utterances. The speaker distribution is as follows:
| Country | # Speakers | # Utterances |
| --- | --- | --- |
| France | 31 (38.3%) | 3100 |
| Greece | 20 (24.7%) | 2000 |
| Italy | 20 (24.7%) | 2000 |
| Spain | 10 (12.3%) | 999 |
| Total | 81 | 8099 |
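Mixing pre-recorded cockpit noise into clean speech at a target SNR, as described above, can be sketched as follows. This is a generic illustration, not the HIWIRE toolchain; the function name and the random-offset convention are our own.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, rng=None):
    """Scale `noise` so the clean/noise power ratio equals `snr_db` (in dB),
    then add it to `clean`. A random offset into the noise recording is
    used when it is longer than the speech."""
    rng = rng or np.random.default_rng(0)
    if len(noise) < len(clean):                 # loop short noise recordings
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

Calling this with `snr_db` set to 10, 5 and −5 would reproduce the three noise conditions; at −5 dB the noise carries more power than the speech.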
FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.
Data curators
Eduardo Fonseca and Mercedes Collado
Contact
You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.
Citation
If you use this dataset or part of it, please cite the following ICASSP 2019 paper:
Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, and Xavier Serra, “Learning Sound Event Classifiers from Web Audio with Noisy Labels”, arXiv preprint arXiv:1901.01189, 2019
You can also consider citing our ISMIR 2017 paper that describes the Freesound Annotator, which was used to gather the manual annotations included in FSDnoisy18k:
Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, “Freesound Datasets: A Platform for the Creation of Open Audio Datasets”, In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017
FSDnoisy18k description
What follows is a summary of the most basic aspects of FSDnoisy18k. For a complete description, check the FSDnoisy18k companion site.
The source of audio content is Freesound—a sound sharing site created and maintained by the Music Technology Group, hosting over 400,000 clips uploaded by its community of users, who additionally provide some basic metadata (e.g., tags and title). The 20 classes of FSDnoisy18k are drawn from the AudioSet Ontology and are selected based on data availability as well as on their suitability to allow the study of label noise. The 20 classes are: "Acoustic guitar", "Bass guitar", "Clapping", "Coin (dropping)", "Crash cymbal", "Dishes, pots, and pans", "Engine", "Fart", "Fire", "Fireworks", "Glass", "Hi-hat", "Piano", "Rain", "Slam", "Squeak", "Tearing", "Walk, footsteps", "Wind", and "Writing". FSDnoisy18k was created with the Freesound Annotator, which is a platform for the collaborative creation of open audio datasets.
We defined a clean portion of the dataset consisting of correct and complete labels. The remaining portion is referred to as the noisy portion. Each clip in the dataset has a single ground truth label (singly-labeled data).
The clean portion of the data consists of audio clips whose labels are rated as present in the clip and predominant (almost all with full inter-annotator agreement), meaning that the label is correct and, in most cases, there is no additional acoustic material other than the labeled class. A few clips may contain some additional sound events, but they occur in the background and do not belong to any of the 20 target classes. This is more common for some classes that rarely occur alone, e.g., “Fire”, “Glass”, “Wind” or “Walk, footsteps”.
The noisy portion of the data consists of audio clips that received no human validation. In this case, they are categorized on the basis of the user-provided tags in Freesound. Hence, the noisy portion features a certain amount of label noise.
Code
We've released the code for our ICASSP 2019 paper at https://github.com/edufonseca/icassp19. The framework comprises all the basic stages: feature extraction, training, inference and evaluation. After loading the FSDnoisy18k dataset, log-mel energies are computed and a CNN baseline is trained and evaluated. The code also allows testing four noise-robust loss functions. Please check our paper for more details.
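The log-mel feature-extraction stage mentioned above can be sketched in plain NumPy. This is an illustration with parameters we chose ourselves (FFT size, hop, number of mel bands), not necessarily the settings used in the repository.

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels=96, fmin=0.0, fmax=None):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    fmax = fmax or sr / 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(fmin), mel(fmax), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for j in range(l, c):      # rising edge of the triangle
            fb[i, j] = (j - l) / max(c - l, 1)
        for j in range(c, r):      # falling edge
            fb[i, j] = (r - j) / max(r - c, 1)
    return fb

def log_mel(signal, sr, n_fft=1024, hop=512, n_mels=96):
    """Frame the signal, take the power spectrum per frame, apply the
    mel filterbank, and take the log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mels = spec @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mels + 1e-10)
```

The resulting (frames × mel-bands) matrix is the kind of time-frequency patch a CNN baseline consumes.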
Label noise characteristics
FSDnoisy18k features real label noise that is representative of audio data retrieved from the web, particularly from Freesound. The analysis of a per-class, random, 15% of the noisy portion of FSDnoisy18k revealed that roughly 40% of the analyzed labels are correct and complete, whereas 60% of the labels show some type of label noise. Please check the FSDnoisy18k companion site for a detailed characterization of the label noise in the dataset, including a taxonomy of label noise for singly-labeled data as well as a per-class description of the label noise.
FSDnoisy18k basic characteristics
The dataset's most relevant characteristics are as follows:
License
FSDnoisy18k has licenses at two different levels, as explained next. All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. In particular, all Freesound clips included in FSDnoisy18k are released under either CC-BY or CC0. For attribution purposes and to facilitate attribution of these files to third parties, we include a list of the audio clips and their corresponding licenses in the LICENSE-INDIVIDUAL-CLIPS file downloaded with the dataset.
In addition, FSDnoisy18k as a whole is the result of a curation process and it has an additional license. FSDnoisy18k is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the dataset.
Files
FSDnoisy18k can be downloaded as a series of zip files with the following directory structure:
```
root
│
└───FSDnoisy18k.audio_train/          Audio clips in the train set
│
└───FSDnoisy18k.audio_test/           Audio clips in the test set
│
└───FSDnoisy18k.meta/                 Files for evaluation setup
│   │
│   └───train.csv                     Data split and ground truth for the train set
│   │
│   └───test.csv                      Ground truth for the test set
│
└───FSDnoisy18k.doc/
    │
    └───README.md                     The dataset description file that you are reading
    │
    └───LICENSE-DATASET               License of the FSDnoisy18k dataset as an entity
    │
    └───LICENSE-INDIVIDUAL-CLIPS.csv  Licenses of the individual audio clips from Freesound
```
Each row (i.e. audio clip) of the train.csv file contains the following
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
MindAffect is a startup working to make Brain Computer Interfaces (BCI), with a mission to “open up new dimensions of interaction” by developing technologies which allow users to directly control computers with their brains. So far we have achieved this mission by:
Modern BCIs (including our own) rely heavily on machine learning techniques to process the noisy data gathered from EEG sensors and to cope with the high degree of variability in responses across different individuals and environments. MindAffect firmly believes that the key to enabling the new BCI applications we all want is a combination of more sophisticated machine learning algorithms and larger, more diverse datasets on which to train these algorithms.
As a first step to enabling machine learning experts to improve the BCI experience, we are publishing our internal testing datasets and the analysis codes used to develop and refine our own algorithms. We hope these datasets will help people in developing new and improved algorithms for this type of data.
Initially, we have contributed about 60 datasets from our development team. We are committed to adding more datasets as we gather them, to build as large a database of cVEP EEG data as possible for algorithm development. Further, as more users gather their own data with our open-source BCI, we hope they will be willing to donate their datasets, helping to rapidly build a large and diverse dataset for further algorithm enhancement.

Specifically, this dataset was gathered by one of our developers in Nijmegen in the Netherlands using our online BCI system, exactly as shown in this video.
This dataset was gathered and donated by MindAffect B.V.
What can you do with this data?
1. Get better performance in less time, perhaps using a deep-learning approach?
2. Generalize your algorithm to transfer between datasets, so the user does not have to re-calibrate for each new dataset?
3. Generalize over multiple users (as we add new user data)?
4. Generalize to different BCI types (as we add P300 and SSVEP datasets)?
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data and code needed to replicate the results in "Noise, Cognitive Function, and Worker Productivity."

Paper abstract: Cognitive science research suggests the noisy workplaces common in low- and middle-income countries can impair workers' cognitive functions. However, whether this translates into lower earnings for workers depends on the importance of these functions for productivity and on whether workers understand these effects. I use two randomized experiments in Nairobi, Kenya to answer these questions. First, I randomize exposure to engine noise during a textile training course at a government training facility. An increase of 7 dB reduces productivity by approximately 3%. To study the mechanism driving this effect, I then randomize engine noise during tests of cognitive function and an effort task. The same noise change impairs cognitive function but not effort-task performance. Finally, in both experiments, I examine whether individuals appreciate the impact of noise on their performance by eliciting participants' willingness to pay for quiet working conditions while randomly varying whether they are compensated based on their performance. Individuals' willingness to pay does not depend on the wage structure, suggesting that they are not aware that quiet working conditions would increase their performance pay. Thus, workers may fail to mitigate earnings losses by sorting into quieter jobs where they are more productive.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Here we provide the code related to our recent paper "Gaussian Processes with Noisy Regression Inputs for Dynamical Systems".
To run the code, execute the 'offline_phase.mat' or 'offline_phase_all.mat' files.
Defra has published strategic noise map data that give a snapshot of the estimated noise from major road and rail sources across England in 2017. The data was developed as part of implementing the Environmental Noise Directive.
This publication explains which noise sources were included in the 2017 strategic noise mapping process. It provides summary maps for major road and rail sources and provides links to the detailed Geographic Information Systems (GIS) noise datasets.
This data will help transport authorities to better identify and prioritise relevant local action on noise. It will also be useful for planners, academics and others working to assess noise and its impacts.
- [Laeq 16h](https://environment.data.gov.uk/dataset/5836745c-4e11-4767-94a5-2656f82e01a3): annual average noise level for the 16-hour period between 0700 and 2300
- [Lden](https://environment.data.gov.uk/dataset/e8e78e12-9297-450b-b875-e0523cb3c9ea): 24-hour annual average noise level with separate weightings for the evening and night periods
- [Lnight](https://environment.data.gov.uk/dataset/f6c0e3b6-3186-4d0a-b0e7-ca32bfb6573f): night-time annual average noise level in dB, where night is defined as 2300 to 0700
- [Laeq 16h](https://environment.data.gov.uk/dataset/b9c6bf30-a02d-4378-94a0-2982de1bef86): annual average noise level for the 16-hour period between 0700 and 2300
- [Lden](https://environment.data.gov.uk/dataset/fd1c6327-ad77-42ae-a761-7c6a0866523d): 24-hour annual average noise level with separate weightings for the evening and night periods
- [Lnight](https://environment.data.gov.uk/dataset/cc48e728-602a-4e8a-9221-49f661ab58f8): night-time annual average noise level in dB, where night is defined as 2300 to 0700
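The Lden indicator above combines the day, evening and night levels into one 24-hour figure with penalties for the more sensitive periods. Under the Environmental Noise Directive's standard definition (12 h day, 4 h evening with a +5 dB penalty, 8 h night with a +10 dB penalty), it can be computed as:

```python
import math

def lden(l_day, l_evening, l_night):
    """Day-evening-night level per the Environmental Noise Directive:
    an energy-average over 24 h with +5 dB and +10 dB penalties applied
    to the evening (4 h) and night (8 h) periods respectively."""
    return 10 * math.log10(
        (12 * 10 ** (l_day / 10)
         + 4 * 10 ** ((l_evening + 5) / 10)
         + 8 * 10 ** ((l_night + 10) / 10)) / 24
    )
```

Because of the penalties, a site with a constant 60 dB around the clock reports an Lden of about 66.4 dB, which is why Lden values typically sit above the plain 16-hour Laeq for the same site.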
We’ve published data which shows the estimated number of people affected by noise from road traffic, railway and industrial sources.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A series of our previous studies explored the use of an abstract visual representation of the amplitude envelope cues from target sentences to benefit speech perception in complex listening environments. The purpose of this study was to expand this auditory-visual speech perception to the tactile domain. Twenty adults participated in speech recognition measurements in four different sensory modalities (AO, auditory-only; AV, auditory-visual; AT, auditory-tactile; AVT, auditory-visual-tactile). The target sentences were fixed at 65 dB sound pressure level and embedded within a simultaneous speech-shaped noise masker of varying degrees of signal-to-noise ratios (−7, −5, −3, −1, and 1 dB SNR). The amplitudes of both abstract visual and vibrotactile stimuli were temporally synchronized with the target speech envelope for comparison. Average results showed that adding temporally-synchronized multimodal cues to the auditory signal did provide significant improvements in word recognition performance across all three multimodal stimulus conditions (AV, AT, and AVT), especially at the lower SNR levels of −7, −5, and −3 dB for both male (8–20% improvement) and female (5–25% improvement) talkers. The greatest improvement in word recognition performance (15–19% improvement for males and 14–25% improvement for females) was observed when both visual and tactile cues were integrated (AVT). Another interesting finding in this study is that temporally synchronized abstract visual and vibrotactile stimuli additively stack in their influence on speech recognition performance. Our findings suggest that a multisensory integration process in speech perception requires salient temporal cues to enhance speech recognition ability in noisy environments.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The aim is to predict the size of the pellets (pellet feed) at the end of the production process in a steel industry operating in the global market.
The prediction will be carried out using historical data from sensors that capture information at each stage of the production process, combined with statistical models and artificial intelligence algorithms that seek to identify trends and patterns in order to estimate the size of the pellets at the end of the process.

The dataset contains 10 columns and 9,997 rows; each row records a stage of the production process with its respective information.
This data can be extremely useful for process engineers, data scientists and other professionals involved in the steel industry.
For process engineers, detailed analysis of variables can provide valuable insights into operational efficiency. They can identify bottlenecks in the process, assess the impact of different operating conditions and implement improvements that result in more efficient and higher quality production.
For data scientists, the dataset offers a rich source of information for building predictive models. Using machine learning techniques, they can develop algorithms that predict pellet size based on input variables, allowing for real-time adjustments and optimization of the production process. In addition, statistical analysis can reveal hidden patterns and trends that may not be evident at first glance.
Perform univariate and multivariate analysis. Visualize data distributions for variables such as Umidade, Bentonita, and Taxa_Alimentacao_Disco.
Create plots to study relationships between features. Use heatmaps to analyze correlations between numerical features.
Build machine learning models to predict Distribuicao_Tamanho_Pelotas using features. Test different regression models (machine learning or deep learning) for better insights.
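A minimal regression baseline for the task above can be sketched with ordinary least squares. This is a hypothetical starting point, not a tuned model; we assume numeric feature columns such as Umidade, Bentonita, and Taxa_Alimentacao_Disco predicting Distribuicao_Tamanho_Pelotas, as suggested by the column names in the dataset.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept term.
    X: (n_samples, n_features); returns [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a fitted coefficient vector to new feature rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef
```

Comparing this baseline's residuals against more flexible models (tree ensembles, neural networks) is a quick way to judge whether the nonlinearity those models add actually pays off on this dataset.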
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dear Researcher,
Thank you for using this code and these datasets. Below I explain how the GEPFCM code related to my paper "Generalized entropy based possibilistic fuzzy C-Means for clustering noisy data and its convergence proof", published in Neurocomputing, works. The main datasets mentioned in the paper are included together with the GEPFCM code. If you have any questions, feel free to contact me at bas_salaraskari@yahoo.com or s_askari@aut.ac.ir.
Regards,
S. Askari
Guidelines for the GEPFCM algorithm:
1. Open the file "GEPFCM Code" in MATLAB. This is the relaxed form of the algorithm for handling noisy data.
2. Enter or paste the name of the dataset you wish to cluster in line 15 after "load". This loads the dataset into the workspace.
3. For details of the parameters cFCM, cPCM, c1E, c2E, eta, and m, please read the paper.
4. Lines 17 and 18: "N" is the number of data vectors and "D" is the number of independent variables.
5. Line 26: "C" is the number of clusters. To input your own desired number of clusters, uncomment this line and enter the value. Since the datasets provided here include "C", this line is commented out.
6. Line 28: "ruopt" is the optimal value of ρ discussed in equation 13 of the paper. To enter your own value of ρ, uncomment this line. Since the datasets provided here include "ruopt", this line is commented out.
7. If line 50 is commented out, the covariance norm (Mahalanobis distance) is used; if it is uncommented, the identity norm (Euclidean distance) is used.
8. When you run the algorithm, FCM is applied to the data first. The cluster centers calculated by FCM initialize PFCM. PFCM is then applied to the data, and the cluster centers computed by PFCM initialize GEPFCM. Finally, GEPFCM is applied to the data.
9. For a two-dimensional plot, uncomment lines 419-421 and comment out lines 423-425. For a three-dimensional plot, comment out lines 419-421 and uncomment lines 423-425.
10. To run the algorithm, press Ctrl+Enter.
11. For your own dataset, please arrange the data as described in the MS Word file "Read Me".
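The FCM stage that seeds this FCM → PFCM → GEPFCM pipeline can be sketched as follows. This is standard fuzzy C-means in Python, not the author's MATLAB GEPFCM code, and it uses the Euclidean (identity) norm only.

```python
import numpy as np

def fcm(X, C, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy C-means: alternate membership and center updates.
    Returns cluster centers and the N x C fuzzy membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), C))
    U /= U.sum(axis=1, keepdims=True)          # rows sum to 1
    for _ in range(n_iter):
        W = U ** m                             # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1) + 1e-12
        inv = d2 ** (-1.0 / (m - 1))           # membership update rule
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U
```

In the full pipeline, the `centers` returned here would initialize PFCM, whose centers would in turn initialize GEPFCM, with the possibilistic and entropy terms providing robustness to noise.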
In our research work, we have accumulated a dataset of two thousand sound files from different resources and extracted the features that can be further utilized in the deep learning problem of emergency sound classification. In addition, we share links to our dataset both as WAV files, cropped at a specific time window and at a fixed frequency, and as a CSV file of extracted features. Our dataset contains ambulance sounds and other road-noise data in the form of audio files.
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Policy applications: Our study reveals the ways in which wildlife can alter their signals to contend with anthropogenic noise, and discusses the potential fitness and management consequences of these signal alterations. This information, combined with an identification of current research needs, will allow researchers and managers to better develop noise pollution risk assessment protocols and prioritize mitigation efforts to reduce anthropogenic noise. (12-Mar-2021)

Methods: Literature Search Strategy and Inclusion Criteria
We searched the peer-reviewed scientific literature to synthesize information regarding noise pollution impacts on wildlife acoustic communication and to assess research gaps and biases. We restricted the search to terrestrial systems because general approaches to noise pollution risk assessment and recommendations for noise mitigation already exist for some coastal and marine systems (Southall et al. 2007). Perhaps more importantly, a vast body of research conducted to date on marine wildlife has yielded valuable knowledge such as species-specific spectral sensitivity, critical impact thresholds, and mitigation effectiveness which can be drawn upon to advance general theory and research and to develop further regulatory guidelines (Erbe et al. 2016). Finally, the physics of sound transmission differ between water and air, affecting both how sound is perceived by organisms and potential mitigation strategies (Würsig et al. 2000, Shannon et al. 2015). We used Web of Science (search conducted 4/5/2018) to search for studies investigating the impact of noise pollution on wildlife modulation of call frequency, rate, duration, and amplitude (see Table 2 for specific search terms). We assessed these multiple communication response variables even though they may be related because each response may have different ecological and/or evolutionary implications. An initial search produced 815 studies. After implementing all inclusion criteria (see below), our search resulted in 181 data points from 32 studies representing six continents (Table 3).
We used the “Analyze Results” feature in Web of Science to filter out irrelevant disciplines (e.g., Audiology, Speech Pathology; n_excluded = 347). After compiling remaining results into a database, we removed duplicate studies (n_excluded = 5) and studies determined to be topically irrelevant based on reading of all titles (n_excluded = 117). We excluded studies broadcasting white noise as a treatment, as we were interested in responses to spectral characteristics that more closely match environmental noise pollution (i.e., loud, low-frequency sounds; n_excluded = 3). However, we retained one study that explicitly manipulated the characteristics of white noise to approximate low-frequency traffic sounds. We excluded studies conducted in a laboratory setting, as we were only interested in responses of free-living wildlife to noises experienced in their natural habitat (n_excluded = 5). After detailed screening of article texts, we removed studies that did not assess effects of noise pollution on the above focal response variables and studies with analysis methods or reporting that precluded us from extracting a relevant effect size (n_excluded = 59).
For remaining studies, we extracted the location, focal taxa, response variable, sound source, and study design. We also extracted means, sample sizes, and standard deviations of response variables for studies assessing categorical predictor variables (e.g., call characteristics at quiet and noisy sites), or values of Pearson’s r for studies assessing continuous predictor variables (e.g., response characteristics over a gradient of decibel levels). In studies with multiple treatments, we used the two extreme ends of the environmental sound spectrum for analysis. For example, if a study tested call rates in “quiet”, “moderate”, and “loud” environments, we compared responses between “quiet” and “loud” sites. Sound sources included airplane (n = 2), construction (n = 6), energy development (n = 17), roadway (n = 52), urban (n = 101), and white noise (n = 3). We also distinguished study designs as event-based (n = 41) versus continuous (n = 140). Event-based study designs evaluated instantaneous signal flexibility in the presence of anthropogenic sound (e.g., a grasshopper calling more loudly during an airplane overflight compared to normal conditions; Fig. 2). Continuous study designs, on the other hand, evaluated differences in acoustic properties between populations in loud and quiet environments (e.g., communication characteristics of red-winged blackbirds (Agelaius phoeniceus) in rural versus urban environments; Fig. 2). Following our literature search, we conducted an additional targeted search for bat studies, as they were underrepresented in our initial search and they are good models for the study of anthropogenic sound impacts due to their reliance on acoustic information for both communication and foraging.
Analysis
To assess potential biases in the noise pollution literature, we compared observed versus expected proportions of studies using Pearson’s χ2 tests. We conducted these tests to analyze numbers of studies for each response variable, sound source, focal taxa, continent, and study design; in each case we tested a null hypothesis that an equal proportion of studies have been conducted for each category (e.g., 50% of studies each for event-based and continuous study designs). To control the Type I error rate, we applied a Holm sequential Bonferroni correction.
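These bias tests can be sketched as follows: a chi-square goodness-of-fit test against equal expected proportions for each category set, with the resulting p-values passed through a Holm step-down correction. The counts below reuse the ns reported in this Methods section, but the grouping into exactly two test families and the alpha level are illustrative choices, not the study's exact analysis.

```python
# Pearson's chi-square goodness-of-fit against equal expected proportions,
# followed by a Holm sequential Bonferroni correction across the tests.
from scipy.stats import chisquare

# Counts taken from the section above (study designs; sound sources).
observed = {
    "study_design": [41, 140],              # event-based vs. continuous
    "sound_source": [2, 6, 17, 52, 101, 3], # airplane ... white noise
}

# chisquare() defaults to equal expected frequencies, matching the
# null hypothesis of equal proportions per category.
pvals = {name: chisquare(counts).pvalue for name, counts in observed.items()}

def holm(pvals, alpha=0.05):
    """Holm step-down: compare the i-th smallest p-value to alpha/(m - i)."""
    m = len(pvals)
    decisions = {}
    reject = True
    for i, (name, p) in enumerate(sorted(pvals.items(), key=lambda kv: kv[1])):
        if p > alpha / (m - i):
            reject = False  # once one test fails, all larger p-values fail too
        decisions[name] = reject
    return decisions

print(holm(pvals))
```

With these counts both nulls are rejected: the literature is heavily skewed toward continuous designs and urban/roadway sources.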
We conducted a meta-analysis to assess wildlife responses to noise pollution using the metafor package (Viechtbauer 2010) in the R statistical environment (version 3.4.1, R Core Team 2017). We ran mixed-effects meta-regression models with study design (event-based versus continuous) and taxa as fixed effects and study ID as a random effect.
When possible, we calculated Hedges’ g for each study that used a categorical noise treatment. When studies evaluated responses to noise along a continuous gradient, we calculated Hedges’ g from Pearson’s r. To evaluate the overall effect for each response variable (Minimum Frequency, Maximum Frequency, Peak Frequency, Duration, Rate, and Amplitude), as well as the effects of study type and taxa, we evaluated overlap of 95% confidence intervals with zero. After conducting analyses, we constructed Q-Q plots to visually assess model fit.
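The two effect-size computations above can be sketched with the standard formulas (pooled-SD standardized mean difference with the small-sample correction factor J, and the usual r-to-d conversion). This is a minimal illustration of the textbook formulas, not the study's metafor code, and the example numbers are made up.

```python
# Hedges' g from group summary statistics, and conversion from Pearson's r.
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference (Hedges' g)."""
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                         / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor J
    return j * d

def r_to_g(r, n):
    """Convert a Pearson correlation to Hedges' g via Cohen's d."""
    d = 2 * r / math.sqrt(1 - r**2)
    j = 1 - 3 / (4 * n - 9)
    return j * d

# Hypothetical example: song minimum frequency (kHz) at noisy vs. quiet sites.
g = hedges_g(m1=2.9, sd1=0.3, n1=20, m2=2.6, sd2=0.3, n2=20)
```

A positive g here would indicate higher minimum frequency under noise, consistent with the frequency-shift responses the meta-analysis tests for.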
To prohibit, eliminate and abate loud, unusual and unnecessary noise or noises which annoy, disturb, injure or endanger the comfort, repose, health, peace or safety of others within the City of Regina.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File (2023-06-30) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al [2022] https://doi.org/10.1162/99608f92.529e3cb9, and implemented in https://github.com/uscensusbureau/DAS_2020_Redistricting_Production_Code). The NMF was produced using the official “production settings,” the final set of algorithmic parameters and privacy-loss budget allocations that were used to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File and the 2020 Census Demographic and Housing Characteristics File. The NMF consists of the full set of privacy-protected statistical queries (counts of individuals or housing units with particular combinations of characteristics) of confidential 2010 Census data relating to the 2010 Demonstration Data Products Suite – Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File – Production Settings (2023-04-03). These statistical queries, called “noisy measurements,” were produced under the zero-Concentrated Differential Privacy framework (Bun, M. and Steinke, T. [2016] https://arxiv.org/abs/1605.02065; see also Dwork, C. and Roth, A. [2014] https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf), implemented via the discrete Gaussian mechanism (Canonne, C., et al. [2023] https://arxiv.org/abs/2004.00010), which added positive or negative integer-valued noise to each of the resulting counts. The noisy measurements are an intermediate stage of the TDA, prior to the post-processing that the TDA performs to ensure internal and hierarchical consistency within the resulting tables. The Census Bureau has released these 2010 Census demonstration data to enable data users to evaluate the expected impact of disclosure avoidance variability on 2020 Census data.
The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File (2023-04-03) has been cleared for public dissemination by the Census Bureau Disclosure Review Board (CBDRB-FY22-DSEP-004).
The 2010 Census Production Settings Demographic and Housing Characteristics Demonstration Noisy Measurement File includes zero-Concentrated Differentially Private (zCDP) (Bun, M. and Steinke, T [2016]) noisy measurements, implemented via the discrete Gaussian mechanism. These are estimated counts of individuals and housing units included in the 2010 Census Edited File (CEF), which includes confidential data initially collected in the 2010 Census of Population and Housing. The noisy measurements included in this file were subsequently post-processed by the TopDown Algorithm (TDA) to produce the 2010 Census Production Settings Privacy-Protected Microdata File - Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File (2023-04-03) (https://www2.census.gov/programs-surveys/decennial/2020/program-management/data-product-planning/2010-demonstration-data-products/04-Demonstration_Data_Products_Suite/2023-04-03/). As these 2010 Census demonstration data are intended to support study of the design and expected impacts of the 2020 Disclosure Avoidance System, the 2010 CEF records were pre-processed before application of the zCDP framework. This pre-processing converted the 2010 CEF records into the input-file format, response codes, and tabulation categories used for the 2020 Census, which differ in substantive ways from the format, response codes, and tabulation categories originally used for the 2010 Census.
The NMF provides estimates of counts of persons in the CEF by various characteristics and combinations of characteristics, including their reported race and ethnicity, whether they were of voting age, whether they resided in a housing unit or one of seven group quarters types, and their census block of residence, after the addition of discrete Gaussian noise (with the scale parameter determined by the privacy-loss budget allocation for that particular query under zCDP). Noisy measurements of the counts of occupied and vacant housing units by census block are also included. Lastly, the file provides data on constraints: information into which no noise was infused by the Disclosure Avoidance System (DAS) and which the TDA used to post-process the noisy measurements into the 2010 Census Production Settings Privacy-Protected Microdata File - Redistricting (P.L. 94-171) and Demographic and Housing Characteristics File (2023-04-03).
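The noise-infusion step described above can be illustrated with a minimal sketch: each count receives independent integer noise with probability mass proportional to exp(-x²/(2σ²)), where σ is set by the privacy-loss budget allocation for the query. This is a truncated-support approximation for illustration only; it is not the exact discrete Gaussian sampler of Canonne et al. or the production DAS code, and the function names and `tail` cutoff are my own.

```python
# Illustrative (not production) discrete Gaussian noise: integer x drawn
# with P(x) proportional to exp(-x^2 / (2 * sigma^2)) over a truncated
# support.  The exact sampler of Canonne et al. avoids this truncation.
import math
import random

def discrete_gaussian_noise(sigma, rng=random, tail=12):
    half = tail * math.ceil(sigma)
    support = range(-half, half + 1)
    weights = [math.exp(-x * x / (2 * sigma**2)) for x in support]
    return rng.choices(list(support), weights=weights, k=1)[0]

def noisy_count(true_count, sigma, rng=random):
    # Noisy measurements may be negative; the TDA's post-processing later
    # restores non-negativity and hierarchical consistency across tables.
    return true_count + discrete_gaussian_noise(sigma, rng)
```

Running `noisy_count` repeatedly on the same true count shows the behavior described in the documentation: individual noisy measurements scatter around the confidential value (and can dip below zero for small counts), while their average converges to it.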
We study the robustness of different sweep protocols for accelerated adiabaticity following in the presence of static errors and of dissipative and dephasing phenomena. While in the noise-free case, counterdiabatic driving is, by definition, insensitive to the form of the original sweep function, this property may be lost when the quantum system is open. We indeed observe that, according to the decay and dephasing channels investigated here, the performance of the system becomes highly dependent on the sweep function. Our findings are relevant for the experimental implementation of robust shortcuts-to-adiabaticity techniques for the control of quantum systems.
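As background for why counterdiabatic driving is sweep-independent in the closed-system case: the standard auxiliary term (Demirplak and Rice 2003; Berry 2009) added to a reference Hamiltonian H_0(t) with instantaneous eigenstates |n(t)⟩ reads

```latex
H_{\mathrm{CD}}(t) = H_0(t)
  + i\hbar \sum_n \Big( \lvert \partial_t n \rangle\langle n \rvert
  - \langle n \vert \partial_t n \rangle \, \lvert n \rangle\langle n \rvert \Big)
```

which cancels diabatic transitions exactly regardless of how the sweep is parametrized; the abstract's point is that this sweep-independence can break down once decay and dephasing channels act on the open system.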