16 datasets found

E
A Replication Dataset for Fundamental Frequency Estimation
live.european-language-grid.eu
data.niaid.nih.gov
+1more
json
Updated Oct 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). A Replication Dataset for Fundamental Frequency Estimation [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7808
Explore at:
jsonAvailable download formats
Dataset updated
Oct 19, 2023
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Part of the dissertation Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods.© 2020, Bastian Bechtold. All rights reserved. Estimating the fundamental frequency of speech remains an active area of research, with varied applications in speech recognition, speaker identification, and speech compression. A vast number of algorithms for estimatimating this quantity have been proposed over the years, and a number of speech and noise corpora have been developed for evaluating their performance. The present dataset contains estimated fundamental frequency tracks of 25 algorithms, six speech corpora, two noise corpora, at nine signal-to-noise ratios between -20 and 20 dB SNR, as well as an additional evaluation of synthetic harmonic tone complexes in white noise.The dataset also contains pre-calculated performance measures both novel and traditional, in reference to each speech corpus’ ground truth, the algorithms’ own clean-speech estimate, and our own consensus truth. It can thus serve as the basis for a comparison study, or to replicate existing studies from a larger dataset, or as a reference for developing new fundamental frequency estimation algorithms. All source code and data is available to download, and entirely reproducible, albeit requiring about one year of processor-time.Included Code and Data
ground truth data.zip is a JBOF dataset of fundamental frequency estimates and ground truths of all speech files in the following corpora:
CMU-ARCTIC (consensus truth) [1]FDA (corpus truth and consensus truth) [2]KEELE (corpus truth and consensus truth) [3]MOCHA-TIMIT (consensus truth) [4]PTDB-TUG (corpus truth and consensus truth) [5]TIMIT (consensus truth) [6]
noisy speech data.zip is a JBOF datasets of fundamental frequency estimates of speech files mixed with noise from the following corpora:NOISEX [7]QUT-NOISE [8]
synthetic speech data.zip is a JBOF dataset of fundamental frequency estimates of synthetic harmonic tone complexes in white noise.noisy_speech.pkl and synthetic_speech.pkl are pickled Pandas dataframes of performance metrics derived from the above data for the following list of fundamental frequency estimation algorithms:AUTOC [9]AMDF [10]BANA [11]CEP [12]CREPE [13]DIO [14]DNN [15]KALDI [16]MAPSMBSC [17]NLS [18]PEFAC [19]PRAAT [20]RAPT [21]SACC [22]SAFE [23]SHR [24]SIFT [25]SRH [26]STRAIGHT [27]SWIPE [28]YAAPT [29]YIN [30]
noisy speech evaluation.py and synthetic speech evaluation.py are Python programs to calculate the above Pandas dataframes from the above JBOF datasets. They calculate the following performance measures:Gross Pitch Error (GPE), the percentage of pitches where the estimated pitch deviates from the true pitch by more than 20%.Fine Pitch Error (FPE), the mean error of grossly correct estimates.High/Low Octave Pitch Error (OPE), the percentage pitches that are GPEs and happens to be at an integer multiple of the true pitch.Gross Remaining Error (GRE), the percentage of pitches that are GPEs but not OPEs.Fine Remaining Bias (FRB), the median error of GREs.True Positive Rate (TPR), the percentage of true positive voicing estimates.False Positive Rate (FPR), the percentage of false positive voicing estimates.False Negative Rate (FNR), the percentage of false negative voicing estimates.F₁, the harmonic mean of precision and recall of the voicing decision.
Pipfile is a pipenv-compatible pipfile for installing all prerequisites necessary for running the above Python programs.
The Python programs take about an hour to compute on a fast 2019 computer, and require at least 32 Gb of memory.References:
John Kominek and Alan W Black. CMU ARCTIC database for speech synthesis, 2003.Paul C Bagshaw, Steven Hiller, and Mervyn A Jack. Enhanced Pitch Tracking and the Processing of F0 Contours for Computer Aided Intonation Teaching. In EUROSPEECH, 1993.F Plante, Georg F Meyer, and William A Ainsworth. A Pitch Extraction Reference Database. In Fourth European Conference on Speech Communication and Technology, pages 837–840, Madrid, Spain, 1995.Alan Wrench. MOCHA MultiCHannel Articulatory database: English, November 1999.Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf. A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. page 4, 2011.John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, and Victor Zue. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 1993.Andrew Varga and Herman J.M. Steeneken. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recog- nition systems. Speech Communication, 12(3):247–251, July 1993.David B. Dean, Sridha Sridharan, Robert J. Vogt, and Michael W. Mason. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. Proceedings of Interspeech 2010, 2010.Man Mohan Sondhi. New methods of pitch extraction. Audio and Electroacoustics, IEEE Transactions on, 16(2):262—266, 1968.Myron J. Ross, Harry L. Shaffer, Asaf Cohen, Richard Freudberg, and Harold J. Manley. Average magnitude difference function pitch extractor. Acoustics, Speech and Signal Processing, IEEE Transactions on, 22(5):353—362, 1974.Na Yang, He Ba, Weiyang Cai, Ilker Demirkol, and Wendi Heinzelman. BaNa: A Noise Resilient Fundamental Frequency Detection Algorithm for Speech and Music. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):1833–1848, December 2014.Michael Noll. Cepstrum Pitch Determination. The Journal of the Acoustical Society of America, 41(2):293–309, 1967.Jong Wook Kim, Justin Salamon, Peter Li, and Juan Pablo Bello. CREPE: A Convolutional Representation for Pitch Estimation. arXiv:1802.06182 [cs, eess, stat], February 2018. arXiv: 1802.06182.Masanori Morise, Fumiya Yokomori, and Kenji Ozawa. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications. IEICE Transactions on Information and Systems, E99.D(7):1877–1884, 2016.Kun Han and DeLiang Wang. Neural Network Based Pitch Tracking in Very Noisy Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):2158–2168, Decem- ber 2014.Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, Jan Trmal, and Sanjeev Khudanpur. A pitch extraction algorithm tuned for automatic speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pages 2494–2498. IEEE, 2014.Lee Ngee Tan and Abeer Alwan. Multi-band summary correlogram-based pitch detection for noisy speech. Speech Communication, 55(7-8):841–856, September 2013.Jesper Kjær Nielsen, Tobias Lindstrøm Jensen, Jesper Rindom Jensen, Mads Græsbøll Christensen, and Søren Holdt Jensen. Fast fundamental frequency estimation: Making a statistically efficient estimator computationally efficient. Signal Processing, 135:188–197, June 2017.Sira Gonzalez and Mike Brookes. PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2):518—530, February 2014.Paul Boersma. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. In Proceedings of the institute of phonetic sciences, volume 17, page 97—110. Amsterdam, 1993.David Talkin. A robust algorithm for pitch tracking (RAPT). Speech coding and synthesis, 495:518, 1995.Byung Suk Lee and Daniel PW Ellis. Noise robust pitch tracking by subband autocorrelation classification. In Interspeech, pages 707–710, 2012.Wei Chu and Abeer Alwan. SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech. In INTERSPEECH, pages 2590–2593, 2010.Xuejing Sun. Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio. In Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, volume 1, page I—333. IEEE, 2002.Markel. The SIFT algorithm for fundamental frequency estimation. IEEE Transactions on Audio and Electroacoustics, 20(5):367—377, December 1972.Thomas Drugman and Abeer Alwan. Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics. In Interspeech, page 1973—1976, 2011.Hideki Kawahara, Masanori Morise, Toru Takahashi, Ryuichi Nisimura, Toshio Irino, and Hideki Banno. TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In Acous- tics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 3933–3936. IEEE, 2008.Arturo Camacho. SWIPE: A sawtooth waveform inspired pitch estimator for speech and music. PhD thesis, University of Florida, 2007.Kavita Kasi and Stephen A. Zahorian. Yet Another Algorithm for Pitch Tracking. In IEEE International Conference on Acoustics Speech and Signal Processing, pages I–361–I–364, Orlando, FL, USA, May 2002. IEEE.Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917, 2002.
h
rag
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
VIGNESH M, rag [Dataset]. https://huggingface.co/datasets/vicky3241/rag
Explore at:
Authors
VIGNESH M
Description
import pandas as pd

Example dataset with new columns

data = [ { "title": "Pandas Library", "about": "Pandas is a Python library for data manipulation and analysis.", "procedure": "Install Pandas via pip, load data into DataFrames, clean and analyze data using built-in functions.", "content": """ Pandas provides data structures like Series and DataFrame for handling structured data. It supports indexing, slicing, aggregation, joining, and filtering… See the full description on the dataset page: https://huggingface.co/datasets/vicky3241/rag.
w
CLEAN PANDA (Name) - Reverse Whois Lookup
whoisdatacenter.com
csv
Updated Apr 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AllHeart Web Inc (2023). CLEAN PANDA (Name) - Reverse Whois Lookup [Dataset]. https://whoisdatacenter.com/name/CLEAN-PANDA/
Explore at:
csvAvailable download formats
Dataset updated
Apr 28, 2023
Dataset authored and provided by
AllHeart Web Inc
License
https://whoisdatacenter.com/terms-of-use/https://whoisdatacenter.com/terms-of-use/
Time period covered
Mar 15, 1985 - Oct 2, 2025
Description
Investigate historical ownership changes and registration details by initiating a reverse Whois lookup for the name CLEAN PANDA.
C
Clean Keyboard Dust Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jun 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Archive Market Research (2025). Clean Keyboard Dust Report [Dataset]. https://www.archivemarketresearch.com/reports/clean-keyboard-dust-528903
Explore at:
ppt, doc, pdfAvailable download formats
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global clean keyboard dust market is experiencing robust growth, driven by increasing digitalization and the consequent rise in computer usage across both professional and personal settings. The market, estimated at $500 million in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 8% from 2025 to 2033. This growth is fueled by several factors, including the growing awareness of hygiene and the need for maintaining clean and functional keyboards, especially in healthcare and other sensitive environments. Furthermore, the increasing prevalence of allergies and respiratory issues linked to dust accumulation on keyboards is boosting demand for effective cleaning solutions. The market's segmentation likely includes various product types like compressed air dusters, cleaning wipes, gels, and specialized keyboard cleaning kits. The competitive landscape is relatively fragmented, with a mix of established players like Lenovo and emerging brands vying for market share. Innovation in cleaning technologies and the development of eco-friendly solutions will shape the future trajectory of this market. The market's restraints include the relatively low cost of simple cleaning methods such as using a brush or compressed air, leading some consumers to avoid specialized products. However, increasing awareness of the long-term benefits of proper keyboard hygiene, along with the development of more efficient and convenient cleaning solutions, are likely to mitigate these restraints. The regional distribution of the market is likely to reflect global computer usage patterns, with North America, Europe, and Asia-Pacific representing significant market segments. Continued technological advancements, especially in areas like ergonomic design and self-cleaning keyboards, could lead to significant market disruption and expansion in the coming years. The market's evolution will depend on factors such as consumer preferences, technological innovation, and regulatory changes related to environmental sustainability.
Customer Sale Dataset for Data Visualization
kaggle.com
Updated Jun 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Atul (2025). Customer Sale Dataset for Data Visualization [Dataset]. https://www.kaggle.com/datasets/atulkgoyl/customer-sale-dataset-for-visualization
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 6, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Atul
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This synthetic dataset is designed specifically for practicing data visualization and exploratory data analysis (EDA) using popular Python libraries like Seaborn, Matplotlib, and Pandas.

Unlike most public datasets, this one includes a diverse mix of column types:

📅 Date columns (for time series and trend plots) 🔢 Numerical columns (for histograms, boxplots, scatter plots) 🏷️ Categorical columns (for bar charts, group analysis)

Whether you are a beginner learning how to visualize data or an intermediate user testing new charting techniques, this dataset offers a versatile playground.

Feel free to:

Create EDA notebooks Practice plotting techniques Experiment with filtering, grouping, and aggregations 🛠️ No missing values, no data cleaning needed — just download and start exploring!

Hope you find this helpful. Looking forward to hearing from you all.
f
Table_5_Evaluating a potential model to analyze the function of the gut...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
+1more
Updated Dec 20, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Huang, He; Jiang, Mingfeng; Zhong, Jincheng; Zheng, Lijun; Zeng, Benhua; Zhang, Liang; Fan, Xueyang; Hou, Rong; Zhang, Zhihe; Wang, Hairui; Xie, Junjin; Xia, Shan; Schmitz-Esser, Stephan; Zhang, Wenping (2022). Table_5_Evaluating a potential model to analyze the function of the gut microbiota of the giant panda.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000284622
Explore at:
Dataset updated
Dec 20, 2022
Authors
Huang, He; Jiang, Mingfeng; Zhong, Jincheng; Zheng, Lijun; Zeng, Benhua; Zhang, Liang; Fan, Xueyang; Hou, Rong; Zhang, Zhihe; Wang, Hairui; Xie, Junjin; Xia, Shan; Schmitz-Esser, Stephan; Zhang, Wenping
Description
To contribute to the conservation of endangered animals, the utilization of model systems is critical to analyze the function of their gut microbiota. In this study, the results of a fecal microbial transplantation (FMT) experiment with germ-free (GF) mice receiving giant panda or horse fecal microbiota showed a clear clustering by donor microbial communities in GF mice, which was consistent with the results of blood metabolites from these mice. At the genus level, FMT re-established approximately 9% of the giant panda donor microbiota in GF mice compared to about 32% for the horse donor microbiota. In line with this, the difference between the panda donor microbiota and panda-mice microbiota on whole-community level was significantly larger than that between the horse donor microbiota and the horse-mice microbiota. These results were consistent with source tracking analysis that found a significantly higher retention rate of the horse donor microbiota (30.9%) than the giant panda donor microbiota (4.0%) in GF mice where the microbiota remained stable after FMT. Further analyzes indicated that the possible reason for the low retention rate of the panda donor microbiota in GF mice was a low relative abundance of Clostridiaceae in the panda donor microbiota. Our results indicate that the donor microbiota has a large effect on GF mice microbiota after FMT.
Google Ads sales dataset
kaggle.com
Updated Jul 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NayakGanesh007 (2025). Google Ads sales dataset [Dataset]. https://www.kaggle.com/datasets/nayakganesh007/google-ads-sales-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 22, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
NayakGanesh007
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Google Ads Sales Dataset for Data Analytics Campaigns (Raw & Uncleaned) 📝 Dataset Overview This dataset contains raw, uncleaned advertising data from a simulated Google Ads campaign promoting data analytics courses and services. It closely mimics what real digital marketers and analysts would encounter when working with exported campaign data — including typos, formatting issues, missing values, and inconsistencies.

It is ideal for practicing:

Data cleaning

Exploratory Data Analysis (EDA)

Marketing analytics

Campaign performance insights

Dashboard creation using tools like Excel, Python, or Power BI

📁 Columns in the Dataset Column Name ----- -Description Ad_ID --------Unique ID of the ad campaign Campaign_Name ------Name of the campaign (with typos and variations) Clicks --Number of clicks received Impressions --Number of ad impressions Cost --Total cost of the ad (in ₹ or $ format with missing values) Leads ---Number of leads generated Conversions ----Number of actual conversions (signups, sales, etc.) Conversion Rate ---Calculated conversion rate (Conversions ÷ Clicks) Sale_Amount ---Revenue generated from the conversions Ad_Date------ Date of the ad activity (in inconsistent formats like YYYY/MM/DD, DD-MM-YY) Location ------------City where the ad was served (includes spelling/case variations) Device------------ Device type (Mobile, Desktop, Tablet with mixed casing) Keyword ----------Keyword that triggered the ad (with typos)

⚠️ Data Quality Issues (Intentional) This dataset was intentionally left raw and uncleaned to reflect real-world messiness, such as:

Inconsistent date formats

Spelling errors (e.g., "analitics", "anaytics")

Duplicate rows

Mixed units and symbols in cost/revenue columns

Missing values

Irregular casing in categorical fields (e.g., "mobile", "Mobile", "MOBILE")

🎯 Use Cases Data cleaning exercises in Python (Pandas), R, Excel

Data preprocessing for machine learning

Campaign performance analysis

Conversion optimization tracking

Building dashboards in Power BI, Tableau, or Looker

💡 Sample Analysis Ideas Track campaign cost vs. return (ROI)

Analyze click-through rates (CTR) by device or location

Clean and standardize campaign names and keywords

Investigate keyword performance vs. conversions

🔖 Tags Digital Marketing · Google Ads · Marketing Analytics · Data Cleaning · Pandas Practice · Business Analytics · CRM Data
CpG Signature Profiling and Heatmap Visualization of SARS-CoV Genomes:...
figshare.com
txt
Updated Apr 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tahir Bhatti (2025). CpG Signature Profiling and Heatmap Visualization of SARS-CoV Genomes: Tracing the Genomic Divergence From SARS-CoV (2003) to SARS-CoV-2 (2019) [Dataset]. http://doi.org/10.6084/m9.figshare.28736501.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28736501.v1
Dataset updated
Apr 5, 2025
Dataset provided by
figshare
Authors
Tahir Bhatti
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ObjectiveThe primary objective of this study was to analyze CpG dinucleotide dynamics in coronaviruses by comparing Wuhan-Hu-1 with its closest and most distant relatives. Heatmaps were generated to visualize CpG counts and O/E ratios across intergenic regions, providing a clear depiction of conserved and divergent CpG patterns.Methods1. Data CollectionSource : The dataset includes CpG counts and O/E ratios for various coronaviruses, extracted from publicly available genomic sequences.Format : Data was compiled into a CSV file containing columns for intergenic regions, CpG counts, and O/E ratios for each virus.2. PreprocessingData Cleaning :Missing values (NaN), infinite values (inf, -inf), and blank entries were handled using Python's pandas library.Missing values were replaced with column means, and infinite values were capped at a large finite value (1e9).Reshaping :The data was reshaped into matrices for CpG counts and O/E ratios using meltpandas[] and pivot[] functions.3. Distance CalculationEuclidean Distance :Pairwise Euclidean distances were calculated between Wuhan-Hu-1 and other viruses using the scipy.spatial.distance.euclidean function.Distances were computed separately for CpG counts and O/E ratios, and the total distance was derived as the sum of both metrics.4. Identification of Closest and Distant RelativesThe virus with the smallest total distance was identified as the closest relative .The virus with the largest total distance was identified as the most distant relative .5. Heatmap GenerationTools :Heatmaps were generated using Python's seaborn library (sns.heatmap) and matplotlib for visualization.Parameters :Heatmaps were annotated with numerical values for clarity.A color gradient (coolwarm) was used to represent varying CpG counts and O/E ratios.Titles and axis labels were added to describe the comparison between Wuhan-Hu-1 and its relatives.ResultsClosest Relative :The closest relative to Wuhan-Hu-1 was identified based on the smallest Euclidean distance.Heatmaps for CpG counts and O/E ratios show high similarity in specific intergenic regions.Most Distant Relative :The most distant relative was identified based on the largest Euclidean distance.Heatmaps reveal significant differences in CpG dynamics compared to Wuhan-Hu-1 .Tools and LibrariesThe following tools and libraries were used in this analysis:Programming Language :Python 3.13Libraries :pandas: For data manipulation and cleaning.numpy: For numerical operations and handling missing/infinite values.scipy.spatial.distance: For calculating Euclidean distances.seaborn: For generating heatmaps.matplotlib: For additional visualization enhancements.File Formats :Input: CSV files containing CpG counts and O/E ratios.Output: PNG images of heatmaps.Files IncludedCSV File :Contains the raw data of CpG counts and O/E ratios for all viruses.Heatmap Images :Heatmaps for CpG counts and O/E ratios comparing Wuhan-Hu-1 with its closest and most distant relatives.Python Script :Full Python code used for data processing, distance calculation, and heatmap generation.Usage NotesResearchers can use this dataset to further explore the evolutionary dynamics of CpG dinucleotides in coronaviruses.The Python script can be adapted to analyze other viral genomes or datasets.Heatmaps provide a visual summary of CpG dynamics, aiding in hypothesis generation and experimental design.AcknowledgmentsSpecial thanks to the open-source community for developing tools like pandas, numpy, seaborn, and matplotlib.This work was conducted as part of an independent research project in molecular biology and bioinformatics.LicenseThis dataset is shared under the CC BY 4.0 License , allowing others to share and adapt the material as long as proper attribution is given.DOI: 10.6084/m9.figshare.28736501
Pre-Processed Power Grid Frequency Time Series
zenodo.org
bin, zip
Updated Jul 15, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Johannes Kruse; Johannes Kruse; Benjamin Schäfer; Benjamin Schäfer; Dirk Witthaut; Dirk Witthaut (2021). Pre-Processed Power Grid Frequency Time Series [Dataset]. http://doi.org/10.5281/zenodo.3744121
Explore at:
zip, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.3744121
Dataset updated
Jul 15, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Johannes Kruse; Johannes Kruse; Benjamin Schäfer; Benjamin Schäfer; Dirk Witthaut; Dirk Witthaut
Description
Overview
This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in python. The data covers three synchronous areas of the European power grid:

Continental Europe

Great Britain

Nordic

This work is part of the paper "Predictability of Power Grid Frequency"[1]. Please cite this paper, when using the data and the code. For a detailed documentation of the pre-processing procedure we refer to the supplementary material of the paper.

Data sources
We downloaded the frequency recordings from publically available repositories of three different Transmission System Operators (TSOs).

Continental Europe [2]: We downloaded the data from the German TSO TransnetBW GmbH, which retains the Copyright on the data, but allows to re-publish it upon request [3].

Great Britain [4]: The download was supported by National Grid ESO Open Data, which belongs to the British TSO National Grid. They publish the frequency recordings under the NGESO Open License [5].

Nordic [6]: We obtained the data from the Finish TSO Fingrid, which provides the data under the open license CC-BY 4.0 [7].

Content of the repository

A) Scripts

In the "Download_scripts" folder you will find three scripts to automatically download frequency data from the TSO's websites.

In "convert_data_format.py" we save the data with corrected timestamp formats. Missing data is marked as NaN (processing step (1) in the supplementary material of [1]).

In "clean_corrupted_data.py" we load the converted data and identify corrupted recordings. We mark them as NaN and clean some of the resulting data holes (processing step (2) in the supplementary material of [1]).

The python scripts run with Python 3.7 and with the packages found in "requirements.txt".

B) Data_converted and Data_cleansed
The folder "Data_converted" contains the output of "convert_data_format.py" and "Data_cleansed" contains the output of "clean_corrupted_data.py".

File type: The files are zipped csv-files, where each file comprises one year.

Data format: The files contain two columns. The first one represents the time stamps in the format Year-Month-Day Hour-Minute-Second, which is given as naive local time. The second column contains the frequency values in Hz.

NaN representation: We mark corrupted and missing data as "NaN" in the csv-files.

Use cases
We point out that this repository can be used in two different was:

Use pre-processed data: You can directly use the converted or the cleansed data. Note however that both data sets include segments of NaN-values due to missing and corrupted recordings. Only a very small part of the NaN-values were eliminated in the cleansed data to not manipulate the data too much. If your application cannot deal with NaNs, you could build upon the following commands to select the longest interval of valid data from the cleansed data:

from helper_functions import * import pandas as pd cleansed_data = pd.read_csv('/Path_to_cleansed_data/data.zip', index_col=0, header=None, squeeze=True, parse_dates=[0]) valid_bounds, valid_sizes = true_intervals(~cleansed_data.isnull()) start,end= valid_bounds[ np.argmax(valid_sizes) ] data_without_nan = cleansed_data.iloc[start:end]

Produce your own cleansed data: Depending on your application, you might want to cleanse the data in a custom way. You can easily add your custom cleansing procedure in "clean_corrupted_data.py" and then produce cleansed data from the raw data in "Data_converted".

License
We release the code in the folder "Scripts" under the MIT license [8]. In the case of Nationalgrid and Fingrid, we further release the pre-processed data in the folder "Data_converted" and "Data_cleansed" under the CC-BY 4.0 license [7]. TransnetBW originally did not publish their data under an open license. We have explicitly received the permission to publish the pre-processed version from TransnetBW. However, we cannot publish our pre-processed version under an open license due to the missing license of the original TransnetBW data.
d
Plant diversity in giant panda habitat
datadryad.org
search.dataone.org
zip
Updated Sep 18, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Li Ting (2021). Plant diversity in giant panda habitat [Dataset]. http://doi.org/10.5061/dryad.rjdfn2z6b
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.rjdfn2z6b
Dataset updated
Sep 18, 2021
Dataset provided by
Dryad
Authors
Li Ting
Time period covered
Jun 16, 2020
Description
In 2017, 107 random sampling plots in montane forests were collected from north to south, spanning the entire Sichuan Giant Panda habi- tat. The sampling strategy and field site information are shown in Li et al. (2019). The elevation within the sampling plots varied significantly (from ca. 2,000 to 3,600 m a.s.l.) (Li et al., 2019). The main vegetation types in those plots were coniferous and broad-leaved mixed forests, and evergreen and deciduous broad-leaved mixed forests. Using ques- tionnaires, we surveyed 72 local people from Minshan, Xiaoxiangling, and Qionglai in the Sichuan Giant Panda habitat in 2017. Those local villagers mainly participated in the local Giant Panda habitat conser- vation. The survey information included if there was any interfer- ence in the sampling plots. In addition, we observed the plant species composition and environment in the montane forests to choos...
m
Nanjing Panda Electronics Co Ltd - Ebitda
macro-rankings.com
csv, excel
Updated Sep 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
macro-rankings (2025). Nanjing Panda Electronics Co Ltd - Ebitda [Dataset]. https://www.macro-rankings.com/markets/stocks/600775-shg/income-statement/ebitda
Explore at:
excel, csvAvailable download formats
Dataset updated
Sep 1, 2025
Dataset authored and provided by
macro-rankings
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
china
Description
Ebitda Time Series for Nanjing Panda Electronics Co Ltd. Nanjing Panda Electronics Company Limited, together with its subsidiaries, engages in the smart transportation and safe city, industrial internet and intelligent manufacturing, and green and service-oriented electronic manufacturing businesses in China. The company offers automatic transmission equipment and industrial robots; railway transit auto fare collection (AFC) and AFC clearing center system equipment, building intellectualization products, and system integration; electronic manufacturing services (EMS); mobile, digital, and network communication systems and products; digital products; and property management, warehousing, and catering services. It also provides automatic instruments and battery changing systems, recyclable bag making machine and automatic filling systems, welding machines, CF automatic handling systems, plant system platforms, laminating machines, and TFT-LCD clean workshop equipment and systems; and smart cards, rail transit communication systems, video surveillance integrated platforms, rail transit communication clock systems, logistics services, and LTE/WLAN multi-service bearer schemes, as well as radio dispatchers, dispatching server software, network management systems, and voice short message and multimedia vehicular stations. In addition, the company offers digital processing and acquisition modules, microwave frequency sources, clock synchronization devices, mobile data access gateways, wireless mesh, and dual-mode base stations. Further, it provides research systems, machine and substrate assemblies, automatic mounting, conventional injections, spray painting, molds and dies, information management, quality assurance systems, and special polymer materials; and single mode and push receivers, digital TV set top boxes, digital TV monitor and vehicular equipment, satellite flat antennas, and descramblers. Nanjing Panda Electronics Company Limited was founded in 1936 and is headquartered in Nanjing, the People's Republic of China.
Clean Meta Kaggle
kaggle.com
Updated Sep 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yoni Kremer (2023). Clean Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/yonikremer/clean-meta-kaggle
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 8, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Yoni Kremer
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Cleaned Meta-Kaggle Dataset

The Original Dataset - Meta-Kaggle

Explore our public data on competitions, datasets, kernels (code / notebooks) and more Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.

https://i.imgur.com/2Egeb8R.png" alt="" title="a title">

This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.

August 2023 update

In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here

We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.

The Problems with the Original Dataset

The original dataset is 32 CSV files, with 268 colums and 7GB of compressed data. Having so many tables and columns makes it hard to understand the data.

The data is not normalized, so when you join tables you get a lot of errors.

Some values refer to non-existing values in other tables. For example, the UserId column in the ForumMessages table has values that do not exist in the Users table.

There are missing values.

There are duplicate values.

There are values that are not valid. For example, Ids that are not positive integers.

The date and time columns are not in the right format.

Some columns only have the same value for all rows, so they are not useful.

The boolean columns have string values True or False.

Incorrect values for the Total columns. For example, the DatasetCount is not the total number of datasets with the Tag according to the DatasetTags table.

Users upvote their own messages.

The Solution

To handle so many tables and columns I use a relational database. I use MySQL, but you can use any relational database.

The steps to create the database are:

Creating the database tables with the right data types and constraints. I do that by running the db_abd_create_tables.sql script.

Downloading the CSV files from Kaggle using the Kaggle API.

Cleaning the data using pandas. I do that by running the clean_data.py script. The script does the following steps for each table:

Drops the columns that are not needed.

Converts each column to the right data type.

Replaces foreign keys that do not exist with NULL.

Replaces some of the missing values with default values.

Removes rows where there are missing values in the primary key/not null columns.

Removes duplicate rows.

Loading the data into the database using the LOAD DATA INFILE command.

Checks that the number of rows in the database tables is the same as the number of rows in the CSV files.

Adds foreign key constraints to the database tables. I do that by running the add_foreign_keys.sql script.

Update the Total columns in the database tables. I do that by running the update_totals.sql script.

Backup the database.
Netflix Data Analysis
kaggle.com
Updated Oct 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ankul Sharma (2024). Netflix Data Analysis [Dataset]. https://www.kaggle.com/datasets/ankulsharma150/netflix-data-analysis
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 15, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ankul Sharma
Description
Introduction

This datasets about Netflix Movies & TV Shows. Datasets have 12 columns with some null values. To analysis of dataset are used Pandas, plotly.express and Datetime libraries. Analysis process I divided into several parts for step wise analysis and to find out trending questions on social media for Bollywood actors and actress.

Data Manipulation

Missing Data

There are many representations of missing data. They are Null values, missing values. I used some of methods used in data analysis process to clean missing values.

Data Munging

String Method

There I used some string method on column such as 'cast', 'Lested_in' to extract data

Datetime data type

Converting an object type into datatype objects with the to_datetime function then we have a datatime object, can extract various part of data such as year, month and day

EDA

Here, I find out several eye catching question. the following questions are like as- - Show the all Movies & TV Shows released by month - Count the all types of unique rating & which rating are with most number - Salman, Shah Rukh and Akshay Kumar all movie - Find out the Movies & Series have Maximum time length - Year on Year show added on Netflix by its type - Akshay Kumar all comedies movies, Shah Rukh movies with Kajol and Salman-Akshay Movies - Who Director has made the most TV Shows - Actors and Actress who have given most Number of Movies - Find out which types of genre has most movies and TV Shows
h
long_dense_structured_table
huggingface.co
Updated May 9, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nanonets (2025). long_dense_structured_table [Dataset]. https://huggingface.co/datasets/nanonets/long_dense_structured_table
Explore at:
Dataset updated
May 9, 2025
Dataset authored and provided by
Nanonets
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset is generated syhthetically to create tables with following characteristics:

Empty cell percentage in following range 0,30 There is clear seperator between rows and columns (Structured). 15 <= num rows <= 30, 7 <= num columns <= 15 (Long)

Load the dataset

import io import pandas as pd from PIL import Image

def bytes_to_image(self, image_bytes: bytes): return Image.open(io.BytesIO(image_bytes))

def parse_annotations(self, annotations: str) -> pd.DataFrame:… See the full description on the dataset page: https://huggingface.co/datasets/nanonets/long_dense_structured_table.
Party strength in each US state
kaggle.com
Updated Jan 13, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GeneBurin (2017). Party strength in each US state [Dataset]. https://www.kaggle.com/datasets/kiwiphrases/partystrengthbystate
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 13, 2017
Dataset provided by
Kaggle
Authors
GeneBurin
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Data on party strength in each US state

The repository contains data on party strength for each state as shown on each state's corresponding party strength Wikipedia page (for example, here is Virginia )

Each state has a table of a detailed summary of the state of its governing and representing bodies on Wikipedia but there is no data set that collates these entries. I scraped each state's Wikipedia table and collated the entries into a single dataset. The data are stored in the state_party_strength.csv and state_party_strength_cleaned.csv. The code that generated the file can be found in corresponding Python notebooks.

Data contents:

The data contain information from 1980 on each state's: 1. governor and party 2. state house and senate composition 3. state representative composition in congress 4. electoral votes

Clean Version

Data in the clean version has been cleaned and processed substantially. Namely: - all columns now contain homogenous data within the column - names and Wiki-citations have been removed - only the party counts and party identification have been left The notebook that created this file is here

Uncleaned Data Version

The data contained herein have not been altered from their Wikipedia tables except in two instances: - Forced column names to be in accord across states - Any needed data modifications (ie concatenated string columns) to retain information when combining columns

To use the data:

Please note that the right encoding for the dataset is "ISO-8859-1", not 'utf-8' though in future versions I will try to fix that to make it more accessible.

This means that you will likely have to perform further data wrangling prior to doing any substantive analysis. The notebook that has been used to create this data file is located here

Raw scraped data

The raw scraped data can be found in the pickle. This file contains a Python dictionary where each key is a US state name and each element is the raw scraped table in Pandas DataFrame format.

Hope it proves as useful to you in analyzing/using political patterns at the state level in the US for political and policy research.
Flipkart OnlineOrders
kaggle.com
zip
Updated Jun 22, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sabya (2020). Flipkart OnlineOrders [Dataset]. https://www.kaggle.com/sabya40/filpkart-onlineorders
Explore at:
zip(212323 bytes)Available download formats
Dataset updated
Jun 22, 2020
Authors
Sabya
Description
Context

This dataset contains 6 months of Customer online orders. The data is simple but messy and unorganized. This for beginner and Intermediate level who want to improve there skills in Pandas, matplotlib, and seaborn.

Content

Dataset context columns like: crawl_timestamp, product_name, product_category_tree, retail_price, discounted_price, brand.

The main focus is to clean the dataset and make it organized using pandas.

Acknowledgements

I wouldn't be here without the help of data.world. Thank You.

Inspiration

I have some questions for this Dataset: 1. What was the best month for sales? How much was earned that month? 2. What time should we display advertisements to maximize the likelihood of purchases? 3. Which category sold most in that six month period? 4. Top 10 products sold most in that six month period?
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2023). A Replication Dataset for Fundamental Frequency Estimation [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7808

A Replication Dataset for Fundamental Frequency Estimation

Explore at:

jsonAvailable download formats

Dataset updated

Oct 19, 2023

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Part of the dissertation Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods.© 2020, Bastian Bechtold. All rights reserved. Estimating the fundamental frequency of speech remains an active area of research, with varied applications in speech recognition, speaker identification, and speech compression. A vast number of algorithms for estimatimating this quantity have been proposed over the years, and a number of speech and noise corpora have been developed for evaluating their performance. The present dataset contains estimated fundamental frequency tracks of 25 algorithms, six speech corpora, two noise corpora, at nine signal-to-noise ratios between -20 and 20 dB SNR, as well as an additional evaluation of synthetic harmonic tone complexes in white noise.The dataset also contains pre-calculated performance measures both novel and traditional, in reference to each speech corpus’ ground truth, the algorithms’ own clean-speech estimate, and our own consensus truth. It can thus serve as the basis for a comparison study, or to replicate existing studies from a larger dataset, or as a reference for developing new fundamental frequency estimation algorithms. All source code and data is available to download, and entirely reproducible, albeit requiring about one year of processor-time.Included Code and Data

ground truth data.zip is a JBOF dataset of fundamental frequency estimates and ground truths of all speech files in the following corpora:

CMU-ARCTIC (consensus truth) [1]FDA (corpus truth and consensus truth) [2]KEELE (corpus truth and consensus truth) [3]MOCHA-TIMIT (consensus truth) [4]PTDB-TUG (corpus truth and consensus truth) [5]TIMIT (consensus truth) [6]

noisy speech data.zip is a JBOF datasets of fundamental frequency estimates of speech files mixed with noise from the following corpora:NOISEX [7]QUT-NOISE [8]

synthetic speech data.zip is a JBOF dataset of fundamental frequency estimates of synthetic harmonic tone complexes in white noise.noisy_speech.pkl and synthetic_speech.pkl are pickled Pandas dataframes of performance metrics derived from the above data for the following list of fundamental frequency estimation algorithms:AUTOC [9]AMDF [10]BANA [11]CEP [12]CREPE [13]DIO [14]DNN [15]KALDI [16]MAPSMBSC [17]NLS [18]PEFAC [19]PRAAT [20]RAPT [21]SACC [22]SAFE [23]SHR [24]SIFT [25]SRH [26]STRAIGHT [27]SWIPE [28]YAAPT [29]YIN [30]

noisy speech evaluation.py and synthetic speech evaluation.py are Python programs to calculate the above Pandas dataframes from the above JBOF datasets. They calculate the following performance measures:Gross Pitch Error (GPE), the percentage of pitches where the estimated pitch deviates from the true pitch by more than 20%.Fine Pitch Error (FPE), the mean error of grossly correct estimates.High/Low Octave Pitch Error (OPE), the percentage pitches that are GPEs and happens to be at an integer multiple of the true pitch.Gross Remaining Error (GRE), the percentage of pitches that are GPEs but not OPEs.Fine Remaining Bias (FRB), the median error of GREs.True Positive Rate (TPR), the percentage of true positive voicing estimates.False Positive Rate (FPR), the percentage of false positive voicing estimates.False Negative Rate (FNR), the percentage of false negative voicing estimates.F₁, the harmonic mean of precision and recall of the voicing decision.

Pipfile is a pipenv-compatible pipfile for installing all prerequisites necessary for running the above Python programs.

The Python programs take about an hour to compute on a fast 2019 computer, and require at least 32 Gb of memory.References:

John Kominek and Alan W Black. CMU ARCTIC database for speech synthesis, 2003.Paul C Bagshaw, Steven Hiller, and Mervyn A Jack. Enhanced Pitch Tracking and the Processing of F0 Contours for Computer Aided Intonation Teaching. In EUROSPEECH, 1993.F Plante, Georg F Meyer, and William A Ainsworth. A Pitch Extraction Reference Database. In Fourth European Conference on Speech Communication and Technology, pages 837–840, Madrid, Spain, 1995.Alan Wrench. MOCHA MultiCHannel Articulatory database: English, November 1999.Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf. A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. page 4, 2011.John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, and Victor Zue. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 1993.Andrew Varga and Herman J.M. Steeneken. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recog- nition systems. Speech Communication, 12(3):247–251, July 1993.David B. Dean, Sridha Sridharan, Robert J. Vogt, and Michael W. Mason. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. Proceedings of Interspeech 2010, 2010.Man Mohan Sondhi. New methods of pitch extraction. Audio and Electroacoustics, IEEE Transactions on, 16(2):262—266, 1968.Myron J. Ross, Harry L. Shaffer, Asaf Cohen, Richard Freudberg, and Harold J. Manley. Average magnitude difference function pitch extractor. Acoustics, Speech and Signal Processing, IEEE Transactions on, 22(5):353—362, 1974.Na Yang, He Ba, Weiyang Cai, Ilker Demirkol, and Wendi Heinzelman. BaNa: A Noise Resilient Fundamental Frequency Detection Algorithm for Speech and Music. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):1833–1848, December 2014.Michael Noll. Cepstrum Pitch Determination. The Journal of the Acoustical Society of America, 41(2):293–309, 1967.Jong Wook Kim, Justin Salamon, Peter Li, and Juan Pablo Bello. CREPE: A Convolutional Representation for Pitch Estimation. arXiv:1802.06182 [cs, eess, stat], February 2018. arXiv: 1802.06182.Masanori Morise, Fumiya Yokomori, and Kenji Ozawa. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications. IEICE Transactions on Information and Systems, E99.D(7):1877–1884, 2016.Kun Han and DeLiang Wang. Neural Network Based Pitch Tracking in Very Noisy Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):2158–2168, Decem- ber 2014.Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, Jan Trmal, and Sanjeev Khudanpur. A pitch extraction algorithm tuned for automatic speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pages 2494–2498. IEEE, 2014.Lee Ngee Tan and Abeer Alwan. Multi-band summary correlogram-based pitch detection for noisy speech. Speech Communication, 55(7-8):841–856, September 2013.Jesper Kjær Nielsen, Tobias Lindstrøm Jensen, Jesper Rindom Jensen, Mads Græsbøll Christensen, and Søren Holdt Jensen. Fast fundamental frequency estimation: Making a statistically efficient estimator computationally efficient. Signal Processing, 135:188–197, June 2017.Sira Gonzalez and Mike Brookes. PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2):518—530, February 2014.Paul Boersma. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. In Proceedings of the institute of phonetic sciences, volume 17, page 97—110. Amsterdam, 1993.David Talkin. A robust algorithm for pitch tracking (RAPT). Speech coding and synthesis, 495:518, 1995.Byung Suk Lee and Daniel PW Ellis. Noise robust pitch tracking by subband autocorrelation classification. In Interspeech, pages 707–710, 2012.Wei Chu and Abeer Alwan. SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech. In INTERSPEECH, pages 2590–2593, 2010.Xuejing Sun. Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio. In Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, volume 1, page I—333. IEEE, 2002.Markel. The SIFT algorithm for fundamental frequency estimation. IEEE Transactions on Audio and Electroacoustics, 20(5):367—377, December 1972.Thomas Drugman and Abeer Alwan. Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics. In Interspeech, page 1973—1976, 2011.Hideki Kawahara, Masanori Morise, Toru Takahashi, Ryuichi Nisimura, Toshio Irino, and Hideki Banno. TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In Acous- tics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 3933–3936. IEEE, 2008.Arturo Camacho. SWIPE: A sawtooth waveform inspired pitch estimator for speech and music. PhD thesis, University of Florida, 2007.Kavita Kasi and Stephen A. Zahorian. Yet Another Algorithm for Pitch Tracking. In IEEE International Conference on Acoustics Speech and Signal Processing, pages I–361–I–364, Orlando, FL, USA, May 2002. IEEE.Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917, 2002.

Clear search

Close search

Google apps

Main menu

A Replication Dataset for Fundamental Frequency Estimation

rag

CLEAN PANDA (Name) - Reverse Whois Lookup

Clean Keyboard Dust Report

Customer Sale Dataset for Data Visualization

Table_5_Evaluating a potential model to analyze the function of the gut...

Google Ads sales dataset

CpG Signature Profiling and Heatmap Visualization of SARS-CoV Genomes:...

Pre-Processed Power Grid Frequency Time Series

Plant diversity in giant panda habitat

Nanjing Panda Electronics Co Ltd - Ebitda

Clean Meta Kaggle

Cleaned Meta-Kaggle Dataset

The Original Dataset - Meta-Kaggle

August 2023 update

The Problems with the Original Dataset

The Solution

Netflix Data Analysis

Introduction

Data Manipulation

Missing Data

Data Munging

String Method

Datetime data type

EDA

long_dense_structured_table

Party strength in each US state

Data on party strength in each US state

Data contents:

Clean Version

Uncleaned Data Version

To use the data:

Raw scraped data

Flipkart OnlineOrders

Context

Content

Acknowledgements

Inspiration

A Replication Dataset for Fundamental Frequency Estimation