This dataset was created by Hemanth S
This dataset was created by Mohammed Ashraf Shaaban Shahata
Released under Other (specified in description)
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by Kiro Youssef
Released under Apache 2.0
This dataset was created by Engr Yasir Hussain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mark A Lavin
Released under CC0: Public Domain
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by amjad ali2018
Released under Apache 2.0
This dataset was created by Nguyễn Thanh
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset was created by Archisman Karmakar
Released under MIT
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Simple normalization of the data provided by the CSSE daily reports on GitHub. Preparations made:
- Normalizing the timestamp (the source uses four different formats)
- Pruning the column labels (e.g. Region/Country => Region_Country)
- Adding a country code column
Photo by CDC on Unsplash
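The three preparation steps above can be sketched in plain Python. The specific timestamp format strings, column names, and the ISO-code lookup table below are illustrative assumptions, not the dataset author's actual code:

```python
from datetime import datetime

# Candidate timestamp formats (the description says the raw reports mix
# four formats; these particular strings are assumptions for illustration).
TIMESTAMP_FORMATS = [
    "%m/%d/%Y %H:%M",
    "%m/%d/%y %H:%M",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
]

def normalize_timestamp(raw: str) -> str:
    """Parse any of the known formats and re-emit a single ISO 8601 string."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(raw, fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {raw!r}")

def prune_label(label: str) -> str:
    """'Country/Region' -> 'Country_Region', etc."""
    return label.strip().replace("/", "_").replace(" ", "_")

# Hypothetical country-code lookup (illustrative subset only).
ISO_CODES = {"US": "USA", "Italy": "ITA"}

def add_country_code(row: dict) -> dict:
    """Attach an ISO country code based on the pruned Country_Region column."""
    row["Country_Code"] = ISO_CODES.get(row.get("Country_Region", ""), "")
    return row
```

For example, `normalize_timestamp("3/22/2020 23:45")` yields `"2020-03-22T23:45:00"`, and `prune_label("Country/Region")` yields `"Country_Region"`.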
Attribution-NonCommercial-ShareAlike 4.0, CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
This dataset and accompanying paper present a challenge to the community: given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function. That is, a date written "31 May 2014" is spoken as "the thirty first of may twenty fourteen." We present a dataset of general text where the normalizations were generated using an existing text normalization component of a text-to-speech (TTS) system. This dataset was originally released as open source and is reproduced on Kaggle for the community.
The data in this directory are the English language training, development and test data used in Sproat and Jaitly (2016).
The following divisions of data were used:
Training: output_1 through output_21 (corresponding to output-000[0-8]?-of-00100 in the original dataset)
Runtime eval: output_91 (corresponding to output-0009[0-4]-of-00100 in the original dataset)
Test data: output_96 (corresponding to output-0009[5-9]-of-00100 in the original dataset)
In practice, for the results reported in the paper, only the first 100,002 lines of output-00099-of-00100 were used (for English).
Lines with "
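A minimal sketch of turning one of the output_* files into (written, spoken) sentence pairs for training. The three-column tab-separated layout (semiotic class, written token, spoken form) and the "<self>"/"<eos>" conventions follow the released data, but treat the details here as assumptions rather than the authors' loader:

```python
def read_sentences(lines):
    """Group per-token lines into (written, spoken) sentence pairs.

    Each line is assumed to be 'class<TAB>before<TAB>after'; a line whose
    first field is '<eos>' marks a sentence boundary, and an 'after' value
    of '<self>' means the token is spoken exactly as written.
    """
    written, spoken, pairs = [], [], []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if parts[0] == "<eos>":  # sentence boundary marker
            pairs.append((" ".join(written), " ".join(spoken)))
            written, spoken = [], []
            continue
        cls, before, after = parts  # semiotic class, written form, spoken form
        written.append(before)
        spoken.append(before if after == "<self>" else after)
    return pairs
```

Run over the DATE example above, this would pair "31 May 2014" with "the thirty first of may twenty fourteen".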
This dataset was created by Nguyễn Thanh
This dataset was created by Jay Oza
This dataset was created by Akalya Subramanian
This dataset was created by Nguyễn Thanh
This dataset was created by mcasshy
This dataset was created by Vinay sharma
This dataset was created by VMHieu02
This dataset was created by SeaLeopard
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Skylark4
Released under CC0: Public Domain
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by Ngo Tri Si
Released under Apache 2.0
This dataset was created by Hemanth S