Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Japanese Telephone Dialogues Dataset - 513 Hours
The dataset comprises 513 hours of high-quality telephone audio recordings in Japanese, featuring 800+ native speakers and achieving a 95% sentence accuracy rate. Designed for advancing speech recognition models and language processing, this extensive speech data corpus covers diverse topics and domains, making it suitable for training robust automatic speech recognition (ASR) systems.
Dataset characteristics:… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/japanese-speech-recognition-dataset.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the English-Japanese Bilingual Parallel Corpora Dataset for the Environment domain, a comprehensive collection of professionally translated bilingual text data. This dataset has been carefully curated to support the development of environment-specific language models, machine translation engines, and domain-aware NLP applications.
This dataset contains 633 hours of spontaneous Japanese dialogues based on given topics. Transcriptions include text content, timestamps, speaker ID, gender, and other attributes. The data was collected from a large and geographically diverse pool of speakers (around 1,000 native speakers), which is intended to improve model performance on real-world tasks such as Automatic Speech Recognition (ASR), Text-to-Speech (TTS) systems, and NLP research. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR, CCPA, and PIPL compliant.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A corpus database managed by the MedNLP Laboratory, Kyoto University, Japan. This corpus was compiled using data from 22 people aged 74 to 86 years (mean age: 78.32 years; standard deviation [SD]: 3.36) who agreed to provide data for research purposes. The corpus also includes data from participants under 74 years of age (30 data records in total).
This dataset contains 234 hours of Japanese speech audio, collected from monologues based on given scripts and covering 210,000 formal or informal expressions. Transcriptions include text content and other attributes. The data was collected from a large and geographically diverse pool of speakers (799 Japanese speakers recorded in mixed conditions, such as indoors, at the roadside, and in restaurants), which is intended to improve model performance on real-world tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR, CCPA, and PIPL compliant.
This data package includes the underlying data and files to replicate the calculations, charts, and tables presented in Japanese Investment in the United States: Superior Performance, Increasing Integration, PIIE Policy Brief 15-3. If you use the data, please cite as: Oldenski, Lindsay, and Theodore H. Moran. (2015). Japanese Investment in the United States: Superior Performance, Increasing Integration. PIIE Policy Brief 15-3. Peterson Institute for International Economics.
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
The Japanese Kids Speech database (Upper Grade) contains recordings of 232 Japanese child speakers (104 males and 128 females) aged 9 to 13 years (fourth, fifth, and sixth graders in elementary school), recorded in quiet rooms using smartphones. This database may be combined with the Japanese Kids Speech database (Lower Grade), also available in the ELRA Catalogue under reference ELRA-S0411.
The number of speakers, number of utterances, age range, and duration are as follows:
Number of speakers: 232 (104 male, 128 female)
Number of utterances (average): 385 utterances per speaker
Total number of utterances: 89,454
Age: from 9 to 13 years old
Total hours of data: 145.4
A total of 1,018 sentences were used. Recordings were made with smartphones, and the audio data is stored in .wav files as 16 kHz, 16-bit, mono linear PCM.
Database:
・Audio data: WAV format, 16 kHz, 16-bit, mono (recorded with smartphone)
・Recording scripts: TSV format (tab-delimited), UTF-8 (without BOM)
・Transcription data: TSV format (tab-delimited), UTF-8 (without BOM)
・Size: 16.2 GB
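The audio format listed above (16 kHz, 16-bit, mono) can be checked per file before training. The following is a minimal sketch using Python's standard `wave` module; the function name `matches_documented_format` is hypothetical and not part of the database distribution.

```python
import wave


def matches_documented_format(path: str) -> bool:
    """Return True if the WAV header reports 16 kHz, 16-bit, mono audio.

    Only the header is inspected; the sample data itself is not read.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000   # 16 kHz sampling rate
            and wav.getsampwidth() == 2   # 2 bytes per sample = 16 bits
            and wav.getnchannels() == 1   # mono
        )
```

A file that fails this check may have been resampled or converted after download, so it is worth flagging rather than silently skipping.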
Number of speakers per age:
9 years old: 56 (21 male, 35 female)
10 years old: 71 (30 male, 41 female)
11 years old: 65 (28 male, 37 female)
12 years old: 38 (24 male, 14 female)
13 years old: 2 (1 male, 1 female)
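The per-age breakdown can be cross-checked against the stated totals (232 speakers, 104 male, 128 female). The sketch below simply re-enters the figures from the list; the variable names are illustrative.

```python
# (male, female) speaker counts per age, copied from the list above.
speakers_by_age = {
    9: (21, 35),
    10: (30, 41),
    11: (28, 37),
    12: (24, 14),
    13: (1, 1),
}

total_male = sum(male for male, _ in speakers_by_age.values())
total_female = sum(female for _, female in speakers_by_age.values())
total_speakers = total_male + total_female

print(total_male, total_female, total_speakers)  # prints: 104 128 232
```

The sums agree with the totals given in the database description.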
Structure of database:
├─ readme.txt
├─ Japanese Kids Speech Database.pdf    Description document of the database
├─ Transcription.tsv                    Transcription
├─ scripts.tsv                          Script
│
└─ voices/                              directory of audio data
    ├─ high/                            directory of upper grade
        └─ (speaker_ID)/                directory of speaker ID (six digits)
            └─ (audio_file)             audio file (WAV format, 16 kHz, 16-bit, mono)
File naming conventions of audio files are as follows:
Field number | Contents | Description | Remarks
0 | Language ID | “JA” (fixed) | Japanese
1 | Speaker ID | Six digits | 5XXXXX
2 | Script ID | HXXXX | XXXX: four digits
3 | Age | Two digits
4 | Gender | M: male, F: female
The field separator character is “_”.
For example, if the audio file name is “JA_500002_H0001_10_F.wav”, the file name has the following meaning:
JA: Language ID (Japanese)
500002: speaker ID
H0001: script ID
10: age (ten years old)
F: gender (female)
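The naming convention above can be parsed mechanically. The following is a minimal sketch in Python; the `KidsSpeechFile` type and `parse_audio_filename` function are hypothetical names, and the pattern assumes that the age field is always exactly two digits, as the table states.

```python
import re
from typing import NamedTuple


class KidsSpeechFile(NamedTuple):
    language: str    # field 0: "JA" (fixed)
    speaker_id: str  # field 1: six digits, 5XXXXX
    script_id: str   # field 2: "H" followed by four digits
    age: int         # field 3: two digits
    gender: str      # field 4: "M" or "F"


# Fields are separated by "_" and followed by the ".wav" extension.
_FILENAME_PATTERN = re.compile(r"^(JA)_(5\d{5})_(H\d{4})_(\d{2})_([MF])\.wav$")


def parse_audio_filename(name: str) -> KidsSpeechFile:
    """Split an audio file name into its five documented fields."""
    match = _FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    language, speaker_id, script_id, age, gender = match.groups()
    return KidsSpeechFile(language, speaker_id, script_id, int(age), gender)


info = parse_audio_filename("JA_500002_H0001_10_F.wav")
print(info.speaker_id, info.script_id, info.age, info.gender)
# prints: 500002 H0001 10 F
```

Rejecting names that do not match, rather than guessing, makes it easier to spot stray files in the `voices/` directory.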
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 60 verified Japanese language instructor businesses in the United States with complete contact information, ratings, reviews, and location data.
The Corpus of Spontaneous Japanese: Its design and evaluation [30] is a dataset of spontaneous Japanese speech.
A panel data set for use in cross-cultural analyses of aging, health, and well-being between the U.S. and Japan. The questionnaires were designed to be partially comparable to many surveys of the aged, including Americans' Changing Lives; the 1984 National Health Interview Survey Supplement on Aging; the Health and Retirement Study (HRS); and Well-Being Among the Aged: Personal Control and Self-Esteem (WBA).
NSJE questionnaire topics include:
* Demographics (age, sex, marital status, education, employment)
* Social Integration (interpersonal contacts, social supports)
* Health Limitations on daily life and activities
* Health Conditions
* Health Status (ratings of present health)
* Level of physical activity
* Subjective Well-Being and Mental Health Status (life satisfaction, morale)
* Psychological Indicators (life events, locus of control, self-esteem)
* Financial situation (financial status)
* Memory (measures of cognitive functioning)
* Interviewer observations (assessments of respondents)
The NSJE was based on a national sample of 2,200 noninstitutionalized elderly aged 60+ in Japan. This cohort has been interviewed once every 3 years since 1987. To ensure that the data are representative of the 60+ population, the samples in 1990 and 1996 were refreshed to add individuals aged 60-62. In 1999, a new cohort of Japanese adults aged 70+ was added to the surviving members of previous cohorts to form a database of 3,990 respondents 63+, of which some 3,000 were 70+. Currently a 6-wave longitudinal database (1987, 1990, 1993, 1996, 1999, & 2002) is in place; wave 7 began in 2006.
Data Availability: Data from the first three waves of the National Survey of the Japanese Elderly are currently in the public domain and can be obtained from ICPSR. Additional data are being prepared for future public release.
* Dates of Study: 1987-2006
* Study Features: Longitudinal, International
* Sample Size:
** 1987: 2,200
** 1990: 2,780
** 1993: 2,780
** 1996:
** 1999: 3,990
** 2002:
** 2006:
Links:
* 1987 (ICPSR): http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06842
* 1990 (ICPSR): http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/03407
* 1993 (ICPSR): http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/04145
* 1996 (ICPSR): http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/26621
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vital Statistics: Japanese Only: Natural Increase data was reported at -394,373.000 Person in 2017. This records a decrease from the previous number of -330,770.000 Person for 2016. Vital Statistics: Japanese Only: Natural Increase data is updated yearly, averaging 768,649.000 Person from Dec 1947 (Median) to 2017, with 71 observations. The data reached an all-time high of 1,751,194.000 Person in 1949 and a record low of -394,373.000 Person in 2017. Vital Statistics: Japanese Only: Natural Increase data remains active status in CEIC and is reported by Ministry of Health, Labour and Welfare. The data is categorized under Global Database’s Japan – Table JP.G005: Vital Statistics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Japanese Cedar is a dataset for classification tasks - it contains JC annotations for 200 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The gravity station data (4,381 records) were compiled by the Japanese Oceanographic Data Center. This data base was received in July 1988. The data are in the 'MGD77' exchange format. Principal gravity parameters include Free-air Anomalies and Observed gravity corrected for Eotvos, drift, and tares. The observed gravity values are referenced to the International Gravity Standardization Net 1971 (IGSN 71). The gravity anomaly computation uses the Geodetic Reference System 1967 (GRS 67) theoretical gravity formula. The data are randomly distributed within the boundaries of Japan.
nairaxo/japanese-tts dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vital Statistics (VS): Japanese Only: Marriage: Total data was reported at 606,863.000 Person in 2017. This records a decrease from the previous number of 620,531.000 Person for 2016. Vital Statistics (VS): Japanese Only: Marriage: Total data is updated yearly, averaging 774,702.000 Person from Dec 1947 (Median) to 2017, with 71 observations. The data reached an all-time high of 1,099,984.000 Person in 1972 and a record low of 606,863.000 Person in 2017. Vital Statistics (VS): Japanese Only: Marriage: Total data remains active status in CEIC and is reported by Ministry of Health, Labour and Welfare. The data is categorized under Global Database’s Japan – Table JP.G010: Vital Statistics: Marriage.
This ranking displays the results of the worldwide Made-In-Country Index 2017, a survey conducted to show how positively products "made in..." are perceived in various countries all over the world. During this survey, all respondents (100 percent) from Vietnam perceived products made in Japan as "slightly positive" or "very positive".
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
This corpus comprises 2,000 Japanese place names uttered by 200 speakers of different dialects, ages, and educational levels, recorded over 4 channels. Each speaker read 10 items. Speech samples are stored as 16-bit, 48 kHz WAV files, with 3.93 hours of speech per channel. The total size of the data is 3.96 Gb. Text files are stored in Unicode format. All data have been proofread manually. The corpus is intended for use in testing and in telephone natural speech recognition systems. This corpus is partly included in ELRA-S0228-54.
The majority of consumers in Japan, more than ** percent, have not used a recurring service recently, as revealed in a survey conducted in January 2025. The most commonly used subscription service was the flat-rate model, which allows unlimited use of online services and digital content as long as it remains within the service terms.
The market size of digital publications in Japan was estimated at 566 billion Japanese yen in 2024, which was an increase of 30.9 billion yen compared to the previous year. Digital publishing and print publishing together constitute the larger publishing market.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States imports from Japan were US$152.07 billion during 2024, according to the United Nations COMTRADE database on international trade. United States Imports from Japan (data, historical chart, and statistics) was last updated in September 2025.