33 datasets found

NTR Vaidya Seva 2017
kaggle.com
Updated Oct 7, 2018
Cite
Srikar (2018). NTR Vaidya Seva 2017 [Dataset]. https://www.kaggle.com/srikarkashyap/ntr-arogya-seva-2017/code
Dataset updated
Oct 7, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Srikar
Description
About

This dataset contains around 480,000 patient records from the NTR Vaidya Seva scheme of the Government of Andhra Pradesh, India. NTR Vaidya Seva is the government's flagship healthcare scheme, under which lower-middle-class and low-income citizens of Andhra Pradesh can obtain free healthcare for many major diseases and ailments. A similar program exists in the neighboring state of Telangana.

Acknowledgements

The original dataset can be found on the official NTR Vaidya Seva website, where it has been partially anonymized. I've anonymized it further.

Also thanks to Unsplash for the cover pic!

Inspiration

A useful beginner-level, real-world dataset. I'm tired of seeing the Iris and Titanic datasets used for exploratory data analysis!
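For a first pass at exploratory analysis, a minimal sketch like the one below counts rows and empty cells per column using only Python's standard library. The listing gives neither the file name nor the column names, so `summarize_csv` is a hypothetical helper that works on any CSV with a header row; with pandas installed, `df.isna().sum()` covers similar ground.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_csv(lines: Iterable[str]) -> Tuple[int, Dict[str, int]]:
    """Return (row count, {column: number of empty cells}) for a CSV with a header row."""
    reader = csv.DictReader(lines)
    n_rows = 0
    empty: Counter = Counter()
    for row in reader:
        n_rows += 1
        for column, value in row.items():
            if column is None:
                # Extra fields beyond the header are collected under the key None; skip them.
                continue
            if value is None or value.strip() == "":
                # None means the row had fewer fields than the header.
                empty[column] += 1
    columns = reader.fieldnames or []
    return n_rows, {column: empty[column] for column in columns}
```

To use it on the downloaded file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `summarize_csv`.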

Ownership

The dataset is owned by the Government of Andhra Pradesh but has been released freely on its official website.
Year-wise Population Estimates of Tigers
dataful.in
Updated Jul 25, 2025
Cite
Dataful (Factly) (2025). Year-wise Population Estimates of Tigers [Dataset]. https://dataful.in/datasets/584
Available download formats: xlsx, application/x-parquet, csv
Dataset updated
Jul 25, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditions
Area covered
India
Variables measured
Number of tigers
Description
The dataset gives population estimates of tigers, with states grouped into landscape complexes: the Shivalik-Gangetic Plain Landscape Complex (Uttarakhand, Uttar Pradesh, Bihar); the Central India Landscape Complex (Andhra Pradesh (including Telangana), Chhattisgarh, Madhya Pradesh, Maharashtra, Odisha, Rajasthan, Jharkhand); the Western Ghats Landscape Complex (Karnataka, Kerala, Tamil Nadu, Goa); the North East Hills and Brahmaputra Flood Plains (Assam, Arunachal Pradesh, Mizoram, Northern West Bengal); and the Sundarbans. NB: Ranipur (Uttar Pradesh) is included in the Shivalik landscape for convenience. State population estimates do not add up to the landscape estimates because of tigers common to more than one state, tigers outside protected areas, and model range limits.
Year and State wise Per Capita Availability of Power
dataful.in
Updated Jul 29, 2025
Cite
Dataful (Factly) (2025). Year and State wise Per Capita Availability of Power [Dataset]. https://dataful.in/datasets/21005
Available download formats: xlsx, application/x-parquet, csv
Dataset updated
Jul 29, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditions
Area covered
States of India
Variables measured
Per Capita Availability of Power
Description
The dataset contains state-wise per capita availability of power, sourced from the Handbook of Statistics on Indian States.

Notes:
1. Per capita availability of power is calculated using Census population; for Andhra Pradesh and Telangana, population figures are drawn from the two state governments’ portals for the years 2014-15 and 2018-19, respectively.
2. Combined figures for Dadra and Nagar Haveli and Daman and Diu are available from 2022-23 onwards.
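As a rough sketch of how such a figure is built (the handbook's exact definition of the numerator is not stated in this listing, so treat it as an assumption), a per capita availability figure divides the electricity available in a state over a year by that state's population:

$$\text{Per capita availability} = \frac{E_{\text{available}}\ (\text{kWh})}{P}$$

Here $E_{\text{available}}$ is the electricity available to the state during the year and $P$ is the population used for that year (Census-based, or taken from the state portals for Andhra Pradesh and Telangana as described in the first note). The result is then expressed in kWh per person.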
Andhra Pradesh, India: Village Points with Socio-Demographic and Economic...
searchworks.stanford.edu
Updated Jan 22, 2021
Cite
(2021). Andhra Pradesh, India: Village Points with Socio-Demographic and Economic Census Data, 1991 [Dataset]. https://searchworks.stanford.edu/view/pr764fd8168
Available download formats: zip
Dataset updated
Jan 22, 2021
Area covered
Andhra Pradesh, India
Description
This dataset is intended for researchers, students, and policymakers for reference and mapping. It can be used for village-level demographic analysis in basic applications, supporting graphical overlays and analysis with other spatial data.
COVID-19-Related Shocks in Rural India 2020, Rounds 1-3 - India
datacatalog.ihsn.org
catalog.ihsn.org
Updated Mar 22, 2021
Cite
World Bank (2021). COVID-19-Related Shocks in Rural India 2020, Rounds 1-3 - India [Dataset]. https://datacatalog.ihsn.org/catalog/9553
Dataset updated
Mar 22, 2021
Dataset authored and provided by
World Bank (http://worldbank.org/)
Time period covered
2020
Area covered
India
Description
Abstract

An effective policy response to the economic impacts of the COVID-19 pandemic requires an enormous range of data to inform the design and response of programs. Public health measures require data on the spread of the disease, beliefs in the population, and the capacity of the health system. Relief efforts depend on an understanding of the hardships faced by various segments of the population. Food policy requires measurement of agricultural production and hunger. In such a rapidly evolving pandemic, these data must be collected at a high frequency. Given the unexpected nature of the shock and the urgency with which a response was required, Indian policymakers needed to formulate policies affecting India’s 1.4 billion people without the detailed evidence required to construct effective programs. To help overcome this evidence gap, the World Bank, IDinsight, and the Development Data Lab sought to produce rigorous and responsive data for policymakers across six states in India: Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh.

Geographic coverage

Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, and Uttar Pradesh

Analysis unit

Household

Kind of data

Sample survey data [ssd]

Sampling procedure

This dataset includes observations covering six states (Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh) and three survey rounds. The survey did not have a single, unified frame from which to sample phone numbers. The final sample was assembled from several different sample frames, and the choice of sample frames varied across states and survey rounds.

These frames were drawn from four prior IDinsight projects and from an impact evaluation of the National Rural Livelihoods project conducted by the Ministry of Rural Development. Each of these surveys sought to represent a distinct population and employed idiosyncratic sample designs and weighting schemes.

A detailed note covering key features of each sample frame is available for download.

Mode of data collection

Computer Assisted Telephone Interview [cati]

Research instrument

The survey questionnaires covered the following subjects:

Agriculture: COVID-19-related changes in price realisation, acreage decisions, input expenditure, access to credit, access to fertilisers, etc.

Income and consumption: Changes in wage rates, employment duration, consumption expenditure, prices of essential commodities, status of food security, etc.

Migration: Rates of in-migration, migrant income and employment status, return migration plans, etc.

Access to relief: Access to in-kind, cash and workfare relief, quantities of relief received, and constraints on access to relief.

Health: Access to health facilities and rates of foregone healthcare, knowledge of COVID-19-related symptoms, and protective behaviours.

While a number of indicators were consistent across all three rounds, questions were added and removed as needed to account for seasonal changes (e.g., in the agricultural cycle).

Response rate

Round 1: ~55%; Round 2: ~46%; Round 3: ~55%
COVID-19 Related Shocks Survey (CRSS) in Rural India 2020 - India
microdata.fao.org
Updated Nov 8, 2022
Cite
The World Bank (2022). COVID-19 Related Shocks Survey (CRSS) in Rural India 2020 - India [Dataset]. https://microdata.fao.org/index.php/catalog/1768
Dataset updated
Nov 8, 2022
Dataset authored and provided by
The World Bank
Time period covered
2020
Area covered
India
Description
Abstract

An effective policy response to the economic impacts of the COVID-19 pandemic requires an enormous range of data to inform the design and response of programs. Public health measures require data on the spread of the disease, beliefs in the population, and the capacity of the health system. Relief efforts depend on an understanding of the hardships faced by various segments of the population. Food policy requires measurement of agricultural production and hunger. In such a rapidly evolving pandemic, these data must be collected at a high frequency. Given the unexpected nature of the shock and the urgency with which a response was required, Indian policymakers needed to formulate policies affecting India's 1.4 billion people without the detailed evidence required to construct effective programs. To help overcome this evidence gap, researchers from the World Bank, in collaboration with IDinsight, the Development Data Lab, and Johns Hopkins University, sought to produce rigorous and responsive data for policymakers across six states in India: Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh.

Geographic coverage

Regional coverage

Analysis unit

Households

Universe

Households located in Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh

Kind of data

Sample survey data [ssd]

Sampling procedure

This dataset includes observations covering six states (Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh) and three survey rounds. The survey did not have a single, unified frame from which to sample phone numbers. The final sample was assembled from several different sample frames, and the choice of sample frames varied across states and survey rounds.

These frames were drawn from four prior IDinsight projects and from an impact evaluation of the National Rural Livelihoods project conducted by the Ministry of Rural Development. Each of these surveys sought to represent a distinct population and employed idiosyncratic sample designs and weighting schemes.

A detailed note covering key features of each sample frame is available for download.

Sampling deviation

Details will be made available after all rounds of data collection and analysis are complete.

Mode of data collection

Computer Assisted Telephone Interview [cati]

Research instrument

The survey questionnaires covered the following subjects:

Agriculture: COVID-19-related changes in price realisation, acreage decisions, input expenditure, access to credit, access to fertilisers, etc.

Income and consumption: Changes in wage rates, employment duration, consumption expenditure, prices of essential commodities, status of food security, etc.

Migration: Rates of in-migration, migrant income and employment status, return migration plans, etc.

Access to relief: Access to in-kind, cash and workfare relief, quantities of relief received, and constraints on access to relief.

Health: Access to health facilities and rates of foregone healthcare, knowledge of COVID-19-related symptoms, and protective behaviours.

While a number of indicators were consistent across all three rounds, questions were added and removed as needed to account for seasonal changes (e.g., in the agricultural cycle).

Cleaning operations

The India COVID-19 surveys were conducted using Computer Assisted Telephone Interview (CATI) techniques. The household questionnaire was implemented using the CATI software SurveyCTO, which ran on surveyors’ smartphones; surveyors called respondents on their mobile phones and recorded their responses over the phone. If a respondent could not be reached, surveyors attempted to call back up to 7 times, often seeking explicit appointments at suitable times to avoid non-response.

Validation and consistency checks were built into SurveyCTO to avoid human error. Extreme values and outliers were scrutinised through a real-time dashboard set up by IDinsight. Monitors also audited survey audio recordings to check the consistency and accuracy of question phrasing and answer recording. Finally, supervisors randomly back-checked a subset of interviews to further ensure data accuracy.

IDinsight cleaned and labelled the data for further processing and analysis. The Development Data Lab examined the data for discrepancies and errors and merged the dataset with their proprietary spatial data.

All personally identifiable information has been removed from the datasets.

Response rate

Round 1: ~55%; Round 2: ~46%; Round 3: ~55%
ISL-CSLTR: Indian Sign Language Dataset for Continuous Sign Language...
data.mendeley.com
Updated Jan 22, 2021
Cite
Elakkiya R (2021). ISL-CSLTR: Indian Sign Language Dataset for Continuous Sign Language Translation and Recognition [Dataset]. http://doi.org/10.17632/kcmpdxky7p.1
Unique identifier
https://doi.org/10.17632/kcmpdxky7p.1
Dataset updated
Jan 22, 2021
Authors
Elakkiya R
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
Sign language is a cardinal element of communication for the deaf and mute community. Sign language has its own grammatical structure and gestural nature. Research on sign language translation and recognition (SLTR) focuses heavily on gesture identification. Sign language comprises manual gestures performed with hand poses and non-manual features expressed through eye, mouth, and gaze movements. This is a fully labelled, sentence-level Indian Sign Language dataset developed for SLTR research. The ISL-CSLTR dataset helps the research community explore intuitive insights and build SLTR frameworks for communicating with the deaf and mute community using advanced deep learning and computer vision methods. The sentence-level dataset was created with two native signers from Navajeevan, Residential School for the Deaf, College of Spl. D.Ed & B.Ed, Vocational Centre, and Child Care & Learning Centre, Ayyalurimetta, Andhra Pradesh, India, and four student volunteers from SASTRA Deemed University, Thanjavur, Tamil Nadu. The ISL-CSLTR corpus consists of a large vocabulary of 700 fully annotated videos, 18,863 sentence-level frames, and 1,036 word-level images for 100 spoken-language sentences performed by 7 different signers. The corpus is organized by signer variant and time boundaries, is fully annotated, and is publicly available. The main objective of creating this sentence-level ISL-CSLTR corpus is to enable further research in SLTR. This fully labelled video corpus helps researchers build frameworks for converting spoken-language sentences into sign language and vice versa. The corpus was created to address the various challenges faced by SLTR researchers and significantly improves translation and recognition performance.
The videos are annotated with the corresponding spoken-language sentences, which makes the corpus data clear and easy to understand.
Acknowledgements: The research was funded by the Science and Engineering Research Board (SERB), India, under the Start-up Research Grant (SRG)/2019–2021 (Grant no. SRG/2019/001338). We also thank all the signers for their contribution in collecting the sign videos and completing the ISL-CSLTR corpus. We would like to thank Navajeevan, Residential School for the Deaf, College of Spl. D.Ed & B.Ed, Vocational Centre, and Child Care & Learning Centre, Ayyalurimetta, Andhra Pradesh, India, for their support and contribution.
India - National Family Health Survey 1998-1999 - Dataset - waterdata
wbwaterdata.org
Updated Mar 16, 2020
Cite
(2020). India - National Family Health Survey 1998-1999 - Dataset - waterdata [Dataset]. https://wbwaterdata.org/dataset/india-national-family-health-survey-1998-1999
Dataset updated
Mar 16, 2020
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
The second National Family Health Survey (NFHS-2), conducted in 1998-99, provides information on fertility, mortality, family planning, and important aspects of nutrition, health, and health care. The International Institute for Population Sciences (IIPS) coordinated the survey, which collected information from a nationally representative sample of more than 90,000 ever-married women age 15-49. The NFHS-2 sample covers 99 percent of India's population living in all 26 states. This report, however, is based on survey data for 25 of the 26 states, since data collection in Tripura was delayed by local problems in the state. IIPS also coordinated the first National Family Health Survey (NFHS-1) in 1992-93. Most of the types of information collected in NFHS-2 were also collected in the earlier survey, making it possible to identify trends over the intervening period of six and one-half years. In addition, the NFHS-2 questionnaire covered a number of new or expanded topics with important policy implications, such as reproductive health, women's autonomy, domestic violence, women's nutrition, anaemia, and salt iodization. The NFHS-2 survey was carried out in two phases. Ten states were surveyed in the first phase, which began in November 1998, and the remaining states (except Tripura) were surveyed in the second phase, which began in March 1999. The field staff collected information from 91,196 households in these 25 states and interviewed 89,199 eligible women in these households. In addition, the survey collected information on 32,393 children born in the three years preceding the survey. One health investigator on each survey team measured the height and weight of eligible women and children and took blood samples to assess the prevalence of anaemia.
SUMMARY OF FINDINGS
POPULATION CHARACTERISTICS
Three-quarters (73 percent) of the population lives in rural areas.
The age distribution is typical of populations that have recently experienced a fertility decline, with relatively low proportions in the younger and older age groups. Thirty-six percent of the population is below age 15, and 5 percent is age 65 and above. The sex ratio is 957 females for every 1,000 males in rural areas but only 928 females for every 1,000 males in urban areas, suggesting that more men than women have migrated to urban areas. The survey provides a variety of demographic and socioeconomic background information. In the country as a whole, 82 percent of household heads are Hindu, 12 percent are Muslim, 3 percent are Christian, and 2 percent are Sikh. Muslims live disproportionately in urban areas, where they comprise 15 percent of household heads. Nineteen percent of household heads belong to scheduled castes, 9 percent belong to scheduled tribes, and 32 percent belong to other backward classes (OBCs). Two-fifths of household heads do not belong to any of these groups. Questions about housing conditions and the standard of living of households indicate some improvements since the time of NFHS-1. Sixty percent of households in India now have electricity and 39 percent have piped drinking water compared with 51 percent and 33 percent, respectively, at the time of NFHS-1. Sixty-four percent of households have no toilet facility compared with 70 percent at the time of NFHS-1. About three-fourths (75 percent) of males and half (51 percent) of females age six and above are literate, an increase of 6-8 percentage points from literacy rates at the time of NFHS-1. The percentage of illiterate males varies from 6-7 percent in Mizoram and Kerala to 37 percent in Bihar and the percentage of illiterate females varies from 11 percent in Mizoram and 15 percent in Kerala to 65 percent in Bihar. Seventy-nine percent of children age 6-14 are attending school, up from 68 percent in NFHS-1. 
The proportion of children attending school has increased for all ages, particularly for girls, but girls continue to lag behind boys in school attendance. Moreover, the disparity in school attendance by sex grows with increasing age of children. At age 6-10, 85 percent of boys attend school compared with 78 percent of girls. By age 15-17, 58 percent of boys attend school compared with 40 percent of girls. The percentage of girls 6-17 attending school varies from 51 percent in Bihar and 56 percent in Rajasthan to over 90 percent in Himachal Pradesh and Kerala. Women in India tend to marry at an early age. Thirty-four percent of women age 15-19 are already married, including 4 percent who are married but for whom gauna has yet to be performed. These proportions are even higher in rural areas. Older women are more likely than younger women to have married at an early age: 39 percent of women currently age 45-49 married before age 15 compared with 14 percent of women currently age 15-19. Although this indicates that the proportion of women who marry young is declining rapidly, half the women even in the age group 20-24 married before reaching the legal minimum age of 18 years. On average, women are five years younger than the men they marry. The median age at marriage varies from about 15 years in Madhya Pradesh, Bihar, Uttar Pradesh, Rajasthan, and Andhra Pradesh to 23 years in Goa. As part of an increasing emphasis on gender issues, NFHS-2 asked women about their participation in household decision-making. In India, 91 percent of women are involved in decision-making on at least one of four selected topics. A much lower proportion (52 percent), however, are involved in making decisions about their own health care. There are large variations among states in India with regard to women's involvement in household decision-making.
More than three out of four women are involved in decisions about their own health care in Himachal Pradesh, Meghalaya, and Punjab compared with about two out of five or fewer in Madhya Pradesh, Orissa, and Rajasthan. Thirty-nine percent of women do work other than housework, and more than two-thirds of these women work for cash. Only 41 percent of women who earn cash can decide independently how to spend the money that they earn. Forty-three percent of working women report that their earnings constitute at least half of total family earnings, including 18 percent who report that the family is entirely dependent on their earnings. Women's work-participation rates vary from 9 percent in Punjab and 13 percent in Haryana to 60-70 percent in Manipur, Nagaland, and Arunachal Pradesh.
FERTILITY AND FAMILY PLANNING
Fertility continues to decline in India. At current fertility levels, women will have an average of 2.9 children each throughout their childbearing years. The total fertility rate (TFR) is down from 3.4 children per woman at the time of NFHS-1, but is still well above the replacement level of just over two children per woman. There are large variations in fertility among the states in India. Goa and Kerala have attained below-replacement fertility, and Karnataka, Himachal Pradesh, Tamil Nadu, and Punjab are at or close to replacement-level fertility. By contrast, fertility is 3.3 or more children per woman in Meghalaya, Uttar Pradesh, Rajasthan, Nagaland, Bihar, and Madhya Pradesh. Between one-third and one-half of all births in these latter states are fourth or higher-order births, compared with 7-9 percent of births in Kerala, Goa, and Tamil Nadu. Efforts to encourage the trend towards lower fertility might usefully focus on groups within the population that have higher fertility than average.
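The total fertility rate is conventionally built from age-specific fertility rates (ASFRs) for the seven five-year age groups from 15-19 to 45-49. As a worked sketch of the standard definition (not taken from the NFHS-2 report itself):

$$\mathrm{TFR} = 5 \sum_{a \in \{15\text{-}19,\ \dots,\ 45\text{-}49\}} f_a$$

Here $f_a$ is the number of births per woman per year in age group $a$, and the factor 5 converts each annual rate into the five years a woman spends in that group. The share of total fertility contributed by one group is $5 f_a / \mathrm{TFR}$; with $\mathrm{TFR} = 2.9$, a share of one-fifth would correspond to $5 f_{15\text{-}19} \approx 0.58$ children per woman, or $f_{15\text{-}19} \approx 0.116$ births per woman per year.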
In India, rural women and women from scheduled tribes and scheduled castes have somewhat higher fertility than other women, but fertility is particularly high for illiterate women, poor women, and Muslim women. Another striking feature is the high level of childbearing among young women. More than half of women age 20-49 had their first birth before reaching age 20, and women age 15-19 account for almost one-fifth of total fertility. Studies in India and elsewhere have shown that health and mortality risks increase when women give birth at such young ages, both for the women themselves and for their children. Family planning programmes focusing on women in this age group could make a significant impact on maternal and child health and help to reduce fertility.
INFANT AND CHILD MORTALITY
NFHS-2 provides estimates of infant and child mortality and examines factors associated with the survival of young children. During the five years preceding the survey, the infant mortality rate was 68 deaths at age 0-11 months per 1,000 live births, substantially lower than 79 per 1,000 in the five years preceding the NFHS-1 survey. The child mortality rate, 29 deaths at age 1-4 years per 1,000 children reaching age one, also declined from the corresponding rate of 33 per 1,000 in NFHS-1. Ninety-five children out of 1,000 born do not live to age five years. Expressed differently, 1 in 15 children die in the first year of life, and 1 in 11 die before reaching age five. Child-survival programmes might usefully focus on specific groups of children with particularly high infant and child mortality rates, such as children who live in rural areas, children whose mothers are illiterate, children belonging to scheduled castes or scheduled tribes, and children from poor households.
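The under-five figure follows from the two rates quoted above if the infant mortality rate (${}_1q_0$, deaths before age one per live birth) and the child mortality rate (${}_4q_1$, deaths at age 1-4 per child reaching age one) are treated as successive conditional probabilities of dying:

$${}_5q_0 = 1 - (1 - {}_1q_0)(1 - {}_4q_1) = 1 - (1 - 0.068)(1 - 0.029) = 1 - 0.932 \times 0.971 \approx 0.095$$

That is about 95 deaths per 1,000 live births. The "1 in 15" and "1 in 11" phrasings are the reciprocals of the two rates: $1000/68 \approx 14.7$ and $1000/95 \approx 10.5$.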
Infant mortality rates are more than two and one-half times as high for women who did not receive any of the recommended types of maternity-related medical care than for mothers who did receive all recommended types of care.
HEALTH, HEALTH CARE, AND NUTRITION
Promotion of maternal and child health has been one of the most important components of the Family Welfare Programme of the Government of India. One goal is for each pregnant woman to receive at least three antenatal check-ups plus two tetanus toxoid injections and a full course of iron and folic acid supplementation. In India, mothers of 65 percent of the children born in the three years preceding NFHS-2 received at least one antenatal
Telugu Call Center Data for Healthcare AI
futurebeeai.com
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Telugu Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-telugu-india
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Telugu Call Center Speech Dataset for the healthcare industry is purpose-built to accelerate the development of Telugu speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
Speech Data
The dataset features 30 hours of dual-channel call center conversations between native Telugu speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
• Participant Diversity:
  • Speakers: 60 verified native Telugu speakers from our contributor community.
  • Regions: Diverse regions across Andhra Pradesh and Telangana to ensure broad dialectal representation.
  • Participant Profile: Age range of 18–70, with a gender mix of 60% male and 40% female.
• Recording Details:
  • Conversation Nature: Naturally flowing, unscripted conversations.
  • Call Duration: Each session lasts between 5 and 15 minutes.
  • Audio Format: WAV, stereo, 16-bit depth, at 8 kHz and 16 kHz sample rates.
  • Recording Environment: Captured in clean conditions without background noise or echo.
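A quick way to confirm that downloaded files match the stated audio format is to read their WAV headers. The sketch below uses Python's standard `wave` module; the helper names and the acceptance rule (stereo, 16-bit, 8 kHz or 16 kHz, 5 to 15 minutes) are assumptions drawn from the recording details listed above, not a FutureBeeAI tool.

```python
import io
import wave


def inspect_wav(source):
    """Read channel count, bit depth, sample rate and duration from a WAV path or binary file object."""
    with wave.open(source, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits": wav.getsampwidth() * 8,
            "sample_rate": sample_rate,
            "duration_s": wav.getnframes() / sample_rate,
        }


def matches_call_center_spec(props, min_s=5 * 60, max_s=15 * 60):
    """Check the listed call-center format: stereo, 16-bit, 8 or 16 kHz, 5-15 minutes long."""
    return (
        props["channels"] == 2
        and props["bits"] == 16
        and props["sample_rate"] in (8000, 16000)
        and min_s <= props["duration_s"] <= max_s
    )


def make_silent_wav(channels, sample_rate, seconds):
    """Build an in-memory 16-bit WAV of silence, useful for trying the checks out."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * channels * sample_rate * seconds)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf
```

To check a real recording, pass its file path to `inspect_wav` instead of an in-memory buffer.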

Topic Diversity
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
• Inbound Calls:
  • Appointment Scheduling
  • New Patient Registration
  • Surgical Consultation
  • Dietary Advice and Consultations
  • Insurance Coverage Inquiries
  • Follow-up Treatment Requests, and more
• Outbound Calls:
  • Appointment Reminders
  • Preventive Care Campaigns
  • Test Results & Lab Reports
  • Health Risk Assessment Calls
  • Vaccination Updates
  • Wellness Subscription Outreach, and more
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Transcription
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
• Transcription Includes:
  • Speaker-identified dialogues
  • Time-coded segments
  • Non-speech annotations (e.g., silence, cough)
• High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.
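The listing quotes a word error rate below 5% but does not say how it was measured. WER is conventionally the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the transcript under test, divided by the number of reference words. A minimal sketch, assuming simple whitespace tokenization (Telugu is written with spaces between words):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Before processing reference word i, prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(prev[j] + 1,      # delete ref_word
                          curr[j - 1] + 1,  # insert hyp_word
                          substitution)     # match or substitute
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c d")` is 0.25: one substitution over four reference words.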
Metadata
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
• Participant Metadata: ID, gender, age, region, accent, and dialect.
• Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

Usage and Applications
This dataset can be used across a range of healthcare and voice AI use cases:
Telugu Scripted Monologue Speech Data for Healthcare
futurebeeai.com
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Telugu Scripted Monologue Speech Data for Healthcare [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/healthcare-scripted-speech-monologues-telugu-india
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Introducing the Telugu Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of Telugu language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.
Speech Data
This dataset includes over 6,000 high-quality scripted audio prompts recorded in Telugu, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.
• Participant Diversity:
  • Speakers: 60 native Telugu speakers.
  • Regional Balance: Participants are sourced from multiple regions across Andhra Pradesh and Telangana, reflecting diverse dialects and linguistic traits.
  • Demographics: A mix of male and female participants (60:40 ratio), aged between 18 and 70 years.
• Recording Specifications:
  • Nature of Recordings: Scripted monologues based on healthcare-related use cases.
  • Duration: Each clip lasts between 5 and 30 seconds, offering short, context-rich speech samples.
  • Audio Format: WAV files recorded in mono, with 16-bit depth and sample rates of 8 kHz and 16 kHz.
  • Environment: Clean, echo-free spaces ensure clear, noise-free audio capture.

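A minimal sketch of how one might verify a downloaded clip against the recording specifications listed above, using Python's standard `wave` module. The thresholds come from the description (mono, 16-bit, 8 or 16 kHz, 5 to 30 seconds); the helper names `check_clip` and `make_silent_wav` are hypothetical and not part of any FutureBeeAI tooling, and the sketch assumes uncompressed PCM WAV files.

```python
import io
import wave

# Thresholds taken from the dataset description (assumed to apply to every clip).
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
ALLOWED_RATES_HZ = {8000, 16000}
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0


def check_clip(source) -> list[str]:
    """Return a list of spec violations for one WAV clip (empty list means it passes).

    `source` may be a filesystem path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected mono, got {channels} channels")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"expected 16-bit samples, got {8 * width}-bit")
    if rate not in ALLOWED_RATES_HZ:
        problems.append(f"unexpected sample rate {rate} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.2f} s outside 5-30 s")
    return problems


def make_silent_wav(seconds: float, rate: int = 16000, channels: int = 1) -> io.BytesIO:
    """Build an in-memory silent 16-bit WAV file, used here only to demonstrate the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    buf.seek(0)
    return buf
```

In practice `check_clip` would be called on each file path in the download; an empty list means the clip matches the published specification.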
Topic Coverage
The prompts span a broad range of healthcare-specific interactions, such as:
•Patient check-in and follow-up communication
•Appointment booking and cancellation dialogues
•Insurance and regulatory support queries
•Medication, test results, and consultation discussions
•General health tips and wellness advice
•Emergency and urgent care communication
•Technical support for patient portals and apps
•Domain-specific scripted statements and FAQs
Contextual Depth
To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:
•Names: Gender- and region-appropriate Andhra Pradesh and Telangana names
•Addresses: Varied local address formats spoken naturally
•Dates & Times: References to appointment dates, times, follow-ups, and schedules
•Medical Terminology: Common medical procedures, symptoms, and treatment references
•Numbers & Measurements: Health data like dosages, vitals, and test result values
•Healthcare Institutions: Names of clinics, hospitals, and diagnostic centers

These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.
Transcription
Every audio recording is accompanied by a verbatim, manually verified transcription.
•Content: The transcription mirrors the exact scripted prompt recorded by the speaker.
•Format: Files are delivered in plain text (.TXT) format with consistent naming conventions for seamless integration.

Implications of Cardiovascular Disease Risk Assessment Using the WHO/ISH...
plos.figshare.com
pdf
Updated May 31, 2023
Cite
Arvind Raghu; Devarsetty Praveen; David Peiris; Lionel Tarassenko; Gari Clifford (2023). Implications of Cardiovascular Disease Risk Assessment Using the WHO/ISH Risk Prediction Charts in Rural India [Dataset]. http://doi.org/10.1371/journal.pone.0133618
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0133618
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Arvind Raghu; Devarsetty Praveen; David Peiris; Lionel Tarassenko; Gari Clifford
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cardiovascular disease (CVD) risk in India is currently assessed using the World Health Organization/International Society of Hypertension (WHO/ISH) risk prediction charts, since no population-specific models exist. The WHO/ISH risk prediction charts have two versions: one with total cholesterol as a predictor (the high information (HI) model) and one without (the low information (LI) model). However, information on the WHO/ISH risk prediction charts, including guidance on which version to use and when, as well as the relative performance of the LI and HI models, is limited. This article aims, firstly, to quantify the relative performance of the LI and HI WHO/ISH risk prediction models (for WHO South-East Asia Region D) using data from rural India. Secondly, we propose a pre-screening (simplified) point-of-care (POC) test to identify patients who are likely to benefit from a total cholesterol (TC) test, and hence when the LI model is preferable to the HI model. Analysis was performed using cross-sectional data collected in rural Andhra Pradesh in 2005 with recorded blood cholesterol measurements (N = 1066). CVD risk was computed using both the LI and HI models, and high-risk individuals who needed treatment (THR) were then identified based on clinical guidelines. Model development for the POC assessment of a TC test was performed using three machine learning techniques, Support Vector Machine (SVM), Regularised Logistic Regression (RLR), and Random Forests (RF), along with a feature selection process. Disagreement in CVD risk predicted by the LI and HI WHO/ISH models was 14.5% (n = 155; p
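As a quick consistency check, and assuming the reported 14.5% is computed over the full analysed sample (N = 1066), the disagreement count and percentage agree:

```latex
\frac{n}{N} = \frac{155}{1066} \approx 0.1454 \;\Rightarrow\; 14.5\%
```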
World - Young Lives: An International Study of Childhood Poverty 2013-2014 -...
wbwaterdata.org
Updated Mar 16, 2020
Cite
(2020). World - Young Lives: An International Study of Childhood Poverty 2013-2014 - Dataset - waterdata [Dataset]. https://wbwaterdata.org/dataset/world-young-lives-international-study-childhood-poverty-2013-2014
Dataset updated
Mar 16, 2020
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world: high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood. The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family. Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. 
It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country. Further information about the survey, including publications, can be downloaded from the Young Lives website.
Telugu Call Center Data for Telecom AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Telugu Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-telugu-india
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Telugu Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Telugu-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native Telugu speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
•Participant Diversity:
•Speakers: 60 native Telugu speakers from our verified contributor pool.
•Regions: Representing multiple regions across Andhra Pradesh and Telangana to ensure coverage of various accents and dialects.
•Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
•Call Duration: Ranges from 5 to 15 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, at 8 kHz and 16 kHz sample rates.
•Recording Environment: Captured in clean conditions with no echo or background noise.

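The recordings are described as dual-channel stereo WAV files. Assuming 16-bit PCM with one speaker per channel (the description does not say which channel carries the agent and which the customer), a standard-library sketch for splitting a call into per-channel sample arrays might look like this; `split_stereo` is a hypothetical helper.

```python
import array
import io
import sys
import wave


def split_stereo(source) -> tuple[array.array, array.array, int]:
    """Split a 16-bit stereo WAV into two per-channel sample arrays.

    Returns (channel_0, channel_1, sample_rate). Which channel holds which
    speaker is an assumption to confirm against the dataset's own metadata.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h", raw)  # interleaved frames: ch0, ch1, ch0, ch1, ...
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    return samples[0::2], samples[1::2], rate
```

For example, `ch0, ch1, rate = split_stereo("call_0001.wav")` (a hypothetical file name) yields two signed 16-bit sequences that could be written back out as separate mono files for per-speaker ASR training.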
Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
•Inbound Calls:
•Phone Number Porting
•Network Connectivity Issues
•Billing and Payments
•Technical Support
•Service Activation
•International Roaming Enquiry
•Refund Requests and Billing Adjustments
•Emergency Service Access, and others
•Outbound Calls:
•Welcome Calls & Onboarding
•Payment Reminders
•Customer Satisfaction Surveys
•Technical Updates
•Service Usage Reviews
•Network Complaint Status Calls, and more
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, coughs)
•High transcription accuracy, with a word error rate below 5%, thanks to dual-layered quality checks.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
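The description does not say how FutureBeeAI computes its word error rate. The standard definition is WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in the best alignment and N is the number of reference words. A sketch of that definition, assuming simple whitespace tokenisation (a simplification for Telugu text):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a single wrong word in a 20-word reference already gives a WER of exactly 0.05, so a rate below 5% means fewer than one error per 20 reference words on average.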
Metadata
Rich metadata is available for each participant and conversation:
•Participant Metadata: ID, age, gender, accent, dialect, and location.

India - National Family Health Survey 2005-2006 - Dataset - waterdata
waterdata3.staging.derilinx.com
Updated Mar 16, 2020
Cite
(2020). India - National Family Health Survey 2005-2006 - Dataset - waterdata [Dataset]. https://waterdata3.staging.derilinx.com/dataset/india-national-family-health-survey-2005-2006
Dataset updated
Mar 16, 2020
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
The National Family Health Surveys (NFHS) programme, initiated in the early 1990s, has emerged as a nationally important source of data on population, health, and nutrition for India and its states. The 2005-06 National Family Health Survey (NFHS-3), the third in the series of these national surveys, was preceded by NFHS-1 in 1992-93 and NFHS-2 in 1998-99. Like NFHS-1 and NFHS-2, NFHS-3 was designed to provide estimates of important indicators on family welfare, maternal and child health, and nutrition. In addition, NFHS-3 provides information on several new and emerging issues, including family life education, safe injections, perinatal mortality, adolescent reproductive health, high-risk sexual behaviour, tuberculosis, and malaria. Further, unlike the earlier surveys in which only ever-married women age 15-49 were eligible for individual interviews, NFHS-3 interviewed all women age 15-49 and all men age 15-54. Information on nutritional status, including the prevalence of anaemia, is provided in NFHS-3 for women age 15-49, men age 15-54, and young children. A special feature of NFHS-3 is the inclusion of testing of the adult population for HIV. NFHS-3 is the first nationwide community-based survey in India to provide an estimate of HIV prevalence in the general population. Specifically, NFHS-3 provides estimates of HIV prevalence among women age 15-49 and men age 15-54 for all of India, and separately for Uttar Pradesh and for Andhra Pradesh, Karnataka, Maharashtra, Manipur, and Tamil Nadu, five out of the six states classified by the National AIDS Control Organization (NACO) as high HIV prevalence states. No estimate of HIV prevalence is being provided for Nagaland, the sixth high HIV prevalence state, due to strong local opposition to the collection of blood samples. NFHS-3 covered all 29 states in India, which comprise more than 99 percent of India's population. 
NFHS-3 is designed to provide estimates of key indicators for India as a whole and, with the exception of HIV prevalence, for all 29 states by urban-rural residence. Additionally, NFHS-3 provides estimates for the slum and non-slum populations of eight cities, namely Chennai, Delhi, Hyderabad, Indore, Kolkata, Meerut, Mumbai, and Nagpur. NFHS-3 was conducted under the stewardship of the Ministry of Health and Family Welfare (MOHFW), Government of India, and is the result of the collaborative efforts of a large number of organizations. The International Institute for Population Sciences (IIPS), Mumbai, was designated by MOHFW as the nodal agency for the project. Funding for NFHS-3 was provided by the United States Agency for International Development (USAID), DFID, the Bill and Melinda Gates Foundation, UNICEF, UNFPA, and MOHFW. Macro International, USA, provided technical assistance at all stages of the NFHS-3 project. NACO and the National AIDS Research Institute (NARI) provided technical assistance for the HIV component of NFHS-3. Eighteen Research Organizations, including six Population Research Centres, shouldered the responsibility of conducting the survey in the different states of India and producing electronic data files. The survey used a uniform sample design, questionnaires (translated into 18 Indian languages), field procedures, and procedures for biomarker measurements throughout the country to facilitate comparability across the states and to ensure the highest possible data quality. The contents of the questionnaires were decided through an extensive collaborative process in early 2005. Based on provisional data, two national-level fact sheets and 29 state fact sheets that provide estimates of more than 50 key indicators of population, health, family welfare, and nutrition have already been released. 
The basic objective of releasing fact sheets within a very short period after the completion of data collection was to provide immediate feedback to planners and programme managers on key process indicators.
Ethiopia - Young Lives: School Survey 2012-2013 - Dataset - waterdata
wbwaterdata.org
Updated Mar 16, 2020
Cite
(2020). Ethiopia - Young Lives: School Survey 2012-2013 - Dataset - waterdata [Dataset]. https://wbwaterdata.org/dataset/ethiopia-young-lives-school-survey-2012-2013
Dataset updated
Mar 16, 2020
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ethiopia
Description
The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world: high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood. The Young Lives study aims to track the lives of 12,000 children over a 15-year period, surveyed once every 3-4 years. Round 1 of Young Lives surveyed two groups of children in each country, at 1 year old and 8 years old. Round 2 returned to the same children, who were then aged 5 and 12 years old. Round 3 surveyed the same children again at ages 7-8 and 14-15 years, and Round 4 surveyed them at 12 and 19 years old. Thus the younger children are being tracked from infancy to their mid-teens and the older children through into adulthood, when some will become parents themselves. The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family. 
Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country. Further information about the survey, including publications, can be downloaded from the School Survey webpages (http://www.younglives.org.uk/content/school-survey-0).
Experiential Learning for Groundwater Governance in India: Groundwater Game...
dataverse.harvard.edu
Updated Jul 16, 2025
Cite
Harvard Dataverse (2025). Experiential Learning for Groundwater Governance in India: Groundwater Game Surveys [Dataset]. http://doi.org/10.7910/DVN/8EGMOT
Unique identifier
https://doi.org/10.7910/DVN/8EGMOT
Dataset updated
Jul 16, 2025
Dataset provided by
Harvard Dataverse
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/8EGMOT
Time period covered
2021 - 2023
Area covered
India
Dataset funded by
CGIAR Trust Fund (https://ror.org/00e0ttj68)
Description
The Scaling up experiential learning tools for sustainable water governance project aims to enhance the capacity of Indian communities to sustainably manage water resources. The intervention combined collective action games, participatory planning tools, and community debriefings to promote behavioral shifts toward sustainable groundwater and surface water management. These tools are designed to support informed decision-making, foster collective action, and strengthen governance of water as a common resource. The project included a mixed-methods impact evaluation. The study took place in 4 districts across 3 Indian states: Chittoor and Anantpur (Andhra Pradesh), Bhilwara (Rajasthan), and Chikbalapur (Karnataka). Data collection took place over two rounds. The baseline survey was conducted between October 2021 and May 2022, followed by the intervention. The endline survey was implemented from January to June 2023. The data available here are from both survey rounds, which included individual surveys, focus group discussions (FGDs), and key informant interviews (KIIs) across treatment and control sites, with baseline and endline results included in the same datasets. Within each survey type, multiple datasets are available and are organized according to the structure of the corresponding survey modules. Some datasets are at the individual or household-member level — for example, roster datasets that include information on all household members, not just the primary respondent. Others, such as the crop and water modules, are organized at the level of specific activities or resources, capturing details on each crop grown or water source used within a household. 
All datasets include a variable "unique_ID", which identifies the habitation (a sub-village administrative division) where the observation was collected, and a "TreatmentControl" variable, which denotes whether the observation belongs to the treatment group or the control group (note: treatment is assigned at the habitation level). Additionally, the individual surveys include an "individual_id" variable corresponding to the individual respondent.
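Because treatment is assigned at the habitation level, every row sharing a `unique_ID` should carry the same `TreatmentControl` value. A minimal consistency check, assuming the datasets have been exported to CSV and read as dictionaries (for example with `csv.DictReader`); the column names come from the description, while the helper name `habitation_assignments` and any example values are hypothetical:

```python
from collections import defaultdict


def habitation_assignments(rows):
    """Map each habitation (unique_ID) to its single TreatmentControl value.

    `rows` is any iterable of dicts, e.g. a csv.DictReader over one module.
    Raises ValueError if a habitation carries more than one value, which
    would contradict habitation-level assignment.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row["unique_ID"]].add(row["TreatmentControl"])
    conflicts = {hab: vals for hab, vals in seen.items() if len(vals) > 1}
    if conflicts:
        raise ValueError(f"mixed treatment status within habitations: {conflicts}")
    return {hab: vals.pop() for hab, vals in seen.items()}
```

The resulting mapping can also be used to attach treatment status to module rows that are organised by crop or water source, since every dataset carries `unique_ID`.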
"Road Safety & Traffic Rules Awareness Survey"
kaggle.com
Updated Apr 7, 2025
Cite
Narayana Sai Chinmai (2025). "Road Safety & Traffic Rules Awareness Survey" [Dataset]. https://www.kaggle.com/datasets/narayanasaichinmai/road-safety-and-traffic-rules-awareness-survey/data
Dataset updated
Apr 7, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Narayana Sai Chinmai
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Description of the Survey Data
The dataset contains responses to a "Road Safety & Traffic Rules Awareness Survey", capturing people's opinions on traffic rules, driving habits, and road safety concerns.
1. Demographic Information:
•Age group (10-20 years, 30-40 years, >40 years)
•Gender (Male/Female)
•Location (Andhra Pradesh, Kerala, Tamil Nadu, etc.)
2. Awareness & Behavior Towards Traffic Rules:
•Importance of traffic rules (Essential, Somewhat important, Not important)
•Seatbelt/helmet usage (Always, Sometimes, Rarely, Never)
•Leading causes of road accidents (Over-speeding, Drunk Driving, Poor Road Conditions, etc.)
•Observing rule violations (Rarely, Daily, Often, Always)
•Experience with accidents (Witnessed/Involved/Never)
3. Driving Habits & Safety Measures:
•Following speed limits and lane discipline
•Measures for pedestrian safety (Awareness programs, More crossings, Stricter enforcement)
•Primary mode of transportation (Public transport, Walking/Cycling, etc.)
This survey data can help in analyzing people's awareness of road safety and designing better traffic policies.
Telugu Scripted Monologue Speech Dataset for BFSI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Telugu Scripted Monologue Speech Dataset for BFSI [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/bfsi-scripted-speech-monologues-telugu-india
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Telugu Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Telugu speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
Speech Data
This dataset includes over 6,000 scripted prompt recordings in Telugu, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
•Participant Diversity
•Speakers: 60 native Telugu speakers.
•Regions: Diverse representation from various regions of Andhra Pradesh and Telangana to ensure dialect and accent coverage.
•Demographics: Age range of 18–70, with a male-to-female ratio of 60:40.
•Recording Details
•Nature: Scripted monologues and domain-specific prompt recordings.
•Duration:
•Audio Format: WAV, mono channel, 16-bit depth, recorded at 8 kHz and 16 kHz sample rates.
•Environment: Clean, echo-free, and noise-free environments.
Topic & Context Diversity
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
•Customer service interactions
•Financial transactions & balance inquiries
•Banking and insurance product queries
•Loan & credit support
•Regulatory and compliance questions
•Technical help and password resets
•Promotional campaigns and service updates
Contextual Elements
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
•Names: Region-specific names in multiple formats
•Addresses: Local address structures and pronunciations
•Dates & Times: Typical time expressions used in banking
•Organization Names: Names of banks, financial firms, and institutions
•Currencies & Amounts: Spoken currency formats, prices, and numeric data
•IDs & Transaction Numbers: For authentic service simulation

Transcription
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
•Content: Exact match of each prompt
•Format: Clean .TXT files, mapped to audio file names
•Accuracy: Reviewed and validated by native Telugu linguists

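Since each transcript is mapped to its audio file by name, pairing them amounts to matching file stems. A sketch under the assumption, not confirmed by the description, that each `.wav` file and its `.TXT` transcript share a stem and sit in the same folder; `pair_audio_with_transcripts` is a hypothetical helper:

```python
from pathlib import Path


def pair_audio_with_transcripts(folder: Path) -> tuple[dict[str, tuple[Path, Path]], list[str]]:
    """Pair each .wav file with the transcript sharing its file stem.

    Extensions are compared case-insensitively because the description
    writes .TXT in capitals. Returns (pairs keyed by stem, sorted stems of
    audio files that have no transcript).
    """
    audio: dict[str, Path] = {}
    text: dict[str, Path] = {}
    for path in folder.iterdir():
        suffix = path.suffix.lower()
        if suffix == ".wav":
            audio[path.stem] = path
        elif suffix == ".txt":
            text[path.stem] = path
    pairs = {stem: (audio[stem], text[stem]) for stem in audio if stem in text}
    missing = sorted(stem for stem in audio if stem not in text)
    return pairs, missing
```

Reporting unmatched audio separately, rather than silently skipping it, makes naming inconsistencies visible before the pairs are fed into an ASR training pipeline.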
Metadata
Each data point is enriched with detailed metadata for advanced training and analysis:
•Participant Metadata: Unique ID, age, gender, state, country, dialect
•Recording Metadata: Transcript, recording setup, sample rate, bit depth, device, file format

Applications and Use Cases
This BFSI-focused dataset is ideal
Young Lives: School Survey 2011-2012 - Viet Nam
microdata.worldbank.org
datacatalog.ihsn.org
Updated Oct 26, 2023
Cite
Boyden, J. (2023). Young Lives: School Survey 2011-2012 - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/2606
Dataset updated
Oct 26, 2023
Dataset authored and provided by
Boyden, J.
Time period covered
2011 - 2012
Area covered
Vietnam
Description
Abstract

The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world: high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood.

The Young Lives study aims to track the lives of 12,000 children over a 15-year period, surveyed once every 3-4 years. Round 1 of Young Lives surveyed two groups of children in each country, at 1 year old and 8 years old. Round 2 returned to the same children, who were then aged 5 and 12 years old. Round 3 surveyed the same children again at ages 7-8 and 14-15 years, and Round 4 surveyed them at 12 and 19 years old. Thus the younger children are being tracked from infancy to their mid-teens and the older children through into adulthood, when some will become parents themselves.

The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family. Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country.

Further information about the survey, including publications, can be downloaded from the Young Lives website.

School surveys were introduced into Young Lives in 2010 in order to capture detailed information about children's experiences of schooling, and to improve our understanding of:
•the relationships between learning outcomes and children's home backgrounds, gender, work, schools, teachers, and class and school peer-groups;
•school effectiveness, by analysing factors explaining the development of cognitive and non-cognitive skills in school, including value-added analysis of schooling and comparative analysis of school-systems;
•equity issues (including gender) in relation to learning outcomes and the evolution of inequalities within education.

The survey allows us to link longitudinal information on household and child characteristics from the household survey with data on the schools attended by the Young Lives children and children's achievements inside and outside the school. It provides policy-relevant information on the relationship between child development (and its determinants) and children's experience of school, including access, quality and progression. This combination of household, child and school-level data over time constitutes the comparative advantage of Young Lives. Findings are all available on our Education theme pages and our publications page. Further information is available from the Young Lives School Survey webpages (http://www.younglives.org.uk/content/school-survey-0).

Geographic coverage

Lao Cai, Hung Yen, Danang, Phu Yen, Ben Tre

Analysis unit

Individuals; institutions/organisations

Kind of data

Sample survey data [ssd]

Sampling procedure

Multi-stage stratified random sample. The final sample comprises 3,284 Grade 5 pupils in 176 classes at 92 school sites (both main and satellite sites); 1,138 of these pupils are Young Lives index children.

Mode of data collection

Face-to-face interview; Self-completion; Educational measurements; Observation

Research instrument

The instruments included in the survey are:

Questionnaires - Wave 1

School roster

Class and teacher roster

Child questionnaire (background information)

Child Maths test

Child language test (Vietnamese)

Teacher questionnaire

Teacher content knowledge test (Maths)

Teacher content knowledge test (Vietnamese)

Head teacher questionnaire

Questionnaires - Wave 2

Child class and peers questionnaire

Child Maths test

Child language test (Vietnamese)

Survey documentation and questionnaires will be provided shortly at http://www.younglives.org.uk/content/vietnam-school-survey
India: Flood Damage (2016-18)
hub.arcgis.com
up-state-observatory-esriindia1.hub.arcgis.com
Updated Sep 13, 2021
GIS Online (2021). India: Flood Damage (2016-18) [Dataset]. https://hub.arcgis.com/maps/esriindia1::india-flood-damage-2016-18
Dataset updated
Sep 13, 2021
Dataset authored and provided by
GIS Online
Description
This web layer contains state-level flood damage data for India (2016-2018), including area affected (Mha), population affected (million), crop damage by area (Mha) and by value (Rs. crore), and further damage figures for each year.

Floods in India

Floods are recurrent phenomena in India. Because climatic and rainfall patterns differ from region to region, some parts of the country often suffer devastating floods while others suffer drought at the same time. With the growth in population and development activity, there has been a tendency to occupy floodplains, which has made flood damage more serious over the years. Because rainfall distribution varies, areas that are not traditionally prone to floods also sometimes experience severe inundation. As a result, floods are the single most frequent disaster faced by the country.

Flooding is caused by the inadequate capacity of river channels to contain the high flows brought down from the upper catchments by heavy rainfall. It is accentuated by erosion and silting of riverbeds, which reduce the carrying capacity of river channels; earthquakes and landslides, which change river courses and obstruct flow; synchronization of floods in the main and tributary rivers; retardation due to tidal effects; encroachment of floodplains; and haphazard, unplanned growth of urban areas. Some parts of the country, mainly the coastal areas of Andhra Pradesh, Orissa, Tamil Nadu and West Bengal, experience cyclones, which are often accompanied by heavy rainfall leading to flooding.

Flood report

2016 Assam floods: Heavy rains in July–August caused floods that affected 1.8 million people and inundated Kaziranga National Park, killing around 200 wild animals.

2017 Gujarat flood: Following heavy rain in July 2017, the Indian state of Gujarat was hit by severe flooding that caused more than 200 deaths.

August 2018 Kerala flood: Following heavy rain in late August 2018, on top of heavy monsoon rainfall from 8 August 2018, severe flooding affected the Indian state of Kerala, resulting in over 445 deaths.

The attributes in this web map are listed below; each is given separately for 2016, 2017 and 2018:
- Area Affected (Mha)
- Population Affected (Million)
- Area Wise (Mha) Damages to Crops
- Value Wise (Rs. Crore) Damages to Crops
- No. of Houses Damaged
- Value (Rs. Crore) of Houses Damaged
- No. of Cattle Lost
- No. of Human Lives Lost
- Damage to Public Utilities (Rs. Crore)
- Total Damages - Crops, Houses & Public Utilities (Rs. Crore)

This web layer is offered by Esri India for ArcGIS Online subscribers. If you have any questions or comments, please contact content@esri.in.

NTR Vaidya Seva 2017

About

Acknowledgements

Inspiration

Ownership

Year-wise Population Estimates of Tigers

Year and State wise Per Capita Availability of Power

Andhra Pradesh, India: Village Points with Socio-Demographic and Economic...

COVID-19-Related Shocks in Rural India 2020, Rounds 1-3 - India

Abstract

Geographic coverage

Analysis unit

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Response rate

COVID-19 Related Shocks Survey (CRSS) in Rural India 2020 - India

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Sampling deviation

Mode of data collection

Research instrument

Cleaning operations

Response rate

ISL-CSLTR: Indian Sign Language Dataset for Continuous Sign Language...

India - National Family Health Survey 1998-1999 - Dataset - waterdata

Telugu Call Center Data for Healthcare AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

Telugu Scripted Monologue Speech Data for Healthcare

Introduction

Speech Data

Topic Coverage

Contextual Depth

Transcription

Implications of Cardiovascular Disease Risk Assessment Using the WHO/ISH...

World - Young Lives: An International Study of Childhood Poverty 2013-2014 -...

Telugu Call Center Data for Telecom AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

India - National Family Health Survey 2005-2006 - Dataset - waterdata

Ethiopia - Young Lives: School Survey 2012-2013 - Dataset - waterdata

Experiential Learning for Groundwater Governance in India: Groundwater Game...

"Road Safety & Traffic Rules Awareness Survey"

Telugu Scripted Monologue Speech Dataset for BFSI

Introduction

Speech Data

Topic & Context Diversity

Contextual Elements

Transcription

Metadata

Applications and Use Cases

Young Lives: School Survey 2011-2012 - Viet Nam

Abstract

Geographic coverage

Analysis unit

Kind of data

Sampling procedure

Mode of data collection

Research instrument

India: Flood Damage (2016-18)

NTR Vaidya Seva 2017

Healthcare data from the Indian state of Andhra Pradesh (anonymized)

About

Acknowledgements

Inspiration

Ownership