33 datasets found
  1. NTR Vaidya Seva 2017

    • kaggle.com
    Updated Oct 7, 2018
    Cite
    Srikar (2018). NTR Vaidya Seva 2017 [Dataset]. https://www.kaggle.com/srikarkashyap/ntr-arogya-seva-2017/code
    Dataset updated
    Oct 7, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Srikar
    Description

    About

    This dataset contains around 480,000 patient records from the NTR Vaidya Seva scheme of the Government of Andhra Pradesh, India. NTR Vaidya Seva is the government's flagship healthcare scheme, under which lower-middle-class and low-income citizens of Andhra Pradesh can obtain free healthcare for many major diseases and ailments. A similar program exists in the neighboring state of Telangana.

    Acknowledgements

    Original dataset can be found on the NTR Vaidya Seva's official website. The dataset has been partially anonymized on the official website. I've further anonymized it.

    Also thanks to Unsplash for the cover pic!

    Inspiration

    A useful beginner level real world dataset. I'm tired of seeing the IRIS and Titanic Datasets for exploratory data analysis!

    Ownership

    The dataset is owned by the Government of Andhra Pradesh but released freely on the official website.

  2. Year-wise Population Estimates of Tigers

    • dataful.in
    Updated Jul 25, 2025
    Cite
    Dataful (Factly) (2025). Year-wise Population Estimates of Tigers [Dataset]. https://dataful.in/datasets/584
    Available download formats
    xlsx, application/x-parquet, csv
    Dataset updated
    Jul 25, 2025
    Dataset authored and provided by
    Dataful (Factly)
    License

    https://dataful.in/terms-and-conditions

    Area covered
    India
    Variables measured
    Number of tigers
    Description

    The dataset gives year-wise population estimates of tigers. States are grouped by landscape. Shivalik-Gangetic Plain Landscape Complex: Uttarakhand, Uttar Pradesh, Bihar. Central India Landscape Complex: Andhra Pradesh (including Telangana), Chhattisgarh, Madhya Pradesh, Maharashtra, Odisha, Rajasthan, Jharkhand. Western Ghats Landscape Complex: Karnataka, Kerala, Tamil Nadu, Goa. North East Hills and Brahmaputra Flood Plains: Assam, Arunachal Pradesh, Mizoram, Northern West Bengal. The dataset also lists the Sundarbans. NB: Ranipur (Uttar Pradesh) is added to the Shivalik landscape for convenience. State population estimates do not add up to the landscape estimate because of common tigers, tigers outside protected areas, and model range limits.

  3. Year and State wise Per Capita Availability of Power

    • dataful.in
    Updated Jul 29, 2025
    Cite
    Dataful (Factly) (2025). Year and State wise Per Capita Availability of Power [Dataset]. https://dataful.in/datasets/21005
    Available download formats
    xlsx, application/x-parquet, csv
    Dataset updated
    Jul 29, 2025
    Dataset authored and provided by
    Dataful (Factly)
    License

    https://dataful.in/terms-and-conditions

    Area covered
    States of India
    Variables measured
    Per Capita Availability of Power
    Description

    The dataset contains the state-wise per capita availability of power, taken from the Handbook of Statistics on Indian States.

    Note: 1. Per Capita Availability of Power is worked out based on Census Population and the population for Andhra Pradesh and Telangana drawn from both Governments’ portals for the years 2014-15 and 2018-19, respectively. 2. Combined figures for Dadra and Nagar Haveli and Daman and Diu are available from 2022-23 onwards.

  4. Andhra Pradesh, India: Village Points with Socio-Demographic and Economic...

    • searchworks.stanford.edu
    Updated Jan 22, 2021
    Cite
    (2021). Andhra Pradesh, India: Village Points with Socio-Demographic and Economic Census Data, 1991 [Dataset]. https://searchworks.stanford.edu/view/pr764fd8168
    Available download formats
    zip
    Dataset updated
    Jan 22, 2021
    Area covered
    Andhra Pradesh, India
    Description

    This dataset is intended for researchers, students, and policy makers for reference and mapping purposes, and may be used for village level demographic analysis within basic applications to support graphical overlays and analysis with other spatial data.

  5. COVID-19-Related Shocks in Rural India 2020, Rounds 1-3 - India

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    • +1 more
    Updated Mar 22, 2021
    + more versions
    Cite
    World Bank (2021). COVID-19-Related Shocks in Rural India 2020, Rounds 1-3 - India [Dataset]. https://datacatalog.ihsn.org/catalog/9553
    Dataset updated
    Mar 22, 2021
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    Time period covered
    2020
    Area covered
    India
    Description

    Abstract

    An effective policy response to the economic impacts of the COVID-19 pandemic requires an enormous range of data to inform the design and response of programs. Public health measures require data on the spread of the disease, beliefs in the population, and capacity of the health system. Relief efforts depend on an understanding of hardships being faced by various segments of the population. Food policy requires measurement of agricultural production and hunger. In such a rapidly evolving pandemic, these data must be collected at a high frequency. Given the unexpected nature of the shock and urgency with which a response was required, Indian policymakers needed to formulate policies affecting India’s 1.4 billion people, without the detailed evidence required to construct effective programs. To help overcome this evidence gap, the World Bank, IDinsight, and the Development Data Lab sought to produce rigorous and responsive data for policymakers across six states in India: Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh.

    Geographic coverage

    Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, and Uttar Pradesh

    Analysis unit

    Household

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    This dataset includes observations covering six states (Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh) and three survey rounds. The survey did not have a single, unified frame from which to sample phone numbers. The final sample was assembled from several different sample frames, and the choice of sample frames varied across states and survey rounds.

    These frames come from four prior IDinsight projects and from an impact evaluation of the National Rural Livelihoods project conducted by the Ministry of Rural Development. Each of these surveys sought to represent distinct populations, and employed idiosyncratic sample designs and weighting schemes.

    A detailed note covering key features of each sample frame is available for download.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The survey questionnaires covered the following subjects:

    1. Agriculture: COVID-19-related changes in price realisation, acreage decisions, input expenditure, access to credit, access to fertilisers, etc.

    2. Income and consumption: Changes in wage rates, employment duration, consumption expenditure, prices of essential commodities, status of food security etc.

    3. Migration: Rates of in-migration, migrant income and employment status, return migration plans etc.

    4. Access to relief: Access to in-kind, cash and workfare relief, quantities of relief received, and constraints on the access to relief.

    5. Health: Access to health facilities and rates of foregone healthcare, knowledge of COVID-19 related symptoms and protective behaviours.

    While a number of indicators were consistent across all three rounds, questions were added and removed as necessary to account for seasonal changes (i.e., in the agricultural cycle).

    Response rate

    Round 1: ~55%; Round 2: ~46%; Round 3: ~55%

  6. COVID-19 Related Shocks Survey (CRSS) in Rural India 2020 - India

    • microdata.fao.org
    Updated Nov 8, 2022
    + more versions
    Cite
    The World Bank (2022). COVID-19 Related Shocks Survey (CRSS) in Rural India 2020 - India [Dataset]. https://microdata.fao.org/index.php/catalog/1768
    Dataset updated
    Nov 8, 2022
    Dataset authored and provided by
    The World Bank
    Time period covered
    2020
    Area covered
    India
    Description

    Abstract

    An effective policy response to the economic impacts of the COVID-19 pandemic requires an enormous range of data to inform the design and response of programs. Public health measures require data on the spread of the disease, beliefs in the population, and capacity of the health system. Relief efforts depend on an understanding of hardships being faced by various segments of the population. Food policy requires measurement of agricultural production and hunger. In such a rapidly evolving pandemic, these data must be collected at a high frequency. Given the unexpected nature of the shock and urgency with which a response was required, Indian policymakers needed to formulate policies affecting India's 1.4 billion people, without the detailed evidence required to construct effective programs. To help overcome this evidence gap, researchers from the World Bank, in collaboration with IDinsight, the Development Data Lab, and Johns Hopkins University, sought to produce rigorous and responsive data for policymakers across six states in India: Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh.

    Geographic coverage

    Regional coverage

    Analysis unit

    Households

    Universe

    Households located in Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    This dataset includes observations covering six states (Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh) and three survey rounds. The survey did not have a single, unified frame from which to sample phone numbers. The final sample was assembled from several different sample frames, and the choice of sample frames varied across states and survey rounds.

    These frames come from four prior IDinsight projects and from an impact evaluation of the National Rural Livelihoods project conducted by the Ministry of Rural Development. Each of these surveys sought to represent distinct populations, and employed idiosyncratic sample designs and weighting schemes.

    A detailed note covering key features of each sample frame is available for download.

    Sampling deviation

    Details will be made available after all rounds of data collection and analysis are complete.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The survey questionnaires covered the following subjects:

    1. Agriculture: COVID-19-related changes in price realisation, acreage decisions, input expenditure, access to credit, access to fertilisers, etc.

    2. Income and consumption: Changes in wage rates, employment duration, consumption expenditure, prices of essential commodities, status of food security etc.

    3. Migration: Rates of in-migration, migrant income and employment status, return migration plans etc.

    4. Access to relief: Access to in-kind, cash and workfare relief, quantities of relief received, and constraints on the access to relief.

    5. Health: Access to health facilities and rates of foregone healthcare, knowledge of COVID-19 related symptoms and protective behaviours.

    While a number of indicators were consistent across all three rounds, questions were added and removed as necessary to account for seasonal changes (i.e., in the agricultural cycle).

    Cleaning operations

    The India COVID-19 surveys were conducted using Computer Assisted Telephone Interview (CATI) techniques. The household questionnaire was implemented using the CATI software SurveyCTO. The software ran on the smartphones of surveyors, who called respondents on their mobile phones and recorded their responses. If a respondent could not be reached, surveyors attempted to call back up to 7 times, often seeking explicit appointments for suitable times to avoid non-response.

    Validation and consistency checks were incorporated into the SurveyCTO software to avoid human error. Extreme values and outliers were scrutinised through a real time dashboard set up by IDinsight. Surveys were also audio audited by monitors to check for consistency and accuracy of question phrasing and answer recording. Finally, supervisors also randomly back-checked a subset of interviews to further ensure data accuracy.

    IDinsight cleaned and labelled the data for further processing and analysis. The Development Data Lab examined the data for discrepancies and errors and merged the dataset with their proprietary spatial data.

    All personally identifiable information has been removed from the datasets.

    Response rate

    Round 1: ~55%; Round 2: ~46%; Round 3: ~55%

  7. ISL-CSLTR: Indian Sign Language Dataset for Continuous Sign Language...

    • data.mendeley.com
    Updated Jan 22, 2021
    Cite
    Elakkiya R (2021). ISL-CSLTR: Indian Sign Language Dataset for Continuous Sign Language Translation and Recognition [Dataset]. http://doi.org/10.17632/kcmpdxky7p.1
    Dataset updated
    Jan 22, 2021
    Authors
    Elakkiya R
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Sign language is a cardinal element of communication within the deaf and dumb community. It has its own grammatical structure and gesticulation. Research on Sign Language Translation and Recognition (SLTR) devotes much of its attention to gesture identification. Sign language comprises manual gestures performed with hand poses and non-manual features expressed through eye, mouth, and gaze movements. This is a sentence-level, completely labelled Indian Sign Language dataset developed for SLTR research. The ISL-CSLTR dataset helps the research community explore intuitive insights and build SLTR frameworks for establishing communication with the deaf and dumb community using advanced deep learning and computer vision methods.

    The sentence-level dataset was created with two native signers from Navajeevan, Residential School for the Deaf, College of Spl. D.Ed & B.Ed, Vocational Centre, and Child Care & Learning Centre, Ayyalurimetta, Andhra Pradesh, India and four student volunteers from SASTRA Deemed University, Thanjavur, Tamilnadu. The ISL-CSLTR corpus consists of a large vocabulary of 700 fully annotated videos, 18,863 sentence-level frames, and 1,036 word-level images for 100 spoken language sentences performed by 7 different signers. The corpus is arranged by signer variants and time boundaries, is fully annotated, and is publicly available.

    The main objective of creating this sentence-level ISL-CSLTR corpus is to enable more research outcomes in the area of SLTR. This completely labelled video corpus helps researchers build frameworks for converting spoken language sentences into sign language and vice versa. The corpus was created to address the various challenges researchers face in SLTR and to improve translation and recognition performance. The videos are annotated with the relevant spoken language sentences to provide a clear and easy understanding of the corpus data.

    Acknowledgements: The research was funded by the Science and Engineering Research Board (SERB), India under Start-up Research Grant (SRG)/2019–2021 (Grant no. SRG/2019/001338). We also thank all the signers for their contribution in collecting the sign videos and the successful completion of the ISL-CSLTR corpus. We would like to thank Navajeevan, Residential School for the Deaf, College of Spl. D.Ed & B.Ed, Vocational Centre, and Child Care & Learning Centre, Ayyalurimetta, Andhra Pradesh, India for their support and contribution.

  8. India - National Family Health Survey 1998-1999 - Dataset - waterdata

    • wbwaterdata.org
    Updated Mar 16, 2020
    + more versions
    Cite
    (2020). India - National Family Health Survey 1998-1999 - Dataset - waterdata [Dataset]. https://wbwaterdata.org/dataset/india-national-family-health-survey-1998-1999
    Dataset updated
    Mar 16, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The second National Family Health Survey (NFHS-2), conducted in 1998-99, provides information on fertility, mortality, family planning, and important aspects of nutrition, health, and health care. The International Institute for Population Sciences (IIPS) coordinated the survey, which collected information from a nationally representative sample of more than 90,000 ever-married women age 15-49. The NFHS-2 sample covers 99 percent of India's population living in all 26 states. This report is based on the survey data for 25 of the 26 states, however, since data collection in Tripura was delayed due to local problems in the state. IIPS also coordinated the first National Family Health Survey (NFHS-1) in 1992-93. Most of the types of information collected in NFHS-2 were also collected in the earlier survey, making it possible to identify trends over the intervening period of six and one-half years. In addition, the NFHS-2 questionnaire covered a number of new or expanded topics with important policy implications, such as reproductive health, women's autonomy, domestic violence, women's nutrition, anaemia, and salt iodization. The NFHS-2 survey was carried out in two phases. Ten states were surveyed in the first phase which began in November 1998 and the remaining states (except Tripura) were surveyed in the second phase which began in March 1999. The field staff collected information from 91,196 households in these 25 states and interviewed 89,199 eligible women in these households. In addition, the survey collected information on 32,393 children born in the three years preceding the survey. One health investigator on each survey team measured the height and weight of eligible women and children and took blood samples to assess the prevalence of anaemia.

    SUMMARY OF FINDINGS

    POPULATION CHARACTERISTICS

    Three-quarters (73 percent) of the population lives in rural areas.
The age distribution is typical of populations that have recently experienced a fertility decline, with relatively low proportions in the younger and older age groups. Thirty-six percent of the population is below age 15, and 5 percent is age 65 and above. The sex ratio is 957 females for every 1,000 males in rural areas but only 928 females for every 1,000 males in urban areas, suggesting that more men than women have migrated to urban areas. The survey provides a variety of demographic and socioeconomic background information. In the country as a whole, 82 percent of household heads are Hindu, 12 percent are Muslim, 3 percent are Christian, and 2 percent are Sikh. Muslims live disproportionately in urban areas, where they comprise 15 percent of household heads. Nineteen percent of household heads belong to scheduled castes, 9 percent belong to scheduled tribes, and 32 percent belong to other backward classes (OBCs). Two-fifths of household heads do not belong to any of these groups. Questions about housing conditions and the standard of living of households indicate some improvements since the time of NFHS-1. Sixty percent of households in India now have electricity and 39 percent have piped drinking water compared with 51 percent and 33 percent, respectively, at the time of NFHS-1. Sixty-four percent of households have no toilet facility compared with 70 percent at the time of NFHS-1. About three-fourths (75 percent) of males and half (51 percent) of females age six and above are literate, an increase of 6-8 percentage points from literacy rates at the time of NFHS-1. The percentage of illiterate males varies from 6-7 percent in Mizoram and Kerala to 37 percent in Bihar and the percentage of illiterate females varies from 11 percent in Mizoram and 15 percent in Kerala to 65 percent in Bihar. Seventy-nine percent of children age 6-14 are attending school, up from 68 percent in NFHS-1. 
The proportion of children attending school has increased for all ages, particularly for girls, but girls continue to lag behind boys in school attendance. Moreover, the disparity in school attendance by sex grows with increasing age of children. At age 6-10, 85 percent of boys attend school compared with 78 percent of girls. By age 15-17, 58 percent of boys attend school compared with 40 percent of girls. The percentage of girls 6-17 attending school varies from 51 percent in Bihar and 56 percent in Rajasthan to over 90 percent in Himachal Pradesh and Kerala. Women in India tend to marry at an early age. Thirty-four percent of women age 15-19 are already married including 4 percent who are married but gauna has yet to be performed. These proportions are even higher in the rural areas. Older women are more likely than younger women to have married at an early age: 39 percent of women currently age 45-49 married before age 15 compared with 14 percent of women currently age 15-19. Although this indicates that the proportion of women who marry young is declining rapidly, half the women even in the age group 20-24 have married before reaching the legal minimum age of 18 years. On average, women are five years younger than the men they marry. The median age at marriage varies from about 15 years in Madhya Pradesh, Bihar, Uttar Pradesh, Rajasthan, and Andhra Pradesh to 23 years in Goa. As part of an increasing emphasis on gender issues, NFHS-2 asked women about their participation in household decisionmaking. In India, 91 percent of women are involved in decision-making on at least one of four selected topics. A much lower proportion (52 percent), however, are involved in making decisions about their own health care. There are large variations among states in India with regard to women's involvement in household decisionmaking. 
More than three out of four women are involved in decisions about their own health care in Himachal Pradesh, Meghalaya, and Punjab compared with about two out of five or less in Madhya Pradesh, Orissa, and Rajasthan. Thirty-nine percent of women do work other than housework, and more than two-thirds of these women work for cash. Only 41 percent of women who earn cash can decide independently how to spend the money that they earn. Forty-three percent of working women report that their earnings constitute at least half of total family earnings, including 18 percent who report that the family is entirely dependent on their earnings. Women's work-participation rates vary from 9 percent in Punjab and 13 percent in Haryana to 60-70 percent in Manipur, Nagaland, and Arunachal Pradesh.

    FERTILITY AND FAMILY PLANNING

    Fertility continues to decline in India. At current fertility levels, women will have an average of 2.9 children each throughout their childbearing years. The total fertility rate (TFR) is down from 3.4 children per woman at the time of NFHS-1, but is still well above the replacement level of just over two children per woman. There are large variations in fertility among the states in India. Goa and Kerala have attained below replacement level fertility and Karnataka, Himachal Pradesh, Tamil Nadu, and Punjab are at or close to replacement level fertility. By contrast, fertility is 3.3 or more children per woman in Meghalaya, Uttar Pradesh, Rajasthan, Nagaland, Bihar, and Madhya Pradesh. More than one-third to less than half of all births in these latter states are fourth or higher-order births compared with 7-9 percent of births in Kerala, Goa, and Tamil Nadu. Efforts to encourage the trend towards lower fertility might usefully focus on groups within the population that have higher fertility than average.
In India, rural women and women from scheduled tribes and scheduled castes have somewhat higher fertility than other women, but fertility is particularly high for illiterate women, poor women, and Muslim women. Another striking feature is the high level of childbearing among young women. More than half of women age 20-49 had their first birth before reaching age 20, and women age 15-19 account for almost one-fifth of total fertility. Studies in India and elsewhere have shown that health and mortality risks increase when women give birth at such young ages, both for the women themselves and for their children. Family planning programmes focusing on women in this age group could make a significant impact on maternal and child health and help to reduce fertility.

    INFANT AND CHILD MORTALITY

    NFHS-2 provides estimates of infant and child mortality and examines factors associated with the survival of young children. During the five years preceding the survey, the infant mortality rate was 68 deaths at age 0-11 months per 1,000 live births, substantially lower than 79 per 1,000 in the five years preceding the NFHS-1 survey. The child mortality rate, 29 deaths at age 1-4 years per 1,000 children reaching age one, also declined from the corresponding rate of 33 per 1,000 in NFHS-1. Ninety-five children out of 1,000 born do not live to age five years. Expressed differently, 1 in 15 children die in the first year of life, and 1 in 11 die before reaching age five. Child-survival programmes might usefully focus on specific groups of children with particularly high infant and child mortality rates, such as children who live in rural areas, children whose mothers are illiterate, children belonging to scheduled castes or scheduled tribes, and children from poor households.
Infant mortality rates are more than two and one-half times as high for women who did not receive any of the recommended types of maternity related medical care than for mothers who did receive all recommended types of care.

    HEALTH, HEALTH CARE, AND NUTRITION

    Promotion of maternal and child health has been one of the most important components of the Family Welfare Programme of the Government of India. One goal is for each pregnant woman to receive at least three antenatal check-ups plus two tetanus toxoid injections and a full course of iron and folic acid supplementation. In India, mothers of 65 percent of the children born in the three years preceding NFHS-2 received at least one antenatal

  9. Telugu Call Center Data for Healthcare AI

    • futurebeeai.com
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Telugu Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-telugu-india
    Available download formats
    wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Telugu Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Telugu speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.

    Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.

    Speech Data

    The dataset features 30 hours of dual-channel call center conversations between native Telugu speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.

    Participant Diversity:
    Speakers: 60 verified native Telugu speakers from our contributor community.
    Regions: Diverse regions across Andhra Pradesh and Telangana to ensure broad dialectal representation.
    Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted conversations.
    Call Duration: Each session ranges from 5 to 15 minutes.
    Audio Format: WAV format, stereo, 16-bit depth at 8 kHz and 16 kHz sample rates.
    Recording Environment: Captured in clear conditions without background noise or echo.
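
As an illustration of the audio specification above, the following sketch uses Python's standard `wave` module to read a file's channel count, bit depth, and sample rate, and to split a stereo recording into its two channels. The synthetic in-memory file and the function names are assumptions made for this example; the dataset's actual file layout, and which speaker sits on which channel, are not described here.

```python
import io
import struct
import wave


def make_stereo_wav(sample_rate=16000, n_frames=160):
    """Build a tiny synthetic 16-bit stereo WAV in memory (a stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)          # stereo
        w.setsampwidth(2)          # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        # Interleaved little-endian samples: left = i, right = -i
        w.writeframes(b"".join(struct.pack("<hh", i, -i) for i in range(n_frames)))
    return buf.getvalue()


def describe_wav(data):
    """Return the basic format parameters of a WAV file given as bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bit_depth": w.getsampwidth() * 8,
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }


def split_channels(data):
    """Split a 16-bit stereo WAV into two lists of samples (left, right)."""
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return list(samples[0::2]), list(samples[1::2])


if __name__ == "__main__":
    wav_bytes = make_stereo_wav()
    print(describe_wav(wav_bytes))
    left, right = split_channels(wav_bytes)
    print(left[:3], right[:3])
```

For a real file, `wave.open` also accepts a path, so the same checks can be run on a downloaded recording before it enters a training pipeline.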

    Topic Diversity

    The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgical Consultation
    Dietary Advice and Consultations
    Insurance Coverage Inquiries
    Follow-up Treatment Requests, and more
    Outbound Calls:
    Appointment Reminders
    Preventive Care Campaigns
    Test Results & Lab Reports
    Health Risk Assessment Calls
    Vaccination Updates
    Wellness Subscription Outreach, and more

    These real-world interactions help build speech models that understand healthcare domain nuances and user intent.

    Transcription

    Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.

    Transcription Includes:
    Speaker-identified Dialogues
    Time-coded Segments
    Non-speech Annotations (e.g., silence, cough)
    High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.
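
Word error rate is conventionally computed as the number of word-level substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. A minimal sketch of that standard definition follows; how the provider measured the figure quoted above is not stated, and the function name and sample sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the patient needs an appointment"
    hyp = "the patient need appointment"
    # One substitution (needs -> need) and one deletion (an) over 5 reference words.
    print(word_error_rate(ref, hyp))
```

Whitespace tokenization is a simplification; real evaluations usually normalize punctuation and casing first, and Telugu text may need script-aware tokenization.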

    Metadata

    Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.

    Participant Metadata: ID, gender, age, region, accent, and dialect.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.
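
To show how speaker-identified, time-coded segments might be consumed, here is a small sketch that totals speaking time per speaker from a JSON transcript. The schema (`segments`, `speaker`, `start`, `end`, `kind`, `text`) is entirely hypothetical and chosen for illustration; the dataset's real JSON structure is not given in this listing and should be checked against the provider's documentation.

```python
import json
from collections import defaultdict

# Hypothetical transcript layout, invented for this example.
SAMPLE_TRANSCRIPT = """
{
  "segments": [
    {"speaker": "speaker_1", "start": 0.0, "end": 2.5, "kind": "speech", "text": "..."},
    {"speaker": "speaker_2", "start": 2.5, "end": 6.0, "kind": "speech", "text": "..."},
    {"speaker": "speaker_1", "start": 6.0, "end": 7.0, "kind": "non_speech", "text": "[cough]"}
  ]
}
"""


def talk_time_by_speaker(transcript_json, include_non_speech=False):
    """Sum segment durations (in seconds) per speaker."""
    doc = json.loads(transcript_json)
    totals = defaultdict(float)
    for seg in doc["segments"]:
        if seg["kind"] != "speech" and not include_non_speech:
            continue  # skip annotations such as silence or coughs
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


if __name__ == "__main__":
    print(talk_time_by_speaker(SAMPLE_TRANSCRIPT))
```

The same loop could be extended to filter by conversation metadata such as topic or call type once the actual field names are known.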

    Usage and Applications

    This dataset can be used across a range of healthcare and voice AI use cases:


  10. Telugu Scripted Monologue Speech Data for Healthcare

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Telugu Scripted Monologue Speech Data for Healthcare [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/healthcare-scripted-speech-monologues-telugu-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Introducing the Telugu Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of Telugu language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.

    Speech Data

    This dataset includes over 6,000 high-quality scripted audio prompts recorded in Telugu, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.

    Participant Diversity
    Speakers: 60 native Telugu speakers.
    Regional Balance: Participants are sourced from multiple regions across Andhra Pradesh and Telangana, reflecting diverse dialects and linguistic traits.
    Demographics: Includes a mix of male and female participants (60:40 ratio), aged between 18 and 70 years.
    Recording Specifications
    Nature of Recordings: Scripted monologues based on healthcare-related use cases.
    Duration: Each clip ranges from 5 to 30 seconds, offering short, context-rich speech samples.
    Audio Format: WAV files recorded in mono, with 16-bit depth and sample rates of 8 kHz and 16 kHz.
    Environment: Clean and echo-free spaces ensure clear and noise-free audio capture.

    Topic Coverage

    The prompts span a broad range of healthcare-specific interactions, such as:

    Patient check-in and follow-up communication
    Appointment booking and cancellation dialogues
    Insurance and regulatory support queries
    Medication, test results, and consultation discussions
    General health tips and wellness advice
    Emergency and urgent care communication
    Technical support for patient portals and apps
    Domain-specific scripted statements and FAQs

    Contextual Depth

    To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:

    Names: Gender- and region-appropriate Andhra Pradesh and Telangana names
    Addresses: Varied local address formats spoken naturally
    Dates & Times: References to appointment dates, times, follow-ups, and schedules
    Medical Terminology: Common medical procedures, symptoms, and treatment references
    Numbers & Measurements: Health data like dosages, vitals, and test result values
    Healthcare Institutions: Names of clinics, hospitals, and diagnostic centers

    These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.

    Transcription

    Every audio recording is accompanied by a verbatim, manually verified transcription.

    Content: The transcription mirrors the exact scripted prompt recorded by the speaker.
    Format: Files are delivered in plain text (.TXT) format with consistent naming conventions for seamless integration.
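
    The listing mentions consistent naming conventions but does not spell them out. Assuming, purely for illustration, that each recording and its transcript share a file stem (for example `clip_001.wav` and `clip_001.TXT`) and that transcripts are UTF-8 encoded, the two could be paired as in the sketch below. The function name and both assumptions are hypothetical.

```python
from pathlib import Path


def pair_audio_with_transcripts(root):
    """Map each .wav file under root to the text of a same-stem .txt file.

    Returns (pairs, missing): pairs maps relative audio paths to transcript
    text; missing lists audio files with no matching transcript.
    """
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    # Index transcripts by stem; the extension may be .TXT or .txt.
    transcripts = {p.stem: p for p in files if p.suffix.lower() == ".txt"}
    pairs, missing = {}, []
    for audio in sorted(files):
        if audio.suffix.lower() != ".wav":
            continue
        key = audio.relative_to(root).as_posix()
        transcript = transcripts.get(audio.stem)
        if transcript is None:
            missing.append(key)
        else:
            # UTF-8 is an assumption; Telugu text needs a Unicode encoding.
            pairs[key] = transcript.read_text(encoding="utf-8").strip()
    return pairs, missing
```

    Reporting unmatched audio files separately makes gaps in a delivery visible before training starts.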

  11. Implications of Cardiovascular Disease Risk Assessment Using the WHO/ISH...

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Arvind Raghu; Devarsetty Praveen; David Peiris; Lionel Tarassenko; Gari Clifford (2023). Implications of Cardiovascular Disease Risk Assessment Using the WHO/ISH Risk Prediction Charts in Rural India [Dataset]. http://doi.org/10.1371/journal.pone.0133618
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Arvind Raghu; Devarsetty Praveen; David Peiris; Lionel Tarassenko; Gari Clifford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cardiovascular disease (CVD) risk in India is currently assessed using the World Health Organization/International Society for Hypertension (WHO/ISH) risk prediction charts since no population-specific models exist. The WHO/ISH risk prediction charts have two versions: one with total cholesterol as a predictor (the high information (HI) model) and the other without (the low information (LI) model). However, information on the WHO/ISH risk prediction charts, including guidance on which version to use and when, as well as the relative performance of the LI and HI models, is limited. This article aims, firstly, to quantify the relative performance of the LI and HI WHO/ISH risk prediction charts (for WHO-South East Asian Region D) using data from rural India. Secondly, we propose a pre-screening (simplified) point-of-care (POC) test to identify patients who are likely to benefit from a total cholesterol (TC) test, and subsequently when the LI model is preferable to the HI model. Analysis was performed using cross-sectional data from rural Andhra Pradesh collected in 2005 with recorded blood cholesterol measurements (N = 1066). CVD risk was computed using both LI and HI models, and high-risk individuals who needed treatment (THR) were subsequently identified based on clinical guidelines. Model development for the POC assessment of a TC test was performed through three machine learning techniques: Support Vector Machine (SVM), Regularised Logistic Regression (RLR), and Random Forests (RF), along with a feature selection process. Disagreement in CVD risk predicted by LI and HI WHO/ISH models was 14.5% (n = 155; p

  12. World - Young Lives: An International Study of Childhood Poverty 2013-2014 -...

    • wbwaterdata.org
    Updated Mar 16, 2020
    Cite
    (2020). World - Young Lives: An International Study of Childhood Poverty 2013-2014 - Dataset - waterdata [Dataset]. https://wbwaterdata.org/dataset/world-young-lives-international-study-childhood-poverty-2013-2014
    Explore at:
    Dataset updated
    Mar 16, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world; high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood. The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family. Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. 
It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country. Further information about the survey, including publications, can be downloaded from the Young Lives website.

  13. Telugu Call Center Data for Telecom AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Telugu Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-telugu-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Telugu Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Telugu-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Telugu speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

    Participant Diversity:
    Speakers: 60 native Telugu speakers from our verified contributor pool.
    Regions: Representing multiple regions across Andhra Pradesh and Telangana to ensure coverage of various accents and dialects.
    Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.
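
    Because the calls are described as dual-channel stereo WAV files, each channel can be separated into its own mono track before per-speaker processing. The sketch below does this with the standard library only. The listing does not state which channel carries the agent and which the customer, so the output names are neutral, and the function name is an assumption.

```python
import wave
from array import array


def split_stereo(path, left_path, right_path):
    """Write the two channels of a stereo 16-bit WAV file as separate mono files."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a stereo, 16-bit WAV file")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Stereo frames are interleaved as L, R, L, R, ... in 2-byte samples.
    # Only whole samples are moved, so byte order does not matter here.
    samples = array("h")
    samples.frombytes(frames)

    for out_path, channel in ((left_path, samples[0::2]),
                              (right_path, samples[1::2])):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

    The sample rate is copied from the source file, so the same function handles both the 8 kHz and 16 kHz deliveries.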

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refund Requests and Billing Adjustments
    Emergency Service Access, and others
    Outbound Calls:
    Welcome Calls & Onboarding
    Payment Reminders
    Customer Satisfaction Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Calls, and more

    This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, coughs)
    High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.

    These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.

  14. India - National Family Health Survey 2005-2006 - Dataset - waterdata

    • waterdata3.staging.derilinx.com
    Updated Mar 16, 2020
    Cite
    (2020). India - National Family Health Survey 2005-2006 - Dataset - waterdata [Dataset]. https://waterdata3.staging.derilinx.com/dataset/india-national-family-health-survey-2005-2006
    Explore at:
    Dataset updated
    Mar 16, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The National Family Health Surveys (NFHS) programme, initiated in the early 1990s, has emerged as a nationally important source of data on population, health, and nutrition for India and its states. The 2005-06 National Family Health Survey (NFHS-3), the third in the series of these national surveys, was preceded by NFHS-1 in 1992-93 and NFHS-2 in 1998-99. Like NFHS-1 and NFHS-2, NFHS-3 was designed to provide estimates of important indicators on family welfare, maternal and child health, and nutrition. In addition, NFHS-3 provides information on several new and emerging issues, including family life education, safe injections, perinatal mortality, adolescent reproductive health, high-risk sexual behaviour, tuberculosis, and malaria. Further, unlike the earlier surveys in which only ever-married women age 15-49 were eligible for individual interviews, NFHS-3 interviewed all women age 15-49 and all men age 15-54. Information on nutritional status, including the prevalence of anaemia, is provided in NFHS-3 for women age 15-49, men age 15-54, and young children. A special feature of NFHS-3 is the inclusion of testing of the adult population for HIV. NFHS-3 is the first nationwide community-based survey in India to provide an estimate of HIV prevalence in the general population. Specifically, NFHS-3 provides estimates of HIV prevalence among women age 15-49 and men age 15-54 for all of India, and separately for Uttar Pradesh and for Andhra Pradesh, Karnataka, Maharashtra, Manipur, and Tamil Nadu, five out of the six states classified by the National AIDS Control Organization (NACO) as high HIV prevalence states. No estimate of HIV prevalence is being provided for Nagaland, the sixth high HIV prevalence state, due to strong local opposition to the collection of blood samples. NFHS-3 covered all 29 states in India, which comprise more than 99 percent of India's population.
NFHS-3 is designed to provide estimates of key indicators for India as a whole and, with the exception of HIV prevalence, for all 29 states by urban-rural residence. Additionally, NFHS-3 provides estimates for the slum and non-slum populations of eight cities, namely Chennai, Delhi, Hyderabad, Indore, Kolkata, Meerut, Mumbai, and Nagpur. NFHS-3 was conducted under the stewardship of the Ministry of Health and Family Welfare (MOHFW), Government of India, and is the result of the collaborative efforts of a large number of organizations. The International Institute for Population Sciences (IIPS), Mumbai, was designated by MOHFW as the nodal agency for the project. Funding for NFHS-3 was provided by the United States Agency for International Development (USAID), DFID, the Bill and Melinda Gates Foundation, UNICEF, UNFPA, and MOHFW. Macro International, USA, provided technical assistance at all stages of the NFHS-3 project. NACO and the National AIDS Research Institute (NARI) provided technical assistance for the HIV component of NFHS-3. Eighteen Research Organizations, including six Population Research Centres, shouldered the responsibility of conducting the survey in the different states of India and producing electronic data files. The survey used a uniform sample design, questionnaires (translated into 18 Indian languages), field procedures, and procedures for biomarker measurements throughout the country to facilitate comparability across the states and to ensure the highest possible data quality. The contents of the questionnaires were decided through an extensive collaborative process in early 2005. Based on provisional data, two national-level fact sheets and 29 state fact sheets that provide estimates of more than 50 key indicators of population, health, family welfare, and nutrition have already been released. 
The basic objective of releasing fact sheets within a very short period after the completion of data collection was to provide immediate feedback to planners and programme managers on key process indicators.

  15. Ethiopia - Young Lives: School Survey 2012-2013 - Dataset - waterdata

    • wbwaterdata.org
    Updated Mar 16, 2020
    Cite
    (2020). Ethiopia - Young Lives: School Survey 2012-2013 - Dataset - waterdata [Dataset]. https://wbwaterdata.org/dataset/ethiopia-young-lives-school-survey-2012-2013
    Explore at:
    Dataset updated
    Mar 16, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ethiopia
    Description

    The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world; high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood. The Young Lives study aims to track the lives of 12,000 children over a 15-year period, surveyed once every 3-4 years. Round 1 of Young Lives surveyed two groups of children in each country, at 1 year old and 5 years old. Round 2 returned to the same children who were then aged 5 and 12 years old. Round 3 surveyed the same children again at aged 7-8 years and 14-15 years, and Round 4 surveyed them at 12 and 19 years old. Thus the younger children are being tracked from infancy to their mid-teens and the older children through into adulthood, when some will become parents themselves. The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family. 
Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country. Further information about the survey, including publications, can be downloaded from the School Survey webpages (http://www.younglives.org.uk/content/school-survey-0).

  16. Experiential Learning for Groundwater Governance in India: Groundwater Game...

    • dataverse.harvard.edu
    Updated Jul 16, 2025
    Cite
    Harvard Dataverse (2025). Experiential Learning for Groundwater Governance in India: Groundwater Game Surveys [Dataset]. http://doi.org/10.7910/DVN/8EGMOT
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    Harvard Dataverse
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/8EGMOT

    Time period covered
    2021 - 2023
    Area covered
    India
    Dataset funded by
    https://ror.org/00e0ttj68
    CGIAR Trust Fund
    Description

    The Scaling up experiential learning tools for sustainable water governance project aims to enhance the capacity of Indian communities to sustainably manage water resources. The intervention combined collective action games, participatory planning tools, and community debriefings to promote behavioral shifts toward sustainable groundwater and surface water management. These tools are designed to support informed decision-making, foster collective action, and strengthen governance of water as a common resource. The project included a mixed-methods impact evaluation. The study took place in 4 districts across 3 Indian states: Chittoor and Anantpur (Andhra Pradesh), Bhilwara (Rajasthan), and Chikbalapur (Karnataka). Data collection took place over two rounds. The baseline survey was conducted between October 2021 and May 2022, followed by the intervention. The endline survey was implemented from January to June 2023. The data available here are from both survey rounds, which included individual surveys, focus group discussions (FGDs), and key informant interviews (KIIs) across treatment and control sites, with baseline and endline results included in the same datasets. Within each survey type, multiple datasets are available and are organized according to the structure of the corresponding survey modules. Some datasets are at the individual or household-member level — for example, roster datasets that include information on all household members, not just the primary respondent. Others, such as the crop and water modules, are organized at the level of specific activities or resources, capturing details on each crop grown or water source used within a household. 
All datasets include a variable "unique_ID", which identifies the habitation (a sub-village administrative division) where the observation was collected, and a "TreatmentControl" variable, which denotes whether the observation belongs to the treated group or the control group (note: treatment is assigned at the habitation level). Additionally, the individual surveys include an "individual_id" variable, corresponding to the individual respondent.
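
    Since treatment is assigned at the habitation level, every row sharing a "unique_ID" should carry the same "TreatmentControl" value, which gives a simple consistency check when combining modules. The sketch below uses the three variable names from the description; the sample rows, the ID formats, and the "Treatment"/"Control" codings are made up for illustration and may not match the real files.

```python
import csv
import io
from collections import Counter


def habitation_arms(rows):
    """Map each unique_ID to its arm, failing if a habitation has mixed arms."""
    arms = {}
    for row in rows:
        habitation, arm = row["unique_ID"], row["TreatmentControl"]
        if arms.setdefault(habitation, arm) != arm:
            raise ValueError(f"habitation {habitation} appears in more than one arm")
    return arms


# Made-up rows for illustration only; real values and codings may differ.
SAMPLE = """unique_ID,TreatmentControl,individual_id
H001,Treatment,P01
H001,Treatment,P02
H002,Control,P03
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
arms = habitation_arms(rows)
# Number of habitations per arm, counted once per habitation rather than per row.
print(Counter(arms.values()))
```

    Counting by habitation rather than by row reflects the level at which treatment was assigned.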

  17. "Road Safety & Traffic Rules Awareness Survey"

    • kaggle.com
    Updated Apr 7, 2025
    Cite
    Narayana Sai Chinmai (2025). "Road Safety & Traffic Rules Awareness Survey" [Dataset]. https://www.kaggle.com/datasets/narayanasaichinmai/road-safety-and-traffic-rules-awareness-survey/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Narayana Sai Chinmai
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description of the Survey Data

    The dataset contains responses to a "Road Safety & Traffic Rules Awareness Survey", capturing people's opinions on traffic rules, driving habits, and road safety concerns.

    1. Demographic Information:
    • Age group (10-20 years, 30-40 years, >40 years)
    • Gender (Male/Female)
    • Location (Andhra Pradesh, Kerala, Tamil Nadu, etc.)

    2. Awareness & Behavior Towards Traffic Rules:
    • Importance of traffic rules (Essential, Somewhat important, Not important)
    • Seatbelt/helmet usage (Always, Sometimes, Rarely, Never)
    • Leading causes of road accidents (Over-speeding, Drunk Driving, Poor Road Conditions, etc.)
    • Observing rule violations (Rarely, Daily, Often, Always)
    • Experience with accidents (Witnessed/Involved/Never)

    3. Driving Habits & Safety Measures:
    • Following speed limits and lane discipline
    • Measures for pedestrian safety (Awareness programs, More crossings, Stricter enforcement)
    • Primary mode of transportation (Public transport, Walking/Cycling, etc.)

    This survey data can help in analyzing people's awareness of road safety and designing better traffic policies.

  18. Telugu Scripted Monologue Speech Dataset for BFSI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Telugu Scripted Monologue Speech Dataset for BFSI [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/bfsi-scripted-speech-monologues-telugu-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Telugu Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Telugu speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.

    Speech Data

    This dataset includes over 6,000 scripted prompt recordings in Telugu, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.

    Participant Diversity
    Speakers: 60 native Telugu speakers.
    Regions: Diverse representation from various regions of Andhra Pradesh and Telangana to ensure dialect and accent coverage.
    Demographics: Age range of 18–70, with a male-to-female ratio of 60:40.
    Recording Details
    Nature: Scripted monologues and domain-specific prompt recordings.
    Duration:
    Audio Format: WAV, mono channel, 16-bit depth, recorded at 8 kHz and 16 kHz sample rates.
    Environment: Clean, echo-free, and noise-free environments.

    Topic & Context Diversity

    This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:

    Customer service interactions
    Financial transactions & balance inquiries
    Banking and insurance product queries
    Loan & credit support
    Regulatory and compliance questions
    Technical help and password resets
    Promotional campaigns and service updates

    Contextual Elements

    To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:

    Names: Region-specific names in multiple formats
    Addresses: Local address structures and pronunciations
    Dates & Times: Typical time expressions used in banking
    Organization Names: Names of banks, financial firms, and institutions
    Currencies & Amounts: Spoken currency formats, prices, and numeric data
    IDs & Transaction Numbers: For authentic service simulation

    Transcription

    Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.

    Content: Exact match of each prompt
    Format: Clean .TXT files, mapped to audio file names
    Accuracy: Reviewed and validated by native Telugu linguists

    Metadata

    Each data point is enriched with detailed metadata for advanced training and analysis:

    Participant Metadata: Unique ID, age, gender, state, country, dialect
    Recording Metadata: Transcript, recording setup, sample rate, bit depth, device, file format

    Applications and Use Cases

    This BFSI-focused dataset is ideal

  19. Young Lives: School Survey 2011-2012 - Viet Nam

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1more
    Updated Oct 26, 2023
    Cite
    Boyden, J. (2023). Young Lives: School Survey 2011-2012 - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/2606
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    Boyden, J.
    Time period covered
    2011 - 2012
    Area covered
    Vietnam
    Description

    Abstract

    The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world; high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood.

    The Young Lives study aims to track the lives of 12,000 children over a 15-year period, surveyed once every 3-4 years. Round 1 of Young Lives surveyed two groups of children in each country, at 1 year old and 5 years old. Round 2 returned to the same children who were then aged 5 and 12 years old. Round 3 surveyed the same children again at aged 7-8 years and 14-15 years, and Round 4 surveyed them at 12 and 19 years old. Thus the younger children are being tracked from infancy to their mid-teens and the older children through into adulthood, when some will become parents themselves.

    The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family. Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country.

    Further information about the survey, including publications, can be downloaded from the Young Lives website.

    School surveys were introduced into Young Lives in 2010 in order to capture detailed information about children's experiences of schooling, and to improve our understanding of:

    • the relationships between learning outcomes and children's home backgrounds, gender, work, schools, teachers, and class and school peer groups
    • school effectiveness, by analysing factors explaining the development of cognitive and non-cognitive skills in school, including value-added analysis of schooling and comparative analysis of school systems
    • equity issues (including gender) in relation to learning outcomes and the evolution of inequalities within education

    The survey allows us to link longitudinal information on household and child characteristics from the household survey with data on the schools attended by the Young Lives children and children's achievements inside and outside the school. It provides policy-relevant information on the relationship between child development (and its determinants) and children's experience of school, including access, quality and progression. This combination of household, child and school-level data over time constitutes the comparative advantage of Young Lives. Findings are all available on our Education theme pages and our publications page. Further information is available from the Young Lives School Survey webpages (http://www.younglives.org.uk/content/school-survey-0).

    Geographic coverage

    Lao Cai, Hung Yen, Danang, Phu Yen, Ben Tre

    Analysis unit

    Individuals; Institutions/organisations

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Multi-stage stratified random sample. The final sample is formed of 3,284 Grade 5 pupils in 176 classes in 92 school sites (both main and satellite sites); 1,138 of these pupils are Young Lives index children.

    Mode of data collection

    Face-to-face interview; Self-completion; Educational measurements; Observation

    Research instrument

    The instruments included in the survey are:

    Questionnaires - Wave 1

    • School roster
    • Class and teacher roster
    • Child questionnaire (background information)
    • Child Maths test
    • Child language test (Vietnamese)
    • Teacher questionnaire
    • Teacher content knowledge test (Maths)
    • Teacher content knowledge test (Vietnamese)
    • Head teacher questionnaire

    Questionnaires - Wave 2

    • Child class and peers questionnaire
    • Child Maths test
    • Child language test (Vietnamese)

    Survey documentation and questionnaires will be provided shortly at http://www.younglives.org.uk/content/vietnam-school-survey

  20. a

    India: Flood Damage (2016-18)

    • hub.arcgis.com
    • up-state-observatory-esriindia1.hub.arcgis.com
    Updated Sep 13, 2021
    Cite
    GIS Online (2021). India: Flood Damage (2016-18) [Dataset]. https://hub.arcgis.com/maps/esriindia1::india-flood-damage-2016-18
    Dataset updated
    Sep 13, 2021
    Dataset authored and provided by
    GIS Online
    Description

    This web layer contains state-level flood damage data for India (2016 - 2018), with information such as area affected (Mha) in 2016, population affected (Million) in 2016, area-wise (Mha) damages to crops in 2016 and value-wise (Rs. Crore) damages to crops in 2016.

    Floods in India

    Floods are recurrent phenomena in India. Due to different climatic and rainfall patterns in different regions, it has been the experience that, while some parts are suffering devastating floods, another part is suffering drought at the same time. With the increase in population and development activity, there has been a tendency to occupy the floodplains, which has resulted in damage of a more serious nature over the years. Often, because of the varying rainfall distribution, areas which are not traditionally prone to floods also experience severe inundation. Thus, floods are the single most frequent disaster faced by the country.

    Flooding is caused by the inadequate capacity within the banks of the rivers to contain the high flows brought down from the upper catchments due to heavy rainfall. Flooding is accentuated by erosion and silting of the riverbeds, resulting in a reduction of the carrying capacity of river channels; earthquakes and landslides leading to changes in river courses and obstructions to flow; synchronization of floods in the main and tributary rivers; retardation due to tidal effects; encroachment of floodplains; and haphazard and unplanned growth of urban areas. Some parts of the country, mainly coastal areas of Andhra Pradesh, Orissa, Tamil Nadu and West Bengal, experience cyclones, which are often accompanied by heavy rainfall leading to flooding.

    Flood report

    • 2016 Assam floods: Heavy rains in July–August resulted in floods affecting 1.8 million people and flooding the Kaziranga National Park, killing around 200 wild animals.
    • 2017 Gujarat flood: Following heavy rain in July 2017, the Indian state of Gujarat was affected by severe flooding, resulting in more than 200 deaths.
    • August 2018 Kerala flood: Following high rain in late August 2018 and heavy monsoon rainfall from August 8, 2018, severe flooding affected the Indian state of Kerala, resulting in over 445 deaths.

    The attributes given below are available for this web map, each reported separately for 2016, 2017 and 2018:

    • Area Affected (Mha)
    • Population Affected (Million)
    • Area Wise (Mha) Damages to Crops
    • Value Wise (Rs. Crore) Damages to Crops
    • No. of Houses Damaged
    • Value (Rs. Crore) of Houses Damaged
    • No. of Cattle Lost
    • No. of Human Lives Lost
    • Damage to Public Utilities (Rs. Crore)
    • Total Damages - Crops, Houses, & Public Utilities (Rs. Crore)

    This web layer is offered by Esri India, for ArcGIS Online subscribers. If you have any questions or comments, please let us know via content@esri.in.

NTR Vaidya Seva 2017

Healthcare data from the Indian state of Andhra Pradesh (anonymized)

