This dataset contains around 480,000 patient records from the NTR Vaidya Seva scheme of the Government of Andhra Pradesh, India. NTR Vaidya Seva is the government's flagship healthcare scheme, under which lower-middle-class and low-income citizens of Andhra Pradesh can obtain free healthcare for many major diseases and ailments. A similar program exists in the neighboring state of Telangana as well.
The original dataset can be found on the official NTR Vaidya Seva website, where it has been partially anonymized. I've further anonymized it.
Also thanks to Unsplash for the cover pic!
A useful beginner-level, real-world dataset. I'm tired of seeing the IRIS and Titanic datasets for exploratory data analysis!
The dataset is owned by the Government of Andhra Pradesh but is released freely on the official website.
https://dataful.in/terms-and-conditions
The dataset gives population estimates of tigers. States are grouped by landscape: the Shivalik-Gangetic Plain Landscape Complex includes Uttarakhand, Uttar Pradesh, and Bihar; the Central India Landscape Complex includes Andhra Pradesh (including Telangana), Chhattisgarh, Madhya Pradesh, Maharashtra, Odisha, Rajasthan, and Jharkhand; the Western Ghats Landscape Complex includes Karnataka, Kerala, Tamil Nadu, and Goa; the North East Hills and Brahmaputra Flood Plains include Assam, Arunachal Pradesh, Mizoram, and Northern West Bengal; the Sundarbans are listed separately. NB: Ranipur (Uttar Pradesh) is added to the Shivalik landscape for convenience. State population estimates do not add up to the landscape estimate due to common tigers, tigers outside protected areas, and model range limits.
https://dataful.in/terms-and-conditions
The dataset contains state-wise per capita availability of power from the Handbook of Statistics on Indian States.
Note: 1. Per Capita Availability of Power is worked out based on Census Population and the population for Andhra Pradesh and Telangana drawn from both Governments’ portals for the years 2014-15 and 2018-19, respectively. 2. Combined figures for Dadra and Nagar Haveli and Daman and Diu are available from 2022-23 onwards.
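As a rough illustration of the calculation described in the note, per capita availability is total availability divided by population. The sketch below shows only that arithmetic; the function name, the kWh unit, and the figures are hypothetical and are not taken from the Handbook.

```python
def per_capita_availability(total_availability_kwh: float, population: int) -> float:
    """Return power availability per person (assumed unit: kWh per capita)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_availability_kwh / population


# Hypothetical figures for illustration only (not real state data).
example = per_capita_availability(
    total_availability_kwh=60_000_000_000,
    population=50_000_000,
)
print(f"{example:.1f} kWh per person")
```

Because the population denominator comes from different sources for different years (Census figures versus state portals), year-on-year changes in the ratio may partly reflect a change in the population source rather than in power availability.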
This dataset is intended for researchers, students, and policy makers for reference and mapping purposes, and may be used for village level demographic analysis within basic applications to support graphical overlays and analysis with other spatial data.
An effective policy response to the economic impacts of the COVID-19 pandemic requires an enormous range of data to inform the design and response of programs. Public health measures require data on the spread of the disease, beliefs in the population, and capacity of the health system. Relief efforts depend on an understanding of hardships being faced by various segments of the population. Food policy requires measurement of agricultural production and hunger. In such a rapidly evolving pandemic, these data must be collected at a high frequency. Given the unexpected nature of the shock and urgency with which a response was required, Indian policymakers needed to formulate policies affecting India’s 1.4 billion people, without the detailed evidence required to construct effective programs. To help overcome this evidence gap, the World Bank, IDinsight, and the Development Data Lab sought to produce rigorous and responsive data for policymakers across six states in India: Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh.
Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, and Uttar Pradesh
Household
Sample survey data [ssd]
This dataset includes observations covering six states (Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh) and three survey rounds. The survey did not have a single, unified frame from which to sample phone numbers. The final sample was assembled from several different sample frames, and the choice of sample frames varied across states and survey rounds.
These frames were drawn from four prior IDinsight projects and from an impact evaluation of the National Rural Livelihoods project conducted by the Ministry of Rural Development. Each of these surveys sought to represent distinct populations, and employed idiosyncratic sample designs and weighting schemes.
A detailed note covering key features of each sample frame is available for download.
Computer Assisted Telephone Interview [cati]
The survey questionnaires covered the following subjects:
Agriculture: COVID-19-related changes in price realisation, acreage decisions, input expenditure, access to credit, access to fertilisers, etc.
Income and consumption: Changes in wage rates, employment duration, consumption expenditure, prices of essential commodities, status of food security etc.
Migration: Rates of in-migration, migrant income and employment status, return migration plans etc.
Access to relief: Access to in-kind, cash and workfare relief, quantities of relief received, and constraints on the access to relief.
Health: Access to health facilities and rates of foregone healthcare, knowledge of COVID-19 related symptoms and protective behaviours.
While a number of indicators were consistent across all three rounds, questions were added and removed as and when necessary to account for seasonal changes (i.e., in the agricultural cycle).
Round 1: ~55% Round 2: ~46% Round 3: ~55%
An effective policy response to the economic impacts of the COVID-19 pandemic requires an enormous range of data to inform the design and response of programs. Public health measures require data on the spread of the disease, beliefs in the population, and capacity of the health system. Relief efforts depend on an understanding of hardships being faced by various segments of the population. Food policy requires measurement of agricultural production and hunger. In such a rapidly evolving pandemic, these data must be collected at a high frequency. Given the unexpected nature of the shock and urgency with which a response was required, Indian policymakers needed to formulate policies affecting India's 1.4 billion people, without the detailed evidence required to construct effective programs. To help overcome this evidence gap, researchers from the World Bank, in collaboration with IDinsight, the Development Data Lab, and Johns Hopkins University, sought to produce rigorous and responsive data for policymakers across six states in India: Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh.
Regional coverage
Households
Households located in Jharkhand, Rajasthan, Uttar Pradesh, Andhra Pradesh, Bihar, and Madhya Pradesh
Sample survey data [ssd]
This dataset includes observations covering six states (Andhra Pradesh, Bihar, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh) and three survey rounds. The survey did not have a single, unified frame from which to sample phone numbers. The final sample was assembled from several different sample frames, and the choice of sample frames varied across states and survey rounds.
These frames were drawn from four prior IDinsight projects and from an impact evaluation of the National Rural Livelihoods project conducted by the Ministry of Rural Development. Each of these surveys sought to represent distinct populations, and employed idiosyncratic sample designs and weighting schemes.
A detailed note covering key features of each sample frame is available for download.
Details will be made available after all rounds of data collection and analysis are complete.
Computer Assisted Telephone Interview [cati]
The survey questionnaires covered the following subjects:
Agriculture: COVID-19-related changes in price realisation, acreage decisions, input expenditure, access to credit, access to fertilisers, etc.
Income and consumption: Changes in wage rates, employment duration, consumption expenditure, prices of essential commodities, status of food security etc.
Migration: Rates of in-migration, migrant income and employment status, return migration plans etc.
Access to relief: Access to in-kind, cash and workfare relief, quantities of relief received, and constraints on the access to relief.
Health: Access to health facilities and rates of foregone healthcare, knowledge of COVID-19 related symptoms and protective behaviours.
While a number of indicators were consistent across all three rounds, questions were added and removed as and when necessary to account for seasonal changes (i.e., in the agricultural cycle).
The India COVID-19 surveys were conducted using Computer Assisted Telephone Interview (CATI) techniques. The household questionnaire was implemented using the CATI software SurveyCTO. The software was deployed on the smartphones of surveyors, who called respondents on their mobile phones and recorded their responses during the call. If a respondent could not be reached, surveyors attempted to call back up to 7 times, often seeking explicit appointments for suitable times to avoid non-response.
Validation and consistency checks were incorporated into the SurveyCTO software to avoid human error. Extreme values and outliers were scrutinised through a real time dashboard set up by IDinsight. Surveys were also audio audited by monitors to check for consistency and accuracy of question phrasing and answer recording. Finally, supervisors also randomly back-checked a subset of interviews to further ensure data accuracy.
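The kinds of checks described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the checks actually configured in SurveyCTO or in the IDinsight dashboard: the function names, the interquartile-range rule, and the sample wage values are all hypothetical.

```python
from statistics import quantiles


def in_range(value, minimum, maximum):
    """Entry-time validation: accept a response only if it lies within set bounds."""
    return minimum <= value <= maximum


def flag_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR] for manual review."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical daily wage responses in rupees; the last one looks implausible.
wages = [250, 300, 280, 310, 270, 290, 5000]
print(flag_outliers(wages))
print(in_range(wages[-1], 0, 2000))
```

A range check rejects impossible answers at the moment of entry, while an outlier rule only flags unusual answers so that a monitor can review them instead of discarding them automatically.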
IDinsight cleaned and labelled the data for further processing and analysis. The Development Data Lab examined the data for discrepancies and errors and merged the dataset with their proprietary spatial data.
All personally identifiable information has been removed from the datasets.
Round 1: ~55% Round 2: ~46% Round 3: ~55%
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sign language is a cardinal means of communication for the deaf and speech-impaired community. Sign language has its own grammatical structure and gesticulation nature. Research on Sign Language Translation and Recognition (SLTR) devotes a lot of attention to gesture identification. Sign language comprises manual gestures performed by hand poses and non-manual features expressed through eye, mouth, and gaze movements. A sentence-level, completely labelled Indian Sign Language dataset for SLTR research has been developed. The ISL-CSLTR dataset helps the research community to explore intuitive insights and to build SLTR frameworks for establishing communication with the deaf and speech-impaired community using advanced deep learning and computer vision methods. The ISL-CSLTR dataset contributes a sentence-level dataset created with two native signers from Navajeevan, Residential School for the Deaf, College of Spl. D.Ed & B.Ed, Vocational Centre, and Child Care & Learning Centre, Ayyalurimetta, Andhra Pradesh, India and four student volunteers from SASTRA Deemed University, Thanjavur, Tamil Nadu. The ISL-CSLTR corpus consists of a large vocabulary of 700 fully annotated videos, 18,863 sentence-level frames, and 1,036 word-level images for 100 spoken language sentences performed by 7 different signers. This corpus is arranged based on signer variants and time boundaries with fully annotated details, and it is publicly available. The main objective of creating this sentence-level ISL-CSLTR corpus is to enable more research outcomes in the area of SLTR. This completely labelled video corpus helps researchers to build frameworks for converting spoken language sentences into sign language and vice versa. The corpus has been created to address the various challenges faced by researchers in SLTR and to significantly improve translation and recognition performance.
The videos are annotated with the relevant spoken language sentences to provide a clear and easy understanding of the corpus data. Acknowledgements: The research was funded by the Science and Engineering Research Board (SERB), India under Start-up Research Grant (SRG)/2019–2021 (Grant no. SRG/2019/001338). We also thank all the signers for their contribution in collecting the sign videos and the successful completion of the ISL-CSLTR corpus. We would like to thank Navajeevan, Residential School for the Deaf, College of Spl. D.Ed & B.Ed, Vocational Centre, and Child Care & Learning Centre, Ayyalurimetta, Andhra Pradesh, India for their support and contribution.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The second National Family Health Survey (NFHS-2), conducted in 1998-99, provides information on fertility, mortality, family planning, and important aspects of nutrition, health, and health care. The International Institute for Population Sciences (IIPS) coordinated the survey, which collected information from a nationally representative sample of more than 90,000 ever-married women age 15-49. The NFHS-2 sample covers 99 percent of India's population living in all 26 states. However, this report is based on the survey data for 25 of the 26 states, since data collection in Tripura was delayed due to local problems in the state. IIPS also coordinated the first National Family Health Survey (NFHS-1) in 1992-93. Most of the types of information collected in NFHS-2 were also collected in the earlier survey, making it possible to identify trends over the intervening period of six and one-half years. In addition, the NFHS-2 questionnaire covered a number of new or expanded topics with important policy implications, such as reproductive health, women's autonomy, domestic violence, women's nutrition, anaemia, and salt iodization. The NFHS-2 survey was carried out in two phases. Ten states were surveyed in the first phase which began in November 1998 and the remaining states (except Tripura) were surveyed in the second phase which began in March 1999. The field staff collected information from 91,196 households in these 25 states and interviewed 89,199 eligible women in these households. In addition, the survey collected information on 32,393 children born in the three years preceding the survey. One health investigator on each survey team measured the height and weight of eligible women and children and took blood samples to assess the prevalence of anaemia.
SUMMARY OF FINDINGS
POPULATION CHARACTERISTICS
Three-quarters (73 percent) of the population lives in rural areas.
The age distribution is typical of populations that have recently experienced a fertility decline, with relatively low proportions in the younger and older age groups. Thirty-six percent of the population is below age 15, and 5 percent is age 65 and above. The sex ratio is 957 females for every 1,000 males in rural areas but only 928 females for every 1,000 males in urban areas, suggesting that more men than women have migrated to urban areas. The survey provides a variety of demographic and socioeconomic background information. In the country as a whole, 82 percent of household heads are Hindu, 12 percent are Muslim, 3 percent are Christian, and 2 percent are Sikh. Muslims live disproportionately in urban areas, where they comprise 15 percent of household heads. Nineteen percent of household heads belong to scheduled castes, 9 percent belong to scheduled tribes, and 32 percent belong to other backward classes (OBCs). Two-fifths of household heads do not belong to any of these groups. Questions about housing conditions and the standard of living of households indicate some improvements since the time of NFHS-1. Sixty percent of households in India now have electricity and 39 percent have piped drinking water compared with 51 percent and 33 percent, respectively, at the time of NFHS-1. Sixty-four percent of households have no toilet facility compared with 70 percent at the time of NFHS-1. About three-fourths (75 percent) of males and half (51 percent) of females age six and above are literate, an increase of 6-8 percentage points from literacy rates at the time of NFHS-1. The percentage of illiterate males varies from 6-7 percent in Mizoram and Kerala to 37 percent in Bihar and the percentage of illiterate females varies from 11 percent in Mizoram and 15 percent in Kerala to 65 percent in Bihar. Seventy-nine percent of children age 6-14 are attending school, up from 68 percent in NFHS-1. 
The proportion of children attending school has increased for all ages, particularly for girls, but girls continue to lag behind boys in school attendance. Moreover, the disparity in school attendance by sex grows with increasing age of children. At age 6-10, 85 percent of boys attend school compared with 78 percent of girls. By age 15-17, 58 percent of boys attend school compared with 40 percent of girls. The percentage of girls 6-17 attending school varies from 51 percent in Bihar and 56 percent in Rajasthan to over 90 percent in Himachal Pradesh and Kerala. Women in India tend to marry at an early age. Thirty-four percent of women age 15-19 are already married including 4 percent who are married but gauna has yet to be performed. These proportions are even higher in the rural areas. Older women are more likely than younger women to have married at an early age: 39 percent of women currently age 45-49 married before age 15 compared with 14 percent of women currently age 15-19. Although this indicates that the proportion of women who marry young is declining rapidly, half the women even in the age group 20-24 have married before reaching the legal minimum age of 18 years. On average, women are five years younger than the men they marry. The median age at marriage varies from about 15 years in Madhya Pradesh, Bihar, Uttar Pradesh, Rajasthan, and Andhra Pradesh to 23 years in Goa. As part of an increasing emphasis on gender issues, NFHS-2 asked women about their participation in household decisionmaking. In India, 91 percent of women are involved in decision-making on at least one of four selected topics. A much lower proportion (52 percent), however, are involved in making decisions about their own health care. There are large variations among states in India with regard to women's involvement in household decisionmaking. 
More than three out of four women are involved in decisions about their own health care in Himachal Pradesh, Meghalaya, and Punjab compared with about two out of five or less in Madhya Pradesh, Orissa, and Rajasthan. Thirty-nine percent of women do work other than housework, and more than two-thirds of these women work for cash. Only 41 percent of women who earn cash can decide independently how to spend the money that they earn. Forty-three percent of working women report that their earnings constitute at least half of total family earnings, including 18 percent who report that the family is entirely dependent on their earnings. Women's work-participation rates vary from 9 percent in Punjab and 13 percent in Haryana to 60-70 percent in Manipur, Nagaland, and Arunachal Pradesh.
FERTILITY AND FAMILY PLANNING
Fertility continues to decline in India. At current fertility levels, women will have an average of 2.9 children each throughout their childbearing years. The total fertility rate (TFR) is down from 3.4 children per woman at the time of NFHS-1, but is still well above the replacement level of just over two children per woman. There are large variations in fertility among the states in India. Goa and Kerala have attained below replacement level fertility and Karnataka, Himachal Pradesh, Tamil Nadu, and Punjab are at or close to replacement level fertility. By contrast, fertility is 3.3 or more children per woman in Meghalaya, Uttar Pradesh, Rajasthan, Nagaland, Bihar, and Madhya Pradesh. More than one-third to less than half of all births in these latter states are fourth or higher-order births compared with 7-9 percent of births in Kerala, Goa, and Tamil Nadu. Efforts to encourage the trend towards lower fertility might usefully focus on groups within the population that have higher fertility than average.
In India, rural women and women from scheduled tribes and scheduled castes have somewhat higher fertility than other women, but fertility is particularly high for illiterate women, poor women, and Muslim women. Another striking feature is the high level of childbearing among young women. More than half of women age 20-49 had their first birth before reaching age 20, and women age 15-19 account for almost one-fifth of total fertility. Studies in India and elsewhere have shown that health and mortality risks increase when women give birth at such young ages, both for the women themselves and for their children. Family planning programmes focusing on women in this age group could make a significant impact on maternal and child health and help to reduce fertility.
INFANT AND CHILD MORTALITY
NFHS-2 provides estimates of infant and child mortality and examines factors associated with the survival of young children. During the five years preceding the survey, the infant mortality rate was 68 deaths at age 0-11 months per 1,000 live births, substantially lower than 79 per 1,000 in the five years preceding the NFHS-1 survey. The child mortality rate, 29 deaths at age 1-4 years per 1,000 children reaching age one, also declined from the corresponding rate of 33 per 1,000 in NFHS-1. Ninety-five children out of 1,000 born do not live to age five years. Expressed differently, 1 in 15 children die in the first year of life, and 1 in 11 die before reaching age five. Child-survival programmes might usefully focus on specific groups of children with particularly high infant and child mortality rates, such as children who live in rural areas, children whose mothers are illiterate, children belonging to scheduled castes or scheduled tribes, and children from poor households.
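The mortality figures quoted above fit together arithmetically, because the child mortality rate applies only to children who survive their first year. The short sketch below uses only the rates stated in the text; the variable names are ours, and the calculation is an illustrative consistency check, not the estimation method used by NFHS-2.

```python
infant_mortality = 68  # deaths before age 1 per 1,000 live births (from the text)
child_mortality = 29   # deaths at age 1-4 per 1,000 children reaching age 1 (from the text)

# Of 1,000 live births, this many children reach their first birthday.
survivors_to_age_one = 1000 - infant_mortality

# The child mortality rate applies only to those survivors.
deaths_age_1_to_4 = survivors_to_age_one * child_mortality / 1000

under_five_mortality = infant_mortality + deaths_age_1_to_4

print(round(under_five_mortality))            # under-five deaths per 1,000 births
print(round(1000 / infant_mortality, 1))      # births per infant death
print(round(1000 / under_five_mortality, 1))  # births per under-five death
```

The combined rate rounds to the 95 per 1,000 given in the text, and the two ratios come out at roughly 14.7 and 10.5 births per death, which the report expresses as about 1 in 15 and 1 in 11.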
Infant mortality rates are more than two and one-half times as high for mothers who did not receive any of the recommended types of maternity-related medical care as for mothers who received all recommended types of care.
HEALTH, HEALTH CARE, AND NUTRITION
Promotion of maternal and child health has been one of the most important components of the Family Welfare Programme of the Government of India. One goal is for each pregnant woman to receive at least three antenatal check-ups plus two tetanus toxoid injections and a full course of iron and folic acid supplementation. In India, mothers of 65 percent of the children born in the three years preceding NFHS-2 received at least one antenatal
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Telugu Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Telugu speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
The dataset features 30 hours of dual-channel call center conversations between native Telugu speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
This dataset can be used across a range of healthcare and voice AI use cases:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Telugu Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of Telugu language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.
This dataset includes over 6,000 high-quality scripted audio prompts recorded in Telugu, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.
The prompts span a broad range of healthcare-specific interactions, such as:
To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:
These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.
Every audio recording is accompanied by a verbatim, manually verified transcription.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cardiovascular disease (CVD) risk in India is currently assessed using the World Health Organization/International Society for Hypertension (WHO/ISH) risk prediction charts since no population-specific models exist. The WHO/ISH risk prediction charts have two versions: one with total cholesterol as a predictor (the high information (HI) model) and the other without (the low information (LI) model). However, information on the WHO/ISH risk prediction charts, including guidance on which version to use and when, as well as the relative performance of the LI and HI models, is limited. This article aims, firstly, to quantify the relative performance of the LI and HI WHO/ISH risk prediction (for WHO-South East Asian Region D) using data from rural India. Secondly, we propose a pre-screening (simplified) point-of-care (POC) test to identify patients who are likely to benefit from a total cholesterol (TC) test, and subsequently when the LI model is preferable to the HI model. Analysis was performed using cross-sectional data from rural Andhra Pradesh collected in 2005 with recorded blood cholesterol measurements (N = 1066). CVD risk was computed using both LI and HI models, and high-risk individuals who needed treatment (THR) were subsequently identified based on clinical guidelines. Model development for the POC assessment of a TC test was performed through three machine learning techniques: Support Vector Machine (SVM), Regularised Logistic Regression (RLR), and Random Forests (RF) along with a feature selection process. Disagreement in CVD risk predicted by LI and HI WHO/ISH models was 14.5% (n = 155; p
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world; high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood. The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family. Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. 
It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country. Further information about the survey, including publications, can be downloaded from the Young Lives website.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Telugu Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Telugu-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
The dataset contains 30 hours of dual-channel call center recordings between native Telugu speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
Rich metadata is available for each participant and conversation:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Family Health Surveys (NFHS) programme, initiated in the early 1990s, has emerged as a nationally important source of data on population, health, and nutrition for India and its states. The 2005-06 National Family Health Survey (NFHS-3), the third in the series of these national surveys, was preceded by NFHS-1 in 1992-93 and NFHS-2 in 1998-99. Like NFHS-1 and NFHS-2, NFHS-3 was designed to provide estimates of important indicators on family welfare, maternal and child health, and nutrition. In addition, NFHS-3 provides information on several new and emerging issues, including family life education, safe injections, perinatal mortality, adolescent reproductive health, high-risk sexual behaviour, tuberculosis, and malaria. Further, unlike the earlier surveys in which only ever-married women age 15-49 were eligible for individual interviews, NFHS-3 interviewed all women age 15-49 and all men age 15-54. Information on nutritional status, including the prevalence of anaemia, is provided in NFHS-3 for women age 15-49, men age 15-54, and young children. A special feature of NFHS-3 is the inclusion of testing of the adult population for HIV. NFHS-3 is the first nationwide community-based survey in India to provide an estimate of HIV prevalence in the general population. Specifically, NFHS-3 provides estimates of HIV prevalence among women age 15-49 and men age 15-54 for all of India, and separately for Uttar Pradesh and for Andhra Pradesh, Karnataka, Maharashtra, Manipur, and Tamil Nadu, five out of the six states classified by the National AIDS Control Organization (NACO) as high HIV prevalence states. No estimate of HIV prevalence is being provided for Nagaland, the sixth high HIV prevalence state, due to strong local opposition to the collection of blood samples. NFHS-3 covered all 29 states in India, which comprise more than 99 percent of India's population.
NFHS-3 is designed to provide estimates of key indicators for India as a whole and, with the exception of HIV prevalence, for all 29 states by urban-rural residence. Additionally, NFHS-3 provides estimates for the slum and non-slum populations of eight cities, namely Chennai, Delhi, Hyderabad, Indore, Kolkata, Meerut, Mumbai, and Nagpur. NFHS-3 was conducted under the stewardship of the Ministry of Health and Family Welfare (MOHFW), Government of India, and is the result of the collaborative efforts of a large number of organizations. The International Institute for Population Sciences (IIPS), Mumbai, was designated by MOHFW as the nodal agency for the project. Funding for NFHS-3 was provided by the United States Agency for International Development (USAID), DFID, the Bill and Melinda Gates Foundation, UNICEF, UNFPA, and MOHFW. Macro International, USA, provided technical assistance at all stages of the NFHS-3 project. NACO and the National AIDS Research Institute (NARI) provided technical assistance for the HIV component of NFHS-3. Eighteen Research Organizations, including six Population Research Centres, shouldered the responsibility of conducting the survey in the different states of India and producing electronic data files. The survey used a uniform sample design, questionnaires (translated into 18 Indian languages), field procedures, and procedures for biomarker measurements throughout the country to facilitate comparability across the states and to ensure the highest possible data quality. The contents of the questionnaires were decided through an extensive collaborative process in early 2005. Based on provisional data, two national-level fact sheets and 29 state fact sheets that provide estimates of more than 50 key indicators of population, health, family welfare, and nutrition have already been released. 
The basic objective of releasing fact sheets within a very short period after the completion of data collection was to provide immediate feedback to planners and programme managers on key process indicators.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world: high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood. The Young Lives study aims to track the lives of 12,000 children over a 15-year period, surveyed once every 3-4 years. Round 1 of Young Lives surveyed two groups of children in each country, at 1 year old and 5 years old. Round 2 returned to the same children who were then aged 5 and 12 years old. Round 3 surveyed the same children again at ages 7-8 and 14-15 years, and Round 4 surveyed them at 12 and 19 years old. Thus the younger children are being tracked from infancy to their mid-teens and the older children through into adulthood, when some will become parents themselves. The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family.
Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country. Further information about the survey, including publications, can be downloaded from the School Survey webpages (http://www.younglives.org.uk/content/school-survey-0).
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/8EGMOT
The Scaling up experiential learning tools for sustainable water governance project aims to enhance the capacity of Indian communities to sustainably manage water resources. The intervention combined collective action games, participatory planning tools, and community debriefings to promote behavioral shifts toward sustainable groundwater and surface water management. These tools are designed to support informed decision-making, foster collective action, and strengthen governance of water as a common resource. The project included a mixed-methods impact evaluation. The study took place in 4 districts across 3 Indian states: Chittoor and Anantpur (Andhra Pradesh), Bhilwara (Rajasthan), and Chikbalapur (Karnataka). Data collection took place over two rounds. The baseline survey was conducted between October 2021 and May 2022, followed by the intervention. The endline survey was implemented from January to June 2023. The data available here are from both survey rounds, which included individual surveys, focus group discussions (FGDs), and key informant interviews (KIIs) across treatment and control sites, with baseline and endline results included in the same datasets. Within each survey type, multiple datasets are available and are organized according to the structure of the corresponding survey modules. Some datasets are at the individual or household-member level — for example, roster datasets that include information on all household members, not just the primary respondent. Others, such as the crop and water modules, are organized at the level of specific activities or resources, capturing details on each crop grown or water source used within a household. 
All datasets include a variable "unique_ID", which relates to the "habitation" (a sub-village administrative division) where that observation was collected, and a "TreatmentControl" variable, which denotes whether that observation belonged to the treatment group or the control group (note: treatment is assigned at the habitation level). Additionally, the individual surveys include an "individual_id" variable, corresponding to the individual respondent.
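Because treatment is assigned at the habitation level, every row that shares a "unique_ID" should carry the same "TreatmentControl" value. The following is a minimal sketch of that consistency check, not part of the project's own tooling. Only the three column names come from the description above; the row values, the label spellings ("Treatment", "Control"), and the helper function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: the column names follow the dataset description,
# but every value (IDs, arm labels) is invented for illustration.
rows = [
    {"unique_ID": "H001", "TreatmentControl": "Treatment", "individual_id": "H001-01"},
    {"unique_ID": "H001", "TreatmentControl": "Treatment", "individual_id": "H001-02"},
    {"unique_ID": "H002", "TreatmentControl": "Control", "individual_id": "H002-01"},
]


def habitation_arms(records):
    """Map each habitation ID to the set of arm labels seen in its rows."""
    arms = defaultdict(set)
    for rec in records:
        arms[rec["unique_ID"]].add(rec["TreatmentControl"])
    return dict(arms)


def inconsistent_habitations(records):
    """Return habitation IDs whose rows carry more than one arm label."""
    arms = habitation_arms(records)
    return sorted(hab for hab, labels in arms.items() if len(labels) > 1)


print(inconsistent_habitations(rows))
```

An empty result means the arm label is constant within each habitation, which is what habitation-level assignment implies. The same grouping key can then be used to join module-level datasets (for example crop or water modules) back to the habitation.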
https://creativecommons.org/publicdomain/zero/1.0/
Description of the Survey Data
The dataset contains responses to a "Road Safety & Traffic Rules Awareness Survey", capturing people's opinions on traffic rules, driving habits, and road safety concerns.
1. Demographic Information:
• Age group (10-20 years, 30-40 years, >40 years)
• Gender (Male/Female)
• Location (Andhra Pradesh, Kerala, Tamil Nadu, etc.)
2. Awareness & Behavior Towards Traffic Rules:
• Importance of traffic rules (Essential, Somewhat important, Not important)
• Seatbelt/helmet usage (Always, Sometimes, Rarely, Never)
• Leading causes of road accidents (Over-speeding, Drunk Driving, Poor Road Conditions, etc.)
• Observing rule violations (Rarely, Daily, Often, Always)
• Experience with accidents (Witnessed/Involved/Never)
3. Driving Habits & Safety Measures:
• Following speed limits and lane discipline
• Measures for pedestrian safety (Awareness programs, More crossings, Stricter enforcement)
• Primary mode of transportation (Public transport, Walking/Cycling, etc.)
This survey data can help in analyzing people's awareness of road safety and designing better traffic policies.
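One simple way to analyze categorical responses like these is a cross-tabulation of two questions. The sketch below is illustrative only: the field names ("age_group", "seatbelt_helmet_usage") and all rows are hypothetical, and only the category labels are taken from the survey description.

```python
from collections import Counter

# Hypothetical responses: field names and rows are invented for illustration.
# Only the category labels come from the survey description.
responses = [
    {"age_group": "10-20 years", "seatbelt_helmet_usage": "Always"},
    {"age_group": "10-20 years", "seatbelt_helmet_usage": "Sometimes"},
    {"age_group": "30-40 years", "seatbelt_helmet_usage": "Always"},
    {"age_group": ">40 years", "seatbelt_helmet_usage": "Never"},
    {"age_group": "30-40 years", "seatbelt_helmet_usage": "Always"},
]


def cross_tab(rows, row_key, col_key):
    """Count how often each (row_key, col_key) pair of answers occurs."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def share_always(rows, age_group):
    """Fraction of respondents in an age group who answered 'Always'."""
    group = [r for r in rows if r["age_group"] == age_group]
    if not group:
        return None  # no respondents in this group
    hits = sum(1 for r in group if r["seatbelt_helmet_usage"] == "Always")
    return hits / len(group)


table = cross_tab(responses, "age_group", "seatbelt_helmet_usage")
for (age, usage), count in sorted(table.items()):
    print(age, usage, count)
```

Returning `None` for an empty group avoids a division by zero and makes missing age bands visible rather than reporting them as 0%.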
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Telugu Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Telugu speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
This dataset includes over 6,000 scripted prompt recordings in Telugu, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
Each data point is enriched with detailed metadata for advanced training and analysis:
This BFSI-focused dataset is ideal
The Young Lives survey is an innovative long-term project investigating the changing nature of childhood poverty in four developing countries. The purpose of the project is to improve understanding of the causes and consequences of childhood poverty and examine how policies affect children's well-being, in order to inform the development of future policy and to target child welfare interventions more effectively. The study is being conducted in Ethiopia, India (in Andhra Pradesh), Peru and Vietnam. These countries were selected because they reflect a range of cultural, geographical and social contexts and experience differing issues facing the developing world: high debt burden, emergence from conflict, and vulnerability to environmental conditions such as drought and flood.
The Young Lives study aims to track the lives of 12,000 children over a 15-year period, surveyed once every 3-4 years. Round 1 of Young Lives surveyed two groups of children in each country, at 1 year old and 5 years old. Round 2 returned to the same children who were then aged 5 and 12 years old. Round 3 surveyed the same children again at ages 7-8 and 14-15 years, and Round 4 surveyed them at 12 and 19 years old. Thus the younger children are being tracked from infancy to their mid-teens and the older children through into adulthood, when some will become parents themselves.
The survey consists of three main elements: a child questionnaire, a household questionnaire and a community questionnaire. The household data gathered is similar to other cross-sectional datasets (such as the World Bank's Living Standards Measurement Study). It covers a range of topics such as household composition, livelihood and assets, household expenditure, child health and access to basic services, and education. This is supplemented with additional questions that cover caregiver perceptions, attitudes, and aspirations for their child and the family. Young Lives also collects detailed time-use data for all family members, information about the child's weight and height (and that of caregivers), and tests the children for school outcomes (language comprehension and mathematics). An important element of the survey asks the children about their daily activities, their experiences and attitudes to work and school, their likes and dislikes, how they feel they are treated by other people, and their hopes and aspirations for the future. The community questionnaire provides background information about the social, economic and environmental context of each community. It covers topics such as ethnicity, religion, economic activity and employment, infrastructure and services, political representation and community networks, crime and environmental changes. The Young Lives survey is carried out by teams of local researchers, supported by the Principal Investigator and Data Manager in each country.
Further information about the survey, including publications, can be downloaded from the Young Lives website.
School surveys were introduced into Young Lives in 2010 in order to capture detailed information about children's experiences of schooling, and to improve our understanding of:
- the relationships between learning outcomes and children's home backgrounds, gender, work, schools, teachers, and class and school peer-groups;
- school effectiveness, by analysing factors explaining the development of cognitive and non-cognitive skills in school, including value-added analysis of schooling and comparative analysis of school systems;
- equity issues (including gender) in relation to learning outcomes and the evolution of inequalities within education.
The survey allows us to link longitudinal information on household and child characteristics from the household survey with data on the schools attended by the Young Lives children and children's achievements inside and outside the school. It provides policy-relevant information on the relationship between child development (and its determinants) and children's experience of school, including access, quality and progression. This combination of household, child and school-level data over time constitutes the comparative advantage of Young Lives. Findings are all available on our Education theme pages and our publications page. Further information is available from the Young Lives School Survey webpages (http://www.younglives.org.uk/content/school-survey-0).
Lao Cai; Hung Yen; Danang; Phu Yen; Ben Tre
Individuals; Institutions/organisations
Sample survey data [ssd]
Multi-stage stratified random sample. The final sample is formed of 3,284 Grade 5 pupils in 176 classes in 92 school sites (both main and satellite sites); 1,138 of these pupils are Young Lives index children.
Face-to-face interview; Self-completion; Educational measurements; Observation
The instruments included in the survey are:
Questionnaires - Wave 1
Questionnaires - Wave 2
Child class and peers questionnaire; Child Maths test; Child language test (Vietnamese)
Survey documentation and questionnaires will be provided shortly at http://www.younglives.org.uk/content/vietnam-school-survey
This web layer contains data on state-level flood damage in India (2016-2018), including area affected (Mha) in 2016, population affected (Million) in 2016, area-wise (Mha) damages to crops in 2016, value-wise (Rs. Crore) damages to crops in 2016, etc.
Floods in India
Floods are recurrent phenomena in India. Due to different climatic and rainfall patterns in different regions, it has been the experience that, while some parts are suffering devastating floods, another part is suffering drought at the same time. With the increase in population and development activity, there has been a tendency to occupy the floodplains, which has resulted in damage of a more serious nature over the years. Often, because of the varying rainfall distribution, areas which are not traditionally prone to floods also experience severe inundation. Thus, floods are the single most frequent disaster faced by the country.
Flooding is caused by the inadequate capacity within the banks of the rivers to contain the high flows brought down from the upper catchments due to heavy rainfall. Flooding is accentuated by erosion and silting of the riverbeds, resulting in a reduction of the carrying capacity of river channels; earthquakes and landslides leading to changes in river courses and obstructions to flow; synchronization of floods in the main and tributary rivers; retardation due to tidal effects; encroachment of floodplains; and haphazard and unplanned growth of urban areas. Some parts of the country, mainly coastal areas of Andhra Pradesh, Orissa, Tamil Nadu and West Bengal, experience cyclones, which are often accompanied by heavy rainfall leading to flooding.
Flood report
2016 Assam floods: Heavy rains in July-August resulted in floods affecting 1.8 million people and flooding the Kaziranga National Park, killing around 200 wild animals.
2017 Gujarat flood: Following heavy rain in July 2017, the Indian state of Gujarat was affected by severe flooding, resulting in more than 200 deaths.
August 2018 Kerala flood: Following high rain in late August 2018 and heavy monsoon rainfall from August 8, 2018, severe flooding affected the Indian state of Kerala, resulting in over 445 deaths.
The attributes for this web map are listed below. Each attribute is given separately for 2016, 2017, and 2018 (for example, "Area Affected (Mha) in 2016"):
Area Affected (Mha)
Population Affected (Million)
Area Wise (Mha) Damages to Crops
Value Wise (Rs. Crore) Damages to Crops
No. of Houses Damaged
Value (Rs. Crore) of Houses Damaged
No. of Cattle Lost
No. of Human Lives Lost
Damage to Public Utilities (Rs. Crore)
Total Damages - Crops, Houses, & Public Utilities (Rs. Crore)
This web layer is offered by Esri India, for ArcGIS Online subscribers. If you have any questions or comments, please let us know via content@esri.in.
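The name of the "Total Damages" attribute suggests it combines the three value fields for crops, houses, and public utilities. The sketch below checks a record under that assumption, which the layer description does not state explicitly. The dictionary keys are shortened, hypothetical stand-ins for the layer's attribute names, and the numbers are invented.

```python
import math

# Hypothetical record for one state and one year: keys are shortened stand-ins
# for the layer's attribute names and the values are invented for illustration.
record = {
    "crops_rs_crore": 120.5,
    "houses_rs_crore": 30.25,
    "public_utilities_rs_crore": 49.25,
    "total_rs_crore": 200.0,
}

COMPONENTS = ("crops_rs_crore", "houses_rs_crore", "public_utilities_rs_crore")


def component_sum(rec):
    """Add up the three value-wise damage components (Rs. Crore)."""
    return sum(rec[key] for key in COMPONENTS)


def total_matches(rec, tolerance=0.01):
    """Compare the reported total with the component sum.

    Assumption: the total is a plain sum of the three components.
    """
    return math.isclose(component_sum(rec), rec["total_rs_crore"], abs_tol=tolerance)


print(component_sum(record), total_matches(record))
```

A mismatch would not necessarily indicate an error in the data; it may simply mean the published total is computed differently or rounded.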