52 datasets found

Global Financial Inclusion (Global Findex) Database 2021 - India
catalog.ihsn.org
microdata.worldbank.org
Updated Dec 16, 2022
Cite
Development Research Group, Finance and Private Sector Development Unit (2022). Global Financial Inclusion (Global Findex) Database 2021 - India [Dataset]. https://catalog.ihsn.org/catalog/10452
Dataset updated
Dec 16, 2022
Dataset authored and provided by
Development Research Group, Finance and Private Sector Development Unit
Time period covered
2021
Area covered
India
Description
Abstract

The fourth edition of the Global Findex offers a lens into how people accessed and used financial services during the COVID-19 pandemic, when mobility restrictions and health policies drove increased demand for digital services of all kinds.

The Global Findex is the world's most comprehensive database on financial inclusion. It is also the only global demand-side data source allowing for global and regional cross-country analysis to provide a rigorous and multidimensional picture of how adults save, borrow, make payments, and manage financial risks. Global Findex 2021 data were collected from nationally representative surveys of about 128,000 adults in more than 120 economies. The latest edition follows the 2011, 2014, and 2017 editions. It includes a number of new series measuring financial health and resilience and contains more granular data on digital payment adoption, including merchant and government payments.

The Global Findex is an indispensable resource for financial service practitioners, policy makers, researchers, and development professionals.

Geographic coverage

The sample excludes populations living in the Northeast states, remote islands, and Jammu and Kashmir. The excluded areas represent less than 10 percent of the total population.

Analysis unit

Individual

Kind of data

Observation data/ratings [obs]

Sampling procedure

In most developing economies, Global Findex data have traditionally been collected through face-to-face interviews. Surveys are conducted face-to-face in economies where telephone coverage represents less than 80 percent of the population or where in-person surveying is the customary methodology. However, because of ongoing COVID-19 related mobility restrictions, face-to-face interviewing was not possible in some of these economies in 2021. Phone-based surveys were therefore conducted in 67 economies that had been surveyed face-to-face in 2017. These 67 economies were selected for inclusion based on population size, phone penetration rate, COVID-19 infection rates, and the feasibility of executing phone-based methods where Gallup would otherwise conduct face-to-face data collection, while complying with all government-issued guidance throughout the interviewing process. Gallup takes both mobile phone and landline ownership into consideration. According to Gallup World Poll 2019 data, when face-to-face surveys were last carried out in these economies, at least 80 percent of adults in almost all of them reported mobile phone ownership. All samples are probability-based and nationally representative of the resident adult population. Phone surveys were not a viable option in 17 economies that had been part of previous Global Findex surveys, however, because of low mobile phone ownership and surveying restrictions. Data for these economies will be collected in 2022 and released in 2023.

In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households. Each eligible household member is listed, and the hand-held survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.
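
The selection of primary sampling units with probabilities proportional to population size can be illustrated with a small sketch. This is not Gallup's procedure: it assumes sampling with replacement for simplicity, and the unit names, population counts, and the function name pps_sample are hypothetical.

```python
import random


def pps_sample(unit_populations, n_units, seed=None):
    """Draw n_units sampling units with replacement; on each draw a unit's
    chance of selection is proportional to its population."""
    if n_units < 0:
        raise ValueError("n_units must be non-negative")
    rng = random.Random(seed)
    names = list(unit_populations)
    weights = [unit_populations[name] for name in names]
    return rng.choices(names, weights=weights, k=n_units)


# Hypothetical sampling units with made-up population counts.
units = {"unit_a": 50_000, "unit_b": 30_000, "unit_c": 20_000, "unit_d": 0}
selected = pps_sample(units, n_units=5, seed=42)
print(selected)
```

A unit with a population of zero is never drawn, and larger units appear more often across repeated draws. Where no population information is available, the description above says simple random sampling is used instead, which corresponds to giving every unit the same weight.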

In traditionally phone-based economies, respondent selection follows the same procedure as in previous years, using random digit dialing or a nationally representative list of phone numbers. In most economies where mobile phone and landline penetration is high, a dual sampling frame is used.

The same respondent selection procedure is applied to the new phone-based economies. Dual frame (landline and mobile phone) random digit dialing is used where landline presence and use are 20 percent or higher based on historical Gallup estimates. Mobile phone random digit dialing is used in economies with limited to no landline presence (less than 20 percent).
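
The 20 percent landline threshold described above amounts to a simple decision rule. The sketch below only restates that rule; the function name and the returned labels are hypothetical and do not come from the survey documentation.

```python
def choose_sampling_frame(landline_presence_pct):
    """Return the dialing frame implied by the 20 percent landline threshold."""
    if not 0 <= landline_presence_pct <= 100:
        raise ValueError("landline_presence_pct must be between 0 and 100")
    if landline_presence_pct >= 20:
        return "dual_frame_rdd"  # landline and mobile phone frames
    return "mobile_only_rdd"     # limited to no landline presence


print(choose_sampling_frame(35.0))
print(choose_sampling_frame(4.5))
```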

For landline respondents in economies where mobile phone or landline penetration is 80 percent or higher, random selection of respondents is achieved by using either the latest birthday or household enumeration method. For mobile phone respondents in these economies or in economies where mobile phone or landline penetration is less than 80 percent, no further selection is performed. At least three attempts are made to reach a person in each household, spread over different days and times of day.

Sample size for India is 3000.

Mode of data collection

Face-to-face [f2f]

Research instrument

Questionnaires are available on the website.

Sampling error estimates

Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar. 2022. The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19. Washington, DC: World Bank.
India - Age and sex structures
data.humdata.org
geotiff
Updated Mar 14, 2025
Cite
WorldPop (2025). India - Age and sex structures [Dataset]. https://data.humdata.org/dataset/worldpop-age-and-sex-structures-for-india
Available download formats: geotiff
Dataset updated
Mar 14, 2025
Dataset provided by
WorldPop
Area covered
India
Description
WorldPop produces different types of gridded population count datasets, depending on the methods used and end application. Please make sure you have read our Mapping Populations overview page before choosing and downloading a dataset.

A description of the modelling methods used for age and sex structures can be found in Tatem et al. (https://pophealthmetrics.biomedcentral.com/articles/10.1186/1478-7954-11-11) and Pezzulo et al. Details of the input population count datasets used can be found here, and age/sex structure proportion datasets here.
Both top-down 'unconstrained' and 'constrained' versions of the datasets are available, and the differences between the two methods are outlined here. The datasets represent the outputs from a project focused on construction of consistent 100m resolution population count datasets for all countries of the World structured by male/female and 5-year age classes (plus a <1 year class). These efforts necessarily involved some shortcuts for consistency. The unconstrained datasets are available for each year from 2000 to 2020.
The constrained datasets are only available for 2020 at present, given the time periods represented by the building footprint and built settlement datasets used in the mapping.
Data for earlier dates is available directly from WorldPop.

WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00646
The ORBIT (Object Recognition for Blind Image Training)-India Dataset
zenodo.org
data.niaid.nih.gov
Updated Apr 24, 2025
Cite
Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones (2025). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. http://doi.org/10.5281/zenodo.12608444
Unique identifier
https://doi.org/10.5281/zenodo.12608444
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
The ORBIT (Object Recognition for Blind Image Training)-India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is True if the object is not present in the image, and the ‘pii_present_issue’ key is True if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; an unscaled version of the dataset will follow soon.
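
As a rough illustration of the annotation layout described above, the sketch below parses one per-video JSON document and keeps the frames in which the object is present. It assumes that the "--" separator never occurs inside an object label; the function names are hypothetical, and the third frame in the sample is invented to show a flagged image.

```python
import json

# The first two entries are copied from the description above; the third
# is a hypothetical frame added to show a flagged image.
annotation_text = """
{
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg":
    {"object_not_present_issue": false, "pii_present_issue": false},
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg":
    {"object_not_present_issue": false, "pii_present_issue": false},
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000003.jpeg":
    {"object_not_present_issue": true, "pii_present_issue": false}
}
"""


def parse_frame_name(frame_name):
    """Split a frame key into its five '--'-separated parts."""
    collector, label, condition, video, frame = frame_name.split("--")
    return {
        "collector": collector,
        "object": label,
        "condition": condition,
        "video": video,
        "frame": frame,
    }


def frames_with_object(annotations):
    """Keep frame names whose object_not_present_issue flag is false."""
    return [
        name
        for name, flags in annotations.items()
        if not flags["object_not_present_issue"]
    ]


annotations = json.loads(annotation_text)
kept = frames_with_object(annotations)
first = parse_frame_name(kept[0])
print(len(kept), first["collector"], first["object"], first["condition"])
```

Filtering on ‘pii_present_issue’ is left out here because the description states that PII in the images has already been blurred; whether to drop those frames as well is a choice for the user.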

This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

REFERENCES:

[1] Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

[2] microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

[3] Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641
Earth, TX Population Breakdown By Race (Excluding Ethnicity) Dataset:...
neilsberg.com
csv, json
Updated Feb 21, 2025
Cite
Neilsberg Research (2025). Earth, TX Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/756d2330-ef82-11ef-9e71-3860777c1fe6/
Available download formats: csv, json
Dataset updated
Feb 21, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Texas, Earth
Variables measured
Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Earth by race. It includes the population of Earth across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Earth across relevant racial categories.

Key observations

The percent distribution of Earth population by race (across all racial categories recognized by the U.S. Census Bureau): 60.83% are white, 3.52% are Black or African American, 4.59% are American Indian and Alaska Native, 2.77% are some other race and 28.28% are multiracial.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race: This column displays the racial categories (excluding ethnicity) for Earth.

Population: This column shows the population of each racial category (excluding ethnicity) in Earth.

% of Total Population: This column displays the population of each race as a percentage of Earth's total population. Please note that the percentages may not sum to exactly 100% due to rounding of values.
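
The rounding note can be checked against the five shares listed under Key observations. The sketch below uses exact decimal arithmetic; it is only an illustration, and the shortfall it reports could also reflect categories that the Key observations sentence does not list (Asian, and Native Hawaiian and Other Pacific Islander).

```python
from decimal import Decimal

# Percentages copied from the Key observations paragraph above.
shares = {
    "White": Decimal("60.83"),
    "Black or African American": Decimal("3.52"),
    "American Indian and Alaska Native": Decimal("4.59"),
    "Some other race": Decimal("2.77"),
    "Two or more races": Decimal("28.28"),
}

total = sum(shares.values())
shortfall = Decimal("100") - total
print(total, shortfall)
```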

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to check the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Earth Population by Race & Ethnicity. You can refer to it here.
Global Financial Inclusion (Global Findex) Database 2017 - India
microdata.worldbank.org
catalog.ihsn.org
Updated Oct 31, 2018
Cite
Development Research Group, Finance and Private Sector Development Unit (2018). Global Financial Inclusion (Global Findex) Database 2017 - India [Dataset]. https://microdata.worldbank.org/index.php/catalog/3362
Dataset updated
Oct 31, 2018
Dataset authored and provided by
Development Research Group, Finance and Private Sector Development Unit
Time period covered
2017
Area covered
India
Description
Abstract

Financial inclusion is critical in reducing poverty and achieving inclusive economic growth. When people can participate in the financial system, they are better able to start and expand businesses, invest in their children’s education, and absorb financial shocks. Yet prior to 2011, little was known about the extent of financial inclusion and the degree to which such groups as the poor, women, and rural residents were excluded from formal financial systems.

By collecting detailed indicators about how adults around the world manage their day-to-day finances, the Global Findex allows policy makers, researchers, businesses, and development practitioners to track how the use of financial services has changed over time. The database can also be used to identify gaps in access to the formal financial system and design policies to expand financial inclusion.

Geographic coverage

Sample excludes Northeast states and remote islands, representing less than 10% of the population.

Analysis unit

Individuals

Universe

The target population is the civilian, non-institutionalized population 15 years and above.

Kind of data

Observation data/ratings [obs]

Sampling procedure

The indicators in the 2017 Global Findex database are drawn from survey data covering almost 150,000 people in 144 economies, representing more than 97 percent of the world’s population (see table A.1 of the Global Findex Database 2017 Report for a list of the economies included). The survey was carried out over the 2017 calendar year by Gallup, Inc., as part of its Gallup World Poll, which since 2005 has annually conducted surveys of approximately 1,000 people in each of more than 160 economies and in over 150 languages, using randomly selected, nationally representative samples. The target population is the entire civilian, noninstitutionalized population age 15 and above.

Interview procedure

Surveys are conducted face to face in economies where telephone coverage represents less than 80 percent of the population or where this is the customary methodology. In most economies the fieldwork is completed in two to four weeks.

In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used.

Respondents are randomly selected within the selected households. Each eligible household member is listed and the handheld survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer’s gender.

In economies where telephone interviewing is employed, random digit dialing or a nationally representative list of phone numbers is used. In most economies where cell phone penetration is high, a dual sampling frame is used. Random selection of respondents is achieved by using either the latest birthday or household enumeration method. At least three attempts are made to reach a person in each household, spread over different days and times of day.

The sample size was 3000.

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup Inc. also provided valuable input. The questionnaire was piloted in multiple countries, using focus groups, cognitive interviews, and field testing. The questionnaire is available in more than 140 languages upon request.

Questions on cash on delivery, saving using an informal savings club or person outside the family, domestic remittances, and agricultural payments are only asked in developing economies and few other selected countries. The question on mobile money accounts was only asked in economies that were part of the Mobile Money for the Unbanked (MMU) database of the GSMA at the time the interviews were being held.

Sampling error estimates

Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. 2018. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution. Washington, DC: World Bank
Geolocations of Indian Cities
kaggle.com
Updated Oct 7, 2020
Cite
Chaitanya (2020). Geolocations of Indian Cities [Dataset]. https://www.kaggle.com/crbelhekar619/geolocations-of-indian-cities/discussion
Dataset updated
Oct 7, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chaitanya
Area covered
India
Description
Content

The data set contains geolocations of all the cities in India with a population of more than 1000.

The dataset has 10 columns in total:
Geoname - Unique Geo-ID for the city
Name - Name of the city
ASCII Name - ASCII name of the city for interpretability
Alternate Names - Alternate names for the city
Latitude - Latitude of the city
Longitude - Longitude of the city
Population - Population of the city
Digital Elevation Model - Digital elevation of the city
Country - Country of the city
Coordinates - Coordinates of the city

Acknowledgements

The data set is contributed by opendatasoft Data Network
Indian English General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Indian English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-india
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Indian English General Conversation Speech Dataset, a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Indian English communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Indian accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Indian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•Speakers: 60 verified native Indian English speakers from FutureBeeAI’s contributor community.
•Regions: Representing various provinces of India to ensure dialectal diversity and demographic balance.
•Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
•Duration: Each conversation ranges from 15 to 60 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, recorded at 16 kHz sample rate.
•Environment: Quiet, echo-free settings with no background noise.
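
The stated audio format (stereo, 16-bit, 16 kHz WAV) can be checked with Python's standard wave module. The sketch below is a minimal illustration, not a FutureBeeAI tool: the function names are hypothetical, and a short silent clip is generated in memory so the example runs without any dataset file.

```python
import io
import wave


def matches_recording_spec(wav_source):
    """Return True if the WAV is stereo, 16-bit, and sampled at 16 kHz."""
    with wave.open(wav_source, "rb") as wav:
        channels = wav.getnchannels()
        bytes_per_sample = wav.getsampwidth()
        sample_rate = wav.getframerate()
    return (channels, bytes_per_sample, sample_rate) == (2, 2, 16000)


def make_silent_wav(channels, bytes_per_sample, sample_rate, n_frames=160):
    """Build a short silent WAV in memory for demonstration purposes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bytes_per_sample)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (channels * bytes_per_sample * n_frames))
    buffer.seek(0)
    return buffer


print(matches_recording_spec(make_silent_wav(2, 2, 16000)))
print(matches_recording_spec(make_silent_wav(1, 2, 44100)))
```

For a downloaded recording, a file path can be passed to matches_recording_spec in place of the in-memory buffer.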

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple English speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for Indian English.
•Voice Assistants: Build smart assistants capable of understanding natural Indian conversations.
Total population of India 2029
statista.com
Updated Nov 18, 2024
Cite
Total population of India 2029 [Dataset]. https://www.statista.com/statistics/263766/total-population-of-india/
Dataset updated
Nov 18, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
India
Description
The statistic shows the total population of India from 2019 to 2029. In 2023, the estimated total population in India amounted to approximately 1.43 billion people.

Total population in India

India currently has the second-largest population in the world and is projected to overtake top-ranking China within forty years. Its residents comprise more than one-seventh of the entire world’s population, and despite a slowly decreasing fertility rate (which still exceeds the replacement rate and keeps the median age of the population relatively low), an increasing life expectancy adds to an expanding population. In comparison with other countries whose populations are decreasing, such as Japan, India has a relatively small share of aged population, which indicates the probability of lower death rates and higher retention of the existing population.

With a land mass of less than half that of the United States and a population almost four times greater, India has recognized potential problems of its growing population. Government attempts to implement family planning programs have achieved varying degrees of success. Initiatives such as sterilization programs in the 1970s have been blamed for creating general antipathy to family planning, but the combined efforts of various family planning and contraception programs have helped halve fertility rates since the 1960s. The population growth rate has correspondingly shrunk as well, but has not yet reached less than one percent growth per year.

As home to thousands of ethnic groups, hundreds of languages, and numerous religions, a cohesive and broadly-supported effort to reduce population growth is difficult to create. Despite that, India is one country to watch in coming years. It is also a growing economic power; among other measures, its GDP per capita was expected to triple between 2003 and 2013 and was listed as the third-ranked country for its share of the global gross domestic product.
India Proportion of People Living Below 50 Percent Of Median Income: %
ceicdata.com
Updated Mar 15, 2017
Cite
CEICdata.com (2017). India Proportion of People Living Below 50 Percent Of Median Income: % [Dataset]. https://www.ceicdata.com/en/india/social-poverty-and-inequality/proportion-of-people-living-below-50-percent-of-median-income-
Explore at:
Dataset updated
Mar 15, 2017
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 1987 - Dec 1, 2021
Area covered
India
Description
India Proportion of People Living Below 50 Percent Of Median Income: % data was reported at 9.800 % in 2021. This records a decrease from the previous number of 10.000 % for 2020. India Proportion of People Living Below 50 Percent Of Median Income: % data is updated yearly, averaging 6.200 % from Dec 1977 (Median) to 2021, with 14 observations. The data reached an all-time high of 10.300 % in 2019 and a record low of 5.100 % in 2004. India Proportion of People Living Below 50 Percent Of Median Income: % data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s India – Table IN.World Bank.WDI: Social: Poverty and Inequality.

The percentage of people in the population who live in households whose per capita income or consumption is below half of the median income or consumption per capita. The median is measured at 2017 Purchasing Power Parity (PPP) using the Poverty and Inequality Platform (http://www.pip.worldbank.org). For some countries, medians are not reported due to grouped and/or confidential data. The reference year is the year in which the underlying household survey data was collected. In cases for which the data collection period bridged two calendar years, the first year in which data were collected is reported.

World Bank, Poverty and Inequality Platform. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. Data for high-income economies are mostly from the Luxembourg Income Study database. For more information and methodology, please see http://pip.worldbank.org.

The World Bank’s internationally comparable poverty monitoring database now draws on income or detailed consumption data from more than 2000 household surveys across 169 countries. See the Poverty and Inequality Platform (PIP) for details (www.pip.worldbank.org).
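As a rough illustration of the indicator described above, the following sketch computes the share of people living below half the median from a list of per capita values. The sample values and the function name `share_below_half_median` are invented for illustration, and the sketch ignores household survey weights and PPP conversion, which a real estimate would need.

```python
from statistics import median


def share_below_half_median(per_capita_values):
    """Return the percentage of people whose value is below 50% of the median."""
    threshold = 0.5 * median(per_capita_values)
    below = sum(1 for value in per_capita_values if value < threshold)
    return 100.0 * below / len(per_capita_values)


# Hypothetical per capita daily consumption values (not real survey data).
sample = [1.0, 2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 12.0]
print(share_below_half_median(sample))
```

The threshold is relative rather than absolute, so the indicator can rise even when everyone's consumption grows, provided the median grows faster than the lowest values.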
Waste Management and Recycling in Indian Cities
kaggle.com
Updated Dec 15, 2024
Cite
Krishna Yadu (2024). Waste Management and Recycling in Indian Cities [Dataset]. http://doi.org/10.34740/kaggle/dsv/10203312
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10203312
Dataset updated
Dec 15, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Krishna Yadu
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
About the Dataset: Waste Management and Recycling in India

Overview:

This dataset provides comprehensive information on waste management and recycling practices in various cities across India. It includes key data related to waste generation, recycling rates, population density, municipal efficiency, landfill details, and more. The data spans multiple years (2019–2023) and covers a range of waste types, including plastic, organic waste, electronic waste (e-waste), construction waste, and hazardous waste.

Purpose:

The dataset aims to:
- Promote efficient waste management practices across Indian cities.
- Analyze trends in recycling and waste disposal methods.
- Provide insights for improving municipal management systems.
- Support research and development in sustainability, environmental science, and urban planning.

Columns:

City/District: The name of the Indian city or district.

Waste Type: Type of waste generated, e.g., Plastic, Organic, E-Waste, Construction, Hazardous.

Waste Generated (Tons/Day): Amount of waste generated in tons per day.

Recycling Rate (%): The percentage of waste that is recycled.

Population Density (People/km²): The number of people per square kilometer in the city.

Municipal Efficiency Score (1-10): A score indicating how effectively the municipality manages waste (e.g., waste segregation, collection, disposal).

Disposal Method: The method used for waste disposal (e.g., Landfill, Recycling, Incineration, Composting).

Cost of Waste Management (₹/Ton): The cost of managing one ton of waste in Indian Rupees.

Awareness Campaigns Count: The number of awareness campaigns organized by the municipality in that year related to waste management.

Landfill Name: The name of the landfill site used by the city.

Landfill Location (Lat, Long): The geographical location (latitude and longitude) of the landfill.

Landfill Capacity (Tons): The total waste capacity (in tons) that the landfill can hold.

Year: The year of the data entry, ranging from 2019 to 2023.
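
The column list above suggests a simple tabular layout. As a minimal sketch, assuming the file is a CSV whose headers match the names listed (the real file may spell or order them differently), the code below averages the recycling rate per waste type. The sample rows and the function name `mean_recycling_rate_by_type` are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows using column names from the list above; the real file
# may spell or order its headers differently.
SAMPLE_CSV = """City/District,Waste Type,Waste Generated (Tons/Day),Recycling Rate (%),Year
Mumbai,Plastic,100,40,2019
Mumbai,Organic,200,60,2019
Delhi,Plastic,300,50,2020
"""


def mean_recycling_rate_by_type(csv_text):
    """Average the 'Recycling Rate (%)' column for each 'Waste Type'."""
    totals = defaultdict(lambda: [0.0, 0])  # waste type -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["Waste Type"]]
        entry[0] += float(row["Recycling Rate (%)"])
        entry[1] += 1
    return {waste_type: s / n for waste_type, (s, n) in totals.items()}


print(mean_recycling_rate_by_type(SAMPLE_CSV))
```

Because the description states that the data is simulated, results from any such aggregation describe the simulation rather than actual municipal performance.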

Applications:

Urban Planning: The dataset can be used to analyze and optimize waste management infrastructure in urban areas.

Sustainability Research: It can help in studying the progress of recycling and waste reduction strategies.

Policy Making: Government bodies can use this data to craft policies aimed at improving waste management and recycling rates.

Machine Learning/AI: The dataset can be used to build models for predicting waste generation trends, recycling outcomes, and municipal efficiency.

Sources:

The data is simulated for this dataset based on average waste management practices observed in Indian cities.

Real-world data could come from municipal corporations, environmental agencies, and government reports on waste management.
India Census: Population: by Religion: Muslim: Urban
ceicdata.com
Cite
CEICdata.com, India Census: Population: by Religion: Muslim: Urban [Dataset]. https://www.ceicdata.com/en/india/census-population-by-religion/census-population-by-religion-muslim-urban
Explore at:
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 1, 2001 - Mar 1, 2011
Area covered
India
Variables measured
Population
Description
India Census: Population: by Religion: Muslim: Urban data was reported at 68,740,419.000 Person in 2011. This records an increase from the previous number of 49,393,496.000 Person for 2001. India Census: Population: by Religion: Muslim: Urban data is updated yearly, averaging 59,066,957.500 Person from Mar 2001 (Median) to 2011, with 2 observations. The data reached an all-time high of 68,740,419.000 Person in 2011 and a record low of 49,393,496.000 Person in 2001. India Census: Population: by Religion: Muslim: Urban data remains in active status in CEIC and is reported by Census of India. The data is categorized under India Premium Database’s Demographic – Table IN.GAE001: Census: Population: by Religion.
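With only two observations, the reported average is simply the midpoint of the 2001 and 2011 census figures. The short sketch below reproduces that arithmetic from the two values quoted above and also derives the implied growth over the decade; the variable names are illustrative.

```python
# Census figures quoted in the description above.
urban_muslim_population = {2001: 49_393_496, 2011: 68_740_419}

values = [urban_muslim_population[2001], urban_muslim_population[2011]]
midpoint = sum(values) / len(values)
growth_pct = 100.0 * (values[1] - values[0]) / values[0]

print(f"midpoint: {midpoint:,.1f}")
print(f"growth 2001-2011: {growth_pct:.1f}%")
```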
Globe, AZ Population Breakdown By Race (Excluding Ethnicity) Dataset:...
neilsberg.com
csv, json
Updated Feb 21, 2025
+ more versions
Cite
Neilsberg Research (2025). Globe, AZ Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/7573e287-ef82-11ef-9e71-3860777c1fe6/
Explore at:
Available download formats: json, csv
Dataset updated
Feb 21, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Arizona, Globe
Variables measured
Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Globe by race. It includes the population of Globe across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Globe across relevant racial categories.

Key observations

The percent distribution of Globe population by race (across all racial categories recognized by the U.S. Census Bureau): 58.09% are white, 2.70% are Black or African American, 5.26% are American Indian and Alaska Native, 2.92% are Asian, 0.12% are Native Hawaiian and other Pacific Islander, 11.37% are some other race and 19.54% are multiracial.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race: This column displays the racial categories (excluding ethnicity) for Globe.

Population: The population of the racial category (excluding ethnicity) in Globe is shown in this column.

% of Total Population: This column displays the percentage distribution of each race as a proportion of Globe's total population. Please note that the sum of all percentages may not equal 100% due to rounding of values.
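
The rounding caveat can be checked directly against the percentages quoted under "Key observations". The sketch below sums them and compares the total with 100 using a tolerance; the tolerance value is an assumption based on each share being rounded to two decimals.

```python
import math

# Percentages quoted under "Key observations" above.
shares = {
    "White": 58.09,
    "Black or African American": 2.70,
    "American Indian and Alaska Native": 5.26,
    "Asian": 2.92,
    "Native Hawaiian and Other Pacific Islander": 0.12,
    "Some other race": 11.37,
    "Two or more races": 19.54,
}

total = sum(shares.values())
# Each share is rounded to two decimals, so the total can drift from 100
# by up to about 0.005 per category.
tolerance = 0.005 * len(shares)
print(math.isclose(total, 100.0, abs_tol=tolerance))
```

Comparing with a tolerance rather than `==` also guards against small floating-point errors in the sum.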

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Globe Population by Race & Ethnicity.
COVID-19 Vaccine Progress Dashboard Data
data.chhs.ca.gov
data.ca.gov
+5 more
csv, xlsx, zip
Updated Jul 31, 2025
Cite
California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data [Dataset]. https://data.chhs.ca.gov/dataset/vaccine-progress-dashboard
Explore at:
Available download formats: csv(18403068), csv(110928434), xlsx(11534), csv(111682), csv(148732), csv(303068812), xlsx(11249), xlsx(11870), xlsx(7708), csv(188895), csv(638738), csv(503270), xlsx(11731), csv(2641927), csv(12877811), csv(83128924), csv(54906), csv(26828), csv(7777694), csv(82754), csv(724860), csv(675610), csv(2447143), csv(6772350), zip
Dataset updated
Jul 31, 2025
Dataset authored and provided by
California Department of Public Health: https://www.cdph.ca.gov/
Description
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

On 6/16/2023 CDPH replaced the booster measures with a new “Up to Date” measure based on CDC’s new recommendations, replacing the primary series, boosted, and bivalent booster metrics. The definition of “primary series complete” has not changed and is based on previous recommendations that CDC has since simplified. A person cannot complete their primary series with a single dose of an updated vaccine. Whereas the booster measures were calculated using the eligible population as the denominator, the new up to date measure uses the total estimated population. Please note that the rates for some groups may change since the up to date measure is calculated differently than the previous booster and bivalent measures.

This data is from the same source as the Vaccine Progress Dashboard at https://covid19.ca.gov/vaccination-progress-data/ which summarizes vaccination data at the county level by county of residence. Where county of residence was not reported in a vaccination record, the county of provider that vaccinated the resident is included. This applies to less than 1% of vaccination records. The sum of county-level vaccinations does not equal statewide total vaccinations due to out-of-state residents vaccinated in California.

These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

Totals for the Vaccine Progress Dashboard and this dataset may not match, as the Dashboard totals doses by Report Date and this dataset totals doses by Administration Date. Dose numbers may also change for a particular Administration Date as data is updated.

Previous updates:

On March 3, 2023, following the 2022 release of HPI 3.0, the previous equity scores were updated to reflect more recent community survey information. This change represents an improvement to the way CDPH monitors health equity by using the latest and most accurate community data available. The HPI uses a collection of data sources and indicators to calculate a measure of community conditions ranging from the most to the least healthy based on economic, housing, and environmental measures.

Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 16+ and age 5+ denominators have been uploaded as archived tables.
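
The denominator history above amounts to a date-based lookup. The sketch below encodes the three change dates from the paragraph and returns the population group in use on a given date. The function name `coverage_denominator` and the assumption that each change takes effect on the stated date itself are illustrative, not taken from CDPH documentation.

```python
from datetime import date

# Change dates taken from the paragraph above, newest first.
DENOMINATOR_CHANGES = [
    (date(2022, 7, 13), "all ages"),
    (date(2021, 11, 10), "age 5+"),
    (date(2021, 5, 18), "age 12+"),
]


def coverage_denominator(as_of):
    """Return the population group used as the coverage denominator on a date."""
    for start, label in DENOMINATOR_CHANGES:
        if as_of >= start:  # assumption: the change applies from this date onward
            return label
    return "age 16+"  # denominator in use before May 18, 2021


print(coverage_denominator(date(2022, 1, 1)))
```

Coverage rates computed on either side of a change date are not directly comparable, since the same number of vaccinated people is divided by a different population.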

Starting on May 29, 2021, the methodology for calculating on-hand inventory in the shipped/delivered/on-hand dataset changed. Please see the accompanying data dictionary for details. In addition, this dataset is now available down to the ZIP code level.
Indian Economical Data 1990 to 2019
kaggle.com
Updated Nov 13, 2020
Cite
Anuj Chavan (2020). Indian Economical Data 1990 to 2019 [Dataset]. https://www.kaggle.com/datasets/anujchavan/indian-economical-data-science-1990-to-2019
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 13, 2020
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Anuj Chavan
License
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
Description
Dataset

This dataset was created by Anuj Chavan

Released under World Bank Dataset Terms of Use

Contents
India - Demographic, Health, Education and Transport indicators
data.humdata.org
cloud.csiss.gmu.edu
+2 more
csv
Updated Mar 28, 2024
+ more versions
Cite
United Nations Human Settlements Programmes, Data and Analytics Section (2024). India - Demographic, Health, Education and Transport indicators [Dataset]. https://data.humdata.org/dataset/unhabitat-in-indicators
Explore at:
Available download formats: csv(166264)
Dataset updated
Mar 28, 2024
Dataset provided by
United Nations: http://un.org/
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
The urban indicators data available here are analyzed, compiled and published by UN-Habitat’s Global Urban Observatory, which supports governments, local authorities and civil society organizations in developing urban indicators, data and statistics. Urban statistics are collected through household surveys and censuses conducted by national statistics authorities. The Global Urban Observatory team analyzes and compiles urban indicator statistics from surveys and censuses. Additionally, local urban observatories collect, compile and analyze urban data for national policy development. Population statistics are produced by the United Nations Department of Economic and Social Affairs, World Urbanization Prospects.
NTR Vaidya Seva 2017
kaggle.com
Updated Oct 7, 2018
Cite
Srikar (2018). NTR Vaidya Seva 2017 [Dataset]. https://www.kaggle.com/srikarkashyap/ntr-arogya-seva-2017/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 7, 2018
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Srikar
Description
About

This dataset contains around 480,000 patient records from the NTR Vaidya Seva scheme of the Government of Andhra Pradesh, India. NTR Vaidya Seva is the government's flagship healthcare scheme, under which lower-middle-class and low-income citizens of the state of Andhra Pradesh can obtain free healthcare for many major diseases and ailments. A similar program exists in the neighboring state of Telangana as well.

Acknowledgements

Original dataset can be found on the NTR Vaidya Seva's official website. The dataset has been partially anonymized on the official website. I've further anonymized it.

Also thanks to Unsplash for the cover pic!

Inspiration

A useful beginner level real world dataset. I'm tired of seeing the IRIS and Titanic Datasets for exploratory data analysis!

Ownership

Dataset owned by the Government of Andhra Pradesh but released freely on official website.
India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of...
ceicdata.com
Updated Mar 15, 2017
Cite
CEICdata.com (2017). India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of Population: 2017 PPP per day [Dataset]. https://www.ceicdata.com/en/india/social-poverty-and-inequality/in-survey-mean-consumption-or-income-per-capita-bottom-40-of-population-2017-ppp-per-day
Explore at:
Dataset updated
Mar 15, 2017
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2015 - Dec 1, 2019
Area covered
India
Description
India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of Population: 2017 PPP per day data was reported at 2.010 Intl $/Day in 2011. This records an increase from the previous number of 1.610 Intl $/Day for 2004. India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of Population: 2017 PPP per day data is updated yearly, averaging 1.810 Intl $/Day from Dec 2004 (Median) to 2011, with 2 observations. The data reached an all-time high of 2.010 Intl $/Day in 2011 and a record low of 1.610 Intl $/Day in 2004. India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of Population: 2017 PPP per day data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s India – Table IN.World Bank.WDI: Social: Poverty and Inequality.

Mean consumption or income per capita (2017 PPP $ per day) of the bottom 40%, used in calculating the growth rate in the welfare aggregate of the bottom 40% of the population in the income distribution in a country.

World Bank, Global Database of Shared Prosperity (GDSP) (http://www.worldbank.org/en/topic/poverty/brief/global-database-of-shared-prosperity).

The choice of consumption or income for a country is made according to which welfare aggregate is used to estimate extreme poverty in the Poverty and Inequality Platform (PIP). The practice adopted by the World Bank for estimating global and regional poverty is, in principle, to use per capita consumption expenditure as the welfare measure wherever available; and to use income as the welfare measure for countries for which consumption is unavailable. However, in some cases data on consumption may be available but are outdated or not shared with the World Bank for recent survey years. In these cases, if data on income are available, income is used. Whether data are for consumption or income per capita is noted in the footnotes. Because household surveys are infrequent in most countries and are not aligned across countries, comparisons across countries or over time should be made with a high degree of caution.
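To make the bottom 40% measure concrete, the sketch below sorts a list of per capita values and averages the lowest 40% of them. It is an unweighted illustration with invented sample values; the function name `bottom_40_mean` is hypothetical, and an actual estimate would use survey weights and the welfare aggregate chosen for the country.

```python
def bottom_40_mean(per_capita_values):
    """Mean of the lowest 40% of values (unweighted, for illustration)."""
    ordered = sorted(per_capita_values)
    cutoff = max(1, len(ordered) * 4 // 10)  # number of people in the bottom 40%
    bottom = ordered[:cutoff]
    return sum(bottom) / len(bottom)


# Hypothetical daily per capita values in 2017 PPP dollars (not survey data).
sample = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
print(bottom_40_mean(sample))
```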
Indian English Call Center Data for Realestate AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Indian English Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-english-india
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Indian English Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, making it well suited to building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
Speech Data
The dataset features 30 hours of dual-channel call center recordings between native Indian English speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.
• Participant Diversity:
  • Speakers: 60 native Indian English speakers from our verified contributor community.
  • Regions: Representing different provinces across India to ensure accent and dialect variation.
  • Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.
• Recording Details:
  • Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
  • Call Duration: Average 5–15 minutes per call.
  • Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
  • Recording Environment: Captured in noise-free and echo-free conditions.
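
The audio format listed above (stereo, 16-bit, 8 kHz or 16 kHz) can be verified from a WAV header with Python's standard `wave` module. The sketch below builds a silent stand-in file in memory, since the actual recordings are not available here; the helper names `describe_wav` and `matches_listed_format` are illustrative.

```python
import io
import wave


def describe_wav(file_obj):
    """Read channel count, bit depth, and sample rate from a WAV header."""
    with wave.open(file_obj, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def matches_listed_format(file_obj):
    """Check for stereo, 16-bit audio at 8 kHz or 16 kHz."""
    channels, bits, rate = describe_wav(file_obj)
    return channels == 2 and bits == 16 and rate in (8000, 16000)


# Build one second of silent stereo 16-bit audio at 8 kHz as a stand-in file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)  # 2 bytes per sample = 16-bit
    out.setframerate(8000)
    out.writeframes(b"\x00" * (8000 * 2 * 2))
buffer.seek(0)
print(matches_listed_format(buffer))
```

For real files, passing a path string to `describe_wav` works the same way, because `wave.open` accepts either a file name or a file-like object.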

Topic Diversity
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
• Inbound Calls:
  • Property Inquiries
  • Rental Availability
  • Renovation Consultation
  • Property Features & Amenities
  • Investment Property Evaluation
  • Ownership History & Legal Info, and more
• Outbound Calls:
  • New Listing Notifications
  • Post-Purchase Follow-ups
  • Property Recommendations
  • Value Updates
  • Customer Satisfaction Surveys, and others
Such domain-rich variety ensures model generalization across common real estate support conversations.
Transcription
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
• Transcription Includes:
  • Speaker-Segmented Dialogues
  • Time-coded Segments
  • Non-speech Tags (e.g., background noise, pauses)
  • High transcription accuracy with word error rate below 5% via dual-layer human review.
These transcriptions streamline ASR and NLP development for English real estate voice applications.
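Speaker-segmented, time-coded transcripts lend themselves to simple per-speaker statistics. The sketch below assumes a hypothetical JSON layout with `speaker`, `start`, `end`, and `text` fields; the vendor's actual schema is not given in this listing and may differ.

```python
import json
from collections import defaultdict

# Hypothetical transcript layout; the real JSON field names may differ.
SAMPLE_TRANSCRIPT = """
[
  {"speaker": "agent", "start": 0.0, "end": 4.5, "text": "Hello, how can I help?"},
  {"speaker": "customer", "start": 4.5, "end": 9.0, "text": "I am looking for a flat."},
  {"speaker": "agent", "start": 9.0, "end": 12.0, "text": "[noise] Sure."}
]
"""


def talk_time_by_speaker(transcript_json):
    """Sum segment durations (end - start, in seconds) per speaker."""
    totals = defaultdict(float)
    for segment in json.loads(transcript_json):
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


print(talk_time_by_speaker(SAMPLE_TRANSCRIPT))
```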
Metadata
Detailed metadata accompanies each participant and conversation:
• Participant Metadata: ID, age, gender, location, accent, and dialect.
• Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

This enables smart filtering, dialect-focused model training, and structured dataset exploration.
Usage and Applications
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
Indian English Call Center Data for Travel AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Indian English Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-english-india
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
India
Dataset funded by
FutureBeeAI
Description
Introduction
This Indian English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
Speech Data
The dataset includes 30 hours of dual-channel audio recordings between native Indian English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
• Participant Diversity:
  • Speakers: 60 native Indian English contributors from our verified pool.
  • Regions: Covering multiple Indian provinces to capture accent and dialectal variation.
  • Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
• Recording Details:
  • Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
  • Call Duration: Between 5 and 15 minutes per session.
  • Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
  • Recording Environment: Captured in controlled, noise-free, echo-free settings.

Topic Diversity
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
• Inbound Calls:
  • Booking Assistance
  • Destination Information
  • Flight Delays or Cancellations
  • Support for Disabled Passengers
  • Health and Safety Travel Inquiries
  • Lost or Delayed Luggage, and more
• Outbound Calls:
  • Promotional Travel Offers
  • Customer Feedback Surveys
  • Booking Confirmations
  • Flight Rescheduling Alerts
  • Visa Expiry Notifications, and others
These scenarios help models understand and respond to diverse traveler needs in real-time.
Transcription
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
• Transcription Includes:
  • Speaker-Segmented Dialogues
  • Time-Stamped Segments
  • Non-speech Markers (e.g., pauses, coughs)
  • Dual-layered transcription review, which keeps the word error rate under 5%.
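
Word error rate, the accuracy figure quoted above, is conventionally the word-level edit distance between a reference and a hypothesis transcript divided by the number of reference words. The sketch below is a minimal implementation of that standard definition, not the vendor's scoring procedure; the example sentences are invented, and the function assumes a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


# Invented example: the hypothesis drops one of nine reference words.
reference = "my flight to delhi was delayed by two hours"
hypothesis = "my flight to delhi was delayed two hours"
print(word_error_rate(reference, hypothesis))
```

A rate under 5% therefore means fewer than one substituted, inserted, or deleted word per twenty reference words on average.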
Metadata
Extensive metadata enriches each call and speaker for better filtering and AI training:
• Participant Metadata: ID, age, gender, region, accent, and dialect.
• Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

Usage and Applications
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
• ASR Systems: Train English speech-to-text engines for travel platforms.

Blue Earth County, MN Population Breakdown By Race (Excluding Ethnicity)...
neilsberg.com
csv, json
Updated Feb 21, 2025
+ more versions
Cite
Neilsberg Research (2025). Blue Earth County, MN Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/7561c717-ef82-11ef-9e71-3860777c1fe6/
Explore at:
Available download formats: csv, json
Dataset updated
Feb 21, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Minnesota, Blue Earth County
Variables measured
Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Blue Earth County by race. It includes the population of Blue Earth County across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Blue Earth County across relevant racial categories.

Key observations

The percent distribution of Blue Earth County population by race (across all racial categories recognized by the U.S. Census Bureau): 86.76% are white, 4.58% are Black or African American, 0.21% are American Indian and Alaska Native, 2.36% are Asian, 1.47% are some other race and 4.63% are multiracial.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race: This column lists the racial categories (excluding ethnicity) for Blue Earth County.

Population: This column shows the population of each racial category (excluding ethnicity) in Blue Earth County.

% of Total Population: This column shows each race's share of the total Blue Earth County population, expressed as a percentage. Please note that the percentages may not sum to exactly 100% because the values are rounded.
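
As a rough illustration of the rounding note above, the following sketch sums the percentages quoted under Key observations. The variable and function names are hypothetical, and the Native Hawaiian and Other Pacific Islander share is left out because it is not quoted in that summary.

```python
# Percent shares quoted under "Key observations" (ACS 2019-2023 5-Year Estimates).
# The Native Hawaiian and Other Pacific Islander share is not quoted there,
# so it is omitted here rather than guessed.
published_shares = {
    "White": 86.76,
    "Black or African American": 4.58,
    "American Indian and Alaska Native": 0.21,
    "Asian": 2.36,
    "Some other race": 1.47,
    "Two or more races": 4.63,
}


def total_share(shares):
    """Return the sum of the rounded percentage shares, to two decimals."""
    return round(sum(shares.values()), 2)


total = total_share(published_shares)
print(f"Sum of published shares: {total}%")
print(f"Difference from 100%: {round(total - 100, 2)} percentage points")
```

Because each share is rounded independently, a small gap between the total and 100% of this kind is expected and does not by itself indicate a data error.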

Good to know

Margin of Error

Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
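
To make the margin-of-error caveat concrete, here is a small sketch of how an ACS margin of error (MOE) could be turned into a standard error and an interval. It assumes the MOE is taken from the underlying ACS table, since the columns described above do not include one, and that it is published at the 90 percent confidence level, which is the Census Bureau's convention for ACS estimates. The function names and sample numbers are hypothetical.

```python
Z_90 = 1.645  # z-score behind published ACS margins of error (90% confidence)
Z_95 = 1.960  # z-score for a 95% confidence level


def standard_error(moe_90):
    """Convert a 90%-confidence ACS margin of error to a standard error."""
    return moe_90 / Z_90


def confidence_interval(estimate, moe_90, z=Z_90):
    """Return (lower, upper) bounds for the estimate at the given z-score."""
    half_width = standard_error(moe_90) * z
    # Population counts cannot be negative, so clamp the lower bound at zero.
    return max(0.0, estimate - half_width), estimate + half_width


# Hypothetical values for illustration only; not taken from the dataset.
estimate, moe = 1500, 329
low90, high90 = confidence_interval(estimate, moe)
low95, high95 = confidence_interval(estimate, moe, z=Z_95)
print(f"90% interval: {low90:.0f} to {high90:.0f}")
print(f"95% interval: {low95:.0f} to {high95:.0f}")
```

For small racial categories the interval can be wide relative to the estimate, which is one reason to treat those rows with particular caution.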

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Blue Earth County Population by Race & Ethnicity; refer to that dataset for the related tabulations.
