52 datasets found
  1. Global Financial Inclusion (Global Findex) Database 2021 - India

    • catalog.ihsn.org
    • microdata.worldbank.org
    Updated Dec 16, 2022
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2022). Global Financial Inclusion (Global Findex) Database 2021 - India [Dataset]. https://catalog.ihsn.org/catalog/10452
    Explore at:
    Dataset updated
    Dec 16, 2022
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2021
    Area covered
    India
    Description

    Abstract

    The fourth edition of the Global Findex offers a lens into how people accessed and used financial services during the COVID-19 pandemic, when mobility restrictions and health policies drove increased demand for digital services of all kinds.

    The Global Findex is the world's most comprehensive database on financial inclusion. It is also the only global demand-side data source allowing for global and regional cross-country analysis to provide a rigorous and multidimensional picture of how adults save, borrow, make payments, and manage financial risks. Global Findex 2021 data were collected from nationally representative surveys of about 128,000 adults in more than 120 economies. The latest edition follows the 2011, 2014, and 2017 editions, and it includes a number of new series measuring financial health and resilience and contains more granular data on digital payment adoption, including merchant and government payments.

    The Global Findex is an indispensable resource for financial service practitioners, policy makers, researchers, and development professionals.

    Geographic coverage

    Excludes populations living in the Northeast states, remote islands, and Jammu and Kashmir. The excluded areas represent less than 10 percent of the total population.

    Analysis unit

    Individual

    Kind of data

    Observation data/ratings [obs]

    Sampling procedure

    In most developing economies, Global Findex data have traditionally been collected through face-to-face interviews. Surveys are conducted face-to-face in economies where telephone coverage represents less than 80 percent of the population or where in-person surveying is the customary methodology. However, because of ongoing COVID-19 related mobility restrictions, face-to-face interviewing was not possible in some of these economies in 2021. Phone-based surveys were therefore conducted in 67 economies that had been surveyed face-to-face in 2017. These 67 economies were selected for inclusion based on population size, phone penetration rate, COVID-19 infection rates, and the feasibility of executing phone-based methods where Gallup would otherwise conduct face-to-face data collection, while complying with all government-issued guidance throughout the interviewing process. Gallup takes both mobile phone and landline ownership into consideration. According to Gallup World Poll 2019 data, when face-to-face surveys were last carried out in these economies, at least 80 percent of adults in almost all of them reported mobile phone ownership. All samples are probability-based and nationally representative of the resident adult population. Phone surveys were not a viable option in 17 economies that had been part of previous Global Findex surveys, however, because of low mobile phone ownership and surveying restrictions. Data for these economies will be collected in 2022 and released in 2023.

    In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households. Each eligible household member is listed, and the hand-held survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.
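
    The first-stage rule described above (probability proportional to population size where population information is available, simple random sampling otherwise) can be sketched as follows. This is a minimal illustration, not Gallup's procedure: the function name `select_psus`, the dictionary input, and the choice of sampling with replacement for the proportional case are all assumptions.

```python
import random
from typing import Optional


def select_psus(psus: dict[str, Optional[int]], n: int,
                seed: Optional[int] = None) -> list[str]:
    """Pick n primary sampling units (hypothetical helper).

    psus maps a unit name to its population, or None when unknown.
    """
    rng = random.Random(seed)
    names = list(psus)
    sizes = [psus[name] for name in names]
    if all(size is not None for size in sizes):
        # Population known for every unit: draw with probability
        # proportional to size (with replacement, an assumption).
        return rng.choices(names, weights=sizes, k=n)
    # Population information missing: fall back to simple random sampling
    # without replacement.
    return rng.sample(names, k=n)
```

    Larger units are drawn more often in the proportional case, which is what keeps each resident's overall chance of selection roughly equal once households are sampled within units.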

    In traditionally phone-based economies, respondent selection follows the same procedure as in previous years, using random digit dialing or a nationally representative list of phone numbers. In most economies where mobile phone and landline penetration is high, a dual sampling frame is used.

    The same respondent selection procedure is applied to the new phone-based economies. Dual frame (landline and mobile phone) random digit dialing is used where landline presence and use are 20 percent or higher based on historical Gallup estimates. Mobile phone random digit dialing is used in economies with limited to no landline presence (less than 20 percent).

    For landline respondents in economies where mobile phone or landline penetration is 80 percent or higher, random selection of respondents is achieved by using either the latest birthday or household enumeration method. For mobile phone respondents in these economies or in economies where mobile phone or landline penetration is less than 80 percent, no further selection is performed. At least three attempts are made to reach a person in each household, spread over different days and times of day.
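
    The thresholds quoted in the sampling paragraphs above can be read as two small decision rules. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, coverage is passed as a fraction between 0 and 1, and the pandemic-related switch to phone surveys in 2021 is not modeled.

```python
def survey_mode(phone_coverage: float, f2f_customary: bool = False) -> str:
    """Baseline mode rule: face-to-face where telephone coverage is below
    80 percent or where in-person surveying is the customary methodology."""
    if phone_coverage < 0.80 or f2f_customary:
        return "face-to-face"
    return "phone"


def phone_sampling_frame(landline_presence: float) -> str:
    """Frame rule for the new phone-based economies: dual frame at
    20 percent landline presence or higher, mobile-only otherwise."""
    if landline_presence >= 0.20:
        return "dual-frame RDD (landline and mobile)"
    return "mobile-only RDD"
```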

    Sample size for India is 3000.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Questionnaires are available on the website.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, and Saniya Ansar. 2022. The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19. Washington, DC: World Bank.

  2. India - Age and sex structures

    • data.humdata.org
    geotiff
    Updated Mar 14, 2025
    Cite
    WorldPop (2025). India - Age and sex structures [Dataset]. https://data.humdata.org/dataset/worldpop-age-and-sex-structures-for-india
    Explore at:
    Available download formats: geotiff
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    WorldPop
    Area covered
    India
    Description

    WorldPop produces different types of gridded population count datasets, depending on the methods used and end application. Please make sure you have read our Mapping Populations overview page before choosing and downloading a dataset.

    A description of the modelling methods used for age and sex structures can be found in Tatem et al (https://pophealthmetrics.biomedcentral.com/articles/10.1186/1478-7954-11-11) and Pezzulo et al. Details of the input population count datasets used can be found here, and age/sex structure proportion datasets here.
    Both top-down 'unconstrained' and 'constrained' versions of the datasets are available, and the differences between the two methods are outlined here. The datasets represent the outputs from a project focused on construction of consistent 100m resolution population count datasets for all countries of the World structured by male/female and 5-year age classes (plus a <1 year class). These efforts necessarily involved some shortcuts for consistency. The unconstrained datasets are available for each year from 2000 to 2020.
    The constrained datasets are only available for 2020 at present, given the time periods represented by the building footprint and built settlement datasets used in the mapping.
    Data for earlier dates is available directly from WorldPop.

    WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00646

  3. The ORBIT (Object Recognition for Blind Image Training)-India Dataset

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 24, 2025
    Cite
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones (2025). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. http://doi.org/10.5281/zenodo.12608444
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The ORBIT (Object Recognition for Blind Image Training)-India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

    Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

    The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is true if the object is not present in the image, and the ‘pii_present_issue’ key is true if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.
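
    The annotation layout described above lends itself to a short parsing sketch. The helper names below are hypothetical, and the code assumes that every frame name follows the "--"-separated pattern of the quoted example and that object labels never contain "--" themselves.

```python
import json


def parse_frame_name(frame_name: str) -> dict:
    """Split a frame file name into its '--'-separated parts."""
    stem = frame_name.rsplit(".", 1)[0]  # drop the .jpeg extension
    collector, label, condition, video, frame = stem.split("--")
    return {
        "collector": collector,    # e.g. "P1"
        "object": label,           # label as given by the data collector
        "condition": condition,    # "clean" or "clutter"
        "video": video,
        "frame_index": int(frame),
    }


def usable_frames(annotations: dict) -> list[str]:
    """Keep frames in which the object is present."""
    return [
        name
        for name, flags in annotations.items()
        if not flags["object_not_present_issue"]
    ]


def load_annotations(path: str) -> dict:
    """Read one per-video annotation file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

    Filtering on "pii_present_issue" works the same way if frames flagged for blurred PII should also be excluded.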

    This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

    REFERENCES:

    1. Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

    2. microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

    3. Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641

  4. Earth, TX Population Breakdown By Race (Excluding Ethnicity) Dataset:...

    • neilsberg.com
    csv, json
    Updated Feb 21, 2025
    Cite
    Neilsberg Research (2025). Earth, TX Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/756d2330-ef82-11ef-9e71-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Texas, Earth
    Variables measured
    Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Earth by race. It includes the population of Earth across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Earth across relevant racial categories.

    Key observations

    The percent distribution of Earth population by race (across all racial categories recognized by the U.S. Census Bureau): 60.83% are white, 3.52% are Black or African American, 4.59% are American Indian and Alaska Native, 2.77% are some other race and 28.28% are multiracial.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (excluding ethnicity) for Earth.
    • Population: This column shows the population of each racial category (excluding ethnicity) in Earth.
    • % of Total Population: This column displays each race's share of Earth's total population as a percentage. Please note that the percentages may not sum to exactly 100 due to rounding of values.
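
    The rounding caveat can be checked against the shares quoted under Key observations. The snippet below is a minimal sketch that sums only the five categories listed there; the variable names are illustrative, and categories not quoted in that sentence are left out rather than guessed.

```python
# Percent shares as quoted under "Key observations".
shares = {
    "White": 60.83,
    "Black or African American": 3.52,
    "American Indian and Alaska Native": 4.59,
    "Some other race": 2.77,
    "Two or more races": 28.28,
}

# Round to two decimals to avoid floating-point noise in the sum.
total = round(sum(shares.values()), 2)
shortfall = round(100 - total, 2)
print(f"Sum of quoted shares: {total}% (shortfall {shortfall} points)")
```

    The quoted shares add up to 99.99, so a total slightly off 100 is expected and does not by itself indicate missing rows.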

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Earth Population by Race & Ethnicity. You can refer to the same here

  5. Global Financial Inclusion (Global Findex) Database 2017 - India

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 31, 2018
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2018). Global Financial Inclusion (Global Findex) Database 2017 - India [Dataset]. https://microdata.worldbank.org/index.php/catalog/3362
    Explore at:
    Dataset updated
    Oct 31, 2018
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2017
    Area covered
    India
    Description

    Abstract

    Financial inclusion is critical in reducing poverty and achieving inclusive economic growth. When people can participate in the financial system, they are better able to start and expand businesses, invest in their children’s education, and absorb financial shocks. Yet prior to 2011, little was known about the extent of financial inclusion and the degree to which such groups as the poor, women, and rural residents were excluded from formal financial systems.

    By collecting detailed indicators about how adults around the world manage their day-to-day finances, the Global Findex allows policy makers, researchers, businesses, and development practitioners to track how the use of financial services has changed over time. The database can also be used to identify gaps in access to the formal financial system and design policies to expand financial inclusion.

    Geographic coverage

    Sample excludes Northeast states and remote islands, representing less than 10% of the population.

    Analysis unit

    Individuals

    Universe

    The target population is the civilian, non-institutionalized population 15 years and above.

    Kind of data

    Observation data/ratings [obs]

    Sampling procedure

    The indicators in the 2017 Global Findex database are drawn from survey data covering almost 150,000 people in 144 economies, representing more than 97 percent of the world’s population (see table A.1 of the Global Findex Database 2017 Report for a list of the economies included). The survey was carried out over the 2017 calendar year by Gallup, Inc., as part of its Gallup World Poll, which since 2005 has annually conducted surveys of approximately 1,000 people in each of more than 160 economies and in over 150 languages, using randomly selected, nationally representative samples. The target population is the entire civilian, noninstitutionalized population age 15 and above.

    Interview procedure

    Surveys are conducted face to face in economies where telephone coverage represents less than 80 percent of the population or where this is the customary methodology. In most economies the fieldwork is completed in two to four weeks.

    In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used.

    Respondents are randomly selected within the selected households. Each eligible household member is listed and the handheld survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer’s gender.

    In economies where telephone interviewing is employed, random digit dialing or a nationally representative list of phone numbers is used. In most economies where cell phone penetration is high, a dual sampling frame is used. Random selection of respondents is achieved by using either the latest birthday or household enumeration method. At least three attempts are made to reach a person in each household, spread over different days and times of day.

    The sample size was 3000.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup Inc. also provided valuable input. The questionnaire was piloted in multiple countries, using focus groups, cognitive interviews, and field testing. The questionnaire is available in more than 140 languages upon request.

    Questions on cash on delivery, saving using an informal savings club or person outside the family, domestic remittances, and agricultural payments are only asked in developing economies and a few other selected countries. The question on mobile money accounts was only asked in economies that were part of the Mobile Money for the Unbanked (MMU) database of the GSMA at the time the interviews were being held.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. 2018. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution. Washington, DC: World Bank.

  6. Geolocations of Indian Cities

    • kaggle.com
    Updated Oct 7, 2020
    Cite
    Chaitanya (2020). Geolocations of Indian Cities [Dataset]. https://www.kaggle.com/crbelhekar619/geolocations-of-indian-cities/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 7, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Chaitanya
    Area covered
    India
    Description

    Content

    The data set contains geolocations of all the cities in India with a population of more than 1000.

    There are a total of 10 columns in the dataset:

    • Geoname - Unique Geo-ID for the city
    • Name - Name of the city
    • ASCII Name - ASCII name of the city for interpretability
    • Alternate Names - Alternate names for the city
    • Latitude - Latitude of the city
    • Longitude - Longitude of the city
    • Population - Population of the city
    • Digital Elevation Model - Digital elevation of the city
    • Country - Country of the city
    • Coordinates - Coordinates of the city
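
    A typed record makes the column descriptions above easier to work with. The sketch below is hedged: it assumes the file's header names match the labels in the description exactly (which the listing does not confirm), covers only five of the ten columns, and uses hypothetical names `City`, `parse_city`, and `cities_above`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class City:
    geoname_id: str
    name: str
    latitude: float
    longitude: float
    population: int


def parse_city(row: dict[str, str]) -> City:
    """Convert one row of strings into a typed record.

    The keys are assumed header names taken from the description.
    """
    return City(
        geoname_id=row["Geoname"],
        name=row["Name"],
        latitude=float(row["Latitude"]),
        longitude=float(row["Longitude"]),
        population=int(row["Population"]),
    )


def cities_above(rows: Iterable[dict[str, str]], min_population: int) -> list[City]:
    """Return the cities whose population exceeds min_population."""
    return [city for city in map(parse_city, rows)
            if city.population > min_population]
```

    Rows in this shape can come from `csv.DictReader` in the standard library once the actual delimiter and header names of the download are known.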

    Acknowledgements

    The data set is contributed by opendatasoft Data Network

  7. Indian English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Indian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Indian English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Indian accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Indian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Indian English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of India to ensure dialectal diversity and demographic balance.
    Demographics: A 60% male to 40% female gender split, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
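
    The stated audio format (stereo, 16-bit, 16 kHz WAV) can be verified per file with Python's standard `wave` module. This is a minimal sketch; the function name `matches_spec` is hypothetical, and it checks header fields only, not audio quality.

```python
import wave


def matches_spec(path_or_file) -> bool:
    """Return True if a WAV file is stereo, 16-bit, and sampled at 16 kHz."""
    with wave.open(path_or_file, "rb") as wav:
        return (
            wav.getnchannels() == 2         # stereo
            and wav.getsampwidth() == 2     # 2 bytes per sample = 16-bit
            and wav.getframerate() == 16000
        )
```

    Running a check like this over a download before training helps catch files that were resampled or downmixed along the way.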

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%
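
    Word error rate (WER) is conventionally the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The listing does not say how its WER figure was computed, so the sketch below shows only the standard definition, with whitespace tokenization as an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

    A value below 0.05 from this function corresponds to the "WER < 5%" threshold quoted above.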

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Indian English.
    Voice Assistants: Build smart assistants capable of understanding natural Indian conversations.

  8. Total population of India 2029

    • statista.com
    Updated Nov 18, 2024
    Cite
    Total population of India 2029 [Dataset]. https://www.statista.com/statistics/263766/total-population-of-india/
    Explore at:
    Dataset updated
    Nov 18, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    The statistic shows the total population of India from 2019 to 2029. In 2023, the estimated total population in India amounted to approximately 1.43 billion people.

    Total population in India

    India currently has the second-largest population in the world and is projected to overtake top-ranking China within forty years. Its residents comprise more than one-seventh of the entire world’s population, and despite a slowly decreasing fertility rate (which still exceeds the replacement rate and keeps the median age of the population relatively low), an increasing life expectancy adds to an expanding population. In comparison with other countries whose populations are decreasing, such as Japan, India has a relatively small share of aged population, which indicates the probability of lower death rates and higher retention of the existing population.

    With a land mass of less than half that of the United States and a population almost four times greater, India has recognized potential problems of its growing population. Government attempts to implement family planning programs have achieved varying degrees of success. Initiatives such as sterilization programs in the 1970s have been blamed for creating general antipathy to family planning, but the combined efforts of various family planning and contraception programs have helped halve fertility rates since the 1960s. The population growth rate has correspondingly shrunk as well, but has not yet reached less than one percent growth per year.

    As home to thousands of ethnic groups, hundreds of languages, and numerous religions, a cohesive and broadly supported effort to reduce population growth is difficult to create. Despite that, India is one country to watch in coming years. It is also a growing economic power; among other measures, its GDP per capita was expected to triple between 2003 and 2013, and the country was ranked third for its share of global gross domestic product.

  9. India Proportion of People Living Below 50 Percent Of Median Income: %

    • ceicdata.com
    Updated Mar 15, 2017
    Cite
    CEICdata.com (2017). India Proportion of People Living Below 50 Percent Of Median Income: % [Dataset]. https://www.ceicdata.com/en/india/social-poverty-and-inequality/proportion-of-people-living-below-50-percent-of-median-income-
    Explore at:
    Dataset updated
    Mar 15, 2017
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 1987 - Dec 1, 2021
    Area covered
    India
    Description

    India Proportion of People Living Below 50 Percent Of Median Income: % data was reported at 9.800 % in 2021. This records a decrease from the previous number of 10.000 % for 2020. India Proportion of People Living Below 50 Percent Of Median Income: % data is updated yearly, averaging 6.200 % from Dec 1977 (Median) to 2021, with 14 observations. The data reached an all-time high of 10.300 % in 2019 and a record low of 5.100 % in 2004. India Proportion of People Living Below 50 Percent Of Median Income: % data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s India – Table IN.World Bank.WDI: Social: Poverty and Inequality.

    The indicator is the percentage of people in the population who live in households whose per capita income or consumption is below half of the median income or consumption per capita. The median is measured at 2017 Purchasing Power Parity (PPP) using the Poverty and Inequality Platform (http://www.pip.worldbank.org). For some countries, medians are not reported due to grouped and/or confidential data. The reference year is the year in which the underlying household survey data was collected. In cases for which the data collection period bridged two calendar years, the first year in which data were collected is reported.

    Source: World Bank, Poverty and Inequality Platform. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. Data for high-income economies are mostly from the Luxembourg Income Study database. For more information and methodology, please see http://pip.worldbank.org.

    Note: The World Bank’s internationally comparable poverty monitoring database now draws on income or detailed consumption data from more than 2000 household surveys across 169 countries. See the Poverty and Inequality Platform (PIP) for details (www.pip.worldbank.org).
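
    The indicator defined above (the share of people below half of the median) can be sketched in a few lines. This is a simplified, unweighted illustration on made-up per capita values; the World Bank's actual estimates are built from household survey data and are not reproduced here.

```python
from statistics import median


def share_below_half_median(values):
    """Percentage of observations strictly below 50% of the median."""
    threshold = 0.5 * median(values)
    below = sum(1 for v in values if v < threshold)
    return 100.0 * below / len(values)


# Hypothetical per capita consumption values (one entry per person).
incomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Median is 5.5, so the threshold is 2.75; two of ten values fall below it.
print(share_below_half_median(incomes))
```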

  10. Waste Management and Recycling in Indian Cities

    • kaggle.com
    Updated Dec 15, 2024
    Cite
    Krishna Yadu (2024). Waste Management and Recycling in Indian Cities [Dataset]. http://doi.org/10.34740/kaggle/dsv/10203312
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Krishna Yadu
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    About the Dataset: Waste Management and Recycling in India

    Overview:

    This dataset provides comprehensive information on waste management and recycling practices in various cities across India. It includes key data related to waste generation, recycling rates, population density, municipal efficiency, landfill details, and more. The data spans multiple years (2019–2023) and covers a range of waste types, including plastic, organic waste, electronic waste (e-waste), construction waste, and hazardous waste.

    Purpose:

    The dataset aims to:

    • Promote efficient waste management practices across Indian cities.
    • Analyze trends in recycling and waste disposal methods.
    • Provide insights for improving municipal management systems.
    • Support research and development in sustainability, environmental science, and urban planning.

    Columns:

    1. City/District: The name of the Indian city or district.
    2. Waste Type: Type of waste generated, e.g., Plastic, Organic, E-Waste, Construction, Hazardous.
    3. Waste Generated (Tons/Day): Amount of waste generated in tons per day.
    4. Recycling Rate (%): The percentage of waste that is recycled.
    5. Population Density (People/km²): The number of people per square kilometer in the city.
    6. Municipal Efficiency Score (1-10): A score indicating how effectively the municipality manages waste (e.g., waste segregation, collection, disposal).
    7. Disposal Method: The method used for waste disposal (e.g., Landfill, Recycling, Incineration, Composting).
    8. Cost of Waste Management (₹/Ton): The cost of managing one ton of waste in Indian Rupees.
    9. Awareness Campaigns Count: The number of awareness campaigns organized by the municipality in that year related to waste management.
    10. Landfill Name: The name of the landfill site used by the city.
    11. Landfill Location (Lat, Long): The geographical location (latitude and longitude) of the landfill.
    12. Landfill Capacity (Tons): The total waste capacity (in tons) that the landfill can hold.
    13. Year: The year of the data entry, ranging from 2019 to 2023.
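
    To make the column list concrete, here is a minimal sketch that reads rows shaped like the columns above and averages the recycling rate per waste type. The header spellings are assumed to match the column names listed here, and the three sample rows (cities, figures, coordinates) are invented for illustration; they are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; only a subset of the 13 columns is included.
SAMPLE = '''City/District,Waste Type,Waste Generated (Tons/Day),Recycling Rate (%),"Landfill Location (Lat, Long)",Year
CityA,Plastic,100,40,"19.10, 72.90",2019
CityA,Organic,300,60,"19.10, 72.90",2019
CityB,Plastic,200,50,"28.60, 77.20",2020
'''


def mean_recycling_by_type(rows):
    """Average the 'Recycling Rate (%)' column for each waste type."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        entry = totals[row["Waste Type"]]
        entry[0] += float(row["Recycling Rate (%)"])
        entry[1] += 1
    return {waste: total / count for waste, (total, count) in totals.items()}


def parse_lat_long(text):
    """Split a 'lat, long' string into a pair of floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
rates = mean_recycling_by_type(rows)
first_landfill = parse_lat_long(rows[0]["Landfill Location (Lat, Long)"])
print(rates, first_landfill)
```

    The landfill location is stored as a single text field, so it has to be quoted in CSV and split before any mapping or distance calculation.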

    Applications:

    • Urban Planning: The dataset can be used to analyze and optimize waste management infrastructure in urban areas.
    • Sustainability Research: It can help in studying the progress of recycling and waste reduction strategies.
    • Policy Making: Government bodies can use this data to craft policies aimed at improving waste management and recycling rates.
    • Machine Learning/AI: The dataset can be used to build models for predicting waste generation trends, recycling outcomes, and municipal efficiency.

    Sources:

    • The data is simulated for this dataset based on average waste management practices observed in Indian cities.
    • Real-world data could come from municipal corporations, environmental agencies, and government reports on waste management.

  11. India Census: Population: by Religion: Muslim: Urban

    • ceicdata.com
    Cite
    CEICdata.com, India Census: Population: by Religion: Muslim: Urban [Dataset]. https://www.ceicdata.com/en/india/census-population-by-religion/census-population-by-religion-muslim-urban
    Explore at:
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 1, 2001 - Mar 1, 2011
    Area covered
    India
    Variables measured
    Population
    Description

    India Census: Population: by Religion: Muslim: Urban data was reported at 68,740,419.000 Person in 2011. This records an increase from the previous number of 49,393,496.000 Person for 2001. India Census: Population: by Religion: Muslim: Urban data is updated yearly, averaging 59,066,957.500 Person from Mar 2001 (Median) to 2011, with 2 observations. The data reached an all-time high of 68,740,419.000 Person in 2011 and a record low of 49,393,496.000 Person in 2001. India Census: Population: by Religion: Muslim: Urban data remains active status in CEIC and is reported by Census of India. The data is categorized under India Premium Database’s Demographic – Table IN.GAE001: Census: Population: by Religion.
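
    The summary figures quoted above follow directly from the two census observations. The short check below recomputes the median and, as an extra step not stated in the description, the percentage change between 2001 and 2011.

```python
from statistics import median

# The two observations quoted in the description.
observations = {2001: 49_393_496, 2011: 68_740_419}

# With two observations, the median is simply their midpoint.
mid = median(observations.values())

# Percentage change from 2001 to 2011 (derived here, not quoted in the source).
growth_pct = 100 * (observations[2011] - observations[2001]) / observations[2001]

print(mid, round(growth_pct, 2))
```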

  12. Globe, AZ Population Breakdown By Race (Excluding Ethnicity) Dataset:...

    • neilsberg.com
    csv, json
    Updated Feb 21, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Globe, AZ Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/7573e287-ef82-11ef-9e71-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Arizona, Globe
    Variables measured
    Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Globe by race. It includes the population of Globe across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Globe across relevant racial categories.

    Key observations

    The percent distribution of Globe population by race (across all racial categories recognized by the U.S. Census Bureau): 58.09% are white, 2.70% are Black or African American, 5.26% are American Indian and Alaska Native, 2.92% are Asian, 0.12% are Native Hawaiian and other Pacific Islander, 11.37% are some other race and 19.54% are multiracial.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (excluding ethnicity) for Globe.
    • Population: This column shows the population of each racial category (excluding ethnicity) in Globe.
    • % of Total Population: This column displays each race's share of Globe's total population as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.
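
    The rounding caveat on the percentage column can be checked against the shares quoted under "Key observations". The sketch below sums them and compares the total with 100% using a tolerance of half a hundredth per category, which is an assumption about how the published values were rounded.

```python
# Percentage shares as quoted under "Key observations".
shares = {
    "White": 58.09,
    "Black or African American": 2.70,
    "American Indian and Alaska Native": 5.26,
    "Asian": 2.92,
    "Native Hawaiian and Other Pacific Islander": 0.12,
    "Some other race": 11.37,
    "Two or more races": 19.54,
}

total = sum(shares.values())

# Each value is rounded to two decimals, so each can be off by up to 0.005.
tolerance = 0.005 * len(shares)
within_tolerance = abs(total - 100.0) <= tolerance

print(round(total, 2), within_tolerance)
```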

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Globe Population by Race & Ethnicity.

  13. COVID-19 Vaccine Progress Dashboard Data

    • data.chhs.ca.gov
    • data.ca.gov
    • +5 more
    csv, xlsx, zip
    Updated Jul 31, 2025
    Cite
    California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data [Dataset]. https://data.chhs.ca.gov/dataset/vaccine-progress-dashboard
    Explore at:
    Available download formats: csv(18403068), csv(110928434), xlsx(11534), csv(111682), csv(148732), csv(303068812), xlsx(11249), xlsx(11870), xlsx(7708), csv(188895), csv(638738), csv(503270), xlsx(11731), csv(2641927), csv(12877811), csv(83128924), csv(54906), csv(26828), csv(7777694), csv(82754), csv(724860), csv(675610), csv(2447143), csv(6772350), zip
    Dataset updated
    Jul 31, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

    On 6/16/2023 CDPH replaced the booster measures with a new “Up to Date” measure based on CDC’s new recommendations, replacing the primary series, boosted, and bivalent booster metrics. The definition of “primary series complete” has not changed and is based on previous recommendations that CDC has since simplified. A person cannot complete their primary series with a single dose of an updated vaccine. Whereas the booster measures were calculated using the eligible population as the denominator, the new up to date measure uses the total estimated population. Please note that the rates for some groups may change since the up to date measure is calculated differently than the previous booster and bivalent measures.
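
    The effect of the denominator change can be seen with a small worked example. The population figures below are invented, and the same count is reused as the numerator for both rates purely to isolate the denominator; in the real data the two measures also count different people.

```python
def coverage_rate(numerator, denominator):
    """Coverage as a percentage of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Invented figures for one hypothetical group.
total_population = 1000     # total estimated population
eligible_population = 800   # subset eligible under the older booster measure
people_counted = 400        # people meeting the measure

# Older booster-style rate: eligible population as the denominator.
eligible_based = coverage_rate(people_counted, eligible_population)

# Newer up-to-date-style rate: total estimated population as the denominator.
total_based = coverage_rate(people_counted, total_population)

print(eligible_based, total_based)
```

    With the same numerator, the rate based on the total population is lower whenever part of the population was not eligible, which is one reason rates for some groups shift between the two measures.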

    This data is from the same source as the Vaccine Progress Dashboard at https://covid19.ca.gov/vaccination-progress-data/ which summarizes vaccination data at the county level by county of residence. Where county of residence was not reported in a vaccination record, the county of provider that vaccinated the resident is included. This applies to less than 1% of vaccination records. The sum of county-level vaccinations does not equal statewide total vaccinations due to out-of-state residents vaccinated in California.

    These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

    Totals for the Vaccine Progress Dashboard and this dataset may not match, as the Dashboard totals doses by Report Date and this dataset totals doses by Administration Date. Dose numbers may also change for a particular Administration Date as data is updated.

    Previous updates:

    • On March 3, 2023, with the release of HPI 3.0 in 2022, the previous equity scores have been updated to reflect more recent community survey information. This change represents an improvement to the way CDPH monitors health equity by using the latest and most accurate community data available. The HPI uses a collection of data sources and indicators to calculate a measure of community conditions ranging from the most to the least healthy based on economic, housing, and environmental measures.

    • Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 16+ and age 5+ denominators have been uploaded as archived tables.

    • Starting on May 29, 2021 the methodology for calculating on-hand inventory in the shipped/delivered/on-hand dataset has changed. Please see the accompanying data dictionary for details. In addition, this dataset is now down to the ZIP code level.
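
    The sequence of denominator changes described in the updates above can be expressed as a date lookup. This is only a sketch: it assumes each change takes effect on the stated date itself, and the names `CHANGES`, `INITIAL`, and `denominator_on` are made up for the example.

```python
from bisect import bisect_right
from datetime import date

# Denominator in use before the first listed change.
INITIAL = "age 16+"

# (effective date, denominator) pairs taken from the update notes above.
CHANGES = [
    (date(2021, 5, 18), "age 12+"),
    (date(2021, 11, 10), "age 5+"),
    (date(2022, 7, 13), "all ages"),
]


def denominator_on(day):
    """Return the coverage denominator assumed to apply on a given date."""
    effective_dates = [effective for effective, _ in CHANGES]
    index = bisect_right(effective_dates, day)
    return INITIAL if index == 0 else CHANGES[index - 1][1]


for day in (date(2021, 5, 17), date(2021, 5, 18), date(2022, 1, 1), date(2023, 1, 1)):
    print(day.isoformat(), denominator_on(day))
```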

  14. Indian Economical Data 1990 to 2019

    • kaggle.com
    Updated Nov 13, 2020
    Cite
    Anuj Chavan (2020). Indian Economical Data 1990 to 2019 [Dataset]. https://www.kaggle.com/datasets/anujchavan/indian-economical-data-science-1990-to-2019
    Explore at:
    Dataset updated
    Nov 13, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anuj Chavan
    License

    https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets

    Description

    Dataset

    This dataset was created by Anuj Chavan

    Released under World Bank Dataset Terms of Use

    Contents

  15. India - Demographic, Health, Education and Transport indicators

    • data.humdata.org
    • cloud.csiss.gmu.edu
    • +2 more
    csv
    Updated Mar 28, 2024
    + more versions
    Cite
    United Nations Human Settlements Programmes, Data and Analytics Section (2024). India - Demographic, Health, Education and Transport indicators [Dataset]. https://data.humdata.org/dataset/unhabitat-in-indicators
    Explore at:
    Available download formats: csv(166264)
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    United Nations (http://un.org/)
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    The urban indicators data available here are analyzed, compiled and published by UN-Habitat’s Global Urban Observatory, which supports governments, local authorities and civil society organizations in developing urban indicators, data and statistics. Urban statistics are collected through household surveys and censuses conducted by national statistics authorities. The Global Urban Observatory team analyses and compiles urban indicator statistics from surveys and censuses. Additionally, local urban observatories collect, compile and analyze urban data for national policy development. Population statistics are produced by the United Nations Department of Economic and Social Affairs, World Urbanization Prospects.

  16. NTR Vaidya Seva 2017

    • kaggle.com
    Updated Oct 7, 2018
    Cite
    Srikar (2018). NTR Vaidya Seva 2017 [Dataset]. https://www.kaggle.com/srikarkashyap/ntr-arogya-seva-2017/code
    Explore at:
    Dataset updated
    Oct 7, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Srikar
    Description

    About

    This dataset contains around 480,000 patient records from the NTR Vaidya Seva scheme of the Government of Andhra Pradesh, India. NTR Vaidya Seva is the government's flagship healthcare scheme, under which lower-middle-class and low-income citizens of the state of Andhra Pradesh can obtain free healthcare for many major diseases and ailments. A similar program exists in the neighboring state of Telangana as well.

    Acknowledgements

    Original dataset can be found on the NTR Vaidya Seva's official website. The dataset has been partially anonymized on the official website. I've further anonymized it.

    Also thanks to Unsplash for the cover pic!

    Inspiration

    A useful beginner level real world dataset. I'm tired of seeing the IRIS and Titanic Datasets for exploratory data analysis!

    Ownership

    Dataset owned by the Government of Andhra Pradesh but released freely on official website.

  17. India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of...

    • ceicdata.com
    Updated Mar 15, 2017
    Cite
    CEICdata.com (2017). India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of Population: 2017 PPP per day [Dataset]. https://www.ceicdata.com/en/india/social-poverty-and-inequality/in-survey-mean-consumption-or-income-per-capita-bottom-40-of-population-2017-ppp-per-day
    Explore at:
    Dataset updated
    Mar 15, 2017
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2015 - Dec 1, 2019
    Area covered
    India
    Description

    India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of Population: 2017 PPP per day data was reported at 2.010 Intl $/Day in 2011. This records an increase from the previous number of 1.610 Intl $/Day for 2004. India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of Population: 2017 PPP per day data is updated yearly, averaging 1.810 Intl $/Day from Dec 2004 (Median) to 2011, with 2 observations. The data reached an all-time high of 2.010 Intl $/Day in 2011 and a record low of 1.610 Intl $/Day in 2004. India IN: Survey Mean Consumption or Income per Capita: Bottom 40% of Population: 2017 PPP per day data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s India – Table IN.World Bank.WDI: Social: Poverty and Inequality.

    The indicator is the mean consumption or income per capita (2017 PPP $ per day) of the bottom 40%, used in calculating the growth rate in the welfare aggregate of the bottom 40% of the population in the income distribution in a country.

    Source: World Bank, Global Database of Shared Prosperity (GDSP) (http://www.worldbank.org/en/topic/poverty/brief/global-database-of-shared-prosperity).

    Note: The choice of consumption or income for a country is made according to which welfare aggregate is used to estimate extreme poverty in the Poverty and Inequality Platform (PIP). The practice adopted by the World Bank for estimating global and regional poverty is, in principle, to use per capita consumption expenditure as the welfare measure wherever available; and to use income as the welfare measure for countries for which consumption is unavailable. However, in some cases data on consumption may be available but are outdated or not shared with the World Bank for recent survey years. In these cases, if data on income are available, income is used. Whether data are for consumption or income per capita is noted in the footnotes. Because household surveys are infrequent in most countries and are not aligned across countries, comparisons across countries or over time should be made with a high degree of caution.
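
    As a rough sketch of the two quantities this entry refers to, the code below computes the mean of the bottom 40% of a made-up, unweighted distribution and then an annualized growth rate between the two reported observations (1.610 in 2004 and 2.010 in 2011). The compound-annual formula is an assumption for illustration and is not necessarily the World Bank's exact method.

```python
def bottom_40_mean(values):
    """Mean of the lowest 40% of values (unweighted, at least one value)."""
    ordered = sorted(values)
    cutoff = max(1, len(ordered) * 2 // 5)  # integer form of 40%
    bottom = ordered[:cutoff]
    return sum(bottom) / len(bottom)


def annualized_growth_pct(start_value, end_value, years):
    """Compound annual growth rate between two observations, in percent."""
    return 100.0 * ((end_value / start_value) ** (1.0 / years) - 1.0)


# Hypothetical per capita consumption values, in dollars per day.
sample = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
print(bottom_40_mean(sample))

# The two observations quoted in the description.
growth = annualized_growth_pct(1.610, 2.010, 2011 - 2004)
print(round(growth, 2))
```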

  18. Indian English Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, making it well suited to building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native Indian English speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native Indian English speakers from our verified contributor community.
    Regions: Representing different provinces across India to ensure accent and dialect variation.
    Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.
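
    A downloaded file can be checked against the stated audio format (stereo, 16-bit, 8 kHz or 16 kHz) with Python's standard `wave` module. The sketch below runs on a synthetic one-second silent clip generated in memory, since no real file from the dataset is available here; `describe_wav` and `make_silent_wav` are names chosen for the example.

```python
import io
import wave


def describe_wav(fileobj):
    """Read the header of a WAV file and report its basic parameters."""
    with wave.open(fileobj, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def make_silent_wav(channels=2, sample_width=2, sample_rate=8000, seconds=1):
    """Build an in-memory WAV of silence to stand in for a real recording."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (channels * sample_width * sample_rate * seconds))
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav())
matches_spec = (
    info["channels"] == 2
    and info["bit_depth"] == 16
    and info["sample_rate"] in (8000, 16000)
)
print(info, matches_spec)
```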

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.
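
    For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The sketch below uses that standard definition on made-up sentences; how the provider measured its own figure is not described in the listing.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate("the flat has two bedrooms", "the flat has three bedrooms")
print(wer)
```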

    These transcriptions streamline ASR and NLP development for English real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:


  19. Indian English Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    India
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 30 hours of dual-channel audio recordings between native Indian English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 60 native Indian English contributors from our verified pool.
    Regions: Covering multiple Indian provinces to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.
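
    The recording details above are enough for a back-of-the-envelope storage estimate. The sketch assumes uncompressed PCM audio, ignores file headers, and treats all 30 hours as being stored at a single sample rate, none of which is confirmed by the listing.

```python
def pcm_bytes(hours, sample_rate, channels=2, bit_depth=16):
    """Approximate size in bytes of uncompressed PCM audio."""
    seconds = hours * 3600
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * channels * bytes_per_sample)


# 30 hours of stereo, 16-bit audio at each of the two stated sample rates.
size_8k = pcm_bytes(30, 8000)
size_16k = pcm_bytes(30, 16000)

print(size_8k / 1e9, "GB at 8 kHz")
print(size_16k / 1e9, "GB at 16 kHz")
```

    Under these assumptions the corpus would occupy roughly 3.5 GB at 8 kHz and 6.9 GB at 16 kHz.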

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real-time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    A dual-layered transcription review keeps the word error rate under 5%.
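
For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The sketch below shows that standard definition. The listing does not say how the provider measures its own figure, so `word_error_rate` is an illustrative helper, not their procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, dropping one word from a five-word reference gives a WER of 0.2, well above the 5% threshold quoted for this dataset.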

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train English speech-to-text engines for travel platforms.

  20. N

    Blue Earth County, MN Population Breakdown By Race (Excluding Ethnicity)...

    • neilsberg.com
    csv, json
    Updated Feb 21, 2025
    Cite
    Neilsberg Research (2025). Blue Earth County, MN Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/7561c717-ef82-11ef-9e71-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Minnesota, Blue Earth County
    Variables measured
    Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please contact us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Blue Earth County by race. It includes the population of Blue Earth County across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Blue Earth County across relevant racial categories.

    Key observations

    The percent distribution of Blue Earth County population by race (across all racial categories recognized by the U.S. Census Bureau): 86.76% are white, 4.58% are Black or African American, 0.21% are American Indian and Alaska Native, 2.36% are Asian, 1.47% are some other race and 4.63% are multiracial.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (excluding ethnicity) for Blue Earth County.
    • Population: This column shows the population of each racial category (excluding ethnicity) in Blue Earth County.
    • % of Total Population: This column displays the percentage distribution of each race as a proportion of Blue Earth County's total population. Please note that the sum of all percentages may not equal exactly 100% due to rounding of values.
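
The rounding caveat can be checked against the shares quoted under "Key observations". The sketch below simply adds those six published percentages. The listing gives no figure for Native Hawaiian and Other Pacific Islander, so that category is left out, and the dictionary name `shares` is only illustrative.

```python
# Percentages as quoted in the "Key observations" paragraph.
shares = {
    "White": 86.76,
    "Black or African American": 4.58,
    "American Indian and Alaska Native": 0.21,
    "Asian": 2.36,
    "Some other race": 1.47,
    "Two or more races": 4.63,
}

# Round the total to two decimals to match the precision of the inputs.
total = round(sum(shares.values()), 2)
print(f"Sum of published shares: {total}%")
```

The six rounded shares add up to 100.01%, slightly over 100, which is the kind of small discrepancy the note about rounding refers to.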

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Blue Earth County Population by Race & Ethnicity. You can refer to the same here

Cite
Development Research Group, Finance and Private Sector Development Unit (2022). Global Financial Inclusion (Global Findex) Database 2021 - India [Dataset]. https://catalog.ihsn.org/catalog/10452

Global Financial Inclusion (Global Findex) Database 2021 - India

Explore at:
Dataset updated
Dec 16, 2022
Dataset authored and provided by
Development Research Group, Finance and Private Sector Development Unit
Time period covered
2021
Area covered
India
Description

Abstract

The fourth edition of the Global Findex offers a lens into how people accessed and used financial services during the COVID-19 pandemic, when mobility restrictions and health policies drove increased demand for digital services of all kinds.

The Global Findex is the world's most comprehensive database on financial inclusion. It is also the only global demand-side data source allowing for global and regional cross-country analysis to provide a rigorous and multidimensional picture of how adults save, borrow, make payments, and manage financial risks. Global Findex 2021 data were collected from nationally representative surveys of about 128,000 adults in more than 120 economies. The latest edition follows the 2011, 2014, and 2017 editions, and it includes a number of new series measuring financial health and resilience and contains more granular data on digital payment adoption, including merchant and government payments.

The Global Findex is an indispensable resource for financial service practitioners, policy makers, researchers, and development professionals.

Geographic coverage

The sample excludes populations living in the Northeast states, remote islands, and Jammu and Kashmir. The excluded areas represent less than 10 percent of the total population.

Analysis unit

Individual

Kind of data

Observation data/ratings [obs]

Sampling procedure

In most developing economies, Global Findex data have traditionally been collected through face-to-face interviews. Surveys are conducted face-to-face in economies where telephone coverage represents less than 80 percent of the population or where in-person surveying is the customary methodology. However, because of ongoing COVID-19 related mobility restrictions, face-to-face interviewing was not possible in some of these economies in 2021. Phone-based surveys were therefore conducted in 67 economies that had been surveyed face-to-face in 2017. These 67 economies were selected for inclusion based on population size, phone penetration rate, COVID-19 infection rates, and the feasibility of executing phone-based methods where Gallup would otherwise conduct face-to-face data collection, while complying with all government-issued guidance throughout the interviewing process. Gallup takes both mobile phone and landline ownership into consideration. According to Gallup World Poll 2019 data, when face-to-face surveys were last carried out in these economies, at least 80 percent of adults in almost all of them reported mobile phone ownership. All samples are probability-based and nationally representative of the resident adult population. Phone surveys were not a viable option in 17 economies that had been part of previous Global Findex surveys, however, because of low mobile phone ownership and surveying restrictions. Data for these economies will be collected in 2022 and released in 2023.

In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households. Each eligible household member is listed, and the hand-held survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.
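
Selection with probabilities proportional to population size can be illustrated with systematic PPS sampling, one common way to implement it. This is a sketch only: the text does not say which PPS variant Gallup uses, and the function name `pps_systematic` and the unit names in the usage note are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(sizes: dict[str, int], n: int, rng: random.Random) -> list[str]:
    """Draw n primary sampling units with probability proportional to size."""
    if n <= 0 or not sizes:
        raise ValueError("need at least one unit and a positive sample size")
    names = list(sizes)
    cumulative = list(accumulate(sizes.values()))
    step = cumulative[-1] / n        # sampling interval
    start = rng.random() * step      # random start in [0, step)

    selected = []
    for k in range(n):
        target = start + k * step
        # First unit whose cumulative population exceeds the target.
        index = min(bisect_right(cumulative, target), len(names) - 1)
        selected.append(names[index])
    return selected
```

Calling `pps_systematic({"A": 100, "B": 300, "C": 600}, 2, random.Random(1))` returns two units, and larger units are more likely to appear. A unit larger than the sampling interval can be selected more than once, which real designs usually handle by treating such units as certainty selections.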

In traditionally phone-based economies, respondent selection follows the same procedure as in previous years, using random digit dialing or a nationally representative list of phone numbers. In most economies where mobile phone and landline penetration is high, a dual sampling frame is used.

The same respondent selection procedure is applied to the new phone-based economies. Dual frame (landline and mobile phone) random digit dialing is used where landline presence and use are 20 percent or higher based on historical Gallup estimates. Mobile phone random digit dialing is used in economies with limited to no landline presence (less than 20 percent).

For landline respondents in economies where mobile phone or landline penetration is 80 percent or higher, random selection of respondents is achieved by using either the latest birthday or household enumeration method. For mobile phone respondents in these economies or in economies where mobile phone or landline penetration is less than 80 percent, no further selection is performed. At least three attempts are made to reach a person in each household, spread over different days and times of day.
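
The two threshold rules for phone-based economies (20 percent landline presence for the choice of frame, 80 percent phone penetration for within-household selection) can be written as simple decision functions. This is a sketch of the rules as stated in the text. The function names, the return labels, and the representation of rates as fractions between 0 and 1 are assumptions.

```python
def choose_phone_frame(landline_presence: float) -> str:
    """Pick the sampling frame from the share of adults with a landline."""
    if not 0.0 <= landline_presence <= 1.0:
        raise ValueError("landline_presence must be between 0 and 1")
    # Dual frame at 20 percent or higher, mobile only below that.
    return "dual-frame RDD" if landline_presence >= 0.20 else "mobile-only RDD"


def needs_household_selection(reached_on: str, phone_penetration: float) -> bool:
    """True when a respondent must still be drawn at random in the household."""
    if reached_on not in ("landline", "mobile"):
        raise ValueError("reached_on must be 'landline' or 'mobile'")
    # Only landline respondents in high-penetration economies go through
    # the latest-birthday or household-enumeration step.
    return reached_on == "landline" and phone_penetration >= 0.80
```

In every other case the person who answers is interviewed directly, which matches the "no further selection" wording above.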

Sample size for India is 3000.

Mode of data collection

Face-to-face [f2f]

Research instrument

Questionnaires are available on the website.

Sampling error estimates

Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar. 2022. The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19. Washington, DC: World Bank.
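
As a rough orientation only, the margin of error for a proportion under simple random sampling can be computed from the stated India sample size of 3000. The actual survey uses a stratified, clustered design, so the published country-specific margins will differ. The `deff` (design effect) parameter and its default of 1.0 are assumptions made for illustration, not values from the source.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96, deff: float = 1.0) -> float:
    """Approximate 95% margin of error for a proportion p with sample size n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0 or deff <= 0:
        raise ValueError("n and deff must be positive")
    # deff inflates the simple-random-sampling variance for complex designs.
    return z * math.sqrt(deff * p * (1.0 - p) / n)


moe = margin_of_error(0.5, 3000)
print(f"{moe * 100:.1f} percentage points")
```

With p = 0.5 and n = 3000 this gives roughly 1.8 percentage points before any design effect is applied. The referenced Methodology table remains the authoritative source.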
