http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains Food Prices data for India, sourced from the World Food Programme Price Database. The World Food Programme Price Database covers foods such as maize, rice, beans, fish, and sugar for 98 countries and some 3000 markets. It is updated weekly but contains to a large extent monthly data. The data goes back as far as 1992 for a few countries, although many countries started reporting from 2003 or thereafter.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset presents a dual-version representation of employment-related data from India, crafted to highlight the importance of data cleaning and transformation in any real-world data science or analytics project.
It includes two parallel datasets: 1. Messy Dataset (Raw) – Represents a typical unprocessed dataset often encountered in data collection from surveys, databases, or manual entries. 2. Cleaned Dataset – This version demonstrates how proper data preprocessing can significantly enhance the quality and usability of data for analytical and visualization purposes.
Each record captures multiple attributes related to individuals in the Indian job market, including:
- Age Group
- Employment Status (Employed/Unemployed)
- Monthly Salary (INR)
- Education Level
- Industry Sector
- Years of Experience
- Location
- Perceived AI Risk
- Date of Data Recording
The raw dataset underwent comprehensive transformations to convert it into its clean, analysis-ready form: - Missing Values: Identified and handled using either row elimination (where critical data was missing) or imputation techniques. - Duplicate Records: Identified using row comparison and removed to prevent analytical skew. - Inconsistent Formatting: Unified inconsistent naming in columns (like 'monthly_salary_(inr)' → 'Monthly Salary (INR)'), capitalization, and string spacing. - Incorrect Data Types: Converted columns like salary from string/object to float for numerical analysis. - Outliers: Detected and handled based on domain logic and distribution analysis. - Categorization: Converted numeric ages into grouped age categories for comparative analysis. - Standardization: Uniform labels for employment status, industry names, education, and AI risk levels were applied for visualization clarity.
This dataset is ideal for learners and professionals who want to understand: - The impact of messy data on visualization and insights - How transformation steps can dramatically improve data interpretation - Practical examples of preprocessing techniques before feeding into ML models or BI tools
It's also useful for:
- Training ML models with clean inputs
- Data storytelling with visual clarity
- Demonstrating reproducibility in data cleaning pipelines
By examining both the messy and clean datasets, users gain a deeper appreciation for why “garbage in, garbage out” rings true in the world of data science.
CompanyData.com, powered by BoldData, delivers high-quality, verified B2B company information from official trade registers around the world. Our India company database includes 32,468,995 verified business records, giving you powerful insight into one of the fastest-growing economies on the planet.
Each company profile is rich with firmographic data, including company name, CIN (Corporate Identification Number), registration number, legal status, industry classification (NIC codes), revenue range, and employee size. Many records are enhanced with contact details such as email addresses, phone numbers, and names of key decision-makers, supporting direct outreach and smarter segmentation.
Our India dataset is designed for a wide range of business applications — from KYC and AML compliance, due diligence, and regulatory checks, to B2B sales, lead generation, marketing campaigns, CRM enrichment, and AI model training. Whether you’re targeting local startups or large enterprises, our data helps you connect with the right businesses at the right time.
Delivery is flexible to suit your needs. Choose from customized lists, full databases in Excel or CSV, access via our real-time API, or our intuitive self-service platform. We also offer data enrichment and cleansing services to refresh and improve your existing datasets with accurate, up-to-date company information from India.
With access to 32,468,995 verified companies across more than 200 countries, CompanyData.com helps businesses grow confidently — in India and beyond. Rely on our precise, structured data to fuel your strategies and scale with speed and accuracy.
Soil is a key natural resource that provides the foundation of basic ecosystem services. Soil determines the types of farms and forests that can grow on a landscape. Soil filters water. Soil helps regulate the Earth's climate by storing large amounts of carbon. Activities that degrade soils reduce the value of the ecosystem services that soil provides. For example, since 1850 35% of human caused green house gas emissions are linked to land use change. The Soil Science Society of America is a good source of of additional information.The mineral composition of underlying rock, the amount and type of organic material from plants and climatic and other environmental factors affect the chemistry of the soil. Chemical composition and processes determine how and what type of soil forms at a given location and what type of agriculture the areas wil support.Dataset SummaryThis layer provides access to a 30 arc-second (roughly 1 km) cell-sized raster with attributes related to the chemistry of soil derived from the Harmonized World Soil Database v 1.2. The values in this layer are for the dominant soil in each mapping unit (sequence field = 1).Fields for topsoil (0-30 cm) and subsoil (30-100 cm) are available for each of these soil chemistry attributes:Organic Carbon - % weightCalcium Carbonate - % weightGypsum - % weightSalinity - Electrical Conductivity - dS/mpHAdditionally, 4 class description fields were added by Esri based on the document Harmonized World Soil Database Version 1.2 for use in web map pop-ups:pH Class DescriptionCalcium Carbonate Class DescriptionGypsum Class DescriptionSalinity - Electrical Conductivity - Class DescriptionThe layer is symbolized with the Topsoil pH field.The document Harmonized World Soil Database Version 1.2 provides more detail on the soil chemistry attributes contained in this layer.Other attributes contained in this layer include:Soil Mapping Unit Name - the name of the spatially dominant major soil groupSoil Mapping Unit Symbol - a two letter code for labeling the spatially dominant major soil group in thematic mapsData Source - the HWSD is an aggregation of datasets. The data sources are the European Soil Database (ESDB), the 1:1 million soil map of China (CHINA), the Soil and Terrain Database Program (SOTWIS), and the Digital Soil Map of the World (DSMW).Percentage of Mapping Unit covered by dominant componentMore information on the Harmonized World Soil Database is available here.Other layers created from the Harmonized World Soil Database are available on ArcGIS Online:World Soils Harmonized World Soil Database - Bulk DensityWorld Soils Harmonized World Soil Database - Exchange CapacityWorld Soils Harmonized World Soil Database – GeneralWorld Soils Harmonized World Soil Database – HydricWorld Soils Harmonized World Soil Database – TextureThe authors of this data set request that projects using these data include the following citation:FAO/IIASA/ISRIC/ISSCAS/JRC, 2012. Harmonized World Soil Database (version 1.2). FAO, Rome, Italy and IIASA, Laxenburg, Austria.What can you do with this layer?This layer is suitable for both visualization and analysis. It can be used in ArcGIS Online in web maps and applications and can be used in ArcGIS Desktop.This layer has query, identify, and export image services available. This layer is restricted to a maximum area of 16,000 x 16,000 pixels - an area 4,000 kilometers on a side or an area approximately the size of Europe. The source data for this layer are available here. This layer is part of a larger collection of landscape layers that you can use to perform a wide variety of mapping and analysis tasks.The Living Atlas of the World provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.Geonet is a good resource for learning more about landscape layers and the Living Atlas of the World. To get started follow these links:Living Atlas Discussion GroupSoil Data Discussion GroupThe Esri Insider Blog provides an introduction to the Ecophysiographic Mapping project.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
India IN: Mortality from CVD, Cancer, Diabetes or CRD between Exact Ages 30 and 70: Female data was reported at 19.800 NA in 2016. This records a decrease from the previous number of 20.000 NA for 2015. India IN: Mortality from CVD, Cancer, Diabetes or CRD between Exact Ages 30 and 70: Female data is updated yearly, averaging 21.200 NA from Dec 2000 (Median) to 2016, with 5 observations. The data reached an all-time high of 23.400 NA in 2000 and a record low of 19.800 NA in 2016. India IN: Mortality from CVD, Cancer, Diabetes or CRD between Exact Ages 30 and 70: Female data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s India – Table IN.World Bank.WDI: Health Statistics. Mortality from CVD, cancer, diabetes or CRD is the percent of 30-year-old-people who would die before their 70th birthday from any of cardiovascular disease, cancer, diabetes, or chronic respiratory disease, assuming that s/he would experience current mortality rates at every age and s/he would not die from any other cause of death (e.g., injuries or HIV/AIDS).; ; World Health Organization, Global Health Observatory Data Repository (http://apps.who.int/ghodata/).; Weighted average;
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered where the books is The unknown Indians : people who quietly changed our world. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Developing data-driven solutions that address real-world problems requires understanding of these problems’ causes and how their interaction affects the outcome–often with only observational data. Causal Bayesian Networks (BN) have been proposed as a powerful method for discovering and representing the causal relationships from observational data as a Directed Acyclic Graph (DAG). BNs could be especially useful for research in global health in Lower and Middle Income Countries, where there is an increasing abundance of observational data that could be harnessed for policy making, program evaluation, and intervention design. However, BNs have not been widely adopted by global health professionals, and in real-world applications, confidence in the results of BNs generally remains inadequate. This is partially due to the inability to validate against some ground truth, as the true DAG is not available. This is especially problematic if a learned DAG conflicts with pre-existing domain doctrine. Here we conceptualize and demonstrate an idea of a “Causal Datasheet” that could approximate and document BN performance expectations for a given dataset, aiming to provide confidence and sample size requirements to practitioners. To generate results for such a Causal Datasheet, a tool was developed which can generate synthetic Bayesian networks and their associated synthetic datasets to mimic real-world datasets. The results given by well-known structure learning algorithms and a novel implementation of the OrderMCMC method using the Quotient Normalized Maximum Likelihood score were recorded. These results were used to populate the Causal Datasheet, and recommendations could be made dependent on whether expected performance met user-defined thresholds. We present our experience in the creation of Causal Datasheets to aid analysis decisions at different stages of the research process. First, one was deployed to help determine the appropriate sample size of a planned study of sexual and reproductive health in Madhya Pradesh, India. Second, a datasheet was created to estimate the performance of an existing maternal health survey we conducted in Uttar Pradesh, India. Third, we validated generated performance estimates and investigated current limitations on the well-known ALARM dataset. Our experience demonstrates the utility of the Causal Datasheet, which can help global health practitioners gain more confidence when applying BNs.
The fourth edition of the Global Findex offers a lens into how people accessed and used financial services during the COVID-19 pandemic, when mobility restrictions and health policies drove increased demand for digital services of all kinds.
The Global Findex is the world's most comprehensive database on financial inclusion. It is also the only global demand-side data source allowing for global and regional cross-country analysis to provide a rigorous and multidimensional picture of how adults save, borrow, make payments, and manage financial risks. Global Findex 2021 data were collected from national representative surveys of about 128,000 adults in more than 120 economies. The latest edition follows the 2011, 2014, and 2017 editions, and it includes a number of new series measuring financial health and resilience and contains more granular data on digital payment adoption, including merchant and government payments.
The Global Findex is an indispensable resource for financial service practitioners, policy makers, researchers, and development professionals.
Excluded populations living in Northeast states and remote islands and Jammu and Kashmir. The excluded areas represent less than 10 percent of the total population.
Individual
Observation data/ratings [obs]
In most developing economies, Global Findex data have traditionally been collected through face-to-face interviews. Surveys are conducted face-to-face in economies where telephone coverage represents less than 80 percent of the population or where in-person surveying is the customary methodology. However, because of ongoing COVID-19 related mobility restrictions, face-to-face interviewing was not possible in some of these economies in 2021. Phone-based surveys were therefore conducted in 67 economies that had been surveyed face-to-face in 2017. These 67 economies were selected for inclusion based on population size, phone penetration rate, COVID-19 infection rates, and the feasibility of executing phone-based methods where Gallup would otherwise conduct face-to-face data collection, while complying with all government-issued guidance throughout the interviewing process. Gallup takes both mobile phone and landline ownership into consideration. According to Gallup World Poll 2019 data, when face-to-face surveys were last carried out in these economies, at least 80 percent of adults in almost all of them reported mobile phone ownership. All samples are probability-based and nationally representative of the resident adult population. Phone surveys were not a viable option in 17 economies that had been part of previous Global Findex surveys, however, because of low mobile phone ownership and surveying restrictions. Data for these economies will be collected in 2022 and released in 2023.
In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households. Each eligible household member is listed, and the hand-held survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.
In traditionally phone-based economies, respondent selection follows the same procedure as in previous years, using random digit dialing or a nationally representative list of phone numbers. In most economies where mobile phone and landline penetration is high, a dual sampling frame is used.
The same respondent selection procedure is applied to the new phone-based economies. Dual frame (landline and mobile phone) random digital dialing is used where landline presence and use are 20 percent or higher based on historical Gallup estimates. Mobile phone random digital dialing is used in economies with limited to no landline presence (less than 20 percent).
For landline respondents in economies where mobile phone or landline penetration is 80 percent or higher, random selection of respondents is achieved by using either the latest birthday or household enumeration method. For mobile phone respondents in these economies or in economies where mobile phone or landline penetration is less than 80 percent, no further selection is performed. At least three attempts are made to reach a person in each household, spread over different days and times of day.
Sample size for India is 3000.
Face-to-face [f2f]
Questionnaires are available on the website.
Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar. 2022. The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19. Washington, DC: World Bank.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains data from the World Bank's data portal. There is also a consolidated country dataset on HDX.
For the 70 percent of the world's poor who live in rural areas, agriculture is the main source of income and employment. But depletion and degradation of land and water pose serious challenges to producing enough food and other agricultural products to sustain livelihoods here and meet the needs of urban populations. Data presented here include measures of agricultural inputs, outputs, and productivity compiled by the UN's Food and Agriculture Organization.
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Indian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Indian English communication.
Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Indian accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Indian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple English speech and language AI applications:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ORBIT (Object Recognition for Blind Image Training) -India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, a home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.
Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.
The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside a ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is True if the object is not present in the image, and the ‘pii_present_issue’ key is True, if there is a personally identifiable information (PII) present in the image. Note, all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.
This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.
REFERENCES:
Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597
microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset
Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641
https://store.poidata.xyz/in Point-of-interest (POI) is defined as a physical entity (such as a business) in a geo location (point) which may be (of interest).
We strive to provide the most accurate, complete and up to date point of interest datasets for all countries of the world. The India POI Dataset is one of our worldwide POI datasets.
This is our process flow:
Our machine learning systems continuously crawl for new POI data
Our geoparsing and geocoding calculates their geo locations
Our categorization systems cleanup and standardize the datasets
Our data pipeline API publishes the datasets on our data store
POI Data is in a constant flux - especially so during times of drastic change such as the Covid-19 pandemic.
Every minute worldwide on an average day over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist.
In today's interconnected world, of the approximately 200 million POIs worldwide, over 94% have a public online presence. As a new POI comes into existence its information will appear very quickly in location based social networks (LBSNs), other social media, pictures, websites, blogs, press releases. Soon after that, our state-of-the-art POI Information retrieval system will pick it up.
We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via a recurring payment plan on our data update pipeline.
The main differentiators between us vs the competition are our flexible licensing terms and our data freshness.
The core attribute coverage for India is as follows: Poi Field Data Coverage (%) poi_name 100 brand 3 poi_tel 17 formatted_address 100 main_category 100 latitude 100 longitude 100 neighborhood 7 source_url 24 email 2 opening_hours 26
The dataset may be viewed online at https://store.poidata.xyz/in and a data sample may be downloaded at https://store.poidata.xyz/datafiles/in_sample.csv
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The India Lights platform shows light output at night for 20 years for 600,000 villages across India. The Defense Meteorological Satellite Program (DMSP) has taken pictures of the Earth every night from 1993 to 2013. Researchers at the University of Michigan, in collaboration with the World Bank, used the DMSP images to extract the data you see on the India Lights platform. Each point you see on the map represents the light output of a specific village at a specific point in time. On the district level, the map also allows you to filter to view villages that have participated in India’s flagship electrification program. This tremendous trove of data can be used to look at changes in light output, which can be used to complement research about electrification in the country. About the Data: The DMSP raster images have a resolution of 30 arc-seconds, equal to roughly 1 square kilometer at the equator. Each pixel of the image is assigned a number on a relative scale from 0 to 63, with 0 indicating no light output and 63 indicating the highest level of output. This number is relative and may change depending on the gain settings of the satellite’s sensor, which constantly adjusts to current conditions as it takes pictures throughout the day and at night. Methodology To derive a single measurement, the light output values were extracted from the raster image for each date for the pixels that correspond to each village's approximate latitude and longitude coordinates. We then processed the data through a series of filtering and aggregation steps. First, we filtered out data with too much cloud cover and solar glare, according to recommendations from the National Oceanic and Atmospheric Administration (NOAA). We aggregated the resulting 4.4 billion data points by taking the median measurement for each village over the course of a month. We adjusted for differences among satellites using a multiple regression on year and satellite to isolate the effect of each satellite. To analyze data on the state and district level, we also determined the median village light output within each administrative boundary for each month in the twenty-year time span. These monthly aggregates for each village, district, and state are the data that we have made accessible through the API. To generate the map and light curve visualizations that are presented on this site, we performed some additional data processing. For the light curves, we used a rolling average to smooth out the noise due to wide fluctuations inherent in satellite measurements. For the map, we took a random sample of 10% of the villages, stratified over districts to ensure good coverage across regions of varying village density. Acknowledgments The India Lights project is a collaboration between Development Seed, The World Bank, and Dr. Brian Min at the University of Michigan. •Satellite base map © Mapbox. •India village locations derived from India VillageMap © 2011-2015 ML Infomap. •India population data and district boundaries © 2011-2015 ML Infomap. •Data for reference map of Uttar Pradesh, India, from Natural Earth Data •Banerjee, Sudeshna Ghosh; Barnes, Douglas; Singh, Bipul; Mayer, Kristy; Samad, Hussain. 2014. Power for all : electricity access challenge in India. A World Bank study. Washington, DC ; World Bank Group. •Hsu, Feng-Chi, Kimberly Baugh, Tilottama Ghosh, Mikhail Zhizhin, and Christopher Elvidge. "DMSP-OLS Radiance Calibrated Nighttime Lights Time Series with Intercalibration." Remote Sensing 7.2 (2015): 1855-876. Web. •Min, Brian. Monitoring Rural Electrification by Satellite. Tech. World Bank, 30 Dec. 2014. Web. •Min, Brian. Power and the Vote: Elections and Electricity in the Developing World. New York and Cambridge: Cambridge University Press. 2015. •Min, Brian, and Kwawu Mensan Gaba. Tracking Electrification in Vietnam Using Nighttime Lights. Remote Sensing 6.10 (2014): 9511-529. •Min, Brian, and Kwawu Mensan Gaba, Ousmane Fall Sarr, Alassane Agalassou. Detection of Rural Electrification in Africa using DMSP-OLS Night Lights Imagery. International Journal of Remote Sensing 34.22 (2013):8118-8141. Disclaimer Country borders or names do not necessarily reflect the World Bank Group's official position. The map is for illustrative purposes and does not imply the expression of any opinion on the part of the World Bank, concerning the legal status of any country or territory or concerning the delimitation of frontiers or boundaries.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Looking back 45 years or so, progress against poverty in India has been highly uneven over time and space. It took 20 years for the national poverty rate to fall below—and stay below—its value in the early 1950s. And trend rates of poverty reduction have differed appreciably between states. This research project aimed to understand what influence economy-wide and sectoral factors have played in the evolution of poverty measures for India since the 1950s, and to draw lessons for the future. This database contains detailed statistics on a wide range of topics in India. The data are presented at the state level and at the all-India level separately. The database uses published information to construct comprehensive series in six subject blocks. Period coverage is roughly from 1950 to 1994. The database contains 30 spreadsheets and 89 text files (ASCII) that are grouped into the six subject blocks. The formats and sizes of the 30 spreadsheets vary considerably. The list of variables included: . Expenditures (distribution) . National Accounts . Prices Wages . Population . Rainfall
Financial inclusion is critical in reducing poverty and achieving inclusive economic growth. When people can participate in the financial system, they are better able to start and expand businesses, invest in their children’s education, and absorb financial shocks. Yet prior to 2011, little was known about the extent of financial inclusion and the degree to which such groups as the poor, women, and rural residents were excluded from formal financial systems.
By collecting detailed indicators about how adults around the world manage their day-to-day finances, the Global Findex allows policy makers, researchers, businesses, and development practitioners to track how the use of financial services has changed over time. The database can also be used to identify gaps in access to the formal financial system and design policies to expand financial inclusion.
Sample excludes Northeast states and remote islands, representing less than 10% of the population.
Individuals
The target population is the civilian, non-institutionalized population 15 years and above.
Observation data/ratings [obs]
The indicators in the 2017 Global Findex database are drawn from survey data covering almost 150,000 people in 144 economies-representing more than 97 percent of the world’s population (see table A.1 of the Global Findex Database 2017 Report for a list of the economies included). The survey was carried out over the 2017 calendar year by Gallup, Inc., as part of its Gallup World Poll, which since 2005 has annually conducted surveys of approximately 1,000 people in each of more than 160 economies and in over 150 languages, using randomly selected, nationally representative samples. The target population is the entire civilian, noninstitutionalized population age 15 and above. Interview procedure Surveys are conducted face to face in economies where telephone coverage represents less than 80 percent of the population or where this is the customary methodology. In most economies the fieldwork is completed in two to four weeks.
In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used.
Respondents are randomly selected within the selected households. Each eligible household member is listed and the handheld survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer’s gender.
In economies where telephone interviewing is employed, random digit dialing or a nationally representative list of phone numbers is used. In most economies where cell phone penetration is high, a dual sampling frame is used. Random selection of respondents is achieved by using either the latest birthday or household enumeration method. At least three attempts are made to reach a person in each household, spread over different days and times of day.
The sample size was 3000.
Computer Assisted Personal Interview [capi]
The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup Inc. also provided valuable input. The questionnaire was piloted in multiple countries, using focus groups, cognitive interviews, and field testing. The questionnaire is available in more than 140 languages upon request.
Questions on cash on delivery, saving using an informal savings club or person outside the family, domestic remittances, and agricultural payments are only asked in developing economies and few other selected countries. The question on mobile money accounts was only asked in economies that were part of the Mobile Money for the Unbanked (MMU) database of the GSMA at the time the interviews were being held.
Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. 2018. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution. Washington, DC: World Bank
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
India IN: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data was reported at 3.700 % in 2024. This records an increase from the previous number of 3.400 % for 2023. India IN: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data is updated yearly, averaging 2.300 % from Dec 2000 (Median) to 2024, with 25 observations. The data reached an all-time high of 3.700 % in 2024 and a record low of 2.100 % in 2013. India IN: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s India – Table IN.World Bank.WDI: Social: Health Statistics. Prevalence of overweight children is the percentage of children under age 5 whose weight for height is more than two standard deviations above the median for the international reference population of the corresponding age as established by the WHO's 2006 Child Growth Standards.;UNICEF, WHO, World Bank: Joint child Malnutrition Estimates (JME).;Weighted average;Once considered only a high-income economy problem, overweight children have become a growing concern in developing countries. Research shows an association between childhood obesity and a high prevalence of diabetes, respiratory disease, high blood pressure, and psychosocial and orthopedic disorders (de Onis and Blössner 2003). Childhood obesity is associated with a higher chance of obesity, premature death, and disability in adulthood. In addition to increased future risks, obese children experience breathing difficulties and increased risk of fractures, hypertension, early markers of cardiovascular disease, insulin resistance, and psychological effects. Children in low- and middle-income countries are more vulnerable to inadequate nutrition before birth and in infancy and early childhood. Many of these children are exposed to high-fat, high-sugar, high-salt, calorie-dense, micronutrient-poor foods, which tend be lower in cost than more nutritious foods. These dietary patterns, in conjunction with low levels of physical activity, result in sharp increases in childhood obesity, while under-nutrition continues. Estimates are modeled estimates produced by the JME. Primary data sources of the anthropometric measurements are national surveys. These surveys are administered sporadically, resulting in sparse data for many countries. Furthermore, the trend of the indicators over time is usually not a straight line and varies by country. Tracking the current level and progress of indicators helps determine if countries are on track to meet certain thresholds, such as those indicated in the SDGs. Thus the JME developed statistical models and produced the modeled estimates.
Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.
The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.
The sample excludes the Northeast states and remote islands. The excluded area represents approximately 10% of the total adult population.
Individual
The target population is the civilian, non-institutionalized population 15 years and above.
Sample survey data [ssd]
The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.
Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.
Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to teach a person in each household, spread over different days and times of year.
The sample size in India was 3,518 individuals.
Face-to-face [f2f]
The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.
Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance insitutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.
Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in India was worth 3912.69 billion US dollars in 2024, according to official data from the World Bank. The GDP value of India represents 3.69 percent of the world economy. This dataset provides the latest reported value for - India GDP - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Different countries have different health outcomes that are in part due to the way respective health systems perform. Regardless of the type of health system, individuals will have health and non-health expectations in terms of how the institution responds to their needs. In many countries, however, health systems do not perform effectively and this is in part due to lack of information on health system performance, and on the different service providers. The aim of the WHO World Health Survey is to provide empirical data to the national health information systems so that there is a better monitoring of health of the people, responsiveness of health systems and measurement of health-related parameters. The overall aims of the survey is to examine the way populations report their health, understand how people value health states, measure the performance of health systems in relation to responsiveness and gather information on modes and extents of payment for health encounters through a nationally representative population based community survey. In addition, it addresses various areas such as health care expenditures, adult mortality, birth history, various risk factors, assessment of main chronic health conditions and the coverage of health interventions, in specific additional modules. The objectives of the survey programme are to: 1. develop a means of providing valid, reliable and comparable information, at low cost, to supplement the information provided by routine health information systems. 2. build the evidence base necessary for policy-makers to monitor if health systems are achieving the desired goals, and to assess if additional investment in health is achieving the desired outcomes. 3. provide policy-makers with the evidence they need to adjust their policies, strategies and programmes as necessary.
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains Food Prices data for India, sourced from the World Food Programme Price Database. The World Food Programme Price Database covers foods such as maize, rice, beans, fish, and sugar for 98 countries and some 3000 markets. It is updated weekly but contains to a large extent monthly data. The data goes back as far as 1992 for a few countries, although many countries started reporting from 2003 or thereafter.