https://www.pewresearch.org/about/terms-and-conditions/
Pew Research Center conducted face-to-face surveys among 29,999 adults (ages 18 and older) across 26 Indian states and three union territories in 17 languages. The sample includes interviews with 22,975 Hindus, 3,336 Muslims, 1,782 Sikhs, 1,011 Christians, 719 Buddhists and 109 Jains. An additional 67 respondents belong to other religions or are religiously unaffiliated. Six groups were targeted for oversampling as part of the survey design: Muslims, Christians, Sikhs, Buddhists, Jains and those living in the Northeast region. Interviews were conducted under the direction of RTI International from November 17, 2019, to March 23, 2020. Data collection used computer-assisted personal interviews (CAPI) after random selection of households.
This project was produced by Pew Research Center as part of the Pew-Templeton Global Religious Futures project, which analyzes religious change and its impact on societies around the world. Funding for the Global Religious Futures project comes from The Pew Charitable Trusts and the John Templeton Foundation.
Two reports focused on the findings from this data:
• Religion in India: Tolerance and Segregation: https://www.pewresearch.org/religion/2021/06/29/religion-in-india-tolerance-and-segregation/
• How Indians View Gender Roles in Families and Society: https://www.pewresearch.org/religion/2022/03/02/how-indians-view-gender-roles-in-families-and-society/
Background
In India, acute respiratory infections (ARIs) are a leading cause of mortality in children under 5 years. Mapping the hotspots of ARIs and the associated risk factors can help understand their association at the district level across India.
Methods
Data on ARIs in children under 5 years and household variables (unclean fuel, improved sanitation, mean maternal BMI, mean household size, mean number of children, median months of breastfeeding the children, percentage of poor households, diarrhea in children, low birth weight, tobacco use, and immunization status of children) were obtained from the National Family Health Survey-4. Surface and ground-monitored PM2.5 and PM10 datasets were collected from the Global Estimates and National Ambient Air Quality Monitoring Programme. Population density and illiteracy data were extracted from the Census of India. The geographic information system was used for mapping, and ARI hotspots were identified using the Getis-Ord Gi* spatial statistic. The quasi-Poisson regression model was used to estimate the association between ARI and household, children, maternal, environmental, and demographic factors.
Results
Acute respiratory infection hotspots were predominantly seen in the north Indian states/UTs of Uttar Pradesh, Bihar, Delhi, Haryana, Punjab, and Chandigarh, and also in the border districts of Uttarakhand, Himachal Pradesh, and Jammu and Kashmir. There is a substantial overlap among PM2.5, PM10, population density, tobacco smoking, and unclean fuel use with hotspots of ARI. The quasi-Poisson regression analysis showed that PM2.5, illiteracy levels, diarrhea in children, and maternal body mass index were associated with ARI.
Conclusion
To decrease ARI in children, urgent interventions are required to reduce the levels of PM2.5 and PM10 (major environmental pollutants) in the hotspot districts. Furthermore, improving sanitation, literacy levels, using clean cooking fuel, and curbing indoor smoking may minimize the risk of ARI in children.
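The hotspot step in the methods above relies on the Getis-Ord Gi* statistic. The following is a minimal sketch of the standard Gi* z-score formula, not the study's own code: the function name, the example rates, and the binary neighbour weights are all hypothetical, since the abstract does not say which spatial weights matrix was used.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for every location.

    values  -- list of n observed rates (e.g. ARI prevalence per district)
    weights -- n x n list of spatial weights; weights[i][i] should be
               non-zero because Gi* includes the focal location itself
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the observed values
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        # Weighted local sum compared with what the global mean would predict
        numerator = sum(wj * vj for wj, vj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical example: five districts on a line, each neighbouring
# itself and the districts immediately next to it (binary weights).
rates = [2.0, 3.0, 2.5, 9.0, 10.0]
w = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
z = getis_ord_gi_star(rates, w)
```

Large positive scores mark clusters of high values (hotspots) and large negative scores mark clusters of low values; in this made-up example the last district has the highest score.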
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the household distribution across 16 income brackets among four distinct age groups in Indian Trail: under 25 years, 25-44 years, 45-64 years, and over 65 years. The dataset highlights the variation in household income, offering insights into economic trends and disparities within different age categories and aiding data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Income brackets:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Indian Trail median household income by age. You can refer to the same here
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Tamil Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.
This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.
Participant Diversity:
- Speakers: 50+ native Tamil speakers from the FutureBeeAI Community.
- Regions: Ensures a balanced representation of Tamil Nadu accents, dialects, and demographics.
- Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
Recording Nature: Scripted wake word and command type of audio recordings.
- Duration: Average duration of 5 to 20 seconds per audio recording.
- Formats: WAV format with a mono channel and a bit depth of 16 bits. The dataset contains recordings at sample rates of 16 kHz and 48 kHz.
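The format details above (mono, 16-bit, 16 kHz or 48 kHz) can be checked before training with a few lines of standard-library Python. This is a sketch under stated assumptions: `check_wav` and `make_silent_wav` are hypothetical helper names, and the silent in-memory file only stands in for a real recording from the dataset.

```python
import io
import wave

ALLOWED_SAMPLE_RATES = {16000, 48000}  # rates named in the description

def check_wav(fileobj):
    """Return a list of problems found in a WAV file (empty if none)."""
    problems = []
    with wave.open(fileobj, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("not mono")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bits
            problems.append("not 16-bit")
        if wav.getframerate() not in ALLOWED_SAMPLE_RATES:
            problems.append("unexpected sample rate")
    return problems

def make_silent_wav(rate, channels=1, seconds=1):
    """Build an in-memory 16-bit WAV of silence, used only as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * channels * seconds)
    buf.seek(0)
    return buf

print(check_wav(make_silent_wav(16000)))
print(check_wav(make_silent_wav(44100)))
```

`check_wav` also accepts a file path, because `wave.open` takes either a path or a file-like object.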
Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.
Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.
Different Cars: Data collection was carried out in different types and models of cars.
Different Types of Voice Commands:
- Navigational Voice Commands
- Mobile Control Voice Commands
- Car Control Voice Commands
- Multimedia & Entertainment Commands
- General, Question Answer, Search Commands
Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.
- Morning
- Afternoon
- Evening
Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:
- Noise Level: Silent, Low Noise, Moderate Noise, High Noise
- Parking Location: Indoor, Outdoor
- Car Windows: Open, Closed
- Car AC: On, Off
- Car Engine: On, Off
- Car Movement: Stationary, Moving
The dataset provides comprehensive metadata for each audio recording and participant:
Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.
Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.
This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Tamil voice assistant speech recognition models.
This In-car Speech Dataset is a valuable resource for various applications in the field of in-car voice recognition and AI-driven voice technology. This dataset can be leveraged to enhance the performance and functionality of voice-activated systems across different domains.
Speech Recognition Model Training: Provides high-quality audio data for training models to accurately recognize and respond to in-car voice commands.
Safety and Emergency Response: Supports the development of systems that recognize and respond to emergency commands and safety alerts.
Driver Assistance: Facilitates the creation of advanced driver-assistance systems (ADAS) that leverage voice commands for hands-free operation.
Our proprietary data collection platform, “Yugo,” was used throughout the process of this dataset creation.
Throughout the data collection process, the data remained within our secure platform and did not leave our environment, ensuring data security and confidentiality.
The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
It does not include any personally identifiable information about any participant, which makes the dataset safe to use.
Understanding the importance of diverse environments for robust voice assistant models, our in-car voice dataset is regularly updated with new audio data captured in various real-world conditions.
Customization & Custom Collection Options:
- Environmental Conditions: Custom collection in specific environmental conditions upon request.
- Sample Rates: Customizable from 8kHz to 48kHz.
- Diverse Pace: Custom collection can be done at a diverse pace upon request.
- Device Specific: Recording can be done with the specific mobile brand or operating system.
This Tamil In-car audio dataset is created by FutureBeeAI and is available for commercial use.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset offers a comprehensive overview of Indian School Education Statistics, covering the years 2021-2022. It provides a valuable resource for individuals embarking on their Data Science journey by consolidating various datasets from the Indian Government into a single, easily accessible source. The dataset is available in seven separate .csv files, each with its distinct focus, enabling users to explore diverse aspects of the education landscape in India.
This dataset is a treasure trove of information, offering a window into the dynamic landscape of education in India and its evolution over time. By delving into this dataset, you can unlock answers to various pressing questions and tackle pivotal issues, including:
Sourced from the Open Government Data (OGD) Platform India, this dataset not only serves as a valuable resource for beginners in their Data Science journey but also presents an array of opportunities for in-depth analysis and research within the realm of Indian education.
The dataset contains the following features:
• Sl. No - Index of the dataset
• State - The seven Indian states covered
• District - District within each state
• Block - Block within each district
• Village - Village within each district
• TYPE - Type of tube well
• SOURCE - Central Groundwater Board (CGWB) and State Groundwater Board (SGWB)
• Site Name
• Latitude
• Longitude
• Well Depth
• Aquifer
• Level Depth Ratio
• Pre_2015
• Pst_2015
• Pre_2016
• Pst_2016
• Pre_2017
• Pst_2017
• Pre_2018
• Pst_2018
• Pre_2019
• Pst_2019
• Final Count
• Pre Diff 19 15
• Pst Diff 19 15
• Diff 2015
• Diff 2016
• Diff 2017
• Diff 2018
• Diff 2019
• Avg Diff
• Avg Pre
• Avg Post
• Avg Level
• Sign Diff 15
• Sign Diff 16
• Sign Diff 17
• Sign Diff 18
• Sign Diff 19
• Sum Signed Diff
The main aim is to find the criticality of the well depth across these states. Many states of India are suffering from acute water shortage, so on the basis of this analysis we will figure out which wells are in a dangerous condition.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This dataset is based on the report 'Crimes against Women (2022)' by the National Crime Records Bureau. It contains the number of cases registered across all Indian States/UTs for crimes committed against women (adults and minors) that are recognizable within the Indian Penal Code. The crimes covered in the dataset are:
1. Murder with Rape/Gang Rape
2. Dowry Deaths (Sec. 304B IPC)
3. Abetment to Suicide of Women (Sec. 305/306 IPC)
4. Miscarriage (Sec. 313 & 314 IPC)
5. Acid Attack (Sec. 326A IPC)
6. Attempt to Acid Attack (Sec. 326B IPC)
7. Cruelty by Husband or his relatives (Sec. 498 A IPC)
8. Kidnapping & Abduction of Women
9. Human Trafficking (Sec. 370 & 370A IPC)
10. Selling of Minor Girls (Sec. 372 IPC)
11. Buying of Minor Girls (Sec. 373 IPC)
12. Rape (Sec. 376 IPC)
13. Attempt to Commit Rape (Sec. 376/511 IPC)
14. Assault on Women with Intent to Outrage her Modesty (Sec. 354 IPC)
15. Insult to the Modesty of Women (Sec. 509 IPC)
16. Dowry Prohibition Act, 1961
17. Immoral Traffic (Prevention) Act 1956 (Women Victims cases only)
18. Protection of Women from Domestic Violence Act
19. Cyber Crimes/Information Technology Act (Women Centric Crimes only)
20. Protection of Children from Sexual Offences Act (Girl Child Victims only)
21. Indecent Representation of Women (Prohibition) Act, 1986
When precipitation falls on the surface of the Earth, much of it is captured in storage (e.g. lakes, aquifers, soil moisture, snowpack, and vegetation). Precipitation that exceeds the storage capacity of the landscape becomes runoff, which flows into river systems. Overland flow is the most visible form of runoff, causing erosion and flash floods, but subsurface flow is the larger contributor in many watersheds. Subsurface flow can emerge on the surface through springs, or more commonly, seep into rivers and lakes through their banks. In urban areas, impervious land cover drastically increases the amount of surface runoff generated, which sweeps trash and urban debris into waterways and increases the likelihood and severity of flash floods. In agricultural areas, surface or subsurface runoff can carry excess salts and nutrients, especially nitrogen and phosphorus. This map contains a historical record showing the amount of runoff generated each month from March 2000 to present. It is reported in millimeters, so multiply by a surface area to calculate the total volume of runoff.
Dataset Summary
The GLDAS Runoff layer is a time-enabled image service that shows average monthly runoff from 2000 to the present measured in millimeters. It is calculated by NASA using the Noah land surface model, run at 0.25 degree spatial resolution using satellite and ground-based observational data from the Global Land Data Assimilation System (GLDAS-1). The model is run with 3-hourly time steps and aggregated into monthly averages. Review the complete list of model inputs, explore the output data (in GRIB format), and see the full Hydrology Catalog for all related data and information!
What can you do with this layer?
This layer is suitable for both visualization and analysis. It can be used in ArcGIS Online in web maps and applications and can be used in ArcGIS Desktop. It is useful for scientific modeling, but only at global scales.
Time: This is a time-enabled layer. It shows the total runoff generated during the map's time extent, or if time animation is disabled, a time range can be set using the layer's multidimensional filter. The map shows the sum of all months in the time extent. Minimum temporal resolution is one month; maximum is one year.
Variables: This layer has two variables: surface flow and subsurface flow. By default the two are summed, but you can view either by itself using the multidimensional filter. You must disable time animation on the layer before using its multidimensional filter.
Important: You must switch from the cartographic renderer to the analytic renderer in the processing template tab in the layer properties window before using this layer as an input to geoprocessing tools.
This layer has query, identify, and export image services available. This layer is part of a larger collection of earth observation maps that you can use to perform a wide variety of mapping and analysis tasks.
The Living Atlas of the World provides an easy way to explore the earth observation layers and many other beautiful and authoritative maps on hundreds of topics.
GeoNet is a good resource for learning more about earth observation layers and the Living Atlas of the World. Follow the Living Atlas on GeoNet.
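The runoff layer reports depth in millimeters, and the description says to multiply by a surface area to get a volume. A minimal sketch of that unit conversion follows; the function name and the example catchment size are hypothetical, and the input is assumed to be a runoff depth already read from the layer.

```python
def runoff_volume_m3(depth_mm, area_km2):
    """Convert a runoff depth over an area into a volume in cubic metres.

    1 mm of runoff over 1 km^2 is 0.001 m x 1,000,000 m^2 = 1,000 m^3,
    so the two unit conversions collapse into a single factor of 1,000.
    """
    return depth_mm * area_km2 * 1000.0

# Hypothetical example: 12 mm of monthly runoff over a 250 km^2 catchment
volume = runoff_volume_m3(12, 250)
```

Because the layer is intended for global-scale use, a volume computed for a small catchment should be treated as a rough estimate.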
Estimated total number of people per grid cell. The dataset is available to download in GeoTIFF format at a resolution of 3 arc-seconds (approximately 100 m at the equator). The projection is Geographic Coordinate System, WGS84. The units are number of people per pixel. The mapping approach is Random Forest-based dasymetric redistribution.
More information can be found in the Release Statement
Please note that these data represent 2025 Alpha release versions, constructed in September 2025
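The "approximately 100 m at the equator" figure for the population grid can be checked with a short calculation. This sketch assumes the WGS84 equatorial radius of 6,378,137 m; the function name is hypothetical.

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6378137.0

def arcsec_to_metres_at_equator(arcsec):
    """Ground length of an east-west angle of `arcsec` at the equator."""
    metres_per_degree = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / 360
    return arcsec / 3600 * metres_per_degree

cell_size = arcsec_to_metres_at_equator(3)
print(round(cell_size, 1))
```

The result is a little under 100 m. Away from the equator the east-west extent of a cell shrinks with the cosine of the latitude, so the ground area of a pixel is not constant across the grid.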
SUMMARY:
Vumonic provides its clients with email receipt datasets on weekly, monthly, or quarterly subscriptions, for any online consumer vertical. We gain consent-based access to our users' email inboxes through our own proprietary apps, from which we gather and extract all the email receipts and put them into a structured format for our clients. We currently have over 1M users in our India panel.
If you are not familiar with email receipt data, it provides item and user-level transaction information (all PII-wiped), which allows for deep granular analysis of things like market share, growth, competitive intelligence, and more.
VERTICALS:
PRICING/QUOTE:
Our email receipt data is priced market-rate based on the requirement. To give a quote, all we need to know is:
Send us this info and we can answer any questions you have, provide a sample, and more.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Bengali Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.
This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.
Participant Diversity:
- Speakers: 50+ native Bengali speakers from the FutureBeeAI Community.
- Regions: Ensures a balanced representation of West Bengal accents, dialects, and demographics.
- Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
Recording Nature: Scripted wake word and command type of audio recordings.
- Duration: Average duration of 5 to 20 seconds per audio recording.
- Formats: WAV format with a mono channel and a bit depth of 16 bits. The dataset contains recordings at sample rates of 16 kHz and 48 kHz.
Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.
Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.
Different Cars: Data collection was carried out in different types and models of cars.
Different Types of Voice Commands:
- Navigational Voice Commands
- Mobile Control Voice Commands
- Car Control Voice Commands
- Multimedia & Entertainment Commands
- General, Question Answer, Search Commands
Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.
- Morning
- Afternoon
- Evening
Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:
- Noise Level: Silent, Low Noise, Moderate Noise, High Noise
- Parking Location: Indoor, Outdoor
- Car Windows: Open, Closed
- Car AC: On, Off
- Car Engine: On, Off
- Car Movement: Stationary, Moving
The dataset provides comprehensive metadata for each audio recording and participant:
Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.
Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.
This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Bengali voice assistant speech recognition models.
This In-car Speech Dataset is a valuable resource for various applications in the field of in-car voice recognition and AI-driven voice technology. This dataset can be leveraged to enhance the performance and functionality of voice-activated systems across different domains.
Speech Recognition Model Training: Provides high-quality audio data for training models to accurately recognize and respond to in-car voice commands.
Safety and Emergency Response: Supports the development of systems that recognize and respond to emergency commands and safety alerts.
Driver Assistance: Facilitates the creation of advanced driver-assistance systems (ADAS) that leverage voice commands for hands-free operation.
Our proprietary data collection platform, “Yugo,” was used throughout the process of this dataset creation.
Throughout the data collection process, the data remained within our secure platform and did not leave our environment, ensuring data security and confidentiality.
The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
It does not include any personally identifiable information about any participant, which makes the dataset safe to use.
Understanding the importance of diverse environments for robust voice assistant models, our in-car voice dataset is regularly updated with new audio data captured in various real-world conditions.
Customization & Custom Collection Options:
- Environmental Conditions: Custom collection in specific environmental conditions upon request.
- Sample Rates: Customizable from 8kHz to 48kHz.
- Diverse Pace: Custom collection can be done at a diverse pace upon request.
- Device Specific: Recording can be done with the specific mobile brand or operating system.
This Bengali In-car audio dataset is created by FutureBeeAI and is available for commercial use.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
India Proportion of People Living Below 50 Percent Of Median Income: % data was reported at 9.800 % in 2021. This records a decrease from the previous number of 10.000 % for 2020. India Proportion of People Living Below 50 Percent Of Median Income: % data is updated yearly, averaging 6.200 % from Dec 1977 (Median) to 2021, with 14 observations. The data reached an all-time high of 10.300 % in 2019 and a record low of 5.100 % in 2004. India Proportion of People Living Below 50 Percent Of Median Income: % data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s India – Table IN.World Bank.WDI: Social: Poverty and Inequality. The percentage of people in the population who live in households whose per capita income or consumption is below half of the median income or consumption per capita. The median is measured at 2017 Purchasing Power Parity (PPP) using the Poverty and Inequality Platform (http://www.pip.worldbank.org). For some countries, medians are not reported due to grouped and/or confidential data. The reference year is the year in which the underlying household survey data was collected. In cases for which the data collection period bridged two calendar years, the first year in which data were collected is reported.;World Bank, Poverty and Inequality Platform. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. Data for high-income economies are mostly from the Luxembourg Income Study database. For more information and methodology, please see http://pip.worldbank.org.;;The World Bank’s internationally comparable poverty monitoring database now draws on income or detailed consumption data from more than 2000 household surveys across 169 countries. See the Poverty and Inequality Platform (PIP) for details (www.pip.worldbank.org).
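The indicator definition above (share of people whose per capita income or consumption is below half of the median) can be illustrated with a small calculation. This is only a sketch: the function name and the values are hypothetical, and it ignores the survey weights and PPP adjustment that the World Bank's estimates rely on.

```python
from statistics import median

def share_below_half_median(values):
    """Percent of observations strictly below 50% of the median value."""
    threshold = 0.5 * median(values)
    below = sum(1 for v in values if v < threshold)
    return 100.0 * below / len(values)

# Hypothetical per capita incomes for five people
incomes = [10, 20, 30, 40, 100]
print(share_below_half_median(incomes))
```

Here the median is 30, so the threshold is 15 and one of the five people falls below it.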
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
India People Practicing Open Defecation: Rural: % of Rural Population data was reported at 17.006 % in 2022. This records a decrease from the previous number of 20.380 % for 2021. India People Practicing Open Defecation: Rural: % of Rural Population data is updated yearly, averaging 54.448 % from Dec 2000 (Median) to 2022, with 23 observations. The data reached an all-time high of 91.486 % in 2000 and a record low of 17.006 % in 2022. India People Practicing Open Defecation: Rural: % of Rural Population data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s India – Table IN.World Bank.WDI: Social: Access to Services. People practicing open defecation refers to the percentage of the population defecating in the open, such as in fields, forest, bushes, open bodies of water, on beaches, in other open spaces or disposed of with solid waste.;WHO/UNICEF Joint Monitoring Programme (JMP) for Water Supply, Sanitation and Hygiene (washdata.org).;Weighted average;This is a disaggregated indicator for Sustainable Development Goal 6.2.1 [https://unstats.un.org/sdgs/metadata/].
https://store.poidata.xyz/in
A point of interest (POI) is defined as a physical entity (such as a business) at a geographic location (point) which may be of interest.
We strive to provide the most accurate, complete and up to date point of interest datasets for all countries of the world. The India POI Dataset is one of our worldwide POI datasets.
This is our process flow:
Our machine learning systems continuously crawl for new POI data
Our geoparsing and geocoding calculates their geo locations
Our categorization systems clean up and standardize the datasets
Our data pipeline API publishes the datasets on our data store
POI Data is in a constant flux - especially so during times of drastic change such as the Covid-19 pandemic.
Every minute worldwide on an average day over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist.
In today's interconnected world, of the approximately 200 million POIs worldwide, over 94% have a public online presence. As a new POI comes into existence its information will appear very quickly in location based social networks (LBSNs), other social media, pictures, websites, blogs, press releases. Soon after that, our state-of-the-art POI Information retrieval system will pick it up.
We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via a recurring payment plan on our data update pipeline.
The main differentiators between us vs the competition are our flexible licensing terms and our data freshness.
The core attribute coverage for India is as follows:
POI field            Data coverage (%)
poi_name             100
brand                3
poi_tel              17
formatted_address    100
main_category        100
latitude             100
longitude            100
neighborhood         7
source_url           24
email                2
opening_hours        26
The dataset may be viewed online at https://store.poidata.xyz/in and a data sample may be downloaded at https://store.poidata.xyz/datafiles/in_sample.csv
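Coverage figures like those listed for the POI fields can be recomputed from any CSV extract by counting non-empty values per column. The sketch below assumes a CSV whose header row uses the field names above; the function name and the three-column sample rows are made up for illustration and are not taken from the real sample file.

```python
import csv
import io

def field_coverage(csv_text):
    """Return {column: percent of rows with a non-empty value}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    coverage = {}
    for column in rows[0].keys():
        # Treat missing and whitespace-only cells as empty
        filled = sum(1 for row in rows if (row[column] or "").strip())
        coverage[column] = round(100 * filled / len(rows))
    return coverage

# Made-up rows, for illustration only
sample = """poi_name,brand,poi_tel
Cafe A,,+91-000
Shop B,BrandX,
Clinic C,,
Store D,,
"""
print(field_coverage(sample))
```

To run it on a downloaded file, read the file's text and pass it to `field_coverage`.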
This Location Data & Foot Traffic dataset, available for all countries, includes enriched raw mobility data and visitation at POIs to answer questions such as:
- How often do people visit a location? (daily, monthly, absolute, and averages)
- What type of places do they visit? (parks, schools, hospitals, etc.)
- Which social characteristics do people have in a certain POI? Breakdown by type: residents, workers, visitors.
- What is their mobility like during night hours and day hours?
- What is the frequency of visits, partitioned by day of the week and hour of the day?
Extra insights
- Visitors' relative income level.
- Visitors' preferences as derived from their visits to shopping, parks, sports facilities, churches, among others.
Overview & Key Concepts
Each record corresponds to a ping from a mobile device, at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The entire process complies with applicable privacy laws.
We clean and process these massive datasets with a number of complex, computer-intensive calculations to make them easier to use in different data science and machine learning applications, especially those related to understanding customer behavior.
Featured attributes of the data
Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationary observations.
Night base of the device: we calculate the approximated location of where the device spends the night, which is usually their home neighborhood.
Day base of the device: we calculate the most common daylight location during weekdays, which is usually their work location.
Income level: we use the night neighborhood of the device, and intersect it with available socioeconomic data, to infer the device’s income level. Depending on the country, and the availability of good census data, this figure ranges from a relative wealth index to a currency-calculated income.
POI visited: we intersect each observation with a number of POI databases, to estimate check-ins to different locations. POI databases can vary significantly, in scope and depth, between countries.
Category of visited POI: for each observation that can be attributable to a POI, we also include a standardized location category (park, hospital, among others).
Coverage: Worldwide.
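The device speed attribute described above is derived from the distance between consecutive pings. One common way to do this is great-circle (haversine) distance divided by elapsed time; the sketch below uses that approach as an assumption, since the provider's actual method is not documented here. The function names and the `(unix_time_s, lat, lon)` ping layout are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def speed_mps(prev_ping, ping):
    """Speed in m/s between two pings given as (unix_time_s, lat, lon).

    Returns None when no time has elapsed, so callers can skip the pair.
    """
    dt = ping[0] - prev_ping[0]
    if dt <= 0:
        return None
    distance = haversine_m(prev_ping[1], prev_ping[2], ping[1], ping[2])
    return distance / dt

# Hypothetical pings one hour apart, one degree of longitude apart on the equator
s = speed_mps((0, 0.0, 0.0), (3600, 0.0, 1.0))
```

Thresholds on the resulting speed could then separate stationary, pedestrian, and vehicle observations; the cut-off values would need to be chosen for the data at hand.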
Delivery schemas
We can deliver the data in three different formats:
Full dataset: one record per mobile ping. These datasets are very large, and should only be consumed by experienced teams with large computing budgets.
Visitation stream: one record per attributable visit. This dataset is considerably smaller than the full one but retains most of the more valuable elements in the dataset. This helps understand who visited a specific POI, characterize and understand the consumer's behavior.
Audience profiles: one record per mobile device in a given period of time (usually monthly). All the visitation stream is aggregated by category. This is the most condensed version of the dataset and is very useful to quickly understand the types of consumers in a particular area and to create cohorts of users.
The dataset was created as part of an ESRC-sponsored study, ‘British economic, social, and cultural interactions with Asia, 1760-1833’. It contains statistics relating to the trade and domestic finances of the monopolistic English East India Company primarily between 1755 and 1834, the year in which the Company ceased to function as a commercial organization. Until now quantitative data derived from original sources has only been available in time series for the Company’s trade and some aspects of its domestic finances for the years before 1760. But many of the details, patterns, and trends of trade and finance in the decades after 1760, a most important period when the Company fully embarked on the interlinked processes of military, political, and commercial expansion in Asia, have remained unclear. In creating this dataset, the aim was thus two-fold: i) to establish for the first time a set of statistics detailing the changing value, volume, and geographical structure of the East India Company’s overseas trade for the period when the Company began to exert imperial control over large parts of the Indian subcontinent; and ii) to generate select statistics relating to the Company’s domestic finances, thereby enabling analysis to be undertaken of a range of Company interactions with Britain’s economy and society.
Use this application to view the pattern of concentrations of people by race and Hispanic or Latino ethnicity. Data are provided at the U.S. Census block group level, one of the smallest Census geographies, to provide a detailed picture of these patterns. The data are sourced from the U.S. Census Bureau, 2020 Census Redistricting Data (Public Law 94-171) Summary File.
Definitions: Definitions of the Census Bureau’s categories are provided below. This interactive map shows patterns for all categories except American Indian or Alaska Native and Native Hawaiian or Other Pacific Islander. The total population countywide for these two categories is small (1,582 and 263 respectively). The Census Bureau uses the following race categories:
Population by Race
White – A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.
Black or African American – A person having origins in any of the Black racial groups of Africa.
American Indian or Alaska Native – A person having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment.
Asian – A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
Native Hawaiian or Other Pacific Islander – A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
Some Other Race – This category is chosen by people who do not identify with any of the categories listed above.
People can identify with more than one race. These people are included in the Two or More Races category.
Hispanic or Latino Population
The Hispanic/Latino population is an ethnic group. Hispanic/Latino people may be of any race.
Other layers provided in this tool include the Loudoun County Census block groups, towns and Dulles airport, and the Loudoun County 2021 aerial imagery.
https://india-data.org/terms-conditions
Segmentation of sub-cortical structures from MRI scans is of interest in many neurological diagnoses. The Indian Brain Segmentation Dataset (IBSD) consists of high-quality 1.5T T1w MRI data of 114 subjects, generated under a fixed imaging protocol, along with corresponding manual annotations of 14 sub-cortical structures done by expert radiologists. The dataset contains an approximately equal number of scans from male and female subjects belonging to a young age group (20-30 years). This data has been used to create a template for the young Indian population. The dataset can also be used for a variety of tasks, such as segmenting structures of interest and aligning/registering images, using traditional methods as well as deep learning approaches, since it has an adequate quantity of high-quality data. Focus Area: Neuro and Mental Health.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Disposable Personal Income in India increased to 296383300 INR Million in 2023 from 273364818.90 INR Million in 2022. This dataset provides India Total Disposable Personal Income: actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains detailed financial and demographic data for 20,000 individuals, focusing on income, expenses, and potential savings across various categories. The data aims to provide insights into personal financial management and spending patterns.
Income: Monthly income in currency units.
Age: Age of the individual.
Dependents: Number of dependents supported by the individual.
Occupation: Type of employment or job role.
City_Tier: A categorical variable representing the living area tier (e.g., Tier 1, Tier 2).
Rent, Loan_Repayment, Insurance, Groceries, Transport, Eating_Out, Entertainment, Utilities, Healthcare, Education, and Miscellaneous record various monthly expenses.
Desired_Savings_Percentage and Desired_Savings: Targets for monthly savings.
Disposable_Income: Income remaining after all expenses are accounted for.
Groceries, Transport, Eating_Out, Entertainment, Utilities, Healthcare, Education, and Miscellaneous.
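The definition of Disposable_Income above can be checked against the expense columns with a short sketch. It assumes that "all expenses" means exactly the eleven expense columns listed in the description and that each row is available as a dictionary keyed by column name; both are assumptions for illustration, and the helper names are hypothetical.

```python
# Expense columns as named in the dataset description.
EXPENSE_COLUMNS = [
    "Rent", "Loan_Repayment", "Insurance", "Groceries", "Transport",
    "Eating_Out", "Entertainment", "Utilities", "Healthcare",
    "Education", "Miscellaneous",
]


def disposable_income(row):
    """Income minus the sum of the listed monthly expenses (assumed definition)."""
    return row["Income"] - sum(row[col] for col in EXPENSE_COLUMNS)


def is_consistent(row, tol=0.01):
    """True if the stored Disposable_Income matches the recomputed value within tol."""
    return abs(disposable_income(row) - row["Disposable_Income"]) <= tol


if __name__ == "__main__":
    # Made-up example row, not taken from the dataset.
    row = {col: 1000.0 for col in EXPENSE_COLUMNS}
    row.update({"Income": 50000.0, "Rent": 10000.0, "Disposable_Income": 30000.0})
    print(disposable_income(row), is_consistent(row))
```

If the check fails on real rows, the stored column likely uses a different set of expenses than the one assumed here.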