https://creativecommons.org/publicdomain/zero/1.0/
Shark Tank India: information on seasons 1 to 4, with 80 fields/columns and 630+ records.
All seasons and episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.
Here is the data dictionary for the Shark Tank India seasons dataset.
This Location Data & Foot Traffic dataset, available for all countries, includes enriched raw mobility data and visitation at points of interest (POIs) to answer questions such as:
- How often do people visit a location? (daily, monthly, absolute, and averages)
- What types of places do they visit? (parks, schools, hospitals, etc.)
- Which social characteristics do people at a certain POI have? Breakdown by type: residents, workers, visitors.
- What is their mobility like during night hours and day hours?
- What is the frequency of visits, partitioned by day of the week and hour of the day?
Extra insights:
- Visitors' relative income level.
- Visitors' preferences, as derived from their visits to shopping venues, parks, sports facilities, and churches, among others.
Overview & Key Concepts
Each record corresponds to a ping from a mobile device at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The entire process complies with applicable privacy laws.
We clean and process these massive datasets with a number of complex, compute-intensive calculations to make them easier to use in data science and machine learning applications, especially those related to understanding customer behavior.
Featured attributes of the data
Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationary observations.
Night base of the device: we calculate the approximate location where the device spends the night, which is usually the owner's home neighborhood.
Day base of the device: we calculate the most common daylight location during weekdays, which is usually the owner's work location.
Income level: we intersect the night neighborhood of the device with available socioeconomic data to infer the device's income level. Depending on the country and the availability of good census data, this figure ranges from a relative wealth index to a currency-calculated income.
POI visited: we intersect each observation with a number of POI databases to estimate check-ins to different locations. POI databases can vary significantly in scope and depth between countries.
Category of visited POI: for each observation that can be attributed to a POI, we also include a standardized location category (park, hospital, among others).
Coverage: Worldwide.
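The device-speed attribute can be illustrated with a minimal sketch. It assumes the distance between consecutive pings is the great-circle (haversine) distance; the provider's actual method is not documented here. The function names, the `(timestamp, lat, lon)` tuple layout, and the classification thresholds are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_kmh(pings):
    """Estimate speed at each ping from the previous ping of the same device.

    `pings` is a time-sorted list of (timestamp, lat, lon) tuples.
    The first ping has no predecessor, so its speed is None.
    """
    speeds = [None] if pings else []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(pings, pings[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds <= 0:
            speeds.append(None)  # duplicate or out-of-order timestamp
        else:
            speeds.append(haversine_m(lat0, lon0, lat1, lon1) / seconds * 3.6)
    return speeds


def classify(speed_kmh, still_max=1.0, walk_max=7.0):
    """Label an observation; both thresholds are illustrative assumptions."""
    if speed_kmh is None:
        return "unknown"
    if speed_kmh < still_max:
        return "stationary"
    if speed_kmh < walk_max:
        return "pedestrian"
    return "vehicle"
```

In practice the thresholds would need tuning per country and ping frequency, since sparse pings tend to underestimate the speed of a moving device.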
Delivery schemas
We can deliver the data in three different formats:
Full dataset: one record per mobile ping. These datasets are very large and should only be consumed by experienced teams with large computing budgets.
Visitation stream: one record per attributable visit. This dataset is considerably smaller than the full one but retains most of its more valuable elements. It helps to understand who visited a specific POI and to characterize consumer behavior.
Audience profiles: one record per mobile device in a given period of time (usually monthly). The entire visitation stream is aggregated by category. This is the most condensed version of the dataset and is very useful for quickly understanding the types of consumers in a particular area and for creating cohorts of users.
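The relationship between the visitation stream and the audience profiles can be sketched as a simple aggregation. This is only an illustration: the tuple layout `(device_id, month, poi_category)` and the function name are hypothetical, and the delivered schemas may contain more fields.

```python
from collections import Counter, defaultdict


def audience_profiles(visits):
    """Aggregate a visitation stream into one profile per device and period.

    `visits` is an iterable of (device_id, month, poi_category) tuples,
    one per attributable visit. The result maps (device_id, month) to a
    dict of visit counts per POI category.
    """
    profiles = defaultdict(Counter)
    for device_id, month, category in visits:
        profiles[(device_id, month)][category] += 1
    return {key: dict(counts) for key, counts in profiles.items()}
```

For example, two park visits and one hospital visit by the same device in one month collapse into a single profile record with counts per category.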
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mother Earth is enriched and nourished with a variety of plants. These plants are useful in many ways, such as drug formulation, production of herbal products, and medicines to cure many common ailments and diseases. Ayurveda, a traditional Indian medicinal system practised for the past 5000 years, is widely accepted even today. India is home to a rich variety of medicinal plants. Many parts of the plants, such as leaves, bark, roots, seeds, and fruits, are used as vital ingredients in the production of herbal medicines. Herbal medicines are preferred in both developing and developed countries as an alternative to synthetic drugs, mainly because they are considered to have no side effects. Recognizing these plants by eye is tedious, time-consuming, and inaccurate. Applying image processing and computer vision techniques to the identification of medicinal plants is crucial, as many of them are threatened with extinction according to IUCN records. Hence, the digitization of useful medicinal plants is crucial for the conservation of biodiversity. Studies reveal that building an intelligent system for the recognition of medicinal herbs requires a plant leaf dataset of decent size. The dataset comprises thirty species of healthy medicinal herbs such as Santalum album (Sandalwood), Muntingia calabura (Jamaica cherry), Plectranthus amboinicus / Coleus amboinicus (Indian Mint, Mexican mint), Brassica juncea (Oriental mustard), and many more. The dataset consists of 1500 images of forty species. Each species has 60 to 100 high-quality images. The folders are named after each species' botanical/scientific name. The leaves were plucked from different plants of the same species available in local gardens. Care was taken not to pluck more leaves than necessary, as each leaf goes to waste after it is photographed. Healthy and mature leaves were selected for the dataset.
The instruments used are a mobile camera (Model: Samsung S9+) and a printer (Model: Canon Inkjet Printer). The leaf images in the dataset are slightly rotated and tilted to make them more useful for training machine learning and deep learning models. This medicinal plant leaf dataset will help researchers and computer scientists develop artificial intelligence models (machine learning and deep learning) that detect and identify species and their diseases, and learn more about each herb's occurrence and medicinal properties. By releasing this dataset to the community, we hope to stimulate research in medicinal plants, where the current lack of public datasets is one of the main barriers to progress.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of around 2,000 HTML pages. These web pages contain the search results returned for queries on different products, issued by a set of synthetic users browsing Google Shopping (US version) from different locations in July 2016.
Each file name indicates the location from which the search was made, the user ID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
The locations are Philippines (PHI), United States (US), and India (IN). The user IDs are 26 to 30 for users searching from the Philippines, 1 to 5 from the US, and 11 to 15 from India.
Products were chosen from 130 keywords (e.g., MP3 player, MP4 Watch, Personal organizer, Television, etc.).
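The file naming scheme above lends itself to a small parser. The sketch below is a hedged illustration: it assumes that `#` in the pattern stands for a numeric counter and that product names may contain spaces or dots; the function name `parse_name` and the example file names are hypothetical.

```python
import re

# Pattern: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
NAME_RE = re.compile(
    r"no_email_(?P<location>PHI|US|IN)_(?P<user_id>\d+)"
    r"\.(?P<product>.+)\.shopping_testing\.(?P<index>\d+)\.html"
)

# User ID ranges per location, as stated in the dataset description.
USER_RANGES = {"PHI": range(26, 31), "US": range(1, 6), "IN": range(11, 16)}


def parse_name(filename):
    """Split a result file name into its parts, or return None if it does not match."""
    match = NAME_RE.fullmatch(filename)
    if match is None:
        return None
    location = match.group("location")
    user_id = int(match.group("user_id"))
    return {
        "location": location,
        "user_id": user_id,
        "product": match.group("product"),
        "index": int(match.group("index")),
        # Flags names whose user ID falls outside the documented range.
        "id_in_expected_range": user_id in USER_RANGES[location],
    }
```

A hypothetical name such as `no_email_IN_12.Television.shopping_testing.1.html` would parse to location `IN`, user 12, product `Television`.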
In the following, we describe how the search results have been collected.
Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.
To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.
A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.
The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).
Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM. OpenWPM is automatised with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each of them with their own associated cookies.
The experiments run, on average, for 24 hours. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (e.g., in India) via tunneling through SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. For each query, we consider the first page of results, which contains 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.
Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas.
The search results were analyzed to check whether there was evidence of price steering based on users' location.
One term of usage applies:
In any research product whose findings are based on this dataset, please cite
@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4\_3},
  doi       = {10.1007/978-3-030-11226-4\_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The IGLD dataset mirrors the data used in the SEASON project, selected from OpenAlex. It contains Indo-German research articles published between 1990 and 2022, for research on academic collaboration.
Our paper describing our work in the SEASON project:
Aasif Ahmad Mir, Nina Smirnova, Jeyshankar Ramalingam, & Philipp Mayr (2024). The rise of Indo-German collaborative research: 1990-2022. In Global Knowledge, Memory and Communication, 2024. https://doi.org/10.1108/GKMC-09-2023-0328
The search query CU (“GERMANY” AND “INDIA”) was used to retrieve the data from Web of Science (WoS). The data cover the period from 1990 to 30 November 2022. A total of 36,999 records were retrieved with this query. For the present dataset, we retrieved from OpenAlex only articles identical to those from WoS.
Our original dataset retrieved from WoS consisted of 36,999 entries, of which 33,319 have a valid DOI and 3,680 have no DOI. We therefore developed two approaches for retrieving the desired data from the OpenAlex collection. Articles with a DOI were matched by DOI (dataset 1), and articles without a DOI were matched by article title and publication year (dataset 2).
Afterwards, the DOIs in dataset 1 were additionally compared to the DOIs from the original WoS dataset, and all inconsistencies were removed.
For dataset 2, authors were additionally checked. Authors' surnames from dataset 2 were compared with the authors' surnames of the corresponding articles (matched by title and publication year) in the WoS dataset. Only articles with matching publication years, author surname lists, and titles were considered for the OpenAlex dataset. Subsequently, dataset 1 and dataset 2 were combined into one final dataset (OpenAlex data).
Additionally, all duplicates (by article ID) were removed from the OpenAlex data. In the final step, we checked whether all entries contained both German and Indian affiliations. Some inconsistencies with the WoS data were observed: 5,584 entries that have both Indian and German affiliations in WoS had only one of these affiliations in OpenAlex. These entries were removed from the final dataset. The final dataset contains 22,844 unique entries.
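The two matching approaches described above can be sketched as follows. This is a minimal illustration and not the project's actual code: the record layout (dicts with `id`, `doi`, `title`, `year`, `surnames`), the DOI and title normalisation, and all function names are assumptions. The final affiliation check is omitted.

```python
def norm_doi(doi):
    """Lower-case a DOI and strip a leading doi.org resolver URL (assumed normalisation)."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def norm_title(title):
    """Lower-case a title and collapse whitespace (assumed normalisation)."""
    return " ".join(title.lower().split())


def surname_key(surnames):
    """Order-insensitive, case-insensitive representation of a surname list."""
    return sorted(s.lower() for s in surnames)


def match_records(wos, openalex):
    """Return OpenAlex records matched to WoS records, de-duplicated by article ID."""
    by_doi, by_title_year = {}, {}
    for rec in openalex:
        doi = norm_doi(rec.get("doi"))
        if doi:
            by_doi.setdefault(doi, rec)
        by_title_year.setdefault((norm_title(rec["title"]), rec["year"]), []).append(rec)

    matched = {}  # keyed by OpenAlex ID, which removes duplicates
    for w in wos:
        doi = norm_doi(w.get("doi"))
        if doi:
            # Dataset 1: WoS entries with a DOI are matched by DOI.
            hit = by_doi.get(doi)
            if hit is not None:
                matched.setdefault(hit["id"], hit)
        else:
            # Dataset 2: match by title and year, then verify author surnames.
            key = (norm_title(w["title"]), w["year"])
            for cand in by_title_year.get(key, []):
                if surname_key(cand["surnames"]) == surname_key(w["surnames"]):
                    matched.setdefault(cand["id"], cand)
    return list(matched.values())
```

Keying the result by OpenAlex ID folds the duplicate-removal step into the matching loop, which keeps the sketch short.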
These descriptions are relevant summaries or extracts from the documentation at
https://docs.openalex.org/api-entities/works/work-object,
https://docs.openalex.org/api-entities/authors/author-object and
https://docs.openalex.org/api-entities/institutions/institution-object.
article_id
(Work attribute)
OpenAlex identifier for the article / work.
To retrieve the work you can visit https://openalex.org/works/
doi
(Work attribute)
Digital Object Identifier for the work.
Consists of a URL to doi.org
title
(Work attribute)
Title of the work.
article_display_name
(Work attribute)
Duplicate of "title" column, retained to match other OpenAlex objects' attribute.
publication_year
(Work attribute)
The year in which the work was published.
Please note that this is respective to the version of the work captured by OpenAlex as this particular entry. Other and potentially earlier published versions may be accessible in the work's location field, accessible from OpenAlex.
publication_date
(Work attribute)
An ISO 8601 formatted date for the publication of the work.
The same caveat to publication_year applies to publication_date.
article_type
(Work attribute)
Type of work.
E.g. Article, conference paper, report, dataset, etc.
article_type_crossref
(Work attribute)
Legacy type information inherited from Crossref.
article_cited_by_count
(Work attribute)
Number of citations to the work.
article_cited_by_api_url
(Work attribute)
An OpenAlex URL that allows the user to view the works which cite this work.
article_grants
(Work attribute)
A list of details of the grants the work has received.
This information is gathered from Crossref and is described by OpenAlex at time of publication as "limited".
article_referenced_works_count
(Work attribute)
Number of works within OpenAlex that this work cites.
Please note that the total number of references in the work may be higher.
language
(Work attribute)
The ISO 639-1 style Language of the work.
This attribute is inferred by a software library (langdetect) used by OpenAlex, based on the abstract, or the title if the abstract is not available.
article_counts_by_year
(Work attribute)
A list of the citation count of this work per year, for up to the last 10 years.
article_locations_count
(Work attribute)
Number of locations where this work can be found.
In OpenAlex, "locations" refers to the places on the internet where versions of this work are accessible.
author_id
(Author attribute)
OpenAlex identifier for an author of the work.
To retrieve OpenAlex's bibliography for this user you may visit https://openalex.org/authors/.
The following author attributes are associated with the author identifier in each row. Please note that a work with multiple authors may have multiple rows, one for each author in OpenAlex.
orcid
(Author attribute)
ORCID identifier for the author.
author_name
(Author attribute)
Name of the author.
author_name_alternatives
(Author attribute)
Alternative formats for the author's name which OpenAlex has observed.
author_works_count
(Author attribute)
Number of works the author has created.
author_cited_by_count
(Author attribute)
Number of works which cite a work the author has created.
author_last_known_institution
(Author attribute)
Identifier for the institution with which the author is affiliated in the most recent publication from the author containing an institutional identifier.
Please note this may differ from the institution associated with the author at time of the work's release, which is listed in this database as "institution_id".
author_summary_stats
(Author attribute)
OpenAlex's citation metrics for the author.
These include citation count, i10-index, h-index and more.
institution_id
(Institution attribute)
OpenAlex identifier for the institution associated with the author when the work was published.
ror
(Institution attribute)
Research Organization Registry (ROR) identifier for the institution.
institution_name
(Institution attribute)
Name of the institution.
institution_country_code
(Institution attribute)
ISO 3166-1 Alpha-2 (two-letter) country code for the country in which the institution is located.
insitution_type
(Institution attribute)
ROR-style primary type for the institution.
institution_homepage_url
(Institution attribute)
A URL for the institution's primary homepage.
institution_display_name_acroynyms
(Institution attribute)
Known acronyms or initialisms for the institution.
institution_display_name_alternatives
(Institution attribute)
Alternative names for the institution.
institution_works_count
(Institution attribute)
The number of works created by authors affiliated with this institution.
institution_cited_by_count
(Institution attribute)
The number of works that cite a work created by authors affiliated with the institution.
institution_summary_stats
(Institution attribute)
Citation metrics for the institution.
Similar to author_summary_stats.
Nina Smirnova - nina.smirnova@gesis.org
Released to Zenodo 1st Feb
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Shark Tank India Companies’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/devanshu125/shark-tank-india-companies on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Recently, I saw a dataset based on Shark Tank USA. This dataset inspired me to create one for India as well and since season 1 recently ended, I thought this was the perfect time to look at some insights based on the deals.
This dataset contains the following information -
1. episode - episode number
2. pitch_no - pitch number (unique)
3. company - company name
4. idea - company description
5. deal - final deal that was taken
6. ashneer - Did Ashneer invest?
7. namita - Did Namita invest?
8. anupam - Did Anupam invest?
9. vineeta - Did Vineeta invest?
10. aman - Did Aman invest?
11. peyush - Did Peyush invest?
12. ghazal - Did Ghazal invest?
This data was scraped from Wikipedia.
--- Original source retains full ownership of the source dataset ---
This Global Dated Landslide Database (GDLDB) is part of the project WeMonitor (Weakly Supervised Deep Learning Models for Detecting and Monitoring Spatio-Temporal Anomalies in Optical and Radar Satellite Time Series), funded by the Helmholtz Imaging Platform. The aim of the project is to develop a deep learning model that uses satellite image time series from Sentinel-1/2 to automatically monitor changes caused, for example, by landslides, deforestation, large fires, dam failures, or the emergence of waste dumps. To train such a model, a reference dataset is required that shows the area and date of the changes as precisely as possible. To allow for a generic and transferable model, the reference data also need to cover the diversity of the process to be detected. Thus, the aim of the GDLDB is to comprise landslides of different sizes, shapes, and types, occurring in different seasons and in different regions with varying natural conditions and different triggering mechanisms, such as rainfall-induced and earthquake-induced landslides.
To build the GDLDB, available local and regional landslide inventories from around the world are combined into one coherent database by verifying their location and date of occurrence with high-resolution remote sensing data. The selection criteria for the source inventories are that landslide locations are defined as polygons, that at least a rough indication of the landslide origin date is given, and that the landslides occurred during the period of Sentinel-2 data availability from 2016 onwards. A total of 16 individual inventories are included (Table 1), one each from the USA, Dominica, Italy, Zimbabwe, southern India, Nepal, China, Papua New Guinea, and New Zealand, and two each from Kyrgyzstan, Japan, and the Philippines. In addition, a global inventory was added, including a small number of landslides from the USA, Peru, Chile, Europe, Pakistan, Nepal, India, and Taiwan, and a larger number of landslides from Indonesia.
From each inventory, approximately 100 landslides were randomly selected to ensure an unbiased selection of landslides in terms of shape, size, and location. The original source inventories were produced using a variety of methods, including manual mapping in airborne data with ground verification and automatic identification in satellite remote sensing data. As a result, the mapping quality of the inventories varies greatly. Where we could not verify a landslide using available optical remote sensing data (e.g. Sentinel-2, Planet Scope, and data available in Google Earth), new polygons were selected until approximately 100 landslides were reached. In some inventories, 100 landslides could not be reached, either because of a lack of suitable landslides (e.g., small size, incorrect classification) or because the selected inventory contained fewer than 100 landslides in total. For inventories with many small landslides that were difficult or impossible to observe, a size threshold of 1000 m² was introduced.
Vadu Rural Health Program, KEM Hospital Research Centre Pune, has a rich tradition in health care and development, having been at the forefront of needs-based, issue-driven research for almost 35 years. During the 1980s and 1990s, the research at Vadu focused on mother and child health, with epidemiological and social science research exploring low birth weight, child survival, maternal mortality, safe abortion, and domestic violence. The research portfolio has since expanded to include adult health and aging, non-communicable and communicable diseases, and, in recent years, clinical trials. The Health and Demographic Surveillance System at Vadu (HDSS Vadu) was established in August 2002. It seeks to establish a quasi-experimental design setting to allow evaluation of the impact of health interventions as well as to monitor secular trends in diseases, risk factors, and health behavior.
The term "demographic surveillance" means keeping close track of population dynamics. Vadu HDSS keeps track of health issues and demographic changes in the Vadu Rural Health Program (VRHP) area. It is one of the most promising projects of national relevance and aims at establishing a quasi-experimental intervention research setting with the following objectives:
1) To create a longitudinal database for efficient service delivery, future research, and linking all past micro-studies in the Vadu area
2) To monitor trends in public health problems
3) To keep track of population dynamics
4) To evaluate intervention services
This dataset contains the events of all individuals ever resident during the study period (1 Jan. 2009 to 31 Dec. 2015).
Vadu HDSS falls in two administrative blocks: (1) Shirur and (2) Haweli of Pune district in Maharashtra in western India. It covers an area of approximately 232 square kilometers.
Individual
Vadu HDSS covers as many as 50,000 households having 140,000 population spread across 22 villages.
Event history data
Two rounds per year
The study area is the Vadu area, comprising 22 villages in two administrative blocks. This area was selected because it is the primary coverage area of the Vadu Rural Health Program, which has been functioning for more than four decades. Every household is included in the HDSS. No sampling strategy is employed, as 100% population coverage in the area is expected.
Proxy Respondent [proxy]
Communication is in Marathi or Hindi. The form labels are multilingual (English and Marathi), but the data entered through the forms are in English only.
The following forms were used:
- Field Worker Checklist Form - The checklist provides a guideline to ensure that all the households are covered during the round and the events occurred in each household are captured.
- Enumeration Form: To capture the population details at the start of the HDSS or any addition of villages afterwards.
- Pregnancy Form: To capture pregnancy details of women in the age group 15 to 49.
- Birth Form: To capture the details of the birth events.
- Inmigration Form: To capture inward population movement from outside the HDSS area and also for movement within the HDSS area.
- Outmigration Form: To capture outward population movement from inside the HDSS area and also for movement within the HDSS area.
- Death Form: To capture death events.
Entered data undergo a data cleaning process. During cleaning, all erroneous data are either corrected in consultation with the data QC team, or the respective forms are sent back to the field for re-collection of correct data. Data editors have access to the raw dataset to make the necessary edits after corrected data are brought from the field.
Individuals whose enumeration (ENU), in-migration (IMG), or birth (BTH) occurred before the left censoring date (2009-01-01), and who had not out-migrated (OMG) or died (DTH) before that date, are included in the dataset with an Enumeration (ENU) event whose EventDate is the left censoring date (2009-01-01). The actual date of observation of the original event (ENU, BTH, IMG) is retained in the dataset as the observation date for these left-censored ENU events. An individual is dropped from the dataset if their end event (OMG or DTH) occurred before the left censoring date (2009-01-01).
On average, the response rate is 99.99% across all rounds over the years.
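The left-censoring rule can be expressed as a short sketch. It makes a simplifying assumption that each individual has a single residency episode (one start event and at most one end event), which the real event histories may not satisfy. The function name and the `EventCode` and `ObservationDate` field names are hypothetical; `EventDate` and the event codes come from the description.

```python
from datetime import date

LEFT_CENSOR = date(2009, 1, 1)
START_EVENTS = {"ENU", "IMG", "BTH"}  # enumeration, in-migration, birth
END_EVENTS = {"OMG", "DTH"}           # out-migration, death


def left_censor(start_code, start_date, end_code=None, end_date=None):
    """Return the start record after left censoring, or None if the individual is dropped.

    Simplifying assumption: one residency episode per individual.
    """
    if start_code not in START_EVENTS:
        raise ValueError(f"unknown start event: {start_code}")
    if end_code is not None and end_code not in END_EVENTS:
        raise ValueError(f"unknown end event: {end_code}")

    # Individuals whose episode ended before the censoring date are dropped.
    if end_date is not None and end_date < LEFT_CENSOR:
        return None

    # Starts before the censoring date are recoded as ENU on that date,
    # while the original date is kept as the observation date.
    if start_date < LEFT_CENSOR:
        return {
            "EventCode": "ENU",
            "EventDate": LEFT_CENSOR,
            "ObservationDate": start_date,
        }

    # Later starts are kept unchanged (observation date assumed equal to event date).
    return {
        "EventCode": start_code,
        "EventDate": start_date,
        "ObservationDate": start_date,
    }
```

For instance, a birth observed in 2005 for someone still resident in 2009 becomes an ENU event dated 2009-01-01 with the 2005 date retained as the observation date.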
Not Applicable
Data are cleaned to an acceptable level against the standard data rules using the Pentaho Data Integration Community Edition (PDI CE) tool. After the cleaning process, quality metrics were as follows:
CentreId  MetricTable       QMetric      Illegal  Legal   Total   Metric  RunDate
IN021     MicroDataCleaned  Starts       1        301112  301113  0.      2017-05-31 20:06
IN021     MicroDataCleaned  Transitions  0        667010  667010  0.      2017-05-31 20:07
IN021     MicroDataCleaned  Ends                          301113          2017-05-31 20:07
IN021     MicroDataCleaned  SexValues    29       666981  667010  0.      2017-05-31 20:07
IN021     MicroDataCleaned  DoBValues    575      666435  667010  0.      2017-05-31 20:07
Note: Except for lower under-five mortality in 2012 and lower adult mortality among females in 2013, all other estimates are fairly within the expected range. The data underwent additional review of electronic data capture, data cleaning, and data management to look for reasons for the lower under-five mortality rates in 2013 and lower female adult mortality in 2013. The additional review returned marginally higher rates, which supports the validity of the collected data. Further field-related review of the 2012 and 2013 data is underway, and any revisions to published data/figures will be shared at a later stage.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Hindi Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.
This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:
This diversity ensures robust training for real-world voice assistant applications.
Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The second National Family Health Survey (NFHS-2), conducted in 1998-99, provides information on fertility, mortality, family planning, and important aspects of nutrition, health, and health care. The International Institute for Population Sciences (IIPS) coordinated the survey, which collected information from a nationally representative sample of more than 90,000 ever-married women age 15-49. The NFHS-2 sample covers 99 percent of India's population living in all 26 states. This report, however, is based on the survey data for 25 of the 26 states, since data collection in Tripura was delayed due to local problems in the state. IIPS also coordinated the first National Family Health Survey (NFHS-1) in 1992-93. Most of the types of information collected in NFHS-2 were also collected in the earlier survey, making it possible to identify trends over the intervening period of six and one-half years. In addition, the NFHS-2 questionnaire covered a number of new or expanded topics with important policy implications, such as reproductive health, women's autonomy, domestic violence, women's nutrition, anaemia, and salt iodization. The NFHS-2 survey was carried out in two phases. Ten states were surveyed in the first phase, which began in November 1998, and the remaining states (except Tripura) were surveyed in the second phase, which began in March 1999. The field staff collected information from 91,196 households in these 25 states and interviewed 89,199 eligible women in these households. In addition, the survey collected information on 32,393 children born in the three years preceding the survey. One health investigator on each survey team measured the height and weight of eligible women and children and took blood samples to assess the prevalence of anaemia.
SUMMARY OF FINDINGS
POPULATION CHARACTERISTICS
Three-quarters (73 percent) of the population lives in rural areas.
The age distribution is typical of populations that have recently experienced a fertility decline, with relatively low proportions in the younger and older age groups. Thirty-six percent of the population is below age 15, and 5 percent is age 65 and above. The sex ratio is 957 females for every 1,000 males in rural areas but only 928 females for every 1,000 males in urban areas, suggesting that more men than women have migrated to urban areas. The survey provides a variety of demographic and socioeconomic background information. In the country as a whole, 82 percent of household heads are Hindu, 12 percent are Muslim, 3 percent are Christian, and 2 percent are Sikh. Muslims live disproportionately in urban areas, where they comprise 15 percent of household heads. Nineteen percent of household heads belong to scheduled castes, 9 percent belong to scheduled tribes, and 32 percent belong to other backward classes (OBCs). Two-fifths of household heads do not belong to any of these groups. Questions about housing conditions and the standard of living of households indicate some improvements since the time of NFHS-1. Sixty percent of households in India now have electricity and 39 percent have piped drinking water compared with 51 percent and 33 percent, respectively, at the time of NFHS-1. Sixty-four percent of households have no toilet facility compared with 70 percent at the time of NFHS-1. About three-fourths (75 percent) of males and half (51 percent) of females age six and above are literate, an increase of 6-8 percentage points from literacy rates at the time of NFHS-1. The percentage of illiterate males varies from 6-7 percent in Mizoram and Kerala to 37 percent in Bihar and the percentage of illiterate females varies from 11 percent in Mizoram and 15 percent in Kerala to 65 percent in Bihar. Seventy-nine percent of children age 6-14 are attending school, up from 68 percent in NFHS-1. 
The proportion of children attending school has increased for all ages, particularly for girls, but girls continue to lag behind boys in school attendance. Moreover, the disparity in school attendance by sex grows with increasing age of children. At age 6-10, 85 percent of boys attend school compared with 78 percent of girls. By age 15-17, 58 percent of boys attend school compared with 40 percent of girls. The percentage of girls 6-17 attending school varies from 51 percent in Bihar and 56 percent in Rajasthan to over 90 percent in Himachal Pradesh and Kerala. Women in India tend to marry at an early age. Thirty-four percent of women age 15-19 are already married including 4 percent who are married but gauna has yet to be performed. These proportions are even higher in the rural areas. Older women are more likely than younger women to have married at an early age: 39 percent of women currently age 45-49 married before age 15 compared with 14 percent of women currently age 15-19. Although this indicates that the proportion of women who marry young is declining rapidly, half the women even in the age group 20-24 have married before reaching the legal minimum age of 18 years. On average, women are five years younger than the men they marry. The median age at marriage varies from about 15 years in Madhya Pradesh, Bihar, Uttar Pradesh, Rajasthan, and Andhra Pradesh to 23 years in Goa. As part of an increasing emphasis on gender issues, NFHS-2 asked women about their participation in household decisionmaking. In India, 91 percent of women are involved in decision-making on at least one of four selected topics. A much lower proportion (52 percent), however, are involved in making decisions about their own health care. There are large variations among states in India with regard to women's involvement in household decisionmaking. 
More than three out of four women are involved in decisions about their own health care in Himachal Pradesh, Meghalaya, and Punjab compared with about two out of five or less in Madhya Pradesh, Orissa, and Rajasthan. Thirty-nine percent of women do work other than housework, and more than two-thirds of these women work for cash. Only 41 percent of women who earn cash can decide independently how to spend the money that they earn. Forty-three percent of working women report that their earnings constitute at least half of total family earnings, including 18 percent who report that the family is entirely dependent on their earnings. Women's work-participation rates vary from 9 percent in Punjab and 13 percent in Haryana to 60-70 percent in Manipur, Nagaland, and Arunachal Pradesh. FERTILITY AND FAMILY PLANNING Fertility continues to decline in India. At current fertility levels, women will have an average of 2.9 children each throughout their childbearing years. The total fertility rate (TFR) is down from 3.4 children per woman at the time of NFHS-1, but is still well above the replacement level of just over two children per woman. There are large variations in fertility among the states in India. Goa and Kerala have attained below replacement level fertility and Karnataka, Himachal Pradesh, Tamil Nadu, and Punjab are at or close to replacement level fertility. By contrast, fertility is 3.3 or more children per woman in Meghalaya, Uttar Pradesh, Rajasthan, Nagaland, Bihar, and Madhya Pradesh. More than one-third to less than half of all births in these latter states are fourth or higher-order births compared with 7-9 percent of births in Kerala, Goa, and Tamil Nadu. Efforts to encourage the trend towards lower fertility might usefully focus on groups within the population that have higher fertility than average. 
In India, rural women and women from scheduled tribes and scheduled castes have somewhat higher fertility than other women, but fertility is particularly high for illiterate women, poor women, and Muslim women. Another striking feature is the high level of childbearing among young women. More than half of women age 20-49 had their first birth before reaching age 20, and women age 15-19 account for almost one-fifth of total fertility. Studies in India and elsewhere have shown that health and mortality risks increase when women give birth at such young ages, both for the women themselves and for their children. Family planning programmes focusing on women in this age group could make a significant impact on maternal and child health and help to reduce fertility. INFANT AND CHILD MORTALITY NFHS-2 provides estimates of infant and child mortality and examines factors associated with the survival of young children. During the five years preceding the survey, the infant mortality rate was 68 deaths at age 0-11 months per 1,000 live births, substantially lower than 79 per 1,000 in the five years preceding the NFHS-1 survey. The child mortality rate, 29 deaths at age 1-4 years per 1,000 children reaching age one, also declined from the corresponding rate of 33 per 1,000 in NFHS-1. Ninety-five children out of 1,000 born do not live to age five years. Expressed differently, 1 in 15 children die in the first year of life, and 1 in 11 die before reaching age five. Child-survival programmes might usefully focus on specific groups of children with particularly high infant and child mortality rates, such as children who live in rural areas, children whose mothers are illiterate, children belonging to scheduled castes or scheduled tribes, and children from poor households. 
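The mortality figures quoted above are mutually consistent, and a short calculation makes the link explicit. The sketch below is an illustration only, not the survey's own estimation procedure: it assumes that under-five mortality is the chance of dying in infancy or, having survived infancy, at age 1-4. All variable names are invented for the example.

```python
# Rates reported in the text, per 1,000.
infant_mortality = 68   # deaths at age 0-11 months per 1,000 live births
child_mortality = 29    # deaths at age 1-4 years per 1,000 children reaching age one

# Probability of surviving infancy, then of surviving ages 1-4 given survival to age one.
survive_infancy = 1 - infant_mortality / 1000
survive_age_1_to_4 = 1 - child_mortality / 1000

# Under-five mortality per 1,000 births: one minus the chance of surviving both periods.
under_five_mortality = 1000 * (1 - survive_infancy * survive_age_1_to_4)

# "1 in N" restatements of the same rates.
one_in_infant = 1000 / infant_mortality
one_in_under_five = 1000 / under_five_mortality

print(round(under_five_mortality))                      # 95
print(round(one_in_infant), round(one_in_under_five))   # 15 11
```

Note that the under-five rate is slightly lower than the simple sum 68 + 29 = 97, because the child mortality rate applies only to children who reach age one.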
Infant mortality rates are more than two and one-half times as high for children of mothers who did not receive any of the recommended types of maternity-related medical care as for children of mothers who received all recommended types of care. HEALTH, HEALTH CARE, AND NUTRITION Promotion of maternal and child health has been one of the most important components of the Family Welfare Programme of the Government of India. One goal is for each pregnant woman to receive at least three antenatal check-ups plus two tetanus toxoid injections and a full course of iron and folic acid supplementation. In India, mothers of 65 percent of the children born in the three years preceding NFHS-2 received at least one antenatal
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides comprehensive information on waste management and recycling practices in various cities across India. It includes key data related to waste generation, recycling rates, population density, municipal efficiency, landfill details, and more. The data spans multiple years (2019–2023) and covers a range of waste types, including plastic, organic waste, electronic waste (e-waste), construction waste, and hazardous waste.
The dataset aims to:
- Promote efficient waste management practices across Indian cities.
- Analyze trends in recycling and waste disposal methods.
- Provide insights for improving municipal management systems.
- Support research and development in sustainability, environmental science, and urban planning.
https://creativecommons.org/publicdomain/zero/1.0/
People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.
For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.
India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.
Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India’s worst-hit cities, where a lockdown is in place to try and stem the transmission of the virus.
The dataset consists of tweets made with the #IndiaWantsOxygen hashtag over the past week. It contains 25,440 tweets in total and will be updated on a daily basis.
The description of the features is given below:

| No | Columns | Descriptions |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date, when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has |
| 8 | user_verified | When true, indicates that the user has a verified account |
| 9 | date | UTC time and date when the Tweet was created |
| 10 | text | The actual UTF-8 text of the Tweet |
| 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen |
| 12 | source | Utility used to post the Tweet, Tweets from the Twitter website have a source value - web |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
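As a rough illustration of how the columns above might be used, the sketch below counts tweets per UTC day and tallies tweets from verified accounts. It is a minimal sketch under stated assumptions: the sample rows are invented, the timestamp layout `%Y-%m-%d %H:%M:%S` and the `True`/`False` spelling of `user_verified` are guesses about the file, and `summarise` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Tiny in-memory stand-in for the real file; every value here is invented.
SAMPLE = """user_name,user_verified,date,text,hashtags,is_retweet
a,True,2021-04-23 10:15:00,Need oxygen in Delhi,"['IndiaWantsOxygen']",False
b,False,2021-04-23 18:40:00,Please help,"['IndiaWantsOxygen', 'Delhi']",False
c,False,2021-04-24 07:05:00,Sending support,"['IndiaWantsOxygen']",False
"""

def summarise(handle):
    """Return (tweets per UTC day, number of tweets from verified accounts)."""
    per_day = Counter()
    verified = 0
    for row in csv.DictReader(handle):
        # Assumed timestamp layout; adjust if the real file differs.
        created = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S")
        per_day[created.date().isoformat()] += 1
        # Assumed boolean spelling in the CSV.
        if row["user_verified"] == "True":
            verified += 1
    return dict(per_day), verified

per_day, verified = summarise(io.StringIO(SAMPLE))
print(per_day, verified)
```

To run this against the actual download, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, newline="", encoding="utf-8")`.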
https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters
The past few days have been really depressing after seeing these incidents. These tweets are the voice of Indians requesting help and of people all over the globe asking their own countries to support India by providing oxygen tanks.
And I strongly believe that this is not just some data, but the pure emotions of people and their call for help. And I hope we as data scientists could contribute on this front by providing valuable information and insights.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Since a thorough Indian Railways dataset is not readily available, we decided to create one and give it to the world! There is no missing data in this dataset.
We built this dataset using some information from data.gov.in, added distances along with a second table, train_info, and did a good deal of cleaning. The dataset has 2 files: train_info and train_schedule. train_schedule has more than 186000 rows, while train_info has 11114 rows.
This dataset was part of our DBMS project, which is hosted and live at **http://www.railways.live**
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India
This dataset contains mammal occurrence records from 2022 to 2024 in the Sakleshpura region of central Western Ghats, India. It includes a few occurrence records of other chordates. Occurrence records were gathered in the field by researchers of the Nature Conservation Foundation, India, using a mobile data collection application. Suggested citation is:
Nature Conservation Foundation (2024). Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India. Nature Conservation Foundation, India. Dataset
Keywords: tropical rainforest, plantations, Sakleshpura, animal distribution, Western Ghats
CONTACT #1
1. Name: Anand M Osuri
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: aosuri@ncf-india.org
5. ORCID: https://orcid.org/0000-0001-9909-5633
CONTACT #2
1. Name: Vijay Karthick
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: vijayk@ncf-india.org
5. ORCID: https://orcid.org/0000-0001-6023-3955
CONTACT #3
1. Name: Vijay Kumar
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: vijaykumar@ncf-india.org
5. ORCID: https://orcid.org/0009-0000-4149-0083
Geographic Coverage:
1. Location/Study Area: Sakleshpura, Karnataka, India
2. GPS coordinates: Kadamane Village (12.924647, 75.654650)
Temporal Coverage:
1. Begins: 2022-05-16 (Year, Month, Day)
2. Ends: 2024-05-22 (Year, Month, Day)
Besides the 000_readMe.txt file containing this information and the 14 images associated with individual observations, the dataset includes three comma-delimited text (csv) files, two R code files, and one Excel file, as explained below:
1) 001_mammalData.csv -- This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file
2) 002_placeLocs.csv -- This file lists names of places for which the GPS location was unavailable from the mobile phone application, and which were manually assigned coordinates with 500 or 1000 m accuracy
3) 003_nameMatch.csv -- This file matches the name as originally recorded with the correct common name and scientific name
4) 004_GBIF_upload_code.R -- R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)
5) 005_download_images_from_googledrive.R -- R code to extract image IDs and download images from Google Drive
6) 006_kadamane_mammal_occurrence.xlsx -- An Excel file that contains the raw data used by the code above
FILES INCLUDED IN DATASET
001_mammalData.csv
This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file
observers: Observers who made the observation
timestamp: Automatic time stamp of date and time when app was used
date: Date of observation
time: Time of observation
decimalLatitude: Latitude in decimal degrees N
decimalLongitude: Longitude in decimal degrees E
GPSaltitude: Altitude in metres
GPSaccuracy: Horizontal accuracy of GPS location in metres
place: Name of locality
habitat: Habitat type
taxa: mammal or reptile/amphibian
species: Species common name
count: Number of individuals observed
countType: Total (solitary or fully counted groups) or Partial (incompletely counted groups)
obsType: Type of observation: sighting, sign (droppings or vocalisation), death, roadkill, electrocution, other
notes: Notes or remarks on observation
imageID: Link to the google drive photo, if photo is available
instanceID: Automatically generated unique identifier of observation
002_placeLocs.csv
This file lists names of places for which the GPS location was unavailable from the mobile phone application, and which were manually assigned coordinates with 500 or 1000 m accuracy
place: Name of locality as recorded
lat: Assigned latitude in decimal degrees N
long: Assigned longitude in decimal degrees E
GPSaccuracy: Assigned as 500 or 1000m – Horizontal accuracy of GPS location in metres
003_nameMatch.csv
This file matches the name as originally recorded with the correct common name and scientific name.
verbatimIdentification: Identification as originally recorded in the ‘species’ column of the 001_mammalData.csv file
vernacularName: Common or English name
scientificName: Scientific name
004_GBIF_upload_code.R
R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)
005_download_images_from_googledrive.R
R code that extracts imageIDs from the 001_mammalData.csv file and downloads them automatically to a preferred directory
006_kadamane_mammal_occurrence.xlsx
An Excel file that contains the raw data used by the code above
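The relationship between the occurrence file and the name-matching file described above can be sketched as a simple lookup: the `species` value in 001_mammalData.csv is matched against `verbatimIdentification` in 003_nameMatch.csv to obtain `vernacularName` and `scientificName`. This is a minimal sketch, not the dataset's own R code; the sample rows are invented and `attach_names` is a hypothetical helper.

```python
import csv
import io

# Invented sample rows standing in for the two real files.
MAMMAL_DATA = """species,count,obsType
ele,3,sighting
gaur,1,sign
unknown cat,1,sighting
"""

NAME_MATCH = """verbatimIdentification,vernacularName,scientificName
ele,Asian Elephant,Elephas maximus
gaur,Gaur,Bos gaurus
"""

def attach_names(observations, name_match):
    """Add vernacularName and scientificName to each observation row."""
    lookup = {row["verbatimIdentification"]: row
              for row in csv.DictReader(name_match)}
    merged = []
    for obs in csv.DictReader(observations):
        match = lookup.get(obs["species"], {})
        # Rows without a match keep empty name fields rather than being dropped.
        obs["vernacularName"] = match.get("vernacularName", "")
        obs["scientificName"] = match.get("scientificName", "")
        merged.append(obs)
    return merged

records = attach_names(io.StringIO(MAMMAL_DATA), io.StringIO(NAME_MATCH))
for record in records:
    print(record["species"], "->", record["scientificName"] or "(no match)")
```

Keeping unmatched rows visible, instead of silently discarding them, makes it easier to spot recorded names that still need an entry in the matching table.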
Purpose: The multi-country Study on Global Ageing and Adult Health (SAGE) is run by the World Health Organization's Multi-Country Studies unit in the Innovation, Information, Evidence and Research Cluster. SAGE is part of the unit's Longitudinal Study Programme which is compiling longitudinal data on the health and well-being of adult populations, and the ageing process, through primary data collection and secondary data analysis. SAGE baseline data (Wave 0, 2002/3) was collected as part of WHO's World Health Survey (WHS, http://www.who.int/healthinfo/survey/en/index.html). SAGE Wave 1 (2007/10) provides a comprehensive data set on the health and well-being of adults in six low and middle-income countries: China, Ghana, India, Mexico, Russian Federation and South Africa.
Objectives:
- To obtain reliable, valid and comparable health, health-related and well-being data over a range of key domains for adult and older adult populations in nationally representative samples
- To examine patterns and dynamics of age-related changes in health and well-being using longitudinal follow-up of a cohort as they age, and to investigate socio-economic consequences of these health changes
- To supplement and cross-validate self-reported measures of health and the anchoring vignette approach to improving comparability of self-reported measures, through measured performance tests for selected health domains
- To collect health examination and biomarker data that improves reliability of morbidity and risk factor data and to objectively monitor the effect of interventions
Additional Objectives:
- To generate large cohorts of older adult populations and comparison cohorts of younger populations for following-up intermediate outcomes, monitoring trends, examining transitions and life events, and addressing relationships between determinants and health, well-being and health-related outcomes
- To develop a mechanism to link survey data to demographic surveillance site data
- To build linkages with other national and multi-country ageing studies
- To improve the methodologies to enhance the reliability and validity of health outcomes and determinants data
- To provide a public-access information base to engage all stakeholders, including national policy makers and health systems planners, in planning and decision-making processes about the health and well-being of older adults
Methods: SAGE's first full round of data collection included both follow-up and new respondents in most participating countries. The goal of the sampling design was to obtain a nationally representative cohort of persons aged 50 years and older, with a smaller cohort of persons aged 18 to 49 for comparison purposes. In the older households, all persons aged 50+ years (for example, spouses and siblings) were invited to participate. Proxy respondents were identified for respondents who were unable to respond for themselves. Standardized SAGE survey instruments were used in all countries consisting of five main parts: 1) household questionnaire; 2) individual questionnaire; 3) proxy questionnaire; 4) verbal autopsy questionnaire; and, 5) appendices including showcards. A VAQ was completed for deaths in the household over the last 24 months. The procedures for including country-specific adaptations to the standardized questionnaire and translations into local languages from English follow those developed by and used for the World Health Survey.
Content
Household questionnaire
0000 Coversheet
0100 Sampling Information
0200 Geocoding and GPS Information
0300 Recontact Information
0350 Contact Record
0400 Household Roster
0450 Kish Tables and Household Consent
0500 Housing
0600 Household and Family Support Networks and Transfers
0700 Assets and Household Income
0800 Household Expenditures
0900 Interviewer Observations
Individual questionnaire
1000 Socio-Demographic Characteristics
1500 Work History and Benefits
2000 Health State Descriptions and Vignettes
2500 Anthropometrics, Performance Tests and Biomarkers
3000 Risk Factors and Preventive Health Behaviours
4000 Chronic Conditions and Health Services Coverage
5000 Health Care Utilization
6000 Social Cohesion
7000 Subjective Well-Being and Quality of Life (WHOQoL-8 and Day Reconstruction Method)
8000 Impact of Caregiving
9000 Interviewer Assessment
National coverage
households and individuals
The household section of the survey covered all households in 19 of the 28 states in India which covers 96% of the population. Institutionalised populations are excluded. The individual section covered all persons aged 18 years and older residing within individual households.
Sample survey data [ssd]
World Health Survey Sampling
India has 28 states and seven union territories. 19 of the 28 states were included in the design, representing 96% of the population. India used a stratified multistage cluster sample design. Six states were selected in accordance with their geographic location and level of development. Strata were defined by the six states (Assam, Karnataka, Maharashtra, Rajasthan, Uttar Pradesh and West Bengal) and locality (urban or rural), giving 12 strata in total. The 2000 Census demarcation was used as the sampling frame. Two-stage and three-stage sampling was adopted in rural and urban areas, respectively. In rural areas, PSUs (villages) were selected with probability proportional to size, the measure of size being the 2001 Census population in the village. SSUs (households) were selected using systematic sampling. TSUs (individuals) were selected using Kish tables. In urban areas, PSUs (city wards) were selected with probability proportional to size. Two SSUs (census enumeration blocks) were randomly selected from each PSU. TSUs (households) were selected using systematic sampling. QSUs (individuals) were selected as in rural areas. A sample of 379 EAs was selected as the primary sampling units (PSUs).
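The two selection techniques named above, probability proportional to size (PPS) for PSUs and systematic sampling for households, can be sketched as follows. This is a generic textbook version, not the survey's actual procedure: the village names, population sizes and sample sizes are invented, and `pps_systematic` and `systematic` are hypothetical helper names.

```python
import random

def pps_systematic(sizes, n, rng):
    """Pick n units with probability proportional to size, using a
    random start and a fixed step along the cumulative size scale."""
    total = sum(sizes.values())
    step = total / n
    start = rng.random() * step
    targets = [start + k * step for k in range(n)]
    selected = []
    cumulative = 0.0
    index = 0
    for unit, size in sizes.items():
        cumulative += size
        # A unit is hit once for every target falling inside its size interval.
        while index < n and targets[index] < cumulative:
            selected.append(unit)
            index += 1
    return selected

def systematic(items, n, rng):
    """Pick n items at a fixed interval after a random start."""
    step = len(items) / n
    start = rng.random() * step
    return [items[int(start + k * step)] for k in range(n)]

rng = random.Random(0)
# Invented villages with invented census populations (the measure of size).
villages = {"V1": 1200, "V2": 300, "V3": 800, "V4": 2500, "V5": 700}
picked_villages = pps_systematic(villages, 2, rng)

# Invented household listing for one selected village.
households = [f"HH{i:03d}" for i in range(1, 41)]
picked_households = systematic(households, 8, rng)
print(picked_villages, picked_households)
```

With this scheme a unit larger than the step could be selected more than once; real designs usually handle such units separately as certainty selections.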
SAGE Sampling
The SAGE sample was pre-determined, as all PSUs and households selected for the WHS/SAGE Wave 0 survey were included. The exceptions are three PSUs in Assam, which were replaced as they were inaccessible due to flooding, and a further six PSUs, which were omitted because the household roster information was not available. In each selected EA, a listing of the households was conducted to classify each household into the following mutually exclusive categories:
1) Households with a WHS/SAGE Wave 0 respondent aged 50-plus: all members aged 50-plus, including the WHS/SAGE Wave 0 respondent, were eligible for the individual interview.
2) Households with a WHS/SAGE Wave 0 respondent aged 47-49: all members aged 50-plus, including the WHS/SAGE Wave 0 respondent aged 47-49, were eligible for the individual interview.
3) Households with a WHS/SAGE Wave 0 female respondent aged 18-46: all female members aged 18-49, including the WHS/SAGE Wave 0 female respondent aged 18-46, were eligible for the individual interview.
4) Households with a WHS/SAGE Wave 0 male respondent aged 18-46: three households were selected using systematic sampling and one male aged 18-49 was eligible for the individual interview. In the households not selected, all members aged 50-plus were eligible for the individual interview.
Stages of selection
Strata: State, Locality = 12
PSU: EAs = 375 surveyed
SSU: Households = 10424 surveyed
TSU: Individuals = 12198 surveyed
Face-to-face [f2f] PAPI
The questionnaires were based on the WHS Model Questionnaire with some modification and many new additions. A household questionnaire was administered to all households eligible for the study. A Verbal Autopsy questionnaire was administered to households that had a death in the last 24 months. An Individual questionnaire was administered to eligible respondents identified from the household roster. A Proxy questionnaire was administered to individual respondents who had cognitive limitations. A Woman's Questionnaire was administered to all females aged 18-49 years identified from the household roster. The questionnaires were developed in English and were piloted as part of the SAGE pretest in 2005. All documents were translated into Hindi, Assamese, Kannada and Marathi. SAGE generic questionnaires are available as external resources.
Data editing took place at a number of stages including: (1) office editing and coding (2) during data entry (3) structural checking of the CSPro files (4) range and consistency secondary edits in Stata
Household: Response rate=88% Cooperation rate=92%
Individual: Response rate=68% Cooperation rate=92%
https://creativecommons.org/publicdomain/zero/1.0/
I built this dataset with some rigorous SQL queries and Python code. It covers all stocks of the Indian stock market, 2435 stocks in total, over one year. Each row represents a stock and each column a date; each cell holds the closing price. Enjoy, and try some stock price predictions.
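Because the table is laid out wide (one row per stock, one column per date), many analyses start by reshaping it into one record per stock and date. The sketch below shows one way to do that; it assumes the first column holds the stock name and that missing closes appear as empty cells. The sample values and the `wide_to_long` helper are invented for illustration.

```python
import csv
import io

# Invented sample in the wide layout: stock name, then one column per date.
WIDE = """stock,2021-01-01,2021-01-04
AAA,100.5,101.0
BBB,20.0,
"""

def wide_to_long(handle):
    """Turn rows of (stock, close, close, ...) into (stock, date, close) tuples."""
    reader = csv.reader(handle)
    header = next(reader)
    dates = header[1:]          # assumed: every column after the first is a date
    rows = []
    for record in reader:
        name, prices = record[0], record[1:]
        for date, price in zip(dates, prices):
            if price == "":
                continue        # assumed: an empty cell means no closing price
            rows.append((name, date, float(price)))
    return rows

long_rows = wide_to_long(io.StringIO(WIDE))
print(long_rows)
```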
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Mammal occurrence records (January 2020 - June 2023) in the Valparai Plateau and Anamalai Tiger Reserve, Western Ghats, India. It includes a few occurrence records of reptiles. Occurrence records were gathered in the field by researchers of the Nature Conservation Foundation, India, using a mobile data collection application. Suggested citation is:
Nature Conservation Foundation (2024). Mammal occurrence records (2020-23) in the Valparai Plateau and Anamalai Tiger Reserve, Western Ghats, India. Nature Conservation Foundation, India. Dataset, Zenodo. DOI: 10.5281/zenodo.11903722
CONTACT #1
1. Name: T. R. Shankar Raman
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: trsr@ncf-india.org
5. ORCID: https://orcid.org/0000-0002-1347-3953
CONTACT #2
1. Name: Divya Mudappa
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: divya@ncf-india.org
5. ORCID: https://orcid.org/0000-0001-9708-4826
Keywords: tropical rainforest, plantations, Anamalai Hills, Western Ghats, animal distribution, mammals
Geographic Coverage:
1. Location/Study Area: Valparai Plateau, Tamil Nadu, India; Anamalai Tiger Reserve, Tamil Nadu, India
2. GPS coordinates: Valparai Plateau (10°15'- 10°22'N, 76°52' - 76°59'E); Anamalai Tiger Reserve (10°12' - 10°35'N, 76°49' - 77°24'E)
Temporal Coverage:
1. Begins: 2020-01-11 (Year, Month, Day)
2. Ends: 2023-06-02 (Year, Month, Day)
Besides the 000_readMe.txt file containing this information, the dataset includes 60 images (photographs), three comma-delimited text (csv) files, and one R markdown text file with R code as explained below:
1) 001_mammalData.csv -- This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file
2) 002_placeLocs.csv -- This file lists names of places for which the GPS location was unavailable from the mobile phone application, and which were manually assigned coordinates with 500 m accuracy
3) 003_nameMatch.csv -- This file matches the name as originally recorded with the correct common name and scientific name
4) 004_mammup.Rmd -- R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)
+60 image files (with ".jpg" file extension)
FILES INCLUDED IN DATASET
001_mammdata.csv
This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file
recordedBy: Observer who recorded/made the observation
username: Username of person on whose mobile phone the data were noted
timestamp: Automatic time stamp of date and time when app was used
date: Date of observation
time: Time of observation
decimalLatitude: Latitude in decimal degrees N
decimalLongitude: Longitude in decimal degrees E
GPSaltitude: Altitude in metres
GPSaccuracy: Horizontal accuracy of GPS location in metres
place: Name of locality
habitat: Habitat type
species: Species common name
count: Number of individuals observed
countType: Total (solitary or fully counted groups) or Partial (incompletely counted groups)
obsType: Type of observation: sighting, sign (droppings or vocalisation), death, roadkill, electrocution, other
notes: Notes or remarks on observation
imageID: Image filename if available (NA, if not available)
instanceID: Automatically generated unique identifier of observation
002_placeLocs.csv
This file lists names of places for which the GPS location was unavailable from the mobile phone application, and which were manually assigned coordinates with 500 m accuracy
place: Name of locality as recorded
lat: Assigned latitude in decimal degrees N
long: Assigned longitude in decimal degrees E
GPSaccuracy: Assigned as 500 m – Horizontal accuracy of GPS location in metres
003_nameMatch.csv
This file matches the name as originally recorded with the correct common name and scientific name.
verbatimIdentification: Identification as originally recorded in the ‘species’ column of the mammdata.csv file
vernacularName: Common or English name
scientificName: Scientific name
004_mammup.Rmd
R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)
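The role of 002_placeLocs.csv described above can be illustrated with a short sketch that fills in coordinates for observations lacking a GPS fix, using the manually assigned location of the named place. This is not the dataset's own R code. It assumes missing values are written as `NA` (as stated for `imageID`, but not confirmed for coordinates); the sample rows, place names and coordinates are invented, and `fill_coordinates` is a hypothetical helper.

```python
import csv
import io

MISSING = "NA"  # assumed marker for a missing value

# Invented sample rows standing in for the two real files.
MAMMAL_DATA = """place,decimalLatitude,decimalLongitude,GPSaccuracy,species
Estate road,10.3301,76.9552,8,Gaur
Old bridge,NA,NA,NA,Nilgiri Langur
"""

PLACE_LOCS = """place,lat,long,GPSaccuracy
Old bridge,10.3000,76.9500,500
"""

def fill_coordinates(observations, place_locs):
    """Copy assigned lat/long/accuracy into rows that have no GPS fix."""
    places = {row["place"]: row for row in csv.DictReader(place_locs)}
    filled = []
    for obs in csv.DictReader(observations):
        if obs["decimalLatitude"] == MISSING and obs["place"] in places:
            loc = places[obs["place"]]
            obs["decimalLatitude"] = loc["lat"]
            obs["decimalLongitude"] = loc["long"]
            # Carry the coarser accuracy so downstream users can filter on it.
            obs["GPSaccuracy"] = loc["GPSaccuracy"]
        filled.append(obs)
    return filled

filled = fill_coordinates(io.StringIO(MAMMAL_DATA), io.StringIO(PLACE_LOCS))
for row in filled:
    print(row["place"], row["decimalLatitude"], row["decimalLongitude"], row["GPSaccuracy"])
```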
The main method of the project was a survey interview from which both qualitative and quantitative data were collected. Field research was undertaken in marginalised rural communities in Zambia (Chiawa) and India (Sarguja district, Chhattisgarh state). Two rounds of fieldwork were undertaken in each place, in Zambia August–November 2010 (Zambia T1) and August–October 2012 (Zambia T2); in India February–May 2011 (India T1) and February–June 2013 (India T2). In both locations, we talked to husbands and wives (separately) and women heading households. In India we surveyed 340 people in 2011 and 368 in 2013. 187 respondents were interviewed in both rounds. 7% of respondents were single women. Qualitative data include 105 survey notes. In Zambia we surveyed 412 people in 2010 and 370 in 2012. These included 52 women heading households. 358 respondents were surveyed both years. Qualitative data include notes from 105 survey interviews. This research aims to identify pathways of wellbeing and poverty within rural communities in Zambia and India. It will demonstrate how poverty affects wellbeing and how different constellations of wellbeing in turn affect people's movements into, within and out of poverty. Drawing on the sociology of development and psychology, it adopts a mixed method, cross-cultural longitudinal approach, with qualitative and quantitative data collection across a two year interval, involving 700 respondents. Statistical tests assess the validity and reliability of our model of wellbeing. In-depth case studies provide a deeper sense of people's own understandings and experience. In particular, the research tests a key hypothesis that social and personal relationships constitute critical drivers of wellbeing in developing countries. The project is rooted in research-policy engagement. 
It involves partnership with NGOs committed to incorporating wellbeing into their programmes, and generates a broader programme of communications activities at national and global level. The Wellbeing and Poverty Pathways project developed a multi-dimensional model of wellbeing called “Inner Wellbeing” (IWB) which reflects what people think and feel they are able to be and do. The project explored relationships between people's subjective experiences of wellbeing and the external conditions in which they live their lives. Inner wellbeing comprises seven domains: economic confidence; agency and participation; social connections; close relationships; physical and mental health; competence and self-worth; values and meaning. It was constructed through a combination of theoretical reflection and empirical analysis in two rural communities, one in Zambia and one in India. The main research instrument was a survey which comprised three sections: an opening section on demographics and health; the central IWB section; and a final section on livelihoods and access to state services. Specifically for the central IWB section, the survey has five questions (or items) for each domain, which are designed to reflect different aspects of that domain. For each question respondents are asked to select one of five graduated answers. These are then scored on a scale from strong negative (1) to weak negative (2) to neutral (3) to weak positive (4) to strong positive wellbeing (5). The questions were extensively grounded and piloted to ensure they captured issues that were important to people’s lives locally. The studied population came from two rural areas of the Global South: Chiawa in Zambia and four villages in the Sarguja district of the Chhattisgarh state in India. No sample selection was applied. Instead, everyone in the study areas who would talk to us was interviewed. Chiawa is a Game Management Area (GMA), located in Kafue district, Lusaka province. 
To the south east it borders Zimbabwe and to the east the Lower Zambezi National Park. The majority population is Goba, a people-group that originated in what is now Zimbabwe. The research in India focused on four villages located in the historically remote hill and forest regions of northern Chhattisgarh. These villages were selected because they presented a range of contrasts. The communities there are extremely poor and people depend on (largely rainfed) farming, daily labour and gathering non-timber forest products to survive. Reflecting the area’s population as a whole, the majority of respondents (84%) are Adivasi, including Particularly Vulnerable Tribal Groups (PTG), with smaller numbers of Other Backward Caste (OBC) (15%) and Scheduled Caste (1%) people.
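The IWB section described above (seven domains, five items per domain, each answer scored from 1 to 5) can be illustrated with a minimal Python sketch. The description does not say how item scores are combined, so the per-domain mean used here is only an assumption for illustration, and the function name `domain_scores` and the input layout are hypothetical.

```python
from statistics import mean

# The seven Inner Wellbeing domains named in the description.
DOMAINS = [
    "economic confidence",
    "agency and participation",
    "social connections",
    "close relationships",
    "physical and mental health",
    "competence and self-worth",
    "values and meaning",
]
ITEMS_PER_DOMAIN = 5          # five questions (items) per domain
VALID_SCORES = (1, 2, 3, 4, 5)  # strong negative (1) ... strong positive (5)


def domain_scores(responses):
    """Return one score per domain for a single respondent.

    `responses` maps each domain name to a list of five item scores.
    ASSUMPTION: items are combined with a simple mean; the source does
    not state the aggregation actually used by the project.
    """
    scores = {}
    for domain in DOMAINS:
        items = responses[domain]
        if len(items) != ITEMS_PER_DOMAIN:
            raise ValueError(f"{domain}: expected {ITEMS_PER_DOMAIN} items")
        if any(score not in VALID_SCORES for score in items):
            raise ValueError(f"{domain}: scores must be integers from 1 to 5")
        scores[domain] = mean(items)
    return scores


# Made-up answers for one hypothetical respondent (not real survey data).
example_respondent = {domain: [3, 4, 4, 5, 2] for domain in DOMAINS}
example_scores = domain_scores(example_respondent)
```

A mean keeps each domain score on the same 1 to 5 scale as the individual answers, which makes domains directly comparable; other aggregations would work with the same input layout.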
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. The test is designed so that humans can complete it while current computer systems cannot, and it is used in many applications to distinguish human users from machines. Text-based CAPTCHAs are the most common type used on websites. Because most text CAPTCHAs use English letters, they are hard to pass for rural residents who read only their native languages. Devanagari characters are more complex than the English letters and numerals used in standard CAPTCHAs, which makes machine recognition much more difficult. The majority of official websites in India only offer information in Devanagari, yet websites do not use Devanagari CAPTCHAs. As a result, we have created a new text-based CAPTCHA in Devanagari script in this article. The CAPTCHAs are built from computer-printed and handwritten Devanagari characters (34 each) and digits (10 each), 44 + 44 = 88 character images in total. General CAPTCHA generation principles are followed: noise is added to each image using digital image processing techniques so that the CAPTCHA is harder to recognize and not easily broken. Each CAPTCHA image is 250 × 90 pixels. Four character sets are used: Printed Alphabet (34), Handwritten Alphabet (34), Printed Digit (10), and Handwritten Digit (10). Eleven classes were generated from combinations of these four sets. CAPTCHA strings have a length of 5, 6, or 7 characters, so each class has three subclasses by string length, giving 11 classes × 3 subclasses = 33 subclasses, that is, 33 types of CAPTCHA image. For each class, 10,000 CAPTCHA images were created using Python, so the Devanagari CAPTCHA dataset contains 11 × 10,000 = 110,000 images (written 1,10,000 in Indian digit grouping, i.e. one lakh ten thousand). Passing a test that requires identifying Devanagari characters is difficult. 
The dataset is useful to researchers investigating CAPTCHA recognition in this area, for example to design OCR systems that recognize and break Devanagari CAPTCHAs.
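The counts in the CAPTCHA description can be checked with a short Python sketch. It only reproduces the arithmetic stated above; the description does not list which combinations of character sets make up the 11 classes, so classes are identified here by a plain index, and the helper `indian_grouping` is a hypothetical name introduced for illustration.

```python
# Character sets and their sizes, as given in the description.
CHARACTER_SETS = {
    "printed alphabet": 34,
    "handwritten alphabet": 34,
    "printed digit": 10,
    "handwritten digit": 10,
}
NUM_CLASSES = 11            # stated in the description; composition not listed
STRING_LENGTHS = (5, 6, 7)  # one subclass per string length
IMAGES_PER_CLASS = 10_000

# Total number of source character images (44 printed + 44 handwritten).
total_character_images = sum(CHARACTER_SETS.values())

# Every (class index, string length) pair is one subclass.
subclasses = [
    (class_index, length)
    for class_index in range(1, NUM_CLASSES + 1)
    for length in STRING_LENGTHS
]

total_images = NUM_CLASSES * IMAGES_PER_CLASS


def indian_grouping(n):
    """Format a non-negative integer with Indian digit grouping (3, then 2s)."""
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return ",".join(groups + [tail])


print(total_character_images, len(subclasses), total_images,
      indian_grouping(total_images))
```

The last helper shows why the total is written as 1,10,000 in the source: it is the same number as 110,000, grouped in the Indian style.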
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
With over 600 million voters voting for 8500+ candidates across 543 constituencies, the general elections in the world's largest democracy are a potential goldmine of data. While separate datasets already exist for the votes each candidate received and for each candidate's personal information, there was no comprehensive dataset that included both kinds of information. This dataset is therefore more usable than most existing datasets in this domain.
I scraped the website of myneta.info to get the personal information of each candidate (as per their own sworn affidavits) and the website of the Election Commission of India to get the data about the votes received. I merged these two datasets to create this comprehensive dataset. Only the candidates who secured at least 1% of the total votes polled in their constituency have been included.
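The merge-and-filter step described above can be sketched in plain Python. This is a minimal illustration, not the author's actual script: the field names (`constituency`, `name`, `votes`, `education`), the join key and the sample rows are all hypothetical, and the total votes polled in a constituency is approximated here by summing the votes of the candidates present in the results.

```python
def merge_and_filter(affidavits, results, min_share=0.01):
    """Join candidate details with vote counts and keep candidates
    whose vote share in their constituency is at least `min_share`.

    ASSUMPTIONS: both inputs are lists of dicts keyed by hypothetical
    field names, candidates are matched on (constituency, name), and the
    constituency total is the sum of the listed candidates' votes.
    """
    # Total votes per constituency, computed from the results rows.
    totals = {}
    for row in results:
        totals[row["constituency"]] = totals.get(row["constituency"], 0) + row["votes"]

    # Index affidavit details by the join key.
    details = {(row["constituency"], row["name"]): row for row in affidavits}

    merged = []
    for row in results:
        key = (row["constituency"], row["name"])
        if key not in details:
            continue  # no matching affidavit record
        share = row["votes"] / totals[row["constituency"]]
        if share >= min_share:
            merged.append({**details[key], **row, "vote_share": share})
    return merged


# Made-up rows for illustration only (not real election data).
sample_affidavits = [
    {"constituency": "X", "name": "A", "education": "Graduate"},
    {"constituency": "X", "name": "B", "education": "Post Graduate"},
    {"constituency": "X", "name": "C", "education": "12th Pass"},
]
sample_results = [
    {"constituency": "X", "name": "A", "votes": 600},
    {"constituency": "X", "name": "B", "votes": 395},
    {"constituency": "X", "name": "C", "votes": 5},
]
sample_merged = merge_and_filter(sample_affidavits, sample_results)
```

Matching on candidate name is fragile in practice because spellings can differ between sources; a real pipeline would likely need some name normalisation before the join.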
I have collected the data from MyNeta.info maintained by the Association for Democratic Reforms and the website of Election Commission of India.
There are two main tasks that can be performed on this dataset: exploratory data analysis to visualize the impact of each candidate feature, and machine learning to predict a candidate's chances of winning.
https://creativecommons.org/publicdomain/zero/1.0/
Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.
All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcasted on SonyLiv OTT/Sony TV.
Here is the data dictionary for (Indian) Shark Tank season's dataset.