21 datasets found

🦈 Shark Tank India dataset 🇮🇳
kaggle.com
Updated Apr 20, 2025
Cite
Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india
Dataset updated
Apr 20, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Satya Thirumani
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Shark Tank India Data set.

Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.

All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

Here is the data dictionary for the Shark Tank India seasons dataset.

Season Number - Season number

Startup Name - Company name or product name

Episode Number - Episode number within the season

Pitch Number - Overall pitch number

Season Start - Season first aired date

Season End - Season last aired date

Original Air Date - Episode original/first aired date, on OTT/TV

Episode Title - Episode title in SonyLiv

Anchor - Name of the episode presenter/host

Industry - Industry name or type

Business Description - Business Description

Company Website - Company Website URL

Started in - Year in which startup was started/incorporated

Number of Presenters - Number of presenters

Male Presenters - Number of male presenters

Female Presenters - Number of female presenters

Transgender Presenters - Number of transgender/LGBTQ presenters

Couple Presenters - Whether the presenters are wife and husband: 1-yes, 0-no

Pitchers Average Age - All pitchers average age, <30 young, 30-50 middle, >50 old

Pitchers City - Presenter's town/city or place where company head office exists

Pitchers State - Indian state pitcher hails from or state where company head office exists

Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue

Monthly Sales - Total monthly sales, in lakhs

Gross Margin - Gross margin/profit of company, in percentages

Net Margin - Net margin/profit of company, in percentages

EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization

Cash Burn - In loss in current year; burning/paying money from their pocket (yes/no)

SKUs - Stock Keeping Units or number of varieties, at the time of pitch

Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch

Bootstrapped - Startup is bootstrapped or not (yes/no)

Part of Match off - Competition between two similar brands, pitched at same time

Original Ask Amount - Original Ask Amount, in lakhs INR

Original Offered Equity - Original Offered Equity, in percentages

Valuation Requested - Valuation Requested, in lakhs INR

Received Offer - Received offer or not, 1-received, 0-not received

Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected

Total Deal Amount - Total Deal Amount, in lakhs INR

Total Deal Equity - Total Deal Equity, in percentages

Total Deal Debt - Total Deal debt/loan amount, in lakhs INR

Debt Interest - Debt interest rate, in percentages

Deal Valuation - Deal Valuation, in lakhs INR

Number of sharks in deal - Number of sharks involved in deal

Deal has conditions - Deal has conditions or not? (yes or no)

Royalty Percentage - Royalty percentage, if it's royalty deal

Royalty Recouped Amount - Royalty recouped amount, if it's royalty deal, in lakhs

Advisory Shares Equity - Deal with Advisory shares or equity, in percentages

Namita Investment Amount - Namita Investment Amount, in lakhs INR

Namita Investment Equity - Namita Investment Equity, in percentages

Namita Debt Amount - Namita Debt Amount, in lakhs INR

Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR

Vineeta Investment Equity - Vineeta Investment Equity, in percentages

Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR

Anupam Investment Amount - Anupam Investment Amount, in lakhs INR

Anupam Investment Equity - Anupam Investment Equity, in percentages

Anupam Debt Amount - Anupam Debt Amount, in lakhs INR

Aman Investment Amount - Aman Investment Amount, in lakhs INR

Aman Investment Equity - Aman Investment Equity, in percentages

Aman Debt Amount - Aman Debt Amount, in lakhs INR

Peyush Investment Amount - Peyush Investment Amount, in lakhs INR

Peyush Investment Equity - Peyush Investment Equity, in percentages

Peyush Debt Amount - Peyush Debt Amount, in lakhs INR

Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR

Ritesh Investment Equity - Ritesh Investment Equity, in percentages

Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR

Amit Investment Amount - Amit Investment Amount, in lakhs INR

Amit Investment Equity - Amit Investment Equity, in percentages

Amit Debt Amount - Amit Debt Amount, in lakhs INR

Guest Investment Amount - Guest Investment Amount, in lakhs INR

Guest Investment Equity - Guest Investment Equity, in percentages

Guest Debt Amount - Guest Debt Amount, in lakhs INR

Invested Guest Name - Name of the guest(s) who invested in deal

All Guest Names - Name of all guests, who are present in episode

Namita Present - Whether Namita present in episode or not

Vineeta Present - Whether Vineeta present in episode or not

Anupam ...
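
As a worked illustration of how the amount, equity, and valuation fields in the dictionary above relate, here is a minimal sketch. It assumes the usual convention that a valuation equals the amount divided by the equity fraction; the dataset description does not state this formula, so treat it as an assumption. The function names are hypothetical. The sketch also decodes the documented Yearly Revenue sentinel values (-1 for negative revenue, 0 for pre-revenue).

```python
LAKH = 100_000  # 1 lakh = 100,000 INR


def implied_valuation_lakhs(amount_lakhs: float, equity_pct: float) -> float:
    """Valuation (in lakhs) implied by selling equity_pct percent for amount_lakhs.

    Assumed formula: valuation = amount / (equity / 100).
    """
    if equity_pct <= 0:
        raise ValueError("equity percentage must be positive")
    return amount_lakhs * 100 / equity_pct


def describe_revenue(yearly_revenue_lakhs: float) -> str:
    """Decode the sentinel values documented for the Yearly Revenue field."""
    if yearly_revenue_lakhs == -1:
        return "negative revenue"
    if yearly_revenue_lakhs == 0:
        return "pre-revenue"
    # Any other value is read as an amount in lakhs and converted to INR.
    return f"{yearly_revenue_lakhs * LAKH:,.0f} INR"
```

For example, an ask of 50 lakhs for 5% equity implies a requested valuation of 1,000 lakhs (10 crore INR) under this convention.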
PREDIK Data-Driven: Geospatial Data | USA | Tailor-made datasets: Foot...
datarade.ai
Updated Oct 13, 2021
Cite
Predik Data-driven (2021). PREDIK Data-Driven: Geospatial Data | USA | Tailor-made datasets: Foot traffic & Places Data [Dataset]. https://datarade.ai/data-products/predik-data-driven-geospatial-data-usa-tailor-made-datas-predik-data-driven
Available download formats: .json, .csv, .xls, .sql
Dataset updated
Oct 13, 2021
Dataset authored and provided by
Predik Data-driven
Area covered
United States
Description
This location data and foot traffic dataset, available for all countries, includes enriched raw mobility data and visitation at POIs to answer questions such as:

- How often do people visit a location? (daily, monthly, absolute, and averages)
- What type of places do they visit? (parks, schools, hospitals, etc.)
- Which social characteristics do people have in a certain POI? Breakdown by type: residents, workers, visitors.
- What is their mobility like during night hours and day hours?
- What is the frequency of visits, partitioned by day of the week and hour of the day?

Extra insights
- Visitors' relative income level.
- Visitors' preferences, as derived from their visits to shopping, parks, sports facilities, churches, among others.

Overview & Key Concepts

Each record corresponds to a ping from a mobile device, at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The entire process is compliant with applicable privacy laws.

We clean and process these massive datasets with a number of complex, computer-intensive calculations to make them easier to use in different data science and machine learning applications, especially those related to understanding customer behavior.

Featured attributes of the data

Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationary observations.

Night base of the device: we calculate the approximated location of where the device spends the night, which is usually their home neighborhood.

Day base of the device: we calculate the most common daylight location during weekdays, which is usually their work location.

Income level: we use the night neighborhood of the device, and intersect it with available socioeconomic data, to infer the device’s income level. Depending on the country, and the availability of good census data, this figure ranges from a relative wealth index to a currency-calculated income.

POI visited: we intersect each observation with a number of POI databases, to estimate check-ins to different locations. POI databases can vary significantly, in scope and depth, between countries.

Category of visited POI: for each observation that can be attributable to a POI, we also include a standardized location category (park, hospital, among others).

Coverage: Worldwide.
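
The device speed attribute described above can be illustrated with a minimal sketch. The provider does not document its exact method, so this assumes a great-circle (haversine) distance between consecutive pings divided by the elapsed time; the function names and the (timestamp, latitude, longitude) tuple layout are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_kmh(pings):
    """Speed between consecutive pings of one device.

    pings: list of (timestamp, lat, lon) tuples, already sorted by time.
    Returns one value per consecutive pair (None when the time gap is not positive).
    """
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(pings, pings[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds <= 0:
            speeds.append(None)  # duplicate or out-of-order timestamps: speed undefined
            continue
        speeds.append(haversine_m(la0, lo0, la1, lo1) / seconds * 3.6)
    return speeds
```

Thresholds on the resulting speeds could then separate stationary observations, pedestrians, and vehicles; the description does not give threshold values, so none are assumed here.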

Delivery schemas

We can deliver the data in three different formats:

Full dataset: one record per mobile ping. These datasets are very large, and should only be consumed by experienced teams with large computing budgets.

Visitation stream: one record per attributable visit. This dataset is considerably smaller than the full one but retains most of the more valuable elements in the dataset. It helps identify who visited a specific POI and characterize consumer behavior.

Audience profiles: one record per mobile device in a given period of time (usually monthly). All the visitation stream is aggregated by category. This is the most condensed version of the dataset and is very useful to quickly understand the types of consumers in a particular area and to create cohorts of users.
Medicinal Leaf Dataset
data.mendeley.com
Updated Oct 22, 2020
Cite
Roopashree S (2020). Medicinal Leaf Dataset [Dataset]. http://doi.org/10.17632/nnytj2v3n5.1
Unique identifier
https://doi.org/10.17632/nnytj2v3n5.1
Dataset updated
Oct 22, 2020
Authors
Roopashree S
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mother earth is enriched and nourished with a variety of plants. These plants are useful in many ways, such as drug formulation, production of herbal products, and medicines to cure many common ailments and diseases. Ayurveda, a traditional Indian medicinal system, has been practised for the past 5000 years and is widely accepted even today. India is rich in medicinal plants, being the habitat for a wide variety of them. Many parts of the plants, such as leaves, bark, root, seeds, fruits, and many more, are used as vital ingredients for the production of herbal medicines. Herbal medicines are preferred in both developing and developed countries as an alternative to synthetic drugs, mainly because of no side effects. Recognition of these plants by human sight is tedious, time-consuming, and inaccurate. Applications of image processing and computer vision techniques for the identification of medicinal plants are crucial, as many of them are threatened with extinction as per the IUCN records. Hence, the digitization of useful medicinal plants is crucial for the conservation of biodiversity. Studies reveal that building an intelligent system for recognition of medicinal herbs requires a plant leaf dataset of a decent size. The dataset comprises thirty species of healthy medicinal herbs such as Santalum album (Sandalwood), Muntingia calabura (Jamaica cherry), Plectranthus amboinicus / Coleus amboinicus (Indian Mint, Mexican mint), Brassica juncea (Oriental mustard), and many more. The dataset consists of 1500 images of forty species. Each species consists of 60 to 100 high-quality images. The folders are named as per the species' botanical/scientific name. The leaves plucked are from different plants of the same species available in local gardens. Care is taken not to pluck many leaves to build the dataset, as a leaf goes to waste after its picture is captured. Healthy and mature leaves are selected for the dataset.
The instruments used are a mobile camera (Model: Samsung S9+) and a printer (Model: Canon Inkjet Printer). The images of the leaves in the dataset are slightly rotated and tilted to make them as useful as possible for training machine learning and deep learning models. The contribution of the medicinal plant leaf dataset to developing Artificial Intelligence models (machine learning and deep learning) will help many researchers and computer scientists to detect and identify the species and their diseases, and to learn more about the herbs' existence and medicinal properties. By releasing this dataset to the community, we look forward to stimulating research in medicinal plants, where the current lack of public datasets is one of the main barriers to progress.
Transparency in Keyword Faceted Search: a dataset of Google Shopping html...
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Cite
Cozza Vittoria; Hoang Van Tien; Petrocchi Marinella; De Nicola Rocco (2020). Transparency in Keyword Faceted Search: a dataset of Google Shopping html pages [Dataset]. http://doi.org/10.5281/zenodo.1491557
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1491557
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Cozza Vittoria; Hoang Van Tien; Petrocchi Marinella; De Nicola Rocco
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results obtained in return to queries for different products, searched by a set of synthetic users surfing Google Shopping (US version) from different locations, in July, 2016.

Each file name in the collection indicates the location from which the search was made, the userID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html

The locations are Philippines (PHI), United States (US), India (IN). The userIDs: 26 to 30 for users searching from Philippines, 1 to 5 from US, 11 to 15 from India.

Products were chosen following 130 keywords (e.g., MP3 player, MP4 Watch, Personal organizer, Television, etc.).
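
The file naming scheme and user ID ranges described above can be parsed with a small sketch like the following. It assumes that "#" in the pattern stands for a numeric index and that location codes are upper-case letters; neither is confirmed by the description. The function name and the example file name are hypothetical.

```python
import re

# Assumption: "#" in the naming pattern is a numeric index.
FILENAME_RE = re.compile(
    r"^no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)\."
    r"(?P<product>.+)\.shopping_testing\.(?P<index>\d+)\.html$"
)

# User ID ranges per location, as stated in the description.
USER_RANGES = {"PHI": range(26, 31), "US": range(1, 6), "IN": range(11, 16)}


def parse_filename(name):
    """Split a result file name into location, user ID, product, and index.

    Returns None when the name does not follow the documented pattern.
    """
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    info = m.groupdict()
    info["user_id"] = int(info["user_id"])
    info["index"] = int(info["index"])
    # Flag names whose user ID falls outside the range documented for the location.
    info["user_in_expected_range"] = info["user_id"] in USER_RANGES.get(info["location"], ())
    return info
```

For instance, a hypothetical name such as no_email_PHI_26.MP3 player.shopping_testing.1.html would yield location PHI, user ID 26, and product "MP3 player".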

In the following, we describe how the search results have been collected.

Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.

To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.

A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.

The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).

Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM. OpenWPM is automatised with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each of them with their own associated cookies.

The experiments run, on average, 24 hours. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (i.e., to India), via tunneling in SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. Also, for each query, we consider the first page of results, counting 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.

Due to connection errors, one of the Philippine profiles has no associated results. Also, for Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for US, no results were returned for totes and umbrellas.

The search results have been analyzed in order to check whether there was evidence of price steering based on users' location.

One term of usage applies:

In any research product whose findings are based on this dataset, please cite

@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4\_3},
  doi       = {10.1007/978-3-030-11226-4\_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Indo-German Literature Dataset
zenodo.org
data.niaid.nih.gov
bin, csv
Updated May 14, 2024
Cite
Nina Smirnova; Jack H. Culbert; Philipp Mayr (2024). Indo-German Literature Dataset [Dataset]. http://doi.org/10.5281/zenodo.10607235
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.10607235
Dataset updated
May 14, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nina Smirnova; Jack H. Culbert; Philipp Mayr
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The IGLD is a dataset which is a mirror of the data utilised in the SEASON project selected from OpenAlex. It contains Indo-German research articles for research of academic collaboration between 1990 and 2022.

Paper

Our paper describing our work in the SEASON project:

Aasif Ahmad Mir, Nina Smirnova, Jeyshankar Ramalingam, & Philipp Mayr (2024). The rise of Indo-German collaborative research: 1990-2022. In Global Knowledge, Memory and Communication, 2024. https://doi.org/10.1108/GKMC-09-2023-0328

Usage

Description of Selection and Cleaning

The following search query: CU (“GERMANY” AND “INDIA”) was used to retrieve the data from WoS. The data were retrieved from the year 1990 till the 30th of November 2022. A total of 36,999 records were retrieved against the employed query. For the present dataset, we retrieved only articles identical to those from WoS.

Our original dataset retrieved from WoS consisted of 36,999 entries. 33,319 entries possess a valid DOI, and 3,680 entries do not have a DOI. Therefore, we developed two approaches for retrieving the desired data from the OpenAlex collection. Articles possessing a DOI were matched by DOI (dataset 1), and articles without a DOI (dataset 2) were matched by article title and publication year.

Afterwards, DOIs in dataset 1 were additionally compared to the DOIs from the original WoS dataset, and all inconsistencies were removed.

For dataset 2, authors were additionally checked. Authors’ surnames from dataset 2 and authors’ surnames from corresponding articles (matching by title and publication year) from the WoS dataset were compared. Only articles with matching publication years, author surname lists and titles were considered for the OpenAlex dataset. Subsequently, dataset 1 and dataset 2 were combined into one final dataset (OpenAlex data).

Additionally, all duplicates (by article ID) were removed from the OpenAlex data. In the final step, we checked whether all entries contained both German and Indian affiliations. Some inconsistencies with the WoS data were observed: 5,584 entries that have both Indian and German affiliations in WoS had only one of these affiliations in OpenAlex. These entries were removed from the final dataset. The final dataset resulted in 22,844 unique entries.
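
The two matching approaches and the de-duplication step described above can be sketched roughly as follows. This is not the authors' code: the record layout (dictionaries with hypothetical keys id, doi, title, year, and surnames) and the normalisation rule are assumptions made for illustration, and the final check for both German and Indian affiliations is left out.

```python
def normalise(text):
    """Lower-case and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def match_records(wos_records, openalex_records):
    """Return OpenAlex records matched to WoS by DOI, or by title + year + author surnames."""
    by_doi = {r["doi"].lower(): r for r in openalex_records if r.get("doi")}
    by_title_year = {}
    for r in openalex_records:
        by_title_year.setdefault((normalise(r["title"]), r["year"]), []).append(r)

    matched = {}
    for w in wos_records:
        if w.get("doi"):
            # Dataset 1: articles with a DOI are matched by DOI.
            hit = by_doi.get(w["doi"].lower())
            candidates = [hit] if hit else []
        else:
            # Dataset 2: match by title and year, then require identical surname lists.
            wanted = sorted(s.lower() for s in w["surnames"])
            candidates = [
                r for r in by_title_year.get((normalise(w["title"]), w["year"]), [])
                if sorted(s.lower() for s in r["surnames"]) == wanted
            ]
        for r in candidates:
            matched[r["id"]] = r  # keying by article ID removes duplicates
    return list(matched.values())
```

Keying the result by article ID means that a work reached through more than one WoS entry appears only once in the output.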

Column Descriptions

These descriptions are relevant summaries or extracts from the documentation at
https://docs.openalex.org/api-entities/works/work-object,
https://docs.openalex.org/api-entities/authors/author-object and
https://docs.openalex.org/api-entities/institutions/institution-object.

article_id
(Work attribute)

OpenAlex identifier for the article / work.
To retrieve the work you can visit https://openalex.org/works/

doi
(Work attribute)

Digital Object Identifier for the work.
Consists of a URL to doi.org

title
(Work attribute)

Title of the work.

article_display_name
(Work attribute)

Duplicate of "title" column, retained to match other OpenAlex objects' attribute.

publication_year
(Work attribute)

The year in which the work was published.
Please note that this is respective to the version of the work captured by OpenAlex as this particular entry. Other and potentially earlier published versions may be accessible in the work's location field, accessible from OpenAlex.

publication_date
(Work attribute)

An ISO 8601 formatted date for the publication of the work.
The same caveat to publication_year applies to publication_date.

article_type
(Work attribute)

Type of work.
E.g. Article, conference paper, report, dataset, etc.

article_type_crossref
(Work attribute)

Legacy type information inherited from Crossref.

article_cited_by_count
(Work attribute)

Number of citations to the work.

article_cited_by_api_url
(Work attribute)

An OpenAlex URL that allows the user to view the works which cite this work.

article_grants
(Work attribute)

A list of details of the grants from which the work received funding.
This information is gathered from Crossref and is described by OpenAlex at time of publication as "limited".

article_referenced_works_count
(Work attribute)

Number of works within OpenAlex that this work cites.
Please note that the total number of references in the work may be higher.

language
(Work attribute)

The ISO 639-1 style Language of the work.
This attribute is inferred by a software library (langdetect) used by OpenAlex, based on the abstract, or the title if the abstract is not available.

article_counts_by_year
(Work attribute)

A list of the citation count of this work per year, for up to the last 10 years.

article_locations_count
(Work attribute)

Number of locations where this work can be found.
In OpenAlex, "locations" refers to the places on the internet where versions of this work are accessible.

author_id
(Author attribute)

OpenAlex identifier for an author of the work.
To retrieve OpenAlex's bibliography for this user you may visit https://openalex.org/authors/.
The following author attributes are associated with the author identifier in each row. Please note that a work with multiple authors may have multiple rows, one for each author in OpenAlex.

orcid
(Author attribute)

ORCID identifier for the author.

author_name
(Author attribute)

Name of the author.

author_name_alternatives
(Author attribute)

Alternative formats for the author's name which OpenAlex has observed.

author_works_count
(Author attribute)

Number of works the author has created.

author_cited_by_count
(Author attribute)

Number of works which cite a work the author has created.

author_last_known_institution
(Author attribute)

Identifier for the institution with which the author is affiliated in the most recent publication from the author containing an institutional identifier.
Please note this may differ from the institution associated with the author at time of the work's release, which is listed in this database as "institution_id".

author_summary_stats
(Author attribute)

OpenAlex's citation metrics for the author.
These include citation count, i10-index, h-index and more.

institution_id
(Institution attribute)

OpenAlex identifier for the institution associated with the author when the work was published.

ror
(Institution attribute)

Research Organization Registry (ROR) identifier for the institution.

institution_name
(Institution attribute)

Name of the institution.

institution_country_code
(Institution attribute)

ISO 3166-1 Alpha-2 (two-letter) country code for the country in which the institution is located.

insitution_type
(Institution attribute)

ROR-style primary type for the institution.

institution_homepage_url
(Institution attribute)

A URL for the institution's primary homepage

institution_display_name_acroynyms
(Institution attribute)

Known acronyms or initialisms for the institution.

institution_display_name_alternatives
(Institution attribute)

Alternative names for the institution.

institution_works_count
(Institution attribute)

The number of works created by authors affiliated with this institution.

institution_cited_by_count
(Institution attribute)

The number of works that cite a work created by authors affiliated with the institution.

institution_summary_stats
(Institution attribute)

Citation metrics for the institution.
Similar to author_summary_stats.
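
Because a work with several authors spans several rows, counting rows overstates the number of works. The sketch below shows one way to collapse author-level rows into one entry per article_id. The sample rows are made up, and only a few of the columns described above are used; the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample using column names from the descriptions above.
SAMPLE = """article_id,title,author_id,institution_country_code
W1,Paper one,A1,IN
W1,Paper one,A2,DE
W2,Paper two,A3,DE
W2,Paper two,A4,IN
W2,Paper two,A1,IN
"""


def works_from_rows(rows):
    """Collapse author-level rows into one entry per article_id."""
    works = defaultdict(lambda: {"authors": set(), "countries": set()})
    for row in rows:
        work = works[row["article_id"]]
        work["title"] = row["title"]
        work["authors"].add(row["author_id"])
        work["countries"].add(row["institution_country_code"])
    return dict(works)


works = works_from_rows(csv.DictReader(io.StringIO(SAMPLE)))
```

With the real CSV, the same function could be fed a csv.DictReader opened on the file instead of the in-memory sample.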

Information

Contact Information

Nina Smirnova - nina.smirnova@gesis.org

Publication Information

Released to Zenodo 1st Feb
‘Shark Tank India Companies’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Shark Tank India Companies’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-shark-tank-india-companies-8330/latest
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Shark Tank India Companies’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/devanshu125/shark-tank-india-companies on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

Recently, I saw a dataset based on Shark Tank USA. This dataset inspired me to create one for India as well and since season 1 recently ended, I thought this was the perfect time to look at some insights based on the deals.

Content

This dataset contains the following information:

1. episode - episode number
2. pitch_no - pitch number (unique)
3. company - company name
4. idea - company description
5. deal - final deal that was taken
6. ashneer - Did Ashneer invest?
7. namita - Did Namita invest?
8. anupam - Did Anupam invest?
9. vineeta - Did Vineeta invest?
10. aman - Did Aman invest?
11. peyush - Did Peyush invest?
12. ghazal - Did Ghazal invest?

Acknowledgements

This data was scraped from Wikipedia.

--- Original source retains full ownership of the source dataset ---
Global Dated Landslide Data Base during Sentinel-2 satellite data...
b2find.eudat.eu
Updated Apr 17, 2024
Cite
(2024). Global Dated Landslide Data Base during Sentinel-2 satellite data availability - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/bb7035e6-f42f-5f42-be92-979ed10665a6
Dataset updated
Apr 17, 2024
Description
This Global Dated Landslide Database (GDLDB) is part of the project WeMonitor (Weakly Supervised Deep Learning Models for Detecting and Monitoring Spatio-Temporal Anomalies in Optical and Radar Satellite Time Series), funded by the Helmholtz Imaging Platform. The aim is to develop a deep learning model that uses satellite image time series from Sentinel-1/2 to automatically monitor changes caused, for example, by landslides, deforestation, large fires, dam failures, or the emergence of waste dumps. To train such a model, a reference dataset is required that shows the area and date of the changes as precisely as possible. To allow for a generic and transferable model, the reference data also needs to cover the diversity of the process to be detected. Thus, the aim of the GDLDB is to comprise landslides of different sizes, shapes, and types, occurring in different seasons and in different regions with varying natural conditions and different triggering mechanisms, such as rainfall-induced and earthquake-induced landslides. To build the GDLDB, available local and regional landslide inventories from around the world are combined into one coherent database by verifying their location and date of occurrence with high-resolution remote sensing data. The selection criteria for the source inventories are the definition of the landslide location as polygons, at least a rough indication of the landslide origin date, and that the landslides occurred during the Sentinel-2 data availability from 2016 onwards. A total of 16 individual inventories are included (Table 1), one each from the USA, Dominica, Italy, Zimbabwe, southern India, Nepal, China, Papua New Guinea, and New Zealand, and two each from Kyrgyzstan, Japan, and the Philippines. In addition, a global inventory was added, including a small number of landslides from the USA, Peru, Chile, Europe, Pakistan, Nepal, India, and Taiwan, and a larger number of landslides from Indonesia.
From each inventory, approximately 100 landslides were randomly selected to ensure an unbiased selection of landslides in terms of shape, size, and location. The original source inventories are produced using a variety of methods, including manual mapping in airborne data with ground verification and automatic identification in satellite remote sensing data. As a result, the mapping quality of the inventories varies greatly. In cases where we could not verify landslides using available optical remote sensing data (e.g. Sentinel-2, Planet Scope, and data available in Google Earth), new polygons are selected until the number of approximately 100 landslides is reached. In some inventories, the number of 100 landslides could not be reached, either due to a lack of suitable landslides (e.g., small size, incorrect classification) or because the total number of landslides in the selected inventory was less than 100. For inventories with many small landslides that were difficult or impossible to observe, a size threshold of 1000 m² was introduced.
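
The random selection with an optional size threshold described above might look like the following minimal sketch. It is not the project's code: the record layout (dictionaries with a hypothetical area_m2 key), the function name, and the fixed seed are assumptions, and the manual verification against remote sensing data, with redrawing of unverifiable polygons, is not modelled.

```python
import random


def select_landslides(inventory, target=100, min_area_m2=None, seed=0):
    """Randomly pick up to `target` landslides, optionally dropping small ones first.

    inventory: list of dicts, each with an "area_m2" entry (hypothetical layout).
    """
    candidates = [
        ls for ls in inventory
        if min_area_m2 is None or ls["area_m2"] >= min_area_m2
    ]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    # Inventories with fewer than `target` suitable landslides are returned in full.
    return rng.sample(candidates, min(target, len(candidates)))
```

Sampling without replacement keeps each landslide at most once, and an inventory with fewer suitable landslides than the target simply yields all of them, which mirrors the case noted in the text.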
Vadu HDSS INDEPTH Core Dataset 2009 - 2015 (Release 2017) - India
datacatalog.ihsn.org
catalog.ihsn.org
Updated Mar 29, 2019
Cite
Dr. Siddhivinayak Hirve (Founding Investigator: from 2002-2009) (2019). Vadu HDSS INDEPTH Core Dataset 2009 - 2015 (Release 2017) - India [Dataset]. https://datacatalog.ihsn.org/catalog/study/IND_2009-2015_INDEPTH-VHDSS_v01_M
Dataset updated
Mar 29, 2019
Dataset provided by
Dr. Sanjay Juvekar (Founding Co-Investigator and presently Investigator: 2002 to date)
Dr. Siddhivinayak Hirve (Founding Investigator: from 2002-2009)
Time period covered
2009 - 2015
Area covered
India
Description
Abstract

Vadu Rural Health Program, KEM Hospital Research Centre Pune, has a rich tradition in health care and development, having been at the forefront of needs-based, issue-driven research for almost 35 years. During the 1980s and 1990s, the research at Vadu focused on mother and child, with epidemiological and social science research exploring low birth weight, child survival, maternal mortality, safe abortion and domestic violence. The research portfolio has since expanded to include adult health and aging, non-communicable and communicable diseases and, in recent years, clinical trials. This expansion started with the establishment of the Health and Demographic Surveillance System at Vadu (HDSS Vadu) in August 2002, which seeks to establish a quasi-experimental design setting to allow evaluation of the impact of health interventions as well as to monitor secular trends in diseases, risk factors and health behavior.

The term "demographic surveillance" means keeping close track of population dynamics. Vadu HDSS keeps track of health issues and demographic changes in the Vadu Rural Health Program (VRHP) area. It is one of the most promising projects of national relevance and aims to establish a quasi-experimental intervention research setting with the following objectives:
1) To create a longitudinal database for efficient service delivery, future research, and linking all past micro-studies in the Vadu area
2) To monitor trends in public health problems
3) To keep track of population dynamics
4) To evaluate intervention services

This dataset contains the events of all individuals ever resident during the study period (1 Jan. 2009 to 31 Dec. 2015).

Geographic coverage

Vadu HDSS falls in two administrative blocks: (1) Shirur and (2) Haweli of Pune district in Maharashtra in western India. It covers an area of approximately 232 square kilometers.

Analysis unit

Individual

Universe

Vadu HDSS covers as many as 50,000 households with a population of 140,000 spread across 22 villages.

Kind of data

Event history data

Frequency of data collection

Two rounds per year

Sampling procedure

The study area is the Vadu area, comprising 22 villages in two administrative blocks. This area was selected because it is the primary coverage area of the Vadu Rural Health Program, which has been functioning for more than four decades. Every household is included in the HDSS. No sampling strategy is employed, as 100% population coverage in the area is expected.

Mode of data collection

Proxy Respondent [proxy]

Research instrument

The language of communication is Marathi or Hindi. The form labels are multilingual (English and Marathi), but the data entered through the forms are in English only.

The following forms were used:
- Field Worker Checklist Form: Provides a guideline to ensure that all households are covered during the round and that the events that occurred in each household are captured.
- Enumeration Form: To capture the population details at the start of the HDSS or at any later addition of villages.
- Pregnancy Form: To capture pregnancy details of women in the age group 15 to 49.
- Birth Form: To capture the details of birth events.
- Inmigration Form: To capture inward population movement from outside the HDSS area and also movement within the HDSS area.
- Outmigration Form: To capture outward population movement from inside the HDSS area and also movement within the HDSS area.
- Death Form: To capture death events.

Cleaning operations

Entered data undergo a data cleaning process. During cleaning, all erroneous data are either corrected in consultation with the data QC team, or the respective forms are sent back to the field for re-collection of correct data. Data editors have access to the raw dataset to make the necessary edits after corrected data are brought back from the field.

Individuals whose enumeration (ENU), inmigration (IMG) or birth (BTH) occurred before the left censoring date (2009-01-01), and who had not outmigrated (OMG) or died (DTH) before that date, are included in the dataset as an Enumeration (ENU) event with EventDate set to the left censoring date (2009-01-01). The actual date of the original event (ENU, BTH, IMG) is retained in the dataset as the observation date for these left-censored ENU events. An individual is dropped from the dataset if their end event (OMG or DTH) is prior to the left censoring date (2009-01-01).
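The left-censoring rule above could be expressed as in the minimal sketch below. The event codes and the censoring date come from the description; the dictionary layout, the field names (`start_code`, `start_date`, `end_code`, `end_date`, `event_date`, `observation_date`) and the function name are assumptions made for illustration and do not reflect the actual INDEPTH file structure.

```python
from datetime import date

LEFT_CENSOR = date(2009, 1, 1)
START_EVENTS = {"ENU", "IMG", "BTH"}
END_EVENTS = {"OMG", "DTH"}


def left_censor(episode):
    """Apply the left-censoring rule to one residency episode.

    `episode` is a hypothetical dict with start_code/start_date and
    end_code/end_date (the end fields are None if still resident).
    Returns None when the individual is dropped.
    """
    # Dropped: the end event happened before the censoring date.
    if episode["end_code"] in END_EVENTS and episode["end_date"] < LEFT_CENSOR:
        return None

    result = dict(episode)
    if episode["start_code"] in START_EVENTS and episode["start_date"] < LEFT_CENSOR:
        # Recode as ENU on the censoring date, keep the real date as observation date.
        result["start_code"] = "ENU"
        result["event_date"] = LEFT_CENSOR
        result["observation_date"] = episode["start_date"]
    else:
        # Not censored; reusing the start date as observation date is an assumption.
        result["event_date"] = episode["start_date"]
        result["observation_date"] = episode["start_date"]
    return result


born_2005 = left_censor({"start_code": "BTH", "start_date": date(2005, 3, 1),
                         "end_code": None, "end_date": None})
died_2008 = left_censor({"start_code": "ENU", "start_date": date(2003, 1, 10),
                         "end_code": "DTH", "end_date": date(2008, 6, 1)})
```

Here `born_2005` becomes an ENU event dated 2009-01-01 with the birth date kept as its observation date, while `died_2008` is `None` because the death precedes the censoring date.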

Response rate

On average, the response rate is 99.99% across all rounds over the years.

Sampling error estimates

Not Applicable

Data appraisal

Data are cleaned to an acceptable level against the standard data rules using the Pentaho Data Integration Community Edition (PDI CE) tool. After the cleaning process, quality metrics were as follows:

CentreId MetricTable QMetric Illegal Legal Total Metric RunDate
IN021 MicroDataCleaned Starts 1 301112 301113 0. 2017-05-31 20:06
IN021 MicroDataCleaned Transitions 0 667010 667010 0. 2017-05-31 20:07
IN021 MicroDataCleaned Ends 301113 2017-05-31 20:07
IN021 MicroDataCleaned SexValues 29 666981 667010 0. 2017-05-31 20:07
IN021 MicroDataCleaned DoBValues 575 666435 667010 0. 2017-05-31 20:07

Note: Except for lower under-five mortality in 2012 and lower adult mortality among females in 2013, all other estimates are within the expected range. The data underwent additional review of electronic data capture, data cleaning and management to look for reasons for the lower under-five mortality rates in 2013 and the lower female adult mortality in 2013. The additional review returned marginally higher rates, which supports the validity of the collected data. Further field-related review of the 2012 and 2013 data is underway, and any revisions to published data/figures will be shared at a later stage.
Hindi Wake Words & Voice Commands Speech Data
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Hindi Wake Words & Voice Commands Speech Data [Dataset]. https://www.futurebeeai.com/dataset/wake-words-and-commands-dataset/wake-words-and-commands-hindi-india
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The Hindi Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.
Speech Data
This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:
•Wake words alone
•Wake words followed by command phrases
Participant Diversity
•Speakers: 50 native Hindi speakers from the FutureBeeAI community
•Regions: Participants from various Indian provinces, ensuring broad coverage of accents and dialects
•Demographics: Ages 18–70; 60% male and 40% female participants
Recording Details
•Type: Scripted wake words and command phrases
•Duration: 1 to 15 seconds per clip
•Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz
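A quick way to check that a clip matches the stated recording details is to read its WAV header. The sketch below uses Python's standard `wave` module; the helper names `describe_wav` and `matches_stated_format` are illustrative, and the clip is synthetic silence generated in memory rather than a file from the dataset.

```python
import io
import wave


def describe_wav(source):
    """Read basic header fields from a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def matches_stated_format(info):
    """Check against the listing: stereo, 16-bit, 16-48 kHz, 1-15 s."""
    return (info["channels"] == 2
            and info["bit_depth"] == 16
            and 16_000 <= info["sample_rate_hz"] <= 48_000
            and 1 <= info["duration_s"] <= 15)


# Build a 2-second synthetic stereo clip (silence) purely for demonstration.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)          # 2 bytes = 16-bit samples
    out.setframerate(16_000)
    out.writeframes(b"\x00" * (16_000 * 2 * 2 * 2))  # rate * channels * bytes * seconds
buffer.seek(0)

info = describe_wav(buffer)
```

Running the same check over every file before training can catch clips that were resampled or converted to mono somewhere along the way.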

Dataset Diversity
•Wake Word Types
  •Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.
  •Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.
  •Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more
•Command Types by Use Case
  •Automobile: Play music, check directions, voice search, provide feedback, and more
  •Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more
  •Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.
•Recording Environments
  •No background noise
  •Background traffic noise
  •People talking in the background
•Speaking Pace
  •Normal speed
  •Fast speed
This diversity ensures robust training for real-world voice assistant applications.
Metadata
Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.
•Participant Metadata: Unique ID, age, gender, region, accent, dialect
•Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format

Use Cases & Applications
•Voice Assistant Activation: Train models to accurately detect and trigger based on wake words
•Smart Home Devices: Enable responsive voice control in smart appliances

India - National Family Health Survey 1998-1999 - Dataset - waterdata
wbwaterdata.org
Updated Mar 16, 2020
Cite
(2020). India - National Family Health Survey 1998-1999 - Dataset - waterdata [Dataset]. https://wbwaterdata.org/dataset/india-national-family-health-survey-1998-1999
Dataset updated
Mar 16, 2020
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
The second National Family Health Survey (NFHS-2), conducted in 1998-99, provides information on fertility, mortality, family planning, and important aspects of nutrition, health, and health care. The International Institute for Population Sciences (IIPS) coordinated the survey, which collected information from a nationally representative sample of more than 90,000 ever-married women age 15-49. The NFHS-2 sample covers 99 percent of India's population living in all 26 states. This report is based on the survey data for 25 of the 26 states, however, since data collection in Tripura was delayed due to local problems in the state. IIPS also coordinated the first National Family Health Survey (NFHS-1) in 1992-93. Most of the types of information collected in NFHS-2 were also collected in the earlier survey, making it possible to identify trends over the intervening period of six and one-half years. In addition, the NFHS-2 questionnaire covered a number of new or expanded topics with important policy implications, such as reproductive health, women's autonomy, domestic violence, women's nutrition, anaemia, and salt iodization. The NFHS-2 survey was carried out in two phases. Ten states were surveyed in the first phase which began in November 1998 and the remaining states (except Tripura) were surveyed in the second phase which began in March 1999. The field staff collected information from 91,196 households in these 25 states and interviewed 89,199 eligible women in these households. In addition, the survey collected information on 32,393 children born in the three years preceding the survey. One health investigator on each survey team measured the height and weight of eligible women and children and took blood samples to assess the prevalence of anaemia.

SUMMARY OF FINDINGS

POPULATION CHARACTERISTICS

Three-quarters (73 percent) of the population lives in rural areas.
The age distribution is typical of populations that have recently experienced a fertility decline, with relatively low proportions in the younger and older age groups. Thirty-six percent of the population is below age 15, and 5 percent is age 65 and above. The sex ratio is 957 females for every 1,000 males in rural areas but only 928 females for every 1,000 males in urban areas, suggesting that more men than women have migrated to urban areas. The survey provides a variety of demographic and socioeconomic background information. In the country as a whole, 82 percent of household heads are Hindu, 12 percent are Muslim, 3 percent are Christian, and 2 percent are Sikh. Muslims live disproportionately in urban areas, where they comprise 15 percent of household heads. Nineteen percent of household heads belong to scheduled castes, 9 percent belong to scheduled tribes, and 32 percent belong to other backward classes (OBCs). Two-fifths of household heads do not belong to any of these groups. Questions about housing conditions and the standard of living of households indicate some improvements since the time of NFHS-1. Sixty percent of households in India now have electricity and 39 percent have piped drinking water compared with 51 percent and 33 percent, respectively, at the time of NFHS-1. Sixty-four percent of households have no toilet facility compared with 70 percent at the time of NFHS-1. About three-fourths (75 percent) of males and half (51 percent) of females age six and above are literate, an increase of 6-8 percentage points from literacy rates at the time of NFHS-1. The percentage of illiterate males varies from 6-7 percent in Mizoram and Kerala to 37 percent in Bihar and the percentage of illiterate females varies from 11 percent in Mizoram and 15 percent in Kerala to 65 percent in Bihar. Seventy-nine percent of children age 6-14 are attending school, up from 68 percent in NFHS-1. 
The proportion of children attending school has increased for all ages, particularly for girls, but girls continue to lag behind boys in school attendance. Moreover, the disparity in school attendance by sex grows with increasing age of children. At age 6-10, 85 percent of boys attend school compared with 78 percent of girls. By age 15-17, 58 percent of boys attend school compared with 40 percent of girls. The percentage of girls 6-17 attending school varies from 51 percent in Bihar and 56 percent in Rajasthan to over 90 percent in Himachal Pradesh and Kerala. Women in India tend to marry at an early age. Thirty-four percent of women age 15-19 are already married including 4 percent who are married but gauna has yet to be performed. These proportions are even higher in the rural areas. Older women are more likely than younger women to have married at an early age: 39 percent of women currently age 45-49 married before age 15 compared with 14 percent of women currently age 15-19. Although this indicates that the proportion of women who marry young is declining rapidly, half the women even in the age group 20-24 have married before reaching the legal minimum age of 18 years. On average, women are five years younger than the men they marry. The median age at marriage varies from about 15 years in Madhya Pradesh, Bihar, Uttar Pradesh, Rajasthan, and Andhra Pradesh to 23 years in Goa. As part of an increasing emphasis on gender issues, NFHS-2 asked women about their participation in household decisionmaking. In India, 91 percent of women are involved in decision-making on at least one of four selected topics. A much lower proportion (52 percent), however, are involved in making decisions about their own health care. There are large variations among states in India with regard to women's involvement in household decisionmaking. 
More than three out of four women are involved in decisions about their own health care in Himachal Pradesh, Meghalaya, and Punjab compared with about two out of five or less in Madhya Pradesh, Orissa, and Rajasthan. Thirty-nine percent of women do work other than housework, and more than two-thirds of these women work for cash. Only 41 percent of women who earn cash can decide independently how to spend the money that they earn. Forty-three percent of working women report that their earnings constitute at least half of total family earnings, including 18 percent who report that the family is entirely dependent on their earnings. Women's work-participation rates vary from 9 percent in Punjab and 13 percent in Haryana to 60-70 percent in Manipur, Nagaland, and Arunachal Pradesh.

FERTILITY AND FAMILY PLANNING

Fertility continues to decline in India. At current fertility levels, women will have an average of 2.9 children each throughout their childbearing years. The total fertility rate (TFR) is down from 3.4 children per woman at the time of NFHS-1, but is still well above the replacement level of just over two children per woman. There are large variations in fertility among the states in India. Goa and Kerala have attained below replacement level fertility and Karnataka, Himachal Pradesh, Tamil Nadu, and Punjab are at or close to replacement level fertility. By contrast, fertility is 3.3 or more children per woman in Meghalaya, Uttar Pradesh, Rajasthan, Nagaland, Bihar, and Madhya Pradesh. More than one-third to less than half of all births in these latter states are fourth or higher-order births compared with 7-9 percent of births in Kerala, Goa, and Tamil Nadu. Efforts to encourage the trend towards lower fertility might usefully focus on groups within the population that have higher fertility than average.
In India, rural women and women from scheduled tribes and scheduled castes have somewhat higher fertility than other women, but fertility is particularly high for illiterate women, poor women, and Muslim women. Another striking feature is the high level of childbearing among young women. More than half of women age 20-49 had their first birth before reaching age 20, and women age 15-19 account for almost one-fifth of total fertility. Studies in India and elsewhere have shown that health and mortality risks increase when women give birth at such young ages, both for the women themselves and for their children. Family planning programmes focusing on women in this age group could make a significant impact on maternal and child health and help to reduce fertility.

INFANT AND CHILD MORTALITY

NFHS-2 provides estimates of infant and child mortality and examines factors associated with the survival of young children. During the five years preceding the survey, the infant mortality rate was 68 deaths at age 0-11 months per 1,000 live births, substantially lower than 79 per 1,000 in the five years preceding the NFHS-1 survey. The child mortality rate, 29 deaths at age 1-4 years per 1,000 children reaching age one, also declined from the corresponding rate of 33 per 1,000 in NFHS-1. Ninety-five children out of 1,000 born do not live to age five years. Expressed differently, 1 in 15 children die in the first year of life, and 1 in 11 die before reaching age five. Child-survival programmes might usefully focus on specific groups of children with particularly high infant and child mortality rates, such as children who live in rural areas, children whose mothers are illiterate, children belonging to scheduled castes or scheduled tribes, and children from poor households.
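The three mortality figures quoted above are linked by simple arithmetic: a child survives to age five only by surviving the first year and then ages 1-4. The sketch below reproduces the quoted 95 per 1,000 and the "1 in 15" and "1 in 11" figures from the two published rates. The function name is illustrative, and this is a consistency check on the rounded summary numbers, not the survey's estimation procedure.

```python
def under_five_mortality(infant_per_1000, child_per_1000):
    """Combine infant (age 0) and child (ages 1-4) mortality rates.

    The child rate is per 1,000 children who reach age one, so the
    two survival probabilities multiply.
    """
    survive_infancy = 1 - infant_per_1000 / 1000
    survive_ages_1_to_4 = 1 - child_per_1000 / 1000
    return 1000 * (1 - survive_infancy * survive_ages_1_to_4)


infant_rate = 68   # deaths at age 0-11 months per 1,000 live births
child_rate = 29    # deaths at age 1-4 per 1,000 children reaching age one

under_five = under_five_mortality(infant_rate, child_rate)  # about 95 per 1,000

# "1 in N" form of the same rates.
one_in_infant = round(1000 / infant_rate)     # 15
one_in_under_five = round(1000 / under_five)  # 11
```

Note that simply adding 68 and 29 would give 97, which overstates the rate because the child rate applies only to the children who survived infancy.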
Infant mortality rates are more than two and one-half times as high for women who did not receive any of the recommended types of maternity related medical care as for mothers who did receive all recommended types of care.

HEALTH, HEALTH CARE, AND NUTRITION

Promotion of maternal and child health has been one of the most important components of the Family Welfare Programme of the Government of India. One goal is for each pregnant woman to receive at least three antenatal check-ups plus two tetanus toxoid injections and a full course of iron and folic acid supplementation. In India, mothers of 65 percent of the children born in the three years preceding NFHS-2 received at least one antenatal
Waste Management and Recycling in Indian Cities
kaggle.com
Updated Dec 15, 2024
Cite
Krishna Yadu (2024). Waste Management and Recycling in Indian Cities [Dataset]. http://doi.org/10.34740/kaggle/dsv/10203312
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10203312
Dataset updated
Dec 15, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Krishna Yadu
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
About the Dataset: Waste Management and Recycling in India

Overview:

This dataset provides comprehensive information on waste management and recycling practices in various cities across India. It includes key data related to waste generation, recycling rates, population density, municipal efficiency, landfill details, and more. The data spans multiple years (2019–2023) and covers a range of waste types, including plastic, organic waste, electronic waste (e-waste), construction waste, and hazardous waste.

Purpose:

The dataset aims to:
- Promote efficient waste management practices across Indian cities.
- Analyze trends in recycling and waste disposal methods.
- Provide insights for improving municipal management systems.
- Support research and development in sustainability, environmental science, and urban planning.

Columns:

City/District: The name of the Indian city or district.

Waste Type: Type of waste generated, e.g., Plastic, Organic, E-Waste, Construction, Hazardous.

Waste Generated (Tons/Day): Amount of waste generated in tons per day.

Recycling Rate (%): The percentage of waste that is recycled.

Population Density (People/km²): The number of people per square kilometer in the city.

Municipal Efficiency Score (1-10): A score indicating how effectively the municipality manages waste (e.g., waste segregation, collection, disposal).

Disposal Method: The method used for waste disposal (e.g., Landfill, Recycling, Incineration, Composting).

Cost of Waste Management (₹/Ton): The cost of managing one ton of waste in Indian Rupees.

Awareness Campaigns Count: The number of awareness campaigns organized by the municipality in that year related to waste management.

Landfill Name: The name of the landfill site used by the city.

Landfill Location (Lat, Long): The geographical location (latitude and longitude) of the landfill.

Landfill Capacity (Tons): The total waste capacity (in tons) that the landfill can hold.

Year: The year of the data entry, ranging from 2019 to 2023.
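As a small illustration of how these columns combine, the sketch below estimates the tons of waste recycled per day for each city and year by multiplying waste generated by the recycling rate. It assumes the CSV headers match the column names listed above exactly, which may not hold for the actual file; the sample rows are invented for the example and are not taken from the dataset.

```python
import csv
import io


def recycled_tons_per_day(rows):
    """Sum (waste generated x recycling rate) per (city, year)."""
    totals = {}
    for row in rows:
        generated = float(row["Waste Generated (Tons/Day)"])
        rate = float(row["Recycling Rate (%)"]) / 100
        key = (row["City/District"], row["Year"])
        totals[key] = totals.get(key, 0.0) + generated * rate
    return totals


# Invented sample rows; header names are assumed to match the column list.
sample_csv = """City/District,Waste Type,Waste Generated (Tons/Day),Recycling Rate (%),Year
Mumbai,Plastic,1000,50,2023
Mumbai,Organic,200,25,2023
Pune,Plastic,400,25,2023
"""

totals = recycled_tons_per_day(csv.DictReader(io.StringIO(sample_csv)))
```

With a real file, `io.StringIO(sample_csv)` would be replaced by an opened file handle.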

Applications:

Urban Planning: The dataset can be used to analyze and optimize waste management infrastructure in urban areas.

Sustainability Research: It can help in studying the progress of recycling and waste reduction strategies.

Policy Making: Government bodies can use this data to craft policies aimed at improving waste management and recycling rates.

Machine Learning/AI: The dataset can be used to build models for predicting waste generation trends, recycling outcomes, and municipal efficiency.

Sources:

The data is simulated for this dataset based on average waste management practices observed in Indian cities.

Real-world data could come from municipal corporations, environmental agencies, and government reports on waste management.
#IndiaNeedsOxygen Tweets
kaggle.com
zip
Updated Nov 14, 2021
Cite
Kash (2021). #IndiaNeedsOxygen Tweets [Dataset]. https://www.kaggle.com/kaushiksuresh147/indianeedsoxygen-tweets
Available download formats: zip (4441094 bytes)
Dataset updated
Nov 14, 2021
Authors
Kash
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
India marks one COVID-19 death every 5 minutes


Content

People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.

For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.

India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.


Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India’s worst-hit cities, where a lockdown is in place to try and stem the transmission of the virus.

Dataset

The dataset consists of tweets posted with the #IndiaWantsOxygen hashtag over the past week. It contains 25,440 tweets in total and will be updated on a daily basis.

The description of the features is given below:

| No | Columns | Descriptions |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date, when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has |
| 8 | user_verified | When true, indicates that the user has a verified account |
| 9 | date | UTC time and date when the Tweet was created |
| 10 | text | The actual UTF-8 text of the Tweet |
| 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen |
| 12 | source | Utility used to post the Tweet, Tweets from the Twitter website have a source value - web |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
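As a hedged starting point for working with these columns, the sketch below counts tweets per calendar day while skipping rows flagged as retweets. It assumes `date` is stored as text beginning with `YYYY-MM-DD` and that `is_retweet` holds `True`/`False` strings; neither encoding is confirmed by the description, and the sample rows are invented.

```python
import csv
import io
from collections import Counter


def tweets_per_day(rows):
    """Count tweets per UTC day, ignoring rows flagged as retweets."""
    counts = Counter()
    for row in rows:
        # Assumed encoding: the literal strings "True" / "False".
        if row["is_retweet"].strip().lower() == "true":
            continue
        # Assumed format: the date column starts with YYYY-MM-DD.
        counts[row["date"][:10]] += 1
    return counts


# Invented sample rows using a subset of the documented columns.
sample_csv = """user_name,date,text,is_retweet
a,2021-04-23 10:00:00,need oxygen in Delhi,False
b,2021-04-23 11:30:00,please help,False
c,2021-04-24 09:15:00,oxygen supplier contact,False
d,2021-04-24 09:20:00,forwarded message,True
"""

daily = tweets_per_day(csv.DictReader(io.StringIO(sample_csv)))
```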

Acknowledgements

https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters

Inspiration

The past few days have been really depressing after seeing these incidents. These tweets are the voice of Indians requesting help and of people all over the globe asking their own countries to support India by providing oxygen tanks.

And I strongly believe that this is not just some data, but the pure emotions of people and their call for help. And I hope we as data scientists could contribute on this front by providing valuable information and insights.
Indian Railways Latest
kaggle.com
Updated Dec 14, 2020
Cite
Arihant Jain (2020). Indian Railways Latest [Dataset]. https://www.kaggle.com/datasets/arihantjain09/indian-railways-latest
Dataset updated
Dec 14, 2020
Dataset provided by
Kaggle
Authors
Arihant Jain
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Because a thorough Indian Railways dataset is not readily available, we decided to create one and share it with the world! There is no missing data in this dataset.

Content

We built this dataset using some information from data.gov.in, added distances along with another table, train_info, and did much more cleaning. There are 2 files in this dataset: train_info and train_schedule. train_schedule has more than 186000 rows, while train_info has 11114 rows.

Acknowledgements

This dataset was part of our DBMS project, which is hosted and live at http://www.railways.live

Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats,...
zenodo.org
explore.openaire.eu
bin, csv, jpeg, txt
Updated Aug 20, 2024
Cite
Vijay Karthick; Vijay Kumar; Anand Osuri (2024). Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India [Dataset]. http://doi.org/10.5281/zenodo.13340613
Available download formats: csv, bin, jpeg, txt
Unique identifier
https://doi.org/10.5281/zenodo.13340613
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Vijay Karthick; Vijay Kumar; Anand Osuri
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Western Ghats, Sakleshpura, India
Description
Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India

This dataset contains mammal occurrence records from 2022 to 2024 in the Sakleshpura region of central Western Ghats, India. It includes a few occurrence records of other chordates. Occurrence records were gathered in the field by researchers of the Nature Conservation Foundation, India, using a mobile data collection application. Suggested citation is:
Nature Conservation Foundation (2024). Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India. Nature Conservation Foundation, India. Dataset

Keywords: tropical rainforest, plantations, Sakleshpura, animal distribution, Western Ghats

CONTACT #1
1. Name: Anand M Osuri
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: aosuri@ncf-india.org
5. ORCID: https://orcid.org/0000-0001-9909-5633

CONTACT #2
1. Name: Vijay Karthick
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: vijayk@ncf-india.org
5. ORCID: https://orcid.org/0000-0001-6023-3955

CONTACT #3
1. Name: Vijay Kumar
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: vijaykumar@ncf-india.org
5. ORCID: https://orcid.org/0009-0000-4149-0083

Geographic Coverage:
1. Location/Study Area: Sakleshpura, Karnataka, India
2. GPS coordinates: Kadamane Village (12.924647, 75.654650)

Temporal Coverage:
1. Begins: 2022-05-16 (Year, Month, Day)
2. Ends: 2024-05-22 (Year, Month, Day)

Besides the 000_readMe.txt file containing this information and the 14 images associated with individual observations, the dataset includes three comma-delimited text (csv) files, two R code files, and one Excel file, as explained below:
1) 001_mammalData.csv -- This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file

2) 002_placeLocs.csv -- This file lists the named places for which the GPS location was unavailable from the mobile phone application and which were manually assigned coordinates with 500 or 1000 m accuracy

3) 003_nameMatch.csv -- This file matches the name as originally recorded with the correct common name and scientific name

4) 004_GBIF_upload_code.R -- R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)

5) 005_download_images_from_googledrive.R - R code to extract image IDs and download images from googledrive

6) 006_kadamane_mammal_occurrence.xlsx - An Excel file that contains the raw data used by the code above

FILES INCLUDED IN DATASET

001_mammalData.csv
This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file

observers: Observers who made the observation
timestamp: Automatic time stamp of date and time when app was used
date: Date of observation
time: Time of observation
decimalLatitude: Latitude in decimal degrees N
decimalLongitude: Longitude in decimal degrees E
GPSaltitude: Altitude in metres
GPSaccuracy: Horizontal accuracy of GPS location in metres
place: Name of locality
habitat: Habitat type
taxa: mammal or reptile/amphibian
species: Species common name
count: Number of individuals observed
countType: Total (solitary or fully counted groups) or Partial (incompletely counted groups)
obsType: Type of observation: sighting, sign (droppings or vocalisation), death, roadkill, electrocution, other
notes: Notes or remarks on observation
imageID: Link to the Google Drive photo, if a photo is available
instanceID: Automatically generated unique identifier of observation

002_placeLocs.csv
This file lists the named places for which the GPS location was unavailable from the mobile phone application and which were manually assigned coordinates with 500 or 1000 m accuracy

place: Name of locality as recorded
lat: Assigned latitude in decimal degrees N
long: Assigned longitude in decimal degrees E
GPSaccuracy: Assigned as 500 or 1000m – Horizontal accuracy of GPS location in metres

003_nameMatch.csv
This file matches the name as originally recorded with the correct common name and scientific name.

verbatimIdentification: Identification as originally recorded in the ‘species’ column of the mammaldata.csv file
vernacularName: Common or English name
scientificName: Scientific name

004_GBIF_upload_code.R
R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)

005_download_images_from_googledrive.R
R code that extracts imageIDs from the 001_mammalData.csv file and downloads them automatically to a preferred directory

006_kadamane_mammal_occurrence.xlsx
An Excel file that contains the raw data used in the code above
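
The species column of the occurrence file holds names as originally recorded, and 003_nameMatch.csv maps each of them to a common and a scientific name. A minimal Python sketch of that lookup follows, using only the standard library. The sample rows are invented for illustration and are not taken from the dataset; only the column names come from the descriptions above.

```python
import csv
import io

# Invented sample rows; only the column names follow the file descriptions.
MAMMAL_CSV = """species,count,obsType
gaur,3,sighting
Sambar deer,1,sign
"""
NAME_MATCH_CSV = """verbatimIdentification,vernacularName,scientificName
gaur,Gaur,Bos gaurus
Sambar deer,Sambar,Rusa unicolor
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_names(records, name_rows):
    """Add vernacularName and scientificName to each occurrence record."""
    lookup = {row["verbatimIdentification"]: row for row in name_rows}
    matched = []
    for rec in records:
        names = lookup.get(rec["species"], {})
        matched.append({
            **rec,
            "vernacularName": names.get("vernacularName"),
            "scientificName": names.get("scientificName"),
        })
    return matched


occurrences = attach_names(load_rows(MAMMAL_CSV), load_rows(NAME_MATCH_CSV))
for occ in occurrences:
    print(occ["species"], "->", occ["scientificName"])
```

Records whose recorded name has no entry in the lookup keep `None` for both name fields, which makes unmatched names easy to find before any upload step.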
Study on Global Ageing and Adult Health-2007, Wave 1 - India
apps.who.int
catalog.ihsn.org
Updated Oct 24, 2013
Professor P. Arokiasamy (2013). Study on Global Ageing and Adult Health-2007, Wave 1 - India [Dataset]. https://apps.who.int/healthinfo/systems/surveydata/index.php/catalog/65
Dataset updated
Oct 24, 2013
Dataset authored and provided by
Professor P. Arokiasamy
Time period covered
2007
Area covered
India
Description
Abstract

Purpose: The multi-country Study on Global Ageing and Adult Health (SAGE) is run by the World Health Organization's Multi-Country Studies unit in the Innovation, Information, Evidence and Research Cluster. SAGE is part of the unit's Longitudinal Study Programme, which is compiling longitudinal data on the health and well-being of adult populations, and the ageing process, through primary data collection and secondary data analysis. SAGE baseline data (Wave 0, 2002/3) was collected as part of WHO's World Health Survey (WHS, http://www.who.int/healthinfo/survey/en/index.html). SAGE Wave 1 (2007/10) provides a comprehensive data set on the health and well-being of adults in six low and middle-income countries: China, Ghana, India, Mexico, Russian Federation and South Africa.

Objectives:
To obtain reliable, valid and comparable health, health-related and well-being data over a range of key domains for adult and older adult populations in nationally representative samples
To examine patterns and dynamics of age-related changes in health and well-being using longitudinal follow-up of a cohort as they age, and to investigate socio-economic consequences of these health changes
To supplement and cross-validate self-reported measures of health and the anchoring vignette approach to improving comparability of self-reported measures, through measured performance tests for selected health domains
To collect health examination and biomarker data that improves reliability of morbidity and risk factor data and to objectively monitor the effect of interventions

Additional Objectives:
To generate large cohorts of older adult populations and comparison cohorts of younger populations for following-up intermediate outcomes, monitoring trends, examining transitions and life events, and addressing relationships between determinants and health, well-being and health-related outcomes
To develop a mechanism to link survey data to demographic surveillance site data
To build linkages with other national and multi-country ageing studies
To improve the methodologies to enhance the reliability and validity of health outcomes and determinants data
To provide a public-access information base to engage all stakeholders, including national policy makers and health systems planners, in planning and decision-making processes about the health and well-being of older adults

Methods: SAGE's first full round of data collection included both follow-up and new respondents in most participating countries. The goal of the sampling design was to obtain a nationally representative cohort of persons aged 50 years and older, with a smaller cohort of persons aged 18 to 49 for comparison purposes. In the older households, all persons aged 50+ years (for example, spouses and siblings) were invited to participate. Proxy respondents were identified for respondents who were unable to respond for themselves. Standardized SAGE survey instruments were used in all countries consisting of five main parts: 1) household questionnaire; 2) individual questionnaire; 3) proxy questionnaire; 4) verbal autopsy questionnaire; and, 5) appendices including showcards. A VAQ was completed for deaths in the household over the last 24 months. The procedures for including country-specific adaptations to the standardized questionnaire and translations into local languages from English follow those developed by and used for the World Health Survey.

Content

Household questionnaire
0000 Coversheet
0100 Sampling Information
0200 Geocoding and GPS Information
0300 Recontact Information
0350 Contact Record
0400 Household Roster
0450 Kish Tables and Household Consent
0500 Housing
0600 Household and Family Support Networks and Transfers
0700 Assets and Household Income
0800 Household Expenditures
0900 Interviewer Observations

Individual questionnaire
1000 Socio-Demographic Characteristics
1500 Work History and Benefits
2000 Health State Descriptions and Vignettes
2500 Anthropometrics, Performance Tests and Biomarkers
3000 Risk Factors and Preventive Health Behaviours
4000 Chronic Conditions and Health Services Coverage
5000 Health Care Utilization
6000 Social Cohesion
7000 Subjective Well-Being and Quality of Life (WHOQoL-8 and Day Reconstruction Method)
8000 Impact of Caregiving
9000 Interviewer Assessment

Geographic coverage

National coverage

Analysis unit

households and individuals

Universe

The household section of the survey covered all households in 19 of the 28 states in India, which together cover 96% of the population. Institutionalised populations are excluded. The individual section covered all persons aged 18 years and older residing within individual households.

Kind of data

Sample survey data [ssd]

Sampling procedure

World Health Survey Sampling
India has 28 states and seven union territories. 19 of the 28 states were included in the design, representing 96% of the population. India used a stratified multistage cluster sample design. Six states were selected in accordance with their geographic location and level of development. Strata were defined by the six states (Assam, Karnataka, Maharashtra, Rajasthan, Uttar Pradesh and West Bengal) and locality (urban or rural), giving 12 strata in total. The 2000 Census demarcation was used as the sampling frame. Two-stage and three-stage sampling were adopted in rural and urban areas, respectively. In rural areas, PSUs (villages) were selected with probability proportional to size, the measure of size being the 2001 Census population of the village. SSUs (households) were selected using systematic sampling. TSUs (individuals) were selected using Kish tables. In urban areas, PSUs (city wards) were selected with probability proportional to size. Two SSUs (census enumeration blocks) were randomly selected from each PSU. TSUs (households) were selected using systematic sampling. QSUs (individuals) were selected as in rural areas. A sample of 379 EAs was selected as the primary sampling units (PSUs).
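
Selection with probability proportional to size (PPS) can be illustrated with a short Python sketch. The systematic variant shown here is one common way to do PPS selection; the source does not say which PPS procedure the survey used, so this is an assumption, and the village populations are invented.

```python
import random


def pps_systematic_sample(sizes, n_select, rng):
    """Select n_select unit indices with probability proportional to size.

    Systematic PPS: lay the units end to end on a line whose length is the
    total size, then pick points at a fixed interval from a random start.
    A unit larger than the interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n_select
    start = rng.random() * interval  # random start in [0, interval)

    selected = []
    cumulative = 0.0
    idx = 0
    for k in range(n_select):
        target = start + k * interval
        # Advance until the unit covering the target point is reached.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical village populations used as the measure of size.
village_populations = [1200, 300, 4500, 800, 2500, 700]
chosen = pps_systematic_sample(village_populations, 3, random.Random(0))
print(chosen)
```

Because the third village (4500 people) is larger than the sampling interval (10000 / 3), it is selected in every draw, which is the intended behaviour of a size-proportional design.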

SAGE Sampling
The SAGE sample was pre-determined, as all PSUs and households selected for the WHS/SAGE Wave 0 survey were included. The exceptions are three PSUs in Assam, which were replaced because they were inaccessible due to flooding, and a further six PSUs, which were omitted because household roster information was not available. In each selected EA, a listing of the households was conducted to classify each household into the following mutually exclusive categories:
1) Households with a WHS/SAGE Wave 0 respondent aged 50-plus: all members aged 50-plus, including the WHS/SAGE Wave 0 respondent, were eligible for the individual interview.
2) Households with a WHS/SAGE Wave 0 respondent aged 47-49: all members aged 50-plus, together with the WHS/SAGE Wave 0 respondent aged 47-49, were eligible for the individual interview.
3) Households with a WHS/SAGE Wave 0 female respondent aged 18-46: all female members aged 18-49, including the WHS/SAGE Wave 0 female respondent aged 18-46, were eligible for the individual interview.
4) Households with a WHS/SAGE Wave 0 male respondent aged 18-46: three households were selected using systematic sampling and one male aged 18-49 was eligible for the individual interview. In the households not selected, all members aged 50-plus were eligible for the individual interview.

Stages of selection
Strata: State, Locality = 12
PSU: EAs = 375 surveyed
SSU: Households = 10424 surveyed
TSU: Individuals = 12198 surveyed

Mode of data collection

Face-to-face [f2f] PAPI

Research instrument

The questionnaires were based on the WHS Model Questionnaire with some modification and many new additions. A household questionnaire was administered to all households eligible for the study. A Verbal Autopsy questionnaire was administered to households that had a death in the last 24 months. An Individual questionnaire was administered to eligible respondents identified from the household roster. A Proxy questionnaire was administered to individual respondents who had cognitive limitations. A Woman's Questionnaire was administered to all females aged 18-49 years identified from the household roster. The questionnaires were developed in English and were piloted as part of the SAGE pretest in 2005. All documents were translated into Hindi, Assamese, Kannada and Marathi. SAGE generic questionnaires are available as external resources.

Cleaning operations

Data editing took place at a number of stages, including: (1) office editing and coding; (2) during data entry; (3) structural checking of the CSPro files; (4) range and consistency secondary edits in Stata.

Response rate

Household: Response rate=88%, Cooperation rate=92%

Individual: Response rate=68%, Cooperation rate=92%
All Stocks Data of Indian Stock Market(1 Year)
kaggle.com
Updated Jan 9, 2022
KESHAV_MAHESHWARI (2022). All Stocks Data of Indian Stock Market(1 Year) [Dataset]. https://www.kaggle.com/datasets/gmkeshav/all-stocks-data-of-indian-stock-market1-year
Dataset updated
Jan 9, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
KESHAV_MAHESHWARI
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
India
Description
I built this dataset with SQL queries and Python code. It covers all stocks of the Indian stock market, 2435 stocks in total, over one year. Rows represent stocks, columns represent dates, and each cell holds the closing price. Enjoy, and try some stock price prediction.
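
A table with one row per stock and one column per date is often easier to analyse after reshaping it into one row per stock and date. A minimal Python sketch of that reshaping is below. The sample values and the `Stock` column name are assumptions made for illustration, since the description does not give the actual header names.

```python
import csv
import io

# Invented sample in the layout described above: one row per stock,
# one column per date, closing price in each cell.
WIDE_CSV = """Stock,2021-01-04,2021-01-05
AAA,100.0,101.5
BBB,250.0,248.0
"""


def wide_to_long(text, id_column="Stock"):
    """Turn one-row-per-stock data into (stock, date, close) tuples."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(text)):
        stock = row[id_column]
        for date, price in row.items():
            # Skip the identifier column and any empty price cells.
            if date == id_column or price == "":
                continue
            long_rows.append((stock, date, float(price)))
    return long_rows


long_rows = wide_to_long(WIDE_CSV)
for stock, date, close in long_rows:
    print(stock, date, close)
```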
Mammal occurrence records (2020-23) in the Valparai Plateau and Anamalai...
zenodo.org
data.niaid.nih.gov
bin, csv, jpeg, txt
Updated Oct 10, 2024
T. R. Shankar Raman; Divya Mudappa (2024). Mammal occurrence records (2020-23) in the Valparai Plateau and Anamalai Tiger Reserve, Western Ghats, India [Dataset]. http://doi.org/10.5281/zenodo.11903722
Available download formats: jpeg, csv, txt, bin
Unique identifier
https://doi.org/10.5281/zenodo.11903722
Dataset updated
Oct 10, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
T. R. Shankar Raman; Divya Mudappa
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 17, 2024
Area covered
Valparai, Western Ghats, India
Description
This dataset contains Mammal occurrence records (January 2020 - June 2023) in the Valparai Plateau and Anamalai Tiger Reserve, Western Ghats, India. It includes a few occurrence records of reptiles. Occurrence records were gathered in the field by researchers of the Nature Conservation Foundation, India, using a mobile data collection application. Suggested citation is:
Nature Conservation Foundation (2024). Mammal occurrence records (2020-23) in the Valparai Plateau and Anamalai Tiger Reserve, Western Ghats, India. Nature Conservation Foundation, India. Dataset, Zenodo. DOI: 10.5281/zenodo.11903722

CONTACT #1
1. Name: T. R. Shankar Raman
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: trsr@ncf-india.org
5. ORCID: https://orcid.org/0000-0002-1347-3953

CONTACT #2
1. Name: Divya Mudappa
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: divya@ncf-india.org
5. ORCID: https://orcid.org/0000-0001-9708-4826

Keywords: tropical rainforest, plantations, Anamalai Hills, Western Ghats, animal distribution, mammals

Geographic Coverage:
1. Location/Study Area: Valparai Plateau, Tamil Nadu, India; Anamalai Tiger Reserve, Tamil Nadu, India
2. GPS coordinates: Valparai Plateau (10°15'- 10°22'N, 76°52' - 76°59'E); Anamalai Tiger Reserve (10°12' - 10°35'N, 76°49' - 77°24'E)

Temporal Coverage:
1. Begins: 2020-01-11 (Year, Month, Day)
2. Ends: 2023-06-02 (Year, Month, Day)

Besides the 000_readMe.txt file containing this information, the dataset includes 60 images (photographs), three comma-delimited text (csv) files, and one R markdown text file with R code as explained below:
1) 001_mammalData.csv -- This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file

2) 002_placeLocs.csv -- This file lists the names of places for which the GPS location was unavailable from the mobile phone application and which were manually assigned coordinates with 500 m accuracy

3) 003_nameMatch.csv -- This file matches the name as originally recorded with the correct common name and scientific name

4) 004_mammup.Rmd -- R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)

+60 image files (with ".jpg" file extension)

FILES INCLUDED IN DATASET

001_mammdata.csv
This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file
recordedBy: Observer who recorded/made the observation
username: Username of person on whose mobile phone the data were noted
timestamp: Automatic time stamp of date and time when app was used
date: Date of observation
time: Time of observation
decimalLatitude: Latitude in decimal degrees N
decimalLongitude: Longitude in decimal degrees E
GPSaltitude: Altitude in metres
GPSaccuracy: Horizontal accuracy of GPS location in metres
place: Name of locality
habitat: Habitat type
species: Species common name
count: Number of individuals observed
countType: Total (solitary or fully counted groups) or Partial (incompletely counted groups)
obsType: Type of observation: sighting, sign (droppings or vocalisation), death, roadkill, electrocution, other
notes: Notes or remarks on observation
imageID: Image filename if available (NA, if not available)
instanceID: Automatically generated unique identifier of observation

002_placeLocs.csv
This file lists the names of places for which the GPS location was unavailable from the mobile phone application and which were manually assigned coordinates with 500 m accuracy
place: Name of locality as recorded
lat: Assigned latitude in decimal degrees N
long: Assigned longitude in decimal degrees E
GPSaccuracy: Assigned as 500 m – Horizontal accuracy of GPS location in metres

003_nameMatch.csv
This file matches the name as originally recorded with the correct common name and scientific name.
verbatimIdentification: Identification as originally recorded in the ‘species’ column of the mammdata.csv file
vernacularName: Common or English name
scientificName: Scientific name

004_mammup.Rmd
R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)
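
The role of 002_placeLocs.csv, supplying coordinates for records that have none, can be sketched in Python with the standard library. The rows below are invented, and the use of `NA` to mark a missing coordinate is an assumption; only the column names come from the file descriptions above.

```python
import csv
import io

# Invented sample rows; only the column names follow the file descriptions.
MAMMAL_CSV = """place,decimalLatitude,decimalLongitude,GPSaccuracy
Site A,10.30,76.95,8
Site B,NA,NA,NA
"""
PLACE_LOCS_CSV = """place,lat,long,GPSaccuracy
Site B,10.35,76.90,500
"""


def fill_missing_coordinates(records, place_rows, missing="NA"):
    """Return copies of records, filling absent coordinates from place_rows."""
    by_place = {row["place"]: row for row in place_rows}
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input rows are left unchanged
        loc = by_place.get(rec["place"])
        if rec["decimalLatitude"] in ("", missing) and loc is not None:
            rec["decimalLatitude"] = loc["lat"]
            rec["decimalLongitude"] = loc["long"]
            rec["GPSaccuracy"] = loc["GPSaccuracy"]
        filled.append(rec)
    return filled


records = list(csv.DictReader(io.StringIO(MAMMAL_CSV)))
places = list(csv.DictReader(io.StringIO(PLACE_LOCS_CSV)))
filled_records = fill_missing_coordinates(records, places)
for rec in filled_records:
    print(rec["place"], rec["decimalLatitude"], rec["GPSaccuracy"])
```

Copying the assigned accuracy along with the coordinates keeps manually located records distinguishable from those with a device-reported GPS fix.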
Poverty and inner wellbeing: India and Zambia 2010-2014 - Dataset - B2FIND
b2find.eudat.eu
Updated Oct 29, 2023
(2023). Poverty and inner wellbeing: India and Zambia 2010-2014 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c2aaeedb-8c5a-51b2-81de-c64cafd29e00
Dataset updated
Oct 29, 2023
Area covered
Zambia, India
Description
The main method of the project was a survey interview from which both qualitative and quantitative data were collected. Field research was undertaken in marginalised rural communities in Zambia (Chiawa) and India (Sarguja district, Chhattisgarh state). Two rounds of fieldwork were undertaken in each place: in Zambia, August–November 2010 (Zambia T1) and August–October 2012 (Zambia T2); in India, February–May 2011 (India T1) and February–June 2013 (India T2). In both locations, we talked to husbands and wives (separately) and women heading households.

In India we surveyed 340 people in 2011 and 368 in 2013. 187 respondents were interviewed in both rounds. 7% of respondents were single women. Qualitative data include 105 survey notes. In Zambia we surveyed 412 people in 2010 and 370 in 2012. These included 52 women heading households. 358 respondents were surveyed both years. Qualitative data include notes from 105 survey interviews.

This research aims to identify pathways of wellbeing and poverty within rural communities in Zambia and India. It will demonstrate how poverty affects wellbeing and how different constellations of wellbeing in turn affect people's movements into, within and out of poverty. Drawing on the sociology of development and psychology, it adopts a mixed method, cross-cultural longitudinal approach, with qualitative and quantitative data collection across a two year interval, involving 700 respondents. Statistical tests assess the validity and reliability of our model of wellbeing. In-depth case studies provide a deeper sense of people's own understandings and experience. In particular, the research tests a key hypothesis that social and personal relationships constitute critical drivers of wellbeing in developing countries. The project is rooted in research-policy engagement. It involves partnership with NGOs committed to incorporating wellbeing into their programmes, and generates a broader programme of communications activities at national and global level.

The Wellbeing and Poverty Pathways project developed a multi-dimensional model of wellbeing called “Inner Wellbeing” (IWB) which reflects what people think and feel they are able to be and do. The project explored relationships between people's subjective experiences of wellbeing and the external conditions in which they live their lives. Inner wellbeing comprises seven domains: economic confidence; agency and participation; social connections; close relationships; physical and mental health; competence and self-worth; values and meaning. It was constructed through a combination of theoretical reflection and empirical analysis in two rural communities, one in Zambia and one in India.

The main research instrument was a survey which comprised three sections: an opening section on demographics and health; the central IWB section; and a final section on livelihoods and access to state services. Specifically for the central IWB section, the survey has five questions (or items) for each domain, which are designed to reflect different aspects of that domain. For each question respondents are asked to select one of five graduated answers. These are then scored on a scale from strong negative (1) to weak negative (2) to neutral (3) to weak positive (4) to strong positive wellbeing (5). The questions were extensively grounded and piloted to ensure they captured issues that were important to people’s lives locally.

The studied population came from two rural areas of the Global South: Chiawa in Zambia and four villages in the Sarguja district of the Chhattisgarh state in India. No sample selection was applied. Instead, everyone in the study areas who would talk to us was interviewed.

Chiawa is a Game Management Area (GMA), located in Kafue district, Lusaka province. To the south east it borders Zimbabwe and to the east the Lower Zambezi National Park. The majority population is Goba, a people-group that originated in what is now Zimbabwe.

The research in India focused on four villages located in the historically remote hill and forest regions of northern Chhattisgarh. These villages were selected because they presented a range of contrasts. The communities there are extremely poor and people depend on (largely rainfed) farming, daily labour and gathering non-timber forest products to survive. Reflecting the area’s population as a whole, the majority of respondents (84%) are Adivasi, including Particularly Vulnerable Tribal Groups (PTG), with smaller numbers of Other Backward Caste (OBC) (15%) and Scheduled Caste (1%) people.
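
The IWB scoring structure described above (seven domains, five items per domain, each item scored from 1 to 5) can be sketched in Python. The description does not say how item scores are combined within a domain, so the simple mean used here is an assumption, and the respondent is invented.

```python
from statistics import mean

# The seven Inner Wellbeing domains named in the description.
DOMAINS = [
    "economic confidence",
    "agency and participation",
    "social connections",
    "close relationships",
    "physical and mental health",
    "competence and self-worth",
    "values and meaning",
]
ITEMS_PER_DOMAIN = 5


def domain_scores(answers):
    """Average the five 1-5 item scores within each domain.

    answers maps a domain name to a list of five integers in 1..5.
    Using the mean is an illustrative assumption, not the project's method.
    """
    scores = {}
    for domain in DOMAINS:
        items = answers[domain]
        if len(items) != ITEMS_PER_DOMAIN:
            raise ValueError(f"{domain}: expected {ITEMS_PER_DOMAIN} items")
        if any(item not in range(1, 6) for item in items):
            raise ValueError(f"{domain}: scores must be between 1 and 5")
        scores[domain] = mean(items)
    return scores


# Invented respondent: neutral everywhere except one domain.
example = {domain: [3, 3, 3, 3, 3] for domain in DOMAINS}
example["social connections"] = [4, 5, 4, 3, 4]
example_scores = domain_scores(example)
print(example_scores["social connections"])
```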
DEVANAGARI CAPTCHA DATASET OF 1 Million Images : A challenge Test
data.mendeley.com
ieee-dataport.org
Updated Apr 5, 2023
SANJAY PATE (2023). DEVANAGARI CAPTCHA DATASET OF 1 Million Images : A challenge Test [Dataset]. http://doi.org/10.17632/knmbfjsdwn.1
Unique identifier
https://doi.org/10.17632/knmbfjsdwn.1
Dataset updated
Apr 5, 2023
Authors
SANJAY PATE
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. The test is designed so that humans can complete it while current computer systems cannot, and it is used in many applications to distinguish human users from machines. Text-based CAPTCHAs are the most common type on websites. Because most text CAPTCHAs use English letters, they are hard to pass for rural residents who read only their native languages. Devanagari characters are more complex than English letters and numerals, which makes machine recognition much more difficult. The majority of official websites in India only offer information in Devanagari. Unfortunately, websites do not use Devanagari CAPTCHAs. As a result, we have created a new text-based CAPTCHA in Devanagari script in this article. Printed (computer font) and handwritten Devanagari characters (34 each) and digits (10 each), 44 + 44 = 88 character images in total, are used to design the CAPTCHAs. General CAPTCHA generation principles are used to add noise to the image using digital image processing techniques. Each CAPTCHA image is 250 × 90 pixels. Four character sets are used: Printed Alphabet (34), Handwritten Alphabet (34), Printed Digit (10), and Handwritten Digit (10). Eleven classes were generated from combinations of these four sets. The CAPTCHA strings are five, six, or seven characters long, so each class has three subclasses, one per string length, giving 11 classes × 3 subclasses = 33 types of CAPTCHA image. For each class, 10,000 CAPTCHA images were created using Python, so 11 classes × 10,000 images gives a Devanagari CAPTCHA dataset of 1,10,000 (one lakh ten thousand, that is 110,000) images. The images are designed to be hard to recognise and not easily broken, and passing a test that requires identifying Devanagari characters is difficult. The dataset is useful to researchers investigating CAPTCHA recognition and to those designing OCR systems that recognise and break Devanagari CAPTCHAs.
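
The counts quoted in the description can be checked with a few lines of Python. The constants are taken from the description; the variable names are chosen here for illustration.

```python
# Figures stated in the dataset description.
N_CLASSES = 11                # classes built from the four character sets
STRING_LENGTHS = (5, 6, 7)    # CAPTCHA string lengths
IMAGES_PER_CLASS = 10_000
ALPHABET_CHARS = 34           # per style (printed or handwritten)
DIGIT_CHARS = 10              # per style (printed or handwritten)

character_images = 2 * (ALPHABET_CHARS + DIGIT_CHARS)  # printed + handwritten
subclasses = N_CLASSES * len(STRING_LENGTHS)
total_images = N_CLASSES * IMAGES_PER_CLASS

print(character_images)  # 88
print(subclasses)        # 33
print(total_images)      # 110000
```

The product 11 × 10,000 is 110,000, which is written 1,10,000 in Indian digit grouping (one lakh ten thousand).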
Indian Candidates for General Election 2019
kaggle.com
Updated Mar 3, 2020
Prakrut Chauhan (2020). Indian Candidates for General Election 2019 [Dataset]. https://www.kaggle.com/prakrutchauhan/indian-candidates-for-general-election-2019/code
Dataset updated
Mar 3, 2020
Dataset provided by
Kaggle
Authors
Prakrut Chauhan
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
India
Description
Context

With over 600 million voters voting for 8500+ candidates across 543 constituencies, the general elections in the world's largest democracy are a potential goldmine of data. While there are separate existing datasets about the votes each candidate received and about the personal information of each candidate, there was no comprehensive dataset that included both kinds of information. This dataset should therefore be more useful than most existing datasets in this domain.

Content

I scraped the website of myneta.info to get the personal information of each candidate (as per their own sworn affidavits) and the website of the Election Commission of India to get the data about the votes received. I merged the two datasets to create this comprehensive dataset. Only candidates who secured at least 1% of the total votes polled in their constituency have been included.

Acknowledgements

I have collected the data from MyNeta.info maintained by the Association for Democratic Reforms and the website of Election Commission of India.

Inspiration

There are 2 main tasks that can be performed on this dataset: Exploratory Data Analytics to visualize the impact of each feature of the candidate and the use of machine learning to predict the chances of winning of a candidate.
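
The inclusion rule mentioned under Content (at least 1% of the votes polled in the constituency) can be sketched as a small Python filter. The rows are invented, and the constituency total is computed from the listed candidates only, which is a simplifying assumption; the author's actual scraping and merging code is not shown in the source.

```python
from collections import defaultdict

# Invented rows: (constituency, candidate, votes).
RESULTS = [
    ("Seat 1", "Candidate A", 52_000),
    ("Seat 1", "Candidate B", 47_500),
    ("Seat 1", "Candidate C", 500),
    ("Seat 2", "Candidate D", 30_000),
    ("Seat 2", "Candidate E", 20_000),
]


def filter_by_vote_share(rows, min_share=0.01):
    """Keep candidates with at least min_share of their constituency's votes."""
    totals = defaultdict(int)
    for constituency, _, votes in rows:
        totals[constituency] += votes
    return [
        (constituency, name, votes)
        for constituency, name, votes in rows
        if votes / totals[constituency] >= min_share
    ]


kept = filter_by_vote_share(RESULTS)
for constituency, name, votes in kept:
    print(constituency, name, votes)
```

In this invented example Candidate C holds 500 of 100,000 votes (0.5%) and is dropped, while the other four candidates are kept.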

Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india

🦈 Shark Tank India dataset 🇮🇳

Shark Tank India data set, includes Season 1 to Season 4 information

Dataset updated

Apr 20, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Satya Thirumani

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Shark Tank India Data set.

Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.

All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

Here is the data dictionary for (Indian) Shark Tank season's dataset.

Season Number - Season number
Startup Name - Company name or product name
Episode Number - Episode number within the season
Pitch Number - Overall pitch number
Season Start - Season first aired date
Season End - Season last aired date
Original Air Date - Episode original/first aired date, on OTT/TV
Episode Title - Episode title in SonyLiv
Anchor - Name of the episode presenter/host
Industry - Industry name or type
Business Description - Business Description
Company Website - Company Website URL
Started in - Year in which startup was started/incorporated
Number of Presenters - Number of presenters
Male Presenters - Number of male presenters
Female Presenters - Number of female presenters
Transgender Presenters - Number of transgender/LGBTQ presenters
Couple Presenters - Are presenters wife/husband ? 1-yes, 0-no
Pitchers Average Age - All pitchers average age, <30 young, 30-50 middle, >50 old
Pitchers City - Presenter's town/city or place where company head office exists
Pitchers State - Indian state pitcher hails from or state where company head office exists
Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue
Monthly Sales - Total monthly sales, in lakhs
Gross Margin - Gross margin/profit of company, in percentages
Net Margin - Net margin/profit of company, in percentages
EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization
Cash Burn - In loss in current year; burning/paying money from their pocket (yes/no)
SKUs - Stock Keeping Units or number of varieties, at the time of pitch
Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch
Bootstrapped - Startup is bootstrapped or not (yes/no)
Part of Match off - Competition between two similar brands, pitched at same time
Original Ask Amount - Original Ask Amount, in lakhs INR
Original Offered Equity - Original Offered Equity, in percentages
Valuation Requested - Valuation Requested, in lakhs INR
Received Offer - Received offer or not, 1-received, 0-not received
Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected
Total Deal Amount - Total Deal Amount, in lakhs INR
Total Deal Equity - Total Deal Equity, in percentages
Total Deal Debt - Total Deal debt/loan amount, in lakhs INR
Debt Interest - Debt interest rate, in percentages
Deal Valuation - Deal Valuation, in lakhs INR
Number of sharks in deal - Number of sharks involved in deal
Deal has conditions - Deal has conditions or not? (yes or no)
Royalty Percentage - Royalty percentage, if it's royalty deal
Royalty Recouped Amount - Royalty recouped amount, if it's royalty deal, in lakhs
Advisory Shares Equity - Deal with Advisory shares or equity, in percentages
Namita Investment Amount - Namita Investment Amount, in lakhs INR
Namita Investment Equity - Namita Investment Equity, in percentages
Namita Debt Amount - Namita Debt Amount, in lakhs INR
Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR
Vineeta Investment Equity - Vineeta Investment Equity, in percentages
Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR
Anupam Investment Amount - Anupam Investment Amount, in lakhs INR
Anupam Investment Equity - Anupam Investment Equity, in percentages
Anupam Debt Amount - Anupam Debt Amount, in lakhs INR
Aman Investment Amount - Aman Investment Amount, in lakhs INR
Aman Investment Equity - Aman Investment Equity, in percentages
Aman Debt Amount - Aman Debt Amount, in lakhs INR
Peyush Investment Amount - Peyush Investment Amount, in lakhs INR
Peyush Investment Equity - Peyush Investment Equity, in percentages
Peyush Debt Amount - Peyush Debt Amount, in lakhs INR
Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR
Ritesh Investment Equity - Ritesh Investment Equity, in percentages
Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR
Amit Investment Amount - Amit Investment Amount, in lakhs INR
Amit Investment Equity - Amit Investment Equity, in percentages
Amit Debt Amount - Amit Debt Amount, in lakhs INR
Guest Investment Amount - Guest Investment Amount, in lakhs INR
Guest Investment Equity - Guest Investment Equity, in percentages
Guest Debt Amount - Guest Debt Amount, in lakhs INR
Invested Guest Name - Name of the guest(s) who invested in deal
All Guest Names - Name of all guests, who are present in episode
Namita Present - Whether Namita present in episode or not
Vineeta Present - Whether Vineeta present in episode or not
Anupam ...
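
The amount, equity and valuation columns in the dictionary above are related by simple arithmetic: selling a given percentage of a company for a given amount implies a valuation of the whole company. A minimal Python sketch of that relation follows. The dataset does not state how its valuation columns were computed, so this formula is an assumption, it ignores any debt component, and the pitch figures are invented.

```python
def implied_valuation(amount_lakhs, equity_percent):
    """Valuation (in lakhs INR) implied by selling equity_percent for amount_lakhs."""
    if equity_percent <= 0:
        raise ValueError("equity_percent must be positive")
    return amount_lakhs * 100 / equity_percent


# Invented pitch: asking 50 lakhs INR for 5% equity.
requested = implied_valuation(50, 5)
# Invented deal: the same 50 lakhs INR, but for 10% equity.
agreed = implied_valuation(50, 10)

print(requested)  # 1000.0 lakhs INR, i.e. 10 crore INR
print(agreed)     # 500.0 lakhs INR, i.e. 5 crore INR
```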
