Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes specimens originating from India in the collections at the Natural History Museum at the University of Oslo (NHM-UiO).
Animals: The mammal collection includes 108 specimens (mounted animals, skulls or skins, sometimes from the same individuals) from ‘India’. Insofar as these are dated at all, they originate from the 19th century. No further collecting information is preserved. These data are already available from the GBIF portal (and not included in this dataset). Oslo has 1027 bird specimens from India, comprising skins as well as mounted and demounted specimens. These are either undated or originate from the 19th century or the first half of the 20th century. Locality is sometimes recorded at the region or district level, with relatively many specimens from Darjeeling. More detailed collecting data are missing. Most were collected by Englishmen, some of whom played an important role in Indian ornithology. These skins may therefore be of particular historical value. Notable are 295 skins labelled as collected by ‘Blyth’. This name most likely refers to the English zoologist Edward Blyth (1810-1873), who was one of the founders of zoology in India (cf. the Wikipedia entry on Edward Blyth). Another known name is Henry Seebohm (1832-1895), to whom twelve skins are attributed (misspelled in one case as Subohm). The bird data are not yet published in GBIF. The fish collection contains 34 databased specimens. The Staphylinidae beetle collection includes 508 specimens from India which are not yet identified to species level. The Hymenoptera collection includes 130 pinned specimens originating from the collection of Charles Thomas Bingham (1848-1908). These were collected in Sikkim. In addition there are 7 Hymenoptera and 1 Orthoptera originating from the Deinboll collection, all labelled Trankebar. Some of these may represent types of taxa described by J.C. Fabricius (1745-1808). These collections are not yet digitised. There are virtually no Lepidoptera or Diptera from India in Oslo. Finally, the museum holds circa 10 crustacean specimens and 3 molluscs.
Plants: There is a small digitised collection of 89 vascular plants from the states of Himachal Pradesh and Maharashtra. These were deposited by the Indian student B. Natarajan, who studied in Oslo in the 1990s. In addition, the older vascular plant type collection in Oslo has been digitised. This includes 12 older type specimens from India. Most of the herbarium has not been digitised, however. It may contain between 5 000 and 10 000 specimens from India. These are currently difficult to locate as the herbarium is organised in taxonomic rather than geographic units. The museum intends to digitise the herbarium at a level that would enable the retrieval of taxa per continent or even per country. This enterprise is still in the planning phase, however. Likewise, the bryophyte and algae collections might contain material from India, but this can only be retrieved after digitisation. Some of these records are published to GBIF as a separate dataset. Oslo probably holds no Indian fungi. The digitisation of the Oslo lichen herbarium is ongoing. Currently 34 specimens from India are visible in the GBIF portal (and not included in this dataset). This number may increase to circa 100 once the entire lichen herbarium is digitised. Most of these were collected after 1950 and have rather complete collecting data. The botanical garden in Oslo has 6 living plants originating from India.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset provides detailed information on road surfaces from OpenStreetMap (OSM) data, distinguishing between paved and unpaved surfaces across the region. The information is based on road surface predictions derived from a hybrid deep learning approach. For more information on the methods, refer to the paper.
Roughly 4.8023 million km of roads are mapped in OSM in this region. Based on AI-mapped estimates, paved and unpaved roads account for approximately 0.5281 and 0.2874 million km, corresponding to 10.9979% and 5.9838% of the total road length in the dataset region, respectively. Surface information is missing in OSM for 3.9868 million km, or 83.0183% of the total road length. To help fill this gap, the Mapillary-derived road surface dataset provides an additional 0.0218 million km of information (corresponding to 0.5461% of the total missing road surface information).
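The percentages quoted above can be re-derived from the stated lengths. The following is a minimal sketch that uses only the rounded figures given in this description; the variable and function names are illustrative, and small differences in the last decimals are expected because the published percentages were presumably computed from unrounded lengths.

```python
# Lengths in million km, copied from the description above (rounded values).
total_km = 4.8023
paved_km = 0.5281
unpaved_km = 0.2874
missing_km = 3.9868
mapillary_km = 0.0218


def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


paved_pct = share(paved_km, total_km)          # share of total road length
unpaved_pct = share(unpaved_km, total_km)      # share of total road length
missing_pct = share(missing_km, total_km)      # share of total road length
mapillary_pct = share(mapillary_km, missing_km)  # share of the missing length

for name, value in [
    ("paved", paved_pct),
    ("unpaved", unpaved_pct),
    ("missing", missing_pct),
    ("mapillary (of missing)", mapillary_pct),
]:
    print(f"{name}: {value:.2f}%")
```

Note that the paved, unpaved and missing lengths add up to the total mapped length, so the three shares of total length sum to 100%.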
It is intended for use in transportation planning, infrastructure analysis, climate emissions and geographic information system (GIS) applications.
This dataset provides comprehensive information on road and urban area features, including location, surface quality, and classification metadata. This dataset includes attributes from OpenStreetMap (OSM) data, AI predictions for road surface, and urban classifications.
AI features:
pred_class: Model-predicted class for the road surface, with values "paved" or "unpaved."
pred_label: Binary label associated with pred_class (0 = paved, 1 = unpaved).
osm_surface_class: Classification of the surface type from OSM, categorized as "paved" or "unpaved."
combined_surface_osm_priority: Surface classification combining pred_label and surface (OSM) while prioritizing the OSM surface tag, classified as "paved" or "unpaved."
combined_surface_DL_priority: Surface classification combining pred_label and surface (OSM) while prioritizing the DL prediction pred_label, classified as "paved" or "unpaved."
n_of_predictions_used: Number of predictions used for the feature length estimation.
predicted_length: Predicted length based on the DL model’s estimations, in meters.
DL_mean_timestamp: Mean timestamp of the predictions used, for comparison.
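The two combined attributes differ only in which source wins when both are available. The exact rule used to build the dataset is not stated here, so the following is a sketch of one plausible reading: take the prioritized source when it has a value and fall back to the other one otherwise. The function name combine_surface and the use of None for a missing value are assumptions made for illustration, as is using the already categorized osm_surface_class instead of the raw surface tag.

```python
# Mapping taken from the attribute list: 0 = paved, 1 = unpaved.
LABEL_TO_CLASS = {0: "paved", 1: "unpaved"}


def combine_surface(osm_surface_class, pred_label, prioritize="osm"):
    """Combine the OSM class with the model prediction (hypothetical helper).

    osm_surface_class: "paved", "unpaved", or None if OSM has no usable tag.
    pred_label: 0, 1, or None if no prediction exists for the feature.
    prioritize: "osm" or "dl", naming the source that wins when both exist.
    """
    pred_class = LABEL_TO_CLASS.get(pred_label)  # None for a missing label
    if prioritize == "osm":
        first, second = osm_surface_class, pred_class
    elif prioritize == "dl":
        first, second = pred_class, osm_surface_class
    else:
        raise ValueError("prioritize must be 'osm' or 'dl'")
    # Fall back to the second source only when the first has no value.
    return first if first is not None else second


# The two sources disagree: the result depends on the priority.
print(combine_surface("paved", 1, prioritize="osm"))
print(combine_surface("paved", 1, prioritize="dl"))
```

Under this reading the two attributes agree everywhere except on features where OSM and the model both provide a value and disagree.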
OSM features may have these attributes (learn what tags mean here):
name: Name of the feature, if available in OSM.
name:en: Name of the feature in English, if available in OSM.
name:* (in local language): Name of the feature in the local official language, where available.
highway: Road classification based on OSM tags (e.g., residential, motorway, footway).
surface: Description of the surface material of the road (e.g., asphalt, gravel, dirt).
smoothness: Assessment of surface smoothness (e.g., excellent, good, intermediate, bad).
width: Width of the road, where available.
lanes: Number of lanes on the road.
oneway: Indicates if the road is one-way (yes or no).
bridge: Specifies if the feature is a bridge (yes or no).
layer: Indicates the layer of the feature in cases where multiple features are stacked (e.g., bridges, tunnels).
source: Source of the data, indicating the origin or authority of specific attributes.
Urban classification features may have these attributes:
continent: The continent where the data point is located (e.g., Europe, Asia).
country_iso_a2: The ISO Alpha-2 code representing the country (e.g., "US" for the United States).
urban: Binary indicator for urban areas based on the GHSU Urban Layer 2019. (0 = rural, 1 = urban)
urban_area: Name of the urban area or city where the data point is located.
osm_id: Unique identifier assigned by OpenStreetMap (OSM) to each feature.
osm_type: Type of OSM element (e.g., node, way, relation).
The data originates from OpenStreetMap (OSM) and is augmented with model predictions using images downloaded from Mapillary in combination with the GHSU Global Human Settlement Urban Layer 2019 and AFRICAPOLIS2020 urban layer.
This dataset is one of many HeiGIT exports on HDX. See the HeiGIT website for more information.
We are looking forward to hearing about your use-case! Feel free to reach out to us and tell us about your research at communications@heigit.org – we would be happy to amplify your work.
https://creativecommons.org/publicdomain/zero/1.0/
Big brands spend a significant amount on popularizing a product. Nevertheless, their efforts often go in vain when establishing the merchandise in the hyperlocal market. Depending on geographical conditions, the same attributes can convey very different information about the customer. Hence, insight into this is a must for any brand owner.
In this competition, we have brought the data gathered from one of the top apparel brands in India. Provided the details concerning category, score, and presence in the store, participants are challenged to predict the popularity level of the merchandise.
The popularity class decides how popular the product is given the attributes which a store owner can control to make it happen.
Train.csv - 18208 rows x 12 columns (includes the popularity column as the target variable)
Test.csv - 12140 rows x 11 columns
Sample Submission.csv - Please check the Evaluation section for more details on how to generate a valid submission
Columns: store_ratio, basket_ratio, category_1, store_score, category_2, store_presence, score_1, score_2, score_3, score_4, time, popularity (class of popularity, target column)
Multi-class classification modeling
Advanced feature engineering
Optimizing the multi-class log loss score as a metric to generalize well on unseen data
Top-3 winners will get MLDS 2021 passes MLDS (Machine Learning Developer's Summit) INDIA’S NO.1 CONFERENCE EXCLUSIVELY FOR MACHINE LEARNING PRACTITIONERS ECOSYSTEM MLDS21 brings together India’s leading Machine Learning innovators and practitioners to share their ideas and experience about machine learning tools, advanced development in this sphere and gives the attendees a first look at new trends & developer products.
Use the provided class labels as y_true and the per-class predicted probabilities obtained from the model's predict_proba() method as y_pred.
You should submit a .csv/.xlsx file with exactly 12140 rows with 5 columns (i.e. 0, 1, 2, 3, 4). Your submission will return an Invalid Score if you have extra columns or rows.
The file should have exactly 5 columns.
Using pandas, one can do
submission_df.to_csv('my_submission_file.csv', index=False)
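To make the evaluation and submission format concrete, here is a dependency-free sketch of the multi-class log loss and of a file with the five columns 0 to 4. It assumes that the class labels are the integers 0 to 4 and index the submission columns directly, which the description does not confirm; the helper names and the toy data are illustrative only. The csv module is used instead of pandas only to keep the example self-contained; the to_csv call above writes the same kind of file.

```python
import csv
import io
import math

CLASSES = [0, 1, 2, 3, 4]  # the five submission columns named in the text


def multiclass_log_loss(y_true, y_pred, eps=1e-15):
    """Mean negative log of the probability assigned to each true class."""
    total = 0.0
    for label, probs in zip(y_true, y_pred):
        # Assumption: labels are the integers 0-4 and index the columns.
        p = min(max(probs[CLASSES.index(label)], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def write_submission(fileobj, y_pred):
    """Write a header row 0..4 followed by one probability row per item."""
    writer = csv.writer(fileobj)
    writer.writerow(CLASSES)
    for probs in y_pred:
        if len(probs) != len(CLASSES):
            raise ValueError("each row needs exactly 5 probabilities")
        writer.writerow(probs)


# Toy example: two items with uniform predictions over the five classes.
toy_true = [0, 4]
toy_pred = [[0.2] * 5, [0.2] * 5]
toy_loss = multiclass_log_loss(toy_true, toy_pred)

buffer = io.StringIO()
write_submission(buffer, toy_pred)
toy_csv = buffer.getvalue()
print(toy_loss)
print(toy_csv)
```

A uniform prediction over five classes gives a loss of ln(5), which is a useful baseline: a model scoring above that is doing worse than guessing uniformly.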
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of around 2,000 HTML pages. These web pages contain the search results returned in response to queries for different products, issued by a set of synthetic users browsing Google Shopping (US version) from different locations in July 2016.
Each file name indicates the location from which the search was made, the userID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
The locations are Philippines (PHI), United States (US), India (IN). The userIDs: 26 to 30 for users searching from Philippines, 1 to 5 from US, 11 to 15 from India.
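The naming scheme and the userID ranges above can be checked mechanically. The sketch below parses a file name into its parts; it assumes that the "#" placeholder stands for a number and that the product keyword may contain spaces, neither of which is stated explicitly. The function name and the example file names are made up for illustration.

```python
import re

# Pattern following no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
# Assumption: "#" is a run of digits.
FILENAME_RE = re.compile(
    r"^no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)"
    r"\.(?P<product>.+)\.shopping_testing\.(?P<index>\d+)\.html$"
)

# userID ranges per location, as listed in the description.
USER_RANGES = {
    "PHI": range(26, 31),  # 26 to 30
    "US": range(1, 6),     # 1 to 5
    "IN": range(11, 16),   # 11 to 15
}


def parse_result_filename(name):
    """Split a result file name into location, user id, product and index."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    location = match.group("location")
    user_id = int(match.group("user_id"))
    if location not in USER_RANGES or user_id not in USER_RANGES[location]:
        raise ValueError(f"user {user_id} does not belong to {location!r}")
    return {
        "location": location,
        "user_id": user_id,
        "product": match.group("product"),
        "index": int(match.group("index")),
    }


# Hypothetical file name built from the documented pattern.
print(parse_result_filename("no_email_IN_12.Television.shopping_testing.1.html"))
```

Rejecting file names whose userID falls outside the range for its location is a cheap consistency check before any price comparison across locations.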
Products were chosen based on 130 keywords (e.g., MP3 player, MP4 Watch, Personal organizer, Television, etc.).
In the following, we describe how the search results have been collected.
Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.
To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.
A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.
The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).
Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM. OpenWPM is automatised with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each of them with their own associated cookies.
The experiments run for 24 hours on average. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (e.g., to India) via tunneling through SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. Also, for each query, we consider the first page of results, counting 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.
Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas.
The search results have been analyzed to check whether there was evidence of price steering based on users' location.
One term of usage applies:
In any research product whose findings are based on this dataset, please cite
@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4_3},
  doi       = {10.1007/978-3-030-11226-4_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The second National Family Health Survey (NFHS-2), conducted in 1998-99, provides information on fertility, mortality, family planning, and important aspects of nutrition, health, and health care. The International Institute for Population Sciences (IIPS) coordinated the survey, which collected information from a nationally representative sample of more than 90,000 ever-married women age 15-49. The NFHS-2 sample covers 99 percent of India's population living in all 26 states. This report is based on the survey data for 25 of the 26 states, however, since data collection in Tripura was delayed due to local problems in the state. IIPS also coordinated the first National Family Health Survey (NFHS-1) in 1992-93. Most of the types of information collected in NFHS-2 were also collected in the earlier survey, making it possible to identify trends over the intervening period of six and one-half years. In addition, the NFHS-2 questionnaire covered a number of new or expanded topics with important policy implications, such as reproductive health, women's autonomy, domestic violence, women's nutrition, anaemia, and salt iodization. The NFHS-2 survey was carried out in two phases. Ten states were surveyed in the first phase which began in November 1998 and the remaining states (except Tripura) were surveyed in the second phase which began in March 1999. The field staff collected information from 91,196 households in these 25 states and interviewed 89,199 eligible women in these households. In addition, the survey collected information on 32,393 children born in the three years preceding the survey. One health investigator on each survey team measured the height and weight of eligible women and children and took blood samples to assess the prevalence of anaemia.
SUMMARY OF FINDINGS
POPULATION CHARACTERISTICS
Three-quarters (73 percent) of the population lives in rural areas.
The age distribution is typical of populations that have recently experienced a fertility decline, with relatively low proportions in the younger and older age groups. Thirty-six percent of the population is below age 15, and 5 percent is age 65 and above. The sex ratio is 957 females for every 1,000 males in rural areas but only 928 females for every 1,000 males in urban areas, suggesting that more men than women have migrated to urban areas. The survey provides a variety of demographic and socioeconomic background information. In the country as a whole, 82 percent of household heads are Hindu, 12 percent are Muslim, 3 percent are Christian, and 2 percent are Sikh. Muslims live disproportionately in urban areas, where they comprise 15 percent of household heads. Nineteen percent of household heads belong to scheduled castes, 9 percent belong to scheduled tribes, and 32 percent belong to other backward classes (OBCs). Two-fifths of household heads do not belong to any of these groups. Questions about housing conditions and the standard of living of households indicate some improvements since the time of NFHS-1. Sixty percent of households in India now have electricity and 39 percent have piped drinking water compared with 51 percent and 33 percent, respectively, at the time of NFHS-1. Sixty-four percent of households have no toilet facility compared with 70 percent at the time of NFHS-1. About three-fourths (75 percent) of males and half (51 percent) of females age six and above are literate, an increase of 6-8 percentage points from literacy rates at the time of NFHS-1. The percentage of illiterate males varies from 6-7 percent in Mizoram and Kerala to 37 percent in Bihar and the percentage of illiterate females varies from 11 percent in Mizoram and 15 percent in Kerala to 65 percent in Bihar. Seventy-nine percent of children age 6-14 are attending school, up from 68 percent in NFHS-1. 
The proportion of children attending school has increased for all ages, particularly for girls, but girls continue to lag behind boys in school attendance. Moreover, the disparity in school attendance by sex grows with increasing age of children. At age 6-10, 85 percent of boys attend school compared with 78 percent of girls. By age 15-17, 58 percent of boys attend school compared with 40 percent of girls. The percentage of girls 6-17 attending school varies from 51 percent in Bihar and 56 percent in Rajasthan to over 90 percent in Himachal Pradesh and Kerala. Women in India tend to marry at an early age. Thirty-four percent of women age 15-19 are already married including 4 percent who are married but gauna has yet to be performed. These proportions are even higher in the rural areas. Older women are more likely than younger women to have married at an early age: 39 percent of women currently age 45-49 married before age 15 compared with 14 percent of women currently age 15-19. Although this indicates that the proportion of women who marry young is declining rapidly, half the women even in the age group 20-24 have married before reaching the legal minimum age of 18 years. On average, women are five years younger than the men they marry. The median age at marriage varies from about 15 years in Madhya Pradesh, Bihar, Uttar Pradesh, Rajasthan, and Andhra Pradesh to 23 years in Goa. As part of an increasing emphasis on gender issues, NFHS-2 asked women about their participation in household decisionmaking. In India, 91 percent of women are involved in decision-making on at least one of four selected topics. A much lower proportion (52 percent), however, are involved in making decisions about their own health care. There are large variations among states in India with regard to women's involvement in household decisionmaking. 
More than three out of four women are involved in decisions about their own health care in Himachal Pradesh, Meghalaya, and Punjab compared with about two out of five or less in Madhya Pradesh, Orissa, and Rajasthan. Thirty-nine percent of women do work other than housework, and more than two-thirds of these women work for cash. Only 41 percent of women who earn cash can decide independently how to spend the money that they earn. Forty-three percent of working women report that their earnings constitute at least half of total family earnings, including 18 percent who report that the family is entirely dependent on their earnings. Women's work-participation rates vary from 9 percent in Punjab and 13 percent in Haryana to 60-70 percent in Manipur, Nagaland, and Arunachal Pradesh.
FERTILITY AND FAMILY PLANNING
Fertility continues to decline in India. At current fertility levels, women will have an average of 2.9 children each throughout their childbearing years. The total fertility rate (TFR) is down from 3.4 children per woman at the time of NFHS-1, but is still well above the replacement level of just over two children per woman. There are large variations in fertility among the states in India. Goa and Kerala have attained below replacement level fertility and Karnataka, Himachal Pradesh, Tamil Nadu, and Punjab are at or close to replacement level fertility. By contrast, fertility is 3.3 or more children per woman in Meghalaya, Uttar Pradesh, Rajasthan, Nagaland, Bihar, and Madhya Pradesh. More than one-third to less than half of all births in these latter states are fourth or higher-order births compared with 7-9 percent of births in Kerala, Goa, and Tamil Nadu. Efforts to encourage the trend towards lower fertility might usefully focus on groups within the population that have higher fertility than average.
In India, rural women and women from scheduled tribes and scheduled castes have somewhat higher fertility than other women, but fertility is particularly high for illiterate women, poor women, and Muslim women. Another striking feature is the high level of childbearing among young women. More than half of women age 20-49 had their first birth before reaching age 20, and women age 15-19 account for almost one-fifth of total fertility. Studies in India and elsewhere have shown that health and mortality risks increase when women give birth at such young ages, both for the women themselves and for their children. Family planning programmes focusing on women in this age group could make a significant impact on maternal and child health and help to reduce fertility.
INFANT AND CHILD MORTALITY
NFHS-2 provides estimates of infant and child mortality and examines factors associated with the survival of young children. During the five years preceding the survey, the infant mortality rate was 68 deaths at age 0-11 months per 1,000 live births, substantially lower than 79 per 1,000 in the five years preceding the NFHS-1 survey. The child mortality rate, 29 deaths at age 1-4 years per 1,000 children reaching age one, also declined from the corresponding rate of 33 per 1,000 in NFHS-1. Ninety-five children out of 1,000 born do not live to age five years. Expressed differently, 1 in 15 children die in the first year of life, and 1 in 11 die before reaching age five. Child-survival programmes might usefully focus on specific groups of children with particularly high infant and child mortality rates, such as children who live in rural areas, children whose mothers are illiterate, children belonging to scheduled castes or scheduled tribes, and children from poor households.
Infant mortality rates are more than two and one-half times as high for mothers who did not receive any of the recommended types of maternity-related medical care as for mothers who did receive all recommended types of care.
HEALTH, HEALTH CARE, AND NUTRITION
Promotion of maternal and child health has been one of the most important components of the Family Welfare Programme of the Government of India. One goal is for each pregnant woman to receive at least three antenatal check-ups plus two tetanus toxoid injections and a full course of iron and folic acid supplementation. In India, mothers of 65 percent of the children born in the three years preceding NFHS-2 received at least one antenatal
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The north-east region of India is one of the significant biodiversity hotspots. Being one of the richest bird areas, it is an important route for migratory birds and home to many endemic birds. This paper describes the literature-based dataset of species occurrences of birds of north-eastern India. The occurrence records documented in the dataset are distributed across eleven states of India, viz. Arunachal Pradesh, Assam, Bihar, Manipur, Meghalaya, Mizoram, Nagaland, Sikkim, Tripura, Uttar Pradesh and West Bengal. The geospatial scope of the dataset covers 24 to 29 degrees North latitude and 78 to 94 degrees East longitude, and the dataset comprises over 2400 occurrence records. These records have been collated from scholarly literature published during 1915-2008, especially the Journal of the Bombay Natural History Society (JBNHS). The temporal scale of the dataset represents bird observations recorded during 1909-2007. The dataset has been developed in MS Excel. The key elements in the database are scientific name, taxonomic classification, temporal and geospatial details including geo-coordinate precision, data collector, basis of record, and primary source of the data record. The temporal and geospatial quality of more than 50% of the data records has been enhanced retrospectively. Where possible, data records are annotated with geospatial coordinate precision to the nearest minute. This dataset is being constantly updated with the addition of new data records and quality enhancement of documented occurrences. The dataset can be used in species distribution and niche modeling studies. It is planned to expand the scope of the dataset to collate bird species occurrences across peninsular India.
Vadu Rural Health Program, KEM Hospital Research Centre Pune has a rich tradition in health care and development being in the forefront of needs-based, issue-driven research over almost 35 years. During the decades of 1980 and 1990 the research at Vadu focused on mother and child with epidemiological and social science research exploring low birth weight, child survival, maternal mortality, safe abortion and domestic violence. The research portfolio has ever since expanded to include adult health and aging, non-communicable and communicable diseases and to clinical trials in recent years. It started with establishment of Health and Demographic Surveillance System at Vadu (HDSS Vadu) in August, 2002 that seeks to establish a quasi-experimental design setting to allow evaluation of impact of health interventions as well as monitor secular trends in diseases, risk factors and health behavior of humans.
The term "demographic surveillance" means to keep close track of the population dynamics. Vadu HDSS deals with keeping track of health issues and demographic changes in Vadu rural health program (VRHP) area. It is one of the most promising projects of national relevance that aims at establishing a quasi-experimental intervention research setting with the following objectives:
1) To create a longitudinal data base for efficient service delivery, future research, and linking all past micro-studies in Vadu area
2) Monitoring trends in public health problems
3) Keeping track of population dynamics
4) Evaluating intervention services
This dataset contains the events of all individuals ever resident during the study period (1 Jan. 2009 to 31 Dec. 2015).
Vadu HDSS falls in two administrative blocks: (1) Shirur and (2) Haweli of Pune district in Maharashtra in western India. It covers an area of approximately 232 square kilometers.
Individual
Vadu HDSS covers as many as 50,000 households having 140,000 population spread across 22 villages.
Event history data
Two rounds per year
The study area is the Vadu area, comprising 22 villages in two administrative blocks. This area was selected because it is the primary coverage area of the Vadu Rural Health Program, which has been operating for more than four decades. Every household is included in the HDSS. No sampling strategy is employed, as 100% population coverage in the area is expected.
Proxy Respondent [proxy]
The language of communication is Marathi or Hindi. The form labels are multilingual (English and Marathi), but the data entered through the forms are in English only.
The following forms were used:
- Field Worker Checklist Form - The checklist provides a guideline to ensure that all the households are covered during the round and the events occurred in each household are captured.
- Enumeration Form: To capture the population details at the start of the HDSS or any addition of villages afterwards.
- Pregnancy Form: To capture pregnancy details of women in the age group 15 to 49.
- Birth Form: To capture the details of the birth events.
- Inmigration Form: To capture inward population movement from outside the HDSS area and also for movement within the HDSS area.
- Outmigration Form: To capture outward population movement from inside the HDSS area and also for movement within the HDSS area.
- Death Form: To capture death events.
Entered data undergo a data cleaning process. During cleaning, all erroneous data are either corrected in consultation with the data QC team, or the respective forms are sent back to the field for re-collection of correct data. Data editors have access to the raw dataset to make the necessary edits after corrected data are brought from the field.
All individuals whose enumeration (ENU), inmigration (IMG) or birth (BTH) occurred before the left censoring date (2009-01-01), and who had not outmigrated (OMG) or died (DTH) before that date, are included in the dataset as Enumeration (ENU) events with EventDate set to the left censoring date (2009-01-01). The actual date of observation of the original event (ENU, BTH, IMG) is retained in the dataset as the observation date for these left-censored ENU events. An individual is dropped from the dataset if their end event (OMG or DTH) occurred before the left censoring date (2009-01-01).
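The left censoring rule can be expressed as a small function. This is a simplified sketch, not the HDSS processing code: it treats each individual as a single residency episode with one start event and an optional end event, and the field names EventCode and ObservationDate are assumptions (only EventDate is named in the description). For events that are not censored, the observation date is assumed to equal the event date.

```python
from datetime import date

LEFT_CENSOR = date(2009, 1, 1)
START_EVENTS = {"ENU", "IMG", "BTH"}  # enumeration, inmigration, birth
END_EVENTS = {"OMG", "DTH"}           # outmigration, death


def left_censor(start_event, start_date, end_event=None, end_date=None):
    """Apply the left censoring rule to one residency episode.

    Returns None if the individual is dropped, otherwise a dict describing
    the start event as it would appear in the dataset (hypothetical layout).
    """
    # Dropped: the episode ended before the study period began.
    if end_event in END_EVENTS and end_date is not None and end_date < LEFT_CENSOR:
        return None
    # Censored: the start is recoded as ENU on the censoring date, while the
    # actual date of the original event is kept as the observation date.
    if start_event in START_EVENTS and start_date < LEFT_CENSOR:
        return {
            "EventCode": "ENU",
            "EventDate": LEFT_CENSOR,
            "ObservationDate": start_date,
        }
    # Unchanged: the start event falls on or after the censoring date.
    return {
        "EventCode": start_event,
        "EventDate": start_date,
        "ObservationDate": start_date,
    }


# Born in 2005 and still resident in 2009: recoded as ENU on 2009-01-01.
print(left_censor("BTH", date(2005, 3, 1)))
# Inmigrated in 2004 and outmigrated in 2008: dropped from the dataset.
print(left_censor("IMG", date(2004, 1, 1), "OMG", date(2008, 6, 1)))
```

Keeping the original date as the observation date means the recoding loses no information: the true start of residence can still be recovered for censored individuals.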
On average, the response rate is 99.99% in all rounds over the years.
Not Applicable
Data are cleaned to an acceptable level against the standard data rules using the Pentaho Data Integration Community Edition (PDI CE) tool. After the cleaning process, quality metrics were as follows:
CentreId MetricTable QMetric Illegal Legal Total Metric RunDate
IN021 MicroDataCleaned Starts 1 301112 301113 0. 2017-05-31 20:06
IN021 MicroDataCleaned Transitions 0 667010 667010 0. 2017-05-31 20:07
IN021 MicroDataCleaned Ends 301113 2017-05-31 20:07
IN021 MicroDataCleaned SexValues 29 666981 667010 0. 2017-05-31 20:07
IN021 MicroDataCleaned DoBValues 575 666435 667010 0. 2017-05-31 20:07
Note: Except for lower under-five mortality in 2012 and lower adult mortality among females in 2013, all other estimates are within the expected range. The data underwent additional review of electronic data capture, data cleaning and management to look for reasons for the lower under-five mortality and lower female adult mortality rates noted above. The additional review returned marginally higher rates, which supports the validity of the collected data. Further field-related review of the 2012 and 2013 data is underway, and any revisions to published data or figures will be shared at a later stage.
https://www.futurebeeai.com/data-license-agreement
Welcome to the Hindi Wake Word & Command Dataset, meticulously designed to advance the development and accuracy of voice-activated systems. This dataset features an extensive collection of wake words and commands, essential for triggering and interacting with voice assistants and other voice-activated devices. Our dataset ensures these systems respond promptly and accurately to user inputs, enhancing their reliability and user experience.
This training dataset comprises over 20,000 audio recordings of wake words and command phrases designed to build robust and accurate voice assistant speech technology. Each participant recorded 400 recordings in diverse environments and at varying speeds. This dataset contains audio recordings of wake words, as well as wake words followed by commands.
This dataset includes recordings of various types of wake words and commands, in different environments and at different speeds, making it highly diverse.
This extensive coverage ensures the dataset includes realistic scenarios, which is essential for developing effective voice assistant speech recognition models.
The dataset provides comprehensive metadata for each audio recording and participant:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset describes the demographics of upper-middle-class people living in Gachibowli, Hyderabad, India, and uses various methods of statistical analysis to try to establish relationships between several of these demographic details.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Since 2001, DiversityIndia has been a bustling online community for citizen scientists in India. Since 2008, they have coordinated diverse taxa-based biodiversity documentation in India in partnership with Nature Mates. The Sundarbans, one of the world's most elusive habitats, hosted a multi-taxon meet for the first time. The Indian Sundarban is a tidal influence zone where the tide has a significant impact on the local wildlife. In addition to the estuarine crocodile and a wide variety of other fauna, this is the only area where it is possible to see Bengal Tigers in the wild. The ecosystem is unique because it supports mangroves, plants with an alternate root system. This forest's floral inhabitants not only give it a distinctive appearance, but also make it one of the most exquisite habitats on earth. The Indian Sundarbans are connected to their counterpart in Bangladesh, making them together the world's largest mangrove environment. On the morning of April 16, 2022, the 33-person crew (including local camp staff) convened on Bali Island, one of the inhabited islands of the Indian Sundarbans. Following a formal introduction, the group broke out into smaller groups of like-minded taxonomic enthusiasts and began recording each observation with location, time, and date data. Species that could not be identified in the field were later identified from photographs taken there, with the assistance of numerous experts in diverse taxa. The third day was devoted to collecting information about the agricultural fields dispersed around the hamlet on Bali Island and documenting the many forms of biodiversity seen there. As on the first day, data were gathered as needed. The majority of the plant data were gathered on this day, encompassing the mangrove plants in the intertidal zones scattered around the village's periphery.
The following day (2022-04-18), the entire team used a patrol boat to go out into the forest and spent the entire day visiting a few Forest Camps. Data were also collected in transit between locations, until the team returned to Bali Island. The current data set, a first-of-its-kind data set from the Indian Sundarbans covering so many diverse taxa, is the outcome of the work done during the aforementioned field trip.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes specimens originating from India in the vascular plant herbarium, the Natural History Museum at the University of Oslo (NHM-UiO).
Introduction: As India has banned the export of plant specimens, specimens collected in India prior to the ban are of high value for the research community. This report stems from a survey of the herbarium sheets in the vascular plant herbarium of Oslo, which resulted in an inventory of the plants from India present in the herbarium. Due to ongoing mass digitization of the vascular plant herbarium in Oslo, the angiosperms were only surveyed up to and including the family Cyperaceae. The specimens were primarily collected in the mid or late 19th century. Most are personal collections, whereas some are from “flora exsiccata”, which were herbarium sheets sold by sellers who did not necessarily collect the specimens themselves.
The collectors: One of the earliest collectors who donated material to Oslo was Nathaniel Wallich (1786-1854). Unfortunately, he did not specify where in India or when the collections were made. It is known, however, that he arrived in India in 1807 and that he retired around 1846. It is also known that he spent most of his time in or around Calcutta, but he also made several expeditions to what are now neighboring countries of India. Wallich described many new species on his journeys, and some of the specimens might therefore be type specimens. The most famous and important of the collectors may be Joseph Dalton Hooker (1817-1911), who collected along with Thomas Thomson (1817-1878), mainly in the Khasia area. They also split up for individual surveys, in which J.D. Hooker went to Sikkim and T. Thomson went to the western Himalaya and Punjab. J.D. Hooker is known to have collected and described many new species from India, which means that there might be some hidden syntypes in the material present at the herbarium in Oslo. The collections are not numbered, which might further complicate deciding the status of the specimens. J.D. Hooker's main affiliation was to the botanical garden in Kew, where more of their material can be found. However, the value of the collections in Oslo is high, as they might add to what is found in Kew. Their journeys are well described, so an approximate date for their collections can be found by tracing their journeys. Major/Captain Francis Jenkins (1763-1866) and John William Masters (1792-1873) collected plant specimens in the Assam area. F. Jenkins is credited with being the one who discovered the tea plant in Assam. Most of the collections present in Oslo were made by J.W. Masters, with little or no additional information other than the label shared with F. Jenkins, "Coll. Jenkins Plants of Assam". They apparently did more collecting than describing.
Thomas Anderson (1832-1870) is represented by a few collections in the herbarium. It seems that some former worker in the Oslo herbarium confused him with the Swedish botanist Nils Johan Andersson (1821-1880), who took part in a circumnavigation within approximately the same time period. Some of their collections are quite precise and include the site and even the date of collecting; others are less precise. John Firminger Duthie (1845-1922) is strongly represented in the collection of material from India. Most of his collections are from the period 1880-1900. For obvious reasons, he therefore described fewer new species than the earlier collectors. His collections are very precise and include both area and date of collection. One of the collectors from the early 1800s was Dr. Bernhard Schmid (1787-1857). Working as a missionary, he mainly collected, but his material has been used by Jonathan Carl Zenker. The material was almost exclusively collected in the Nilagiri area. More recently, Robert L. Fleming collected many specimens, mainly pteridophytes, in the Dehra Dun area around 1950. Ove Arbo Høeg, a professor from the University of Oslo, made some collections in 1951-1952. As these collectors are more recent, their labels are more detailed than those of their predecessors. In addition to these collectors, who account for most of the material, several others have contributed, mainly persons from the University of Delhi who collected around Delhi, even on the University campus. A special case is that of Rudolph Friedrich Hohenacker (1798-1874). Many collections bear his label, but these were not collected by him personally. R.F. Hohenacker sold flora exsiccata, which consist of specimens collected by other persons. The only additional information on these collections is the area where they were collected.
These collections have to be considered of lower scientific value than the others, unless the original collectors can be traced, thereby giving a timespan and an actual collector.
Final remarks: Many of the specimens from India in the herbarium of Oslo are from the 1800s and are therefore of a certain value. The possibility that some of them might be syntypes or isotypes adds additional value. And again, for scientific purposes, material from India is rare outside India due to the export ban. Although the material is not very useful for most modern DNA techniques, morphology is still the backbone of modern botany. It is also worth mentioning the "cool" factor of some of these collections. J.D. Hooker was one of Charles Darwin's closest friends, which gives some perspective on what kind of material we are dealing with and helps convey this era of botany.
https://creativecommons.org/publicdomain/zero/1.0/
This heart disease dataset was originally acquired from one of the multispecialty hospitals in India and was made publicly available by Kaggle Master Jocelyn Dumlao under the license CC0: Public Domain. It consists of 1000 subjects and 14 attributes related to early-stage heart disease detection, with features covering clinical and diagnostic information.
In the original dataset, 53 entries in the serumcholestrol column were recorded as 0, which is clinically invalid. To improve data quality:
These entries were imputed using predictions from a Multilayer Perceptron (MLP) regression model.
The model used the following features for prediction:
Age
Resting Blood Pressure
Maximum Heart Rate Achieved
This approach was chosen instead of mean or median imputation to preserve the variability and relationships among features, ensuring a more realistic and accurate dataset for machine learning tasks.
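The imputation workflow described above (flag zero cholesterol readings, fit a regressor on the valid rows, predict the flagged rows from age, resting blood pressure and maximum heart rate) can be sketched in plain Python. This is only an illustration: the dataset authors used an MLP regression model, whereas the sketch swaps in a simple k-nearest-neighbour average so that it stays dependency-free. The feature column names are hypothetical; only `serumcholestrol` is spelled as in the description.

```python
import math

FEATURES = ("age", "resting_bp", "max_heart_rate")  # hypothetical column names
TARGET = "serumcholestrol"  # spelled as in the dataset description

def knn_predict(train_rows, row, k=3):
    """Stand-in regressor (NOT an MLP): mean target of the k nearest valid rows."""
    point = [row[f] for f in FEATURES]
    nearest = sorted(
        train_rows,
        key=lambda r: math.dist([r[f] for f in FEATURES], point),
    )[:k]
    return sum(r[TARGET] for r in nearest) / len(nearest)

def impute_zero_cholesterol(rows, predict=knn_predict):
    """Return new rows in which zero cholesterol values are replaced by predictions."""
    valid = [r for r in rows if r[TARGET] > 0]  # rows usable for fitting
    imputed = []
    for r in rows:
        if r[TARGET] == 0:
            r = {**r, TARGET: predict(valid, r)}  # copy, leave the input untouched
        imputed.append(r)
    return imputed
```

Any regressor with the same call shape, for example a trained MLP wrapped in a function taking `(train_rows, row)`, could be passed as `predict` in place of the stand-in.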
Important Disclaimer:
This updated dataset is intended for educational and research purposes only. The imputed values are model-based estimations and should not be considered as actual medical records. Please exercise caution when interpreting results or deploying models trained on this data in real-world clinical settings.
S.No | Attribute | Explanation | Unit | Type of Data |
---|---|---|---|---|
1 | Patient Identification Number | Unique patient ID | Numeric Number | Numeric |
2 | Age | Age of the patient | In Years | Numeric |
3 | Gender | Gender of the patient | 0 (female) / 1 (male) | Binary |
4 | Resting Blood Pressure | Blood pressure when patient is at rest | 94-200 (in mm Hg) | Numeric |
5 | Serum Cholesterol | Serum cholesterol level | 126-564 (in mg/dl) | Numeric |
6 | Fasting Blood Sugar | Whether fasting blood sugar > 120 mg/dl | 0 (false) / 1 (true) | Binary |
7 | Chest Pain Type | Type of chest pain experienced | 0 (typical angina), 1 (atypical angina), 2 (non-anginal pain), 3 (asymptomatic) | Nominal |
8 | Resting Electrocardiogram | Resting ECG results | 0 (normal), 1 (ST-T wave abnormality), 2 (left ventricular hypertrophy) | Nominal |
9 | Maximum Heart Rate Achieved | Peak heart rate achieved during exercise | 71-202 | Numeric |
10 | Exercise Induced Angina | Angina induced by exercise | 0 (no) / 1 (yes) | Binary |
11 | Oldpeak (ST Depression) | ST depression induced by exercise relative to rest | 0-6.2 | Numeric |
12 | Slope of Peak Exercise ST | Slope of the peak exercise ST segment | 1 (upsloping), 2 (flat), 3 (downsloping) | Nominal |
13 | Number of Major Vessels | Number of major vessels (0-3) colored by fluoroscopy | 0, 1, 2, 3 | Numeric |
14 | Classification (Target) | Diagnosis of heart disease | 0 (absence), 1 (presence) | Binary |
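As a rough illustration of how the ranges and codes in the table could be used to screen records, here is a small Python sketch. The dictionary keys are hypothetical column names chosen for readability, not the dataset's actual headers; the numeric limits and allowed codes are copied from the table.

```python
# Numeric ranges and coded values copied from the attribute table.
# The keys are hypothetical column names, not the dataset's real headers.
NUMERIC_RANGES = {
    "resting_bp": (94, 200),
    "serum_cholesterol": (126, 564),
    "max_heart_rate": (71, 202),
    "oldpeak": (0, 6.2),
}
CODED_VALUES = {
    "gender": {0, 1},
    "fasting_blood_sugar": {0, 1},
    "chest_pain_type": {0, 1, 2, 3},
    "resting_ecg": {0, 1, 2},
    "exercise_angina": {0, 1},
    "slope": {1, 2, 3},
    "major_vessels": {0, 1, 2, 3},
    "target": {0, 1},
}

def find_problems(record):
    """Return a list of human-readable problems for one record (a dict)."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} outside {low}-{high}")
    for name, allowed in CODED_VALUES.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}={value} not in {sorted(allowed)}")
    return problems

print(find_problems({"resting_bp": 120, "serum_cholesterol": 0, "gender": 1}))
```

A check of this kind would flag the zero cholesterol entries mentioned earlier, since 0 lies outside the documented 126-564 mg/dl range.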
Original dataset shared by Jocelyn Dumlao.
Data collected from Lincoln University College, attributed to Bhanu Prakash Doppala and Debnath Bhattacharyya.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Indian Premier League (IPL) is a professional Twenty20 cricket league in India usually contested between March and May of every year by eight teams representing eight different cities or states in India. The league was founded by the Board of Control for Cricket in India (BCCI) in 2007. The IPL has an exclusive window in ICC Future Tours Programme.
The IPL is the most-attended cricket league in the world and in 2014 was ranked sixth by average attendance among all sports leagues. In 2010, the IPL became the first sporting event in the world to be broadcast live on YouTube. The brand value of the IPL in 2019 was ₹475 billion (US$6.7 billion), according to Duff & Phelps. According to BCCI, the 2015 IPL season contributed ₹11.5 billion (US$160 million) to the GDP of the Indian economy.
The dataset consists of data about IPL matches played from 2008 to 2019. The IPL is a professional Twenty20 cricket league founded by the Board of Control for Cricket in India (BCCI) in 2007, with its first season played in 2008. The league has 8 teams representing 8 different Indian cities or states. It enjoys tremendous popularity, and the brand value of the IPL in 2019 was estimated to be ₹475 billion (US$6.7 billion). So let’s analyze the IPL through stats.
The dataset has 18 columns. Let’s get acquainted with the columns.
- id: The IPL match id.
- season: The IPL season.
- city: The city where the IPL match was held.
- date: The date on which the match was held.
- team1: One of the teams of the IPL match.
- team2: The other team of the IPL match.
- toss_winner: The team that won the toss.
- toss_decision: The decision taken by the team that won the toss to ‘bat’ or ‘field’.
- result: The result (‘normal’, ‘tie’, ‘no result’) of the match.
- dl_applied: (1 or 0) Indicates whether the Duckworth-Lewis rule was applied or not.
- winner: The winner of the match.
- win_by_runs: The number of runs by which the team batting first won.
- win_by_wickets: The number of wickets by which the team batting second won.
- player_of_match: The outstanding player of the match.
- venue: The venue where the match was hosted.
- umpire1: One of the two on-field umpires who officiate the match.
- umpire2: One of the two on-field umpires who officiate the match.
- umpire3: The off-field umpire who officiates the match.
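To show how these columns might be used, the following Python sketch computes how often the toss winner also won the match. It reads a tiny in-memory CSV with made-up teams (A, B, C) and only a subset of the columns; in practice the rows would come from the dataset file, whose name is not given here.

```python
import csv
import io

def toss_win_rate(rows):
    """Share of decided matches in which the toss winner also won the match."""
    decided = [r for r in rows if r["winner"]]  # skip 'no result' matches
    if not decided:
        return 0.0
    wins = sum(1 for r in decided if r["toss_winner"] == r["winner"])
    return wins / len(decided)

# Made-up sample rows using a subset of the documented columns.
SAMPLE = """id,season,team1,team2,toss_winner,toss_decision,result,winner
1,2008,A,B,A,bat,normal,A
2,2008,A,C,C,field,normal,A
3,2008,B,C,B,field,no result,
4,2008,B,A,B,bat,normal,B
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
rate = toss_win_rate(rows)
print(f"Toss winner won {rate:.0%} of decided matches")
```

Matches without a winner are excluded from the denominator; whether ties should be treated differently is a design choice left open here.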
Literacy in India has been increasing as more and more people receive a better education, but it is still far from all-encompassing. In 2022, the degree of literacy in India was about 76.32 percent, with the majority of literate Indians being men. It is estimated that the global literacy rate for people aged 15 and above is about 86 percent.
How to read a literacy rate
In order to identify potential for intellectual and educational progress, the literacy rate of a country covers the level of education and skills acquired by a country’s inhabitants. Literacy is an important indicator of a country’s economic progress and the standard of living – it shows how many people have access to education. However, the standards to measure literacy cannot be universally applied. Measures to identify and define illiterate and literate inhabitants vary from country to country: In some, illiteracy is equated with no schooling at all, for example.
Writings on the wall
Globally speaking, more men are able to read and write than women, and this disparity is also reflected in the literacy rate in India – with scarcity of schools and education in rural areas being one factor, and poverty another. Especially in rural areas, women and girls are often not given proper access to formal education, and even if they are, many drop out. Today, India is already being surpassed in this area by other emerging economies, like Brazil, China, and even by most other countries in the Asia-Pacific region. To catch up, India now has to offer more educational programs to its rural population, not only on how to read and write, but also on traditional gender roles and rights.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A benchmark dataset is always required for any machine learning based classification or recognition system. To the best of our knowledge, no benchmark dataset exists so far in the public domain for handwritten character recognition of the Manipuri Meetei-Mayek script. In this work, we introduce a handwritten Manipuri Meetei-Mayek character dataset consisting of more than 5000 data samples, which were collected from a diverse population group in three different districts of Manipur, India (Imphal East District, Thoubal District and Kamjong District) during March and April 2019. Each individual was asked to write all the Manipuri characters on one A4-size sheet of paper. The written sheets were then scanned, and each character was manually segmented from the scanned image.
The whole dataset is divided into five categories: 1. Mapi Mayek 2. Lonsum Mayek 3. Cheitap Mayek 4. Cheising Mayek 5. Khutam Mayek.
This dataset consists of scanned images of handwritten Manipuri Meetei-Mayek characters in .JPG format as well as in .MAT format.
With new, advanced SUVs like the Volkswagen Taigun and Skoda Kushaq around the corner, along with tech-savvy AI- and ML-based electric cars like the Tata Harrier, people in India are willing to upgrade to new cars these days. But the question on most price-sensitive Indian minds is: at what price should I sell my current car? Even after visiting various car dealers and online websites like True Value and https://www.cartrade.com/, people still can't figure out the price range within which they should sell their current car. So I web crawled and scraped the data of used cars from the top 14 cities of India, viz. New Delhi, Gurgaon, Chandigarh, Noida, Mumbai, Thane, Navi Mumbai, Bangalore, Pune, Chennai, Kolkata, Hyderabad, Ahmedabad, Lucknow.
One of my office colleagues wanted to buy a new car this season because he was done with his Tata Indica Vista. After dealing with different car dealers and agents, he was browsing through https://www.cartrade.com/ but was still unsure of the price range within which he should sell his car. This led me to extract the data of used cars from the top 14 Indian cities, viz. New Delhi, Gurgaon, Chandigarh, Noida, Mumbai, Thane, Navi Mumbai, Bangalore, Pune, Chennai, Kolkata, Hyderabad, Ahmedabad, Lucknow.
In the financial year of 2025, two-wheelers led the electric vehicle market in India, with sales reaching approximately ***** million units. This represented a substantial growth compared to the sales figures from the previous year. Meanwhile, the sales of four-wheeler electric vehicles stood at around ******** for the same period. Notably, all sectors of the electric vehicle market witnessed significant growth that year.
Electric vehicle market
The market projection for electric vehicles (EVs) in India is promising, with a significant increase in sales volume expected in the coming years. In the passenger vehicle segment, Tata Motors has carved out a substantial market share. Meanwhile, Ola Electric is leading the charge in the two-wheeler EV segment. However, the share of EV sales in total vehicle sales in India, while growing, is still relatively small. This points to a gradual yet steady shift towards EV adoption across various vehicle segments in the country.
Challenges in EV adoption
With growing environmental concerns, the acceptance of EVs is on the rise in India, along with the willingness to invest in sustainable modes of transportation. However, this transition is not without challenges. Concerns about the availability of public charging infrastructure and the safety of battery technology are prevalent. Addressing these issues will be crucial in accelerating the adoption of EVs in the country.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes specimens originating from India in the collections at the Natural History Museum at the University of Oslo (NHM-UiO).
Animals: The mammal collection includes 108 specimens (mounted animals, skulls or skins, sometimes from the same individuals) from ‘India’. Insofar as these are dated at all, they originate from the 19th century. No further collecting information is preserved. These data are already available from the GBIF portal (and not included in this dataset). Oslo has 1027 bird specimens from India, both skins and mounted and demounted specimens. These are either not dated or originate from the 19th century or the first half of the 20th century. Locality is sometimes recorded at the region or district level, with relatively many specimens from Darjeeling. More detailed collecting data are missing. Most were collected by Englishmen, some of whom had an important role in Indian ornithology. These skins may therefore be of particular historical value. Notable are 295 skins labeled as being collected by ‘Blyth’. This name most likely refers to the English zoologist Edward Blyth (1810-1873), who was one of the founders of zoology in India (cf. Wikipedia lemma Edward Blyth). Another known name is Henry Seebohm (1832-1895), to whom twelve skins are attributed (misspelled in one case as Subohm). The bird data are not yet published in GBIF. The fish collection contains 34 databased specimens. The Staphylinidae beetle collection includes 508 specimens from India which are not yet identified to species level. The Hymenoptera collection includes 130 pinned specimens originating from the collection of Charles Thomas Bingham (1848-1908). These were collected in Sikkim. In addition there are 7 Hymenoptera and 1 Orthoptera originating from the Deinboll collection, all labelled Trankebar. Some of these may represent types of taxa described by J.C. Fabricius (1745-1808). These collections are not yet digitised. There are virtually no Lepidoptera or Diptera from India in Oslo. Finally, the museum holds circa 10 crustacean specimens and 3 molluscs.
Plants: There is a small digitised collection of 89 vascular plants from Himachal Pradesh and Maharashtra provinces. These were deposited by the Indian student B. Natarajan, who studied in Oslo in the 1990s. In addition, the older vascular plant type collection in Oslo has been digitised. This includes 12 older type specimens from India. Most of the herbarium has not been digitised, however. It may contain between 5 000 and 10 000 specimens from India. These are currently difficult to locate as the herbarium is organised in taxonomic rather than geographic units. The museum intends to digitise the herbarium at a level that would enable the retrieval of taxa per continent or even per country. This enterprise is still in the planning phase, however. Likewise, the bryophyte and algae collections might contain material from India, but this can only be retrieved after digitisation. Some of these records are published to GBIF as a separate dataset. Oslo probably holds no Indian fungi. The digitisation of the Oslo lichen herbarium is ongoing. Currently 34 specimens from India are visible in the GBIF portal (and not included in this dataset). This number may increase to circa 100 once the entire lichen herbarium is digitised. Most of these were collected after 1950 and have rather complete collecting data. The botanical garden in Oslo has 6 living plants originating from India.