86 datasets found
  1. Amazon India products dataset in CSV format

    • crawlfeeds.com
    csv, zip
    Updated Mar 27, 2025
    Cite
    Crawl Feeds (2025). Amazon India products dataset in CSV format [Dataset]. https://crawlfeeds.com/datasets/amazon-india-products-dataset-in-csv-format
    Explore at:
    csv, zip (available download formats)
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Area covered
    India
    Description

    Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.

    Dataset Features:

    • Product Details: Name, Brand, Category, and Unique ID

    • Pricing Information: Current Price, Discounted Price, and Currency

    • Availability & Ratings: Stock Status, Customer Ratings, and Reviews

    • Seller Information: Seller Name and Fulfillment Details

    • Additional Attributes: Product Description, Specifications, and Images

    Dataset Specifications:

    • Format: CSV

    • Number of Records: 50,000+

    • Delivery Time: 3 Days

    • Price: $149.00

    • Availability: Immediate

    This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.

  2. FDI Dataset for India(Month-wise) from 2014-2024

    • kaggle.com
    Updated Dec 31, 2024
    Cite
    Golden Ave (2024). FDI Dataset for India(Month-wise) from 2014-2024 [Dataset]. https://www.kaggle.com/datasets/atharvarayar/fdi-dataset-for-indiamonth-wise-from-2014-2024
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 31, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Golden Ave
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    India
    Description

    The dataset was created from data released by the Department for Promotion of Industry and Internal Trade, India. The data available on the website is in PDF format; I scraped it and converted it to CSV. The main motivation for creating the dataset was that I couldn't find the latest month-wise FDI data for India. I checked the government websites, but those that do have data provide yearly figures rather than monthly ones. Tracking quarterly performance requires month-wise data, so I decided to scrape the data from the available PDFs. I also came across some datasets on Kaggle, but they only ran until 2021 and some were not in CSV. I am also working on sector-wise data (now available in version 2) and state-wise data, which will also be available soon. I will try to update the data quarterly, and your support will go a long way to motivate me to keep updating. Cheers!

  3. AI-Driven Mental Health Literacy - An Interventional Study from India (Data...

    • psycharchives.org
    Updated Oct 2, 2023
    Cite
    (2023). AI-Driven Mental Health Literacy - An Interventional Study from India (Data from main study).csv [Dataset]. https://psycharchives.org/handle/20.500.12034/8771
    Dataset updated
    Oct 2, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Area covered
    India
    Description

    The dataset is from an Indian study that used ChatGPT, a natural language processing model by OpenAI, to design a mental health literacy intervention for college students. Prompt engineering tactics were used to formulate prompts that acted as anchors in the conversations with the AI agent regarding mental health. An intervention lasting 20 days was designed, with sessions of 15-20 minutes on alternate days. Fifty-one students completed pre-test and post-test measures of mental health literacy, mental help-seeking attitude, stigma, mental health self-efficacy, positive and negative experiences, and flourishing in the main study, which were then analyzed using paired t-tests. The results suggest that the intervention is effective among college students, as statistically significant changes were noted in mental health literacy and mental health self-efficacy scores. The study affirms the practicality, acceptance, and initial indications of AI-driven methods in advancing mental health literacy and suggests the promising prospects of innovative platforms such as ChatGPT within the field of applied positive psychology. Data used in analysis for the intervention study.
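    The description mentions paired t-tests on pre-test and post-test scores. As a rough illustration of that statistic only, here is a minimal Python sketch. The function name and the toy scores are invented for this example and are not taken from the study; the study's own analysis software and settings are not stated.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length samples."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    if len(pre) < 2:
        raise ValueError("need at least two pairs")
    # Work on the per-participant differences (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Toy scores, invented purely for illustration (not data from the study).
toy_pre = [1, 2, 3, 4]
toy_post = [2, 4, 5, 4]
t_value, dof = paired_t_statistic(toy_pre, toy_post)
print(f"t = {t_value:.3f}, df = {dof}")
```

    Turning t into a p-value needs the t distribution with the returned degrees of freedom, which a statistics package would normally supply.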

  4. Crop Yield Data India

    • kaggle.com
    Updated Jul 14, 2024
    Cite
    Shahid Hussain (2024). Crop Yield Data India [Dataset]. https://www.kaggle.com/datasets/saincoder404/crop-yield-data-india
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 14, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shahid Hussain
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    India
    Description

    This dataset contains detailed information on crop yields across various states in India for the year 1997. It includes data on different crops, their production, area under cultivation, season of cultivation, and state-specific information. Additionally, the dataset provides supplementary details such as annual rainfall, fertilizer use, pesticide use, and yield for each crop. This comprehensive dataset can be used for agricultural analysis, trend prediction, and studying the impact of various factors on crop yields in India.

  5. Residential School Locations Dataset (CSV Format)

    • borealisdata.ca
    • search.dataone.org
    Updated Jun 5, 2019
    Cite
    Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Borealis
    Authors
    Rosa Orlandini
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1863 - Jun 30, 1998
    Area covered
    Canada
    Description

    The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement (IRSSA) are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this dataset, the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.

  6. Data from: Distribution models predict climate-related range alteration or...

    • zenodo.org
    Updated Mar 28, 2024
    Cite
    A.P. Madhavan; Kshama Bhat; Srinivasan Kasinathan; Divya Mudappa; Navendu Page; T. R. Shankar Raman (2024). Data from: Distribution models predict climate-related range alteration or extinction of eleven threatened tropical rainforest trees in the Western Ghats [Dataset]. http://doi.org/10.5281/zenodo.10888938
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    A.P. Madhavan; Kshama Bhat; Srinivasan Kasinathan; Divya Mudappa; Navendu Page; T. R. Shankar Raman
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Mar 28, 2024
    Area covered
    Western Ghats
    Description

    This dataset contains species occurrence data and species distribution modeling (SDM) analysis for eleven threatened tree species. Occurrences are compiled from extensive field surveys in the Anamalai Hills along with data from the Global Biodiversity Information Facility (GBIF.org) and earlier work done within the southern Western Ghats, India.

    References:
    Page, N. V., & Shanker, K. (2020). Climatic stability drives latitudinal trends in range size and richness of woody plants in the Western Ghats, India. PLOS ONE, 15(7), e0235733. https://doi.org/10.1371/journal.pone.0235733

    GBIF.org (2022) GBIF Occurrence Download, 2 August 2022. DOI:10.15468/dl.gnvuxj


    AUTHOR #1
    1. Name: A.P. Madhavan
    2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
    3. Email address: madhavan@ncf-india.org
    4. ORCID: https://orcid.org/0009-0009-2754-8256

    AUTHOR #2
    1. Name: Kshama Bhat
    2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
    3. Email address: kshama@ncf-india.org
    4. ORCID: https://orcid.org/0000-0002-6190-2687

    AUTHOR #3
    1. Name: Srinivasan Kasinathan
    2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
    3. Email address: srini@ncf-india.org
    4. ORCID: https://orcid.org/0000-0001-7323-6653

    AUTHOR #4
    1. Name: Divya Mudappa
    2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
    3. Email address: divya@ncf-india.org
    4. ORCID: https://orcid.org/0000-0001-9708-4826

    AUTHOR #5
    1. Name: Navendu Page
    2. Work Address: Wildlife Institute of India, Post Box No. 18, Chandrabani, Dehradun, Uttarakhand 248001, India
    3. Email address: navendu.page@gmail.com
    4. ORCID: https://orcid.org/0000-0002-9413-7571

    AUTHOR #6
    1. Name: T. R. Shankar Raman
    2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
    3. Email address: trsr@ncf-india.org
    4. ORCID: https://orcid.org/0000-0002-1347-3953

    Keywords: tropical rainforest, climate change, tree distributions, species distribution models, range shifts, Western Ghats


    Geographic Coverage:
    1. Location/Study Area: Southern Western Ghats Montane Rain Forests, Southern Western Ghats Moist Deciduous Forests, India
    2. GPS coordinates: SWG (73.95° – 80.33° E, 8.06° – 13.11°N)

    Temporal coverage
    Starts: 2020-08-01
    Ends: 2024-03-28

    Besides this README.txt file, the dataset includes three comma-delimited text (csv) files, two R scripts, and one kml file of surveyed trails.

    CSV files with the data in columns as explained below:

    1) Focal_Tree_Dat.csv

    Comp: Number identifier
    FT_ID: Unique tree no for each individual
    Focal_tree: Scientific name of species
    Date: Date of occurrence observation
    Place: Area/locality description
    Trail: Unique trail ID
    Waypoint: Waypoint number
    Time: Time in hh:mm format
    Location: Specific description of occurrence locality
    Latitude: Latitude in decimal degrees N
    Longitude: Longitude in decimal degrees E
    Elevation: Elevation in metres
    Slope: Category of slope
    ID_Notes: Notes on identification
    Phenophase: Phenophase expression at the time of observation
    GBH: Girth at breast height in centimetres (comma separated list of numbers in case of multi-stemmed trees)
    Tree_ht: Tree height in metres
    Canopy_ht: Maximum height of the surrounding canopy in metres
    Substrate: Soil substrate composition
    Invasives: Name of invasive species (if present)
    Stature: Vegetation strata position
    Relatively: Stature of focal individual relative to other surrounding individuals
    Deadwood: Description of deadwood on the tree
    Damage: Description of damage on the bole
    Shape: Description of tree canopy shape
    Closure: Canopy closure at focal tree
    Seedlings: Number of conspecific seedlings present in 5 m radius of focal tree
    Saplings: Number of conspecific saplings present in 5 m radius of focal tree
    Trees: Number of conspecific trees present in 5 m radius of focal tree
    Remarks: Remarks
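    Because GBH is described as a comma-separated list for multi-stemmed trees, that column needs a small parsing step before numeric use. The following Python sketch shows one way to do it. The helper name `parse_gbh` and the sample rows are invented for illustration, and it assumes multi-value cells are quoted in the CSV so that the inner commas survive parsing; only the column names come from the description above.

```python
import csv
import io

def parse_gbh(raw):
    """Split a GBH cell into a list of girths in centimetres, one per stem."""
    if raw is None:
        return []
    parts = [p.strip() for p in str(raw).split(",")]
    # Skip empty fragments so blank cells yield an empty list.
    return [float(p) for p in parts if p]

# Invented sample rows; the real file has many more columns.
sample = io.StringIO(
    'FT_ID,Focal_tree,GBH\n'
    '1,Species a,120\n'
    '2,Species b,"85, 62.5, 40"\n'
)

# Count stems per tree from the parsed girth lists.
stems_by_id = {
    row["FT_ID"]: len(parse_gbh(row["GBH"]))
    for row in csv.DictReader(sample)
}
print(stems_by_id)
```

    If the real file uses a different convention for multi-stemmed trees (for example an unquoted separator), the splitting rule would need to change accordingly.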

    2) Ffspecies.csv

    Source: Source of occurrence
    ID: State/location of occurrence
    Region: Biogeographic region of occurrence
    decimalLatitude: Latitude in decimal degrees N
    decimalLongitude: Longitude in decimal degrees E
    species: Scientific name of species

    3) ft_surveys.csv

    Date: Date of survey of sample trail
    Prot_type: Category indicating whether protected area or fragment
    Place: Area/locality description
    Route_description: Specific landmark description of trail
    Trail: Unique trail ID
    Trail_distance: Tracked distance of trail in km
    Corrected_trail_distance: Corrected distance of trail in km
    Track_filename_kml: File name of gps track
    Sample_collected: Name of species if sample collected
    Observers: Name of observers
    Remarks: Remarks

    ANALYSES SCRIPTS
    flexsdm_script.R
    Script containing the analysis of all maxent distribution modeling and associated analysis

    Franklinia_density.Rmd
    Script of density and abundance related analysis

  7. 🦈 Shark Tank India dataset 🇮🇳

    • kaggle.com
    Updated Apr 20, 2025
    Cite
    Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Satya Thirumani
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Shark Tank India Data set.

    Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.

    All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

    Here is the data dictionary for the Shark Tank India seasons dataset.

    • Season Number - Season number
    • Startup Name - Company name or product name
    • Episode Number - Episode number within the season
    • Pitch Number - Overall pitch number
    • Season Start - Season first aired date
    • Season End - Season last aired date
    • Original Air Date - Episode original/first aired date, on OTT/TV
    • Episode Title - Episode title in SonyLiv
    • Anchor - Name of the episode presenter/host
    • Industry - Industry name or type
    • Business Description - Business Description
    • Company Website - Company Website URL
    • Started in - Year in which startup was started/incorporated
    • Number of Presenters - Number of presenters
    • Male Presenters - Number of male presenters
    • Female Presenters - Number of female presenters
    • Transgender Presenters - Number of transgender/LGBTQ presenters
    • Couple Presenters - Whether the presenters are wife and husband, 1-yes, 0-no
    • Pitchers Average Age - All pitchers average age, <30 young, 30-50 middle, >50 old
    • Pitchers City - Presenter's town/city or place where company head office exists
    • Pitchers State - Indian state pitcher hails from or state where company head office exists
    • Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue
    • Monthly Sales - Total monthly sales, in lakhs
    • Gross Margin - Gross margin/profit of company, in percentages
    • Net Margin - Net margin/profit of company, in percentages
    • EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization
    • Cash Burn - In loss in current year; burning/paying money from their pocket (yes/no)
    • SKUs - Stock Keeping Units or number of varieties, at the time of pitch
    • Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch
    • Bootstrapped - Startup is bootstrapped or not (yes/no)
    • Part of Match off - Competition between two similar brands, pitched at same time
    • Original Ask Amount - Original Ask Amount, in lakhs INR
    • Original Offered Equity - Original Offered Equity, in percentages
    • Valuation Requested - Valuation Requested, in lakhs INR
    • Received Offer - Received offer or not, 1-received, 0-not received
    • Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected
    • Total Deal Amount - Total Deal Amount, in lakhs INR
    • Total Deal Equity - Total Deal Equity, in percentages
    • Total Deal Debt - Total Deal debt/loan amount, in lakhs INR
    • Debt Interest - Debt interest rate, in percentages
    • Deal Valuation - Deal Valuation, in lakhs INR
    • Number of sharks in deal - Number of sharks involved in deal
    • Deal has conditions - Deal has conditions or not? (yes or no)
    • Royalty Percentage - Royalty percentage, if it's royalty deal
    • Royalty Recouped Amount - Royalty recouped amount, if it's royalty deal, in lakhs
    • Advisory Shares Equity - Deal with Advisory shares or equity, in percentages
    • Namita Investment Amount - Namita Investment Amount, in lakhs INR
    • Namita Investment Equity - Namita Investment Equity, in percentages
    • Namita Debt Amount - Namita Debt Amount, in lakhs INR
    • Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR
    • Vineeta Investment Equity - Vineeta Investment Equity, in percentages
    • Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR
    • Anupam Investment Amount - Anupam Investment Amount, in lakhs INR
    • Anupam Investment Equity - Anupam Investment Equity, in percentages
    • Anupam Debt Amount - Anupam Debt Amount, in lakhs INR
    • Aman Investment Amount - Aman Investment Amount, in lakhs INR
    • Aman Investment Equity - Aman Investment Equity, in percentages
    • Aman Debt Amount - Aman Debt Amount, in lakhs INR
    • Peyush Investment Amount - Peyush Investment Amount, in lakhs INR
    • Peyush Investment Equity - Peyush Investment Equity, in percentages
    • Peyush Debt Amount - Peyush Debt Amount, in lakhs INR
    • Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR
    • Ritesh Investment Equity - Ritesh Investment Equity, in percentages
    • Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR
    • Amit Investment Amount - Amit Investment Amount, in lakhs INR
    • Amit Investment Equity - Amit Investment Equity, in percentages
    • Amit Debt Amount - Amit Debt Amount, in lakhs INR
    • Guest Investment Amount - Guest Investment Amount, in lakhs INR
    • Guest Investment Equity - Guest Investment Equity, in percentages
    • Guest Debt Amount - Guest Debt Amount, in lakhs INR
    • Invested Guest Name - Name of the guest(s) who invested in deal
    • All Guest Names - Name of all guests, who are present in episode
    • Namita Present - Whether Namita present in episode or not
    • Vineeta Present - Whether Vineeta present in episode or not
    • Anupam ...
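
    The ask amount, offered equity, and valuation fields in the data dictionary above are typically linked by simple arithmetic: an amount offered for a given equity share implies a company valuation. The Python sketch below illustrates that relationship. It is an assumption that the dataset's valuation columns were derived this way, and the function names are invented for this example.

```python
LAKH = 100_000  # 1 lakh = 100,000 INR

def implied_valuation_lakhs(amount_lakhs, equity_percent):
    """Valuation (in lakhs INR) implied by an amount paid for a percentage of equity."""
    if not 0 < equity_percent <= 100:
        raise ValueError("equity_percent must be in (0, 100]")
    return amount_lakhs * 100.0 / equity_percent

def lakhs_to_inr(lakhs):
    """Convert an amount in lakhs to plain INR."""
    return lakhs * LAKH

# Illustrative numbers only: asking 50 lakhs for 5% equity.
valuation = implied_valuation_lakhs(50, 5)
print(valuation)                # valuation in lakhs INR
print(lakhs_to_inr(valuation))  # same valuation in INR
```

    Comparing the value implied by the ask fields with the value implied by the deal fields is one way to see how far a negotiated deal moved from the original request.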
  8. Data from: Assessing Temporal Dynamics of Nitrogen Surplus in Indian...

    • zenodo.org
    csv
    Updated Jul 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shekhar Sharan Goyal; Udit Bhatia; Rohini Kumar (2024). Assessing Temporal Dynamics of Nitrogen Surplus in Indian Agriculture: District-Scale Data from 1966 to 2017 [Dataset]. http://doi.org/10.5281/zenodo.12662782
    Explore at:
    csv (available download formats)
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shekhar Sharan Goyal; Udit Bhatia; Rohini Kumar
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset comprises annual long-term total agricultural cropland nitrogen (N) surplus across India, provided at a district-level spatial resolution for the period from 1966 to 2017. The dataset includes twelve N surplus estimates that incorporate uncertainties stemming from various input data sources and methodological choices in key components of the N surplus. This dataset allows for the aggregation of N surplus at any relevant spatial scale, thereby supporting the development of effective water and land management strategies.

    Data description:

    1. District-level N surplus data (CSV format): This dataset includes 12 columns of N surplus values (Kg N/ha), each representing 52 years (1966-2017) of district-level data. The N surplus is calculated using different methods and data choices. The dataset is available in the file 12_nitrogen_budjet_1966_2017

    2. Aggregated N surplus at the state level (CSV format): This dataset contains 52 years (1966-2017) of N surplus data (Kg N/ha) at the state level, including the mean and standard deviation of 12 different estimates. State_N_surplus_mean_std.csv

    3. District centroid locations: This dataset contains latitude and longitude coordinates for the centroid locations of districts in India where data is available, which can be used as identifiers. We have followed district name and classification using ICRISAT 1966 identifiers. district_centroids_lat_long.csv

    4. Aggregated N Surplus (Kg/ha) at Indian River Basins: This dataset encompasses 52 years (1966-2017) of mean nitrogen surplus (Kg N/ha) data at the basin level, classified according to the basin classification provided by the Central Water Commission of India. The dataset is available in the file basin_level_mean_N_surplus_kg_ha_1966_2017.csv

    Additionally, a CSV file named river_basin_IDs.csv is provided, which contains the river basin IDs along with their names as assigned by the Central Water Commission (CWC) of India.

    5. Aggregated N Surplus (Kg/ha) at Sub-Basin level: This dataset includes 52 years (1966-2017) of mean nitrogen surplus data (Kg N/ha) at the sub-basin level, based on the level 5 HydroSHEDS basin classification. The dataset is available in the file Sub_basin_mean_N_surplus_kg_ha_1966_2017.csv

    Unit: kg/ha/yr (ha = Net cropping area )

    Time period: 1966-2017
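
    The state-level file is described as holding the mean and standard deviation of the 12 estimates. A minimal Python sketch of that kind of summary for a single district-year follows. The function name and the toy values are invented for illustration, and the use of the sample (n - 1) standard deviation is an assumption, since the description does not say which form the authors used.

```python
from statistics import mean, stdev

def summarize_estimates(estimates):
    """Return (mean, sample standard deviation) across alternative N surplus estimates."""
    values = [float(v) for v in estimates]
    if len(values) < 2:
        raise ValueError("need at least two estimates")
    # stdev() uses the n - 1 denominator; pstdev() would give the population form.
    return mean(values), stdev(values)

# Toy values in kg N/ha, invented for illustration (not taken from the dataset).
toy_row = [10.0] * 6 + [14.0] * 6
avg, spread = summarize_estimates(toy_row)
print(f"mean = {avg:.2f}, std = {spread:.2f}")
```

    Applied row by row to the 12 estimate columns, the same function would give one mean and one spread per district and year.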

    Further information:

    Details/Citation: Assessing Temporal Dynamics of Nitrogen Surplus in Indian Agriculture: District-Scale Data from 1966 to 2017 by S.S. Goyal, U. Bhatia and R. Kumar.

    Further queries regarding these datasets can be directed to Shekhar Goyal (goyal_shekhar@iitgn.ac.in), Udit Bhatia (bhatia.u@iitgn.ac.in) and Rohini Kumar (rohini.kumar@ufz.de).

  9. Employment Of India CLeaned and Messy Data

    • kaggle.com
    Updated Apr 7, 2025
    Cite
    SONIA SHINDE (2025). Employment Of India CLeaned and Messy Data [Dataset]. https://www.kaggle.com/datasets/soniaaaaaaaa/employment-of-india-cleaned-and-messy-data
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SONIA SHINDE
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    India
    Description

    This dataset presents a dual-version representation of employment-related data from India, crafted to highlight the importance of data cleaning and transformation in any real-world data science or analytics project.

    🔹 Dataset Composition:

    It includes two parallel datasets:
    1. Messy Dataset (Raw) – Represents a typical unprocessed dataset often encountered in data collection from surveys, databases, or manual entries.
    2. Cleaned Dataset – This version demonstrates how proper data preprocessing can significantly enhance the quality and usability of data for analytical and visualization purposes.

    Each record captures multiple attributes related to individuals in the Indian job market, including:
    - Age Group
    - Employment Status (Employed/Unemployed)
    - Monthly Salary (INR)
    - Education Level
    - Industry Sector
    - Years of Experience
    - Location
    - Perceived AI Risk
    - Date of Data Recording

    Transformations & Cleaning Applied:

    The raw dataset underwent comprehensive transformations to convert it into its clean, analysis-ready form:
    - Missing Values: Identified and handled using either row elimination (where critical data was missing) or imputation techniques.
    - Duplicate Records: Identified using row comparison and removed to prevent analytical skew.
    - Inconsistent Formatting: Unified inconsistent naming in columns (like 'monthly_salary_(inr)' → 'Monthly Salary (INR)'), capitalization, and string spacing.
    - Incorrect Data Types: Converted columns like salary from string/object to float for numerical analysis.
    - Outliers: Detected and handled based on domain logic and distribution analysis.
    - Categorization: Converted numeric ages into grouped age categories for comparative analysis.
    - Standardization: Uniform labels for employment status, industry names, education, and AI risk levels were applied for visualization clarity.
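
    Three of the transformations listed above (column renaming, type conversion, and duplicate removal) can be sketched in plain Python as follows. This is an illustrative sketch rather than the author's actual pipeline; the helper names and the sample rows are invented, and only the column-name example 'monthly_salary_(inr)' comes from the description.

```python
def tidy_column_name(name):
    """Turn 'monthly_salary_(inr)' into 'Monthly Salary (INR)'."""
    words = []
    for word in name.strip().split("_"):
        if word.startswith("(") and word.endswith(")"):
            words.append(word.upper())       # unit suffix such as (inr)
        else:
            words.append(word.capitalize())
    return " ".join(words)

def parse_salary(raw):
    """Convert a salary string to float; return None when it cannot be parsed."""
    try:
        return float(str(raw).replace(",", "").strip())
    except ValueError:
        return None

def drop_duplicates(rows):
    """Remove exact duplicate rows while keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Invented messy rows for illustration.
messy = [
    {"monthly_salary_(inr)": " 45,000 ", "location": "Pune"},
    {"monthly_salary_(inr)": " 45,000 ", "location": "Pune"},
    {"monthly_salary_(inr)": "n/a", "location": "Delhi"},
]
clean = [
    {tidy_column_name(k): (parse_salary(v) if "salary" in k else v) for k, v in row.items()}
    for row in drop_duplicates(messy)
]
print(clean)
```

    Unparseable salaries become None here so that a later step can decide between dropping the row and imputing a value, matching the two options the description names for missing data.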

    Purpose & Utility:

    This dataset is ideal for learners and professionals who want to understand:
    - The impact of messy data on visualization and insights
    - How transformation steps can dramatically improve data interpretation
    - Practical examples of preprocessing techniques before feeding into ML models or BI tools

    It's also useful for:
    - Training ML models with clean inputs
    - Data storytelling with visual clarity
    - Demonstrating reproducibility in data cleaning pipelines

    By examining both the messy and clean datasets, users gain a deeper appreciation for why “garbage in, garbage out” rings true in the world of data science.

  10. Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats,...

    • zenodo.org
    • explore.openaire.eu
    bin, csv, jpeg, txt
    Updated Aug 20, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vijay Karthick; Vijay Kumar; Anand Osuri (2024). Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India [Dataset]. http://doi.org/10.5281/zenodo.13340613
    Explore at:
    csv, bin, jpeg, txt (available download formats)
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vijay Karthick; Vijay Kumar; Anand Osuri
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Sakleshpura, India, Western Ghats
    Description

    Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India

    This dataset contains mammal occurrence records from 2022 to 2024 in the Sakleshpura region of central Western Ghats, India. It includes a few occurrence records of other chordates. Occurrence records were gathered in the field by researchers of the Nature Conservation Foundation, India, using a mobile data collection application. Suggested citation is:
    Nature Conservation Foundation (2024). Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India. Nature Conservation Foundation, India. Dataset

    Keywords: tropical rainforest, plantations, Sakleshpura, animal distribution, Western Ghats

    CONTACT #1
    1. Name: Anand M Osuri
    2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
    3. Work Phone: +91 821 2515601
    4. Email address: aosuri@ncf-india.org
    5. ORCID: https://orcid.org/0000-0001-9909-5633

    CONTACT #2
    1. Name: Vijay Karthick
    2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
    3. Work Phone: +91 821 2515601
    4. Email address: vijayk@ncf-india.org
    5. ORCID: https://orcid.org/0000-0001-6023-3955

    CONTACT #3
    1. Name: Vijay Kumar
    2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
    3. Work Phone: +91 821 2515601
    4. Email address: vijaykumar@ncf-india.org
    5. ORCID: https://orcid.org/0009-0000-4149-0083


    Geographic Coverage:
    1. Location/Study Area: Sakleshpura, Karnataka, India
    2. GPS coordinates: Kadamane Village (12.924647, 75.654650)

    Temporal Coverage:
    1. Begins: 2022-05-16 (Year, Month, Day)
    2. Ends: 2024-05-22 (Year, Month, Day)

    Besides the 000_readMe.txt file containing this information and the 14 images associated with individual observations, the dataset includes three comma-delimited text (csv) files, two R code files, and one Excel file, as explained below:
    1) 001_mammalData.csv -- This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file

    2) 002_placeLocs.csv -- This file lists the names of places for which the GPS location was unavailable from the mobile phone application and which were manually assigned coordinates with 500 or 1000 m accuracy

    3) 003_nameMatch.csv -- This file matches the name as originally recorded with the correct common name and scientific name

    4) 004_GBIF_upload_code.R -- R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)

    5) 005_download_images_from_googledrive.R -- R code to extract image IDs and download images from Google Drive

    6) 006_kadamane_mammal_occurrence.xlsx -- An Excel file that contains the raw data used by the code above

    FILES INCLUDED IN DATASET

    001_mammalData.csv
    This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file

    observers: Observers who made the observation
    timestamp: Automatic time stamp of date and time when app was used
    date: Date of observation
    time: Time of observation
    decimalLatitude: Latitude in decimal degrees N
    decimalLongitude: Longitude in decimal degrees E
    GPSaltitude: Altitude in metres
    GPSaccuracy: Horizontal accuracy of GPS location in metres
    place: Name of locality
    habitat: Habitat type
    taxa: mammal or reptile/amphibian
    species: Species common name
    count: Number of individuals observed
    countType: Total (solitary or fully counted groups) or Partial (incompletely counted groups)
    obsType: Type of observation: sighting, sign (droppings or vocalisation), death, roadkill, electrocution, other
    notes: Notes or remarks on observation
    imageID: Link to the Google Drive photo, if a photo is available
    instanceID: Automatically generated unique identifier of observation

    002_placeLocs.csv
    This file lists the names of places for which the GPS location was unavailable from the mobile phone application and which were manually assigned coordinates with 500 or 1000 m accuracy

    place: Name of locality as recorded
    lat: Assigned latitude in decimal degrees N
    long: Assigned longitude in decimal degrees E
    GPSaccuracy: Assigned as 500 or 1000m – Horizontal accuracy of GPS location in metres

    003_nameMatch.csv
    This file matches the name as originally recorded with the correct common name and scientific name.

    verbatimIdentification: Identification as originally recorded in the ‘species’ column of the 001_mammalData.csv file
    vernacularName: Common or English name
    scientificName: Scientific name

    004_GBIF_upload_code.R
    R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)

    005_download_images_from_googledrive.R
    R code that extracts imageIDs from the 001_mammalData.csv file and downloads them automatically to a preferred directory

    006_kadamane_mammal_occurrence.xlsx
    An Excel file that contains the raw data used by the code above
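
    The three csv files described above are meant to be combined: missing coordinates in the main table come from 002_placeLocs.csv, and recorded names are mapped to standard names through 003_nameMatch.csv. The following is a minimal sketch of that join, not the logic of 004_GBIF_upload_code.R, which is not shown here. The column names follow the field lists above; the sample rows and the helper names `read_rows` and `build_occurrences` are invented for illustration.

```python
import csv
import io

# Tiny in-memory stand-ins for the three csv files. The rows are invented
# for illustration and are not taken from the dataset.
MAMMAL_CSV = """species,place,decimalLatitude,decimalLongitude,GPSaccuracy,count
gaur,Estate road,12.9250,75.6550,8,3
sambar,Hill top,,,,1
"""
PLACE_CSV = """place,lat,long,GPSaccuracy
Hill top,12.9300,75.6600,500
"""
NAME_CSV = """verbatimIdentification,vernacularName,scientificName
gaur,Gaur,Bos gaurus
sambar,Sambar,Rusa unicolor
"""


def read_rows(text):
    """Parse csv text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def build_occurrences(mammals, places, names):
    """Fill missing coordinates by place name and attach standard names."""
    place_by_name = {p["place"]: p for p in places}
    name_by_verbatim = {n["verbatimIdentification"]: n for n in names}
    out = []
    for row in mammals:
        rec = dict(row)
        # Rows without a GPS fix fall back to the manually assigned location.
        if not rec["decimalLatitude"] and rec["place"] in place_by_name:
            loc = place_by_name[rec["place"]]
            rec["decimalLatitude"] = loc["lat"]
            rec["decimalLongitude"] = loc["long"]
            rec["GPSaccuracy"] = loc["GPSaccuracy"]
        # Map the recorded name to common and scientific names when known.
        match = name_by_verbatim.get(rec["species"], {})
        rec["vernacularName"] = match.get("vernacularName", "")
        rec["scientificName"] = match.get("scientificName", "")
        out.append(rec)
    return out


occurrences = build_occurrences(
    read_rows(MAMMAL_CSV), read_rows(PLACE_CSV), read_rows(NAME_CSV)
)
```

    With the real files, `read_rows` would be replaced by opening each csv from disk; values stay as strings here, so numeric columns would still need converting before analysis.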

  11. Csv Pharmaceuticals India Private Limited | See Full Import/Export Data |...

    • eximpedia.app
    Updated Jan 19, 2024
    Cite
    Seair Exim (2024). Csv Pharmaceuticals India Private Limited | See Full Import/Export Data | Eximpedia [Dataset]. https://www.eximpedia.app/
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    Niue, Nauru, Colombia, Equatorial Guinea, Nigeria, Virgin Islands (U.S.), Andorra, Swaziland, Switzerland, Costa Rica
    Description

    Eximpedia export-import trade data lets you search trade data and active exporters, importers, buyers, suppliers, and manufacturers from over 209 countries.

  12. India zip code - Download Dataset

    • geopostcodes.com
    csv
    Updated Feb 2, 2025
    Cite
    GeoPostcodes (2025). India zip code - Download Dataset [Dataset]. https://www.geopostcodes.com/country/india-zip-code/
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 2, 2025
    Dataset authored and provided by
    GeoPostcodes
    Area covered
    India
    Description

    Our India zip code Database offers comprehensive postal code data for spatial analysis, including postal and administrative areas. This dataset contains accurate and up-to-date information on all administrative divisions, cities, and zip codes, making it an invaluable resource for various applications such as address capture and validation, map and visualization, reporting and business intelligence (BI), master data management, logistics and supply chain management, and sales and marketing. Our location data packages are available in various formats, including CSV, optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more. Product features include fully and accurately geocoded data, multi-language support with address names in local and foreign languages, comprehensive city definitions, and the option to combine map data with UNLOCODE and IATA codes, time zones, and daylight saving times. Companies choose our location databases for their enterprise-grade service, reduction in integration time and cost by 30%, and weekly updates to ensure the highest quality.

  13. Bollywood Movies data

    • data.mendeley.com
    Updated May 12, 2020
    + more versions
    Cite
    Prashant Premkumar (2020). Bollywood Movies data [Dataset]. http://doi.org/10.17632/3c57btcxy9.1
    Explore at:
    Dataset updated
    May 12, 2020
    Authors
    Prashant Premkumar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using a Python script to scrape data from the web, we collected data on all 1698 Hindi-language movies released in India over a 13-year period (2005-2017) from the website of Box Office India.

  14. CompanyData.com (BoldData) — Indian Largest B2B Company Database — 32.5+...

    • datarade.ai
    Updated Jul 31, 2025
    Cite
    CompanyData.com (BoldData) (2025). CompanyData.com (BoldData) — Indian Largest B2B Company Database — 32.5+ Million Verified Companies [Dataset]. https://datarade.ai/data-products/list-of-17-8m-companies-in-india-bolddata
    Explore at:
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Jul 31, 2025
    Dataset authored and provided by
    CompanyData.com (BoldData)
    Area covered
    India
    Description

    CompanyData.com, powered by BoldData, delivers high-quality, verified B2B company information from official trade registers around the world. Our India company database includes 32,468,995 verified business records, giving you powerful insight into one of the fastest-growing economies on the planet.

    Each company profile is rich with firmographic data, including company name, CIN (Corporate Identification Number), registration number, legal status, industry classification (NIC codes), revenue range, and employee size. Many records are enhanced with contact details such as email addresses, phone numbers, and names of key decision-makers, supporting direct outreach and smarter segmentation.

    Our India dataset is designed for a wide range of business applications — from KYC and AML compliance, due diligence, and regulatory checks, to B2B sales, lead generation, marketing campaigns, CRM enrichment, and AI model training. Whether you’re targeting local startups or large enterprises, our data helps you connect with the right businesses at the right time.

    Delivery is flexible to suit your needs. Choose from customized lists, full databases in Excel or CSV, access via our real-time API, or our intuitive self-service platform. We also offer data enrichment and cleansing services to refresh and improve your existing datasets with accurate, up-to-date company information from India.

    With access to 32,468,995 verified companies across more than 200 countries, CompanyData.com helps businesses grow confidently — in India and beyond. Rely on our precise, structured data to fuel your strategies and scale with speed and accuracy.

  15. ‘India Census 2011’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘India Census 2011’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-india-census-2011-9aa2/latest
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Analysis of ‘India Census 2011’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/danofer/india-census on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    2011 India census data. Includes population/demographic data and housing data for each district.

    Content

    • india-districts-census-2011.csv - Population enumeration data with expanded columns.
    • hlpca-total.csv: Housing statistics for total (rural + urban) population by district.
    • pca-colnames.csv: Mapping of PCA column names to expanded names.

    Data is raw counts per district, not normalized percentages!
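
    Because the files hold raw counts, districts of very different sizes are only comparable after dividing by a district total. The sketch below shows that normalization on invented numbers. The keys `population`, `literate`, and `female` and the helper `to_percentages` are hypothetical stand-ins, not the column names of india-districts-census-2011.csv.

```python
def to_percentages(row, total_key, count_keys):
    """Convert raw counts in one district row to percentages of a total."""
    total = float(row[total_key])
    return {key: 100.0 * float(row[key]) / total for key in count_keys}


# Invented example rows; real rows would come from the census csv file.
districts = [
    {"district": "A", "population": 1000, "literate": 700, "female": 480},
    {"district": "B", "population": 250000, "literate": 150000, "female": 125000},
]

shares = {
    d["district"]: to_percentages(d, "population", ["literate", "female"])
    for d in districts
}
```

    District B has far more literate residents than district A in absolute terms, but a lower literacy share (60% against 70%), which is the kind of difference raw counts hide.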

    Acknowledgements

    Gathered from 2 sources: https://github.com/pigshell/india-census-2011 https://github.com/nishusharma1608/India-Census-2011-Analysis

    Original census data released (and owned by) the Registrar General and Census Commissioner of India under the Ministry of Home Affairs, Government of India.

    Inspiration

    • Where are differences between urban and rural areas greatest? Between castes? Between men and women?
    • Use data with other datasets in India!

    --- Original source retains full ownership of the source dataset ---

  16. Crimes In India

    • kaggle.com
    Updated Jul 16, 2022
    Cite
    Saurabh Shahane (2022). Crimes In India [Dataset]. https://www.kaggle.com/datasets/saurabhshahane/crimes-in-india
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 16, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saurabh Shahane
    Area covered
    India
    Description

    This dataset contains the number of crimes filed under each category of the Indian Penal Code (IPC), the number of victims of those crimes, and the average crime rate. The data is presented separately by IPC category and sub-category. Data are available at the state/UT level for 2018.

    ● 7060_source_data.csv: The raw data from the source with original administrative dimensions. This dataset may have already been restructured by scraping PDFs, combining files, or pivoting tables to fit the proper tabular format used by NDAP, but the actual data values remain unchanged.
    ● NDAP_REPORT_7060.csv: The final standardised data using LGD geographic dimensions as seen on NDAP.
    ● 7060_metadata.csv: Variable-level metadata, including the following fields:
        ❖ VariableName: The full variable name as it appears in the data
        ❖ VariableCode: A unique variable code that is used as a short name for the variable during internal processing and can be used for simplicity if desired
        ❖ Type_Of_Variable: The classification of the column, whether it is a dimension or a variable (i.e. indicator)
        ❖ Unit_Of_Measure:
        ❖ Aggregation_Type: The default aggregation function to be used when aggregating each variable
        ❖ Weighing_Variable_Name: The weight assigned to each variable that is used by default when aggregating
        ❖ Weighing_Variable_ID: The weighing variable ID corresponding to the weighing variable name
        ❖ Long_Description: A more descriptive definition of the variable
        ❖ Scaling_factor: Scaling factor from source
    ● 7060_KEYS.csv: The key which maps source administrative units to the standardised Local Government Directory (LGD) dimensions. This file also contains pre-calculated weights for every constituent unit mapped from the source dimensions into the LGD. You can interpret each row as describing what fraction of the source unit is mapped to a corresponding LGD unit. This file includes the following fields:
        ❖ src[Unit]Name: The administrative unit name as it appears in the source data. Depending on the dataset, that may include State, District, Subdistrict, Block, Village/Town, etc.
        ❖ [Unit]Name: The standardised administrative unit name as it appears in the LGD. Depending on the dataset, that may include State, District, Subdistrict, Block, Village/Town, etc.
        ❖ [Unit]Code: The standardised administrative unit code corresponding to the unit name in the LGD.
        ❖ Year: The year in which the data was collected or reported. Depending on the dataset, any other temporal variables may also be present (Quarter, Month, Calendar Day, etc.)
        ❖ Number_Of_Children: The number of LGD units associated with the mapping described by an individual row. Units from the source that have undergone a split will contain multiple children.
        ❖ Number_Of_Parents: The number of source units associated with the mapping described by an individual row. Units from the source that have undergone a merge will contain multiple parents.
        ❖ Weighing_Variables: Households, Population, Male Population, Female Population, Land Area (Total, Rural, and Urban versions of each). For each weighing variable there are the following associated fields:
            ■ Count: the total count of households, population, or land area mapped from the source unit to the LGD unit for that particular row (NumberOfHouseholds, TotalPopulation, LandArea).
            ■ Mapping_Error: the percentage error due to missing villages in the base data, meaning what fraction of the weighing variable is dropped because the microdata could not be mapped to the LGD.
            ■ Weighing_Ratio: the weighing ratio for that constituent match of source unit to LGD unit for each particular row. This is the fraction applied to the source data to achieve the LGD-standardised final data.
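
    The KEYS file description says each row gives the fraction of a source unit that maps to an LGD unit, and that this Weighing_Ratio is applied to the source data to produce the standardised values. A minimal sketch of that step for a summed count variable follows. The district names, values, and the function `standardise` are invented for illustration, and `srcDistrictName` / `DistrictName` are assumed instances of the `src[Unit]Name` / `[Unit]Name` pattern; the actual column layout of 7060_KEYS.csv may differ.

```python
from collections import defaultdict

# Hypothetical source-level counts (e.g. number of crimes per source district).
source_values = {"Old District X": 120.0, "Old District Y": 80.0}

# Hypothetical KEYS rows: X was split between P and Q, Y maps wholly to Q.
keys_rows = [
    {"srcDistrictName": "Old District X", "DistrictName": "New District P", "Weighing_Ratio": 0.75},
    {"srcDistrictName": "Old District X", "DistrictName": "New District Q", "Weighing_Ratio": 0.25},
    {"srcDistrictName": "Old District Y", "DistrictName": "New District Q", "Weighing_Ratio": 1.0},
]


def standardise(values, rows):
    """Redistribute source-unit counts to LGD units using the weighing ratio."""
    out = defaultdict(float)
    for row in rows:
        share = values[row["srcDistrictName"]] * row["Weighing_Ratio"]
        out[row["DistrictName"]] += share
    return dict(out)


lgd_values = standardise(source_values, keys_rows)
```

    When the ratios for each source unit sum to 1, the grand total is preserved (200 before and after in this example). Variables aggregated by something other than a sum, such as rates, would need a weighted average instead, which this sketch does not cover.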

  17. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from that website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv
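
    Since the file stores all negative rows first and all positive rows last, any split taken from the top or bottom would contain a single class. The following is a minimal sketch of a reproducible shuffle using only the Python standard library. The rows are invented placeholders for the (reviews, labels) pairs in data_rt.csv, and `shuffled_copy` is a hypothetical helper name.

```python
import random


def shuffled_copy(rows, seed=0):
    """Return a shuffled copy of rows; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    out = list(rows)  # copy so the caller's list keeps its original order
    rng.shuffle(out)
    return out


# Placeholder data in the same layout as the file: negatives (0) then positives (1).
rows = [("bad movie %d" % i, 0) for i in range(5)] + [
    ("good movie %d" % i, 1) for i in range(5)
]

shuffled = shuffled_copy(rows)
```

    Shuffling once with a fixed seed before splitting into training and test sets keeps both classes in each split while leaving the experiment repeatable.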

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data of Gen3EcoDot (Alexa) scraped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review, raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  18. Cardiovascular_Disease_Dataset

    • data.mendeley.com
    Updated Apr 16, 2021
    Cite
    Bhanu Prakash Doppala (2021). Cardiovascular_Disease_Dataset [Dataset]. http://doi.org/10.17632/dzz48mvjht.1
    Explore at:
    Dataset updated
    Apr 16, 2021
    Authors
    Bhanu Prakash Doppala
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This heart disease dataset was acquired from one of the multispecialty hospitals in India. It includes over 14 common features, which makes it one of the heart disease datasets available so far for research purposes. This dataset consists of 1000 subjects with 12 features. It will be useful for building early-stage heart disease detection as well as for generating predictive machine learning models.

  19. India - Wind Speed and Wind Power Potential Maps

    • energydata.info
    • cloud.csiss.gmu.edu
    • +1 more
    Updated Jun 8, 2020
    + more versions
    Cite
    (2020). India - Wind Speed and Wind Power Potential Maps [Dataset]. https://energydata.info/dataset/india-wind-speed-and-wind-power-potential-maps
    Explore at:
    Dataset updated
    Jun 8, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Maps with wind speed, wind rose and wind power density potential in India. The GIS data stems from the Global Wind Atlas (http://globalwindatlas.info/). GIS data is available as JSON and CSV. The second link provides poster size (.pdf) and midsize maps (.png).

  20. Residential Schools Locations Dataset (Shapefile format)

    • borealisdata.ca
    • dataone.org
    Updated Jun 5, 2019
    Cite
    Rosa Orlandini (2019). Residential Schools Locations Dataset (Shapefile format) [Dataset]. http://doi.org/10.5683/SP2/FJG5TG
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Borealis
    Authors
    Rosa Orlandini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1863 - Jun 30, 1998
    Area covered
    Canada
    Description

    The Residential Schools Locations Dataset in shapefile format contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement (IRSSA) are included in this data set, as well as several Industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement.

    The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The data set was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini.

    Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this data set, the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided.

    For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites. The geographic coordinate system for this dataset is WGS 1984. The data in shapefile format [IRS_locations.zip] can be viewed and mapped in Geographic Information System software. Detailed metadata in xml format is available as part of the data in shapefile format. In addition, the field name descriptions (IRS_locfields.csv) and the detailed locations descriptions (IRS_locdescription.csv) should be used alongside the data in shapefile format.

