https://crawlfeeds.com/privacy_policy
Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.
Product Details: Name, Brand, Category, and Unique ID
Pricing Information: Current Price, Discounted Price, and Currency
Availability & Ratings: Stock Status, Customer Ratings, and Reviews
Seller Information: Seller Name and Fulfillment Details
Additional Attributes: Product Description, Specifications, and Images
Format: CSV
Number of Records: 50,000+
Delivery Time: 3 Days
Price: $149.00
Availability: Immediate
This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset was created from data released by the Department for Promotion of Industry and Internal Trade (DPIIT), India. The data on the department's website is published in PDF format; I scraped it and converted it to CSV. The main motivation behind creating the dataset was that I couldn't find up-to-date month-wise FDI data for India. I checked the government websites, but those that do have data offer yearly figures only, and tracking quarterly performance requires month-wise data, so I decided to scrape the data from the available PDFs. I also came across some datasets on Kaggle, but they only ran up to 2021 and some were not in CSV format. I am also working on sector-wise data (now available in version 2) and state-wise data, which will also be available soon. I will try to update the data quarterly, and your support will go a long way in motivating me to keep updating. Cheers!
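The PDF-to-CSV conversion described above could be sketched roughly as follows. This is an illustrative example only: pdfplumber is one possible extraction library (the author does not name their tooling), and the file names and column labels are assumptions.

```python
import pandas as pd

def rows_to_frame(rows):
    """Convert extracted table rows (first row = header) into a DataFrame."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)

def pdf_tables_to_csv(pdf_path, csv_path):
    """Extract one table per page from a PDF and write the combined CSV."""
    import pdfplumber  # third-party: pip install pdfplumber
    rows = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            table = page.extract_table()
            if table:
                # keep the header row only once, from the first page
                rows.extend(table if not rows else table[1:])
    rows_to_frame(rows).to_csv(csv_path, index=False)

# hypothetical usage: pdf_tables_to_csv("fdi_monthwise.pdf", "fdi_monthwise.csv")
```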
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset is from an Indian study that used ChatGPT, a natural language processing model by OpenAI, to design a mental health literacy intervention for college students. Prompt engineering tactics were used to formulate prompts that acted as anchors in the conversations with the AI agent regarding mental health. The intervention lasted 20 days, with sessions of 15-20 minutes on alternate days. In the main study, fifty-one students completed pre-test and post-test measures of mental health literacy, mental help-seeking attitude, stigma, mental health self-efficacy, positive and negative experiences, and flourishing, which were then analyzed using paired t-tests. The results suggest that the intervention is effective among college students, as statistically significant changes were noted in mental health literacy and mental health self-efficacy scores. The study affirms the practicality, acceptance, and initial promise of AI-driven methods in advancing mental health literacy, and suggests promising prospects for innovative platforms such as ChatGPT within the field of applied positive psychology. Data used in the analysis for the intervention study.
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains detailed information on crop yields across various states in India for the year 1997. It includes data on different crops, their production, area under cultivation, season of cultivation, and state-specific information. Additionally, the dataset provides supplementary details such as annual rainfall, fertilizer use, pesticide use, and yield for each crop. This comprehensive dataset can be used for agricultural analysis, trend prediction, and studying the impact of various factors on crop yields in India.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of residential schools and student hostels operated by the federal government in Canada. All the residential schools and hostels listed in the Indian Residential Schools Settlement Agreement (IRSSA) are included in this dataset, as well as several industrial schools and residential schools that were not part of the IRSSA. This version of the dataset does not include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada, produced for the Tk'emlups First Nation and the Justice for Day Scholars Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed to RSIM, Morgan Hite, the NCTR, or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If a school/hostel moved from its original location to another property, the school is considered to have two unique locations in this dataset: the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake.
If a new school building was constructed on the same property as the original school building, it is not considered a new location, as in the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided; when the precise location of the building is not known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative names, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school), and a list of references used to determine the location of the main buildings or sites.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains species occurrence data and a species distribution modeling (SDM) analysis of eleven threatened tree species. Occurrences are compiled from extensive field surveys in the Anamalai Hills, along with data from the Global Biodiversity Information Facility (GBIF.org) and earlier work done within the southern Western Ghats, India.
References:
Page, N. V., & Shanker, K. (2020). Climatic stability drives latitudinal trends in range size and richness of woody plants in the Western Ghats, India. PLOS ONE, 15(7), e0235733. https://doi.org/10.1371/journal.pone.0235733
GBIF.org (2022) GBIF Occurrence Download, 2 August 2022. DOI:10.15468/dl.gnvuxj
AUTHOR #1
1. Name: A.P. Madhavan
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Email address: madhavan@ncf-india.org
4. ORCID: https://orcid.org/0009-0009-2754-8256
AUTHOR #2
1. Name: Kshama Bhat
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Email address: kshama@ncf-india.org
4. ORCID: https://orcid.org/0000-0002-6190-2687
AUTHOR #3
1. Name: Srinivasan Kasinathan
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Email address: srini@ncf-india.org
4. ORCID: https://orcid.org/0000-0001-7323-6653
AUTHOR #4
1. Name: Divya Mudappa
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Email address: divya@ncf-india.org
4. ORCID: https://orcid.org/0000-0001-9708-4826
AUTHOR #5
1. Name: Navendu Page
2. Work Address: Wildlife Institute of India, Post Box No. 18, Chandrabani, Dehradun, Uttarakhand 248001, India
3. Email address: navendu.page@gmail.com
4. ORCID: https://orcid.org/0000-0002-9413-7571
AUTHOR #6
1. Name: T. R. Shankar Raman
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Email address: trsr@ncf-india.org
4. ORCID: https://orcid.org/0000-0002-1347-3953
Keywords: tropical rainforest, climate change, tree distributions, species distribution models, range shifts, Western Ghats
Geographic Coverage:
1. Location/Study Area: Southern Western Ghats Montane Rain Forests, Southern Western Ghats Moist Deciduous Forests, India
2. GPS coordinates: SWG (73.95° – 80.33° E, 8.06° – 13.11°N)
Temporal coverage
Starts: 2020-08-01
Ends: 2024-03-28
Besides this README.txt file, the dataset includes three comma-delimited text (CSV) files, two R scripts, and one KML file of surveyed trails.
CSV files with the data in columns as explained below:
1) Focal_Tree_Dat.csv
Comp: Number identifier
FT_ID: Unique tree no for each individual
Focal_tree: Scientific name of species
Date: Date of occurrence observation
Place: Area/locality description
Trail: Unique trail ID
Waypoint: Waypoint number
Time: Time in hh:mm format
Location: Specific description of occurrence locality
Latitude: Latitude in decimal degrees N
Longitude: Longitude in decimal degrees E
Elevation: Elevation in metres
Slope: Category of slope
ID_Notes: Notes on identification
Phenophase: Phenophase expression at the time of observation
GBH: Girth at breast height in centimetres (comma separated list of numbers in case of multi-stemmed trees)
Tree_ht: Tree height in metres
Canopy_ht: Maximum height of the surrounding canopy in metres
Substrate: Soil substrate composition
Invasives: Name of invasive species (if present)
Stature: Vegetation strata position
Relatively: Stature of focal individual relative to other surrounding individuals
Deadwood: Description of deadwood on the tree
Damage: Description of damage on the bole
Shape: Description of tree canopy shape
Closure: Canopy closure at focal tree
Seedlings: Number of conspecific seedlings present in 5 m radius of focal tree
Saplings: Number of conspecific saplings present in 5 m radius of focal tree
Trees: Number of conspecific trees present in 5 m radius of focal tree
Remarks: Remarks
2) Ffspecies.csv
Source: Source of occurrence
ID: State/location of occurrence
Region: Biogeographic region of occurrence
decimalLatitude: Latitude in decimal degrees N
decimalLongitude: Longitude in decimal degrees E
species: Scientific name of species
3) ft_surveys.csv
Date: Date of survey of sample trail
Prot_type: Category indicating whether protected area or fragment
Place: Area/locality description
Route_description: Specific landmark description of trail
Trail: Unique trail ID
Trail_distance: Tracked distance of trail in km
Corrected_trail_distance: Corrected distance of trail in km
Track_filename_kml: File name of gps track
Sample_collected: Name of species if sample collected
Observers: Name of observers
Remarks: Remarks
ANALYSES SCRIPTS
flexsdm_script.R
Script containing all MaxEnt distribution modeling and associated analyses
Franklinia_density.Rmd
Script of density and abundance related analysis
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.
All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.
Here is the data dictionary for the (Indian) Shark Tank seasons dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises annual long-term total agricultural cropland nitrogen (N) surplus across India, provided at a district-level spatial resolution for the period from 1966 to 2017. The dataset includes twelve N surplus estimates that incorporate uncertainties stemming from various input data sources and methodological choices in key components of the N surplus. This dataset allows for the aggregation of N surplus at any relevant spatial scale, thereby supporting the development of effective water and land management strategies.
Data description:
1. District-level N surplus data (CSV format): This dataset includes 12 columns of N surplus values (Kg N/ha), each representing 52 years (1966-2017) of district-level data. The N surplus is calculated using different methods and data choices. 12_nitrogen_budjet_1966_2017
2. Aggregated N surplus at the state level (CSV format): This dataset contains 52 years (1966-2017) of N surplus data (Kg N/ha) at the state level, including the mean and standard deviation of 12 different estimates. State_N_surplus_mean_std.csv
3. District centroid locations: This dataset contains latitude and longitude coordinates for the centroid locations of districts in India where data is available, which can be used as identifiers. We have followed district name and classification using ICRISAT 1966 identifiers. district_centroids_lat_long.csv
4. Aggregated N Surplus (Kg/ha) at Indian River Basins: This dataset encompasses 52 years (1966-2017) of mean nitrogen surplus (Kg N/ha) data at the basin level, classified according to the basin classification provided by the Central Water Commission of India. The dataset is available in the file basin_level_mean_N_surplus_kg_ha_1966_2017.csv
Additionally, a CSV file named river_basin_IDs.csv
is provided, which contains the river basin IDs along with their names as assigned by the Central Water Commission (CWC) of India.
5. Aggregated N Surplus (Kg/ha) at Sub-Basin level: This dataset includes 52 years (1966-2017) of mean nitrogen surplus data (Kg N/ha) at the sub-basin level, based on the level 5 HydroSHEDS basin classification. The dataset is available in the file Sub_basin_mean_N_surplus_kg_ha_1966_2017.csv
Unit: kg/ha/yr (ha = net cropping area)
Time period: 1966-2017
Further information:
Details/Citation: Assessing Temporal Dynamics of Nitrogen Surplus in Indian Agriculture: District-Scale Data from 1966 to 2017 by S.S. Goyal, U. Bhatia and R. Kumar.
Further queries regarding these datasets can be directed to Shekhar Goyal (goyal_shekhar@iitgn.ac.in), Udit Bhatia (bhatia.u@iitgn.ac.in) and Rohini Kumar (rohini.kumar@ufz.de).
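The state-level aggregation described in item 2 (mean and standard deviation across the 12 estimates) could be sketched as below. This is an assumption-laden illustration: the column names (`state`, `year`, and the estimate columns) are placeholders, not the actual schema of the files.

```python
import pandas as pd

def state_mean_std(df, estimate_cols):
    """Average each N surplus estimate over the districts of a state-year,
    then take the mean and standard deviation across the estimates."""
    by_state = df.groupby(["state", "year"], as_index=False)[estimate_cols].mean()
    by_state["mean_kg_ha"] = by_state[estimate_cols].mean(axis=1)
    by_state["std_kg_ha"] = by_state[estimate_cols].std(axis=1)
    return by_state[["state", "year", "mean_kg_ha", "std_kg_ha"]]
```

Because the district-level file carries all 12 estimates as columns, the same pattern also supports aggregation to basins or sub-basins by swapping the grouping keys.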
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset presents a dual-version representation of employment-related data from India, crafted to highlight the importance of data cleaning and transformation in any real-world data science or analytics project.
It includes two parallel datasets: 1. Messy Dataset (Raw) – Represents a typical unprocessed dataset often encountered in data collection from surveys, databases, or manual entries. 2. Cleaned Dataset – This version demonstrates how proper data preprocessing can significantly enhance the quality and usability of data for analytical and visualization purposes.
Each record captures multiple attributes related to individuals in the Indian job market, including:
- Age Group
- Employment Status (Employed/Unemployed)
- Monthly Salary (INR)
- Education Level
- Industry Sector
- Years of Experience
- Location
- Perceived AI Risk
- Date of Data Recording
The raw dataset underwent comprehensive transformations to convert it into its clean, analysis-ready form:
- Missing Values: Identified and handled using either row elimination (where critical data was missing) or imputation techniques.
- Duplicate Records: Identified using row comparison and removed to prevent analytical skew.
- Inconsistent Formatting: Unified inconsistent column naming (like 'monthly_salary_(inr)' → 'Monthly Salary (INR)'), capitalization, and string spacing.
- Incorrect Data Types: Converted columns like salary from string/object to float for numerical analysis.
- Outliers: Detected and handled based on domain logic and distribution analysis.
- Categorization: Converted numeric ages into grouped age categories for comparative analysis.
- Standardization: Applied uniform labels for employment status, industry names, education, and AI risk levels for visualization clarity.
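A minimal pandas sketch of several of the cleaning steps described above. The raw column names (`monthly_salary_(inr)`, `age`) and the age bins are assumptions for illustration; they are not the dataset's documented schema.

```python
import pandas as pd

def clean(df):
    """Apply a subset of the described transformations to a raw frame."""
    df = df.copy()
    # Inconsistent formatting: unify column naming
    df = df.rename(columns={"monthly_salary_(inr)": "Monthly Salary (INR)"})
    # Duplicate records: drop exact duplicates to prevent analytical skew
    df = df.drop_duplicates()
    # Incorrect data types: salary from string/object to float
    df["Monthly Salary (INR)"] = pd.to_numeric(
        df["Monthly Salary (INR)"], errors="coerce")
    # Missing values: drop rows where the critical salary field is missing
    df = df.dropna(subset=["Monthly Salary (INR)"])
    # Categorization: numeric ages into grouped age categories
    df["Age Group"] = pd.cut(df["age"], bins=[17, 25, 35, 50, 65],
                             labels=["18-25", "26-35", "36-50", "51-65"])
    return df
```

Outlier handling and label standardization would follow the same pattern, driven by whatever domain rules the dataset's author applied.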
This dataset is ideal for learners and professionals who want to understand:
- The impact of messy data on visualization and insights
- How transformation steps can dramatically improve data interpretation
- Practical examples of preprocessing techniques before feeding into ML models or BI tools
It's also useful for:
- Training ML models with clean inputs
- Data storytelling with visual clarity
- Demonstrating reproducibility in data cleaning pipelines
By examining both the messy and clean datasets, users gain a deeper appreciation for why “garbage in, garbage out” rings true in the world of data science.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India
This dataset contains mammal occurrence records from 2022 to 2024 in the Sakleshpura region of central Western Ghats, India. It includes a few occurrence records of other chordates. Occurrence records were gathered in the field by researchers of the Nature Conservation Foundation, India, using a mobile data collection application. Suggested citation is:
Nature Conservation Foundation (2024). Mammal occurrence records (2022-24) from Sakleshpura, central Western Ghats, India. Nature Conservation Foundation, India. Dataset
Keywords: tropical rainforest, plantations, Sakleshpura, animal distribution, Western Ghats
CONTACT #1
1. Name: Anand M Osuri
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: aosuri@ncf-india.org
5. ORCID: https://orcid.org/0000-0001-9909-5633
CONTACT #2
1. Name: Vijay Karthick
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: vijayk@ncf-india.org
5. ORCID: https://orcid.org/0000-0001-6023-3955
CONTACT #3
1. Name: Vijay Kumar
2. Work Address: Nature Conservation Foundation, 1311, 12th A Main, Vijayanagar 1st Stage, Mysuru 570017, Karnataka, India
3. Work Phone: +91 821 2515601
4. Email address: vijaykumar@ncf-india.org
5. ORCID: https://orcid.org/0009-0000-4149-0083
Geographic Coverage:
1. Location/Study Area: Sakleshpura, Karnataka, India
2. GPS coordinates: Kadamane Village (12.924647, 75.654650)
Temporal Coverage:
1. Begins: 2022-05-16 (Year, Month, Day)
2. Ends: 2024-05-22 (Year, Month, Day)
Besides the 000_readMe.txt file containing this information and the 14 images associated with individual observations, the dataset includes three comma-delimited text (csv) files, and one R code file as explained below:
1) 001_mammalData.csv -- This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file
2) 002_placeLocs.csv -- This file lists named places for which the GPS location was unavailable from the mobile phone application and was manually assigned coordinates with 500 or 1000 m accuracy
3) 003_nameMatch.csv -- This file matches the name as originally recorded with the correct common name and scientific name
4) 004_GBIF_upload_code.R -- R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)
5) 005_download_images_from_googledrive.R - R code to extract image IDs and download images from googledrive
6) 006_kadamane_mammal_occurrence.xlsx -- An Excel file containing the raw data used by the scripts above
FILES INCLUDED IN DATASET
001_mammaldata.csv
This file has the main mammal occurrence data with relevant and renamed columns derived from the original downloaded Excel worksheet file
observers: Observers who made the observation
timestamp: Automatic time stamp of date and time when app was used
date: Date of observation
time: Time of observation
decimalLatitude: Latitude in decimal degrees N
decimalLongitude: Longitude in decimal degrees E
GPSaltitude: Altitude in metres
GPSaccuracy: Horizontal accuracy of GPS location in metres
place: Name of locality
habitat: Habitat type
taxa: mammal or reptile/amphibian
species: Species common name
count: Number of individuals observed
countType: Total (solitary or fully counted groups) or Partial (incompletely counted groups)
obsType: Type of observation: sighting, sign (droppings or vocalisation), death, roadkill, electrocution, other
notes: Notes or remarks on observation
imageID: Link to the google drive photo, if photo is available
instanceID: Automatically generated unique identifier of observation
002_placeLocs.csv
This file lists named places for which the GPS location was unavailable from the mobile phone application and was manually assigned coordinates with 500 or 1000 m accuracy
place: Name of locality as recorded
lat: Assigned latitude in decimal degrees N
long: Assigned longitude in decimal degrees E
GPSaccuracy: Assigned as 500 or 1000m – Horizontal accuracy of GPS location in metres
003_nameMatch.csv
This file matches the name as originally recorded with the correct common name and scientific name.
verbatimIdentification: Identification as originally recorded in the ‘species’ column of the mammaldata.csv file
vernacularName: Common or English name
scientificName: Scientific name
004_GBIF_upload_code.R
R code for processing the files to create a file for upload as an occurrence dataset on the Global Biodiversity Information Facility (GBIF.org)
005_download_images_from_googledrive.R
R code that extracts imageIDs from the 001_mammalData.csv file and downloads them automatically to a preferred directory
006_kadamane_mammal_occurrence.xlsx
An Excel file containing the raw data used by the scripts above
Eximpedia export-import trade data lets you search trade data and find active exporters, importers, buyers, suppliers, and manufacturers from over 209 countries.
Our India zip code Database offers comprehensive postal code data for spatial analysis, including postal and administrative areas. This dataset contains accurate and up-to-date information on all administrative divisions, cities, and zip codes, making it an invaluable resource for various applications such as address capture and validation, map and visualization, reporting and business intelligence (BI), master data management, logistics and supply chain management, and sales and marketing. Our location data packages are available in various formats, including CSV, optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more. Product features include fully and accurately geocoded data, multi-language support with address names in local and foreign languages, comprehensive city definitions, and the option to combine map data with UNLOCODE and IATA codes, time zones, and daylight saving times. Companies choose our location databases for their enterprise-grade service, reduction in integration time and cost by 30%, and weekly updates to ensure the highest quality.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using a Python script to scrape data from the web, we collected data on all 1,698 Hindi-language movies released in India across a 13-year period (2005-2017) from the website of Box Office India.
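A scraper of the kind described might look like the sketch below. This is purely illustrative: the URL pattern and the HTML table structure of Box Office India are assumptions, not the authors' actual script.

```python
import requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def parse_movie_table(html):
    """Extract (title, gross) pairs from rows of an HTML table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            rows.append((cells[0], cells[1]))
    return rows

def scrape_year(year):
    # hypothetical URL structure for illustration only
    resp = requests.get(f"https://www.boxofficeindia.com/year/{year}")
    resp.raise_for_status()
    return parse_movie_table(resp.text)
```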
CompanyData.com, powered by BoldData, delivers high-quality, verified B2B company information from official trade registers around the world. Our India company database includes 32,468,995 verified business records, giving you powerful insight into one of the fastest-growing economies on the planet.
Each company profile is rich with firmographic data, including company name, CIN (Corporate Identification Number), registration number, legal status, industry classification (NIC codes), revenue range, and employee size. Many records are enhanced with contact details such as email addresses, phone numbers, and names of key decision-makers, supporting direct outreach and smarter segmentation.
Our India dataset is designed for a wide range of business applications — from KYC and AML compliance, due diligence, and regulatory checks, to B2B sales, lead generation, marketing campaigns, CRM enrichment, and AI model training. Whether you’re targeting local startups or large enterprises, our data helps you connect with the right businesses at the right time.
Delivery is flexible to suit your needs. Choose from customized lists, full databases in Excel or CSV, access via our real-time API, or our intuitive self-service platform. We also offer data enrichment and cleansing services to refresh and improve your existing datasets with accurate, up-to-date company information from India.
With access to 32,468,995 verified companies across more than 200 countries, CompanyData.com helps businesses grow confidently — in India and beyond. Rely on our precise, structured data to fuel your strategies and scale with speed and accuracy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘India Census 2011’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/danofer/india-census on 13 February 2022.
--- Dataset description provided by original source is as follows ---
2011 India census data. Includes population/demographic data and housing data for each district.
Data is raw counts per district, not normalized percentages!
Gathered from 2 sources: https://github.com/pigshell/india-census-2011 https://github.com/nishusharma1608/India-Census-2011-Analysis
Original census data released (and owned by) the Registrar General and Census Commissioner of India under the Ministry of Home Affairs, Government of India.
--- Original source retains full ownership of the source dataset ---
This dataset contains number of crimes filed under each category of the Indian Penal Code (IPC), number of victims of those crimes, and average crime rate. The data is presented separately by IPC category and sub-category. Data are available at the state/UT level for 2018.
● 7060_source_data.csv: The raw data from the source with original administrative dimensions. This dataset may have already been restructured by scraping PDFs, combining files, or pivoting tables to fit the proper tabular format used by NDAP, but the actual data values remain unchanged.
● NDAP_REPORT_7060.csv: The final standardised data using LGD geographic dimensions as seen on NDAP.
● 7060_metadata.csv: Variable-level metadata, including the following fields:
❖ VariableName: The full variable name as it appears in the data
❖ VariableCode: A unique variable code that is used as a short name for the variable during internal processing and can be used for simplicity if desired
❖ Type_Of_Variable: The classification of the column, whether it is a dimension or a variable (i.e. indicator)
❖ Unit_Of_Measure:
❖ Aggregation_Type: The default aggregation function to be used when aggregating each variable
❖ Weighing_Variable_Name: The weight assigned to each variable that is used by default when aggregating
❖ Weighing_Variable_ID: The weighing variable ID corresponding to the weighing variable name
❖ Long_Description: A more descriptive definition of the variable
❖ Scaling_factor: Scaling factor from source
● 7060_KEYS.csv: The key which maps source administrative units to the standardised Local Government Directory (LGD) dimensions. This file also contains pre-calculated weights for every constituent unit mapped from the source dimensions into the LGD. You can interpret each row as describing what fraction of the source unit is mapped to a corresponding LGD unit. This file includes the following fields:
❖ src[Unit]Name: The administrative unit name as it appears in the source data. Depending on the dataset, that may include State, District, Subdistrict, Block, Village/Town, etc.
❖ [Unit]Name: The standardised administrative unit name as it appears in the LGD. Depending on the dataset, that may include State, District, Subdistrict, Block, Village/Town, etc.
❖ [Unit]Code: The standardised administrative unit code corresponding to the unit name in the LGD.
❖ Year: The year in which the data was collected or reported. Depending on the dataset, other temporal variables may also be present (Quarter, Month, Calendar Day, etc.)
❖ Number_Of_Children: The number of LGD units associated with the mapping described by an individual row. Units from the source that have undergone a split will contain multiple children.
❖ Number_Of_Parents: The number of source units associated with the mapping described by an individual row. Units from the source that have undergone a merge will contain multiple parents.
❖ Weighing_Variables: Households, Population, Male Population, Female Population, Land Area (Total, Rural, and Urban versions of each). For each weighing variable there are the following associated fields:
■ Count: The total count of households, population, or land area mapped from the source unit to the LGD unit for that particular row (NumberOfHouseholds, TotalPopulation, LandArea).
■ Mapping_Error: The percentage error due to missing villages in the base data, i.e. what fraction of the weighing variable is dropped because the microdata could not be mapped to the LGD.
■ Weighing_Ratio: The weighing ratio for that constituent match of source unit to LGD unit for each particular row. This is the fraction applied to the source data to achieve the LGD-standardised final data.
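Applying the key file's pre-calculated weights could be sketched as below. This is an illustrative assumption: the concrete column names (`srcDistrictName`, `DistrictName`, `Weighing_Ratio`) follow the field descriptions above, but the real files may differ.

```python
import pandas as pd

def map_to_lgd(source_df, keys_df, value_col):
    """Split each source unit's value across its LGD children by weight,
    then sum the contributions arriving at each LGD unit."""
    merged = source_df.merge(keys_df, on="srcDistrictName")
    merged[value_col] = merged[value_col] * merged["Weighing_Ratio"]
    return merged.groupby("DistrictName", as_index=False)[value_col].sum()
```

A source district that split into two LGD districts thus has its counts divided between them in proportion to the weighing ratios, while merged districts accumulate the sums of their parents.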
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, as per their details listed on the official website of Amazon. The data was scraped in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
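Because the rows are ordered by class (all negative first, all positive last), a seeded shuffle before any train/test split avoids a degenerate split. A minimal pandas sketch; the four sample rows are made up, and only the column names (reviews, labels) follow the description above.

```python
import pandas as pd

# Toy stand-in for data_rt.csv: class-ordered rows, negative (0) first.
df = pd.DataFrame({
    "reviews": ["bad movie", "dull plot", "great film", "loved it"],
    "labels": [0, 0, 1, 1],  # 1 = fresh (good), 0 = rotten (bad)
})

# Shuffle all rows reproducibly, then reset the index.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
```

With real data, replace the toy frame with `pd.read_csv("data_rt.csv")` and shuffle before splitting.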
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using NLTK.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (the review text, stemmed and lemmatized with NLTK), polarity (the polarity score), and division (a categorical label generated from the polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
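The division label can be reproduced from a TextBlob polarity score with a simple thresholding function. The cut-offs and label names below are an assumption, since the exact thresholds used for EcoPreprocessed.csv are not documented.

```python
def polarity_to_division(polarity: float) -> str:
    """Map a TextBlob polarity score in [-1, 1] to a categorical label.

    Hypothetical thresholds: a zero cut-off on either side, with exactly
    zero treated as neutral. The actual dataset may use different cut-offs.
    """
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"
```

In practice the polarity input would come from `TextBlob(text).sentiment.polarity`, applied row by row over the review column.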
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, collected for training machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product, and division (manually added: a categorical label generated from the ReviewStar score).
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited, used for my research): AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, in Unix time), reviewTime (time of the review, raw), and division (manually added: a categorical label generated from the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited, used for my research): Musical_instruments_reviews2.csv (contains 7137 reviews)
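For both of the edited datasets above, the added division column is derived from a star rating. One way to generate such a column with pandas is binning via `pd.cut`; the thresholds below are an assumption, as the actual cut-offs are not stated.

```python
import pandas as pd

# Toy frame standing in for the "overall" star-rating column.
reviews = pd.DataFrame({"overall": [5.0, 4.0, 3.0, 2.0, 1.0]})

# Hypothetical cut-offs: <=2 negative, 3 neutral, >=4 positive.
reviews["division"] = pd.cut(
    reviews["overall"],
    bins=[0, 2, 3, 5],
    labels=["negative", "neutral", "positive"],
)
```

The same pattern applies to the ReviewStar column of the earphone dataset.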
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This heart disease dataset was acquired from one of the multispecialty hospitals in India. It covers over 14 common features, making it one of the most comprehensive heart disease datasets available so far for research purposes. The dataset consists of 1000 subjects with 12 features, and will be useful for building early-stage heart disease detection systems as well as predictive machine learning models.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Maps with wind speed, wind rose and wind power density potential in India. The GIS data stems from the Global Wind Atlas (http://globalwindatlas.info/). GIS data is available as JSON and CSV. The second link provides poster size (.pdf) and midsize maps (.png).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Residential Schools Locations Dataset in shapefile format contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn't include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed to RSIM, Morgan Hite, the NCTR, or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this dataset: the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School.
When the precise location is known, the coordinates of the main building are provided; when the precise location of the building isn't known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative names, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school), and a list of references used to determine the location of the main buildings or sites. The geographic coordinate system for this dataset is WGS 1984. The data in shapefile format [IRS_locations.zip] can be viewed and mapped in Geographic Information System (GIS) software. Detailed metadata in xml format is available as part of the data in shapefile format. In addition, the field name descriptions (IRS_locfields.csv) and the detailed location descriptions (IRS_locdescription.csv) should be used alongside the data in shapefile format.