100+ datasets found

TMDB movies clean dataset
kaggle.com
zip
Updated Sep 6, 2024
Cite
Bharat Kumar0925 (2024). TMDB movies clean dataset [Dataset]. https://www.kaggle.com/datasets/bharatkumar0925/tmdb-movies-clean-dataset
Available download formats
zip (266877093 bytes)
Dataset updated
Sep 6, 2024
Authors
Bharat Kumar0925
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Description

This dataset contains two files: Large_movies_data.csv and large_movies_clean.csv. The data is taken from the TMDB dataset. Originally, it contained around 900,000 movies, but some movies were dropped for recommendation purposes. Specifically, movies missing an overview were removed since the overview is one of the most important columns for analysis.

Column Description:

Large_movies_data.csv:

Id: Unique identifier for each movie.

Title: The title of the movie.

Overview: A brief description of the movie.

Genres: The genres associated with the movie.

Cast: The main actors in the movie.

Director: The director of the movie.

Writers: The screenwriters of the movie.

Production_companies: Companies involved in producing the movie.

Producers: Producers of the movie.

Original_language: The original language of the movie.

Vote_count: Number of votes the movie has received.

Vote_average: Average rating based on user votes.

Popularity: Popularity score of the movie.

Runtime: Duration of the movie in minutes.

Release_date: The release date of the movie.

Total movies in Large_movies_data.csv: 663,828.

Large_movies_clean.csv:

This file is a cleaned version with unnecessary columns removed, text converted to lowercase, and many symbols removed (though some may still remain). If you find that certain features are missing, you can use the original Large_movies_data.csv.

Columns in large_movies_clean.csv:

Id: Unique identifier for each movie.

Title: The title of the movie.

Tags: Combined information from the overview, genres, and other textual columns.

Original_language: The original language of the movie.

Vote_count: Number of votes the movie has received.

Vote_average: Average rating based on user votes.

Year: Year extracted from the release date.

Month: Month extracted from the release date.
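The relationship between the two files can be sketched as follows. This is a minimal illustration, not the uploader's actual cleaning script: the choice of columns merged into Tags, the lowercasing of Title, the YYYY-MM-DD date format, and the sample row are all assumptions made for the example.

```python
import re
from datetime import date


def clean_text(text: str) -> str:
    """Lowercase the text and replace symbols with spaces (assumed cleaning rule)."""
    lowered = text.lower()
    no_symbols = re.sub(r"[^\w\s]", " ", lowered)
    return re.sub(r"\s+", " ", no_symbols).strip()


def to_clean_row(raw: dict) -> dict:
    """Derive a clean-file style row from a raw-file style row."""
    # Assumption: Release_date is an ISO string such as "2020-07-16".
    released = date.fromisoformat(raw["Release_date"])
    # Assumption: Tags merges these four textual columns.
    tags = " ".join(raw[col] for col in ("Overview", "Genres", "Cast", "Director"))
    return {
        "Id": raw["Id"],
        "Title": clean_text(raw["Title"]),
        "Tags": clean_text(tags),
        "Original_language": raw["Original_language"],
        "Vote_count": raw["Vote_count"],
        "Vote_average": raw["Vote_average"],
        "Year": released.year,
        "Month": released.month,
    }


# Hypothetical row, invented for illustration only.
example = {
    "Id": 1,
    "Title": "Example Movie",
    "Overview": "A heist, gone wrong!",
    "Genres": "Crime, Thriller",
    "Cast": "Actor One",
    "Director": "Director One",
    "Original_language": "en",
    "Vote_count": 10,
    "Vote_average": 6.5,
    "Release_date": "2020-07-16",
}
clean = to_clean_row(example)
print(clean["Tags"], clean["Year"], clean["Month"])
```

If the real clean file was built from a different set of columns, only the tuple inside to_clean_row needs to change.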

Possible Use Cases:

Recommendation System: A robust recommendation system can be built using this large dataset.

Analysis: Analyze various aspects, such as identifying actors who starred in the most popular movies, the impact of having the same writer, director, and producer on a movie, and whether independent producers create better movies.

Rating Prediction: Predict the average rating of a movie based on factors such as overview, genres, and cast.

Other Analysis: Perform other types of analysis to discover patterns in the movie industry.

If you find this dataset useful, please upvote it!
LinkedIn Datasets
brightdata.com
.json, .csv, .xlsx
Updated Dec 17, 2021
Cite
Bright Data (2021). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
Available download formats
.json, .csv, .xlsx
Dataset updated
Dec 17, 2021
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

Dataset Features

Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.

Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.

Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.

Customizable Subsets for Specific Needs

Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

Popular Use Cases

Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.

Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.

Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.

Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.

AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
Large Customer Churn Analysis Dataset
kaggle.com
zip
Updated Dec 18, 2024
Cite
Hajra Amir (2024). Large Customer Churn Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/hajraamir21/large-customer-churn-analysis-dataset
Available download formats
zip (17387 bytes)
Dataset updated
Dec 18, 2024
Authors
Hajra Amir
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains synthetic data generated for customer churn analysis. It includes 1000 entries representing customer information, such as demographics, account details, subscription types, and churn status. The data is ideal for predictive modeling, machine learning algorithms, and exploratory data analysis (EDA).

Features:

CustomerID: A unique identifier for each customer.

Gender: Male or Female.

Age: Customer's age in years.

Geography: Country or region of the customer (e.g., Germany, France, UK).

Tenure: Number of months the customer has been with the company.

Contract: Type of subscription (Month-to-month, One-year, Two-year).

MonthlyCharges: The amount billed monthly.

TotalCharges: The total amount billed to date.

PaymentMethod: Method used for payments (e.g., Credit card, Direct debit).

IsActiveMember: Whether the customer is an active member (1 = Active, 0 = Inactive).

Churn: Indicates whether the customer has churned (Yes/No).
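A first exploratory step with these columns might be to compare churn rates across contract types. The sketch below uses only the Python standard library; the five sample rows are invented for illustration and are not taken from the dataset, and the exact header spelling in the real CSV is assumed to match the feature names listed above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the documented column layout (not real dataset rows).
SAMPLE_CSV = """CustomerID,Contract,MonthlyCharges,Churn
1,Month-to-month,70.5,Yes
2,Month-to-month,55.0,No
3,One-year,60.0,No
4,Two-year,80.0,No
5,Month-to-month,90.0,Yes
"""


def churn_rate_by(rows, key):
    """Return the fraction of churned customers for each value of `key`."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["Churn"].strip().lower() == "yes":
            churned[row[key]] += 1
    return {group: churned[group] / totals[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = churn_rate_by(rows, "Contract")
for contract, rate in sorted(rates.items()):
    print(f"{contract}: {rate:.2f}")
```

To run it against the downloaded file, replace io.StringIO(SAMPLE_CSV) with an open file handle; the same function also works with Geography or PaymentMethod as the grouping key.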
Student Skill Gap Analysis
data.mendeley.com
kaggle.com
Updated Apr 28, 2025
Cite
Bindu Garg (2025). Student Skill Gap Analysis [Dataset]. http://doi.org/10.17632/rv6scbpd7v.1
Unique identifier
https://doi.org/10.17632/rv6scbpd7v.1
Dataset updated
Apr 28, 2025
Authors
Bindu Garg
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is designed for skill gap analysis: evaluating the gap between students’ current skills and industry requirements. It covers technical skills, soft skills, career interests, and challenges, helping to identify areas for improvement.

Educators, recruiters, and researchers can use it to assess students’ job readiness and tailor training programs accordingly. It is a resource for identifying skill deficiencies, improving career guidance, and enhancing curriculum design.

The column descriptions are as follows:

Name - Student's full name.

email_id - Student's email address.

Year - The academic year the student is currently in (e.g., 1st Year, 2nd Year, etc.).

Current Course - The course the student is currently pursuing (e.g., B.Tech CSE, MBA, etc.).

Technical Skills - List of technical skills possessed by the student (e.g., Python, Data Analysis, Cloud Computing).

Programming Languages - Programming languages known by the student (e.g., Python, Java, C++).

Rating - Self-assessed rating of technical skills on a scale of 1 to 5.

Soft Skills - List of soft skills (e.g., Communication, Leadership, Teamwork).

Rating - Self-assessed rating of soft skills on a scale of 1 to 5.

Projects - Indicates whether the student has worked on any projects (Yes/No).

Career Interest - The student's preferred career path (e.g., Data Scientist, Software Engineer).

Challenges - Challenges faced while applying for jobs/internships (e.g., Lack of experience, Resume building issues).
Data from: Additive Hazards Regression Analysis of Massive Interval-Censored...
tandf.figshare.com
datasetcatalog.nlm.nih.gov
pdf
Updated May 12, 2025
Cite
Peiyao Huang; Shuwei Li; Xinyuan Song (2025). Additive Hazards Regression Analysis of Massive Interval-Censored Data via Data Splitting [Dataset]. http://doi.org/10.6084/m9.figshare.27103243.v1
Available download formats
pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.27103243.v1
Dataset updated
May 12, 2025
Dataset provided by
Taylor & Francis
Authors
Peiyao Huang; Shuwei Li; Xinyuan Song
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the rapid development of data acquisition and storage, massive datasets with large sample sizes are increasingly common, and more advanced statistical tools are urgently needed. To accommodate such large volumes in the analysis, a variety of methods have been proposed for complete or right-censored survival data. However, existing big data methodology has not addressed interval-censored outcomes, which are ubiquitous in cross-sectional or periodic follow-up studies. In this work, we propose an easily implemented divide-and-combine approach for analyzing massive interval-censored survival data under the additive hazards model. We establish the asymptotic properties of the proposed estimator, including consistency and asymptotic normality. In addition, the divide-and-combine estimator is shown to be asymptotically equivalent to the full-data-based estimator obtained from analyzing all data together. Simulation studies suggest that, relative to the full-data-based approach, the proposed divide-and-combine approach has a desirable advantage in computation time, making it more applicable to large-scale data analysis. An application to a set of interval-censored data also demonstrates the practical utility of the proposed method.
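The general divide-and-combine idea can be illustrated with a much simpler estimator than the one in the paper. The sketch below is not the authors' additive hazards method: it fits a least-squares slope through the origin on each block and combines the block estimates with a size-weighted average, which is one common combination rule and is assumed here purely for illustration.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def divide_and_combine(xs, ys, n_blocks):
    """Estimate the slope on each block, then average weighted by block size."""
    n = len(xs)
    block_size = -(-n // n_blocks)  # ceiling division
    estimates, weights = [], []
    for start in range(0, n, block_size):
        block_x = xs[start:start + block_size]
        block_y = ys[start:start + block_size]
        estimates.append(slope_through_origin(block_x, block_y))
        weights.append(len(block_x))
    return sum(e * w for e, w in zip(estimates, weights)) / sum(weights)


# Noise-free toy data with true slope 2.0, so both estimators agree exactly.
xs = [float(i) for i in range(1, 13)]
ys = [2.0 * x for x in xs]
combined = divide_and_combine(xs, ys, n_blocks=3)
full = slope_through_origin(xs, ys)
print(combined, full)
```

With noisy data the two estimates would generally differ slightly; the abstract's claim is that, for its estimator, this difference vanishes asymptotically while each block fits in memory and can be processed separately.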
Fox News dataset is for analyzing media trends and narratives
crawlfeeds.com
csv, zip
Updated May 19, 2025
Cite
Crawl Feeds (2025). Fox News dataset is for analyzing media trends and narratives [Dataset]. https://crawlfeeds.com/datasets/fox-news-dataset
Available download formats
zip, csv
Dataset updated
May 19, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.

Key Features of the Fox News Dataset

Extensive Coverage: Contains more than 1 million articles spanning various topics and events up to 2023.

Research-Ready: Perfect for text classification, natural language processing (NLP), and other research purposes.

Format: Provided in CSV format for seamless integration into analytical and research tools.

Why Use This Dataset?

This large dataset is ideal for:

Text Classification: Develop machine learning models to classify and categorize news content.

Natural Language Processing (NLP): Conduct sentiment analysis, keyword extraction, or topic modeling.

Media and Political Research: Analyze media narratives, public opinion, and political trends reflected in Fox News articles.

Trend Analysis: Identify shifts in public discourse and media focus over time.

Explore More News Datasets

Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.

The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
Refined DataCo Supply Chain Geospatial Dataset
kaggle.com
zip
Updated Jan 29, 2025
Cite
Om Gupta (2025). Refined DataCo Supply Chain Geospatial Dataset [Dataset]. https://www.kaggle.com/datasets/aaumgupta/refined-dataco-supply-chain-geospatial-dataset
Available download formats
zip (29010639 bytes)
Dataset updated
Jan 29, 2025
Authors
Om Gupta
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Refined DataCo Smart Supply Chain Geospatial Dataset

Optimized for Geospatial and Big Data Analysis

This dataset is a refined and enhanced version of the original DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS dataset, specifically designed for advanced geospatial and big data analysis. It incorporates geocoded information, language translations, and cleaned data to enable applications in logistics optimization, supply chain visualization, and performance analytics.

Key Features

1. Geocoded Source and Destination Data

Accurate latitude and longitude coordinates for both source and destination locations.

Facilitates geospatial mapping, route analysis, and distance calculations.

2. Supplementary GeoJSON Files

src_points.geojson: Source point geometries.

dest_points.geojson: Destination point geometries.

routes.geojson: Line geometries representing source-destination routes.

These files are compatible with GIS software and geospatial libraries such as GeoPandas, Folium, and QGIS.

3. Language Translation

Key location fields (countries, states, and cities) are translated into English for consistency and global accessibility.

4. Cleaned and Consolidated Data

Addressed missing values, removed duplicates, and corrected erroneous entries.

Ready-to-use dataset for analysis without additional preprocessing.

5. Routes and Points Geometry

Enables the creation of spatial visualizations, hotspot identification, and route efficiency analyses.

Applications

1. Logistics Optimization

Analyze transportation routes and delivery performance to improve efficiency and reduce costs.

2. Supply Chain Visualization

Create detailed maps to visualize the global flow of goods.

3. Geospatial Modeling

Perform proximity analysis, clustering, and geospatial regression to uncover patterns in supply chain operations.

4. Business Intelligence

Use the dataset for KPI tracking, decision-making, and operational insights.

Dataset Content

Files Included

DataCoSupplyChainDatasetRefined.csv

The main dataset containing cleaned fields, geospatial coordinates, and English translations.

src_points.geojson

GeoJSON file containing the source points for easy visualization and analysis.

dest_points.geojson

GeoJSON file containing the destination points.

routes.geojson

GeoJSON file with LineStrings representing routes between source and destination points.

Attribution

This dataset is based on the original dataset published by Fabian Constante, Fernando Silva, and António Pereira:
Constante, Fabian; Silva, Fernando; Pereira, António (2019), “DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS”, Mendeley Data, V5, doi: 10.17632/8gx2fvg2k6.5.

Refinements include geospatial processing, translation, and additional cleaning by the uploader to enhance usability and analytical potential.

Tips for Using the Dataset

For geospatial analysis, leverage tools like GeoPandas, QGIS, or Folium to visualize routes and points.

Use the GeoJSON files for interactive mapping and spatial queries.

Combine this dataset with external datasets (e.g., road networks) for enriched analytics.
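As a lightweight alternative to the GIS tools mentioned above, route lengths can be estimated from LineString coordinates with the haversine formula. The sketch below uses only the standard library and an inline sample feature; the actual property names and structure of routes.geojson are not documented here, so the sample is an assumption that follows the generic GeoJSON layout, in which coordinates are ordered as longitude, latitude.

```python
import json
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def route_length_km(feature):
    """Sum the segment lengths of a GeoJSON LineString feature."""
    coords = feature["geometry"]["coordinates"]
    return sum(haversine_km(*a, *b) for a, b in zip(coords, coords[1:]))


# Hypothetical route: one degree of latitude along the prime meridian.
SAMPLE = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "LineString", "coordinates": [[0.0, 0.0], [0.0, 1.0]]}
    }
  ]
}
""")
lengths = [route_length_km(feature) for feature in SAMPLE["features"]]
print(lengths)
```

For the real file, json.load on an open handle to routes.geojson would replace the inline sample. These are straight great-circle distances, not road or shipping distances.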

This dataset is designed to empower data scientists, researchers, and business professionals to explore the intersection of geospatial intelligence and supply chain optimization.
Walmart Products Dataset – Free Product Data CSV
crawlfeeds.com
csv, zip
Updated Dec 2, 2025
Cite
Crawl Feeds (2025). Walmart Products Dataset – Free Product Data CSV [Dataset]. https://crawlfeeds.com/datasets/walmart-products-free-dataset
Available download formats
zip, csv
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.

Key Features

Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.

CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.

Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.

Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.

Who Benefits?

Data analysts & researchers exploring e-commerce trends or product catalog data.

Developers & data scientists building price-comparison tools, recommendation engines or ML models.

E-commerce strategists/marketers needing product metadata for competitive analysis or market research.

Students/hobbyists needing a free dataset for learning or demo projects.

Why Use This Dataset Instead of Manual Scraping?

Time-saving: No need to write scrapers or deal with rate limits.

Clean, structured data: All records are verified and already formatted in CSV, saving hours of cleaning.

Risk-free: Avoid Terms-of-Service issues or IP blocks that come with manual scraping.
Instant access: Free and immediately downloadable.
Exploratory Data Analysis (EDA) Tools Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
Cite
Market Report Analytics (2025). Exploratory Data Analysis (EDA) Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/exploratory-data-analysis-eda-tools-54257
Available download formats
ppt, doc, pdf
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Exploratory Data Analysis (EDA) tools market is experiencing robust growth, driven by the increasing need for businesses to derive actionable insights from their ever-expanding datasets. The market, currently estimated at $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated $45 billion by 2033. This growth is fueled by several factors, including the rising adoption of big data analytics, the proliferation of cloud-based solutions offering enhanced accessibility and scalability, and the growing demand for data-driven decision-making across diverse industries like finance, healthcare, and retail. The market is segmented by application (large enterprises and SMEs) and type (graphical and non-graphical tools), with graphical tools currently holding a larger market share due to their user-friendly interfaces and ability to effectively communicate complex data patterns. Large enterprises are currently the dominant segment, but the SME segment is anticipated to experience faster growth due to increasing affordability and accessibility of EDA solutions. Geographic expansion is another key driver, with North America currently holding the largest market share due to early adoption and a strong technological ecosystem. However, regions like Asia-Pacific are exhibiting high growth potential, fueled by rapid digitalization and a burgeoning data science talent pool. Despite these opportunities, the market faces certain restraints, including the complexity of some EDA tools requiring specialized skills and the challenge of integrating EDA tools with existing business intelligence platforms. Nonetheless, the overall market outlook for EDA tools remains highly positive, driven by ongoing technological advancements and the increasing importance of data analytics across all sectors. 
The competition among established players like IBM Cognos Analytics and Altair RapidMiner, and emerging innovative companies like Polymer Search and KNIME, further fuels market dynamism and innovation.
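The headline figures can be sanity-checked with the compound growth formula. The sketch below assumes eight compounding periods (2025 to 2033) and annual compounding, neither of which the report states explicitly.

```python
def project(value, rate, years):
    """Compound `value` forward at annual `rate` for `years` periods."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that turns `start` into `end` over `years` periods."""
    return (end / start) ** (1 / years) - 1


YEARS = 2033 - 2025  # assumption: eight compounding periods

projected_2033 = project(15.0, 0.15, YEARS)   # in billions of dollars
implied_rate = implied_cagr(15.0, 45.0, YEARS)

print(f"Projected 2033 size at 15% CAGR: ${projected_2033:.1f} billion")
print(f"CAGR implied by $15B -> $45B: {implied_rate:.1%}")
```

Under these assumptions, $15 billion growing at 15% a year reaches roughly $45.9 billion, and growing from $15 billion to exactly $45 billion implies a rate of about 14.7%, so the rounded numbers in the description are mutually consistent.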
The RHMCD-20 datasets for Depression and Mental Health Data Analysis with...
data.mendeley.com
Updated Dec 18, 2023
Cite
Imrus Salehin (2023). The RHMCD-20 datasets for Depression and Mental Health Data Analysis with Machine Learning [Dataset]. http://doi.org/10.17632/pxjmjyfdh2.1
Explore at:
Unique identifier
https://doi.org/10.17632/pxjmjyfdh2.1
Dataset updated
Dec 18, 2023
Authors
Imrus Salehin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
For the RHMCD-20 dataset, we took care to include information from a wide range of sources, including teenagers from Bangladesh, college students, housewives, professionals from businesses and corporations, and other people. This is survey data for Depression and Mental Health Data Analysis.

Survey questions:

Age: Represents the age of the participants.

Gender: Indicates the gender of the participants.

Occupation: Represents the participants' occupations.

Days_Indoors: Indicates the number of days the participant has not been out of the house.

Growing_Stress: Indicates whether the participant's stress is increasing day by day (Yes/No).

Quarantine_Frustration: Frustrations in the first two weeks of quarantine (Yes/Maybe/No).

Changes_Habits: Represents major changes in eating habits and sleeping (Yes/Maybe/No).

Mental_Health_History: A precedent of mental disorders in the previous generation (Yes/No).

Weight_Change: Highlights changes in body weight during quarantine (Yes/Maybe/No).

Mood_Swings: Represents extreme mood changes (Low/Medium/High).

Coping_Struggles: The inability to cope with daily problems or stress (Yes/Maybe/No).

Work_Interest: Represents whether the participant is losing interest in working (Yes/No).

Social_Weakness: Conveys feeling mentally weak when interacting with others (Yes/No).
Artificial Intelligence Training Dataset Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 21, 2025
Cite
Archive Market Research (2025). Artificial Intelligence Training Dataset Report [Dataset]. https://www.archivemarketresearch.com/reports/artificial-intelligence-training-dataset-38645
Available download formats
pdf, ppt, doc
Dataset updated
Feb 21, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Artificial Intelligence (AI) Training Dataset market is projected to reach $1605.2 million by 2033, exhibiting a CAGR of 9.4% from 2025 to 2033. The surge in demand for AI training datasets is driven by the increasing adoption of AI and machine learning technologies in various industries such as healthcare, financial services, and manufacturing. Moreover, the growing need for reliable and high-quality data for training AI models is further fueling the market growth. Key market trends include the increasing adoption of cloud-based AI training datasets, the emergence of synthetic data generation, and the growing focus on data privacy and security. The market is segmented by type (image classification dataset, voice recognition dataset, natural language processing dataset, object detection dataset, and others) and application (smart campus, smart medical, autopilot, smart home, and others). North America is the largest regional market, followed by Europe and Asia Pacific. Key companies operating in the market include Appen, Speechocean, TELUS International, Summa Linguae Technologies, and Scale AI. Artificial Intelligence (AI) training datasets are critical for developing and deploying AI models. These datasets provide the data that AI models need to learn, and the quality of the data directly impacts the performance of the model. The AI training dataset market landscape is complex, with many different providers offering datasets for a variety of applications. The market is also rapidly evolving, as new technologies and techniques are developed for collecting, labeling, and managing AI training data.
Data from: Inferring complex phylogenies using parsimony: an empirical...
search.dataone.org
data.niaid.nih.gov
Updated Jul 6, 2025
Cite
Douglas E. Soltis; Pamela S. Soltis; Mark E. Mort; Mark W. Chase; Vincent Savolainen; Sara B. Hoot; Cynthia M. Morton (2025). Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms [Dataset]. http://doi.org/10.5061/dryad.64
Unique identifier
https://doi.org/10.5061/dryad.64
Dataset updated
Jul 6, 2025
Dataset provided by
Dryad Digital Repository
Authors
Douglas E. Soltis; Pamela S. Soltis; Mark E. Mort; Mark W. Chase; Vincent Savolainen; Sara B. Hoot; Cynthia M. Morton
Time period covered
Feb 22, 2008
Description
To explore the feasibility of parsimony analysis for large data sets, we conducted heuristic parsimony searches and bootstrap analyses on separate and combined DNA data sets for 190 angiosperms and three outgroups. Separate data sets of 18S rDNA (1,855 bp), rbcL (1,428 bp), and atpB (1,450 bp) sequences were combined into a single matrix 4,733 bp in length. Analyses of the combined data set show great improvements in computer run times compared to those of the separate data sets and of the data sets combined in pairs. Six searches of the combined 18S rDNA, rbcL, and atpB data set were conducted; in all cases TBR branch swapping was completed, generally within a few days. In contrast, TBR branch swapping was not completed for any of the three separate data sets, or for the pairwise combined data sets. These results illustrate that it is possible to conduct a thorough search of tree space with large data sets, given sufficient signal. In this case, and probably most others, sufficient signal for a ...
Assessment and Improvement of Statistical Tools for Comparative Proteomics...
acs.figshare.com
figshare.com
txt
Updated Jun 3, 2023
Cite
Veit Schwämmle; Ileana Rodríguez León; Ole Nørregaard Jensen (2023). Assessment and Improvement of Statistical Tools for Comparative Proteomics Analysis of Sparse Data Sets with Few Experimental Replicates [Dataset]. http://doi.org/10.1021/pr400045u.s002
Available download formats
txt
Unique identifier
https://doi.org/10.1021/pr400045u.s002
Dataset updated
Jun 3, 2023
Dataset provided by
ACS Publications
Authors
Veit Schwämmle; Ileana Rodríguez León; Ole Nørregaard Jensen
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. 
This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
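As a rough illustration of how a rank product can tolerate missing values, the sketch below ranks features within each replicate and takes the geometric mean of the relative ranks over only those replicates where a feature was observed. This is not the authors' R script or their improved algorithm; the function name `rank_products`, the use of `None` for missing values, and the normalization by the number of observed features are all assumptions made for illustration.

```python
import math


def rank_products(replicates):
    """Per-feature rank product that skips missing values (None).

    replicates: list of equally long lists of log-ratios, one list per replicate.
    Returns one value per feature: the geometric mean of its relative ranks
    over the replicates where it was observed, or None if never observed.
    """
    n_features = len(replicates[0])
    log_sums = [0.0] * n_features
    counts = [0] * n_features
    for values in replicates:
        observed = [i for i, v in enumerate(values) if v is not None]
        # Rank 1 is the largest value within this replicate.
        order = sorted(observed, key=lambda i: values[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            # Relative rank, so replicates with different numbers of
            # observed features stay comparable (an assumption of this sketch).
            log_sums[i] += math.log(rank / len(observed))
            counts[i] += 1
    return [math.exp(s / c) if c else None for s, c in zip(log_sums, counts)]


# Hypothetical triplicate data: four features, some values missing.
example = [
    [2.0, 0.5, None, -1.0],
    [1.5, None, 0.2, -0.8],
    [1.8, 0.4, 0.1, None],
]
scores = rank_products(example)
```

Small scores indicate features that rank consistently high across replicates. Assessing significance would additionally need a permutation or analytical null distribution, which this sketch leaves out.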
Flipkart reviews large dataset
crawlfeeds.com
csv, zip
Updated Mar 13, 2025
Crawl Feeds (2025). Flipkart reviews large dataset [Dataset]. https://crawlfeeds.com/datasets/flipkart-reviews-large-dataset
Explore at:
Available download formats: csv, zip
Dataset updated
Mar 13, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Flipkart Reviews Large Dataset is a comprehensive collection of 1.86 million customer reviews from Flipkart, one of India's largest e-commerce platforms. Available in CSV format, this dataset is ideal for conducting sentiment analysis, understanding consumer preferences, and developing machine learning models.

For a more extensive dataset, consider the Flipkart E-commerce Dataset, which offers detailed information on over 5.7 million products, including names, descriptions, prices, customer reviews, ratings, and images. This dataset is invaluable for data analysis, machine learning projects, and in-depth market research.

Whether you're looking to enhance recommendation systems, perform market research, or analyze customer feedback trends, this dataset offers a wealth of information.

Use Cases:

Sentiment Analysis: Analyze customer sentiments to understand product reception.

Recommendation Systems: Build models to recommend products based on customer feedback.

Consumer Behavior Analysis: Study purchasing patterns and preferences across different product categories.

Market Research: Gain insights into market trends and customer opinions for various products.
DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS
narcis.nl
data.mendeley.com
+1 more
Updated Mar 13, 2019
Constante, F (via Mendeley Data) (2019). DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS [Dataset]. http://doi.org/10.17632/8gx2fvg2k6.5
Explore at:
Unique identifier
https://doi.org/10.17632/8gx2fvg2k6.5
Dataset updated
Mar 13, 2019
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
Constante, F (via Mendeley Data)
Description
This dataset of supply chains, used by the company DataCo Global, supports analysis with machine learning algorithms and R software. Key registered activity areas: provisioning, production, sales, and commercial distribution. It also allows structured data to be correlated with unstructured data for knowledge generation.

Data types: structured data in DataCoSupplyChainDataset.csv; unstructured data in tokenized_access_logs.csv (clickstream).

Product types: clothing, sports, and electronic supplies.

A separate file, DescriptionDataCoSupplyChain.csv, describes each variable in DataCoSupplyChainDataset.csv.
Comprehensive Median Household Income and Distribution Dataset for Big...
neilsberg.com
Updated Jan 11, 2024
Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Big Sandy, TX: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd8b6e24-b041-11ee-aaca-3860777c1fe6/
Explore at:
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Texas, Big Sandy
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the median household income in Big Sandy. It can be utilized to understand the trend in median household income and to analyze the income distribution in Big Sandy by household type, size, and across various income brackets.

Content

When applicable, the dataset includes the following tables:

Big Sandy, TX Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)

Median Household Income Variation by Family Size in Big Sandy, TX: Comparative analysis across 7 household sizes

Income Distribution by Quintile: Mean Household Income in Big Sandy, TX

Big Sandy, TX households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

Please note: The 2020 1-Year ACS estimates were not reported by the Census Bureau because of the impact of COVID-19 on survey collection and analysis. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

Good to know

Margin of Error

Data in the dataset are estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Interested in deeper insights and visual analysis?

Explore our comprehensive data analysis and visual representations for a deeper understanding of Big Sandy median household income.
Cloud Analytics Market Analysis North America, Europe, APAC, Middle East and...
technavio.com
pdf
Updated Jul 22, 2024
Technavio (2024). Cloud Analytics Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, UK, Germany, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/cloud-analytics-market-industry-analysis
Explore at:
Available download formats: pdf
Dataset updated
Jul 22, 2024
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-notice
Time period covered
2024 - 2028
Description
Cloud Analytics Market Size 2024-2028

The cloud analytics market size is forecast to increase by USD 74.08 billion at a CAGR of 24.4% between 2023 and 2028.
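The two headline figures can be related with a little arithmetic. The sketch below backs out the market sizes implied by an absolute increase and a CAGR, assuming the increase is measured from the 2023 base and compounds annually over five years. The report summary does not state these assumptions, so the results are illustrative rather than figures from the report, and the function name `implied_market_sizes` is hypothetical.

```python
def implied_market_sizes(increase, cagr, years):
    """Back out start and end sizes from an absolute increase and a CAGR.

    Assumes end = start * (1 + cagr) ** years and increase = end - start.
    """
    growth_factor = (1 + cagr) ** years
    start = increase / (growth_factor - 1)
    end = start * growth_factor
    return start, end


# Figures quoted in the summary: USD 74.08 billion increase, 24.4% CAGR.
# Treating 2023 to 2028 as five compounding periods is an assumption.
start_2023, end_2028 = implied_market_sizes(74.08, 0.244, 5)
print(f"implied 2023 size: {start_2023:.2f} billion USD")
print(f"implied 2028 size: {end_2028:.2f} billion USD")
```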

The market is experiencing significant growth due to several key trends. The adoption of hybrid and multi-cloud setups is on the rise, as these configurations enhance data connectivity and flexibility. Another trend driving market growth is the increasing use of cloud security applications to safeguard sensitive data. However, concerns regarding confidential data security and privacy remain a challenge for market growth. Organizations must ensure robust security measures are in place to mitigate risks and maintain trust with their customers. Overall, the market is poised for continued expansion as businesses seek to leverage the benefits of cloud technologies for data processing and data analytics.

What will be the Size of the Cloud Analytics Market During the Forecast Period?

The market is experiencing significant growth due to the increasing volume of data generated by businesses and the demand for advanced analytics solutions. Cloud-based analytics enables organizations to process and analyze large datasets from various data sources, including unstructured data, in real time. This is crucial for businesses looking to make data-driven decisions and gain valuable insights to optimize their operations and meet customer requirements. Key functions such as sales and marketing, customer service, and finance are adopting cloud analytics to improve key performance indicators and gain a competitive edge. Both small and medium-sized enterprises (SMEs) and large enterprises are embracing cloud analytics, with solutions available on private, public, and multi-cloud platforms. Big data technologies, together with machine learning and artificial intelligence, are integral to cloud analytics, enabling advanced data analytics and business intelligence. Cloud analytics provides businesses with the flexibility to store and process data in the cloud, reducing the need for expensive on-premises data storage and computation. Hybrid environments are also gaining popularity, allowing businesses to leverage the benefits of both private and public clouds. Overall, the market is poised for continued growth as businesses increasingly rely on data-driven insights to inform their decision-making processes.

How is this Cloud Analytics Industry segmented and which is the largest segment?

The cloud analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2017-2022 for the following segments.

Solution
  Hosted data warehouse solutions
  Cloud BI tools
  Complex event processing
  Others
Deployment
  Public cloud
  Hybrid cloud
  Private cloud
Geography
  North America
    US
  Europe
    Germany
    UK
  APAC
    China
    Japan
  Middle East and Africa
  South America

By Solution Insights

The hosted data warehouse solutions segment is estimated to witness significant growth during the forecast period.

Hosted data warehouses enable organizations to centralize and analyze large datasets from multiple sources, facilitating advanced analytics solutions and real-time insights. By utilizing cloud-based infrastructure, businesses can reduce operational costs through eliminating licensing expenses, hardware investments, and maintenance fees. Additionally, cloud solutions offer network security measures, such as Software Defined Networking and Network integration, ensuring data protection. Cloud analytics caters to diverse industries, including SMEs and large enterprises, addressing requirements for sales and marketing, customer service, and key performance indicators. Advanced analytics capabilities, including predictive analytics, automated decision making, and fraud prevention, are essential for data-driven decision making and business optimization.

Furthermore, cloud platforms provide access to specialized talent, big data technology, and AI, enhancing customer experiences and digital business opportunities. Data connectivity and data processing in real-time are crucial for network agility and application performance. Hosted data warehouses offer computational power and storage capabilities, ensuring efficient data utilization and enterprise information management. Cloud service providers offer various cloud environments, including private, public, multi-cloud, and hybrid, catering to diverse business needs. Compliance and security concerns are addressed through cybersecurity frameworks and data security measures, ensuring data breaches and thefts are minimized.

The Hosted data warehouse solutions s
Comprehensive Median Household Income and Distribution Dataset for Big Bend...
neilsberg.com
Updated Jan 11, 2024
Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Big Bend Town, Wisconsin: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd8b5c39-b041-11ee-aaca-3860777c1fe6/
Explore at:
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Big Bend, Wisconsin
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the median household income in Big Bend town. It can be utilized to understand the trend in median household income and to analyze the income distribution in Big Bend town by household type, size, and across various income brackets.

Content

When applicable, the dataset includes the following tables:

Big Bend Town, Wisconsin Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)

Median Household Income Variation by Family Size in Big Bend Town, Wisconsin: Comparative analysis across 7 household sizes

Income Distribution by Quintile: Mean Household Income in Big Bend Town, Wisconsin

Big Bend Town, Wisconsin households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

Please note: The 2020 1-Year ACS estimates were not reported by the Census Bureau because of the impact of COVID-19 on survey collection and analysis. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

Good to know

Margin of Error

Data in the dataset are estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Interested in deeper insights and visual analysis?

Explore our comprehensive data analysis and visual representations for a deeper understanding of Big Bend town median household income.
Big data and business analytics revenue worldwide 2015-2022
statista.com
Updated Aug 17, 2021
Statista (2021). Big data and business analytics revenue worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/551501/worldwide-big-data-business-analytics-revenue/
Explore at:
Dataset updated
Aug 17, 2021
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The global big data and business analytics (BDA) market was valued at ***** billion U.S. dollars in 2018 and is forecast to grow to ***** billion U.S. dollars by 2021. In 2021, more than half of BDA spending will go towards services. IT services is projected to make up around ** billion U.S. dollars, and business services will account for the remainder.

Big data

High volume, high velocity and high variety: one or more of these characteristics is used to define big data, the kind of data sets that are too large or too complex for traditional data processing applications. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. For example, connected IoT devices are projected to generate **** ZBs of data in 2025.

Business analytics

Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate business insights. The size of the business intelligence and analytics software application market is forecast to reach around **** billion U.S. dollars in 2022. Growth in this market is driven by a focus on digital transformation, a demand for data visualization dashboards, and an increased adoption of cloud.
Cloud-based User Entity Behavior Analytics Log Data Set
data.niaid.nih.gov
zenodo.org
Updated Oct 30, 2023
Landauer, Max; Skopik, Florian; Höld, Georg; Wurzenberger, Markus (2023). Cloud-based User Entity Behavior Analytics Log Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7119952
Explore at:
Dataset updated
Oct 30, 2023
Dataset provided by
AIT Austrian Institute of Technology
Authors
Landauer, Max; Skopik, Florian; Höld, Georg; Wurzenberger, Markus
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the CLUE-LDS (CLoud-based User Entity behavior analytics Log Data Set). The data set contains log events from real users utilizing a cloud storage service and is suitable for User Entity Behavior Analytics (UEBA). Events include logins, file accesses, link shares, config changes, etc. The data set contains around 50 million events generated by more than 5000 distinct users in more than five years (2017-07-07 to 2022-09-29 or 1910 days). The data set is complete except for 109 events missing on 2021-04-22, 2021-08-20, and 2021-09-05 due to database failure. The unpacked file size is around 14.5 GB. A detailed analysis of the data set is provided in [1]. The logs are provided in JSON format with the following attributes in the first level:

id: Unique log line identifier that starts at 1 and increases incrementally, e.g., 1.
time: Time stamp of the event in ISO format, e.g., 2021-01-01T00:00:02Z.
uid: Unique anonymized identifier for the user generating the event, e.g., old-pink-crane-sharedealer.
uidType: Specifier for uid, which is either the user name or IP address for logged out users.
type: The action carried out by the user, e.g., file_accessed.
params: Additional event parameters (e.g., paths, groups) stored in a nested dictionary.
isLocalIP: Optional flag for event origin, which is either internal (true) or external (false).
role: Optional user role: consulting, administration, management, sales, technical, or external.
location: Optional IP-based geolocation of event origin, including city, country, longitude, latitude, etc.

In the following data sample, the first object depicts a successful user login (see type: login_successful) and the second object depicts a file access (see type: file_accessed) from a remote location:

{"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", "time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", "id": 21567530, "uidType": "name"}

{"isLocalIP": false, "params": {"path": "/proud-copper-orangutan-artexer/doubtful-plum-ptarmigan-merchant/insufficient-amaranth-earthworm-qualitycontroller/curious-silver-galliform-tradingstandards/incredible-indigo-octopus-printfinisher/wicked-bronze-sloth-claimsmanager/frantic-aquamarine-horse-cleric"}, "type": "file_accessed", "time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, "location": {"countryCode": "AT", "countryName": "Austria", "region": "4", "city": "Gmunden", "latitude": 47.915, "longitude": 13.7959, "timezone": "Europe/Vienna", "postalCode": "4810", "metroCode": null, "regionName": "Upper Austria", "isInEuropeanUnion": true, "continent": "Europe", "accuracyRadius": 50}, "uidType": "ipaddress"}

The data set was generated at the premises of Huemer Group, a midsize IT service provider located in Vienna, Austria. Huemer Group offers a range of Infrastructure-as-a-Service solutions for enterprises, including cloud computing and storage. In particular, their cloud storage solution called hBOX enables customers to upload their data, synchronize them with multiple devices, share files with others, create versions and backups of their documents, collaborate with team members in shared data spaces, and query the stored documents using search terms. The hBOX extends the open-source project Nextcloud with interfaces and functionalities tailored to the requirements of customers.

The data set comprises only normal user behavior, but can be used to evaluate anomaly detection approaches by simulating account hijacking. We provide an implementation for identifying similar users, switching pairs of users to simulate changes of behavior patterns, and a sample detection approach in our GitHub repo.

Acknowledgements: Partially funded by the FFG project DECEPT (873980). The authors thank Walter Huemer, Oskar Kruschitz, Kevin Truckenthanner, and Christian Aigner from Huemer Group for supporting the collection of the data set.

If you use the dataset, please cite the following publication: [1] M. Landauer, F. Skopik, G. Höld, and M. Wurzenberger. "A User and Entity Behavior Analytics Log Data Set for Anomaly Detection in Cloud Computing". 2022 IEEE International Conference on Big Data - 6th International Workshop on Big Data Analytics for Cyber Intelligence and Defense (BDA4CID 2022), December 17-20, 2022, Osaka, Japan. IEEE.
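The log format described above can be read with a few lines of Python. The following is a minimal sketch that assumes the logs are stored as one JSON object per line (the description only says "JSON format", so this layout is an assumption). The function name `summarize_events` is hypothetical, and the second sample record is a trimmed version of the file access event shown above.

```python
import json
from collections import Counter


def summarize_events(lines):
    """Count events per type and collect uids seen from external origins."""
    type_counts = Counter()
    external_uids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        type_counts[event["type"]] += 1
        # isLocalIP is optional; only an explicit false marks an external origin.
        if event.get("isLocalIP") is False:
            external_uids.add(event["uid"])
    return type_counts, external_uids


sample_lines = [
    '{"params": {"user": "intact-gray-marlin-trademarkagent"}, '
    '"type": "login_successful", "time": "2019-11-14T11:26:43Z", '
    '"uid": "intact-gray-marlin-trademarkagent", "id": 21567530, "uidType": "name"}',
    # Trimmed for brevity: params and most location fields are omitted.
    '{"isLocalIP": false, "type": "file_accessed", "time": "2019-11-14T11:26:51Z", '
    '"uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, '
    '"location": {"countryCode": "AT", "metroCode": null}, "uidType": "ipaddress"}',
]
type_counts, external_uids = summarize_events(sample_lines)
```

Because the unpacked file is around 14.5 GB, passing an open file object as `lines` keeps memory use flat, since records are processed one at a time.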

Bharat Kumar0925 (2024). TMDB movies clean dataset [Dataset]. https://www.kaggle.com/datasets/bharatkumar0925/tmdb-movies-clean-dataset

TMDB movies clean dataset

Large clean data of TMDB movies

Explore at:

Available download formats: zip (266877093 bytes)

Dataset updated

Sep 6, 2024

Authors

Bharat Kumar0925

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Dataset Description

This dataset contains two files: Large_movies_data.csv and large_movies_clean.csv. The data is taken from the TMDB dataset. Originally, it contained around 900,000 movies, but some movies were dropped for recommendation purposes. Specifically, movies missing an overview were removed since the overview is one of the most important columns for analysis.

Column Description:

Large_movies_data.csv:

Id: Unique identifier for each movie.
Title: The title of the movie.
Overview: A brief description of the movie.
Genres: The genres associated with the movie.
Cast: The main actors in the movie.
Director: The director of the movie.
Writers: The screenwriters of the movie.
Production_companies: Companies involved in producing the movie.
Producers: Producers of the movie.
Original_language: The original language of the movie.
Vote_count: Number of votes the movie has received.
Vote_average: Average rating based on user votes.
Popularity: Popularity score of the movie.
Runtime: Duration of the movie in minutes.
Release_date: The release date of the movie.

Total movies in Large_movies_data.csv: 663,828.

Large_movies_clean.csv:

This file is a cleaned version with unnecessary columns removed, text converted to lowercase, and many symbols removed (though some may still remain). If you find that certain features are missing, you can use the original Large_movies_data.csv.

Columns in large_movies_clean.csv:

Id: Unique identifier for each movie.
Title: The title of the movie.
Tags: Combined information from the overview, genres, and other textual columns.
Original_language: The original language of the movie.
Vote_count: Number of votes the movie has received.
Vote_average: Average rating based on user votes.
Year: Year extracted from the release date.
Month: Month extracted from the release date.
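The relationship between the two files can be sketched as a small row transformation. The code below is not the author's cleaning script. It assumes that Release_date uses the YYYY-MM-DD format, that Tags combines the Overview, Genres, Cast, and Director columns, and that symbol removal keeps only lowercase ASCII letters, digits, and spaces. The function name `clean_row` and the sample row are hypothetical.

```python
import re
from datetime import datetime

# Assumed set of textual columns merged into Tags.
TEXT_COLUMNS = ("Overview", "Genres", "Cast", "Director")


def clean_row(row):
    """Turn one Large_movies_data.csv-style row into a cleaned-style row."""
    text = " ".join(row.get(col, "") for col in TEXT_COLUMNS).lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tags = " ".join(text.split())  # collapse repeated whitespace
    released = datetime.strptime(row["Release_date"], "%Y-%m-%d")  # assumed format
    return {
        "Id": row["Id"],
        "Title": row["Title"].lower(),
        "Tags": tags,
        "Original_language": row["Original_language"],
        "Vote_count": row["Vote_count"],
        "Vote_average": row["Vote_average"],
        "Year": released.year,
        "Month": released.month,
    }


# Hypothetical row; in practice rows would come from csv.DictReader.
sample_row = {
    "Id": "1", "Title": "Example Movie",
    "Overview": "A hacker learns the truth!", "Genres": "Action, Sci-Fi",
    "Cast": "", "Director": "Jane Doe", "Original_language": "en",
    "Vote_count": "10", "Vote_average": "7.5", "Release_date": "1999-03-31",
}
cleaned = clean_row(sample_row)
```

Note that the ASCII-only filter would strip accented and non-Latin characters, which matters for a multilingual catalog. A Unicode-aware pattern may be a better fit for real use.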

Possible Use Cases:

Recommendation System: A robust recommendation system can be built using this large dataset.
Analysis: Analyze various aspects, such as identifying actors who starred in the most popular movies, the impact of having the same writer, director, and producer on a movie, and whether independent producers create better movies.
Rating Prediction: Predict the average rating of a movie based on factors such as overview, genres, and cast.
Other Analysis: Perform other types of analysis to discover patterns in the movie industry.

If you find this dataset useful, please upvote it!
