Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This fictional sales dataset was generated with R code to visualize trends in customer demographics, product performance, and sales over time. The author's GitHub repository contains all the code used to generate the data frame and the preceding steps.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains data on 1K+ Amazon products' ratings and reviews, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
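Because the file is ordered by label, a shuffle step is needed before any train/test split. The following is a minimal sketch using only the Python standard library; the toy rows stand in for the contents of data_rt.csv, and the function name `shuffle_rows` and the fixed seed are assumptions made for illustration.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of `rows`; the input list is left unchanged."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy stand-in for data_rt.csv: negatives (label 0) first, positives (label 1) last.
rows = [(f"negative review {i}", 0) for i in range(5)] + [
    (f"positive review {i}", 1) for i in range(5)
]
shuffled = shuffle_rows(rows)
```

Fixing the seed makes experiments repeatable; pass a different seed to obtain a different ordering.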
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data of Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
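The description says the division column is a categorical label derived from the polarity score, but it does not state the cut-offs. The sketch below assumes a simple sign-based rule (TextBlob polarity ranges from -1 to 1); the function name, the label strings, and the `neutral_band` parameter are hypothetical and may differ from what the dataset author used.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    Assumed rule: scores above `neutral_band` are positive, scores below
    `-neutral_band` are negative, and everything in between is neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


labels = [polarity_to_division(p) for p in (0.6, 0.0, -0.25)]
```

Widening `neutral_band` (for example to 0.1) treats weakly polarized reviews as neutral.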
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9,930 Amazon reviews and star ratings for 10 of the latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review in Unix time), reviewTime (raw time of the review) and division (manually added categorical label generated using the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Energy time series data structure.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of supply chains used by the company DataCo Global was used for the analysis. The dataset allows the use of machine learning algorithms and R software. Important registered activity areas: provisioning, production, sales, and commercial distribution. It also allows structured data to be correlated with unstructured data for knowledge generation.
Types of products: clothing, sports, and electronic supplies.
Additionally, another attached file, DescriptionDataCoSupplyChain.csv, describes each of the variables in DataCoSupplyChainDatasetc.csv.
https://www.statsndata.org/how-to-order
Sales and market dynamics play a pivotal role in determining how businesses reach their customers and ultimately drive revenue. In today's highly competitive landscape, understanding the intricate relationship between sales strategies and market positioning is essential for any organization aiming to thrive. Sales r
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for itemsets that they are most likely to purchase. The dataset I was given contains a retailer's transaction data, covering all transactions that happened over a period of time. The retailer will use the results to grow its business and to offer customers itemset suggestions, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will approach this problem with association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most often used to find associations between different objects in a set, that is, frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
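The three measures can be checked with a few lines of code. This is a minimal sketch in Python (not the R workflow the author uses), with the standard definitions for a rule A => B: support = P(A and B), confidence = P(A and B) / P(A), and lift = confidence / P(B). The function and parameter names are chosen for illustration.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence, and lift for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A)
    lift = confidence / (n_consequent / n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 suggests the two items are bought together far more often than independence would predict.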
Number of Attributes: 7
First, we need to load the required libraries; each one is described briefly.
Next, we load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Then we clean the data frame by removing missing values.
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
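The conversion step amounts to grouping item rows by invoice. The sketch below illustrates the idea in Python rather than in the R code of the original walkthrough; the (invoice, item) row layout, the sample values, and the function name `to_transactions` are assumptions, since the dataset's column names are not listed here.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (invoice, item) pairs into one sorted item list per invoice."""
    baskets = defaultdict(set)  # a set drops repeated items within an invoice
    for invoice, item in rows:
        baskets[invoice].add(item)
    return {invoice: sorted(items) for invoice, items in baskets.items()}


# Hypothetical rows: one line per purchased item.
rows = [
    ("536365", "bread"),
    ("536365", "milk"),
    ("536366", "milk"),
    ("536365", "bread"),  # duplicate line within the same invoice
]
transactions = to_transactions(rows)
```

Each resulting item list is one transaction, which is the input shape that Apriori-style algorithms expect.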
https://www.promarketreports.com/privacy-policy
The global CD-R drive market, while facing a decline in recent years due to the rise of digital media and cloud storage, maintains a niche presence driven by specific applications. While precise market sizing data is unavailable, a reasonable estimation, based on industry trends and the presence of numerous established players like Sony, Yamaha, and Pioneer, suggests a 2025 market size of approximately $150 million. Considering the continued, albeit slow, demand for archival storage, specialized audio applications, and certain industrial uses, we project a compound annual growth rate (CAGR) of -3% from 2025 to 2033. This modest negative growth reflects the ongoing shift towards digital formats, yet acknowledges the persistence of CD-R technology in specific sectors. The market's future hinges on the continued demand in niche applications. Factors influencing this market include the increasing availability of affordable digital storage solutions, technological advancements in data storage, and the preference for streaming services. Despite the overall decline, several factors support the continued, albeit limited, market viability. These include the need for reliable, readily available, and cost-effective data backup solutions in certain industrial settings, the continuing use of CD-R technology for archiving purposes where data integrity and long-term accessibility are paramount, and specialized audio applications which value the inherent quality and simplicity of CD-R technology. Key players in the market are leveraging strategic collaborations and product innovations to tap into these niche markets and maintain a competitive edge. The market segmentation involves various drive types (internal vs. external), storage capacities, and target user segments. While the overall market is shrinking, understanding the specific needs of these niche segments is vital for continued market success.
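To see what the stated figures imply, a constant annual rate can be compounded forward from the base year. This is a rough sketch that takes the $150 million estimate for 2025 and the -3% CAGR at face value; the function name is illustrative, and the result is only as reliable as those two estimates.

```python
def project_value(start_value, annual_rate, years):
    """Compound `start_value` at a constant `annual_rate` for `years` years."""
    return start_value * (1 + annual_rate) ** years


# $150 million in 2025 shrinking by 3% per year through 2033 (8 years).
value_2033 = project_value(150.0, -0.03, 2033 - 2025)
print(f"{value_2033:.2f}")  # prints: 117.56
```

Under these assumptions the market would be worth roughly $118 million in 2033.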
https://www.futuremarketinsights.com/privacy-policy
Global Research and Development (R&D) Analytics Market demand is anticipated to be valued at US$ 2,025.0 Million in 2022 and is forecast to grow at a CAGR of 12.1% to reach US$ 6,366.6 Million by 2032. Growth is attributed to evolving needs in end-use industries. From 2016 to 2021, the market registered a CAGR of 9.1%.
Data Points | Key Statistics |
---|---|
Growth Rate (2016 to 2021) | 9.1 % CAGR |
Projected Growth Rate (2022 to 2032) | 12.1% CAGR |
Expected Market Value (2022) | US$ 2,025.0 Million |
Anticipated Forecast Value (2032) | US$ 6,366.6 Million |
Report Scope
Report Attribute | Details |
---|---|
Growth Rate | CAGR of 12.1 % from 2022 to 2032 |
Expected Market Value (2022) | US$ 2025.0 Million |
Anticipated Forecast Value (2032) | US$ 6366.6 Million |
Base Year for Estimation | 2021 |
Historical Data | 2016 to 2021 |
Forecast Period | 2022 to 2032 |
Quantitative Units | Revenue in USD Billion, Volume in Kilotons, and CAGR from 2022 to 2032 |
Report Coverage | Revenue Forecast, Volume Forecast, Company Ranking, Competitive Landscape, Growth Factors, Trends, and Pricing Analysis |
Segments Covered | |
Regions Covered | |
Key Countries Profiled | |
Key Companies Profiled | |
Customization | Available Upon Request |
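The growth rate in the table can be cross-checked against the two market values using the usual compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is a quick consistency check, with a function name chosen for illustration.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# US$ 2,025.0 Million (2022) to US$ 6,366.6 Million (2032).
rate = cagr(2025.0, 6366.6, 2032 - 2022)
print(f"{rate:.1%}")  # prints: 12.1%
```

The result agrees with the 12.1% figure reported for 2022 to 2032.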
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘HVAC Market Share by Efficiency and Capacity: Beginning 2017’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/b80e5928-b8a6-49bb-98c5-f90883ec2284 on 27 January 2022.
--- Dataset description provided by original source is as follows ---
HVAC Market Share by Efficiency and Capacity: Beginning 2017 dataset is based on heating, ventilation, and air conditioning (HVAC) sales data reported to D+R International by Heating, Air-conditioning & Refrigeration Distributors International (HARDI) members participating in the Unitary HVAC Market Report. Participation in the report is voluntary for distributors. The dataset covers New York State and the Northeast (includes Maine, New Hampshire, Vermont, Massachusetts, Connecticut, and Rhode Island). Blank cells represent data that are not currently available.
--- Original source retains full ownership of the source dataset ---
https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT Aiming to analyze the commercialization of pesticides in Brazil, its regions, and its states, an ecological time series study was conducted for 2000 to 2014, based on pesticide sales data from the Brazilian Institute of the Environment and Renewable Natural Resources and the National Union of Plant Protection Products Industry. Commercialization was calculated annually, for states and regions, as the quotient of the quantity of active ingredients, in kilograms, and the planted area of the main crops, in hectares. The Excel® and R programs were used for data analysis. For trend analysis, linear regression was used with a 5% significance level. There was a trend towards an increase in sales in all regions of the country in the period (p
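The two calculations described in the abstract, the kilograms-per-hectare quotient and a linear trend over years, can be sketched as follows. This is written in Python rather than the Excel and R used in the study, the numbers are invented purely for illustration and are not taken from the study, and the sketch fits an ordinary least-squares line without computing the significance test.

```python
def intensity(active_ingredient_kg, planted_area_ha):
    """Commercialization as kilograms of active ingredient per hectare."""
    return active_ingredient_kg / planted_area_ha


def linear_trend(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented (kg, ha) pairs for four consecutive years; NOT data from the study.
years = [2000, 2001, 2002, 2003]
sales = [(200, 100), (250, 100), (300, 100), (350, 100)]
kg_per_ha = [intensity(kg, ha) for kg, ha in sales]
slope, intercept = linear_trend(years, kg_per_ha)
```

A positive slope indicates an increasing trend; judging it against a 5% significance level would additionally require a test on the slope.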
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Values of R-squared.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
https://www.datainsightsmarket.com/privacy-policy
The global digital camera market, valued at $5.39 billion in 2025, is projected to experience robust growth, driven by several key factors. The increasing popularity of photography as a hobby and profession, coupled with advancements in camera technology such as improved image sensors, faster autofocus systems, and enhanced video capabilities, are fueling market expansion. The rise of social media platforms and the demand for high-quality visual content further contribute to this growth. While the compact digital camera segment continues to exist, the market is significantly shaped by the popularity of mirrorless cameras, which offer a compelling blend of portability and professional-grade features. This segment is expected to see particularly strong growth over the forecast period. The interchangeable lens market is thriving due to the flexibility and customization options it provides to photographers of all skill levels. Geographic distribution shows a strong presence in North America and Europe, driven by high disposable incomes and established photography cultures. However, Asia-Pacific is poised for significant growth due to its expanding middle class and increasing adoption of digital technologies. Despite the competition from smartphones with integrated cameras, the dedicated digital camera market maintains its appeal due to its superior image quality, advanced features, and interchangeable lens capabilities. This segment caters specifically to professional photographers, prosumers, and serious hobbyists who demand higher quality and versatility. The market's growth, however, might be somewhat restrained by the increasing cost of high-end digital cameras and lenses, as well as the ongoing technological advancements making older models obsolete faster. The consistent CAGR of 4.85% indicates a steady, predictable trajectory over the forecast period (2025-2033). 
The competitive landscape is dominated by established players like Canon, Nikon, Sony, and Fujifilm, each holding significant market share. These companies are constantly innovating to stay ahead of the curve, introducing new models with improved features and functionalities. The presence of smaller, specialized brands like Leica and Hasselblad caters to a niche market focused on premium quality and unique features. This ensures diverse product options to cater to the varied preferences of consumers across different segments and price points. New entrants and technological advancements will continue to shape the market, driving competition and further innovation. Successful strategies for market players will involve focusing on specific niches, optimizing product offerings to meet evolving consumer demands, and developing strong marketing campaigns targeting both professionals and enthusiasts. The long-term outlook for the digital camera market remains positive, driven by persistent demand for high-quality imagery and ongoing technological advancements.
Recent developments include:
- October 2022: Fujifilm announced a partnership with Adobe to produce a new mirrorless digital camera firmware, "FUJIFILM X-H2S" (X-H2S), in spring 2023, which would deliver the world's first native Camera to Cloud (C2C) connectivity for mirrorless digital cameras, driven by Frame.io. At the same time, the firmware for the "FT-XH" file transmitters will also be launched.
- September 2022: Canon U.S.A., Inc., a significant provider of digital imaging solutions, announced the addition of a suite of products to its cinema and broadcast offerings in response to user demand. Canon's latest 8K CINE-SERVO lens for a wide range of productions is the CINE-SERVO 15-120mm T2.95-3.95 EF/PL (CN8x15 IAS S); the EU-V3, a modular expansion unit for the EOS C500 Mark II and EOS C300 Mark III cameras; a Cinema EOS firmware update; and the DP-V2730i, a 27-inch 4K professional reference display that may seamlessly fit into workflows of broadcasters and filmmakers.
- September 2022: FUJIFILM North America Corporation announced the release of the new FUJINON XF56mmF1.2 R WR (XF56mmF1.2 R WR) weather-resistant, mid-telephoto prime lens, the latest in a line of interchangeable lenses designed for the FUJIFILM X Series of mirrorless digital cameras. The FUJINON XF56mmF1.2 R WR is the successor to the popular FUJINON XF56mmF1.2 R lens and now boasts considerable improvements from that earlier model in minimum focusing distance, image-resolving capability, and gorgeous rendering of out-of-focus background.
Key drivers for this market are: Anticipated Increase in Sales of Mirror-less Lens, Demand for Specialized Products from Niche Customer Base.
Potential restraints include: Lack of Awareness to Challenge the Market Growth.
Notable trends are: Increase in Sales of Mirror-less Lens is Driving the Market.
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic but realistic dataset contains 90+ customer reviews for 6 smartphone models (from Apple, Samsung, and Google), along with:
- Product specifications (Price, Screen Size, Battery, Camera, RAM, Storage, 5G, Water Resistance)
- Customer reviews (Star Ratings, Review Text, Verified Purchase Status)
- Sales data (Units Sold per Model)
Potential Use Cases:
✅ Feature importance analysis (Which specs drive ratings?)
✅ Sentiment analysis (NLP on reviews)
✅ Pricing strategy optimization
✅ Market research (Comparing Apple vs. Samsung vs. Google)
Objective: Understand how product features influence purchasing decisions and satisfaction.
- Which smartphone brand did you purchase? → brand column.
- Which model did you purchase? → model_name column.
- Where did you purchase the phone? → verified_purchase (assumed online = verified).
- How would you rate the following features? (1 = Poor, 5 = Excellent) → star_rating (average of these).
- Which feature is MOST important to you? → review_text keywords (e.g., "battery" mentions).
- How do you feel about the price of your phone? → price vs. star_rating correlation.
- Would you recommend this phone to others? → star_rating (5 = Definitely Yes).
Column Details (Metadata)
Column Name | Type | Description | Example |
---|---|---|---|
model_id | Integer | Unique ID for each phone model | 1 (iPhone 14) |
brand | String | Manufacturer (Apple, Samsung, Google) | "Apple" |
model_name | String | Name of the phone model | "iPhone 15" |
price | Integer | Price in USD | 999 |
screen_size | Float | Screen size in inches | 6.1 |
battery | Integer | Battery capacity in mAh | 4000 |
camera_main | String | Main camera resolution (MP) | "48MP" |
ram | Integer | RAM in GB | 8 |
storage | Integer | Storage in GB | 128 |
has_5g | Boolean | Whether the phone supports 5G | TRUE |
water_resistant | String | Water resistance rating (IP68 or None) | "IP68" |
units_sold | Integer | Estimated units sold (for market analysis) | 15000 |
review_id | Integer | Unique ID for each review | 1 |
user_name | String | Randomly generated reviewer name | "John" |
star_rating | Integer | Rating from 1 (worst) to 5 (best) | 5 |
verified_purchase | Boolean | Whether the reviewer bought the product | TRUE |
review_date | Date | Date of the review (YYYY-MM-DD) | "2023-05-10" |
review_text | String | Simulated review text based on features & rating | "The 48MP camera is amazing!" |
Suggested Analysis Ideas to inspire data analysis:
A. Feature Impact on Ratings
- Regression: star_rating ~ battery + camera_main + price
- Key drivers: Does battery life affect ratings more than camera quality?
B. Sentiment Analysis (NLP)
Use tidytext (R) or NLTK (Python) to extract most-loved/hated features.
Example:
```r
library(dplyr)     # provides %>%, count(), and filter()
library(tidytext)  # provides unnest_tokens()
# final_data is the review data frame; it must contain a review_text column
reviews_tidy <- final_data %>% unnest_tokens(word, review_text)
reviews_tidy %>% count(word, sort = TRUE) %>% filter(n > 5)
```
C. Brand Comparison
- Apple vs. Samsung vs. Google: Which brand has higher average ratings?
- Price sensitivity: Do cheaper phones (e.g., Pixel) get better value ratings?
D. Sales vs. Features
- Correlation: units_sold ~ price + brand
- Premium segment analysis: Do iPhones sell more despite higher prices?
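The brand comparison in idea C comes down to a grouped average. The sketch below uses plain Python with the brand and star_rating column names from the metadata table; the three sample rows are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def mean_rating_by_brand(rows):
    """Average star_rating per brand for an iterable of dict-like rows."""
    totals = defaultdict(lambda: [0, 0])  # brand -> [sum of ratings, count]
    for row in rows:
        entry = totals[row["brand"]]
        entry[0] += row["star_rating"]
        entry[1] += 1
    return {brand: total / count for brand, (total, count) in totals.items()}


# Made-up rows; in practice these could come from csv.DictReader,
# with star_rating converted to int first.
rows = [
    {"brand": "Apple", "star_rating": 5},
    {"brand": "Apple", "star_rating": 4},
    {"brand": "Google", "star_rating": 3},
]
averages = mean_rating_by_brand(rows)
```

With only about 90 reviews across 6 models, differences between brand averages should be read cautiously.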