72 datasets found

Customer Shopping Trends Dataset
kaggle.com
Updated Oct 5, 2023
Cite
Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
Dataset updated
Oct 5, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sourav Banerjee
Description
Context

The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a synthetic dataset created for beginners to learn more about data analysis and machine learning.

Content

This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

Dataset Glossary (Column-wise)

Customer ID - Unique identifier for each customer

Age - Age of the customer

Gender - Gender of the customer (Male/Female)

Item Purchased - The item purchased by the customer

Category - Category of the item purchased

Purchase Amount (USD) - The amount of the purchase in USD

Location - Location where the purchase was made

Size - Size of the purchased item

Color - Color of the purchased item

Season - Season during which the purchase was made

Review Rating - Rating given by the customer for the purchased item

Subscription Status - Indicates if the customer has a subscription (Yes/No)

Shipping Type - Type of shipping chosen by the customer

Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)

Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)

Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction

Payment Method - Customer's most preferred payment method

Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
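As a minimal sketch of how the glossary maps onto typed values, the Python below reads the data with the standard-library `csv` module, converts the Yes/No columns to booleans and the numeric columns to numbers, and averages `Purchase Amount (USD)` per `Category`. It assumes the CSV header uses exactly the column names listed above and that numeric fields hold plain numbers; the helper names (`parse_row`, `load_rows`, `mean_amount_by_category`) are hypothetical and not part of the dataset.

```python
import csv
import io
from collections import defaultdict

# Assumption: the CSV header uses exactly the column names from the glossary.
YES_NO_COLUMNS = ("Subscription Status", "Discount Applied", "Promo Code Used")
INT_COLUMNS = ("Age", "Previous Purchases")
FLOAT_COLUMNS = ("Purchase Amount (USD)", "Review Rating")


def parse_row(raw: dict) -> dict:
    """Convert one CSV row (all strings) into typed values."""
    row = dict(raw)
    for col in INT_COLUMNS:
        row[col] = int(row[col])
    for col in FLOAT_COLUMNS:
        row[col] = float(row[col])
    for col in YES_NO_COLUMNS:
        # "Yes" -> True, anything else -> False
        row[col] = row[col].strip().lower() == "yes"
    return row


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of typed rows."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


def mean_amount_by_category(rows: list[dict]) -> dict[str, float]:
    """Average Purchase Amount (USD) for each Category."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for r in rows:
        totals[r["Category"]] += r["Purchase Amount (USD)"]
        counts[r["Category"]] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}
```

To use it on the real file, read the CSV into a string (or swap `io.StringIO` for an open file handle) and pass it to `load_rows`.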

Structure of the Dataset

[Image: structure of the dataset, https://i.imgur.com/6UEqejq.png]

Acknowledgement

This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

Cover Photo by: Freepik

Thumbnail by: Clothing icons created by Flat Icons - Flaticon
B2B Marketing Data | B2B Leads Data | 181M+ Records | Decision Makers,...
datarade.ai
Updated Jul 27, 2023
Cite
Exellius Systems (2023). B2B Marketing Data | B2B Leads Data | 181M+ Records | Decision Makers, Executives, CEO, MD | 20+ Attributes, Direct E-mail & Phone [Dataset]. https://datarade.ai/data-products/exellius-systems-decision-makers-executives-b2b-contact-data-exellius-systems
Available download formats: .xml, .csv, .xls, .txt
Dataset updated
Jul 27, 2023
Dataset authored and provided by
Exellius Systems
Area covered
Togo, Yemen, Somalia, Papua New Guinea, Antarctica, Albania, Kiribati, Ghana, State of, Bangladesh
Description
Transform Your Business with Our Comprehensive B2B Marketing Data

Our B2B Marketing Data is designed to be a cornerstone for data-driven professionals looking to optimize their business strategies. With an unwavering commitment to data integrity and quality, our dataset empowers you to make informed decisions, enhance your outreach efforts, and drive business growth.

Why Choose Our B2B Marketing Data?

Unmatched Data Integrity and Quality

Our data is meticulously sourced and validated through rigorous processes to ensure its accuracy, relevance, and reliability. This commitment to excellence guarantees that you are equipped with the most up-to-date information, empowering your business to thrive in a competitive landscape.

Versatile and Strategic Applications

This versatile dataset caters to a wide range of business needs, including:

Lead Generation: Identify and connect with potential clients who align with your business goals.
Market Segmentation: Tailor your marketing efforts by segmenting your audience based on industry, company size, or geographical location.
Personalized Marketing Campaigns: Craft personalized outreach strategies that resonate with your target audience, increasing engagement and conversion rates.
B2B Communication Strategies: Enhance your communication efforts with direct access to decision-makers, ensuring your message reaches the right people.

Comprehensive Data Attributes

Our B2B Marketing Data offers more than just basic contact information. With 20+ attributes, you gain in-depth insights into:

Decision-Maker Roles: Understand the responsibilities and influence of key figures within an organization, such as CEOs, executives, and other senior management.
Industry Affiliations: Analyze industry-specific data to tailor your approach to the unique dynamics of each sector.
Contact Information: Direct email addresses and phone numbers streamline communication, enabling you to engage with your audience effectively and efficiently.

Expansive Global Coverage

Our dataset spans a wide array of countries, providing a truly global perspective for your business initiatives. Whether you're looking to expand into new markets or strengthen your presence in existing ones, our data ensures comprehensive coverage across the following regions:

North America: United States, Canada, Mexico
Europe: United Kingdom, Germany, France, Italy, Spain, Netherlands, Sweden, and more
Asia: China, Japan, India, South Korea, Singapore, Malaysia, and more
South America: Brazil, Argentina, Chile, Colombia, and more
Africa: South Africa, Nigeria, Kenya, Egypt, and more
Australia and Oceania: Australia, New Zealand
Middle East: United Arab Emirates, Saudi Arabia, Israel, Qatar, and more

Industry-Wide Reach

Our B2B Marketing Data covers an extensive range of industries, ensuring that no matter your focus, you have access to the insights you need:

Finance and Banking
Technology
Healthcare
Manufacturing
Retail
Education
Energy
Real Estate
Telecommunications
Hospitality
Transportation and Logistics
Government and Public Sector
Non-Profit Organizations
And many more…

Comprehensive Employee and Revenue Size Information

Our dataset includes detailed records on company size and revenue, covering:

Employee Size: From small businesses with a handful of employees to large multinational corporations, we provide data across all scales.
Revenue Size: Analyze companies based on their revenue brackets, allowing for precise market segmentation and targeted marketing efforts.

Seamless Integration with Broader Data Offerings

Our B2B Marketing Data is not just a standalone product; it integrates seamlessly with our broader suite of premium datasets. This integration enables you to create a holistic and customized approach to your data-driven initiatives, ensuring that every aspect of your business strategy is informed by the most accurate and comprehensive data available.

Elevate Your Business with Data-Driven Precision

Optimize your marketing strategies with our high-quality, reliable, and scalable B2B Marketing Data. Identify new opportunities, understand market dynamics, and connect with key decision-makers to drive your business forward. With our dataset, you’ll stay ahead of the competition and foster meaningful business relationships that lead to sustained growth.

Unlock the full potential of your business with our B2B Marketing Data – the ultimate resource for growth, reliability, and scalability.
People Data Labs Company Dataset
datarade.ai
.json, .csv
Updated Oct 18, 2021
Cite
People Data Labs (2021). People Data Labs Company Dataset [Dataset]. https://datarade.ai/data-products/people-data-labs-company-dataset-people-data-labs
Available download formats: .json, .csv
Dataset updated
Oct 18, 2021
Dataset provided by
People Data Labs Inc.
Authors
People Data Labs
Area covered
Christmas Island, Romania, Tokelau, Dominican Republic, Martinique, Antarctica, South Sudan, Paraguay, Barbados, Slovenia
Description
People Data Labs is an aggregator of B2B person and company data. We source our globally compliant person dataset via our "Data Union".

The "Data Union" is our proprietary data-sharing co-op. Customers opt in to sharing their data and warrant that their data is fully compliant with global data privacy regulations. Some data sources are provided as a one-time dump; others are refreshed every time we do a new data build. Our data sources come from a variety of verticals including HR Tech, Real Estate Tech, Identity/Anti-Fraud, Martech, and others. People Data Labs works with customers on compliance-based topics. If a customer wishes to ensure anonymity, we work with them to anonymize the data.

Our company data has identifying information (name, website, social profiles), company attributes (industry, size, founded date), and tags + free text that is useful for segmentation.
Spain Job Offers Scraped Data
kaggle.com
Updated Feb 11, 2023
Cite
The Devastator (2023). Spain Job Offers Scraped Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/spain-job-offers-scraped-data
Dataset updated
Feb 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Area covered
Spain
Description
Spain Job Offers Scraped Data

Uncovering Qualifications and Requirements

By [source]

About this dataset

This dataset contains web-scraped information about job offers located in Spain, with details such as the offer name, company, location, and time of offer. This information is useful for job seekers who want to target potential employers in Spain, understand the qualifications and requirements needed to be considered for a role, and know approximately how long an offer is likely to stay on Linkedin. The dataset can also be useful for recruiters who need a detailed overview of job offers active in the Spanish market in order to filter out relevant vacancies. Lastly, professionals keeping an eye on the Spanish job market can use its insights to optimise their search further. In short, the dataset gives users interested in opportunities within Spain’s labour landscape access to detailed information about current job openings.

How to use the dataset

This guide will help those looking to use this dataset to discover the job market in Spain. The data provided in the dataset can be a great starting point for people who want to optimize their job search and uncover potential opportunities available.

Understand What Is Being Measured: The dataset contains details such as the job offer name, company, and location, along with other factors such as the time of offer and the type of schedule requested. It is important to understand what each column represents before using the dataset.

Number of Job Offers Available: This dataset provides insight into how many job offers are available throughout Spain by showing which areas have a high number of job listings and what types of jobs are needed in certain areas or businesses. This information could be used to expand your career or to search for specific jobs in different regions of Spain that match your skill set or desired salary range.

Required Qualifications & Skill Set: The type of schedule requested by businesses is also listed, allowing users to see whether certain employers require multiple shifts, weekend work, or hours outside the normal 9-to-5, depending on the positions needed within companies across the country. Additionally, understanding which skill sets are required not only helps you prioritize which technologies to learn or qualifications to gain, but can also give you an idea of which soft skills, such as teamwork and communication, businesses may require.

Location Opportunities: This scraped list gives users access to potential employers located throughout Spain, for example in Madrid, Barcelona, and Valencia. By understanding where business demand exists across different regions, one could consider taking up new roles with higher remuneration, or tailor recruitment and job searches more closely to specific regions of Spain.

By following this guide, you should have a solid understanding of how best to use this dataset, obtained from UOC, along with a better sense of how to identify job opportunities available through web scraping for those seeking work experience or positions across multiple regions of the country.
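A minimal sketch of the location and schedule analysis described in this guide, using only the Python standard library. The column names `location` and `schedule_type` are hypothetical placeholders, not confirmed headers of this dataset; check the actual CSV header before adapting it.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical column names; the real headers in the scraped CSV may differ.
LOCATION_COL = "location"
SCHEDULE_COL = "schedule_type"


def offers_per_location(
    rows: Iterable[Mapping[str, str]], top_n: int = 5
) -> list[tuple[str, int]]:
    """Return the top_n locations ranked by number of job offers."""
    counts = Counter(
        row[LOCATION_COL].strip()
        for row in rows
        if row.get(LOCATION_COL, "").strip()  # skip rows with no location
    )
    return counts.most_common(top_n)


def schedule_mix(rows: Iterable[Mapping[str, str]], location: str) -> dict[str, float]:
    """Share of each schedule type among the offers in one location."""
    counts = Counter(
        row.get(SCHEDULE_COL, "unknown")
        for row in rows
        if row.get(LOCATION_COL, "").strip() == location
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no offers for this location
    return {schedule: n / total for schedule, n in counts.items()}
```

The rows can come from `csv.DictReader` over the downloaded file; materialize them into a list first if you call both functions, since each one iterates over its input once.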

Research Ideas

Analyzing the job market in Spain - Companies offering jobs can be compared and contrasted using this dataset, for example by the locations where they are hiring, the types of schedules they offer, and the length of their job postings. This information can let users target potential employers instead of wasting time randomly applying for jobs online.

Optimizing a job search - Web scraping allows users to quickly gather job postings from multiple sources on a daily basis and view the qualifications and requirements for each post in order to optimize their job search process.

Leveraging data insights - Insights collected by analyzing this web-scraped dataset can be used for strategic advantage when creating LinkedIn or recruitment campaigns targeting Spanish markets, based on the available applicants’ preferences, such as hours per week or the area/position typically offered within particular companies in the dataset available from UOC.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

L...
Crunchbase Datasets
brightdata.com
.json, .csv, .xlsx
Updated Apr 10, 2022
Cite
Bright Data (2024). Crunchbase Datasets [Dataset]. https://brightdata.com/products/datasets/crunchbase
Available download formats: .json, .csv, .xlsx
Dataset updated
Apr 10, 2022
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Bright Data’s datasets are created by utilizing proprietary technology for retrieving public web data at scale, resulting in fresh, complete, and accurate datasets. Crunchbase datasets provide unique insights into the latest industry trends. They enable tracking company growth, identifying key businesses and professionals, and tracking employee movement between companies, as well as more efficient competitive intelligence. Easily define your Crunchbase dataset using our smart filter capabilities, which let you customize pre-existing datasets and ensure the data received fits your business needs. Bright Data’s Crunchbase company data includes over 2.8 million company profiles, with subsets available by industry, region, and any other parameters according to your requirements. There are over 70 data points per company, including overview, details, news, financials, investors, products, people, and more. Choose between full coverage or a subset. Get your Crunchbase dataset today!
Number of data compromises and impacted individuals in U.S. 2005-2024
statista.com
ai-chatbox.pro
Updated May 23, 2025
Cite
Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
Dataset updated
May 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some U.S. industry sectors, including healthcare, has gradually increased within the past few years, while other sectors saw a decrease.

Largest data exposures worldwide

In 2020, the adult streaming website CAM4 experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unusual, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then in 2017 revised the number of leaked records to three billion. In March 2018, the third-largest data breach occurred, involving India’s national identification database Aadhaar; over 1.1 billion records were exposed.
Real-Time Verified Search Fund Data | 200mm US Records | Personal Emails &...
datarade.ai
.csv, .xls
Updated Jul 23, 2024
Cite
Wiza (2024). Real-Time Verified Search Fund Data | 200mm US Records | Personal Emails & 100mm Mobile Phone Numbers | Live-Sourced Linkedin Data [Dataset]. https://datarade.ai/data-products/wiza-real-time-verified-search-fund-data-200mm-us-records-wiza
Available download formats: .csv, .xls
Dataset updated
Jul 23, 2024
Dataset provided by
Wiza, Inc
Authors
Wiza
Area covered
United States
Description
Stop relying on outdated and inaccurate databases and let Wiza be your source of truth for all deal sourcing and founder / CEO outreach.

Why we're different: The search fund market is dynamic and competitive - Wiza is not a static financial database that gets refreshed on occasion. Every datapoint is sourced and verified the moment that you receive the information. We verify deliverability of every single email ahead of providing the data, and we ensure that each person in your dataset has 100% job title and company accuracy by leveraging Linkedin Data sourced through their live Linkedin profile.

Key Features:

Comprehensive Data Coverage: Stop contacting the same people as everyone else. Wiza's search fund Data is sourced live, not stored in a limited database. When you tell us the type of company or person you would like to contact, we leverage Linkedin Data (the largest, most accurate database in the world) to find everyone who matches your ICP, and then we source the contact data and company data in real-time.

High-Quality, Accurate Data: Wiza ensures accuracy of all datapoints by taking a few key steps that other data providers fail to take: (1) Every email is SMTP verified ahead of delivery, ensuring they will not bounce (2) Every person's Linkedin profile is checked live to ensure we have 100% job title, company, location, etc. accuracy, ahead of providing any data (3) Phone numbers are constantly being verified with AI to ensure accuracy

Linkedin Data: Wiza is able to provide Linkedin Data points, sourced live from each person's Linkedin profile, including Subtitle, Bio, Job Title, Job Description, Skills, Languages, Certifications, Work History, Education, Open to Work, Premium Status, and more!

Personal Data: Wiza has access to industry leading volumes of B2C Contact Data, meaning you can find gmail/yahoo/hotmail email addresses, and mobile phone number data to contact your potential partners.
Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
Updated Jun 30, 2025
Cite
Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing

Only a small percentage of this newly created data is kept, though: just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
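The exact figures above are masked, so they are not reproduced here. As a hedged illustration of what a compound annual growth rate (CAGR) means, the sketch below computes the rate implied by a start and end value and projects a value forward at a given rate, using the standard definition CAGR = (end / start)^(1 / years) - 1. The function names are hypothetical, and any numbers passed to them are illustrative only.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate implied by growing from start_value to end_value."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("start_value, end_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years
```

For a forecast window from 2020 to 2025, `years` would be 5, and `project(start, cagr(start, end, 5), 5)` recovers `end` up to floating-point rounding.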
Company Dataset
universe.roboflow.com
zip
Updated Jun 27, 2023
Cite
Universiti Sains Malaysia (2023). Company Dataset [Dataset]. https://universe.roboflow.com/universiti-sains-malaysia-kmhru/company/dataset/2
Available download formats: zip
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Universiti Sains Malaysia
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Person Bounding Boxes
Description
Here are a few use cases for this project:

Security Surveillance: The "Company" model can be used in a security surveillance system where it identifies and logs individuals detected in the footage, helping to maintain safe environments in both public and private settings.

Attendance Management: For office environments or events, the model could be used to manage attendance by recognizing and recording the entrance and exit of individuals.

Retail Analytics: The model could provide valuable insights to retailers about foot traffic, tracking who comes in and out of the store and distinguishing between staff and customers.

Interactive Experiences: In museums or educational facilities, it could be used to create interactive experiences where the system identifies the number of people watching an exhibit and personalizes the content accordingly.

Smart Home Technology: The "Company" model can also be used in smart home technologies to recognize authorized personnel in a given space and automate processes such as personalized settings and security alerts.
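Several of these use cases (attendance, foot traffic, exhibit audiences) come down to counting person bounding boxes. The Python sketch below counts confident detections whose box center falls inside a region of interest. The `Box` layout (pixel corners plus a confidence score), the 0.5 default threshold, and the zone format are assumptions for illustration, not the export format of this Roboflow project.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    """A person detection: pixel corners plus model confidence (assumed layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float

    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def count_people_in_zone(
    boxes: Iterable[Box],
    zone: tuple[float, float, float, float],
    min_confidence: float = 0.5,
) -> int:
    """Count boxes at or above min_confidence whose center lies inside zone (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = zone
    count = 0
    for box in boxes:
        if box.confidence < min_confidence:
            continue  # drop low-confidence detections
        cx, cy = box.center()
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            count += 1
    return count
```

This counts people per frame only; following individuals in and out of a store, as the retail use case describes, would additionally need an object tracker to link detections across frames.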
Top Global Companies Innovators & Giants 🌍🏢
kaggle.com
Updated Jun 7, 2024
Cite
Sheikh Muhammad Abdullah (2024). Top Global Companies Innovators & Giants 🌍🏢 [Dataset]. https://www.kaggle.com/datasets/abdmental01/top-companies
Dataset updated
Jun 7, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sheikh Muhammad Abdullah
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Data Description

The dataset provided includes information about various companies, their stock symbols, financial metrics such as price-to-book ratio and share price, as well as details about their origin countries. Additionally, the dataset contains frequency distribution information for certain ranges of price-to-book ratios and share prices.

About Data

The dataset appears to be a compilation of financial data for different companies, likely for investment analysis or comparison purposes. It includes the following key components:

Rank: Rank of the company based on some criteria (not explicitly mentioned).

Company: Name of the company.

Stock Symbol: Symbol used to identify the company's stock in trading.

Price to Book Ratio: Financial metric indicating the relationship between a company's market value and its book value.

Share Price (USD): Price of a single share of the company's stock in US dollars.

Company Origin: Country where the company is based.

Label Count: Frequency distribution information for certain ranges of price-to-book ratios and share prices.

This dataset can be utilized for various financial analyses such as company valuation, comparison of financial metrics across companies, and investment decision-making.
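As a hedged sketch of the two derived quantities described above, the Python below computes a price-to-book ratio as share price divided by book value per share (the standard definition), and builds a frequency distribution over half-open value ranges. The dataset does not state which ranges its `Label Count` column uses, so the bin edges in any call are hypothetical and must be sorted in ascending order.

```python
from bisect import bisect_right
from typing import Iterable, Sequence


def price_to_book(share_price: float, book_value_per_share: float) -> float:
    """P/B ratio: market price per share divided by book value per share."""
    if book_value_per_share <= 0:
        raise ValueError("P/B is not meaningful for non-positive book value")
    return share_price / book_value_per_share


def frequency_by_range(
    values: Iterable[float], edges: Sequence[float]
) -> dict[tuple[float, float], int]:
    """Count values into half-open bins [edges[i], edges[i+1]); values outside all bins are ignored."""
    bins = list(zip(edges, edges[1:]))
    counts = {b: 0 for b in bins}
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the bin whose lower edge is <= v
        if 0 <= i < len(bins):
            counts[bins[i]] += 1
    return counts
```

Using `bisect_right` keeps the lookup logarithmic in the number of bins and makes each bin include its lower edge but not its upper edge.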
People Data | Authoritative Database
lseg.com
Updated Apr 2, 2025
Cite
LSEG (2025). People Data | Authoritative Database [Dataset]. https://www.lseg.com/en/data-analytics/financial-data/company-data/company-profile-information/people-data
Available download formats: csv, python, user interface, xml
Dataset updated
Apr 2, 2025
Dataset provided by
London Stock Exchange Group (http://www.londonstockexchangegroup.com/)
Authors
LSEG
License
https://www.lseg.com/en/policies/website-disclaimer
Description
People data provides complete people information and gives the ability to link individual information to organizations and roles.
Global impact of AI and big-data analytics on jobs 2023-2027
statista.com
Updated Jun 30, 2025
Cite
Statista (2023). Global impact of AI and big-data analytics on jobs 2023-2027 [Dataset]. https://www.statista.com/statistics/1383919/ai-bigdata-impact-jobs/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 2022 - Feb 2023
Area covered
Worldwide
Description
Between 2023 and 2027, the majority of companies surveyed worldwide expect big data to have a more positive than negative impact on the global job market and employment, with ** percent of the companies reporting the technology will create jobs and * percent expecting the technology to displace jobs. Meanwhile, artificial intelligence (AI) is expected to result in more significant labor market disruptions, with ** percent of organizations expecting the technology to displace jobs and ** percent expecting AI to create jobs.
A stakeholder-centered determination of High-Value Data sets: the use-case...
zenodo.org
data.niaid.nih.gov
txt
Updated Oct 27, 2021
Cite
Anastasija Nikiforova (2021). A stakeholder-centered determination of High-Value Data sets: the use-case of Latvia [Dataset]. http://doi.org/10.5281/zenodo.5142817
Available download formats: txt
Unique identifier
https://doi.org/10.5281/zenodo.5142817
Dataset updated
Oct 27, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anastasija Nikiforova
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Latvia
Description
The data in this dataset were collected from a 2021 survey of Latvian society aimed at identifying high-value data sets for Latvia, i.e. data sets that, in the view of Latvian society, could create value for the Latvian economy and society.
The survey was designed for both individuals and businesses.
The dataset is being made public both as supplementary data for the paper "Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia" (author: Anastasija Nikiforova, University of Latvia) and so that other researchers can use these data in their own work.

The survey was distributed among Latvian citizens and organisations. The structure of the survey is available in the supplementary file (see Survey_HighValueDataSets.odt).

***Description of the data in this data set: structure of the survey and pre-defined answers (if any)***
1. Have you ever used open (government) data? - {(1) yes, once; (2) yes, there has been a little experience; (3) yes, continuously, (4) no, it wasn’t needed for me; (5) no, have tried but has failed}
2. How would you assess the value of open government data that are currently available for your personal use or your business? - 5-point Likert scale, where 1 – any to 5 – very high
3. If you ever used the open (government) data, what was the purpose of using them? - {(1) Have not had to use; (2) to identify the situation for an object or an event (e.g. Covid-19 current state); (3) data-driven decision-making; (4) for the enrichment of my data, i.e. by supplementing them; (5) for better understanding of decisions of the government; (6) awareness of governments’ actions (increasing transparency); (7) forecasting (e.g. trends etc.); (8) for developing data-driven solutions that use only the open data; (9) for developing data-driven solutions, using open data as a supplement to existing data; (10) for training and education purposes; (11) for entertainment; (12) other (open-ended question)}
4. What category(ies) of “high value datasets” are, in your opinion, able to create added value for society or the economy? {(1) Geospatial data; (2) Earth observation and environment; (3) Meteorological; (4) Statistics; (5) Companies and company ownership; (6) Mobility}
5. To what extent do you think the current data catalogue of Latvia’s Open data portal corresponds to the needs of data users/ consumers? - 10-point Likert scale, where 1 – no data are useful, but 10 – fully correspond, i.e. all potentially valuable datasets are available
6. Which of the current data categories in Latvia’s open data portals, in your opinion, most corresponds to the “high value dataset”? - {(1) Foreign affairs; (2) business economy; (3) energy; (4) citizens and society; (5) education and sport; (6) culture; (7) regions and municipalities; (8) justice, internal affairs and security; (9) transports; (10) public administration; (11) health; (12) environment; (13) agriculture, food and forestry; (14) science and technologies}
7. Which of them form your TOP-3? - {(1) Foreign affairs; (2) business economy; (3) energy; (4) citizens and society; (5) education and sport; (6) culture; (7) regions and municipalities; (8) justice, internal affairs and security; (9) transports; (10) public administration; (11) health; (12) environment; (13) agriculture, food and forestry; (14) science and technologies}
8. How would you assess the value of the following data categories?
8.1. sensor data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
8.2. real-time data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
8.3. geospatial data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
9. What would be these datasets? I.e. what (sub)topic could these data be associated with? - open-ended question
10. Which of the data sets currently available could be valuable and useful for society and businesses? - open-ended question
11. Which of the data sets currently NOT available in Latvia’s open data portal could, in your opinion, be valuable and useful for society and businesses? - open-ended question
12. How did you define them? - {(1) Subjective opinion; (2) experience with data; (3) filtering out the most popular datasets, i.e. basing them on public opinion; (4) other (open-ended question)}
13. How high could the value of these data sets be for you or your business? - 5-point Likert scale, where 1 – not valuable, 5 – highly valuable
14. Do you represent any company/ organization (are you working anywhere)? (if “yes”, please, fill out the survey twice, i.e. as an individual user AND a company representative) - {yes; no; I am an individual data user; other (open-ended)}
15. What industry/ sector does your company/ organization belong to? (if you do not work at the moment, please, choose the last option) - {Information and communication services; Financial and insurance activities; Accommodation and catering services; Education; Real estate operations; Wholesale and retail trade; repair of motor vehicles and motorcycles; transport and storage; construction; water supply; waste water; waste management and recovery; electricity, gas supply, heating and air conditioning; manufacturing industry; mining and quarrying; agriculture, forestry and fisheries; professional, scientific and technical services; operation of administrative and service services; public administration and defence; compulsory social insurance; health and social care; art, entertainment and recreation; activities of households as employers; CSO/NGO; I am not a representative of any company}
16. To which category does your company/ organization belong in terms of its size? - {small; medium; large; self-employed; I am not a representative of any company}
17. What is the age group that you belong to? (if you are an individual user, not a company representative) - {11..15, 16..20, 21..25, 26..30, 31..35, 36..40, 41..45, 46+, “do not want to reveal”}
18. Please indicate the education level or scientific degree that best describes you. (if you are an individual user, not a company representative) - {master degree; bachelor’s degree; Dr. and/ or PhD; student (bachelor level); student (master level); doctoral candidate; pupil; do not want to reveal these data}
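Several of the questions above (2, 5, 8.1–8.3 and 13) are Likert items, so a first pass over the .xls/.csv export will usually tabulate each item's answer distribution. The sketch below is a minimal illustration, assuming the answers for one question have already been pulled into a plain list of strings; the actual column names and the encoding of missing answers in the published files are not described here, so blank or non-numeric cells are simply skipped.

```python
from collections import Counter
from statistics import median


def summarize_likert(responses, scale_max):
    """Tally answers on a 1..scale_max Likert scale and report the median.

    Assumption: blank or non-numeric cells mean "no answer" and are skipped;
    the survey export may encode missing answers differently.
    """
    values = []
    for raw in responses:
        text = str(raw).strip()
        if text.isdigit() and 1 <= int(text) <= scale_max:
            values.append(int(text))
    counts = Counter(values)
    # Report every scale point, including those nobody chose.
    distribution = {point: counts.get(point, 0) for point in range(1, scale_max + 1)}
    return {
        "n": len(values),
        "distribution": distribution,
        "median": median(values) if values else None,
    }


if __name__ == "__main__":
    # Hypothetical answers to a 5-point item such as question 2.
    print(summarize_likert(["5", "4", "", "3", "4"], scale_max=5))
```

Question 5 uses a 10-point scale, so it would be called with `scale_max=10`. The median is reported rather than the mean because Likert answers are ordinal.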

***Format of the file***
.xls, .csv (for the first spreadsheet only), .odt

***Licenses or restrictions***
CC-BY
Bridge Fund Small Business Dataset
information.stpaul.gov
hub.arcgis.com
Updated Oct 12, 2021
Cite
Saint Paul GIS (2021). Bridge Fund Small Business Dataset [Dataset]. https://information.stpaul.gov/datasets/stpaul::bridge-fund-small-business-dataset/about
Dataset updated
Oct 12, 2021
Dataset authored and provided by
Saint Paul GIS
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
In March 2020, Mayor Carter announced the Saint Paul Bridge Fund to provide emergency relief for families and small businesses most vulnerable to the economic impacts of the COVID-19 pandemic. The program was funded with $3.25 million from the Saint Paul Housing and Redevelopment Authority, along with contributions from philanthropic, corporate and individual donors. With these additional contributions, the fund provided $4.1 million to families and small businesses in Saint Paul.

Data previously shared in this space included only the 380 recipients funded through "Phase 1". This dataset includes all three phases that were ultimately rolled out through the Bridge Fund for Small Business program.

Nearly 2,000 unique applications were submitted for a small business grant of $7,500; 36% were from ACP50 areas (Areas of Concentrated Poverty where 50% or more of the residents are people of color). The applications were reviewed in order of a random number assigned at application close. Of these applications:

- 633 small businesses were awarded a $7,500 grant
- 36% of applications in the city were from ACP50 areas
- 86% of applicants in the city cited they were ordered closed under one of the Governor’s Executive Orders

This is a dataset of the small businesses that applied for the Bridge Fund and includes:

- Self-reported survey responses
- Award information
- Geographic information

Additional information about the Saint Paul Bridge Fund may be found at stpaul.gov/bridge-fund.
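The description says applications were reviewed in the order of a random number assigned at application close, until the grants were awarded. The sketch below shows one way such a random-order review could work. It is a hypothetical illustration, not the City of Saint Paul's actual procedure: the eligibility check, the seed and the fixed number of awards are assumptions supplied by the caller.

```python
import random


def review_in_random_order(application_ids, is_eligible, max_awards, seed=None):
    """Review applications in a random order and award until max_awards is reached.

    Hypothetical sketch: each application gets one random key at "close",
    applications are reviewed by ascending key, and eligible ones are awarded
    until the award count is exhausted.
    """
    rng = random.Random(seed)
    # Draw one random number per application, once, at application close.
    keyed = [(rng.random(), app_id) for app_id in application_ids]
    keyed.sort()
    awarded = []
    for _, app_id in keyed:
        if len(awarded) == max_awards:
            break
        if is_eligible(app_id):
            awarded.append(app_id)
    return awarded


if __name__ == "__main__":
    # Hypothetical: 2,000 applications, 633 awards, a made-up eligibility rule.
    winners = review_in_random_order(range(2000), lambda a: a % 3 != 0, 633, seed=1)
    print(len(winners), winners[:5])
```

Fixing the random keys at close (rather than reshuffling during review) means the review order is set before any application is assessed, which is the property the description emphasises.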
Lobbyist Agent Employers
catalog.data.gov
data.wa.gov
Updated Jun 29, 2025
Cite
data.wa.gov (2025). Lobbyist Agent Employers [Dataset]. https://catalog.data.gov/dataset/lobbyist-agent-employers
Dataset updated
Jun 29, 2025
Dataset provided by
data.wa.gov
Description
This dataset contains information about the agents employed by a lobbying firm and the employers they ultimately lobby for. A lobbyist/firm registers with the PDC, not the individual agents (employees) of that firm. The PDC provides this data as a way to see the individuals that lobby for a firm and all the employers of that firm. This does not indicate that a particular agent necessarily lobbied for a particular employer, merely that the agent's firm lobbied for that employer. This dataset is a best-effort by the PDC to provide a complete set of records as described herewith and may contain incomplete or incorrect information. The PDC provides access to the original reports for the purpose of record verification. Descriptions attached to this dataset do not constitute legal definitions; please consult RCW 42.17A and WAC Title 390 for legal definitions and additional information regarding political finance disclosure requirements. CONDITION OF RELEASE: This publication and/or referenced documents constitute a list of individuals prepared by the Washington State Public Disclosure Commission and may not be used for commercial purposes. This list is provided on the condition and with the understanding that the persons receiving it agree to this statutorily imposed limitation on its use. See RCW 42.56.070(9) and AGO 1975 No. 15.
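The caveat above matters when joining the data: an agent–employer row is derived through the shared firm, so it only says the agent's firm lobbied for that employer. The sketch below makes that derivation explicit. The input shapes, `(firm, agent)` and `(firm, employer)` tuples, are assumptions for illustration and are not the published column layout.

```python
from collections import defaultdict


def agent_employer_links(firm_agents, firm_employers):
    """Derive (agent, employer, firm) links through the shared lobbying firm.

    firm_agents: iterable of (firm, agent) tuples (hypothetical shape).
    firm_employers: iterable of (firm, employer) tuples (hypothetical shape).
    A link means only that the agent's firm lobbied for the employer,
    not that this agent personally lobbied for it.
    """
    employers_by_firm = defaultdict(set)
    for firm, employer in firm_employers:
        employers_by_firm[firm].add(employer)
    links = set()
    for firm, agent in firm_agents:
        # Every employer of the firm is paired with every agent of the firm.
        for employer in employers_by_firm.get(firm, ()):
            links.add((agent, employer, firm))
    return links


if __name__ == "__main__":
    agents = [("Firm A", "Ann"), ("Firm A", "Bob"), ("Firm B", "Cy")]
    employers = [("Firm A", "Acme"), ("Firm B", "Globex"), ("Firm B", "Initech")]
    for link in sorted(agent_employer_links(agents, employers)):
        print(link)
```

Keeping the firm in each tuple preserves the reason the link exists, so downstream analysis cannot mistake it for a direct agent–employer relationship.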
Data (i.e., evidence) about evidence based medicine
figshare.com
search.datacite.org
png
Updated May 30, 2023
Cite
Jorge H Ramirez (2023). Data (i.e., evidence) about evidence based medicine [Dataset]. http://doi.org/10.6084/m9.figshare.1093997.v24
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.1093997.v24
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jorge H Ramirez
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Update (December 7, 2014): Evidence-based medicine (EBM) is not working, for many reasons, for example:

1. It is incorrect in its foundations (paradox): hierarchical levels of evidence are supported by opinions (i.e., the lowest strength of evidence according to EBM) instead of real data collected from different types of study designs (i.e., evidence). http://dx.doi.org/10.6084/m9.figshare.1122534

2. The effect of criminal practices by pharmaceutical companies is only possible because of the complicity of others: healthcare systems, professional associations, and governmental and academic institutions. Pharmaceutical companies also corrupt at the personal level: politicians and political parties are on their payroll, and medical professionals are seduced by different types of gifts in exchange for prescriptions (i.e., bribery), which very likely results in patients not receiving the proper treatment for their disease. Often there is no disease at all: healthy persons who do not need pharmacological treatment of any kind are constantly misdiagnosed and treated with unnecessary drugs. Some medical professionals are turned into K.O.L.s (key opinion leaders), mere puppets appearing on stage to spread lies to their peers; a person supposedly trained to improve the well-being of others now deceives on behalf of pharmaceutical companies. Probably the saddest thing is that many honest doctors are being misled by these lies, which are created by the rules of pharmaceutical marketing instead of scientific, medical, and ethical principles. Interpreting EBM in this context was not anticipated by its creators. “The main reason we take so many drugs is that drug companies don’t sell drugs, they sell lies about drugs.” ―Peter C. Gøtzsche “doctors and their organisations should recognise that it is unethical to receive money that has been earned in part through crimes that have harmed those people whose interests doctors are expected to take care of. 
Many crimes would be impossible to carry out if doctors weren’t willing to participate in them.” —Peter C Gøtzsche, The BMJ, 2012, Big pharma often commits corporate crime, and this must be stopped. Pending (Colombia): Health Promoter Entities (In Spanish: EPS ―Empresas Promotoras de Salud).

Misinterpretations

New technologies or concepts are difficult to understand at the beginning, no matter how simple they are; we need to get used to new tools aimed at improving our professional practice. Probably the best explanation is in these videos (credits to Antonio Villafaina for sharing them with me): English https://www.youtube.com/watch?v=pQHX-SjgQvQ&w=420&h=315 Spanish https://www.youtube.com/watch?v=DApozQBrlhU&w=420&h=315

-----------------------

Hypothesis: hierarchical levels of evidence based medicine are wrong

Dear Editor, I have data to support the hypothesis described in the title of this letter. Before rejecting the null hypothesis I would like to ask the following open question: Could you support with data that hierarchical levels of evidence based medicine are correct? (1,2) Additional explanation to this question:

- Only respond to this question by attaching publicly available raw data.
- Be aware that more than a question this is a challenge: I have data (i.e., evidence) which is contrary to classic (i.e., McMaster) or current (i.e., Oxford) hierarchical levels of evidence based medicine. An important part of this data (but not all) is publicly available.

References

Ramirez, Jorge H (2014): The EBM challenge. figshare. http://dx.doi.org/10.6084/m9.figshare.1135873

The EBM Challenge Day 1: No Answers. Competing interests: I endorse the principles of open data in human biomedical research. Read this letter on The BMJ – August 13, 2014. http://www.bmj.com/content/348/bmj.g3725/rr/762595 Re: Greenhalgh T, et al. Evidence based medicine: a movement in crisis? BMJ 2014; 348: g3725.

Fileset contents

Raw data: Excel archive: raw data, interactive figures, and PubMed search terms. A Google Spreadsheet is also available (URL below the article description).

Figure 1. Unadjusted (Fig 1A) and adjusted (Fig 1B) PubMed publication trends (01/01/1992 to 30/06/2014).
Figure 2. Adjusted PubMed publication trends (07/01/2008 to 29/06/2014).
Figure 3. Google search trends: Jan 2004 to Jun 2014 / 1-week periods.
Figure 4. PubMed publication trends (1962-2013): systematic reviews and meta-analysis, clinical trials, and observational studies.
Figure 5. Ramirez, Jorge H (2014): Infographics: Unpublished US phase 3 clinical trials (2002-2014) completed before Jan 2011 = 50.8%. figshare. http://dx.doi.org/10.6084/m9.figshare.1121675 Raw data: "13377 studies found for: Completed | Interventional Studies | Phase 3 | received from 01/01/2002 to 01/01/2014 | Worldwide". This database complies with the terms and conditions of ClinicalTrials.gov: http://clinicaltrials.gov/ct2/about-site/terms-conditions

Supplementary Figures (S1-S6). PubMed publication delay in the indexation processes does not explain the descending trends in the scientific output of evidence-based medicine.

Acknowledgments

I would like to acknowledge the following persons for providing valuable concepts in data visualization and infographics:

Maria Fernanda Ramírez. Professor of graphic design. Universidad del Valle. Cali, Colombia.

Lorena Franco. Graphic design student. Universidad del Valle. Cali, Colombia.

Related articles by this author (Jorge H. Ramírez)

Ramirez JH. Lack of transparency in clinical trials: a call for action. Colomb Med (Cali) 2013;44(4):243-6. URL: http://www.ncbi.nlm.nih.gov/pubmed/24892242

Ramirez JH. Re: Evidence based medicine is broken (17 June 2014). http://www.bmj.com/node/759181

Ramirez JH. Re: Global rules for global health: why we need an independent, impartial WHO (19 June 2014). http://www.bmj.com/node/759151

Ramirez JH. PubMed publication trends (1992 to 2014): evidence based medicine and clinical practice guidelines (04 July 2014). http://www.bmj.com/content/348/bmj.g3725/rr/759895

Recommended articles

Greenhalgh Trisha, Howick Jeremy, Maskrey Neal. Evidence based medicine: a movement in crisis? BMJ 2014;348:g3725

Spence Des. Evidence based medicine is broken. BMJ 2014;348:g22

Schünemann Holger J, Oxman Andrew D, Brozek Jan, Glasziou Paul, Jaeschke Roman, Vist Gunn E et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 2008;336:1106

Lau Joseph, Ioannidis John P A, Terrin Norma, Schmid Christopher H, Olkin Ingram. The case of the misleading funnel plot. BMJ 2006;333:597

Moynihan R, Henry D, Moons KGM (2014) Using Evidence to Combat Overdiagnosis and Overtreatment: Evaluating Treatments, Tests, and Disease Definitions in the Time of Too Much. PLoS Med 11(7): e1001655. doi:10.1371/journal.pmed.1001655

Katz D. A holistic view of evidence based medicine. http://thehealthcareblog.com/blog/2014/05/02/a-holistic-view-of-evidence-based-medicine/
ML-You-Can-Use Wikidata Employers labeled
opendatabay.com
Updated Jun 17, 2025
Cite
Datasimple (2025). ML-You-Can-Use Wikidata Employers labeled [Dataset]. https://www.opendatabay.com/data/ai-ml/e31ecab8-d78b-4108-89df-7ea2d5d3e09e
Dataset updated
Jun 17, 2025
Dataset authored and provided by
Datasimple
Area covered
E-commerce & Online Transactions
Description
Context

For Wikidata entries related to people, one of the fields is employer. An employer is typically defined as a company or entity that provides people with work. Many Wikidata entries are accurately described, but a number of entries don't conform to most people's expectations of what a reasonably valid employer is. This dataset is a distinct, labeled subset of the Wikidata employers.

Wikidata is a great resource of free data. However, to interact with it meaningfully, most people will find it necessary to clean the data.

Clean data means different things to different users, so I have provided metadata, statistics and labels so that individual users can decide which parts of the dataset are acceptable and useful.

Guidelines used for the labeling

Employer: one that employs or makes use of something or somebody (especially): a person or company that provides a job paying wages or a salary to one or more people (m-w.com dictionary definition)

Most commonly, an employer should indicate a company employing people. Used in a sentence, a company could be substituted for the employer name.

I work for Tuttle and Click CPA.

I work for a CPA company.

Most commonly, an employer should indicate an entity, or a collective entity, but not a person.

I work for Tom Steyer. - no

I work for the Tom Steyer Charity. - yes

Similarly:

oncology - no, that's the field

Oncology Department - yes, that's an employer entity, etc.

Plurals are invalid because they indicate a multitude of entities instead of a single specific entity:

I work with Universities. - no

I worked at Berlin University - yes, one specific entity

Some flexibility in labeling is desirable; it's important to screen out the extreme bad cases. Hence we provide metadata where possible.

For more details on how some data was labeled manually, how BERT embeddings were used to build a classifier, and how Cleanlab was used to detect problematic labels, please visit the ML-You-Can-Use notebooks to learn more about our label provenance.

Content

The data comes from a dump of Wikidata (2/2/2020). It uses the English labels and descriptions of the Wikidata item codes (courtesy of the Kensho dataset).

item_id - the Wikidata item_id (QCode without the Q prefix)
employer_count - the Wikidata item count
employer - the en_label (Kensho)
description - the en_description (Kensho)

Additional Metadata Provided:

in_google_news - 0 no, 1 yes: does the occupation exist in the GoogleNews embedding
language_detected - 3 digit language code, using FastText language detection
source - Wikidata, Wikipedia, manual
label - 0 invalid employer, 1 valid employer
labeled_by - human, classifier_gnew, classifier_bert, cleanlab
label_error_reason - domain, plural

Acknowledgements

Wikimedia Foundation
Kensho Derived Wikimedia Dataset
GoogleNews Word Embeddings
FastText Language detection
ML-You-Can-Use data provenance notebooks

Inspiration

This dataset can be useful for solving some interesting problems:

Detecting new trends in employers and occupations, and employment nomenclature

Automatic error correction of employers

Converting plurals to singulars

Training an NER model

Training a Question/Answer model

Improving the FastText language detection model

Assessing FastText accuracy with limited data
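Because the description leaves it to each user to decide which rows are acceptable, a typical first step is to keep only rows with `label` = 1, optionally restricted to one labeling source. The sketch below assumes a CSV export whose headers match the field names listed under Content (`employer`, `label`, `labeled_by`); the published file format is listed as ".undefined", so the CSV layout is an assumption.

```python
import csv
import io


def valid_employers(csv_text, labeled_by=None):
    """Return employer names labeled 1 (valid employer).

    Assumes a CSV export with the columns described above; if labeled_by is
    given (e.g. "human"), only rows from that labeling source are kept.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    result = []
    for row in reader:
        if row["label"].strip() != "1":
            continue  # 0 = invalid employer (e.g. plural or a domain/field name)
        if labeled_by is not None and row["labeled_by"].strip() != labeled_by:
            continue
        result.append(row["employer"])
    return result


if __name__ == "__main__":
    # Hypothetical rows mirroring the labeling guidelines above.
    sample = (
        "item_id,employer,label,labeled_by,label_error_reason\n"
        "1,Berlin University,1,human,\n"
        "2,Universities,0,human,plural\n"
        "3,oncology,0,cleanlab,domain\n"
        "4,Oncology Department,1,classifier_bert,\n"
    )
    print(valid_employers(sample))
    print(valid_employers(sample, labeled_by="human"))
```

Filtering on `labeled_by` lets a user trust human labels more than classifier or Cleanlab labels, which is the kind of per-user choice the metadata is meant to support.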

License

CC BY-SA

Original Data Source: ML-You-Can-Use Wikidata Employers labeled
‘HR Analytics: Job Change of Data Scientists’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘HR Analytics: Job Change of Data Scientists’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-hr-analytics-job-change-of-data-scientists-db67/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘HR Analytics: Job Change of Data Scientists’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context and Content

A company that is active in Big Data and Data Science wants to hire data scientists from among people who successfully pass some courses conducted by the company. Many people sign up for their training. The company wants to know which of these candidates really want to work for the company after training and which are looking for new employment, because this helps reduce cost and time as well as improve the quality of training, course planning and the categorization of candidates. Information related to demographics, education and experience is available from the candidates' sign-up and enrollment.

This dataset is also designed to help HR research understand the factors that lead a person to leave their current job. Using model(s) built on the current credentials, demographics and experience data, you will predict the probability that a candidate is looking for a new job or will work for the company, as well as interpret the factors that affect employee decisions.

The whole dataset is divided into train and test sets. The target isn't included in the test set, but the file with the test target values is available for related tasks. A sample submission corresponding to the enrollee_id of the test set is also provided, with columns: enrollee_id, target

Note:
- The dataset is imbalanced.
- Most features are categorical (Nominal, Ordinal, Binary), some with high cardinality.
- Missing-value imputation can be part of your pipeline as well.

Features

enrollee_id: Unique ID for candidate

city: City code

city_development_index: Development index of the city (scaled)

gender: Gender of candidate

relevent_experience: Relevant experience of candidate

enrolled_university: Type of University course enrolled if any

education_level: Education level of candidate

major_discipline: Education major discipline of candidate

experience: Candidate total experience in years

company_size: Number of employees in current employer's company

company_type: Type of current employer

last_new_job: Difference in years between previous job and current job

training_hours: Training hours completed

target: 0 – Not looking for job change, 1 – Looking for a job change

Inspiration

Predict the probability that a candidate will work for the company

Interpret model(s) in such a way that they illustrate which features affect a candidate's decision

Please refer to the following task for more details: https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists/tasks?taskId=3015

--- Original source retains full ownership of the source dataset ---
Retail Sales Index (RSI) - Datasets - Government of the Republic of Trinidad...
data.gov.tt
Updated Sep 28, 2023
Cite
data.gov.tt (2023). Retail Sales Index (RSI) - Datasets - Government of the Republic of Trinidad and Tobago Open Data Platform [Dataset]. https://data.gov.tt/dataset/retail-sales-index
Dataset updated
Sep 28, 2023
Dataset provided by
Data.gov (https://data.gov/)
Area covered
Trinidad and Tobago
Description
The Retail Sales Index (RSI) is like a health check-up for the shopping world, done every three (3) months. Imagine visiting many different stores, from big to small, and noting how much they are selling. That is what the RSI does: it adds up the sales from these stores to get a feel for how well retail businesses are doing. This index helps us understand whether people are spending more or less at shops, which is a big deal for the economy. Think of it as a way to gauge our shopping habits. Plus, by comparing it with the Retail Price Index (RPI), which tracks price changes, we can see not only how much we are spending but also how much stuff we are actually buying once price changes are taken into account.
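One common way to separate "how much we spend" from "how much we buy" is to deflate the sales-value index by a price index. The description does not say how the comparison with the RPI is actually carried out, so the sketch below is a generic illustration. It assumes both indices share the same base period (base = 100) and cover comparable goods, and the index values in the example are hypothetical.

```python
def volume_index(value_index, price_index):
    """Deflate a sales-value index by a price index (same base period, base = 100).

    A result above 100 suggests more goods were bought than in the base period,
    even after allowing for price changes.
    """
    if price_index <= 0:
        raise ValueError("price index must be positive")
    return value_index / price_index * 100


if __name__ == "__main__":
    # Hypothetical quarter: sales value up 10%, prices up 5% versus the base.
    print(round(volume_index(110.0, 105.0), 2))
```

In this hypothetical quarter, spending rose 10%, but after allowing for a 5% price rise the quantity bought rose by only about 4.8%.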
Chapter 12: Data Preparation for Fraud Analytics: Project: Human Recourses...
data.mendeley.com
Updated Nov 1, 2023
Cite
ABDELRAHIM AQQAD (2023). Chapter 12: Data Preparation for Fraud Analytics: Project: Human Recourses Analysis - Human_Resources.csv [Dataset]. http://doi.org/10.17632/smypp8574h.1
Unique identifier
https://doi.org/10.17632/smypp8574h.1
Dataset updated
Nov 1, 2023
Authors
ABDELRAHIM AQQAD
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Project: Human Recourses Analysis - Human_Resources.csv

Description:

The dataset, named "Human_Resources.csv", is a comprehensive collection of employee records from a fictional company. Each row represents an individual employee, and the columns represent various features associated with that employee.

The dataset is rich, highlighting features like 'Age', 'MonthlyIncome', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department', 'EducationField', 'JobSatisfaction', and many more. The main focus is the 'Attrition' variable, which indicates whether an employee left the company or not.

Employee data were sourced from various departments, encompassing a diverse array of job roles and levels. Each employee's record provides an in-depth look into their background, job specifics, and satisfaction levels.

The dataset further includes specific indicators and parameters that were considered during employee performance assessments, offering a granular look into the complexities of each employee's experience.

For privacy reasons, certain personal details and specific identifiers have been anonymized or fictionalized. Instead of names or direct identifiers, each entry is associated with a unique 'EmployeeNumber', ensuring data privacy while retaining data integrity.

The employee records were subjected to rigorous examination, encompassing both manual assessments and automated checks. The end result of this examination, specifically whether an employee left the company or not, is clearly indicated for each record.
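Since 'Attrition' is the main focus, a natural first look is the attrition rate within each department. The sketch below reads CSV text with the standard library and groups by 'Department'. It assumes the 'Attrition' column holds "Yes" for employees who left; the description does not state the exact encoding, so `left_value` is a parameter.

```python
import csv
import io
from collections import defaultdict


def attrition_rate_by(csv_text, group_column="Department",
                      attrition_column="Attrition", left_value="Yes"):
    """Return the share of employees who left, per group.

    Assumption: attrition_column equals left_value for employees who left.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in reader:
        group = row[group_column]
        totals[group] += 1
        if row[attrition_column].strip() == left_value:
            leavers[group] += 1
    return {group: leavers[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Hypothetical rows; the real file has many more columns.
    sample = (
        "EmployeeNumber,Department,Attrition\n"
        "1,Sales,Yes\n"
        "2,Sales,No\n"
        "3,Research & Development,No\n"
    )
    print(attrition_rate_by(sample))
```

Changing `group_column` to another field, such as 'BusinessTravel' or 'JobSatisfaction', gives the same breakdown for that feature.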


Customer Shopping Trends Dataset

Journey into Consumer Insights and Retail Evolution with Synthetic Data


32 scholarly articles cite this dataset.


Dataset Glossary (Column-wise)

Customer ID - Unique identifier for each customer
Age - Age of the customer
Gender - Gender of the customer (Male/Female)
Item Purchased - The item purchased by the customer
Category - Category of the item purchased
Purchase Amount (USD) - The amount of the purchase in USD
Location - Location where the purchase was made
Size - Size of the purchased item
Color - Color of the purchased item
Season - Season during which the purchase was made
Review Rating - Rating given by the customer for the purchased item
Subscription Status - Indicates if the customer has a subscription (Yes/No)
Shipping Type - Type of shipping chosen by the customer
Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
Payment Method - Customer's most preferred payment method
Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
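Using the column names in the glossary, a first beginner exercise might be the average purchase amount per category. The sketch below uses only the standard library and assumes the data is a CSV whose headers match the glossary exactly (for example "Purchase Amount (USD)"). The file name is not given here, so the function takes CSV text directly.

```python
import csv
import io
from collections import defaultdict


def mean_purchase_by(csv_text, group_column="Category",
                     amount_column="Purchase Amount (USD)"):
    """Return the mean purchase amount for each value of group_column.

    Assumes headers match the glossary and amounts are plain numbers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        group = row[group_column]
        sums[group] += float(row[amount_column])
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


if __name__ == "__main__":
    # Hypothetical rows with a subset of the glossary columns.
    sample = (
        "Customer ID,Category,Purchase Amount (USD)\n"
        "1,Clothing,50\n"
        "2,Clothing,30\n"
        "3,Footwear,64\n"
    )
    print(mean_purchase_by(sample))
    print(mean_purchase_by(sample, group_column="Customer ID"))
```

Passing a different `group_column`, such as "Season" or "Payment Method", gives the same summary for that attribute.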

Structure of the Dataset

(Image: structure of the dataset, https://i.imgur.com/6UEqejq.png)

Acknowledgement

This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

Cover Photo by: Freepik

Thumbnail by: Clothing icons created by Flat Icons - Flaticon
