72 datasets found
  1. Customer Shopping Trends Dataset

    • kaggle.com
    Updated Oct 5, 2023
    + more versions
    Cite
    Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sourav Banerjee
    Description

    Context

    The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

    Content

    This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

    Dataset Glossary (Column-wise)

    • Customer ID - Unique identifier for each customer
    • Age - Age of the customer
    • Gender - Gender of the customer (Male/Female)
    • Item Purchased - The item purchased by the customer
    • Category - Category of the item purchased
    • Purchase Amount (USD) - The amount of the purchase in USD
    • Location - Location where the purchase was made
    • Size - Size of the purchased item
    • Color - Color of the purchased item
    • Season - Season during which the purchase was made
    • Review Rating - Rating given by the customer for the purchased item
    • Subscription Status - Indicates if the customer has a subscription (Yes/No)
    • Shipping Type - Type of shipping chosen by the customer
    • Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
    • Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
    • Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
    • Payment Method - Customer's most preferred payment method
    • Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
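
    As a minimal sketch of how the columns in the glossary above might be used, the snippet below computes the mean purchase amount per category with Python's standard library. The sample rows are made up for illustration and are not taken from the dataset, and the exact header spellings are assumed to match the glossary; check them against the downloaded file.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; they are NOT taken from the dataset.
# Header names are assumed to match the glossary entries.
SAMPLE_CSV = """Customer ID,Age,Gender,Item Purchased,Category,Purchase Amount (USD),Season,Review Rating,Subscription Status
1,34,Male,Shirt,Clothing,40,Winter,3.5,Yes
2,27,Female,Boots,Footwear,80,Fall,4.0,No
3,51,Male,Jacket,Clothing,60,Winter,4.5,No
"""


def mean_purchase_by_category(csv_text):
    """Return {category: mean purchase amount in USD} for the given CSV text."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        category = row["Category"]
        totals[category] += float(row["Purchase Amount (USD)"])
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


if __name__ == "__main__":
    print(mean_purchase_by_category(SAMPLE_CSV))
```

    To run this against the real file, the contents of the downloaded CSV could be passed in place of SAMPLE_CSV.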

    Structure of the Dataset

    https://i.imgur.com/6UEqejq.png

    Acknowledgement

    This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

    Cover Photo by: Freepik

    Thumbnail by: Clothing icons created by Flat Icons - Flaticon

  2. B2B Marketing Data | B2B Leads Data | 181M+ Records | Decision Makers,...

    • datarade.ai
    Updated Jul 27, 2023
    Cite
    Exellius Systems (2023). B2B Marketing Data | B2B Leads Data | 181M+ Records | Decision Makers, Executives, CEO, MD | 20+ Attributes, Direct E-mail & Phone [Dataset]. https://datarade.ai/data-products/exellius-systems-decision-makers-executives-b2b-contact-data-exellius-systems
    Explore at:
    Available download formats: .xml, .csv, .xls, .txt
    Dataset updated
    Jul 27, 2023
    Dataset authored and provided by
    Exellius Systems
    Area covered
    Togo, Yemen, Somalia, Papua New Guinea, Antarctica, Albania, Kiribati, Ghana, State of, Bangladesh
    Description

    Transform Your Business with Our Comprehensive B2B Marketing Data

    Our B2B Marketing Data is designed to be a cornerstone for data-driven professionals looking to optimize their business strategies. With an unwavering commitment to data integrity and quality, our dataset empowers you to make informed decisions, enhance your outreach efforts, and drive business growth.

    Why Choose Our B2B Marketing Data?

    Unmatched Data Integrity and Quality

    Our data is meticulously sourced and validated through rigorous processes to ensure its accuracy, relevance, and reliability. This commitment to excellence guarantees that you are equipped with the most up-to-date information, empowering your business to thrive in a competitive landscape.

    Versatile and Strategic Applications

    This versatile dataset caters to a wide range of business needs, including:

    • Lead Generation: Identify and connect with potential clients who align with your business goals.
    • Market Segmentation: Tailor your marketing efforts by segmenting your audience based on industry, company size, or geographical location.
    • Personalized Marketing Campaigns: Craft personalized outreach strategies that resonate with your target audience, increasing engagement and conversion rates.
    • B2B Communication Strategies: Enhance your communication efforts with direct access to decision-makers, ensuring your message reaches the right people.

    Comprehensive Data Attributes

    Our B2B Marketing Data offers more than just basic contact information. With more than 20 attributes, you gain in-depth insights into:

    • Decision-Maker Roles: Understand the responsibilities and influence of key figures within an organization, such as CEOs, executives, and other senior management.
    • Industry Affiliations: Analyze industry-specific data to tailor your approach to the unique dynamics of each sector.
    • Contact Information: Direct email addresses and phone numbers streamline communication, enabling you to engage with your audience effectively and efficiently.

    Expansive Global Coverage

    Our dataset spans a wide array of countries, providing a truly global perspective for your business initiatives. Whether you're looking to expand into new markets or strengthen your presence in existing ones, our data ensures comprehensive coverage across the following regions:

    • North America: United States, Canada, Mexico
    • Europe: United Kingdom, Germany, France, Italy, Spain, Netherlands, Sweden, and more
    • Asia: China, Japan, India, South Korea, Singapore, Malaysia, and more
    • South America: Brazil, Argentina, Chile, Colombia, and more
    • Africa: South Africa, Nigeria, Kenya, Egypt, and more
    • Australia and Oceania: Australia, New Zealand
    • Middle East: United Arab Emirates, Saudi Arabia, Israel, Qatar, and more

    Industry-Wide Reach

    Our B2B Marketing Data covers an extensive range of industries, ensuring that no matter your focus, you have access to the insights you need:

    • Finance and Banking
    • Technology
    • Healthcare
    • Manufacturing
    • Retail
    • Education
    • Energy
    • Real Estate
    • Telecommunications
    • Hospitality
    • Transportation and Logistics
    • Government and Public Sector
    • Non-Profit Organizations
    • And many more…

    Comprehensive Employee and Revenue Size Information

    Our dataset includes detailed records on company size and revenue:

    • Employee Size: From small businesses with a handful of employees to large multinational corporations, we provide data across all scales.
    • Revenue Size: Analyze companies based on their revenue brackets, allowing for precise market segmentation and targeted marketing efforts.

    Seamless Integration with Broader Data Offerings

    Our B2B Marketing Data is not just a standalone product; it integrates seamlessly with our broader suite of premium datasets. This integration enables you to create a holistic and customized approach to your data-driven initiatives, ensuring that every aspect of your business strategy is informed by the most accurate and comprehensive data available.

    Elevate Your Business with Data-Driven Precision

    Optimize your marketing strategies with our high-quality, reliable, and scalable B2B Marketing Data. Identify new opportunities, understand market dynamics, and connect with key decision-makers to drive your business forward. With our dataset, you’ll stay ahead of the competition and foster meaningful business relationships that lead to sustained growth.

    Unlock the full potential of your business with our B2B Marketing Data – the ultimate resource for growth, reliability, and scalability.

  3. People Data Labs Company Dataset

    • datarade.ai
    .json, .csv
    Updated Oct 18, 2021
    + more versions
    Cite
    People Data Labs (2021). People Data Labs Company Dataset [Dataset]. https://datarade.ai/data-products/people-data-labs-company-dataset-people-data-labs
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Oct 18, 2021
    Dataset provided by
    People Data Labs Inc.
    Authors
    People Data Labs
    Area covered
    Christmas Island, Romania, Tokelau, Dominican Republic, Martinique, Antarctica, South Sudan, Paraguay, Barbados, Slovenia
    Description

    People Data Labs is an aggregator of B2B person and company data. We source our globally compliant person dataset via our "Data Union".

    The "Data Union" is our proprietary data-sharing co-op. Customers opt in to sharing their data and warrant that their data is fully compliant with global data privacy regulations. Some data sources are provided as a one-time dump; others are refreshed every time we do a new data build. Our data sources come from a variety of verticals including HR Tech, Real Estate Tech, Identity/Anti-Fraud, Martech, and others. People Data Labs works with customers on compliance-based topics. If a customer wishes to ensure anonymity, we work with them to anonymize the data.

    Our company data has identifying information (name, website, social profiles), company attributes (industry, size, founded date), and tags + free text that is useful for segmentation.

  4. Spain Job Offers Scraped Data

    • kaggle.com
    Updated Feb 11, 2023
    Cite
    The Devastator (2023). Spain Job Offers Scraped Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/spain-job-offers-scraped-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Spain
    Description

    Spain Job Offers Scraped Data

    Uncovering Qualifications and Requirements

    By [source]

    About this dataset

    This dataset contains web-scraped information about job offers located in Spain, with details such as the offer name, company, location, and time of offer. This knowledge is beneficial for any job seeker looking to target potential employers in Spain, understand the qualifications and requirements needed to be considered for a role, and know approximately how long an offer is likely to stay on LinkedIn. The dataset can also be useful for recruiters who need a detailed overview of all job offers currently active in the Spanish market in order to filter out relevant vacancies. Lastly, professionals who have an eye on the Spanish job market can benefit from this dataset, as it provides insights that can help optimise their search. The dataset consequently makes it easy for users interested in uncovering opportunities within Spain’s labour landscape to access detailed information about current job opportunities.

    How to use the dataset

    This guide will help those looking to use this dataset to discover the job market in Spain. The data provided in the dataset can be a great starting point for people who want to optimize their job search and uncover potential opportunities available.

    • Understand What Is Being Measured: The dataset contains details such as the job offer name, company, and location, along with other factors such as time of offer and type of schedule requested. It is important to understand what each column represents before using the dataset.
    • Number of Job Offers Available: This dataset provides insight into how many job offers are available throughout Spain by showing which areas have a high number of jobs listed and what types of jobs are needed in certain areas or businesses. This information could be used for expanding your career or for searching for specific jobs in different regions of Spain that match your skillset or desired salary range.
    • Required Qualifications & Skill Set: The type of schedule requested by businesses is also mentioned, allowing users to understand whether certain employers require multiple shifts, weekend work, or hours outside the normal 9-5, depending on the positions needed. Additionally, understanding which skill sets are required not only helps you prioritize when learning new technologies or gaining qualifications but can also give you an idea of which soft skills, such as teamwork and communication, businesses may require.
    • Location Opportunities: This web-scraped list gives users insight into potential companies located throughout Spain, in cities such as Madrid, Barcelona, and Valencia. By understanding where business demand exists across different regions, one could look at taking up new roles with higher remuneration or tailor searches more closely to specific regions of Spain.

    By following this guide, you should now have a robust understanding of how best to utilize this dataset obtained from UOC, along with increased knowledge of how to identify job opportunities through web scraping for those seeking work experience or positions across multiple regions within the country.

    Research Ideas

    • Analyzing the job market in Spain: Companies offering jobs can be compared and contrasted using this dataset, for example by the locations where they are looking to hire, the types of schedules they offer, and the length of job postings. This information lets users target potential employers instead of wasting time randomly applying for jobs online.
    • Optimizing a job search: Web scraping allows users to quickly gather job postings from all sources on a daily basis and view the qualifications and requirements needed for each post in order to better optimize their job search process.
    • Leveraging data insights: Insights collected by analyzing this web-scraped dataset can be used for strategic advantage when creating LinkedIn or recruitment campaigns targeting Spanish markets based on applicants’ preferences, such as hours per week or the area or position typically offered within particular companies, as recorded in the dataset available from UOC.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    L...

  5. Crunchbase Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Apr 10, 2022
    Cite
    Bright Data (2024). Crunchbase Datasets [Dataset]. https://brightdata.com/products/datasets/crunchbase
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Apr 10, 2022
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Bright Data’s datasets are created by utilizing proprietary technology for retrieving public web data at scale, resulting in fresh, complete, and accurate datasets. CrunchBase datasets provide unique insights into the latest industry trends. They enable the tracking of company growth, identifying key businesses and professionals, tracking employee movement between companies, as well as enabling more efficient competitive intelligence. Easily define your Crunchbase dataset using our smart filter capabilities, enabling you to customize pre-existing datasets, ensuring the data received fits your business needs. Bright Data’s Crunchbase company data includes over 2.8 million company profiles, with subsets available by industry, region, and any other parameters according to your requirements. There are over 70 data points per company, including overview, details, news, financials, investors, products, people, and more. Choose between full coverage or a subset. Get your Crunchbase dataset Today!

  6. Number of data compromises and impacted individuals in U.S. 2005-2024

    • statista.com
    • ai-chatbox.pro
    Updated May 23, 2025
    Cite
    Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
    Explore at:
    Dataset updated
    May 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

    Industries most vulnerable to data breaches

    Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.

    Largest data exposures worldwide

    In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unique, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, revised the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.

  7. Real-Time Verified Search Fund Data | 200mm US Records | Personal Emails &...

    • datarade.ai
    .csv, .xls
    Updated Jul 23, 2024
    Cite
    Wiza (2024). Real-Time Verified Search Fund Data | 200mm US Records | Personal Emails & 100mm Mobile Phone Numbers | Live-Sourced Linkedin Data [Dataset]. https://datarade.ai/data-products/wiza-real-time-verified-search-fund-data-200mm-us-records-wiza
    Explore at:
    Available download formats: .csv, .xls
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Wiza, Inc
    Authors
    Wiza
    Area covered
    United States
    Description

    Stop relying on outdated and inaccurate databases and let Wiza be your source of truth for all deal sourcing and founder / CEO outreach.

    Why we're different: The search fund market is dynamic and competitive - Wiza is not a static financial database that gets refreshed on occasion. Every datapoint is sourced and verified the moment that you receive the information. We verify deliverability of every single email ahead of providing the data, and we ensure that each person in your dataset has 100% job title and company accuracy by leveraging Linkedin Data sourced through their live Linkedin profile.

    Key Features:

    Comprehensive Data Coverage: Stop contacting the same people as everyone else. Wiza's search fund Data is sourced live, not stored in a limited database. When you tell us the type of company or person you would like to contact, we leverage Linkedin Data (the largest, most accurate database in the world) to find everyone who matches your ICP, and then we source the contact data and company data in real-time.

    High-Quality, Accurate Data: Wiza ensures accuracy of all datapoints by taking a few key steps that other data providers fail to take: (1) Every email is SMTP verified ahead of delivery, ensuring they will not bounce (2) Every person's Linkedin profile is checked live to ensure we have 100% job title, company, location, etc. accuracy, ahead of providing any data (3) Phone numbers are constantly being verified with AI to ensure accuracy

    Linkedin Data: Wiza is able to provide Linkedin Data points, sourced live from each person's Linkedin profile, including Subtitle, Bio, Job Title, Job Description, Skills, Languages, Certifications, Work History, Education, Open to Work, Premium Status, and more!

    Personal Data: Wiza has access to industry leading volumes of B2C Contact Data, meaning you can find gmail/yahoo/hotmail email addresses, and mobile phone number data to contact your potential partners.

  8. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
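
    For readers unfamiliar with the compound annual growth rate (CAGR) mentioned above, here is a minimal sketch of the standard formula, (end / start) ** (1 / years) - 1. The figures in the listing are masked, so the numbers in the usage line are hypothetical placeholders and not Statista's values.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Hypothetical example: a quantity growing from 100 to 121 over 2 years.
    print(f"{cagr(100.0, 121.0, 2):.1%}")
```

    A quantity that grows from 100 to 121 over two years has grown by 10 percent per year, which is the rate the function returns.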

  9. Company Dataset

    • universe.roboflow.com
    zip
    Updated Jun 27, 2023
    Cite
    Universiti Sains Malaysia (2023). Company Dataset [Dataset]. https://universe.roboflow.com/universiti-sains-malaysia-kmhru/company/dataset/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 27, 2023
    Dataset authored and provided by
    Universiti Sains Malaysia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Security Surveillance: The "Company" model can be used in a security surveillance system where it identifies and logs individuals detected in the footage, helping to maintain safe environments in both public and private settings.

    2. Attendance Management: For office environments or events, the model could be used to manage attendance by recognizing and recording the entrance and exit of individuals.

    3. Retail Analytics: The model could provide valuable insights to retailers about foot traffic, tracking who comes in and out of the store and distinguishing between staff and customers.

    4. Interactive Experiences: In museums or educational facilities, it could be used to create interactive experiences where the system identifies the number of people watching an exhibit and personalizes the content accordingly.

    5. Smart Home Technology: The "Company" model can also be used in smart home technologies for recognizing authorized personnel in a given space to automate certain processes like personalized settings, security alerts, etc.

  10. Top Global Companies Innovators & Giants 🌍🏢

    • kaggle.com
    Updated Jun 7, 2024
    Cite
    Sheikh Muhammad Abdullah (2024). Top Global Companies Innovators & Giants 🌍🏢 [Dataset]. https://www.kaggle.com/datasets/abdmental01/top-companies
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sheikh Muhammad Abdullah
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Data Description

    The dataset provided includes information about various companies, their stock symbols, financial metrics such as price-to-book ratio and share price, as well as details about their origin countries. Additionally, the dataset contains frequency distribution information for certain ranges of price-to-book ratios and share prices.

    About Data

    The dataset appears to be a compilation of financial data for different companies, likely for investment analysis or comparison purposes. It includes the following key components:

    • Rank: Rank of the company based on some criteria (not explicitly mentioned).
    • Company: Name of the company.
    • Stock Symbol: Symbol used to identify the company's stock in trading.
    • Price to Book Ratio: Financial metric indicating the relationship between a company's market value and its book value.
    • Share Price (USD): Price of a single share of the company's stock in US dollars.
    • Company Origin: Country where the company is based.
    • Label Count: Frequency distribution information for certain ranges of price-to-book ratios and share prices.

    This dataset can be utilized for various financial analyses such as company valuation, comparison of financial metrics across companies, and investment decision-making.
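
    The listing does not say how the "Label Count" frequency distribution was built. As one possible reading, the sketch below bins price-to-book ratios into ranges and counts the values in each. The bin edges and the sample ratios are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bin_label(value, edges):
    """Return a label such as '1-3' for the half-open bin [low, high) containing value."""
    for low, high in zip(edges, edges[1:]):
        if low <= value < high:
            return f"{low}-{high}"
    return "out of range"


def label_counts(values, edges):
    """Count how many values fall into each bin defined by consecutive edges."""
    return Counter(bin_label(value, edges) for value in values)


if __name__ == "__main__":
    # Hypothetical price-to-book ratios and bin edges, not taken from the dataset.
    ratios = [0.8, 1.5, 2.9, 3.0, 7.2]
    edges = [0, 1, 3, 10]
    print(dict(label_counts(ratios, edges)))
```

    Half-open bins are used so that a value sitting exactly on an edge, such as 3.0, is counted once, in the higher bin.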

  11. People Data | Authoritative Database

    • lseg.com
    Updated Apr 2, 2025
    Cite
    LSEG (2025). People Data | Authoritative Database [Dataset]. https://www.lseg.com/en/data-analytics/financial-data/company-data/company-profile-information/people-data
    Explore at:
    Available download formats: csv, python, user interface, xml
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    London Stock Exchange Group (http://www.londonstockexchangegroup.com/)
    Authors
    LSEG
    License

    https://www.lseg.com/en/policies/website-disclaimer

    Description

    People data provides complete people information and gives the ability to link individual information to organizations and roles.

  12. Global impact of AI and big-data analytics on jobs 2023-2027

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2023). Global impact of AI and big-data analytics on jobs 2023-2027 [Dataset]. https://www.statista.com/statistics/1383919/ai-bigdata-impact-jobs/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2022 - Feb 2023
    Area covered
    Worldwide
    Description

    Between 2023 and 2027, the majority of companies surveyed worldwide expect big data to have a more positive than negative impact on the global job market and employment, with ** percent of the companies reporting the technology will create jobs and * percent expecting the technology to displace jobs. Meanwhile, artificial intelligence (AI) is expected to result in more significant labor market disruptions, with ** percent of organizations expecting the technology to displace jobs and ** percent expecting AI to create jobs.

  13. A stakeholder-centered determination of High-Value Data sets: the use-case...

    • zenodo.org
    • data.niaid.nih.gov
    txt
    Updated Oct 27, 2021
    Cite
    Anastasija Nikiforova (2021). A stakeholder-centered determination of High-Value Data sets: the use-case of Latvia [Dataset]. http://doi.org/10.5281/zenodo.5142817
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anastasija Nikiforova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Latvia
    Description

    The data in this dataset were collected through a survey of Latvian society (2021) aimed at identifying high-value data sets for Latvia, i.e. data sets that, in the view of Latvian society, could create value for the Latvian economy and society.
    The survey was created for both individuals and businesses.
    It is being made public both to act as supplementary data for the paper "Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia" (author: Anastasija Nikiforova, University of Latvia) and so that other researchers can use these data in their own work.

    The survey was distributed among Latvian citizens and organisations. The structure of the survey is available in the supplementary file (see Survey_HighValueDataSets.odt).

    ***Description of the data in this data set: structure of the survey and pre-defined answers (if any)***
    1. Have you ever used open (government) data? - {(1) yes, once; (2) yes, there has been a little experience; (3) yes, continuously, (4) no, it wasn’t needed for me; (5) no, have tried but has failed}
    2. How would you assess the value of open government data that are currently available for your personal use or your business? - 5-point Likert scale, where 1 – any to 5 – very high
    3. If you ever used the open (government) data, what was the purpose of using them? - {(1) Have not had to use; (2) to identify the situation for an object or an event (e.g. Covid-19 current state); (3) data-driven decision-making; (4) for the enrichment of my data, i.e. by supplementing them; (5) for better understanding of decisions of the government; (6) awareness of governments’ actions (increasing transparency); (7) forecasting (e.g. trends etc.); (8) for developing data-driven solutions that use only the open data; (9) for developing data-driven solutions, using open data as a supplement to existing data; (10) for training and education purposes; (11) for entertainment; (12) other (open-ended question)}
    4. What category(ies) of “high value datasets” is, in your opinion, able to create added value for society or the economy? {(1) Geospatial data; (2) Earth observation and environment; (3) Meteorological; (4) Statistics; (5) Companies and company ownership; (6) Mobility}
    5. To what extent do you think the current data catalogue of Latvia’s Open data portal corresponds to the needs of data users/ consumers? - 10-point Likert scale, where 1 – no data are useful, but 10 – fully correspond, i.e. all potentially valuable datasets are available
    6. Which of the current data categories in Latvia’s open data portals, in your opinion, most corresponds to the “high value dataset”? - {(1) Foreign affairs; (2) business economy; (3) energy; (4) citizens and society; (5) education and sport; (6) culture; (7) regions and municipalities; (8) justice, internal affairs and security; (9) transports; (10) public administration; (11) health; (12) environment; (13) agriculture, food and forestry; (14) science and technologies}
    7. Which of them form your TOP-3? - {(1) Foreign affairs; (2) business economy; (3) energy; (4) citizens and society; (5) education and sport; (6) culture; (7) regions and municipalities; (8) justice, internal affairs and security; (9) transports; (10) public administration; (11) health; (12) environment; (13) agriculture, food and forestry; (14) science and technologies}
    8. How would you assess the value of the following data categories?
    8.1. sensor data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
    8.2. real-time data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
    8.3. geospatial data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
    9. What would be these datasets? I.e. what (sub)topic could these data be associated with? - open-ended question
    10. Which of the data sets currently available could be valuable and useful for society and businesses? - open-ended question
    11. Which of the data sets currently NOT available in Latvia’s open data portal could, in your opinion, be valuable and useful for society and businesses? - open-ended question
    12. How did you define them? - {(1) Subjective opinion; (2) experience with data; (3) filtering out the most popular datasets, i.e. basing them on public opinion; (4) other (open-ended question)}
    13. How high could the value of these data sets be for you or your business? - 5-point Likert scale, where 1 – not valuable, 5 – highly valuable
    14. Do you represent any company/ organization (are you working anywhere)? (if “yes”, please, fill out the survey twice, i.e. as an individual user AND a company representative) - {yes; no; I am an individual data user; other (open-ended)}
    15. What industry/ sector does your company/ organization belong to? (if you do not work at the moment, please, choose the last option) - {Information and communication services; Financial and insurance activities; Accommodation and catering services; Education; Real estate operations; Wholesale and retail trade; repair of motor vehicles and motorcycles; transport and storage; construction; water supply; waste water; waste management and recovery; electricity, gas supply, heating and air conditioning; manufacturing industry; mining and quarrying; agriculture, forestry and fisheries; professional, scientific and technical services; operation of administrative and service services; public administration and defence; compulsory social insurance; health and social care; art, entertainment and recreation; activities of households as employers; CSO/NGO; I am not a representative of any company}
    16. To which category does your company/ organization belong to in terms of its size? - {small; medium; large; self-employed; I am not a representative of any company}
    17. What is the age group that you belong to? (if you are an individual user, not a company representative) - {11..15, 16..20, 21..25, 26..30, 31..35, 36..40, 41..45, 46+, “do not want to reveal”}
    18. Please, indicate your education or a scientific degree that corresponds most to you? (if you are an individual user, not a company representative) - {master degree; bachelor’s degree; Dr. and/ or PhD; student (bachelor level); student (master level); doctoral candidate; pupil; do not want to reveal these data}

    ***Format of the file***
    .xls, .csv (for the first spreadsheet only), .odt

    ***Licenses or restrictions***
    CC-BY

  14. Bridge Fund Small Business Dataset

    • information.stpaul.gov
    • hub.arcgis.com
    Updated Oct 12, 2021
    Cite
    Saint Paul GIS (2021). Bridge Fund Small Business Dataset [Dataset]. https://information.stpaul.gov/datasets/stpaul::bridge-fund-small-business-dataset/about
    Explore at:
    Dataset updated
    Oct 12, 2021
    Dataset authored and provided by
    Saint Paul GIS
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    In March 2020, Mayor Carter announced the Saint Paul Bridge Fund to provide emergency relief for families and small businesses most vulnerable to the economic impacts of the COVID-19 pandemic. The program was funded through $3.25 million from the Saint Paul Housing and Redevelopment Authority along with contributions from philanthropic, corporate and individual donors. Through these additional contributions, the fund provided $4.1 million to families and small businesses in Saint Paul.

    Data previously shared in this space included only the 380 recipients funded through "Phase 1". This dataset includes all three phases that were ultimately rolled out through the Bridge Fund for Small Business program.

    • Nearly 2,000 unique applications were submitted for a small business grant of $7,500
    • 36% were from ACP50 areas (Areas of Concentrated Poverty where 50% or more of the residents are people of color)

    The applications were reviewed in order of a random number assigned at application close. Of these applications:

    • 633 small businesses were awarded a $7,500 grant
    • 36% of applications in the city were from ACP50 areas
    • 86% of applicants in the city cited they were ordered closed under one of the Governor’s Executive Orders

    This is a dataset of the small businesses that applied for the Bridge Fund and includes:

    • Self-reported survey responses
    • Award information
    • Geographic information

    Additional information about the Saint Paul Bridge Fund may be found at stpaul.gov/bridge-fund.

  15. Lobbyist Agent Employers

    • catalog.data.gov
    • data.wa.gov
    • +2 more
    Updated Jun 29, 2025
    Cite
    data.wa.gov (2025). Lobbyist Agent Employers [Dataset]. https://catalog.data.gov/dataset/lobbyist-agent-employers
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    data.wa.gov
    Description

    This dataset contains information about the agents employed by a lobbying firm and the employers they ultimately lobby for. A lobbyist/firm registers with the PDC, not individual agents (employees) of that firm. The PDC provides this data as a way to see the individuals that lobby for a firm and all the employers of that firm. This does not indicate that a particular agent necessarily lobbied for a particular employer, merely that the agent's firm lobbied for that employer. This dataset is a best-effort by the PDC to provide a complete set of records as described herewith and may contain incomplete or incorrect information. The PDC provides access to the original reports for the purpose of record verification. Descriptions attached to this dataset do not constitute legal definitions; please consult RCW 42.17A and WAC Title 390 for legal definitions and additional information regarding political finance disclosure requirements. CONDITION OF RELEASE: This publication and or referenced documents constitutes a list of individuals prepared by the Washington State Public Disclosure Commission and may not be used for commercial purposes. This list is provided on the condition and with the understanding that the persons receiving it agree to this statutorily imposed limitation on its use. See RCW 42.56.070(9) and AGO 1975 No. 15.

  16. Data (i.e., evidence) about evidence based medicine

    • figshare.com
    • search.datacite.org
    png
    Updated May 30, 2023
    Cite
    Jorge H Ramirez (2023). Data (i.e., evidence) about evidence based medicine [Dataset]. http://doi.org/10.6084/m9.figshare.1093997.v24
    Explore at:
    Available download formats: png
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jorge H Ramirez
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Update — December 7, 2014. – Evidence-based medicine (EBM) is not working for many reasons, for example: 1. Incorrect in their foundations (paradox): hierarchical levels of evidence are supported by opinions (i.e., lowest strength of evidence according to EBM) instead of real data collected from different types of study designs (i.e., evidence). http://dx.doi.org/10.6084/m9.figshare.1122534 2. The effect of criminal practices by pharmaceutical companies is only possible because of the complicity of others: healthcare systems, professional associations, governmental and academic institutions. Pharmaceutical companies also corrupt at the personal level, politicians and political parties are on their payroll, medical professionals seduced by different types of gifts in exchange of prescriptions (i.e., bribery) which very likely results in patients not receiving the proper treatment for their disease, many times there is no such thing: healthy persons not needing pharmacological treatments of any kind are constantly misdiagnosed and treated with unnecessary drugs. Some medical professionals are converted in K.O.L. which is only a puppet appearing on stage to spread lies to their peers, a person supposedly trained to improve the well-being of others, now deceits on behalf of pharmaceutical companies. Probably the saddest thing is that many honest doctors are being misled by these lies created by the rules of pharmaceutical marketing instead of scientific, medical, and ethical principles. Interpretation of EBM in this context was not anticipated by their creators. “The main reason we take so many drugs is that drug companies don’t sell drugs, they sell lies about drugs.” ―Peter C. Gøtzsche “doctors and their organisations should recognise that it is unethical to receive money that has been earned in part through crimes that have harmed those people whose interests doctors are expected to take care of. 
    Many crimes would be impossible to carry out if doctors weren’t willing to participate in them.” —Peter C Gøtzsche, The BMJ, 2012, Big pharma often commits corporate crime, and this must be stopped. Pending (Colombia): Health Promoter Entities (In Spanish: EPS ―Empresas Promotoras de Salud).

    1. Misinterpretations New technologies or concepts are difficult to understand in the beginning, it doesn’t matter their simplicity, we need to get used to new tools aimed to improve our professional practice. Probably the best explanation is here in these videos (credits to Antonio Villafaina for sharing these videos with me). English https://www.youtube.com/watch?v=pQHX-SjgQvQ&w=420&h=315 Spanish https://www.youtube.com/watch?v=DApozQBrlhU&w=420&h=315 ----------------------- Hypothesis: hierarchical levels of evidence based medicine are wrong Dear Editor, I have data to support the hypothesis described in the title of this letter. Before rejecting the null hypothesis I would like to ask the following open question:Could you support with data that hierarchical levels of evidence based medicine are correct? (1,2) Additional explanation to this question: – Only respond to this question attaching publicly available raw data.– Be aware that more than a question this is a challenge: I have data (i.e., evidence) which is contrary to classic (i.e., McMaster) or current (i.e., Oxford) hierarchical levels of evidence based medicine. An important part of this data (but not all) is publicly available. References
    2. Ramirez, Jorge H (2014): The EBM challenge. figshare. http://dx.doi.org/10.6084/m9.figshare.1135873
    3. The EBM Challenge Day 1: No Answers. Competing interests: I endorse the principles of open data in human biomedical research Read this letter on The BMJ – August 13, 2014.http://www.bmj.com/content/348/bmj.g3725/rr/762595Re: Greenhalgh T, et al. Evidence based medicine: a movement in crisis? BMJ 2014; 348: g3725. _ Fileset contents Raw data: Excel archive: Raw data, interactive figures, and PubMed search terms. Google Spreadsheet is also available (URL below the article description). Figure 1. Unadjusted (Fig 1A) and adjusted (Fig 1B) PubMed publication trends (01/01/1992 to 30/06/2014). Figure 2. Adjusted PubMed publication trends (07/01/2008 to 29/06/2014) Figure 3. Google search trends: Jan 2004 to Jun 2014 / 1-week periods. Figure 4. PubMed publication trends (1962-2013) systematic reviews and meta-analysis, clinical trials, and observational studies.
      Figure 5. Ramirez, Jorge H (2014): Infographics: Unpublished US phase 3 clinical trials (2002-2014) completed before Jan 2011 = 50.8%. figshare.http://dx.doi.org/10.6084/m9.figshare.1121675 Raw data: "13377 studies found for: Completed | Interventional Studies | Phase 3 | received from 01/01/2002 to 01/01/2014 | Worldwide". This database complies with the terms and conditions of ClinicalTrials.gov: http://clinicaltrials.gov/ct2/about-site/terms-conditions Supplementary Figures (S1-S6). PubMed publication delay in the indexation processes does not explain the descending trends in the scientific output of evidence-based medicine. Acknowledgments I would like to acknowledge the following persons for providing valuable concepts in data visualization and infographics:
    4. Maria Fernanda Ramírez. Professor of graphic design. Universidad del Valle. Cali, Colombia.
    5. Lorena Franco. Graphic design student. Universidad del Valle. Cali, Colombia. Related articles by this author (Jorge H. Ramírez)
    6. Ramirez JH. Lack of transparency in clinical trials: a call for action. Colomb Med (Cali) 2013;44(4):243-6. URL: http://www.ncbi.nlm.nih.gov/pubmed/24892242
    7. Ramirez JH. Re: Evidence based medicine is broken (17 June 2014). http://www.bmj.com/node/759181
    8. Ramirez JH. Re: Global rules for global health: why we need an independent, impartial WHO (19 June 2014). http://www.bmj.com/node/759151
    9. Ramirez JH. PubMed publication trends (1992 to 2014): evidence based medicine and clinical practice guidelines (04 July 2014). http://www.bmj.com/content/348/bmj.g3725/rr/759895 Recommended articles
    10. Greenhalgh Trisha, Howick Jeremy, Maskrey Neal. Evidence based medicine: a movement in crisis? BMJ 2014;348:g3725
    11. Spence Des. Evidence based medicine is broken BMJ 2014; 348:g22
    12. Schünemann Holger J, Oxman Andrew D, Brozek Jan, Glasziou Paul, Jaeschke Roman, Vist Gunn E et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies BMJ 2008; 336:1106
    13. Lau Joseph, Ioannidis John P A, Terrin Norma, Schmid Christopher H, Olkin Ingram. The case of the misleading funnel plot BMJ 2006; 333:597
    14. Moynihan R, Henry D, Moons KGM (2014) Using Evidence to Combat Overdiagnosis and Overtreatment: Evaluating Treatments, Tests, and Disease Definitions in the Time of Too Much. PLoS Med 11(7): e1001655. doi:10.1371/journal.pmed.1001655
    15. Katz D. A-holistic view of evidence based medicine. http://thehealthcareblog.com/blog/2014/05/02/a-holistic-view-of-evidence-based-medicine/

  17. ML-You-Can-Use Wikidata Employers labeled

    • opendatabay.com
    Updated Jun 17, 2025
    Cite
    Datasimple (2025). ML-You-Can-Use Wikidata Employers labeled [Dataset]. https://www.opendatabay.com/data/ai-ml/e31ecab8-d78b-4108-89df-7ea2d5d3e09e
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    E-commerce & Online Transactions
    Description

    Context

    For Wikidata entries related to people, one of the fields is employer. An employer is typically defined as a company or entity which provides people with work. Many Wikidata entries are accurately described, but a number of entries don't conform to most people's expectations of what's a reasonably valid employer. This dataset is a distinct, labeled subset of the Wikidata employers.

    Wikidata is a great resource of free data. However to interact with it meaningfully most people will find it necessary to clean the data.

    Clean data means different things to different users, so I have provided metadata, statistics and labels so that individual users can decide which parts of the dataset are acceptable and useful.

    Guidelines used for the Labeling

    Employer: one that employs or makes use of something or somebody (especially): a person or company that provides a job paying wages or a salary to one or more people (m-w.com dictionary definition)

    Most commonly, an employer should indicate a company employing people. Used in a sentence, a company could be substituted for the employer name.

    • I work for Tuttle and Click CPA.
    • I work for a CPA company.

    Most commonly, an employer should indicate an entity, or a collective entity, but not a person.

    • I work for Tom Steyer. - no
    • I work for the Tom Steyer Charity. - yes

    Similarly:

    • oncology - no, that's the field
    • Oncology Department - yes, that's an employer entity. etc.

    Plurals are invalid because they indicate a multitude of entities, instead of a single specific entity:

    • I work with Universities. - no
    • I worked at Berlin University - yes, one specific entity

    Some flexibility in labeling is desirable; it's important to screen out the extreme bad cases. Hence we provide metadata where possible.

    For more details on how some data was labeled manually, how BERT embeddings were used to build a classifier, and how Cleanlab was used to detect problematic labels, please visit the ML-You-Can-Use notebooks to learn more about our label provenance.

    Content

    The data comes from a dump of Wikidata (2/2/2020). It uses the English labels and descriptions of the Wikidata item codes (courtesy of the Kensho dataset).

    • item_id - The Wikidata item_id (QCode without the Q prefix)
    • employer_count - the Wikidata item count
    • employer - the en_label (Kensho)
    • description - the en_description (Kensho)

    Additional Metadata Provided:

    • in_google_news - 0 no, 1 yes: does the occupation exist in the GoogleNews embedding
    • language_detected - 3 digit language code, using FastText language detection
    • source - Wikidata, Wikipedia, manual
    • label - 0 invalid employer, 1 valid employer
    • labeled_by - human, classifier_gnew, classifier_bert, cleanlab
    • label_error_reason - domain, plural

    Acknowledgements

    • Wikimedia Foundation
    • Kensho Derived Wikimedia Dataset
    • GoogleNews Word Embeddings
    • FastText Language detection
    • ML-You-Can-Use data provenance notebooks

    Inspiration

    This dataset can be useful for solving some interesting problems:

    Detecting new trends in employers and occupations, and employment nomenclature

    Automatic error correction of employers

    Converting plurals to singulars

    Training an NER model

    Training a Question/Answer model

    Improving the FastText language detection model

    Assessing FastText accuracy with limited data
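
    As a rough illustration of how the label and labeled_by columns described under Content might be used to pick an acceptable subset, here is a minimal sketch. It assumes the data is distributed as a CSV with those column names as headers, which the listing does not confirm; the sample rows and the helper names (load_rows, valid_employers) are hypothetical and invented for illustration.

```python
import csv
import io

# Hypothetical sample rows; the values are invented for illustration only.
# Column names follow the Content description; the CSV layout is an assumption.
SAMPLE_CSV = """item_id,employer_count,employer,description,label,labeled_by,label_error_reason
1,120,Berlin University,university in Berlin,1,human,
2,45,Universities,institutions of higher education,0,cleanlab,plural
3,30,oncology,branch of medicine,0,classifier_bert,domain
4,12,Oncology Department,hospital department,1,classifier_bert,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["employer_count"] = int(row["employer_count"])
        row["label"] = int(row["label"])
        rows.append(row)
    return rows


def valid_employers(rows, trusted_labelers=None):
    """Keep rows labeled 1 (valid employer), optionally restricted by labeler."""
    return [
        r for r in rows
        if r["label"] == 1
        and (trusted_labelers is None or r["labeled_by"] in trusted_labelers)
    ]


rows = load_rows(SAMPLE_CSV)
all_valid = valid_employers(rows)
human_only = valid_employers(rows, trusted_labelers={"human"})
print([r["employer"] for r in all_valid])
print([r["employer"] for r in human_only])
```

    Restricting by labeled_by is one way a user could decide which parts of the dataset are acceptable, for example trusting only human labels when precision matters more than coverage.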

    License

    CC BY-SA

    Original Data Source: ML-You-Can-Use Wikidata Employers labeled

  18. ‘HR Analytics: Job Change of Data Scientists’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘HR Analytics: Job Change of Data Scientists’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-hr-analytics-job-change-of-data-scientists-db67/latest
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘HR Analytics: Job Change of Data Scientists’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context and Content

    A company active in Big Data and Data Science wants to hire data scientists from among people who successfully pass courses conducted by the company. Many people sign up for the training. The company wants to know which of these candidates really want to work for the company after training and which are looking for new employment, because this helps reduce cost and time and improves the quality of training, course planning, and the categorization of candidates. Information on demographics, education, and experience is available from candidates' signup and enrollment.

    This dataset is also designed for HR research into the factors that lead a person to leave their current job. Using model(s) built on the current credentials, demographics, and experience data, you will predict the probability that a candidate will look for a new job or will work for the company, and interpret the factors affecting the employee's decision.

    The data is divided into train and test sets. The target isn't included in the test set, but a file of test target values is available for related tasks. A sample submission corresponding to the enrollee_id values of the test set is also provided, with columns: enrollee_id, target

    Note: - The dataset is imbalanced. - Most features are categorical (Nominal, Ordinal, Binary), some with high cardinality. - Missing imputation can be a part of your pipeline as well.

    Features

    • enrollee_id: Unique ID for candidate

    • city: City code

    • city_development_index: Development index of the city (scaled)

    • gender: Gender of candidate

    • relevent_experience: Relevant experience of candidate

    • enrolled_university: Type of University course enrolled if any

    • education_level: Education level of candidate

    • major_discipline: Education major discipline of candidate

    • experience: Candidate total experience in years

    • company_size: No of employees in current employer's company

    • company_type: Type of current employer

    • last_new_job: Difference in years between previous job and current job

    • training_hours: training hours completed

    • target: 0 – Not looking for job change, 1 – Looking for a job change
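
    Because the description notes that the dataset is imbalanced and that missing-value imputation may be needed, a first look usually checks both. The sketch below shows one way to do that in plain Python. The records are hypothetical (values invented, None standing in for a missing entry), and the helper names class_balance and missing_counts are illustrative rather than part of the dataset.

```python
from collections import Counter

# Hypothetical records using column names from the feature list above.
# The values are invented; None stands in for a missing entry.
records = [
    {"enrollee_id": 1, "gender": "Male", "company_size": "50-99", "target": 0},
    {"enrollee_id": 2, "gender": None, "company_size": None, "target": 1},
    {"enrollee_id": 3, "gender": "Female", "company_size": "<10", "target": 0},
    {"enrollee_id": 4, "gender": "Male", "company_size": None, "target": 0},
]


def class_balance(rows, target="target"):
    """Return the share of rows in each target class."""
    counts = Counter(r[target] for r in rows)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def missing_counts(rows):
    """Count missing (None) values per column; columns with none are omitted."""
    missing = Counter()
    for r in rows:
        for col, value in r.items():
            if value is None:
                missing[col] += 1
    return dict(missing)


balance = class_balance(records)
missing = missing_counts(records)
print(balance)
print(missing)
```

    With a skewed balance like this, plain accuracy can be misleading, so a class-aware metric or resampling step is worth considering before modeling.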

    Inspiration

    --- Original source retains full ownership of the source dataset ---

  19. Retail Sales Index (RSI) - Datasets - Government of the Republic of Trinidad...

    • data.gov.tt
    Updated Sep 28, 2023
    Cite
    data.gov.tt (2023). Retail Sales Index (RSI) - Datasets - Government of the Republic of Trinidad and Tobago Open Data Platform [Dataset]. https://data.gov.tt/dataset/retail-sales-index
    Explore at:
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    Data.gov (https://data.gov/)
    Area covered
    Trinidad and Tobago
    Description

    The Retail Sales Index (RSI) is like a health check-up for the shopping world, done every three (3) months. Imagine visiting many different stores, from big to small, and noting how much they are selling. That is what the RSI does. It adds up the sales from these stores to get a feel for how well retail businesses are doing. This index helps us understand if people spend more or less at shops, which is a big deal for the economy. Think of it as a way to gauge our shopping habits. Plus, by comparing it with the Retail Price Index (RPI), which tracks price changes, we can see not only how much we are spending but also how much stuff we are actually buying, considering price changes.
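
    The comparison with the RPI can be made concrete with a small deflation calculation. This is a sketch under assumptions the description does not state: that the RSI is a value (current-price) index, that both indices share the same base period with base value 100, and that dividing one by the other is an acceptable approximation of sales volume. The function name and all numbers are hypothetical.

```python
def volume_index(value_index, price_index, base=100.0):
    """Deflate a sales value index by a price index (both on the same base).

    A result above `base` suggests more goods were bought than in the base
    period; a result below it suggests fewer, even if spending went up.
    """
    if price_index <= 0:
        raise ValueError("price_index must be positive")
    return value_index / price_index * base


# Hypothetical quarter: spending is up 10%, and prices are also up 10%.
flat_volume = volume_index(110.0, 110.0)

# Hypothetical quarter: spending is up 5%, but prices are up 10%.
lower_volume = volume_index(105.0, 110.0)

print(flat_volume, lower_volume)
```

    In the second case spending rose while the deflated index fell below 100, which is the situation the description alludes to: higher spending does not necessarily mean more goods bought.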

  20. Chapter 12: Data Preparation for Fraud Analytics: Project: Human Recourses...

    • data.mendeley.com
    Updated Nov 1, 2023
    Cite
    ABDELRAHIM AQQAD (2023). Chapter 12: Data Preparation for Fraud Analytics: Project: Human Recourses Analysis - Human_Resources.csv [Dataset]. http://doi.org/10.17632/smypp8574h.1
    Explore at:
    Dataset updated
    Nov 1, 2023
    Authors
    ABDELRAHIM AQQAD
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Project: Human Recourses Analysis - Human_Resources.csv

    Description:

    The dataset, named "Human_Resources.csv", is a comprehensive collection of employee records from a fictional company. Each row represents an individual employee, and the columns represent various features associated with that employee.

    The dataset is rich, highlighting features like 'Age', 'MonthlyIncome', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department', 'EducationField', 'JobSatisfaction', and many more. The main focus is the 'Attrition' variable, which indicates whether an employee left the company or not.

    Employee data were sourced from various departments, encompassing a diverse array of job roles and levels. Each employee's record provides an in-depth look into their background, job specifics, and satisfaction levels.

    The dataset further includes specific indicators and parameters that were considered during employee performance assessments, offering a granular look into the complexities of each employee's experience.

    For privacy reasons, certain personal details and specific identifiers have been anonymized or fictionalized. Instead of names or direct identifiers, each entry is associated with a unique 'EmployeeNumber', ensuring data privacy while retaining data integrity.

    The employee records were subjected to rigorous examination, encompassing both manual assessments and automated checks. The end result of this examination, specifically whether an employee left the company or not, is clearly indicated for each record.

Cite
Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset

Customer Shopping Trends Dataset

Journey into Consumer Insights and Retail Evolution with Synthetic Data

Explore at:
32 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 5, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sourav Banerjee
Description

Context

The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

Content

This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

Dataset Glossary (Column-wise)

  • Customer ID - Unique identifier for each customer
  • Age - Age of the customer
  • Gender - Gender of the customer (Male/Female)
  • Item Purchased - The item purchased by the customer
  • Category - Category of the item purchased
  • Purchase Amount (USD) - The amount of the purchase in USD
  • Location - Location where the purchase was made
  • Size - Size of the purchased item
  • Color - Color of the purchased item
  • Season - Season during which the purchase was made
  • Review Rating - Rating given by the customer for the purchased item
  • Subscription Status - Indicates if the customer has a subscription (Yes/No)
  • Shipping Type - Type of shipping chosen by the customer
  • Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
  • Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
  • Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
  • Payment Method - Customer's most preferred payment method
  • Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
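
To show how the glossary columns might be used in a first analysis, here is a minimal sketch that averages Purchase Amount (USD) by any categorical column. The rows are hypothetical and invented for illustration (only the column names come from the glossary), and mean_purchase_by is an illustrative helper, not something shipped with the dataset.

```python
from collections import defaultdict

# Hypothetical rows; only the column names come from the glossary above.
rows = [
    {"Customer ID": 1, "Category": "Clothing",
     "Purchase Amount (USD)": 53, "Discount Applied": "Yes"},
    {"Customer ID": 2, "Category": "Footwear",
     "Purchase Amount (USD)": 90, "Discount Applied": "No"},
    {"Customer ID": 3, "Category": "Clothing",
     "Purchase Amount (USD)": 47, "Discount Applied": "No"},
]


def mean_purchase_by(rows, key):
    """Average purchase amount for each distinct value of column `key`."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [running sum, row count]
    for r in rows:
        acc = totals[r[key]]
        acc[0] += r["Purchase Amount (USD)"]
        acc[1] += 1
    return {k: s / n for k, (s, n) in totals.items()}


by_category = mean_purchase_by(rows, "Category")
by_discount = mean_purchase_by(rows, "Discount Applied")
print(by_category)
print(by_discount)
```

The same helper works for Season, Gender, or Shipping Type. Since the dataset is synthetic, any pattern found this way is practice material rather than evidence about real shoppers.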

Structure of the Dataset

https://i.imgur.com/6UEqejq.png

Acknowledgement

This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

Cover Photo by: Freepik

Thumbnail by: Clothing icons created by Flat Icons - Flaticon
