The Customer Shopping Preferences Dataset offers insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses that want to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes, including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. Note that this is a synthetic dataset created for beginners who want to learn more about data analysis and machine learning.
This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.
This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.
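As a starting point for the kind of practice described above, here is a minimal sketch of a group-by average using only the Python standard library. The column names (`Gender`, `Payment Method`, `Purchase Amount (USD)`) and the four sample rows are assumptions made for illustration; check the headers of the actual file before adapting it.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real file's column names and values may differ.
SAMPLE_CSV = """Age,Gender,Purchase Amount (USD),Payment Method
25,Female,40,Credit Card
34,Male,60,PayPal
41,Female,80,Cash
29,Male,20,Credit Card
"""


def average_purchase_by(rows, key, amount_col="Purchase Amount (USD)"):
    """Return the mean purchase amount for each distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[amount_col]))
    return {value: mean(amounts) for value, amounts in groups.items()}


# Parse the in-memory sample exactly as a real CSV file would be parsed.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_gender = average_purchase_by(rows, "Gender")
by_payment = average_purchase_by(rows, "Payment Method")
print(by_gender)
print(by_payment)
```

To run the same function on the real data, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and pass the column names found in that file.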
Cover Photo by: Freepik
Thumbnail by: Clothing icons created by Flat Icons - Flaticon
Transform Your Business with Our Comprehensive B2B Marketing Data
Our B2B Marketing Data is designed to be a cornerstone for data-driven professionals looking to optimize their business strategies. With an unwavering commitment to data integrity and quality, our dataset empowers you to make informed decisions, enhance your outreach efforts, and drive business growth.
Why Choose Our B2B Marketing Data?
Unmatched Data Integrity and Quality
Our data is meticulously sourced and validated through rigorous processes to ensure its accuracy, relevance, and reliability. This commitment to excellence guarantees that you are equipped with the most up-to-date information, empowering your business to thrive in a competitive landscape.
Versatile and Strategic Applications
This versatile dataset caters to a wide range of business needs, including:
- Lead Generation: Identify and connect with potential clients who align with your business goals.
- Market Segmentation: Tailor your marketing efforts by segmenting your audience based on industry, company size, or geographical location.
- Personalized Marketing Campaigns: Craft personalized outreach strategies that resonate with your target audience, increasing engagement and conversion rates.
- B2B Communication Strategies: Enhance your communication efforts with direct access to decision-makers, ensuring your message reaches the right people.
Comprehensive Data Attributes
Our B2B Marketing Data offers more than just basic contact information. With more than 20 attributes, you gain in-depth insights into:
- Decision-Maker Roles: Understand the responsibilities and influence of key figures within an organization, such as CEOs, executives, and other senior management.
- Industry Affiliations: Analyze industry-specific data to tailor your approach to the unique dynamics of each sector.
- Contact Information: Direct email addresses and phone numbers streamline communication, enabling you to engage with your audience effectively and efficiently.
Expansive Global Coverage
Our dataset spans a wide array of countries, providing a truly global perspective for your business initiatives. Whether you're looking to expand into new markets or strengthen your presence in existing ones, our data ensures comprehensive coverage across the following regions:
- North America: United States, Canada, Mexico
- Europe: United Kingdom, Germany, France, Italy, Spain, Netherlands, Sweden, and more
- Asia: China, Japan, India, South Korea, Singapore, Malaysia, and more
- South America: Brazil, Argentina, Chile, Colombia, and more
- Africa: South Africa, Nigeria, Kenya, Egypt, and more
- Australia and Oceania: Australia, New Zealand
- Middle East: United Arab Emirates, Saudi Arabia, Israel, Qatar, and more
Industry-Wide Reach
Our B2B Marketing Data covers an extensive range of industries, ensuring that no matter your focus, you have access to the insights you need:
- Finance and Banking
- Technology
- Healthcare
- Manufacturing
- Retail
- Education
- Energy
- Real Estate
- Telecommunications
- Hospitality
- Transportation and Logistics
- Government and Public Sector
- Non-Profit Organizations
- And many more…
Comprehensive Employee and Revenue Size Information
Our dataset includes detailed records on company size and revenue:
- Employee Size: From small businesses with a handful of employees to large multinational corporations, we provide data across all scales.
- Revenue Size: Analyze companies based on their revenue brackets, allowing for precise market segmentation and targeted marketing efforts.
Seamless Integration with Broader Data Offerings
Our B2B Marketing Data is not just a standalone product; it integrates seamlessly with our broader suite of premium datasets. This integration enables you to create a holistic and customized approach to your data-driven initiatives, ensuring that every aspect of your business strategy is informed by the most accurate and comprehensive data available.
Elevate Your Business with Data-Driven Precision
Optimize your marketing strategies with our high-quality, reliable, and scalable B2B Marketing Data. Identify new opportunities, understand market dynamics, and connect with key decision-makers to drive your business forward. With our dataset, you’ll stay ahead of the competition and foster meaningful business relationships that lead to sustained growth.
Unlock the full potential of your business with our B2B Marketing Data – the ultimate resource for growth, reliability, and scalability.
People Data Labs is an aggregator of B2B person and company data. We source our globally compliant person dataset via our "Data Union".
The "Data Union" is our proprietary data-sharing co-op. Customers opt in to sharing their data and warrant that their data is fully compliant with global data privacy regulations. Some data sources are provided as a one-time dump; others are refreshed every time we do a new data build. Our data sources come from a variety of verticals including HR Tech, Real Estate Tech, Identity/Anti-Fraud, Martech, and others. People Data Labs works with customers on compliance-based topics. If a customer wishes to ensure anonymity, we work with them to anonymize the data.
Our company data has identifying information (name, website, social profiles), company attributes (industry, size, founded date), and tags + free text that is useful for segmentation.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains web-scraped information about job offers located in Spain, with details such as the offer name, company, location, and time of offer. This knowledge is useful for any job seeker looking to target potential employers in Spain, understand the qualifications and requirements needed to be considered for a role, and know approximately how long an offer is likely to stay on LinkedIn. The dataset can also be useful for recruiters who need a detailed overview of the job offers currently active in the Spanish market in order to filter out relevant vacancies. Lastly, professionals with an eye on the Spanish job market can use the insights it provides to optimise their search. In short, the dataset gives users interested in Spain’s labour landscape detailed information about current job opportunities.
This guide will help those looking to use this dataset to explore the job market in Spain. The data can be a good starting point for people who want to optimize their job search and uncover available opportunities.
- Understand What Is Being Measured: The dataset contains details such as job offer name, company, and location, along with other factors such as time of offer and type of schedule requested. It is important to understand what each column represents before using the dataset.
- Number of Job Offers Available: The dataset gives insight into how many job offers are available throughout Spain, showing which areas have a high number of listings and what types of jobs are needed in certain areas or businesses. This information could be used to advance your career or to search for specific jobs in different regions of Spain that match your skill set or desired salary range.
- Required Qualifications & Skill Set: The type of schedule requested by businesses is also listed, so users can see whether certain employers require multiple shifts, weekend work, or hours outside the normal 9-to-5, depending on the position. Additionally, understanding which skill sets are required not only helps you prioritize when learning new technologies or gaining qualifications, but can also give you an idea of which soft skills businesses may require, such as teamwork and communication.
- Location Opportunities: This web-scraped list gives users a view of potential companies located throughout Spain, in cities such as Madrid, Barcelona, and Valencia. By understanding where business demand exists across different regions, one could look at taking up new roles with higher remuneration, or tailor recruitment searches to specific regions of Spain.
By following this guide, you should have a solid understanding of how best to use this dataset, obtained from UOC, along with better knowledge of how to identify job opportunities through web scraping when seeking work experience or positions across multiple regions of the country.
- Analyzing the job market in Spain: Companies offering jobs can be compared and contrasted using this dataset, for example by where they are looking to hire, the types of schedules they offer, and the length of job postings. This information lets users target potential employers instead of wasting time randomly applying for jobs online.
- Optimizing a job search: Web scraping allows users to quickly gather job postings from all sources on a daily basis and view the relevant qualifications and requirements for each post in order to better optimize their job search process.
- Leveraging data insights: Insights from analyzing this web-scraped dataset can be used for strategic advantage when creating LinkedIn or recruitment campaigns targeting Spanish markets, based on applicants’ preferences such as hours per week or the area or position typically offered by particular companies in the dataset available from UOC.
If you use this dataset in your research, please credit the original authors. Data Source
https://brightdata.com/license
Bright Data’s datasets are created by utilizing proprietary technology for retrieving public web data at scale, resulting in fresh, complete, and accurate datasets. Crunchbase datasets provide unique insights into the latest industry trends. They enable tracking company growth, identifying key businesses and professionals, tracking employee movement between companies, and running more efficient competitive intelligence. Easily define your Crunchbase dataset using our smart filter capabilities, which let you customize pre-existing datasets so that the data received fits your business needs. Bright Data’s Crunchbase company data includes over 2.8 million company profiles, with subsets available by industry, region, and any other parameters according to your requirements. There are over 70 data points per company, including overview, details, news, financials, investors, products, people, and more. Choose between full coverage or a subset. Get your Crunchbase dataset today!
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.
Industries most vulnerable to data breaches
Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.
Largest data exposures worldwide
In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unusual, though, because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
Stop relying on outdated and inaccurate databases and let Wiza be your source of truth for all deal sourcing and founder / CEO outreach.
Why we're different: The search fund market is dynamic and competitive. Wiza is not a static financial database that gets refreshed on occasion. Every datapoint is sourced and verified the moment that you receive the information. We verify deliverability of every single email ahead of providing the data, and we ensure that each person in your dataset has 100% job title and company accuracy by leveraging LinkedIn Data sourced through their live LinkedIn profile.
Key Features:
Comprehensive Data Coverage: Stop contacting the same people as everyone else. Wiza's search fund Data is sourced live, not stored in a limited database. When you tell us the type of company or person you would like to contact, we leverage LinkedIn Data (the largest, most accurate database in the world) to find everyone who matches your ICP, and then we source the contact data and company data in real time.
High-Quality, Accurate Data: Wiza ensures accuracy of all datapoints by taking a few key steps that other data providers fail to take:
(1) Every email is SMTP verified ahead of delivery, ensuring it will not bounce.
(2) Every person's LinkedIn profile is checked live to ensure we have 100% job title, company, location, etc. accuracy, ahead of providing any data.
(3) Phone numbers are constantly being verified with AI to ensure accuracy.
LinkedIn Data: Wiza is able to provide LinkedIn Data points, sourced live from each person's LinkedIn profile, including Subtitle, Bio, Job Title, Job Description, Skills, Languages, Certifications, Work History, Education, Open to Work, Premium Status, and more!
Personal Data: Wiza has access to industry leading volumes of B2C Contact Data, meaning you can find gmail/yahoo/hotmail email addresses, and mobile phone number data to contact your potential partners.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
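For readers unfamiliar with the compound annual growth rate (CAGR) mentioned above, the sketch below shows how the rate is derived from a start value, an end value, and a number of years. The figures 100 and 200 are purely illustrative placeholders; they are not the masked values from the text.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


# Illustrative numbers only: a quantity that doubles over five years.
rate = cagr(100.0, 200.0, 5)
print(f"CAGR: {rate:.2%}")

# Compounding the derived rate over the same period recovers the end value.
print(round(project(100.0, rate, 5), 6))
```

Because the rate compounds, doubling over five years corresponds to a bit under 15 percent per year rather than 20 percent.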
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Security Surveillance: The "Company" model can be used in a security surveillance system where it identifies and logs individuals detected in the footage, helping to maintain safe environments in both public and private settings.
Attendance Management: For office environments or events, the model could be used to manage attendance by recognizing and recording the entrance and exit of individuals.
Retail Analytics: The model could provide valuable insights to retailers about foot traffic, tracking who comes in and out of the store, distinguishing between staff and customer.
Interactive Experiences: In museums or educational facilities, it could be used to create interactive experiences where the system identifies the number of people watching an exhibit and personalizes the content accordingly.
Smart Home Technology: The "Company" model can also be used in smart home technologies to recognize authorized people in a given space and automate processes such as personalized settings and security alerts.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset provided includes information about various companies, their stock symbols, financial metrics such as price-to-book ratio and share price, as well as details about their origin countries. Additionally, the dataset contains frequency distribution information for certain ranges of price-to-book ratios and share prices.
The dataset appears to be a compilation of financial data for different companies, likely for investment analysis or comparison purposes. It includes the following key components:
This dataset can be utilized for various financial analyses such as company valuation, comparison of financial metrics across companies, and investment decision-making.
https://www.lseg.com/en/policies/website-disclaimer
People data provides complete people information and gives the ability to link individual information to organizations and roles.
Between 2023 and 2027, the majority of companies surveyed worldwide expect big data to have a more positive than negative impact on the global job market and employment, with ** percent of the companies reporting the technology will create jobs and * percent expecting the technology to displace jobs. Meanwhile, artificial intelligence (AI) is expected to result in more significant labor market disruptions, with ** percent of organizations expecting the technology to displace jobs and ** percent expecting AI to create jobs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data in this dataset were collected through a survey of Latvian society (2021) aimed at identifying high-value data sets for Latvia, i.e. data sets that, in the view of Latvian society, could create value for the Latvian economy and society.
The survey is created for both individuals and businesses.
It is being made public both to act as supplementary data for the paper "Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia" (author: Anastasija Nikiforova, University of Latvia) and so that other researchers can use these data in their own work.
The survey was distributed among Latvian citizens and organisations. The structure of the survey is available in the supplementary file (see Survey_HighValueDataSets.odt).
***Description of the data in this data set: structure of the survey and pre-defined answers (if any)***
1. Have you ever used open (government) data? - {(1) yes, once; (2) yes, there has been a little experience; (3) yes, continuously; (4) no, it wasn’t needed for me; (5) no, I have tried but failed}
2. How would you assess the value of open government data that are currently available for your personal use or your business? - 5-point Likert scale, where 1 – any to 5 – very high
3. If you ever used the open (government) data, what was the purpose of using them? - {(1) Have not had to use; (2) to identify the situation for an object or an event (e.g. Covid-19 current state); (3) data-driven decision-making; (4) for the enrichment of my data, i.e. by supplementing them; (5) for better understanding of decisions of the government; (6) awareness of governments’ actions (increasing transparency); (7) forecasting (e.g. trends etc.); (8) for developing data-driven solutions that use only the open data; (9) for developing data-driven solutions, using open data as a supplement to existing data; (10) for training and education purposes; (11) for entertainment; (12) other (open-ended question)}
4. What category(ies) of “high value datasets” are, in your opinion, able to create added value for society or the economy? - {(1) Geospatial data; (2) Earth observation and environment; (3) Meteorological; (4) Statistics; (5) Companies and company ownership; (6) Mobility}
5. To what extent do you think the current data catalogue of Latvia’s Open data portal corresponds to the needs of data users/ consumers? - 10-point Likert scale, where 1 – no data are useful, but 10 – fully correspond, i.e. all potentially valuable datasets are available
6. Which of the current data categories in Latvia’s open data portals, in your opinion, most corresponds to the “high value dataset”? - {(1) Foreign affairs; (2) business economy; (3) energy; (4) citizens and society; (5) education and sport; (6) culture; (7) regions and municipalities; (8) justice, internal affairs and security; (9) transports; (10) public administration; (11) health; (12) environment; (13) agriculture, food and forestry; (14) science and technologies}
7. Which of them form your TOP-3? - {(1) Foreign affairs; (2) business economy; (3) energy; (4) citizens and society; (5) education and sport; (6) culture; (7) regions and municipalities; (8) justice, internal affairs and security; (9) transports; (10) public administration; (11) health; (12) environment; (13) agriculture, food and forestry; (14) science and technologies}
8. How would you assess the value of the following data categories?
8.1. sensor data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
8.2. real-time data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
8.3. geospatial data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
9. What would be these datasets? I.e. what (sub)topic could these data be associated with? - open-ended question
10. Which of the data sets currently available could be valuable and useful for society and businesses? - open-ended question
11. Which of the data sets currently NOT available in Latvia’s open data portal could, in your opinion, be valuable and useful for society and businesses? - open-ended question
12. How did you define them? - {(1) Subjective opinion; (2) experience with data; (3) filtering out the most popular datasets, i.e. basing them on public opinion; (4) other (open-ended question)}
13. How high could the value of these data sets be for you or your business? - 5-point Likert scale, where 1 – not valuable, 5 – highly valuable
14. Do you represent any company/ organization (are you working anywhere)? (if “yes”, please, fill out the survey twice, i.e. as an individual user AND a company representative) - {yes; no; I am an individual data user; other (open-ended)}
15. What industry/ sector does your company/ organization belong to? (if you do not work at the moment, please, choose the last option) - {Information and communication services; Financial and insurance activities; Accommodation and catering services; Education; Real estate operations; Wholesale and retail trade; repair of motor vehicles and motorcycles; transport and storage; construction; water supply; waste water; waste management and recovery; electricity, gas supply, heating and air conditioning; manufacturing industry; mining and quarrying; agriculture, forestry and fisheries; professional, scientific and technical services; operation of administrative and service services; public administration and defence; compulsory social insurance; health and social care; art, entertainment and recreation; activities of households as employers; CSO/NGO; I am not a representative of any company}
16. To which category does your company/ organization belong in terms of its size? - {small; medium; large; self-employed; I am not a representative of any company}
17. What is the age group that you belong to? (if you are an individual user, not a company representative) - {11..15, 16..20, 21..25, 26..30, 31..35, 36..40, 41..45, 46+, “do not want to reveal”}
18. Please, indicate your education or a scientific degree that corresponds most to you? (if you are an individual user, not a company representative) - {master degree; bachelor’s degree; Dr. and/ or PhD; student (bachelor level); student (master level); doctoral candidate; pupil; do not want to reveal these data}
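The codebook above can be applied programmatically. The following is a minimal sketch that assumes responses are stored as the numeric option codes listed for question 1 and as integer scores for the Likert items; the actual spreadsheet may store labels or use a different layout, so treat `Q1_OPTIONS`, `decode`, and `likert_summary` as hypothetical helpers rather than part of the published data set.

```python
from collections import Counter

# Option codes for question 1, copied from the survey structure above.
# Assumption: the data file stores these numeric codes rather than the labels.
Q1_OPTIONS = {
    1: "yes, once",
    2: "yes, there has been a little experience",
    3: "yes, continuously",
    4: "no, it wasn't needed for me",
    5: "no, have tried but has failed",
}


def decode(codes, options):
    """Map numeric answer codes to labels; unknown codes become 'unknown'."""
    return [options.get(code, "unknown") for code in codes]


def likert_summary(scores, low=1, high=5):
    """Summarise Likert scores, ignoring values outside the scale."""
    valid = [s for s in scores if low <= s <= high]
    if not valid:
        raise ValueError("no valid scores")
    return {
        "n": len(valid),
        "mean": sum(valid) / len(valid),
        "counts": dict(Counter(valid)),
    }


# Hypothetical responses, for illustration only.
labels = decode([3, 1, 4, 7], Q1_OPTIONS)
summary = likert_summary([4, 5, 3, 4, 9])
print(labels)
print(summary)
```

For question 5, which uses a 10-point scale, the same helper can be called with `high=10`.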
***Format of the file***
.xls, .csv (for the first spreadsheet only), .odt
***Licenses or restrictions***
CC-BY
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
In March 2020, Mayor Carter announced the Saint Paul Bridge Fund to provide emergency relief for families and small businesses most vulnerable to the economic impacts of the COVID-19 pandemic. The program was funded through $3.25 million from the Saint Paul Housing and Redevelopment Authority along with contributions from philanthropic, corporate and individual donors. Through these additional contributions, the fund provided $4.1 million to families and small businesses in Saint Paul.
Data previously shared in this space included only the 380 recipients funded through "Phase 1". This dataset includes all three phases that were ultimately rolled out through the Bridge Fund for Small Business program.
- Nearly 2,000 unique applications were submitted for a small business grant of $7,500
- 36% were from ACP50 areas (Areas of Concentrated Poverty where 50% or more of the residents are people of color)
The applications were reviewed in order of a random number assigned at application close. Of these applications:
- 633 small businesses were awarded a $7,500 grant
- 36% of applications in the city were from ACP50 areas
- 86% of applicants in the city cited they were ordered closed under one of the Governor’s Executive Orders
This is a dataset of the small businesses that applied for the Bridge Fund and includes:
- Self-reported survey responses
- Award information
- Geographic information
Additional information about the Saint Paul Bridge Fund may be found at stpaul.gov/bridge-fund.
This dataset contains information about the agents employed by a lobbying firm and the employers they ultimately lobby for. A lobbyist/firm registers with the PDC, not individual agents (employees) of that firm. The PDC provides this data as a way to see the individuals that lobby for a firm and all the employers of that firm. This does not indicate that a particular agent necessarily lobbied for a particular employer, merely that the agent's firm lobbied for that employer. This dataset is a best-effort by the PDC to provide a complete set of records as described herewith and may contain incomplete or incorrect information. The PDC provides access to the original reports for the purpose of record verification. Descriptions attached to this dataset do not constitute legal definitions; please consult RCW 42.17A and WAC Title 390 for legal definitions and additional information regarding political finance disclosure requirements. CONDITION OF RELEASE: This publication and or referenced documents constitutes a list of individuals prepared by the Washington State Public Disclosure Commission and may not be used for commercial purposes. This list is provided on the condition and with the understanding that the persons receiving it agree to this statutorily imposed limitation on its use. See RCW 42.56.070(9) and AGO 1975 No. 15.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Update: December 7, 2014. Evidence-based medicine (EBM) is not working for many reasons, for example:
1. It is incorrect in its foundations (a paradox): hierarchical levels of evidence are supported by opinions (i.e., the lowest strength of evidence according to EBM) instead of real data collected from different types of study designs (i.e., evidence). http://dx.doi.org/10.6084/m9.figshare.1122534
2. The effect of criminal practices by pharmaceutical companies is only possible because of the complicity of others: healthcare systems, professional associations, and governmental and academic institutions. Pharmaceutical companies also corrupt at the personal level: politicians and political parties are on their payroll, and medical professionals are seduced by different types of gifts in exchange for prescriptions (i.e., bribery), which very likely results in patients not receiving the proper treatment for their disease. Many times there is no disease at all: healthy persons who need no pharmacological treatment of any kind are constantly misdiagnosed and treated with unnecessary drugs. Some medical professionals are converted into K.O.L.s (key opinion leaders), who are only puppets appearing on stage to spread lies to their peers; a person supposedly trained to improve the well-being of others now deceives on behalf of pharmaceutical companies. Probably the saddest thing is that many honest doctors are being misled by these lies, created by the rules of pharmaceutical marketing instead of scientific, medical, and ethical principles. The interpretation of EBM in this context was not anticipated by its creators.
“The main reason we take so many drugs is that drug companies don’t sell drugs, they sell lies about drugs.” (Peter C. Gøtzsche)
“doctors and their organisations should recognise that it is unethical to receive money that has been earned in part through crimes that have harmed those people whose interests doctors are expected to take care of. Many crimes would be impossible to carry out if doctors weren’t willing to participate in them.” (Peter C. Gøtzsche, The BMJ, 2012, "Big pharma often commits corporate crime, and this must be stopped")
Pending (Colombia): Health Promoter Entities (in Spanish: EPS, Empresas Promotoras de Salud).
Context
For Wikidata entries related to people, one of the fields is employer. An employer is typically defined as a company or entity which provides people with work. Many Wikidata entries are accurately described, but a number of entries don't conform to most people's expectations of what's a reasonably valid employer. This dataset is a distinct, labeled subset of the Wikidata employers.
Wikidata is a great resource of free data. However, to interact with it meaningfully, most people will find it necessary to clean the data.
Clean data means different things to different users, so I have provided metadata, statistics and labels so that individual users can decide which parts of the dataset are acceptable and useful.
Guidelines used for the Labeling
Employer: one that employs or makes use of something or somebody; especially: a person or company that provides a job paying wages or a salary to one or more people (m-w.com dictionary definition)
Most commonly, an employer should indicate a company employing people. Used in a sentence, a company could be substituted for the employer name.
Tuttle and Click CPA. I work for a CPA company.
Most commonly, an employer should indicate an entity, or a collective entity, but not a person.
I work for Tom Steyer. - no
I work for the Tom Steyer Charity. - yes
Similarly:
oncology - no, that's the field
Oncology Department - yes, that's an employer entity. etc.
Plurals are invalid because they indicate a multitude of entities instead of a single specific entity.
For more details on how some data was labeled manually, how BERT embeddings were used to build a classifier, and how Cleanlab was used to detect problematic labels, please visit the ML-You-Can-Use notebooks to learn more about our label provenance.
Content
The data comes from a dump of Wikidata (2/2/2020). It uses the English labels and descriptions of the Wikidata item codes (courtesy of the Kensho dataset).
item_id - The Wikidata item_id (QCode without the Q prefix)
employer_count - the Wikidata item count
employer - the en_label (Kensho)
description - the en_description (Kensho)
Additional Metadata Provided:
in_google_news - 0 no, 1 yes: does the occupation exist in the GoogleNews embedding
language_detected - 3 digit language code, using FastText language detection
source - Wikidata, Wikipedia, manual
label - 0 invalid employer, 1 valid employer
labeled_by - human, classifier_gnew, classifier_bert, cleanlab
label_error_reason - domain, plural
Acknowledgements
Wikimedia Foundation
Kensho Derived Wikimedia Dataset
GoogleNews Word Embeddings
FastText Language detection
ML-You-Can-Use data provenance notebooks
Inspiration
This dataset can be useful for solving some interesting problems:
Detecting new trends in employers and occupations, and employment nomenclature
Automatic error correction of employers
Converting plurals to singulars
Training an NER model
Training a Question/Answer model
Improving the FastText language detection model
Assessing FastText accuracy with limited data
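Since the labels and their provenance are provided so that each user can decide which rows to trust, a minimal sketch of that filtering step may help. It assumes the data has been exported as CSV with the column names listed under Content; the sample rows, their values, and the helper names `load_rows` and `select_employers` are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample rows: the values (including item_id) are made up.
# Only the column names come from the dataset description.
SAMPLE_CSV = """item_id,employer_count,employer,description,label,labeled_by,label_error_reason
1,12,Oncology Department,hospital department,1,human,
2,3,oncology,field of medicine,0,human,domain
3,5,universities,institutions of higher education,0,cleanlab,plural
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per employer entry."""
    return list(csv.DictReader(io.StringIO(text)))


def select_employers(rows, valid_only=True, labeled_by=None):
    """Keep rows by label, optionally restricted to a set of labelers."""
    selected = []
    for row in rows:
        # csv.DictReader yields strings, so the label is compared as "1".
        if valid_only and row["label"] != "1":
            continue
        if labeled_by is not None and row["labeled_by"] not in labeled_by:
            continue
        selected.append(row)
    return selected


rows = load_rows(SAMPLE_CSV)
trusted = select_employers(rows, labeled_by={"human"})
print([row["employer"] for row in trusted])
```

A stricter or looser notion of clean data can be obtained by changing the `labeled_by` set, for example by also accepting `classifier_bert` rows.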
CC BY-SA
Original Data Source: ML-You-Can-Use Wikidata Employers labeled
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘HR Analytics: Job Change of Data Scientists’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists on 28 January 2022.
--- Dataset description provided by original source is as follows ---
A company which is active in Big Data and Data Science wants to hire data scientists from among the people who successfully pass some courses conducted by the company. Many people sign up for the training. The company wants to know which of these candidates really want to work for the company after training and which are looking for new employment, because this helps to reduce cost and time and to improve the quality of training, the planning of courses, and the categorization of candidates. Information related to demographics, education, and experience is available from candidates' signup and enrollment.
This dataset is also designed for HR research into the factors that lead a person to leave their current job. Using model(s) built on the current credentials, demographics, and experience data, you will predict the probability that a candidate will look for a new job or will work for the company, and interpret the factors that affect the employee's decision.
The whole data is divided into train and test sets. The target isn't included in the test set, but a file with the test target values is available for related tasks. A sample submission corresponding to the enrollee_id values of the test set is provided too, with columns: enrollee_id, target
Note:
- The dataset is imbalanced.
- Most features are categorical (Nominal, Ordinal, Binary), some with high cardinality.
- Missing-value imputation can be a part of your pipeline as well.
Features
- enrollee_id: Unique ID for candidate
- city: City code
- city_development_index: Development index of the city (scaled)
- gender: Gender of candidate
- relevent_experience: Relevant experience of candidate
- enrolled_university: Type of University course enrolled if any
- education_level: Education level of candidate
- major_discipline: Education major discipline of candidate
- experience: Candidate total experience in years
- company_size: No of employees in current employer's company
- company_type: Type of current employer
- last_new_job: Difference in years between previous job and current job
- training_hours: training hours completed
- target: 0 – Not looking for job change, 1 – Looking for a job change
--- Original source retains full ownership of the source dataset ---
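The notes in the job-change description (imbalanced target, categorical features, missing values) suggest two early checks. The sketch below is one possible way to do them, not the original author's pipeline: it measures the class balance of `target` and fills missing categorical values with the most frequent value. The sample rows and the helper names `class_balance` and `impute_mode` are hypothetical, and missing values are assumed to be represented as `None`.

```python
from collections import Counter

# Hypothetical rows: column names follow the feature list, values are made up.
rows = [
    {"enrollee_id": 1, "gender": "Male", "company_size": "50-99", "target": 0},
    {"enrollee_id": 2, "gender": None, "company_size": None, "target": 0},
    {"enrollee_id": 3, "gender": "Female", "company_size": "50-99", "target": 1},
    {"enrollee_id": 4, "gender": "Male", "company_size": "100-500", "target": 0},
]


def class_balance(rows, target="target"):
    """Return the share of rows in each target class."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def impute_mode(rows, column):
    """Return copies of the rows with None in `column` replaced by the mode."""
    observed = [row[column] for row in rows if row[column] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**row, column: mode} if row[column] is None else dict(row)
        for row in rows
    ]


print(class_balance(rows))
filled = impute_mode(rows, "gender")
print(filled[1]["gender"])
```

In a train/test setup the mode would normally be computed on the training split only and then reused for the test split, so that no information leaks from the test data.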
The Retail Sales Index (RSI) is like a health check-up for the shopping world, done every three (3) months. Imagine visiting many different stores, from big to small, and noting how much they are selling. That is what the RSI does. It adds up the sales from these stores to get a feel for how well retail businesses are doing. This index helps us understand whether people are spending more or less at shops, which is a big deal for the economy. Think of it as a way to gauge our shopping habits. Plus, by comparing it with the Retail Price Index (RPI), which tracks price changes, we can see not only how much we are spending but also how much stuff we are actually buying once price changes are taken into account.
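The comparison between the RSI and the RPI can be illustrated with a small calculation. The description does not say how the two indices are combined, so the sketch below assumes the common deflation approach: divide the sales value index by the price index (both on the same base period) to approximate the volume of goods bought. The function name `real_sales_index` and the numbers used are hypothetical.

```python
def real_sales_index(sales_value_index, price_index, base=100.0):
    """Deflate a sales value index by a price index with the same base period.

    Assumption: both indices equal `base` in the base period, so the result
    is also an index that equals `base` in the base period.
    """
    if price_index <= 0:
        raise ValueError("price_index must be positive")
    return sales_value_index / price_index * base


# Hypothetical quarter: spending is up 10%, prices are up 5%.
volume = real_sales_index(110.0, 105.0)
print(round(volume, 2))
```

With these made-up figures, spending rose by 10% while the estimated volume of goods rose by less than 5%, which is the gap between "how much we spend" and "how much we buy" that the comparison is meant to expose.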
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project: Human Resources Analysis - Human_Resources.csv
Description:
The dataset, named "Human_Resources.csv", is a comprehensive collection of employee records from a fictional company. Each row represents an individual employee, and the columns represent various features associated with that employee.
The dataset is rich, highlighting features like 'Age', 'MonthlyIncome', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department', 'EducationField', 'JobSatisfaction', and many more. The main focus is the 'Attrition' variable, which indicates whether an employee left the company or not.
Employee data were sourced from various departments, encompassing a diverse array of job roles and levels. Each employee's record provides an in-depth look into their background, job specifics, and satisfaction levels.
The dataset further includes specific indicators and parameters that were considered during employee performance assessments, offering a granular look into the complexities of each employee's experience.
For privacy reasons, certain personal details and specific identifiers have been anonymized or fictionalized. Instead of names or direct identifiers, each entry is associated with a unique 'EmployeeNumber', ensuring data privacy while retaining data integrity.
The employee records were subjected to rigorous examination, encompassing both manual assessments and automated checks. The end result of this examination, specifically whether an employee left the company or not, is clearly indicated for each record.
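Because 'Attrition' is the main variable of interest, a natural first step is to compare attrition rates across groups such as 'Department'. The sketch below is illustrative only: the column names come from the description, but the sample rows, the assumption that 'Attrition' is encoded as "Yes"/"No", and the helper name `attrition_rate_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: values are made up, and the "Yes"/"No" encoding of
# 'Attrition' is an assumption that should be checked against the real file.
rows = [
    {"EmployeeNumber": 1, "Department": "Sales", "Attrition": "Yes"},
    {"EmployeeNumber": 2, "Department": "Sales", "Attrition": "No"},
    {"EmployeeNumber": 3, "Department": "Research & Development", "Attrition": "No"},
    {"EmployeeNumber": 4, "Department": "Research & Development", "Attrition": "No"},
]


def attrition_rate_by(rows, key, positive="Yes"):
    """Return the share of employees who left, grouped by the `key` column."""
    left = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[key]] += 1
        if row["Attrition"] == positive:
            left[row[key]] += 1
    return {group: left[group] / total[group] for group in total}


rates = attrition_rate_by(rows, "Department")
print(rates)
```

The same helper can be pointed at other categorical columns named in the description, such as 'BusinessTravel' or 'EducationField', by changing the `key` argument.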
The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.
This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.
This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.
Cover Photo by: Freepik
Thumbnail by: Clothing icons created by Flat Icons - Flaticon