29 datasets found
  1. Fake Dataset for Practice

    • kaggle.com
    zip
    Updated Aug 21, 2023
    Cite
    Shuvo Kumar Basak-4004 (2023). Fake Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/fake-dataset-for-practice
    Explore at:
zip (1515599 bytes)
    Dataset updated
    Aug 21, 2023
    Authors
    Shuvo Kumar Basak-4004
    Description

    Description: This dataset is created solely for the purpose of practice and learning. It contains entirely fake and fabricated information, including names, phone numbers, emails, cities, ages, and other attributes. None of the information in this dataset corresponds to real individuals or entities. It serves as a resource for those who are learning data manipulation, analysis, and machine learning techniques. Please note that the data is completely fictional and should not be treated as representing any real-world scenarios or individuals.

Attributes:

    • phone_number: Fake phone numbers in various formats.
    • name: Fictitious names generated for practice purposes.
    • email: Imaginary email addresses created for the dataset.
    • city: Made-up city names to simulate geographical diversity.
    • age: Randomly generated ages for practice analysis.
    • sex: Simulated gender values (Male, Female).
    • married_status: Synthetic marital status information.
    • job: Fictional job titles for practicing data analysis.
    • income: Fake income values for learning data manipulation.
    • religion: Pretend religious affiliations for practice.
    • nationality: Simulated nationalities for practice purposes.

    Please be aware that this dataset is not based on real data and should be used exclusively for educational purposes.
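
    Since every attribute is fabricated, the file can be explored freely without privacy concerns. A minimal pandas sketch for loading and inspecting it (the CSV filename is an assumption; check the actual name inside the downloaded zip):

        import pandas as pd

        # Filename is illustrative; adjust to the file shipped in the zip.
        df = pd.read_csv("fake_dataset_for_practice.csv")

        print(df.dtypes)                 # column types for name, email, age, ...
        print(df["sex"].value_counts())  # simulated Male/Female distribution
        print(df.groupby("city")["income"].mean().head())  # fake income by fake city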

  2. Fake Employee Dataset

    • kaggle.com
    zip
    Updated Nov 20, 2023
    Cite
    Oyekanmi Olamilekan (2023). Fake Employee Dataset [Dataset]. https://www.kaggle.com/datasets/oyekanmiolamilekan/fake-employee-dataset
    Explore at:
zip (162874 bytes)
    Dataset updated
    Nov 20, 2023
    Authors
    Oyekanmi Olamilekan
    Description

Creating a robust employee dataset for data analysis and visualization involves several key fields that capture different aspects of an employee's information. Here's a list of fields you might consider including:

    • Employee ID: A unique identifier for each employee.
    • Name: First name and last name of the employee.
    • Gender: Male, female, non-binary, etc.
    • Date of Birth: Birthdate of the employee.
    • Email Address: Contact email of the employee.
    • Phone Number: Contact number of the employee.
    • Address: Home or work address of the employee.
    • Department: The department the employee belongs to (e.g., HR, Marketing, Engineering, etc.).
    • Job Title: The specific job title of the employee.
    • Manager ID: ID of the employee's manager.
    • Hire Date: Date when the employee was hired.
    • Salary: Employee's salary or compensation.
    • Employment Status: Full-time, part-time, contractor, etc.
    • Employee Type: Regular, temporary, contract, etc.
    • Education Level: Highest level of education attained by the employee.
    • Certifications: Any relevant certifications the employee holds.
    • Skills: Specific skills or expertise possessed by the employee.
    • Performance Ratings: Ratings or evaluations of employee performance.
    • Work Experience: Previous work experience of the employee.
    • Benefits Enrollment: Information on benefits chosen by the employee (e.g., healthcare plan, retirement plan, etc.).
    • Work Location: Physical location where the employee works.
    • Work Hours: Regular working hours or shifts of the employee.
    • Employee Status: Active, on leave, terminated, etc.
    • Emergency Contact: Contact information of the employee's emergency contact person.
    • Employee Satisfaction Survey Responses: Data from employee satisfaction surveys, if applicable.

    Code Url: https://github.com/intellisenseCodez/faker-data-generator

  3. Employee Records Dataset

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Cankat Saraç (2023). Employee Records Dataset [Dataset]. https://www.kaggle.com/datasets/cankatsrc/employee-records-dataset
    Explore at:
zip (98365 bytes)
    Dataset updated
    Mar 28, 2023
    Authors
    Cankat Saraç
    Description

    Description: This dataset contains simulated employee records for a fictional company. The dataset was generated using the Python Faker library to create realistic but fake data. The dataset includes the following fields for each employee:

• Employee ID: A unique identifier for each employee (integer).
    • Name: A randomly generated full name (string).
    • Job title: A randomly generated job title (string).
    • Department: A randomly selected department from a predefined list (HR, Marketing, Sales, IT, or Finance) (string).
    • Email: A randomly generated email address (string).
    • Phone number: A randomly generated phone number (string).
    • Date of hiring: A randomly generated hiring date within the last 10 years (date).
    • Salary: A randomly generated salary value between 30,000 and 150,000 (decimal).

    Please note that this dataset is for demonstration and testing purposes only. The data is entirely fictional and should not be used for any decision-making or analysis.
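
    The schema above maps directly onto a small Faker script. A minimal sketch under the stated constraints (this is illustrative, not the author's actual generation code):

        import random
        from faker import Faker

        fake = Faker()
        DEPARTMENTS = ["HR", "Marketing", "Sales", "IT", "Finance"]

        def make_employee(employee_id: int) -> dict:
            # Mirrors the fields above: hire date within the last 10 years,
            # salary between 30,000 and 150,000, department from the fixed list.
            return {
                "employee_id": employee_id,
                "name": fake.name(),
                "job_title": fake.job(),
                "department": random.choice(DEPARTMENTS),
                "email": fake.email(),
                "phone_number": fake.phone_number(),
                "date_of_hiring": fake.date_between(start_date="-10y").isoformat(),
                "salary": round(random.uniform(30_000, 150_000), 2),
            }

        records = [make_employee(i) for i in range(1, 1001)]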

  4. Realistic Email Categorization Dataset (Synthetic)

    • kaggle.com
    zip
    Updated Jan 4, 2025
    Cite
    Fenil Sonani (2025). Realistic Email Categorization Dataset (Synthetic) [Dataset]. https://www.kaggle.com/datasets/fenilsonani/email-data-for-email-classification
    Explore at:
zip (2746947 bytes)
    Dataset updated
    Jan 4, 2025
    Authors
    Fenil Sonani
    License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset, titled "Realistic Email Categorization Dataset for BERT (Synthetic)," contains 20,000 entries of diverse and realistic email addresses generated using a Python script. The dataset is meticulously crafted to mimic real-world email categorization scenarios, making it an excellent resource for training and evaluating machine learning models, particularly transformer-based models like BERT.

    Features:

    • Email Address: The complete email address (e.g., john.doe@example.com).
    • Category: Broad classification of the email (e.g., Sales, Support, Marketing).
    • Subcategory: Granular classification within the main category (e.g., Technical Support, Domestic Sales).
    • Local Part: The part of the email address before the @ symbol.
    • Domain: The part of the email address after the @ symbol.
    • Length: Total character count of the email address.
• Character Bi-grams & Tri-grams: Sequences of two and three consecutive characters extracted from the local part (see the sketch after this list).
    • Email Content: Randomly generated textual snippets associated with the email.
    • Timestamp: Simulated creation or usage timestamp within the past two years.
    • Disposable & Spam Indicators: Boolean flags indicating whether the email is disposable or marked as spam.
    • Country & Language: Geographical and linguistic metadata derived from the domain.
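
    The n-gram features can be reproduced from the local part alone. A minimal sketch (the helper name is illustrative):

        def char_ngrams(local_part: str, n: int) -> list:
            """Extract overlapping character n-grams from an email local part."""
            return [local_part[i:i + n] for i in range(len(local_part) - n + 1)]

        local = "john.doe"
        bigrams = char_ngrams(local, 2)   # ['jo', 'oh', 'hn', 'n.', '.d', 'do', 'oe']
        trigrams = char_ngrams(local, 3)  # ['joh', 'ohn', 'hn.', 'n.d', '.do', 'doe']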

    Key Highlights:

    • The data is entirely synthetic and generated using the Faker Python library and additional randomization techniques.
    • Designed to simulate realistic email structures, including name-based, role-based, and department-specific addresses.
    • Features a diverse range of domains, subdomains, TLDs, and email patterns, ensuring applicability across a wide range of machine learning and natural language processing tasks.
    • Includes advanced annotations like email content, spam indicators, and geographical metadata, providing rich contextual information for model training.

    Applications:

    • Email Classification: Train models to categorize emails into predefined categories.
    • Spam Detection: Use the spam indicators to train or evaluate anti-spam algorithms.
    • Feature Engineering: Explore the impact of local part, domain, and character n-grams on machine learning models.
    • BERT Fine-Tuning: Leverage the email content and category labels to fine-tune transformer models for NLP tasks.
  5. Eazydinner-ahmedabad-dataset

    • kaggle.com
    zip
    Updated Jul 5, 2023
    Cite
    crawlmagic (2023). Eazydinner-ahmedabad-dataset [Dataset]. https://www.kaggle.com/datasets/crawlmagic/eazydinner-ahmedabad-dataset
    Explore at:
zip (48537 bytes)
    Dataset updated
    Jul 5, 2023
    Authors
    crawlmagic
    License

http://opendatacommons.org/licenses/dbcl/1.0/

    Area covered
    Ahmedabad
    Description

This is a sample dataset of EazyDiner restaurant listings in Ahmedabad. Each record contains details such as restaurant name, address, URL, service type, phone number, email, and more.

    For more details Check: Crawlmagic

  6. LMS Tracking Dataset

    • kaggle.com
    zip
    Updated May 6, 2024
    Cite
    Prasad Patil (2024). LMS Tracking Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/lms-tracking-dataset
    Explore at:
zip (5419 bytes)
    Dataset updated
    May 6, 2024
    Authors
    Prasad Patil
    Description

This dataset was collected by an edtech startup that teaches entrepreneurial life skills to kids aged 6-14 through an animated, gamified video series. Through its learning management system (LMS), the company tracks the progress of every subscriber on the platform. It records platform content usage activity and follows up with parents if their child has been inactive. Here's more information about the dataset.

    Dataset Information:

    • Child Name: Name of the subscriber kid
    • Email Address: Email address created by parent
    • Contact: Contact details of the parent
• follow up: Responses received by the company employee after a progress follow-up over the phone.
    • response: Categories into which the follow-up responses are segregated.
    • Introduction: Tutorial 1
    • Activity:- Know your personality, a fun way: Tutorial 2
    • A Simple Quiz on the previous Video: Quiz on Tutorial 2
    • Lets see what ‘Product’ is…: Tutorial 3
    • A Simple Quiz on the previous Video: Quiz on Tutorial 3
    • Product that represents me: Tutorial 4
    • Let's see what 'Service' means: Tutorial 5
    • A Simple Quiz on the previous Video: Quiz on Tutorial 5
    • Instruction for 'Product & Service' worksheet: Tutorial 6
    • Activity:- Product and Service Worksheet: Exercise on Tutorial 6
    • Instructions for Product Word Association: Tutorial 7
    • Activity:- Product Word Association: Exercise on Tutorial 7
    • Life without products??.... Impossible !: Tutorial 8
    • What Is a Need?: Tutorial 9
    • A Simple Quiz on the previous Video: Quiz on Tutorial 9
    • Summary of Session 1: Summarizing all the learnings from Tutorials 1-9
    • Your Feedback on Session 1: Feedback page

There is some missing data as well. I hope it will be a good dataset for beginners practicing their NLP skills.

    Image by Steven Weirather from Pixabay

Note: This dataset is partially synthetic, meaning the names, emails, and contact details are not those of actual customers. Kindly use it for educational and research purposes.

  7. Saree Retailers Database in India

    • kaggle.com
    zip
    Updated Jan 5, 2023
    Cite
    The Devastator (2023). Saree Retailers Database in India [Dataset]. https://www.kaggle.com/datasets/thedevastator/saree-retailers-database-in-india-april-2021/code
    Explore at:
zip (5430 bytes)
    Dataset updated
    Jan 5, 2023
    Authors
    The Devastator
    Area covered
    India
    Description

    Saree Retailers Database in India

    Accurate Up-to-Date Data for All Types of Business Purposes

    By Amresh [source]

    About this dataset

This All India Saree Retailers Database is a comprehensive collection of up-to-date information on 10,000 saree retailers located all over India. The database was last updated in April 2021 and offers an overall accuracy rate of around 90%.

    For business owners, marketers, data analysts, and researchers, this dataset is an invaluable resource. It contains contact details (store name, contact person name, phone number, and email address) along with store location information (city, state, and PIN code) to help you target the right audience precisely.

    The database is provided in Microsoft Excel (.xlsx) format, which makes it easy to read or manipulate according to your needs. A wide range of payment options (Credit/Debit Card, Online Transfer, NEFT, Cash Deposit, Paytm, PhonePe, Google Pay, or PayPal) allows quick download access within 2-3 business hours.

    So if you are looking for reliable business intelligence data on Indian saree retailers that can help you unlock opportunities for your business, download the All India Saree Retailers Database at the earliest!



    How to use the dataset

This dataset provides a comprehensive list of saree retailers in India, including store name, contact person, email address, mobile number, phone number, and address details such as city, state, and PIN code. It contains 10 thousand records updated in April 2021 with an overall accuracy rate of around 90%. This data can be used to understand customer behaviour as well as to analyse geographical customer patterns.

Using this dataset you can:

    • Target specific states or cities where potential customers are located for your saree business.
    • Get in touch with local saree retailers for possible collaborations and partnerships.
    • Learn more about industry trends from actual store owners who can offer insights into ongoing trends and identify new opportunities to grow your business.
    • Analyse existing competitors' market share by studying the cities/states where they operate and their contact information such as mobile numbers and email IDs.
    • Identify potential new customers for better sales conversion rates by understanding who already operates in similar products nearby or shares your target audience, helping your company reach them quickly and effectively using direct marketing techniques such as email and SMS.

    Research Ideas

• Creating targeted email campaigns to increase saree sales: The dataset can be used to create targeted email campaigns reaching the 10,000 saree retailers in India. This allows businesses to increase sales by directing messages about promotions and discounts directly to potential customers.
    • Customizing online product recommendations for each retailer: The dataset can be used to identify the specific products each individual retailer is interested in selling, so product recommendations on an e-commerce website could be tailored accordingly. This would optimize the customer experience, giving more accurate and relevant results when searching for a particular item while shopping online.
    • Using GPS technology to generate location-based marketing campaigns: By creating geo-fenced areas around each store using the PIN code data, it would be possible to send marketing messages based on people's physical location, instead of broadcasting to whole neighborhoods or cities without regard for store locations. This could help reach specific customers with relevant messages about products or promotions that may interest them, more effectively than a standard campaign with no location targeting.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: 301-Saree-Garment-Retailer-Database-Sample.csv


  8. Invoices Dataset

    • kaggle.com
    zip
    Updated Jan 18, 2022
    Cite
    Cankat Saraç (2022). Invoices Dataset [Dataset]. https://www.kaggle.com/datasets/cankatsrc/invoices
    Explore at:
zip (574249 bytes)
    Dataset updated
    Jan 18, 2022
    Authors
    Cankat Saraç
    License

http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The invoice dataset provided is a mock dataset generated using the Python Faker library. It has been designed to mimic the format of data collected from an online store. The dataset contains various fields, including first name, last name, email, product ID, quantity, amount, invoice date, address, city, and stock code. All of the data in the dataset is randomly generated and does not represent actual individuals or products. The dataset can be used for various purposes, including testing algorithms or models related to invoice management, e-commerce, or customer behavior analysis. The data in this dataset can be used to identify trends, patterns, or anomalies in online shopping behavior, which can help businesses to optimize their online sales strategies.

  9. Retail Analysis on Large Dataset

    • kaggle.com
    Updated Jun 14, 2024
    Cite
    Sahil Prajapati (2024). Retail Analysis on Large Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/8693643
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 14, 2024
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Sahil Prajapati
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Description:

    • The dataset represents retail transactional data. It contains information about customers, their purchases, products, and transaction details. The data includes various attributes such as customer ID, name, email, phone, address, city, state, zipcode, country, age, gender, income, customer segment, last purchase date, total purchases, amount spent, product category, product brand, product type, feedback, shipping method, payment method, and order status.

    Key Points:

    Customer Information:

• Includes customer details like ID, name, email, phone, address, city, state, zipcode, country, age, and gender. Customer segments are categorized into Premium, Regular, and New.

    Transaction Details:

    • Transaction-specific data such as transaction ID, last purchase date, total purchases, amount spent, total purchase amount, feedback, shipping method, payment method, and order status.

    Product Information:

    • Contains product-related details such as product category, brand, and type. Products are categorized into electronics, clothing, grocery, books, and home decor.

    Geographic Information:

    • Contains location details including city, state, and country. Available for various countries including USA, UK, Canada, Australia, and Germany.

    Temporal Information:

    • Last purchase date is provided along with separate columns for year, month, date, and time. This allows analysis based on temporal patterns and trends.

    Data Quality:

    • Some rows contain null values and others are duplicates, which may need to be handled during data preprocessing. Null values are randomly distributed across rows. Duplicate rows appear at different parts of the dataset.

    Potential Analysis:

    • Customer segmentation analysis based on demographics, purchase behavior, and feedback. Sales trend analysis over time to identify peak seasons or trends. Product performance analysis to determine popular categories, brands, or types. Geographic analysis to understand regional preferences and trends. Payment and shipping method analysis to optimize services. Customer satisfaction analysis based on feedback and order status.

    Data Preprocessing:

    • Handling null values and duplicates. Parsing and formatting temporal data. Encoding categorical variables. Scaling numerical variables if required. Splitting data into training and testing sets for modeling.
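
    A minimal pandas sketch of those preprocessing steps (the filename and exact column names are assumptions based on the description):

        import pandas as pd

        df = pd.read_csv("retail_data.csv")  # filename is illustrative

        # Handle duplicates and nulls (nulls are randomly distributed across rows).
        df = df.drop_duplicates()
        df = df.dropna(subset=["Customer ID"])

        # Parse temporal data and derive parts for trend analysis.
        df["Last Purchase Date"] = pd.to_datetime(df["Last Purchase Date"], errors="coerce")
        df["purchase_month"] = df["Last Purchase Date"].dt.month

        # Encode a categorical variable such as the customer segment.
        df["segment_code"] = df["Customer Segment"].astype("category").cat.codes
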
  10. 54k Resume dataset (structured)

    • kaggle.com
    zip
    Updated Nov 14, 2024
    Cite
    Suriya Ganesh (2024). 54k Resume dataset (structured) [Dataset]. https://www.kaggle.com/datasets/suriyaganesh/resume-dataset-structured
    Explore at:
zip (39830263 bytes)
    Dataset updated
    Nov 14, 2024
    Authors
    Suriya Ganesh
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

This dataset is aggregated from publicly available sources and is entirely in the public domain.

    Resumes are usually in PDF format. OCR was used to convert the PDFs into text, and LLMs were used to convert the text into a structured format.

    Dataset Overview

    This dataset contains structured information extracted from professional resumes, normalized into multiple related tables. The data includes personal information, educational background, work experience, professional skills, and abilities.

    Table Schemas

    1. people.csv

    Primary table containing core information about each individual.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Unique identifier for each person | Primary Key, Not Null | 1
    name | VARCHAR(255) | Full name of the person | May be Null | "Database Administrator"
    email | VARCHAR(255) | Email address | May be Null | "john.doe@email.com"
    phone | VARCHAR(50) | Contact number | May be Null | "+1-555-0123"
    linkedin | VARCHAR(255) | LinkedIn profile URL | May be Null | "linkedin.com/in/johndoe"

    2. abilities.csv

    Detailed abilities and competencies listed by individuals.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Reference to people table | Foreign Key, Not Null | 1
    ability | TEXT | Description of ability | Not Null | "Installation and Building Server"

    3. education.csv

    Contains educational history for each person.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Reference to people table | Foreign Key, Not Null | 1
    institution | VARCHAR(255) | Name of educational institution | May be Null | "Lead City University"
    program | VARCHAR(255) | Degree or program name | May be Null | "Bachelor of Science"
    start_date | VARCHAR(7) | Start date of education | May be Null | "07/2013"
    location | VARCHAR(255) | Location of institution | May be Null | "Atlanta, GA"

    4. experience.csv

    Details of work experience entries.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Reference to people table | Foreign Key, Not Null | 1
    title | VARCHAR(255) | Job title | May be Null | "Database Administrator"
    firm | VARCHAR(255) | Company name | May be Null | "Family Private Care LLC"
    start_date | VARCHAR(7) | Employment start date | May be Null | "04/2017"
    end_date | VARCHAR(7) | Employment end date | May be Null | "Present"
    location | VARCHAR(255) | Job location | May be Null | "Roswell, GA"

5. person_skills.csv

    Mapping table connecting people to their skills.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Reference to people table | Foreign Key, Not Null | 1
    skill | VARCHAR(255) | Reference to skills table | Foreign Key, Not Null | "SQL Server"

6. skills.csv

    Master list of unique skills mentioned across all resumes.

Column Name | Data Type | Description | Constraints | Example
    skill | VARCHAR(255) | Unique skill name | Primary Key, Not Null | "SQL Server"

    Relationships

    • Each person (people.csv) can have:
      • Multiple education entries (education.csv)
      • Multiple experience entries (experience.csv)
      • Multiple skills (person_skills.csv)
      • Multiple abilities (abilities.csv)
    • Skills (skills.csv) can be associated with multiple people
    • All relationships are maintained through the person_id field

    Data Characteristics

    Date Formats

    • All dates are stored in MM/YYYY format
    • Current positions use "Present" for end_date

    Text Fields

    • All text fields preserve original case
    • NULL values indicate missing information
    • No maximum length enforced for TEXT fields
    • VARCHAR fields have practical limits noted in schema

    Identifiers

    • person_id starts at 1 and increments sequentially
    • No natural or composite keys used
    • All relationships maintained through person_id

    Common Usage Patterns

    Basic Queries

    -- Get all skills for a person
    SELECT s.skill 
    FROM person_skills ps
    JOIN skills s ON ps.skill = s.skill
    WHERE ps.person_id = 1;
    
    -- Get complete work history
    SELECT * 
    FROM experience
    WHERE person_id = 1
    ORDER BY start_date DESC;
    

    Analytics Queries

    -- Most common skills
    SELECT s.skill, COUNT(*) AS frequency
    FROM person_skills ps
    JOIN skills s ON ps.skill = s.skill
    GROUP BY s.skill
    ORDER BY frequency DESC;
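
    The same lookups work in pandas as well. A short sketch, assuming the CSVs sit in the working directory:

        import pandas as pd

        person_skills = pd.read_csv("person_skills.csv")

        # All skills for person 1 (mirrors the first SQL query above).
        print(person_skills.loc[person_skills["person_id"] == 1, "skill"])

        # Most common skills across all resumes.
        print(person_skills["skill"].value_counts().head(10))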
    
  11. Healthcare Management System

    • kaggle.com
    zip
    Updated Dec 23, 2023
    Cite
    Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
    Explore at:
zip (74279 bytes)
    Dataset updated
    Dec 23, 2023
    Authors
    Anouska Abhisikta
    License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Patients Table:

    • PatientID: Unique identifier for each patient.
    • firstname: First name of the patient.
    • lastname: Last name of the patient.
    • email: Email address of the patient.

    This table stores information about individual patients, including their names and contact details.

    Doctors Table:

    • DoctorID: Unique identifier for each doctor.
    • DoctorName: Full name of the doctor.
    • Specialization: Area of medical specialization.
    • DoctorContact: Contact details of the doctor.

    This table contains details about healthcare providers, including their names, specializations, and contact information.

    Appointments Table:

    • AppointmentID: Unique identifier for each appointment.
    • Date: Date of the appointment.
    • Time: Time of the appointment.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
    • DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

    This table records scheduled appointments, linking patients to doctors.

    MedicalProcedure Table:

    • ProcedureID: Unique identifier for each medical procedure.
    • ProcedureName: Name or description of the medical procedure.
    • AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

    This table stores details about medical procedures associated with specific appointments.

    Billing Table:

    • InvoiceID: Unique identifier for each billing transaction.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
    • Items: Description of items or services billed.
    • Amount: Amount charged for the billing transaction.

    This table maintains records of billing transactions, associating them with specific patients.

    demo Table:

    • ID: Primary key, serves as a unique identifier for each record.
    • Name: Name of the entity.
    • Hint: Additional information or hint about the entity.

    This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

    This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
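
    Because Appointments carries foreign keys to both Patients and Doctors, the tables can be stitched together directly. A minimal pandas sketch (the CSV filenames are assumptions):

        import pandas as pd

        patients = pd.read_csv("Patients.csv")
        doctors = pd.read_csv("Doctors.csv")
        appointments = pd.read_csv("Appointments.csv")

        # Resolve both foreign keys to get one row per appointment,
        # with patient and doctor details side by side.
        schedule = appointments.merge(patients, on="PatientID").merge(doctors, on="DoctorID")
        print(schedule[["Date", "Time", "firstname", "lastname", "DoctorName"]].head())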

  12. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 18, 2024
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

Overview: TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events, and digital marketing campaigns. The contents of this dataset are synthetic and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing, meaning each user receives 1 TB of free BigQuery processing every month that can be used to run queries on it.
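
    Since the source is a BigQuery public dataset, it can also be queried directly within the free tier. A minimal sketch with the google-cloud-bigquery client (the table path below is the conventional location of the public TheLook dataset and should be verified before use):

        from google.cloud import bigquery

        client = bigquery.Client()  # requires an authenticated Google Cloud project

        # Order counts by status; table path assumed for the public TheLook dataset.
        sql = """
            SELECT status, COUNT(*) AS n
            FROM `bigquery-public-data.thelook_ecommerce.orders`
            GROUP BY status
            ORDER BY n DESC
        """
        for row in client.query(sql).result():
            print(row.status, row.n)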

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
  13. Data from: Famous Quotes Dataset

    • kaggle.com
    zip
    Updated Aug 26, 2024
    Cite
    Dev Bhise (2024). Famous Quotes Dataset [Dataset]. https://www.kaggle.com/datasets/devbhise/quote-of-genius
    Explore at:
zip (211995 bytes)
    Dataset updated
    Aug 26, 2024
    Authors
    Dev Bhise
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Title: Famous Quotes Dataset

    Subtitle: A collection of famous quotes from notable figures.

    Description

    This dataset contains a curated collection of quotes from various renowned individuals. Each entry in the dataset includes the text of the quote and the name of the author. It is designed for use in text analysis, natural language processing (NLP), and sentiment analysis tasks.

    Dataset Details

    • Number of Records: 500
    • Columns:
      • quote: The text of the quote.
      • author: The name of the author of the quote.

    Sample Data

Quote | Author
    “Whatever you do, you need courage. ...” | Ralph Waldo Emerson
    “To be yourself in a world that is constantly trying to make you something else is the greatest accomplishment.” | Ralph Waldo Emerson

    Source

    The quotes have been collected from various sources including books, websites, and public domain materials. The data has been verified for accuracy to the best extent possible.

    Usage

This dataset is suitable for:

    • Sentiment Analysis
    • Text Classification
    • NLP Models
    • Data Visualization

    License

    Specify the license under which the dataset is distributed. For example, "Creative Commons Attribution 4.0 International (CC BY 4.0)" or any other license that fits your requirements.

    Acknowledgements

    Acknowledgements to any contributors or sources of the dataset if applicable.

    Additional Information

    • Data Quality: The dataset is carefully curated, but please verify individual quotes for accuracy as needed.
    • Contact Information: If you have any questions or need further information, please contact [Your Email Address or Contact Information].
  14. SAP DATASET | BigQuery Dataset

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Mustafa Keser (2024). SAP DATASET | BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/sap-dataset-bigquery-dataset/discussion
    Explore at:
zip (365940125 bytes)
    Dataset updated
    Aug 20, 2024
    Authors
    Mustafa Keser
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description


    Dataset Description: SAP Replicated Data

    Dataset ID: cloud-training-demos.SAP_REPLICATED_DATA

    Overview: The SAP_REPLICATED_DATA dataset in BigQuery provides a comprehensive replication of SAP (Systems, Applications, and Products in Data Processing) business data. This dataset is designed to support data analytics and machine learning tasks by offering a rich set of structured data that mimics real-world enterprise scenarios. It includes data from various SAP modules and processes, enabling users to perform in-depth analysis, build predictive models, and explore business insights.

Content:

    • Tables and Schemas: The dataset consists of multiple tables representing different aspects of SAP business operations, including but not limited to sales, inventory, finance, and procurement data.
    • Data Types: It contains structured data with fields such as transaction IDs, timestamps, customer details, product information, sales figures, and financial metrics.
    • Data Volume: The dataset is designed to simulate large-scale enterprise data, making it suitable for performance testing, data processing, and analysis.

Usage:

    • Business Analytics: Users can analyze business trends, sales performance, and financial metrics.
    • Machine Learning: Ideal for developing and testing machine learning models related to business forecasting, anomaly detection, and customer segmentation.
    • Data Processing: Suitable for practicing SQL queries, data transformation, and integration tasks.

Example Use Cases:

    • Sales Analysis: Track and analyze sales performance across different regions and time periods.
    • Inventory Management: Monitor inventory levels and identify trends in stock movements.
    • Financial Reporting: Generate financial reports and analyze expense patterns.

    For more information and to access the dataset, visit the BigQuery public datasets page or refer to the dataset documentation in the BigQuery console.

    Tables:


File Name | Description
    adr6.csv | Addresses with organizational units. Contains address details related to organizational units like departments or branches.
    adrc.csv | General Address Data. Provides information about addresses, including details such as street, city, and postal codes.
    adrct.csv | Address Contact Information. Contains contact information linked to addresses, including phone numbers and email addresses.
    adrt.csv | Address Details. Includes detailed address data such as street addresses, city, and country codes.
    ankt.csv | Accounting Document Segment. Provides details on segments within accounting documents, including account numbers and amounts.
    anla.csv | Asset Master Data. Contains information about fixed assets, including asset identification and classification.
    bkpf.csv | Accounting Document Header. Contains headers of accounting documents, such as document numbers and fiscal year.
    bseg.csv | Accounting Document Segment. Details line items within accounting documents, including account details and amounts.
    but000.csv | Business Partners. Contains basic information about business partners, including IDs and names.
    but020.csv | Business Partner Addresses. Provides address details associated with business partners.
    cepc.csv | Customer Master Data - Central. Contains centralized data for customer master records.
    cepct.csv | Customer Master Data - Contact. Provides contact details associated with customer records.
    csks.csv | Cost Center Master Data. Contains data about cost centers within the organization.
    cskt.csv | Cost Center Texts. Provides text descriptions and labels for cost centers.
    dd03l.csv | Data Element Field Labels. Contains labels and descriptions for data fields in the SAP system.
    ekbe.csv | Purchase Order History. Details history of purchase orders, including quantities and values.
    ekes.csv | Purchasing Document History. Contains history of purchasing documents including changes and statuses.
    eket.csv | Purchase Order Item History. Details changes and statuses for individual purchase order items.
    ekkn.csv | Purchase Order Account Assignment. Provides account assignment details for purchas...
  15. Resume_Dataset

    • kaggle.com
    zip
    Updated Jul 26, 2025
    Cite
    RayyanKauchali0 (2025). Resume_Dataset [Dataset]. https://www.kaggle.com/datasets/rayyankauchali0/resume-dataset
    Explore at:
zip (3616108 bytes)
    Dataset updated
    Jul 26, 2025
    Authors
    RayyanKauchali0
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Tech Resume Dataset (3,500+ Samples):

This dataset is designed for cutting-edge NLP research in resume parsing, job classification, and ATS system development. Extensive details on its composition, sourcing, and validation follow, along with several diagrams.

    Dataset Composition and Sourcing

    • Total Resumes: 3,500+
    • Sources:
      • Real Data: 2,047 resumes (58.5%) from ResumeAtlas and reputable open repositories; all records strictly anonymized.
      • Template-Based Synthetic: 573 resumes featuring varied narratives and realistic achievements for classic, modern, and professional styles.
      • LLM-Generated Variations: 460 unique samples using structured prompts to diversify skills, summaries, and career tracks, focusing on AI, ML, and data.
      • Faker-Seeded Synthetic: 420 resumes, especially for junior/support/cloud/network tracks, populated with robust Faker-generated work and education fields.
    • Role Coverage:
      • 15 major technology clusters (Software Engineering, DevOps, Cloud, AI/ML, Security, Data Engineering, QA, UI/UX, and more)
      • At least 200 samples per primary role group for label balance
      • 60+ subcategories reflecting granular tech job roles

    Key Dataset Fields (JSONL Schema)

Field | Description | Example/Data Type
    ResumeID | Unique, anonymized string | "DIS4JE91Z..." (string)
    Category | Tech job category/label | "DevOps Engineer"
    Name | Anonymized (Faker-generated) name | "Jordan Patel"
    Email | Anonymized email address | "jpatel@example.com"
    Phone | Anonymized phone number | "+1-555-343-2123"
    Location | City, country or region (anonymized) | "Austin, TX, USA"
    Summary | Professional summary/intro | String (3-6 sentences)
    Skills | List or comma-separated tech/soft skills | "Python, Kubernetes..."
    Experience | Work chronology, organizations, bullet-point details | String (multiline)
    Education | Universities, degrees, certs | String (multiline)
    Source | "real", "template", "llm", "faker" | String


    Dataset Schema Overview with Field Descriptions and Data Types

    Technical Validation & Quality Assurance

    • Formatting:
      • Uniform schema, right-tab alignment for dates (MMM-YYYY)
      • Standard ATS/NLP-friendly section headers
    • De-duplication:
      • All records checked with BERT/MinHash for uniqueness (cosine similarity >0.9 removed)
    • PII Scrubbing:
      • Names, contacts, locations anonymized with Python Faker
    • Role/Skill Taxonomy:
      • Job titles & skills mapped to ESCO, O*NET, NIST NICE, CNCF lexicons for research alignment
    • Quality Checks:
      • Automatic and manual validation for section presence, data type conformity, and format alignment

    Role & Source Coverage Visualizations

    Composition by Data Source:


    Composition of Tech Resume Dataset by Data Source

    Role Cluster Diversity:


    Distribution of Major Tech Role Clusters in the 3,500 Resumes Dataset

    Alternative: Dataset by Source Type (Pie Chart):


    Resume Dataset Composition by Source Type

    Typical Use Cases

    • Resume parsing & sectioning (training for models like BERT, RoBERTa, spaCy)
    • Fine-tuning for NER, job classification (60+ labels), skill extraction, and ATS research
    • Development or benchmarking of AI-powered job matching, candidate ranking, and automated tracking tools
    • ML/data science education and demo pipelines

    How to Use the JSONL File

    Each line in tech_resumes_dataset.jsonl is a single, fully structured resume object:

    import json
    
    with open('tech_resumes_dataset.jsonl', 'r', encoding='utf-8') as f:
      resumes = [json.loads(line) for line in f]
    # Each record is now a Python dictionary
    

    Citing and Sharing

    If you use this dataset, credit it as “[your Kaggle dataset URL]” and mention original sources (ResumeAtlas, Resume_Classification, Kaggle Resume Dataset, and synthetic methodology as described).

  16. Customer_Purchase_Parquet_Dataset

    • kaggle.com
    zip
    Updated Mar 27, 2025
    Cite
    Kiran Shridhar Alva (2025). Customer_Purchase_Parquet_Dataset [Dataset]. https://www.kaggle.com/datasets/kiranalva/customer-purchase-parquet-dataset
    Explore at:
zip (92463 bytes)
    Dataset updated
    Mar 27, 2025
    Authors
    Kiran Shridhar Alva
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview: This dataset contains synthetic customer data for a CRM system in Parquet format. It includes customer demographic information, transaction details, and behavioral attributes.

Data Fields:

    • customer_id: Unique identifier for each customer (UUID).
    • name: Full name of the customer.
    • email: Email address of the customer.
    • join_date: The date when the customer joined the platform.
    • total_spent: Total money spent by the customer.
    • purchase_count: Number of purchases made by the customer.
    • last_purchase: Date of the last purchase made by the customer.

File Format: The dataset is stored in Parquet format, which provides better performance and compression compared to CSV.
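
    A minimal sketch of reading it with pandas (requires the pyarrow or fastparquet engine; the filename is an assumption):

        import pandas as pd

        # Parquet preserves column dtypes, so dates load as timestamps
        # if they were written that way.
        df = pd.read_parquet("customer_purchase.parquet")

        print(df.dtypes)
        print(df.nlargest(5, "total_spent")[["name", "total_spent", "purchase_count"]])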

Use Cases:

    • Customer segmentation
    • Transaction analysis
    • Predictive modeling

    Notes: This dataset was generated synthetically using the Faker library and random values. It does not represent real customers.

  17. AdventureWorks 2022 Denormalized

    • kaggle.com
    Updated Nov 25, 2024
    Cite
    Bhavesh J (2024). AdventureWorks 2022 Denormalized [Dataset]. https://www.kaggle.com/datasets/bjaising/adventureworks-2022-denormalized
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2024
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Bhavesh J
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adventure Works 2022 Denormalized dataset

How was this dataset created?

    The CSV data was sourced from the existing Kaggle dataset titled "Adventure Works 2022" by Algorismus. That data was normalized across seven individual CSV files, with the Sales table serving as a fact table connected to the other dimension tables. To consolidate everything into a single table, the data was loaded into a SQLite database and transformed accordingly. The final denormalized table was then exported as a single CSV file (delimited by |), and the column names were updated to follow snake_case style.
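
    Because the export is pipe-delimited rather than comma-delimited, the separator must be passed explicitly when loading it. A minimal pandas sketch (the filename is an assumption):

        import pandas as pd

        df = pd.read_csv("adventureworks_2022_denormalized.csv", sep="|")
        print(df.columns.tolist())  # snake_case column names, as described below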

    DOI

    doi.org/10.6084/m9.figshare.27899706

    Data Dictionary

Column Name | Description
    sales_order_number | Unique identifier for each sales order.
    sales_order_date | The date and time when the sales order was placed (e.g., Friday, August 25, 2017).
    sales_order_date_day_of_week | The day of the week when the sales order was placed (e.g., Monday, Tuesday).
    sales_order_date_month | The month when the sales order was placed (e.g., January, February).
    sales_order_date_day | The day of the month when the sales order was placed (1-31).
    sales_order_date_year | The year when the sales order was placed (e.g., 2022).
    quantity | The number of units sold in the sales order.
    unit_price | The price per unit of the product sold.
    total_sales | The total sales amount for the sales order (quantity * unit price).
    cost | The total cost associated with the products sold in the sales order.
    product_key | Unique identifier for the product sold.
    product_name | The name of the product sold.
    reseller_key | Unique identifier for the reseller.
    reseller_name | The name of the reseller.
    reseller_business_type | The type of business of the reseller (e.g., Warehouse, Value Reseller, Specialty Bike Shop).
    reseller_city | The city where the reseller is located.
    reseller_state | The state where the reseller is located.
    reseller_country | The country where the reseller is located.
    employee_key | Unique identifier for the employee associated with the sales order.
    employee_id | The ID of the employee who processed the sales order.
    salesperson_fullname | The full name of the salesperson associated with the sales order.
    salesperson_title | The title of the salesperson (e.g., North American Sales Manager, Sales Representative).
    email_address | The email address of the salesperson.
    sales_territory_key | Unique identifier for the sales territory of the actual sale (e.g., 3).
    assigned_sales_territory | List of sales_territory_key values, separated by commas, assigned to the salesperson (e.g., 3,4).
    sales_territory_region | The region of the sales territory. US territory is broken down into regions; international regions are listed by country name (e.g., Northeast, France).
    sales_territory_country | The country associated with the sales territory.
    sales_territory_group | The group classification of the sales territory (e.g., Europe, North America, Pacific).
    target | The ...
  18. User Subscription Dummy Data

    • kaggle.com
    Updated Sep 7, 2022
    Cite
    Nitin Choudhary (2022). User Subscription Dummy Data [Dataset]. https://www.kaggle.com/datasets/nitinchoudhary012/user-subscription-dummy-data
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 7, 2022
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Nitin Choudhary
    Description

This data is purely random and created for learning purposes.

    In situations where data is not readily available but needed, you'll have to build up the data yourself. There are many methods you can use to acquire such data, from web scraping to APIs. But sometimes you'll end up needing to create fake or "dummy" data. Dummy data is useful when you know the exact features you'll be using and the data types involved, but you just don't have the data itself.

    Features Description

    • ID — a unique string of characters to identify each user.
    • Gender — string data type of three choices.
    • Subscriber — a binary True/False choice of their subscription status.
    • Name — string data type of the first and last name of the user.
• Email — string data type of the email address of the user.
    • Last Login — string data type of the last login time.
    • Date of Birth — string format of year-month-day.
    • Education — current education level as a string data type.
    • Bio — short string descriptions of random words.
    • Rating — integer type of a 1 through 5 rating of something.

Note - This data is purely random (dummy data). If you wish, you can perform some data visualization and model building on it.

    Reference - https://towardsdatascience.com/build-a-your-own-custom-dataset-using-python-9296540a0178
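
    A minimal sketch of generating rows like these, in the spirit of the referenced article (field choices mirror the list above; this is not the author's exact script):

        import random
        import uuid
        from faker import Faker

        fake = Faker()

        def make_user() -> dict:
            return {
                "ID": uuid.uuid4().hex,
                "Gender": random.choice(["male", "female", "na"]),
                "Subscriber": random.choice([True, False]),
                "Name": fake.name(),
                "Email": fake.email(),
                "Last Login": fake.date_time_this_year().strftime("%Y-%m-%d %H:%M"),
                "Date of Birth": fake.date_of_birth().strftime("%Y-%m-%d"),
                "Education": random.choice(["High School", "Bachelor", "Master", "PhD"]),
                "Bio": fake.sentence(),
                "Rating": random.randint(1, 5),
            }

        users = [make_user() for _ in range(1000)]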

  19. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Explore at:
zip (1203633 bytes)
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

• Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: all UPPERCASE).
    • Get the COUNTRY NAME to which these sportsmen belong. Make use of the LOCATION sheet to get the required data.
    • Populate the LANGUAGE SPOKEN by the sportsmen. Make use of the LOCATION sheet to get the required data.
    • Generate the EMAIL ADDRESS for those members who speak English in the prescribed format lastname.firstname@xyz.org (Note: all lowercase); for all other members, the format should be lastname.firstname@xyz.com (Note: all lowercase).
    • Populate the SPORT LOCATION of the sport played by each player. Make use of the SPORT sheet to get the required data.

    TASK 2: DATA FORMATING

• Display MEMBER ID as always a 3-digit number (Note: 001, 002, ..., 020, ... etc.).
    • Format the BIRTHDATE as dd mmm'yyyy (prescribed format example: 09 May'1986).
    • Display the units for the WEIGHT column (prescribed format example: 80 kg).
    • Format the SALARY to show the data in thousands. If SALARY is less than 100,000 then display the data with 2 decimal places, else display the data with one decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 123250 -> 123.2 k.
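
    The tasks target Excel, but the TASK 2 salary rule is easy to pin down as code. A minimal Python sketch of the same logic:

        def format_salary(salary: float) -> str:
            """Show salary in thousands: 2 decimals below 100,000, else 1 decimal."""
            if salary < 100_000:
                return f"{salary / 1000:.2f} k"
            return f"{salary / 1000:.1f} k"

        print(format_salary(87670))   # 87.67 k
        print(format_salary(123250))  # 123.2 k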

TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:

    • In COLUMNS; Group : GENDER.
    • In ROWS; Group : COUNTRY (Note: use COUNTRY NAMES).
• In VALUES; calculate the count of candidates from each COUNTRY and GENDER type. Remove GRAND TOTALs.

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

• Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

    • Starting from range H4, get the distinct GENDER values. Use the remove duplicates option and transpose the data.
    • Starting from range G5, get the distinct COUNTRY values (Note: use COUNTRY NAMES).
    • In the cross table, get the count of candidates from each COUNTRY and GENDER type.

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

• Verified the data for any missing values and anomalies, and sorted them out.
    • Made sure the data is consistent and clean with respect to data type, data format, and values used.
    • Created pivot tables according to the questions asked.
  20. Hyderabad_house_price

    • kaggle.com
    zip
    Updated Jul 1, 2024
    Cite
    Mohammed Faisal Parvez (2024). Hyderabad_house_price [Dataset]. https://www.kaggle.com/datasets/faisal012/hyderabad-house-price
    Explore at:
zip (43970 bytes)
    Dataset updated
    Jul 1, 2024
    Authors
    Mohammed Faisal Parvez
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Hyderabad
    Description

    Dataset Description: Hyderabad City House Prices

    Overview

    The Hyderabad City House Prices dataset is a detailed collection of real estate data for residential properties across various localities in Hyderabad. This dataset is aimed at real estate analysts, data scientists, urban planners, and researchers who are interested in studying the housing market, price trends, and neighborhood dynamics within Hyderabad, one of India's rapidly growing metropolitan cities.

    Features

    The dataset includes the following features:

    1. Title: The headline or main title of the property listing.
    2. Location: Specific address or locality details within Hyderabad.
    3. Price (L): The listed price of the property in Indian Lakhs.
    4. Rate per Sqft: The cost per square foot of the property.
    5. Area in Sqft: The total area of the property in square feet.
    6. Building Status: The construction status of the property (e.g., Under Construction, Ready to Move).

    Usage

This dataset can be utilized for various purposes, including:

    • Market Analysis: Understanding pricing trends, supply and demand, and market conditions in different localities of Hyderabad.
    • Price Prediction Models: Developing machine learning models to predict property prices based on the given features.
    • Investment Analysis: Identifying potential investment opportunities by analyzing location, property type, and price data.
    • Urban Planning: Assisting urban planners in understanding housing distribution and development trends across the city.

    Data Collection

    The data has been scraped from popular real estate websites such as Magicbricks, 99acres, and Housing.com using the Scrapy framework. The data was collected in [insert month/year] and represents a snapshot of the real estate market in Hyderabad at that time.

    Sample Data

Title | Location | Price (L) | Rate per Sqft | Area in Sqft | Building Status
    Luxurious 3 BHK Apartment | Jubilee Hills | 300 | 15,000 | 2000 | Ready to Move
    Spacious 4 BHK Villa | Gachibowli | 450 | 10,000 | 4500 | Under Construction
    Affordable 2 BHK Flat | Madhapur | 80 | 8,000 | 1000 | Ready to Move

    Contact

    For more information or to access the dataset, please contact [Your Name] at [Your Email Address].

    This dataset provides valuable insights into Hyderabad's diverse real estate market, helping stakeholders make informed decisions based on accurate and up-to-date data.
