29 datasets found
  1. Fake Dataset for Practice

    • kaggle.com
    zip
    Updated Aug 21, 2023
    Cite
    Shuvo Kumar Basak-4004 (2023). Fake Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/fake-dataset-for-practice
    Explore at:
zip (1515599 bytes)
    Dataset updated
    Aug 21, 2023
    Authors
    Shuvo Kumar Basak-4004
    Description

    Description: This dataset is created solely for the purpose of practice and learning. It contains entirely fake and fabricated information, including names, phone numbers, emails, cities, ages, and other attributes. None of the information in this dataset corresponds to real individuals or entities. It serves as a resource for those who are learning data manipulation, analysis, and machine learning techniques. Please note that the data is completely fictional and should not be treated as representing any real-world scenarios or individuals.

Attributes:

    • phone_number: Fake phone numbers in various formats.
    • name: Fictitious names generated for practice purposes.
    • email: Imaginary email addresses created for the dataset.
    • city: Made-up city names to simulate geographical diversity.
    • age: Randomly generated ages for practice analysis.
    • sex: Simulated gender values (Male, Female).
    • married_status: Synthetic marital status information.
    • job: Fictional job titles for practicing data analysis.
    • income: Fake income values for learning data manipulation.
    • religion: Pretend religious affiliations for practice.
    • nationality: Simulated nationalities for practice purposes.

    Please be aware that this dataset is not based on real data and should be used exclusively for educational purposes.
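
    Since every attribute is fabricated, the file can be explored freely without privacy concerns. A minimal pandas sketch for loading and inspecting it (the CSV filename is an assumption; check the actual name inside the downloaded zip):

        import pandas as pd

        # Filename is illustrative; adjust to the file shipped in the zip.
        df = pd.read_csv("fake_dataset_for_practice.csv")

        print(df.dtypes)                 # column types for name, email, age, ...
        print(df["sex"].value_counts())  # simulated Male/Female distribution
        print(df.groupby("city")["income"].mean().head())  # fake income by fake city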

  2. Fake Employee Dataset

    • kaggle.com
    zip
    Updated Nov 20, 2023
    Cite
    Oyekanmi Olamilekan (2023). Fake Employee Dataset [Dataset]. https://www.kaggle.com/datasets/oyekanmiolamilekan/fake-employee-dataset
    Explore at:
zip (162874 bytes)
    Dataset updated
    Nov 20, 2023
    Authors
    Oyekanmi Olamilekan
    Description

Creating a robust employee dataset for data analysis and visualization involves several key fields that capture different aspects of an employee's information. Here's a list of fields you might consider including:

    • Employee ID: A unique identifier for each employee.
    • Name: First name and last name of the employee.
    • Gender: Male, female, non-binary, etc.
    • Date of Birth: Birthdate of the employee.
    • Email Address: Contact email of the employee.
    • Phone Number: Contact number of the employee.
    • Address: Home or work address of the employee.
    • Department: The department the employee belongs to (e.g., HR, Marketing, Engineering, etc.).
    • Job Title: The specific job title of the employee.
    • Manager ID: ID of the employee's manager.
    • Hire Date: Date when the employee was hired.
    • Salary: Employee's salary or compensation.
    • Employment Status: Full-time, part-time, contractor, etc.
    • Employee Type: Regular, temporary, contract, etc.
    • Education Level: Highest level of education attained by the employee.
    • Certifications: Any relevant certifications the employee holds.
    • Skills: Specific skills or expertise possessed by the employee.
    • Performance Ratings: Ratings or evaluations of employee performance.
    • Work Experience: Previous work experience of the employee.
    • Benefits Enrollment: Information on benefits chosen by the employee (e.g., healthcare plan, retirement plan, etc.).
    • Work Location: Physical location where the employee works.
    • Work Hours: Regular working hours or shifts of the employee.
    • Employee Status: Active, on leave, terminated, etc.
    • Emergency Contact: Contact information of the employee's emergency contact person.
    • Employee Satisfaction Survey Responses: Data from employee satisfaction surveys, if applicable.

    Code Url: https://github.com/intellisenseCodez/faker-data-generator

  3. Employee Records Dataset

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Cankat Saraç (2023). Employee Records Dataset [Dataset]. https://www.kaggle.com/datasets/cankatsrc/employee-records-dataset
    Explore at:
zip (98365 bytes)
    Dataset updated
    Mar 28, 2023
    Authors
    Cankat Saraç
    Description

    Description: This dataset contains simulated employee records for a fictional company. The dataset was generated using the Python Faker library to create realistic but fake data. The dataset includes the following fields for each employee:

• Employee ID: A unique identifier for each employee (integer).
    • Name: A randomly generated full name (string).
    • Job title: A randomly generated job title (string).
    • Department: A randomly selected department from a predefined list (HR, Marketing, Sales, IT, or Finance) (string).
    • Email: A randomly generated email address (string).
    • Phone number: A randomly generated phone number (string).
    • Date of hiring: A randomly generated hiring date within the last 10 years (date).
    • Salary: A randomly generated salary value between 30,000 and 150,000 (decimal).

    Please note that this dataset is for demonstration and testing purposes only. The data is entirely fictional and should not be used for any decision-making or analysis.
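
    The schema above maps directly onto a small Faker script. A minimal sketch under the stated constraints (this is illustrative, not the author's actual generation code):

        import random
        from faker import Faker

        fake = Faker()
        DEPARTMENTS = ["HR", "Marketing", "Sales", "IT", "Finance"]

        def make_employee(employee_id: int) -> dict:
            # Mirrors the fields above: hire date within the last 10 years,
            # salary between 30,000 and 150,000, department from the fixed list.
            return {
                "employee_id": employee_id,
                "name": fake.name(),
                "job_title": fake.job(),
                "department": random.choice(DEPARTMENTS),
                "email": fake.email(),
                "phone_number": fake.phone_number(),
                "date_of_hiring": fake.date_between(start_date="-10y").isoformat(),
                "salary": round(random.uniform(30_000, 150_000), 2),
            }

        records = [make_employee(i) for i in range(1, 1001)]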

  4. Realistic Email Categorization Dataset (Synthetic)

    • kaggle.com
    zip
    Updated Jan 4, 2025
    Cite
    Fenil Sonani (2025). Realistic Email Categorization Dataset (Synthetic) [Dataset]. https://www.kaggle.com/datasets/fenilsonani/email-data-for-email-classification
    Explore at:
zip (2746947 bytes)
    Dataset updated
    Jan 4, 2025
    Authors
    Fenil Sonani
    License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset, titled "Realistic Email Categorization Dataset for BERT (Synthetic)," contains 20,000 entries of diverse and realistic email addresses generated using a Python script. The dataset is meticulously crafted to mimic real-world email categorization scenarios, making it an excellent resource for training and evaluating machine learning models, particularly transformer-based models like BERT.

    Features:

    • Email Address: The complete email address (e.g., john.doe@example.com).
    • Category: Broad classification of the email (e.g., Sales, Support, Marketing).
    • Subcategory: Granular classification within the main category (e.g., Technical Support, Domestic Sales).
    • Local Part: The part of the email address before the @ symbol.
    • Domain: The part of the email address after the @ symbol.
    • Length: Total character count of the email address.
• Character Bi-grams & Tri-grams: Sequences of two and three consecutive characters extracted from the local part (see the sketch after this list).
    • Email Content: Randomly generated textual snippets associated with the email.
    • Timestamp: Simulated creation or usage timestamp within the past two years.
    • Disposable & Spam Indicators: Boolean flags indicating whether the email is disposable or marked as spam.
    • Country & Language: Geographical and linguistic metadata derived from the domain.
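
    The n-gram features can be reproduced from the local part alone. A minimal sketch (the helper name is illustrative):

        def char_ngrams(local_part: str, n: int) -> list:
            """Extract overlapping character n-grams from an email local part."""
            return [local_part[i:i + n] for i in range(len(local_part) - n + 1)]

        local = "john.doe"
        bigrams = char_ngrams(local, 2)   # ['jo', 'oh', 'hn', 'n.', '.d', 'do', 'oe']
        trigrams = char_ngrams(local, 3)  # ['joh', 'ohn', 'hn.', 'n.d', '.do', 'doe']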

    Key Highlights:

    • The data is entirely synthetic and generated using the Faker Python library and additional randomization techniques.
    • Designed to simulate realistic email structures, including name-based, role-based, and department-specific addresses.
    • Features a diverse range of domains, subdomains, TLDs, and email patterns, ensuring applicability across a wide range of machine learning and natural language processing tasks.
    • Includes advanced annotations like email content, spam indicators, and geographical metadata, providing rich contextual information for model training.

    Applications:

    • Email Classification: Train models to categorize emails into predefined categories.
    • Spam Detection: Use the spam indicators to train or evaluate anti-spam algorithms.
    • Feature Engineering: Explore the impact of local part, domain, and character n-grams on machine learning models.
    • BERT Fine-Tuning: Leverage the email content and category labels to fine-tune transformer models for NLP tasks.
  5. Eazydinner-ahmedabad-dataset

    • kaggle.com
    zip
    Updated Jul 5, 2023
    Cite
    crawlmagic (2023). Eazydinner-ahmedabad-dataset [Dataset]. https://www.kaggle.com/datasets/crawlmagic/eazydinner-ahmedabad-dataset
    Explore at:
zip (48537 bytes)
    Dataset updated
    Jul 5, 2023
    Authors
    crawlmagic
    License

http://opendatacommons.org/licenses/dbcl/1.0/

    Area covered
    Ahmedabad
    Description

This is a sample dataset of EazyDiner restaurant listings in Ahmedabad. Each record contains details such as restaurant name, address, URL, service type, phone number, email, and more.

    For more details Check: Crawlmagic

  6. LMS Tracking Dataset

    • kaggle.com
    zip
    Updated May 6, 2024
    Cite
    Prasad Patil (2024). LMS Tracking Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/lms-tracking-dataset
    Explore at:
zip (5419 bytes)
    Dataset updated
    May 6, 2024
    Authors
    Prasad Patil
    Description

This dataset was collected by an edtech startup that teaches entrepreneurial life skills to kids aged 6-14 through an animated, gamified video series. Through its learning management system (LMS), the company tracks the progress of every subscriber on the platform. It records platform content usage activity and follows up with parents if their child has been inactive. Here's more information about the dataset.

    Dataset Information:

    • Child Name: Name of the subscriber kid
    • Email Address: Email address created by parent
    • Contact: Contact details of the parent
• follow up: Responses received by the company employee after a progress follow-up over the phone.
    • response: Categories into which the follow-up responses are segregated.
    • Introduction: Tutorial 1
    • Activity:- Know your personality, a fun way: Tutorial 2
    • A Simple Quiz on the previous Video: Quiz on Tutorial 2
    • Lets see what ‘Product’ is…: Tutorial 3
    • A Simple Quiz on the previous Video: Quiz on Tutorial 3
    • Product that represents me: Tutorial 4
    • Let's see what 'Service' means: Tutorial 5
    • A Simple Quiz on the previous Video: Quiz on Tutorial 5
    • Instruction for 'Product & Service' worksheet: Tutorial 6
    • Activity:- Product and Service Worksheet: Exercise on Tutorial 6
    • Instructions for Product Word Association: Tutorial 7
    • Activity:- Product Word Association: Exercise on Tutorial 7
    • Life without products??.... Impossible !: Tutorial 8
    • What Is a Need?: Tutorial 9
    • A Simple Quiz on the previous Video: Quiz on Tutorial 9
    • Summary of Session 1: Summarizing all the learnings from Tutorials 1-9
    • Your Feedback on Session 1: Feedback page

There is some missing data as well. I hope it will be a good dataset for beginners practicing their NLP skills.

    Image by Steven Weirather from Pixabay

Note: This dataset is partially synthetic, meaning the names, emails, and contact details are not those of actual customers. Kindly use it for educational and research purposes.

  7. Saree Retailers Database in India

    • kaggle.com
    zip
    Updated Jan 5, 2023
    Cite
    The Devastator (2023). Saree Retailers Database in India [Dataset]. https://www.kaggle.com/datasets/thedevastator/saree-retailers-database-in-india-april-2021/code
    Explore at:
zip (5430 bytes)
    Dataset updated
    Jan 5, 2023
    Authors
    The Devastator
    Area covered
    India
    Description

    Saree Retailers Database in India

    Accurate Up-to-Date Data for All Types of Business Purposes

    By Amresh [source]

    About this dataset

This All India Saree Retailers Database is a comprehensive collection of up-to-date information on 10,000 saree retailers located all over India. The database was last updated in April 2021 and offers an overall accuracy rate of around 90%.

    For business owners, marketers, data analysts, and researchers, this dataset is an invaluable resource. It contains contact details (store name, contact person name, phone number, and email address) along with store location information (city, state, and PIN code) to help you target the right audience precisely.

    The database is provided in Microsoft Excel (.xlsx) format, which makes it easy to read or manipulate according to your needs. A wide range of payment options (Credit/Debit Card, Online Transfer, NEFT, Cash Deposit, Paytm, PhonePe, Google Pay, or PayPal) allows quick download access within 2-3 business hours.

    So if you are looking for reliable business intelligence data on Indian saree retailers that can help you unlock opportunities for your business, download the All India Saree Retailers Database at the earliest!



    How to use the dataset

This dataset provides a comprehensive list of saree retailers in India, including store name, contact person, email address, mobile number, phone number, and address details such as city, state, and PIN code. It contains 10 thousand records updated in April 2021 with an overall accuracy rate of around 90%. This data can be used to understand customer behaviour as well as to analyse geographical customer patterns.

Using this dataset you can:

    • Target specific states or cities where potential customers are located for your saree business.
    • Get in touch with local saree retailers for possible collaborations and partnerships.
    • Learn more about industry trends from actual store owners who can offer insights into ongoing trends and identify new opportunities to grow your business.
    • Analyse existing competitors' market share by studying the cities/states where they operate and their contact information such as mobile numbers and email IDs.
    • Identify potential new customers for better sales conversion rates by understanding who already operates in similar products nearby or shares your target audience, helping your company reach them quickly and effectively using direct marketing techniques such as email and SMS.

    Research Ideas

• Creating targeted email campaigns to increase saree sales: The dataset can be used to create targeted email campaigns reaching the 10,000 saree retailers in India. This allows businesses to increase sales by directing messages about promotions and discounts directly to potential customers.
    • Customizing online product recommendations for each retailer: The dataset can be used to identify the specific products each individual retailer is interested in selling, so product recommendations on an e-commerce website could be tailored accordingly. This would optimize the customer experience, giving more accurate and relevant results when searching for a particular item while shopping online.
    • Using GPS technology to generate location-based marketing campaigns: By creating geo-fenced areas around each store using the PIN code data, it would be possible to send marketing messages based on people's physical location, instead of broadcasting to whole neighborhoods or cities without regard for store locations. This could help reach specific customers with relevant messages about products or promotions that may interest them, more effectively than a standard campaign with no location targeting.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: 301-Saree-Garment-Retailer-Database-Sample.csv


  8. Invoices Dataset

    • kaggle.com
    zip
    Updated Jan 18, 2022
    Cite
    Cankat Saraç (2022). Invoices Dataset [Dataset]. https://www.kaggle.com/datasets/cankatsrc/invoices
    Explore at:
zip (574249 bytes)
    Dataset updated
    Jan 18, 2022
    Authors
    Cankat Saraç
    License

http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The invoice dataset provided is a mock dataset generated using the Python Faker library. It has been designed to mimic the format of data collected from an online store. The dataset contains various fields, including first name, last name, email, product ID, quantity, amount, invoice date, address, city, and stock code. All of the data in the dataset is randomly generated and does not represent actual individuals or products. The dataset can be used for various purposes, including testing algorithms or models related to invoice management, e-commerce, or customer behavior analysis. The data in this dataset can be used to identify trends, patterns, or anomalies in online shopping behavior, which can help businesses to optimize their online sales strategies.

  9. Retail Analysis on Large Dataset

    • kaggle.com
    Updated Jun 14, 2024
    Cite
    Sahil Prajapati (2024). Retail Analysis on Large Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/8693643
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 14, 2024
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Sahil Prajapati
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Description:

    • The dataset represents retail transactional data. It contains information about customers, their purchases, products, and transaction details. The data includes various attributes such as customer ID, name, email, phone, address, city, state, zipcode, country, age, gender, income, customer segment, last purchase date, total purchases, amount spent, product category, product brand, product type, feedback, shipping method, payment method, and order status.

    Key Points:

    Customer Information:

• Includes customer details like ID, name, email, phone, address, city, state, zipcode, country, age, and gender. Customer segments are categorized into Premium, Regular, and New.

    Transaction Details:

    • Transaction-specific data such as transaction ID, last purchase date, total purchases, amount spent, total purchase amount, feedback, shipping method, payment method, and order status.

    Product Information:

    • Contains product-related details such as product category, brand, and type. Products are categorized into electronics, clothing, grocery, books, and home decor.

    Geographic Information:

    • Contains location details including city, state, and country. Available for various countries including USA, UK, Canada, Australia, and Germany.

    Temporal Information:

    • Last purchase date is provided along with separate columns for year, month, date, and time. This allows analysis based on temporal patterns and trends.

    Data Quality:

    • Some rows contain null values and others are duplicates, which may need to be handled during data preprocessing. Null values are randomly distributed across rows. Duplicate rows appear at different parts of the dataset.

    Potential Analysis:

    • Customer segmentation analysis based on demographics, purchase behavior, and feedback. Sales trend analysis over time to identify peak seasons or trends. Product performance analysis to determine popular categories, brands, or types. Geographic analysis to understand regional preferences and trends. Payment and shipping method analysis to optimize services. Customer satisfaction analysis based on feedback and order status.

    Data Preprocessing:

    • Handling null values and duplicates. Parsing and formatting temporal data. Encoding categorical variables. Scaling numerical variables if required. Splitting data into training and testing sets for modeling.
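
    A minimal pandas sketch of those preprocessing steps (the filename and exact column names are assumptions based on the description):

        import pandas as pd

        df = pd.read_csv("retail_data.csv")  # filename is illustrative

        # Handle duplicates and nulls (nulls are randomly distributed across rows).
        df = df.drop_duplicates()
        df = df.dropna(subset=["Customer ID"])

        # Parse temporal data and derive parts for trend analysis.
        df["Last Purchase Date"] = pd.to_datetime(df["Last Purchase Date"], errors="coerce")
        df["purchase_month"] = df["Last Purchase Date"].dt.month

        # Encode a categorical variable such as the customer segment.
        df["segment_code"] = df["Customer Segment"].astype("category").cat.codes
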
  10. 54k Resume dataset (structured)

    • kaggle.com
    zip
    Updated Nov 14, 2024
    Cite
    Suriya Ganesh (2024). 54k Resume dataset (structured) [Dataset]. https://www.kaggle.com/datasets/suriyaganesh/resume-dataset-structured
    Explore at:
zip (39830263 bytes)
    Dataset updated
    Nov 14, 2024
    Authors
    Suriya Ganesh
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

This dataset is aggregated from publicly available sources and is entirely in the public domain.

    Resumes are usually in PDF format. OCR was used to convert the PDFs into text, and LLMs were used to convert the text into a structured format.

    Dataset Overview

    This dataset contains structured information extracted from professional resumes, normalized into multiple related tables. The data includes personal information, educational background, work experience, professional skills, and abilities.

    Table Schemas

    1. people.csv

    Primary table containing core information about each individual.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Unique identifier for each person | Primary Key, Not Null | 1
    name | VARCHAR(255) | Full name of the person | May be Null | "Database Administrator"
    email | VARCHAR(255) | Email address | May be Null | "john.doe@email.com"
    phone | VARCHAR(50) | Contact number | May be Null | "+1-555-0123"
    linkedin | VARCHAR(255) | LinkedIn profile URL | May be Null | "linkedin.com/in/johndoe"

    2. abilities.csv

    Detailed abilities and competencies listed by individuals.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Reference to people table | Foreign Key, Not Null | 1
    ability | TEXT | Description of ability | Not Null | "Installation and Building Server"

    3. education.csv

    Contains educational history for each person.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Reference to people table | Foreign Key, Not Null | 1
    institution | VARCHAR(255) | Name of educational institution | May be Null | "Lead City University"
    program | VARCHAR(255) | Degree or program name | May be Null | "Bachelor of Science"
    start_date | VARCHAR(7) | Start date of education | May be Null | "07/2013"
    location | VARCHAR(255) | Location of institution | May be Null | "Atlanta, GA"

    4. experience.csv

    Details of work experience entries.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Reference to people table | Foreign Key, Not Null | 1
    title | VARCHAR(255) | Job title | May be Null | "Database Administrator"
    firm | VARCHAR(255) | Company name | May be Null | "Family Private Care LLC"
    start_date | VARCHAR(7) | Employment start date | May be Null | "04/2017"
    end_date | VARCHAR(7) | Employment end date | May be Null | "Present"
    location | VARCHAR(255) | Job location | May be Null | "Roswell, GA"

5. person_skills.csv

    Mapping table connecting people to their skills.

Column Name | Data Type | Description | Constraints | Example
    person_id | INTEGER | Reference to people table | Foreign Key, Not Null | 1
    skill | VARCHAR(255) | Reference to skills table | Foreign Key, Not Null | "SQL Server"

6. skills.csv

    Master list of unique skills mentioned across all resumes.

Column Name | Data Type | Description | Constraints | Example
    skill | VARCHAR(255) | Unique skill name | Primary Key, Not Null | "SQL Server"

    Relationships

    • Each person (people.csv) can have:
      • Multiple education entries (education.csv)
      • Multiple experience entries (experience.csv)
      • Multiple skills (person_skills.csv)
      • Multiple abilities (abilities.csv)
    • Skills (skills.csv) can be associated with multiple people
    • All relationships are maintained through the person_id field

    Data Characteristics

    Date Formats

    • All dates are stored in MM/YYYY format
    • Current positions use "Present" for end_date

    Text Fields

    • All text fields preserve original case
    • NULL values indicate missing information
    • No maximum length enforced for TEXT fields
    • VARCHAR fields have practical limits noted in schema

    Identifiers

    • person_id starts at 1 and increments sequentially
    • No natural or composite keys used
    • All relationships maintained through person_id

    Common Usage Patterns

    Basic Queries

    -- Get all skills for a person
    SELECT s.skill 
    FROM person_skills ps
    JOIN skills s ON ps.skill = s.skill
    WHERE ps.person_id = 1;
    
    -- Get complete work history
    SELECT * 
    FROM experience
    WHERE person_id = 1
    ORDER BY start_date DESC;
    

    Analytics Queries

    -- Most common skills
    SELECT s.skill, COUNT(*) AS frequency
    FROM person_skills ps
    JOIN skills s ON ps.skill = s.skill
    GROUP BY s.skill
    ORDER BY frequency DESC;
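
    The same lookups work in pandas as well. A short sketch, assuming the CSVs sit in the working directory:

        import pandas as pd

        person_skills = pd.read_csv("person_skills.csv")

        # All skills for person 1 (mirrors the first SQL query above).
        print(person_skills.loc[person_skills["person_id"] == 1, "skill"])

        # Most common skills across all resumes.
        print(person_skills["skill"].value_counts().head(10))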
    
  11. Healthcare Management System

    • kaggle.com
    zip
    Updated Dec 23, 2023
    Cite
    Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
    Explore at:
zip (74279 bytes)
    Dataset updated
    Dec 23, 2023
    Authors
    Anouska Abhisikta
    License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Patients Table:

    • PatientID: Unique identifier for each patient.
    • firstname: First name of the patient.
    • lastname: Last name of the patient.
    • email: Email address of the patient.

    This table stores information about individual patients, including their names and contact details.

    Doctors Table:

    • DoctorID: Unique identifier for each doctor.
    • DoctorName: Full name of the doctor.
    • Specialization: Area of medical specialization.
    • DoctorContact: Contact details of the doctor.

    This table contains details about healthcare providers, including their names, specializations, and contact information.

    Appointments Table:

    • AppointmentID: Unique identifier for each appointment.
    • Date: Date of the appointment.
    • Time: Time of the appointment.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
    • DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

    This table records scheduled appointments, linking patients to doctors.

    MedicalProcedure Table:

    • ProcedureID: Unique identifier for each medical procedure.
    • ProcedureName: Name or description of the medical procedure.
    • AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

    This table stores details about medical procedures associated with specific appointments.

    Billing Table:

    • InvoiceID: Unique identifier for each billing transaction.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
    • Items: Description of items or services billed.
    • Amount: Amount charged for the billing transaction.

    This table maintains records of billing transactions, associating them with specific patients.

    demo Table:

    • ID: Primary key, serves as a unique identifier for each record.
    • Name: Name of the entity.
    • Hint: Additional information or hint about the entity.

    This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

    This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
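
    Because Appointments carries foreign keys to both Patients and Doctors, the tables can be stitched together directly. A minimal pandas sketch (the CSV filenames are assumptions):

        import pandas as pd

        patients = pd.read_csv("Patients.csv")
        doctors = pd.read_csv("Doctors.csv")
        appointments = pd.read_csv("Appointments.csv")

        # Resolve both foreign keys to get one row per appointment,
        # with patient and doctor details side by side.
        schedule = appointments.merge(patients, on="PatientID").merge(doctors, on="DoctorID")
        print(schedule[["Date", "Time", "firstname", "lastname", "DoctorName"]].head())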

  12. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 18, 2024
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

Overview: TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events, and digital marketing campaigns. The contents of this dataset are synthetic and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing, meaning each user receives 1 TB of free BigQuery processing every month that can be used to run queries on it.
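
    Since the source is a BigQuery public dataset, it can also be queried directly within the free tier. A minimal sketch with the google-cloud-bigquery client (the table path below is the conventional location of the public TheLook dataset and should be verified before use):

        from google.cloud import bigquery

        client = bigquery.Client()  # requires an authenticated Google Cloud project

        # Order counts by status; table path assumed for the public TheLook dataset.
        sql = """
            SELECT status, COUNT(*) AS n
            FROM `bigquery-public-data.thelook_ecommerce.orders`
            GROUP BY status
            ORDER BY n DESC
        """
        for row in client.query(sql).result():
            print(row.status, row.n)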

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
  13. Data from: Famous Quotes Dataset

    • kaggle.com
    zip
    Updated Aug 26, 2024
    Cite
    Dev Bhise (2024). Famous Quotes Dataset [Dataset]. https://www.kaggle.com/datasets/devbhise/quote-of-genius
    Explore at:
zip (211995 bytes)
    Dataset updated
    Aug 26, 2024
    Authors
    Dev Bhise
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Title: Famous Quotes Dataset

    Subtitle: A collection of famous quotes from notable figures.

    Description

    This dataset contains a curated collection of quotes from various renowned individuals. Each entry in the dataset includes the text of the quote and the name of the author. It is designed for use in text analysis, natural language processing (NLP), and sentiment analysis tasks.

    Dataset Details

    • Number of Records: 500
    • Columns:
      • quote: The text of the quote.
      • author: The name of the author of the quote.

    Sample Data

Quote | Author
    “Whatever you do, you need courage. ...” | Ralph Waldo Emerson
    “To be yourself in a world that is constantly trying to make you something else is the greatest accomplishment.” | Ralph Waldo Emerson

    Source

    The quotes have been collected from various sources including books, websites, and public domain materials. The data has been verified for accuracy to the best extent possible.

    Usage

This dataset is suitable for:

    • Sentiment Analysis
    • Text Classification
    • NLP Models
    • Data Visualization

    License

    Specify the license under which the dataset is distributed. For example, "Creative Commons Attribution 4.0 International (CC BY 4.0)" or any other license that fits your requirements.

    Acknowledgements

    Acknowledgements to any contributors or sources of the dataset if applicable.

    Additional Information

    • Data Quality: The dataset is carefully curated, but please verify individual quotes for accuracy as needed.
    • Contact Information: If you have any questions or need further information, please contact [Your Email Address or Contact Information].
  14. SAP DATASET | BigQuery Dataset

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Mustafa Keser (2024). SAP DATASET | BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/sap-dataset-bigquery-dataset/discussion
    Explore at:
zip (365940125 bytes)
    Dataset updated
    Aug 20, 2024
    Authors
    Mustafa Keser
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description


    Dataset Description: SAP Replicated Data

    Dataset ID: cloud-training-demos.SAP_REPLICATED_DATA

    Overview: The SAP_REPLICATED_DATA dataset in BigQuery provides a comprehensive replication of SAP (Systems, Applications, and Products in Data Processing) business data. This dataset is designed to support data analytics and machine learning tasks by offering a rich set of structured data that mimics real-world enterprise scenarios. It includes data from various SAP modules and processes, enabling users to perform in-depth analysis, build predictive models, and explore business insights.

Content:

    • Tables and Schemas: The dataset consists of multiple tables representing different aspects of SAP business operations, including but not limited to sales, inventory, finance, and procurement data.
    • Data Types: It contains structured data with fields such as transaction IDs, timestamps, customer details, product information, sales figures, and financial metrics.
    • Data Volume: The dataset is designed to simulate large-scale enterprise data, making it suitable for performance testing, data processing, and analysis.

Usage:

    • Business Analytics: Users can analyze business trends, sales performance, and financial metrics.
    • Machine Learning: Ideal for developing and testing machine learning models related to business forecasting, anomaly detection, and customer segmentation.
    • Data Processing: Suitable for practicing SQL queries, data transformation, and integration tasks.

Example Use Cases:

    • Sales Analysis: Track and analyze sales performance across different regions and time periods.
    • Inventory Management: Monitor inventory levels and identify trends in stock movements.
    • Financial Reporting: Generate financial reports and analyze expense patterns.

    For more information and to access the dataset, visit the BigQuery public datasets page or refer to the dataset documentation in the BigQuery console.

    Tables:


File Name | Description
    adr6.csv | Addresses with organizational units. Contains address details related to organizational units like departments or branches.
    adrc.csv | General Address Data. Provides information about addresses, including details such as street, city, and postal codes.
    adrct.csv | Address Contact Information. Contains contact information linked to addresses, including phone numbers and email addresses.
    adrt.csv | Address Details. Includes detailed address data such as street addresses, city, and country codes.
    ankt.csv | Accounting Document Segment. Provides details on segments within accounting documents, including account numbers and amounts.
    anla.csv | Asset Master Data. Contains information about fixed assets, including asset identification and classification.
    bkpf.csv | Accounting Document Header. Contains headers of accounting documents, such as document numbers and fiscal year.
    bseg.csv | Accounting Document Segment. Details line items within accounting documents, including account details and amounts.
    but000.csv | Business Partners. Contains basic information about business partners, including IDs and names.
    but020.csv | Business Partner Addresses. Provides address details associated with business partners.
    cepc.csv | Customer Master Data - Central. Contains centralized data for customer master records.
    cepct.csv | Customer Master Data - Contact. Provides contact details associated with customer records.
    csks.csv | Cost Center Master Data. Contains data about cost centers within the organization.
    cskt.csv | Cost Center Texts. Provides text descriptions and labels for cost centers.
    dd03l.csv | Data Element Field Labels. Contains labels and descriptions for data fields in the SAP system.
    ekbe.csv | Purchase Order History. Details history of purchase orders, including quantities and values.
    ekes.csv | Purchasing Document History. Contains history of purchasing documents including changes and statuses.
    eket.csv | Purchase Order Item History. Details changes and statuses for individual purchase order items.
    ekkn.csv | Purchase Order Account Assignment. Provides account assignment details for purchas...
  15. Resume_Dataset

    • kaggle.com
    zip
    Updated Jul 26, 2025
    Cite
    RayyanKauchali0 (2025). Resume_Dataset [Dataset]. https://www.kaggle.com/datasets/rayyankauchali0/resume-dataset
    Explore at:
zip (3616108 bytes)
    Dataset updated
    Jul 26, 2025
    Authors
    RayyanKauchali0
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Tech Resume Dataset (3,500+ Samples):

This dataset is designed for cutting-edge NLP research in resume parsing, job classification, and ATS system development. Extensive details on its composition, sourcing, and validation follow, along with several diagrams.

    Dataset Composition and Sourcing

    • Total Resumes: 3,500+
    • Sources:
      • Real Data: 2,047 resumes (58.5%) from ResumeAtlas and reputable open repositories; all records strictly anonymized.
      • Template-Based Synthetic: 573 resumes featuring varied narratives and realistic achievements for classic, modern, and professional styles.
      • LLM-Generated Variations: 460 unique samples using structured prompts to diversify skills, summaries, and career tracks, focusing on AI, ML, and data.
      • Faker-Seeded Synthetic: 420 resumes, especially for junior/support/cloud/network tracks, populated with robust Faker-generated work and education fields.
    • Role Coverage:
      • 15 major technology clusters (Software Engineering, DevOps, Cloud, AI/ML, Security, Data Engineering, QA, UI/UX, and more)
      • At least 200 samples per primary role group for label balance
      • 60+ subcategories reflecting granular tech job roles

    Key Dataset Fields (JSONL Schema)

Field | Description | Example/Data Type
    ResumeID | Unique, anonymized string | "DIS4JE91Z..." (string)
    Category | Tech job category/label | "DevOps Engineer"
    Name | Anonymized (Faker-generated) name | "Jordan Patel"
    Email | Anonymized email address | "jpatel@example.com"
    Phone | Anonymized phone number | "+1-555-343-2123"
    Location | City, country or region (anonymized) | "Austin, TX, USA"
    Summary | Professional summary/intro | String (3-6 sentences)
    Skills | List or comma-separated tech/soft skills | "Python, Kubernetes..."
    Experience | Work chronology, organizations, bullet-point details | String (multiline)
    Education | Universities, degrees, certs | String (multiline)
    Source | "real", "template", "llm", "faker" | String


    Dataset Schema Overview with Field Descriptions and Data Types

    Technical Validation & Quality Assurance

    • Formatting:
      • Uniform schema, right-tab alignment for dates (MMM-YYYY)
      • Standard ATS/NLP-friendly section headers
    • De-duplication:
      • All records checked with BERT/MinHash for uniqueness (cosine similarity >0.9 removed)
    • PII Scrubbing:
      • Names, contacts, locations anonymized with Python Faker
    • Role/Skill Taxonomy:
      • Job titles & skills mapped to ESCO, O*NET, NIST NICE, CNCF lexicons for research alignment
    • Quality Checks:
      • Automatic and manual validation for section presence, data type conformity, and format alignment

    Role & Source Coverage Visualizations

    Composition by Data Source:


    Composition of Tech Resume Dataset by Data Source

    Role Cluster Diversity:


    Distribution of Major Tech Role Clusters in the 3,500 Resumes Dataset

    Alternative: Dataset by Source Type (Pie Chart):


    Resume Dataset Composition by Source Type

    Typical Use Cases

    • Resume parsing & sectioning (training for models like BERT, RoBERTa, spaCy)
    • Fine-tuning for NER, job classification (60+ labels), skill extraction, and ATS research
    • Development or benchmarking of AI-powered job matching, candidate ranking, and automated tracking tools
    • ML/data science education and demo pipelines

    How to Use the JSONL File

    Each line in tech_resumes_dataset.jsonl is a single, fully structured resume object:

    import json
    
    with open('tech_resumes_dataset.jsonl', 'r', encoding='utf-8') as f:
      resumes = [json.loads(line) for line in f]
    # Each record is now a Python dictionary
    

    Citing and Sharing

    If you use this dataset, credit it as “[your Kaggle dataset URL]” and mention original sources (ResumeAtlas, Resume_Classification, Kaggle Resume Dataset, and synthetic methodology as described).

  16. Customer_Purchase_Parquet_Dataset

    • kaggle.com
    zip
    Updated Mar 27, 2025
    Cite
    Kiran Shridhar Alva (2025). Customer_Purchase_Parquet_Dataset [Dataset]. https://www.kaggle.com/datasets/kiranalva/customer-purchase-parquet-dataset
    Explore at:
zip (92463 bytes)
    Dataset updated
    Mar 27, 2025
    Authors
    Kiran Shridhar Alva
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview: This dataset contains synthetic customer data for a CRM system in Parquet format. It includes customer demographic information, transaction details, and behavioral attributes.

Data Fields:

    • customer_id: Unique identifier for each customer (UUID).
    • name: Full name of the customer.
    • email: Email address of the customer.
    • join_date: The date when the customer joined the platform.
    • total_spent: Total money spent by the customer.
    • purchase_count: Number of purchases made by the customer.
    • last_purchase: Date of the last purchase made by the customer.

File Format: The dataset is stored in Parquet format, which provides better performance and compression compared to CSV.
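
    A minimal sketch of reading it with pandas (requires the pyarrow or fastparquet engine; the filename is an assumption):

        import pandas as pd

        # Parquet preserves column dtypes, so dates load as timestamps
        # if they were written that way.
        df = pd.read_parquet("customer_purchase.parquet")

        print(df.dtypes)
        print(df.nlargest(5, "total_spent")[["name", "total_spent", "purchase_count"]])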

Use Cases:

    • Customer segmentation
    • Transaction analysis
    • Predictive modeling

    Notes: This dataset was generated synthetically using the Faker library and random values. It does not represent real customers.

  17. AdventureWorks 2022 Denormalized

    • kaggle.com
    Updated Nov 25, 2024
    Cite
    Bhavesh J (2024). AdventureWorks 2022 Denormalized [Dataset]. https://www.kaggle.com/datasets/bjaising/adventureworks-2022-denormalized
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2024
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Bhavesh J
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adventure Works 2022 Denormalized dataset

How was this dataset created?

    The CSV data was sourced from the existing Kaggle dataset titled "Adventure Works 2022" by Algorismus. That data was normalized across seven individual CSV files, with the Sales table serving as a fact table connected to the other dimension tables. To consolidate everything into a single table, the data was loaded into a SQLite database and transformed accordingly. The final denormalized table was then exported as a single CSV file (delimited by |), and the column names were updated to follow snake_case style.
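
    Because the export is pipe-delimited rather than comma-delimited, the separator must be passed explicitly when loading it. A minimal pandas sketch (the filename is an assumption):

        import pandas as pd

        df = pd.read_csv("adventureworks_2022_denormalized.csv", sep="|")
        print(df.columns.tolist())  # snake_case column names, as described below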

    DOI

    doi.org/10.6084/m9.figshare.27899706

    Data Dictionary

Column Name | Description
    sales_order_number | Unique identifier for each sales order.
    sales_order_date | The date and time when the sales order was placed (e.g., Friday, August 25, 2017).
    sales_order_date_day_of_week | The day of the week when the sales order was placed (e.g., Monday, Tuesday).
    sales_order_date_month | The month when the sales order was placed (e.g., January, February).
    sales_order_date_day | The day of the month when the sales order was placed (1-31).
    sales_order_date_year | The year when the sales order was placed (e.g., 2022).
    quantity | The number of units sold in the sales order.
    unit_price | The price per unit of the product sold.
    total_sales | The total sales amount for the sales order (quantity * unit price).
    cost | The total cost associated with the products sold in the sales order.
    product_key | Unique identifier for the product sold.
    product_name | The name of the product sold.
    reseller_key | Unique identifier for the reseller.
    reseller_name | The name of the reseller.
    reseller_business_type | The type of business of the reseller (e.g., Warehouse, Value Reseller, Specialty Bike Shop).
    reseller_city | The city where the reseller is located.
    reseller_state | The state where the reseller is located.
    reseller_country | The country where the reseller is located.
    employee_key | Unique identifier for the employee associated with the sales order.
    employee_id | The ID of the employee who processed the sales order.
    salesperson_fullname | The full name of the salesperson associated with the sales order.
    salesperson_title | The title of the salesperson (e.g., North American Sales Manager, Sales Representative).
    email_address | The email address of the salesperson.
    sales_territory_key | Unique identifier for the sales territory of the actual sale (e.g., 3).
    assigned_sales_territory | List of sales_territory_key values, separated by commas, assigned to the salesperson (e.g., 3,4).
    sales_territory_region | The region of the sales territory. US territory is broken down into regions; international regions are listed by country name (e.g., Northeast, France).
    sales_territory_country | The country associated with the sales territory.
    sales_territory_group | The group classification of the sales territory (e.g., Europe, North America, Pacific).
    target | The ...
  18. User Subscription Dummy Data

    • kaggle.com
    Updated Sep 7, 2022
    Cite
    Nitin Choudhary (2022). User Subscription Dummy Data [Dataset]. https://www.kaggle.com/datasets/nitinchoudhary012/user-subscription-dummy-data
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 7, 2022
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Nitin Choudhary
    Description

This data is purely random and created for learning purposes.

    In situations where data is not readily available but needed, you'll have to build up the data yourself. There are many methods you can use to acquire such data, from web scraping to APIs. But sometimes you'll end up needing to create fake or "dummy" data. Dummy data is useful when you know the exact features you'll be using and the data types involved, but you just don't have the data itself.

    Features Description

    • ID — a unique string of characters to identify each user.
    • Gender — string data type of three choices.
    • Subscriber — a binary True/False choice of their subscription status.
    • Name — string data type of the first and last name of the user.
• Email — string data type of the email address of the user.
    • Last Login — string data type of the last login time.
    • Date of Birth — string format of year-month-day.
    • Education — current education level as a string data type.
    • Bio — short string descriptions of random words.
    • Rating — integer type of a 1 through 5 rating of something.

Note - This data is purely random (dummy data). If you wish, you can perform some data visualization and model building on it.

    Reference - https://towardsdatascience.com/build-a-your-own-custom-dataset-using-python-9296540a0178
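
    A minimal sketch of generating rows like these, in the spirit of the referenced article (field choices mirror the list above; this is not the author's exact script):

        import random
        import uuid
        from faker import Faker

        fake = Faker()

        def make_user() -> dict:
            return {
                "ID": uuid.uuid4().hex,
                "Gender": random.choice(["male", "female", "na"]),
                "Subscriber": random.choice([True, False]),
                "Name": fake.name(),
                "Email": fake.email(),
                "Last Login": fake.date_time_this_year().strftime("%Y-%m-%d %H:%M"),
                "Date of Birth": fake.date_of_birth().strftime("%Y-%m-%d"),
                "Education": random.choice(["High School", "Bachelor", "Master", "PhD"]),
                "Bio": fake.sentence(),
                "Rating": random.randint(1, 5),
            }

        users = [make_user() for _ in range(1000)]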

  19. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Explore at:
zip (1203633 bytes)
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

• Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: all UPPERCASE).
    • Get the COUNTRY NAME to which these sportsmen belong. Make use of the LOCATION sheet to get the required data.
    • Populate the LANGUAGE SPOKEN by the sportsmen. Make use of the LOCATION sheet to get the required data.
    • Generate the EMAIL ADDRESS for those members who speak English in the prescribed format lastname.firstname@xyz.org (Note: all lowercase); for all other members, the format should be lastname.firstname@xyz.com (Note: all lowercase).
    • Populate the SPORT LOCATION of the sport played by each player. Make use of the SPORT sheet to get the required data.

    TASK 2: DATA FORMATING

• Display MEMBER ID as always a 3-digit number (Note: 001, 002, ..., 020, ... etc.).
    • Format the BIRTHDATE as dd mmm'yyyy (prescribed format example: 09 May'1986).
    • Display the units for the WEIGHT column (prescribed format example: 80 kg).
    • Format the SALARY to show the data in thousands. If SALARY is less than 100,000 then display the data with 2 decimal places, else display the data with one decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 123250 -> 123.2 k.
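
    The tasks target Excel, but the TASK 2 salary rule is easy to pin down as code. A minimal Python sketch of the same logic:

        def format_salary(salary: float) -> str:
            """Show salary in thousands: 2 decimals below 100,000, else 1 decimal."""
            if salary < 100_000:
                return f"{salary / 1000:.2f} k"
            return f"{salary / 1000:.1f} k"

        print(format_salary(87670))   # 87.67 k
        print(format_salary(123250))  # 123.2 k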

TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:

    • In COLUMNS; Group : GENDER.
    • In ROWS; Group : COUNTRY (Note: use COUNTRY NAMES).
• In VALUES; calculate the count of candidates from each COUNTRY and GENDER type. Remove GRAND TOTALs.

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

• Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

    • Starting from range H4, get the distinct GENDER values. Use the remove duplicates option and transpose the data.
    • Starting from range G5, get the distinct COUNTRY values (Note: use COUNTRY NAMES).
    • In the cross table, get the count of candidates from each COUNTRY and GENDER type.

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

• Verified the data for any missing values and anomalies, and sorted them out.
    • Made sure the data is consistent and clean with respect to data type, data format, and values used.
    • Created pivot tables according to the questions asked.
  20. Hyderabad_house_price

    • kaggle.com
    zip
    Updated Jul 1, 2024
    Cite
    Mohammed Faisal Parvez (2024). Hyderabad_house_price [Dataset]. https://www.kaggle.com/datasets/faisal012/hyderabad-house-price
    Explore at:
zip (43970 bytes)
    Dataset updated
    Jul 1, 2024
    Authors
    Mohammed Faisal Parvez
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Hyderabad
    Description

    Dataset Description: Hyderabad City House Prices

    Overview

    The Hyderabad City House Prices dataset is a detailed collection of real estate data for residential properties across various localities in Hyderabad. This dataset is aimed at real estate analysts, data scientists, urban planners, and researchers who are interested in studying the housing market, price trends, and neighborhood dynamics within Hyderabad, one of India's rapidly growing metropolitan cities.

    Features

    The dataset includes the following features:

    1. Title: The headline or main title of the property listing.
    2. Location: Specific address or locality details within Hyderabad.
    3. Price (L): The listed price of the property in Indian Lakhs.
    4. Rate per Sqft: The cost per square foot of the property.
    5. Area in Sqft: The total area of the property in square feet.
    6. Building Status: The construction status of the property (e.g., Under Construction, Ready to Move).

    Usage

This dataset can be utilized for various purposes, including:

    • Market Analysis: Understanding pricing trends, supply and demand, and market conditions in different localities of Hyderabad.
    • Price Prediction Models: Developing machine learning models to predict property prices based on the given features.
    • Investment Analysis: Identifying potential investment opportunities by analyzing location, property type, and price data.
    • Urban Planning: Assisting urban planners in understanding housing distribution and development trends across the city.

    Data Collection

    The data has been scraped from popular real estate websites such as Magicbricks, 99acres, and Housing.com using the Scrapy framework. The data was collected in [insert month/year] and represents a snapshot of the real estate market in Hyderabad at that time.

    Sample Data

Title | Location | Price (L) | Rate per Sqft | Area in Sqft | Building Status
    Luxurious 3 BHK Apartment | Jubilee Hills | 300 | 15,000 | 2000 | Ready to Move
    Spacious 4 BHK Villa | Gachibowli | 450 | 10,000 | 4500 | Under Construction
    Affordable 2 BHK Flat | Madhapur | 80 | 8,000 | 1000 | Ready to Move

    Contact

    For more information or to access the dataset, please contact [Your Name] at [Your Email Address].

    This dataset provides valuable insights into Hyderabad's diverse real estate market, helping stakeholders make informed decisions based on accurate and up-to-date data.
