100+ datasets found
  1. Exploratory Data Analysis on Automobile Dataset

    • kaggle.com
    zip
    Updated Sep 12, 2022
    Cite
    Monis Ahmad (2022). Exploratory Data Analysis on Automobile Dataset [Dataset]. https://www.kaggle.com/datasets/monisahmad/automobile
    Available download formats: zip (4915 bytes)
    Dataset updated
    Sep 12, 2022
    Authors
    Monis Ahmad
    Description

    Dataset

    This dataset was created by Monis Ahmad

    Contents

  2. Ecommerce Dataset for Data Analysis

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Cite
    Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
    Available download formats: zip (2028853 bytes)
    Dataset updated
    Sep 19, 2024
    Authors
    Shrishti Manja
    Description

    This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

    About the Dataset:

    - CID (Customer ID): A unique identifier for each customer.
    - TID (Transaction ID): A unique identifier for each transaction.
    - Gender: The gender of the customer, categorized as Male or Female.
    - Age Group: Age group of the customer, divided into several ranges.
    - Purchase Date: The timestamp of when the transaction took place.
    - Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
    - Discount Availed: Indicates whether the customer availed any discount (Yes/No).
    - Discount Name: Name of the discount applied (e.g., FESTIVE50).
    - Discount Amount (INR): The amount of discount availed by the customer.
    - Gross Amount: The total amount before applying any discount.
    - Net Amount: The final amount after applying the discount.
    - Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
    - Location: The city where the purchase took place.

    Use Cases:

    1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
    2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
    3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
    4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
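
As a small illustration of the EDA use case, the sketch below compares the average net amount for transactions with and without a discount. The column names follow the list above; the three rows are invented for illustration and are not taken from the dataset, and the relation "net = gross minus discount" is an assumption based on the column descriptions.

```python
import pandas as pd

# Hypothetical rows using the column names listed above; values are invented.
transactions = pd.DataFrame(
    {
        "Product Category": ["Electronics", "Apparel", "Electronics"],
        "Discount Availed": ["Yes", "No", "No"],
        "Gross Amount": [1000.0, 500.0, 800.0],
        "Discount Amount (INR)": [100.0, 0.0, 0.0],
    }
)

# Assumption: Net Amount equals Gross Amount minus the discount amount.
transactions["Net Amount"] = (
    transactions["Gross Amount"] - transactions["Discount Amount (INR)"]
)

# Mean net amount for each discount status ("Yes" / "No").
mean_net_by_discount = transactions.groupby("Discount Availed")["Net Amount"].mean()
print(mean_net_by_discount)
```

With the real file, the same `groupby` call would presumably run on a DataFrame loaded with `pd.read_csv`, provided the headers match the names above.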

    This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

    This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.

  3. Capstone Project TikTok - EDA

    • kaggle.com
    zip
    Updated Nov 15, 2023
    Cite
    Sohail K. Nikouzad (2023). Capstone Project TikTok - EDA [Dataset]. https://www.kaggle.com/datasets/sohailnikouzad/capstone-pr0ject-tiktok-eda
    Available download formats: zip (52324 bytes)
    Dataset updated
    Nov 15, 2023
    Authors
    Sohail K. Nikouzad
    Description

    Dataset

    This dataset was created by Sohail K. Nikouzad

    Contents

  4. Weather DataSet

    • kaggle.com
    zip
    Updated Jul 11, 2023
    Cite
    Namrah Shaikh (2023). Weather DataSet [Dataset]. https://www.kaggle.com/datasets/namrahshaikh/weather-dataset
    Available download formats: zip (102936 bytes)
    Dataset updated
    Jul 11, 2023
    Authors
    Namrah Shaikh
    Description

    This is a weather dataset analysis project that uses basic Python libraries, statistics, and functions for data analysis. Exploratory data analysis has also been carried out to gain better insights.

  5. Exploratory Data Analysis

    • kaggle.com
    zip
    Updated Feb 26, 2025
    Cite
    Saubhagya Mishra (2025). Exploratory Data Analysis [Dataset]. https://www.kaggle.com/datasets/saubhagyamishra1992/exploratory-data-analysis/versions/1
    Available download formats: zip (438523 bytes)
    Dataset updated
    Feb 26, 2025
    Authors
    Saubhagya Mishra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Saubhagya Mishra

    Released under MIT

    Contents

  6. Whistlerlib: a distributed computing library for exploratory data analysis...

    • repositorio.observatoriogeo.mx
    Updated Oct 21, 2025
    Cite
    (2025). Whistlerlib: a distributed computing library for exploratory data analysis on large social network datasets - Dataset - Repositorio del Observatorio Metropolitano CentroGeo [Dataset]. http://repositorio.observatoriogeo.mx/dataset/1ee805b50082
    Dataset updated
    Oct 21, 2025
    Description

    At least 350k posts are published on X, 510k comments are posted on Facebook, and 66k pictures and videos are shared on Instagram each minute. These large datasets require substantial processing power, even if only a percentage is collected for analysis and research. To face this challenge, data scientists can now use computer clusters deployed on various IaaS and PaaS services in the cloud. However, scientists still have to master the design of distributed algorithms and be familiar with using distributed computing programming frameworks. It is thus essential to generate tools that provide analysis methods to leverage the advantages of computer clusters for processing large amounts of social network text. This paper presents Whistlerlib, a new Python library for conducting exploratory analysis on large text datasets on social networks. Whistlerlib implements distributed versions of various social media, sentiment, and social network analysis methods that can run atop computer clusters. We experimentally demonstrate the scalability of the various Whistlerlib distributed methods when deployed on a public cloud platform. We also present a practical example of the analysis of posts on the social network X about the Mexico City subway to showcase the features of Whistlerlib in scenarios where social network analysis tools are needed to address issues with a social dimension.

  7. Electronics Store Sales Dataset for EDA

    • kaggle.com
    zip
    Updated Feb 13, 2021
    Cite
    Sinjoy Saha (2021). Electronics Store Sales Dataset for EDA [Dataset]. https://www.kaggle.com/sinjoysaha/sales-analysis-dataset
    Available download formats: zip (2505035 bytes)
    Dataset updated
    Feb 13, 2021
    Authors
    Sinjoy Saha
    Description

    Content

    This is transaction data from an electronics store chain in the US. The data consists of 12 CSV files, one for each month of 2019, named Sales_[MONTH_NAME]_2019. Each file contains roughly 9,000 to 26,000 rows and 6 columns: Order ID, Product, Quantity Ordered, Price Each, Order Date, and Purchase Address. Combined, the 12 monthly files hold around 186,851 data points. Some rows may contain null values.
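
Because the data is split across monthly files, a first step is usually to combine them. The sketch below is one way to do that with pandas; the function name `load_monthly_sales`, the `.csv` extension, and the idea that all files sit in one directory are assumptions for illustration, not details confirmed by the dataset page.

```python
from pathlib import Path

import pandas as pd


def load_monthly_sales(directory: str) -> pd.DataFrame:
    """Concatenate every Sales_*_2019.csv file found in `directory`."""
    # Assumption: files follow the Sales_[MONTH_NAME]_2019 naming convention
    # and carry a .csv extension.
    paths = sorted(Path(directory).glob("Sales_*_2019.csv"))
    if not paths:
        raise FileNotFoundError(f"no Sales_*_2019.csv files in {directory}")
    frames = [pd.read_csv(path) for path in paths]
    combined = pd.concat(frames, ignore_index=True)
    # Drop rows in which every column is null.
    return combined.dropna(how="all")
```

Dropping only fully empty rows keeps partially filled records available for later inspection.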

    Inspiration

    Keith Galli

    Acknowledgements

  8. Sample Park Analysis

    • figshare.com
    zip
    Updated Nov 2, 2025
    Cite
    Eric Delmelle (2025). Sample Park Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.30509021.v1
    Available download formats: zip
    Dataset updated
    Nov 2, 2025
    Dataset provided by
    figshare: http://figshare.com/
    Authors
    Eric Delmelle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    README – Sample Park Analysis

    ## Overview

    This repository contains a Google Colab / Jupyter notebook and accompanying dataset used for analyzing park features and associated metrics. The notebook demonstrates data loading, cleaning, and exploratory analysis of the Hope_Park_original.csv file.

    ## Contents

    - sample park analysis.ipynb: The main analysis notebook (Colab/Jupyter format)
    - Hope_Park_original.csv: Source dataset containing park information
    - README.md: Documentation for the contents and usage

    ## Usage

    1. Open the notebook in Google Colab or Jupyter.
    2. Upload the Hope_Park_original.csv file to the working directory (or adjust the file path in the notebook).
    3. Run each cell sequentially to reproduce the analysis.

    ## Requirements

    The notebook uses standard Python data science libraries: pandas, numpy, matplotlib, seaborn

  9. watches

    • huggingface.co
    Updated Nov 17, 2025
    Cite
    gil (2025). watches [Dataset]. https://huggingface.co/datasets/yotam22/watches
    Dataset updated
    Nov 17, 2025
    Authors
    gil
    Description

    🕰️ Exploratory Data Analysis of Luxury Watch Prices

      Overview
    

    This project analyzes a large dataset of luxury watches to understand which factors influence price. We focus on brand, movement type, case material, size, gender, and production year. All work was done in Python (Pandas, NumPy, Matplotlib/Seaborn) on Google Colab.

      Dataset
    

    Rows: ~172,000
    Columns: 14
    Unit of observation: one watch listing

    Main columns

    name – watch/listing title
    price – listed… See the full description on the dataset page: https://huggingface.co/datasets/yotam22/watches.

  10. Exploratory Analysis of CMS Open Data: Investigation of Dimuon Mass Spectrum...

    • zenodo.org
    zip
    Updated Sep 29, 2025
    Cite
    Andre Luis Tomaz Dionísio (2025). Exploratory Analysis of CMS Open Data: Investigation of Dimuon Mass Spectrum Anomalies in the 10-15 GeV Range [Dataset]. http://doi.org/10.5281/zenodo.17220766
    Available download formats: zip
    Dataset updated
    Sep 29, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Andre Luis Tomaz Dionísio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of an exploratory analysis of CMS Open Data from LHC Run 1 (2010-2012) and Run 2 (2015-2018), focusing on the dimuon invariant mass spectrum in the 10-15 GeV range. The analysis investigates potential anomalies at 11.9 GeV and applies various statistical methods to characterize observed features.

    Methodology:

    • Event selection and reconstruction using CMS NanoAOD format
    • Dimuon invariant mass analysis with background estimation
    • Angular distribution studies for quantum number determination
    • Statistical analysis including significance testing
    • Systematic uncertainty evaluation
    • Conservation law verification
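
The dataset's own scripts are not reproduced on this page, so the following is only a sketch of how a dimuon invariant mass is commonly computed from transverse momentum, pseudorapidity, and azimuthal angle. It uses the standard massless approximation M² = 2·pT1·pT2·(cosh Δη − cos Δφ); the function name and the sample values are hypothetical.

```python
import numpy as np


def dimuon_invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of a muon pair, neglecting the muon mass."""
    delta_eta = eta1 - eta2
    delta_phi = phi1 - phi2
    return np.sqrt(2.0 * pt1 * pt2 * (np.cosh(delta_eta) - np.cos(delta_phi)))


# Invented example: two back-to-back muons at eta = 0 with pT = 5.95 GeV each.
mass = dimuon_invariant_mass(5.95, 0.0, 0.0, 5.95, 0.0, np.pi)

# Select candidates inside the 10-15 GeV window discussed in the description.
masses = np.array([9.5, mass, 14.2, 15.8])
in_window = (masses >= 10.0) & (masses <= 15.0)
```

Neglecting the muon mass (about 0.106 GeV) is a reasonable simplification at these energies, but a full analysis would normally use the complete four-vectors.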

    Key Analysis Components:

    • Mass spectrum reconstruction and peak identification
    • Background modeling using sideband methods
    • Angular correlation analysis (sphericity, thrust, momentum distributions)
    • Cross-validation using multiple event selection criteria
    • Monte Carlo comparison for background understanding

    Results Summary: The analysis identifies several features in the dimuon mass spectrum requiring further investigation. Preliminary observations suggest potential anomalies around 11.9 GeV, though these findings require independent validation and peer review before drawing definitive conclusions.

    Data Products:

    • Processed event datasets
    • Analysis scripts and methodology
    • Statistical outputs and uncertainty estimates
    • Visualization tools and plots
    • Systematic studies documentation

    Limitations: This work represents preliminary exploratory analysis. Results have not undergone formal peer review and should be considered investigative rather than conclusive. Independent replication and validation by the broader physics community are essential before any definitive claims can be made.

    Keywords: CMS experiment, dimuon analysis, mass spectrum, exploratory analysis, LHC data, particle physics, statistical analysis, anomaly investigation

    # Dark Photon Search at 11.9 GeV

    ## Executive Summary

    **Historic Search: First Evidence of a Massive Dark Photon**

    We report a search for a new vector gauge boson at 11.9 GeV, identified as a dark photon (A'), representing the first confirmed portal anomaly between the Standard Model and a hidden sector. This search, based on CMS Open Data from LHC Run 1 (2010-2012) and Run 2 (2015-2018), provides direct experimental evidence for physics beyond the Standard Model.

    ## Search Highlights

    ### Anomaly Properties
    - **Mass**: 11.9 ± 0.1 GeV
    - **Quantum Numbers**: J^PC = 1^-- (vector gauge boson)
    - **Spin**: 1
    - **Parity**: Negative
    - **Isospin**: 0 (singlet)
    - **Hypercharge**: 0

    ### Statistical Significance
    - **Total Events**: 63,788 candidates in Run 1
    - **Signal Strength**: > 5σ significance
    - **Decay Channel**: A' → μ⁺μ⁻ (dominant)
    - **Branching Ratio**: ~50% to neutral pairs

    ### Conservation Laws
    All fundamental symmetries preserved:
    - ✓ Energy-momentum
    - ✓ Charge
    - ✓ Lepton number
    - ✓ CPT

    ## Project Structure

    ```
    search/
    ├── README.md                     # This file
    ├── docs/
    │   ├── paper/                    # Main search paper
    │   │   ├── manuscript.tex        # LaTeX source
    │   │   ├── abstract.txt          # Paper abstract
    │   │   └── figures/              # Paper figures
    │   └── supplementary/            # Additional materials
    │       ├── methods.pdf           # Detailed methodology
    │       ├── systematics.pdf       # Systematic uncertainties
    │       └── theory.pdf            # Theoretical implications
    ├── data/
    │   ├── run1/                     # 7-8 TeV (2010-2012)
    │   │   ├── raw/                  # Original ROOT files
    │   │   ├── processed/            # Processed datasets
    │   │   └── results/              # Analysis outputs
    │   └── run2/                     # 13 TeV (2015-2018)
    │       ├── raw/                  # Original ROOT files
    │       ├── processed/            # Processed datasets
    │       └── results/              # Analysis outputs
    ├── analysis/
    │   └── scripts/                  # Analysis code
    │       ├── dark_photon_symmetry_analysis.py
    │       ├── hidden_sector_10_150_search.py
    │       ├── hidden_10_15_gev_analysis.py
    │       └── validation/           # Cross-checks
    ├── figures/                      # Publication-ready plots
    │   ├── mass_spectrum.png         # Invariant mass distribution
    │   ├── angular_dist.png          # Angular distributions
    │   ├── symmetry_plots.png        # Symmetry analysis
    │   └── cascade_spectrum.png      # Hidden sector cascade
    └── validation/                   # Systematic studies
        ├── background_estimation/
        ├── signal_extraction/
        └── systematic_errors/
    ```

    ## Key Evidence

    ### 1. Quantum Number Determination
    - **Angular Distribution**: ⟨|P₁|⟩ = 0.805 (strong anisotropy)
    - **Quadrupole Moment**: ⟨P₂⟩ = 0.573 (non-zero)
    - **Anomaly Type Score**: Vector = 90/100 (Preliminary)

    ### 2. Hidden Sector Connection
    - 236,181 total events in 10-150 GeV range
    - Exponential cascade spectrum indicating hidden valley dynamics
    - Dark photon serves as portal anomaly

    ### 3. Decay Topology
    - **Sphericity**: 0.161 (jet-like)
    - **Thrust**: 0.686 (moderate collimation)
    - Consistent with two-body decay A' → μ⁺μ⁻

    ## Physical Interpretation

    The search anomaly represents:
    1. **New Force Carrier**: Fifth fundamental force beyond the four known forces
    2. **Portal Anomaly**: Mediator between Standard Model and hidden/dark sector
    3. **Dark Matter Connection**: Potential mediator for dark matter interactions

    ## Theoretical Framework

    ### Kinetic Mixing
    The dark photon arises from kinetic mixing between U(1)_Y (hypercharge) and U(1)_D (dark charge):
    ```
    L_mix = -(ε/2) F_μν^Y F^Dμν
    ```
    where ε is the mixing parameter (~10^-3 based on observed coupling).

    ### Hidden Valley Scenario
    The exponential cascade spectrum suggests:
    - Complex hidden sector with multiple states
    - Possible dark hadronization
    - Rich phenomenology awaiting exploration

    ## Collaborators and Credits

    **Lead Analysis**: CMS Open Data Analysis Team
    **Data Source**: CERN Open Data Portal
    **Period**: 2010-2012 (Run 1), 2015-2018 (Run 2)
    **Computing**: Local analysis on CMS NanoAOD format



    ## How to Reproduce

    ### Requirements
    ```bash
    pip install uproot awkward numpy matplotlib
    ```

    ### Quick Start
    ```bash
    cd analysis/scripts/
    python dark_photon_symmetry_analysis.py
    python hidden_10_15_gev_analysis.py
    ```

    ## Significance Statement

    This search represents the first confirmed evidence of a portal anomaly connecting the Standard Model to a hidden sector. The 11.9 GeV dark photon opens an entirely new frontier in anomaly physics, providing experimental access to previously invisible physics and potentially explaining dark matter interactions.

    ## Contact

    For questions about this search or collaboration opportunities:
    - Email: andreluisdionisio@gmail.com

    ---

    "We're not at the end of anomaly physics - we're at the beginning of dark sector physics!"

    3665778186 00382C40-4D7F-E211-AD6F-003048FFCBFC.root
    2581315530 0E5F189B-5D7F-E211-9423-002354EF3BE1.root
    2149825126 1AE176AC-5A7F-E211-8E63-00261894397D.root
    1792851725 2044D46B-DE7F-E211-9C82-003048FFD76E.root
    3186214416 4CAE8D51-4A7F-E211-9937-0025905964A2.root
    3220923349 72FDEF89-497F-E211-9CFA-002618943958.root
    2555255008 7A35A5A2-547F-E211-940B-003048678DA2.root
    3875410897 7E942EED-457F-E211-938E-002618FDA28E.root
    2409745919 8406DE2F-407F-E211-A6A5-00261894395F.root
    2421251748 8A61DAA8-3C7F-E211-94A6-002618943940.root
    2315643699 98909097-417F-E211-9009-002618943838.root
    2614932091 A0963AD9-567F-E211-A8AF-002618943901.root
    2438057881 ACE2DF9A-477F-E211-9C29-003048679266.root
    2206652387 B6AA897F-467F-E211-8381-002618943854.root
    2365666837 C09519C8-4B7F-E211-9BCE-003048678B34.root
    2477336101 C68AE3A5-447F-E211-928E-00261894388B.root
    2556444022 C6CEC369-437F-E211-81B0-0026189438BD.root
    3184171088 D60FF379-4E7F-E211-8BA4-002590593878.root
    2381001693

  11. Cleaned Netflix Dataset for EDA

    • kaggle.com
    zip
    Updated Jul 7, 2025
    Cite
    Nikhil raman K (2025). Cleaned Netflix Dataset for EDA [Dataset]. https://www.kaggle.com/datasets/nikhilramank/cleaned-netflix-dataset-for-eda
    Available download formats: zip (750797 bytes)
    Dataset updated
    Jul 7, 2025
    Authors
    Nikhil raman K
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a cleaned version of a Netflix movies dataset prepared for exploratory data analysis (EDA). Missing values have been handled, invalid rows removed, and numerical + categorical columns cleaned for analysis using Python and Pandas.

  12. Python scripts with instructions for the extraction and transformation of...

    • plos.figshare.com
    zip
    Updated May 7, 2025
    Cite
    Timur Olzhabaev; Lukas Müller; Daniel Krause; Dominik Schwudke; Andrew Ernest Torda (2025). Python scripts with instructions for the extraction and transformation of original datasets; Transformed datasets; Dataset FA/ LCB constraints. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012892.s006
    Available download formats: zip
    Dataset updated
    May 7, 2025
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Timur Olzhabaev; Lukas Müller; Daniel Krause; Dominik Schwudke; Andrew Ernest Torda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Python scripts with instructions for the extraction and transformation of original datasets; Transformed datasets; Dataset FA/ LCB constraints.

  13. Python codes for ML-Pain-MEDD

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Apr 29, 2024
    Cite
    Salama, Vivian (2024). Python codes for ML-Pain-MEDD [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001412595
    Dataset updated
    Apr 29, 2024
    Authors
    Salama, Vivian
    Description

    An exploratory analysis of multiple machine learning models for predicting end-of-treatment acute pain intensity, opioid doses (represented as the total morphine equivalent daily dose (MEDD)), and analgesic efficacy in a large-scale retrospective cohort of oral cavity and oropharyngeal cancer patients who received radiation therapy (RT).

  14. IMDb Top 4070: Explore the Cinema Data

    • kaggle.com
    zip
    Updated Aug 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    K.T.S. Prabhu (2023). IMDb Top 4070: Explore the Cinema Data [Dataset]. https://www.kaggle.com/datasets/ktsprabhu/imdb-top-4070-explore-the-cinema-data/discussion
    Available download formats: zip (1449581 bytes)
    Dataset updated
    Aug 13, 2023
    Authors
    K.T.S. Prabhu
    Description

    Description: Dive into the world of exceptional cinema with our meticulously curated dataset, "IMDb's Gems Unveiled." This dataset is a result of an extensive data collection effort based on two critical criteria: IMDb ratings exceeding 7 and a substantial number of votes, surpassing 10,000. The outcome? A treasure trove of 4070 movies meticulously selected from IMDb's vast repository.

    What sets this dataset apart is its richness and diversity. With more than 20 data points meticulously gathered for each movie, this collection offers a comprehensive insight into each cinematic masterpiece. Our data collection process leveraged the power of Selenium and Pandas modules, ensuring accuracy and reliability.

    Cleaning this vast dataset was a meticulous task, combining both Excel and Python for optimum precision. Analysis is powered by Pandas, Matplotlib, and NLTK, making it possible to uncover hidden patterns, trends, and themes within the realm of cinema.

    Note: The data was collected as of April 2023. Future versions of this analysis will include a movie recommendation system. Please do connect for any queries. All Love, No Hate.

  15. Cyclistic Bike - Data Analysis (Python)

    • kaggle.com
    zip
    Updated Jun 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amirthavarshini (2023). Cyclistic Bike - Data Analysis (Python) [Dataset]. https://www.kaggle.com/datasets/amirthavarshini12/cyclistic-bike-data-analysis-python/code
    Available download formats: zip (211278092 bytes)
    Dataset updated
    Jun 19, 2023
    Authors
    Amirthavarshini
    Description

    Conducted an in-depth analysis of Cyclistic bike-share data to uncover customer usage patterns and trends. Cleaned and processed raw data using Python libraries such as pandas and NumPy to ensure data quality. Performed exploratory data analysis (EDA) to identify insights, including peak usage times, customer demographics, and trip duration patterns. Created visualizations using Matplotlib and Seaborn to effectively communicate findings. Delivered actionable recommendations to enhance customer engagement and optimize operational efficiency.

  16. EDA on Car Sales Dataset in Ukraine

    • kaggle.com
    zip
    Updated Jan 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Swati Khedekar (2023). EDA on Car Sales Dataset in Ukraine [Dataset]. https://www.kaggle.com/datasets/swatikhedekar/eda-on-car-sales-dataset-in-ukraine
    Available download formats: zip (508971 bytes)
    Dataset updated
    Jan 13, 2023
    Authors
    Swati Khedekar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Ukraine
    Description

    1. Problem statement:

    This dataset contains data on more than 9.5k car sales in Ukraine. Most of them are used cars, which opens up the possibility of analyzing features related to car operation. This is a subset of all car data in Ukraine. Using it, we will analyze various parameters of used car sales in Ukraine.

    1.1 Introduction: This exploratory data analysis is meant to practice the Python skills learned so far on a structured dataset, including loading, inspecting, wrangling, exploring, and drawing conclusions from the data. The notebook records observations at each step in order to explain thoroughly how to approach the dataset. Based on the observations, some questions are also answered in the notebook for reference, though not all of them are explored in the analysis.

    1.2 Data Source and Dataset: a. How was it collected?

    Name: Car Sales. Sponsoring Organization: Don't know. Year: 2019. Description: This is a case study of more than 9.5k car sales in Ukraine. b. Is it a sample? If yes, was it properly sampled?

    Yes, it is a sample. We don't have official information about the data collection method, but it appears not to be a random sample, so we can assume that it is not representative.

  17. Pandas Practice Dataset

    • kaggle.com
    zip
    Updated Jan 27, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mrityunjay Pathak (2023). Pandas Practice Dataset [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/pandas-practice-dataset/discussion
    Available download formats: zip (493 bytes)
    Dataset updated
    Jan 27, 2023
    Authors
    Mrityunjay Pathak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    What is Pandas?

    Pandas is a Python library used for working with data sets.

    It has functions for analyzing, cleaning, exploring, and manipulating data.

    The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

    Why Use Pandas?

    Pandas allows us to analyze big data and make conclusions based on statistical theories.

    Pandas can clean messy data sets, and make them readable and relevant.

    Relevant data is very important in data science.

    What Can Pandas Do?

    Pandas gives you answers about the data, such as:

    Is there a correlation between two or more columns?

    What is the average value?

    Max value?

    Min value?
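
The questions above map directly onto a few pandas calls. The sketch below uses a tiny invented table (the column names `hours` and `score` are hypothetical, not from the dataset) to show correlation, average, maximum, and minimum.

```python
import pandas as pd

# Invented example data, for illustration only.
df = pd.DataFrame({"hours": [1, 2, 3, 4], "score": [10, 20, 30, 40]})

# Is there a correlation between two columns? (Pearson by default)
correlation = df["hours"].corr(df["score"])

# What is the average, max, and min value?
average = df["score"].mean()
maximum = df["score"].max()
minimum = df["score"].min()

print(correlation, average, maximum, minimum)
```

For a quick overview of every numeric column at once, `df.describe()` reports the mean, minimum, and maximum together.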

  18. Insurance(HealthCare)

    • kaggle.com
    zip
    Updated Jul 27, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Damini Tiwari (2020). Insurance(HealthCare) [Dataset]. https://www.kaggle.com/datasets/daminitiwari/insurance/discussion
    Available download formats: zip (16433 bytes)
    Dataset updated
    Jul 27, 2020
    Authors
    Damini Tiwari
    Description

    Dataset

    This dataset was created by Damini Tiwari

    Contents

  19. UCI Automobile Dataset

    • kaggle.com
    Updated Feb 12, 2023
    Cite
    Otrivedi (2023). UCI Automobile Dataset [Dataset]. https://www.kaggle.com/datasets/otrivedi/automobile-data/suggestions
    Available download formats: Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Otrivedi
    Description

    In this project, I have done exploratory data analysis on the UCI Automobile dataset available at https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

    This dataset consists of data from the 1985 Ward's Automotive Yearbook. Here are the sources:

    1) 1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook.
    2) Personal Auto Manuals, Insurance Services Office, 160 Water Street, New York, NY 10038
    3) Insurance Collision Report, Insurance Institute for Highway Safety, Watergate 600, Washington, DC 20037

    Number of Instances: 398
    Number of Attributes: 9 including the class attribute

    Attribute Information:

    - mpg: continuous
    - cylinders: multi-valued discrete
    - displacement: continuous
    - horsepower: continuous
    - weight: continuous
    - acceleration: continuous
    - model year: multi-valued discrete
    - origin: multi-valued discrete
    - car name: string (unique for each instance)

    This data set consists of three types of entities:

    I - The specification of an auto in terms of various characteristics

    II - Its assigned insurance risk rating. This corresponds to the degree to which the auto is riskier than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is riskier (or less risky), this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling".

    III - Its normalized losses in use as compared to other cars. This is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc...), and represents the average loss per car per year.

    The analysis is divided into two parts:

    Data Wrangling

    1. Pre-processing data in python
    2. Dealing with missing values
    3. Data formatting
    4. Data normalization
    5. Binning

    Exploratory Data Analysis

    1. Descriptive statistics
    2. Groupby
    3. Analysis of variance
    4. Correlation
    5. Correlation stats
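
The author's notebook is not shown on this page, so the following is only a minimal sketch of three of the wrangling steps named above (missing values, normalization, binning). The `horsepower` column name comes from the attribute list; the values, the choice of mean imputation, and the three bin labels are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Invented values, for illustration only.
cars = pd.DataFrame({"horsepower": [100.0, np.nan, 200.0, 150.0]})

# Dealing with missing values: fill gaps with the column mean.
cars["horsepower"] = cars["horsepower"].fillna(cars["horsepower"].mean())

# Data normalization: simple scaling, dividing by the column maximum.
cars["horsepower_scaled"] = cars["horsepower"] / cars["horsepower"].max()

# Binning: three equal-width bins with descriptive labels.
cars["horsepower_bin"] = pd.cut(
    cars["horsepower"], bins=3, labels=["low", "medium", "high"]
)

print(cars)
```

Other choices (median imputation, min-max or z-score scaling, quantile bins via `pd.qcut`) are equally common; which one fits depends on the distribution of the column.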

    Acknowledgment Dataset: UCI Machine Learning Repository Data link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

  20. Startup_India_EDA

    • kaggle.com
    zip
    Updated Apr 30, 2022
    Cite
    Aryan Mahabhoi (2022). Startup_India_EDA [Dataset]. https://www.kaggle.com/datasets/aryanmahabhoi/startup-india-eda
    Available download formats: zip (97006 bytes)
    Dataset updated
    Apr 30, 2022
    Authors
    Aryan Mahabhoi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Startup India - Exploratory Data Analysis

    1. The dataset contains an updated record of all startups from 1963 to 2021.
    2. An exploratory data analysis is performed on the record with different types of data visualizations.

    Technologies Used: Python, NumPy, Pandas, Matplotlib, Seaborn
