License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset includes 12 files of monthly data from January 2022 to December 2022. The data is reliable because it is primary-source data from the company, Cyclistic Bike Share. All the information needed to conduct the analysis is included, so the data is comprehensive, and it was evaluated against the ROCCC criteria. RStudio 2022.12.0+353 "Elsbeth Geranium" was used to evaluate the data. Although there are some missing values, data cleaning ensured the results were not affected with respect to my main area of interest.
The main aim of my study is to differentiate how annual members and casual riders use Cyclistic bikes. My dataset and notebook include a clear statement of the business task as well as a clear description of all the data sources I used. A summary of my analysis is also included in the notebook. To make the analysis easier for readers to follow, I supported it with visualisations and key findings.
At the end of my notebook, you can find the recommendations I made based on the analysis. I would be more than happy to receive any feedback, advice, and comments.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Thorough knowledge of the structure of analyzed data makes it possible to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Given the multitude of available methods, selecting those that work well together and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case-study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward's algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex-hormone attributes. Further analysis was carried out separately for male and female patients. The optimal partitioning of the male set resulted in five subgroups. Two of them were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male dataset; no evidence of pathological patient subgroups was found. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also assess the heterogeneity of a dataset. The case study proved that our procedure is well suited for the identification and visualization of biologically meaningful patient subgroups.
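To make the four-step procedure concrete, here is a compact Python sketch assembled from standard libraries; the paper's actual implementation is not specified, and the file name and parameter choices below are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.covariance import EmpiricalCovariance, MinCovDet
from sklearn.decomposition import PCA
from sklearn.preprocessing import RobustScaler

df = pd.read_csv("patients.csv")        # hypothetical table of biochemical attributes
X = RobustScaler().fit_transform(df)    # 1) robust normalization (median/IQR based)

# 2) outlier screening: classical (MD) vs. robust (rMD) squared Mahalanobis distances
md = EmpiricalCovariance().fit(X).mahalanobis(X)
rmd = MinCovDet(random_state=0).fit(X).mahalanobis(X)

# 3) hierarchical clustering with Ward's algorithm, cut into five subgroups
labels = fcluster(linkage(X, method="ward"), t=5, criterion="maxclust")

# 4) PCA; scaled loadings act as biplot vectors
pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)
biplot_vectors = pca.components_.T * np.sqrt(pca.explained_variance_)
```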
This dataset was created by Rahul Sharma.
License: https://creativecommons.org/publicdomain/zero/1.0/
Problem Statement-
Bike-sharing systems allow users to rent a bicycle at one location and return it at another; this dataset covers the bike-sharing system in Washington, D.C.
You are provided with rental data spanning two years. Your task is to predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
This bike rental dataset is intended for practicing pandas profiling; it contains numerical values.
Tasks to perform:
1. Perform exploratory data analysis.
2. Use pandas profiling.
3. Compare the pandas profiling report with the exploratory data analysis.
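A minimal sketch of the profiling task, assuming the current ydata-profiling package (the successor to pandas-profiling) and a hypothetical file name:

```python
import pandas as pd
from ydata_profiling import ProfileReport  # successor to pandas_profiling

df = pd.read_csv("bike_rentals.csv")  # hypothetical file name
profile = ProfileReport(df, title="Bike Rental EDA", explorative=True)
profile.to_file("bike_rental_report.html")  # compare this report with your manual EDA
```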
License: https://choosealicense.com/licenses/other/
Assignment 1: EDA - US Company Bankruptcy Prediction
Student Name: Reef Zehavi
Date: November 10, 2025
📹 Project Presentation Video
https://www.loom.com/share/6920e493e8654ef3bb4f67a10eb9b03d
1. Overview and Project Goal
The goal of this project is to perform Exploratory Data Analysis (EDA) on a fundamental dataset of American companies. The analysis focuses on understanding the financial characteristics that differentiate between companies that survived… See the full description on the dataset page: https://huggingface.co/datasets/reefzehavi/EDA-US-Bankruptcy-Prediction.
Customer Personality Analysis – EDA Results
1. Project Goal
The goal of this project is to use numeric-focused Exploratory Data Analysis (EDA) on the Customer Personality Analysis dataset to understand:
- Which customer characteristics are associated with higher spending.
- How these characteristics differ between customers who responded to the last marketing campaign and those who did not.
The main outcome variable is:
Response (0 = no, 1 = yes) – did the customer respond… See the full description on the dataset page: https://huggingface.co/datasets/maigurski/maigurski-customer-personality-assignment1.
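A small pandas sketch of the comparison described above, assuming the column names commonly seen in the Kaggle release of this dataset (tab-separated file, Mnt* spending columns, Income, Response); treat these as assumptions:

```python
import pandas as pd

# assumed file name and separator from the common Kaggle release
df = pd.read_csv("marketing_campaign.csv", sep="\t")
spend_cols = [c for c in df.columns if c.startswith("Mnt")]
df["TotalSpend"] = df[spend_cols].sum(axis=1)

# mean spending and income for responders (1) vs. non-responders (0)
print(df.groupby("Response")[["TotalSpend", "Income"]].mean())
```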
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Diabetes Dataset — Exploratory Data Analysis (EDA)
This repository contains a diabetes-related tabular dataset and a complete Exploratory Data Analysis (EDA). The main objective of this project was to learn how to conduct a structured EDA, apply best practices, and extract meaningful insights from real-world health data.
The analysis includes correlations, distributions, group comparisons, class balance exploration, and statistical interpretations that illustrate how different… See the full description on the dataset page: https://huggingface.co/datasets/guyshilo12/diabetes_eda_analysis.
License: https://creativecommons.org/publicdomain/zero/1.0/
Please upvote this dataset if it helps you... glad to see any forks here!
BACKGROUND
DQLab Telco is a telecommunications company with numerous locations all over the world. In order to ensure that customers are not left behind, DQLab Telco has consistently paid attention to the customer experience since its establishment in 2019.
Even though DQLab Telco is only a little over a year old, many of its customers have already changed their subscriptions to rival companies. By using machine learning, management hopes to lower the number of customers who leave.
Having cleaned the data yesterday, it is now time for us to build the best model to forecast customer churn.
TASKS & STEPS
Yesterday, we completed "Cleansing Data" as part 1 of the project. As a data scientist, you are now expected to develop the appropriate model.
You will perform "Machine Learning Modeling" in this assignment using data from the previous month, specifically June 2020.
The actions that must be taken are:
1. Perform exploratory data analysis.
2. Carry out pre-processing of the data.
3. Build machine learning models.
4. Pick the best model.
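As a hedged sketch of steps 2-4, a baseline churn model might look like the following; the file and column names are assumptions, not part of the DQLab materials.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# assumed cleaned June 2020 extract with a binary "Churn" column
df = pd.read_csv("dqlab_telco_clean_jun2020.csv")
X = pd.get_dummies(df.drop(columns=["Churn"]), drop_first=True)  # encode categoricals
y = df["Churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```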
License: https://choosealicense.com/licenses/other/
Stroke Prediction Dataset — Exploratory Data Analysis (EDA) By Yuval Malka
Project Overview
This project explores the Stroke Prediction Dataset from Kaggle, containing 5,110 rows and 12 features related to demographics, health indicators, and lifestyle factors. The goal is to understand which factors may be associated with the likelihood of having a stroke by performing a full Exploratory Data Analysis (EDA). The target variable is stroke (0 = No Stroke, 1 = Stroke). This README summarizes… See the full description on the dataset page: https://huggingface.co/datasets/Yuvalos/stroke-prediction-eda-yuval-malka.
This collection of files is part of a larger dataset uploaded in support of Low Temperature Geothermal Play Fairway Analysis for the Appalachian Basin (GPFA-AB, DOE Project DE-EE0006726). Phase 1 of the GPFA-AB project identified potential Geothermal Play Fairways within the Appalachian basin of Pennsylvania, West Virginia and New York. This was accomplished through analysis of 4 key criteria: thermal quality, natural reservoir productivity, risk of seismicity, and heat utilization. Each of these analyses represents a distinct project task, with the fifth task encompassing the combination of the 4 risk factors. Supporting data for all five tasks has been uploaded into the Geothermal Data Repository node of the National Geothermal Data System (NGDS).
This submission comprises the data for Thermal Quality Analysis (project task 1) and includes all of the necessary shapefiles, rasters, datasets, code, and references to code repositories that were used to create the thermal resource and risk factor maps as part of the GPFA-AB project. The identified Geothermal Play Fairways are also provided with the larger dataset. Figures (.png) are provided as examples of the shapefiles and rasters. The regional standardized 1 square km grid used in the project is also provided as points (cell centers), polygons, and as a raster. Two ArcGIS toolboxes are available: 1) RegionalGridModels.tbx for creating resource and risk factor maps on the standardized grid, and 2) ThermalRiskFactorModels.tbx for use in making the thermal resource maps and cross sections. These toolboxes contain item description documentation for each model within the toolbox, and for the toolbox itself. This submission also contains three R scripts: 1) AddNewSeisFields.R to add seismic risk data to attribute tables of seismic risk, 2) StratifiedKrigingInterpolation.R for the interpolations used in the thermal resource analysis, and 3) LeaveOneOutCrossValidation.R for the cross validations used in the thermal interpolations.
Some file descriptions make reference to various 'memos'. These are contained within the final report submitted October 16, 2015.
Each zipped file in the submission contains an 'about' document describing the full Thermal Quality Analysis content available, along with key sources, authors, citation, use guidelines, and assumptions, with the specific file(s) contained within the .zip file highlighted.
UPDATE: A newer version of the Thermal Quality Analysis has been added here: https://gdr.openei.org/submissions/879 (also linked below). A newer version of the Combined Risk Factor Analysis has been added here: https://gdr.openei.org/submissions/880 (also linked below). This is one of sixteen associated .zip files relating to thermal resource interpolation results within the Thermal Quality Analysis task of the Low Temperature Geothermal Play Fairway Analysis for the Appalachian Basin. This file contains an ArcGIS toolbox with the following ArcGIS models: WellClipsToWormSections, BufferedRasterToClippedRaster, ExtractThermalPropertiesToCrossSection, AddExtraInfoToCrossSection, and CrossSectionExtraction.
The sixteen files contain the results of the thermal resource interpolation as binary grid (raster) files, images (.png) of the rasters, and toolbox of ArcGIS Models used. Note that raster files ending in “pred” are the predicted mean for that resource, and files ending in “err” are the standard error of the predicted mean for that resource. Leave one out cross validation results are provided for each thermal resource.
Several models were built in order to process the well database with outliers removed. ArcGIS toolbox ThermalRiskFactorModels contains the ArcGIS processing tools used. First, the WellClipsToWormSections model was used to clip the wells to the worm sections (interpolation regions). Then, the 1 square km gridded regions (see series of 14 Worm Based Interpolation Boundaries .zip files) along with the wells in those regions were loaded into R using the rgdal package. Then, a stratified kriging algorithm implemented in the R gstat package was used to create rasters of the predicted mean and the standard error of the predicted mean. The code used to make these rasters is called StratifiedKrigingInterpolation.R Details about the interpolation, and exploratory data analysis on the well data is provided in 9_GPFA-AB_InterpolationThermalFieldEstimation.pdf (Smith, 2015), contained within the final report.
The output rasters from R are brought into ArcGIS for further spatial processing. First, the BufferedRasterToClippedRaster tool is used to clip the interpolations back to the Worm Sections. Then, the Mosaic tool in ArcGIS is used to merge all predicted mean rasters into a single raster, and all error rasters into a single raster for each thermal resource.
A leave one out cross validation was performed on each of the thermal resources. The code used to implement the cross validation is provided in the R script LeaveOneOutCrossValidation.R. The results of the cross validation are given for each thermal resource.
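The project's cross-validation is implemented in R (LeaveOneOutCrossValidation.R, built on gstat). Purely to illustrate the idea, a Python analogue using the pykrige package might look like this; the coordinates and values below are placeholders, not project data.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 50)        # placeholder well coordinates
y = rng.uniform(0, 100, 50)
z = rng.normal(30, 5, 50)          # placeholder thermal attribute values

errors = []
for i in range(len(z)):
    keep = np.arange(len(z)) != i  # leave well i out
    ok = OrdinaryKriging(x[keep], y[keep], z[keep], variogram_model="spherical")
    z_hat, _ = ok.execute("points", np.array([x[i]]), np.array([y[i]]))
    errors.append(z[i] - z_hat[0])

print("LOOCV RMSE:", np.sqrt(np.mean(np.square(errors))))
```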
Other tools provided in this toolbox are useful for creating cross sections of the thermal resource. ExtractThermalPropertiesToCrossSection model extracts the predicted mean and the standard error of predicted mean to the attribute table of a line of cross section. The AddExtraInfoToCrossSection model is then used to add any other desired information, such as state and county boundaries, to the cross section attribute table. These two functions can be combined as a single function, as provided by the CrossSectionExtraction model.
License: http://opendatacommons.org/licenses/dbcl/1.0/
In this notebook, we will walk through solving a complete machine learning problem using a real-world dataset. This was a "homework" assignment given to me for a job application over summer 2018. The entire assignment can be viewed here, and the one-sentence summary is:
Use the provided building energy data to develop a model that can predict a building's Energy Star score, and then interpret the results to find the variables that are most predictive of the score.
This is a supervised, regression machine learning task: given a set of data with targets (in this case the score) included, we want to train a model that can learn to map the features (also known as the explanatory variables) to the target.
- Supervised problem: we are given both the features and the target.
- Regression problem: the target is a continuous variable, in this case ranging from 0 to 100.

During training, we want the model to learn the relationship between the features and the score, so we give it both the features and the answer. Then, to test how well the model has learned, we evaluate it on a testing set where it has never seen the answers!
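To make the setup concrete, here is a minimal sketch of this supervised regression task; the schema and model choice below are illustrative, not the assignment's actual ones.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

data = pd.read_csv("building_energy.csv")                 # hypothetical cleaned file
X = data.drop(columns=["score"]).select_dtypes("number")  # numeric features only
y = data["score"]                                         # continuous target in [0, 100]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```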
Machine Learning Workflow
Although the exact implementation details can vary, the general structure of a machine learning project stays relatively constant:

1. Data cleaning and formatting
2. Exploratory data analysis
3. Feature engineering and selection
4. Establish a baseline and compare several machine learning models on a performance metric
5. Perform hyperparameter tuning on the best model to optimize it for the problem
6. Evaluate the best model on the testing set
7. Interpret the model results to the extent possible
8. Draw conclusions and write a well-documented report

Setting up the structure of the pipeline ahead of time lets us see how one step flows into the other. However, the machine learning pipeline is an iterative procedure, so we don't always follow these steps in a linear fashion. We may revisit a previous step based on results from further down the pipeline. For example, while we may perform feature selection before building any models, we may use the modeling results to go back and select a different set of features. Or, the modeling may turn up unexpected results that mean we want to explore our data from another angle. Generally, you have to complete one step before moving on to the next, but don't feel like once you have finished one step the first time, you cannot go back and make improvements!
This notebook will cover the first three (and a half) steps of the pipeline, with the other parts discussed in two additional notebooks. Throughout this series, the objective is to show how all the different data science practices come together to form a complete project. I try to focus more on the implementations of the methods rather than explaining them at a low level, but I have provided resources for those who want to go deeper. For the single best book (in my opinion) for learning the basics and implementing machine learning practices in Python, check out Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron.
With this outline in place to guide us, let's get started!
🎓 Student Performance Factors — EDA & Insights
Michael Ozon — Assignment #1 (EDA & Dataset), Reichman University – Data Science Course
🎥 Presentation Video: https://drive.google.com/drive/folders/1cAXLzcZflMgv12EDlVTeQoKxzVumOjbd?usp=drive_link
📌 Project Overview
This project explores the Student Performance Factors dataset, containing 6,607 student records and 20 academic, behavioral, lifestyle, and demographic features. The goal of this Exploratory Data Analysis (EDA) is to understand which… See the full description on the dataset page: https://huggingface.co/datasets/michaelozon/student-performance-factors-analysis-michael-ozon.
The Engineered Geothermal System (EGS) Exploration Methodology Project is developing an exploration approach for EGS through the integration of geoscientific data. The overall project area is 2500 km², with the Calibration Area (Dixie Valley Geothermal Wellfield) being about 170 km². The Final Scientific Report (FSR) is submitted in two parts (I and II). FSR Part I presents (1) an assessment of the readily available public domain data and some proprietary data provided by Terra-Gen Power, LLC, (2) a re-interpretation of these data as required, (3) an exploratory geostatistical data analysis, (4) the baseline geothermal conceptual model, and (5) the EGS favorability/trust mapping. The conceptual model presented applies to both the hydrothermal system and EGS in the Dixie Valley region. FSR Part II presents (1) 278 new gravity stations; (2) enhanced gravity-magnetic modeling; (3) 42 new ambient seismic noise survey stations; (4) an integration of the new seismic noise data with a regional seismic network; (5) a new methodology and approach to interpret these data; (6) a novel method to predict rock type and temperature based on the newly interpreted data; (7) 70 new magnetotelluric (MT) stations; (8) an integrated interpretation of the enhanced MT data set; (9) the results of a 308-station soil CO2 gas survey; (10) new conductive thermal modeling in the project area; (11) new convective modeling in the Calibration Area; (12) pseudo-convective modeling in the Calibration Area; (13) enhanced data implications and qualitative geoscience correlations at three scales: (a) regional, (b) project, and (c) Calibration Area; (14) quantitative geostatistical exploratory data analysis; and (15) responses to nine questions posed in the proposal for this investigation. Enhanced favorability/trust maps were not generated because there was not a sufficient amount of new, fully vetted (see below) rock type, temperature, and stress data. The enhanced seismic data did generate a new method to infer rock type and temperature. (However, in the opinion of the Principal Investigator for this project, this new methodology needs to be tested and evaluated at other sites in the Basin and Range before it is used to generate the referenced maps.) As with the baseline conceptual model, the enhanced findings can be applied to both the hydrothermal system and EGS in the Dixie Valley region.
🏙️ NYC Airbnb Price Analysis
📘 Overview
This project analyzes the Airbnb NYC Listings Dataset to explore which property attributes have the greatest influence on an apartment's nightly rental price. The analysis includes: data loading, data cleaning, handling missing values, outlier detection, feature preparation, exploratory data analysis (EDA), visualizations, and insights & conclusions.
🗂️ 1. Data Loading
The dataset was downloaded from Kaggle and contains thousands of NYC Airbnb listings with 40+… See the full description on the dataset page: https://huggingface.co/datasets/meirnm13/meirneeman.
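As one example of the outlier-detection step listed above, a standard IQR filter on nightly price might look like this; the file and column names are assumed from the common Kaggle release.

```python
import pandas as pd

listings = pd.read_csv("AB_NYC_2019.csv")  # assumed Kaggle file name
q1, q3 = listings["price"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = listings["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = listings[mask]
print(f"kept {mask.mean():.1%} of listings after IQR filtering")
```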
The Engineered Geothermal System (EGS) Exploration Methodology Project is developing an exploration approach for EGS through the integration of geoscientific data. The Project chose the Dixie Valley Geothermal System in Nevada as a field laboratory site for methodology calibration purposes because, in the public domain, it is one of the most highly characterized geothermal systems in the Basin and Range, with a considerable amount of geoscience data and, most importantly, well data. This Baseline Conceptual Model report summarizes the results of the first three project tasks: (1) collect and assess the existing public domain geoscience data, (2) design and populate a GIS database, and (3) develop a baseline (existing data) geothermal conceptual model, evaluate geostatistical relationships, and generate baseline, coupled EGS favorability/trust maps from +1 km above sea level (asl) to -4 km asl for the Calibration Area (Dixie Valley Geothermal Wellfield) to identify EGS drilling targets at a scale of 5 km x 5 km. It presents (1) an assessment of the readily available public domain data and some proprietary data provided by Terra-Gen Power, LLC, (2) a re-interpretation of these data as required, (3) an exploratory geostatistical data analysis, (4) the baseline geothermal conceptual model, and (5) the EGS favorability/trust mapping. The conceptual model presented applies to both the hydrothermal system and EGS in the Dixie Valley region.
License: https://creativecommons.org/publicdomain/zero/1.0/
An exploratory data analysis project using Excel to understand what influences Instagram post reach and engagement.
This project uses an Instagram dataset imported from Kaggle to explore how different factors like hashtags, saves, shares, and caption length influence impressions and engagement.
Data cleaning steps:
- Removed unnecessary spaces using TRIM
- Deleted 17 duplicate rows, leaving 103 unique rows
- Standardized formatting: freeze top row, wrap text, center align
Shorter captions and higher save counts contribute more to reach than repeated hashtags (e.g., #Thecleverprogrammer, #Amankharwal, #Python). Profile visits are often linked to new followers.
Inspired by content from TheCleverProgrammer, Aman Kharwal, and Kaggle datasets.
Feel free to open an issue or share suggestions!
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Daily Machine Learning Practice – 1 Commit per Day
Author: Astrid Villalobos
Location: Montréal, QC
LinkedIn: https://www.linkedin.com/in/astridcvr/
Objective
The goal of this project is to strengthen Machine Learning and data analysis skills through small, consistent daily contributions. Each commit focuses on a specific aspect of data processing, feature engineering, or modeling using Python, Pandas, and Scikit-learn.
Dataset
Source: Kaggle – Sample Sales Data
File: data/sales_data_sample.csv
Variables: ORDERNUMBER, QUANTITYORDERED, PRICEEACH, SALES, COUNTRY, etc.
Goal: Analyze e-commerce performance, predict sales trends, segment customers, and forecast demand.
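A first-commit-sized sketch of loading this file and starting on the country-level performance goal (the Latin-1 encoding is an assumption about the Kaggle file):

```python
import pandas as pd

sales = pd.read_csv("data/sales_data_sample.csv", encoding="latin-1")
# revenue by country: a first step toward the segmentation goal
print(sales.groupby("COUNTRY")["SALES"].sum().sort_values(ascending=False).head())
```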
Project Rules

| Rule | Description |
| --- | --- |
| 🟩 1 Commit per Day | Minimum one line of code daily to ensure consistency and discipline |
| 🌍 Bilingual Comments | Code and documentation in English and French |
| 📈 Visible Progress | Daily green squares = daily learning |

🧰 Tech Stack
Languages: Python
Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
Tools: Jupyter Notebook, GitHub, Kaggle
Learning Outcomes
By the end of this challenge:
- Develop a stronger understanding of data preprocessing, modeling, and evaluation.
- Build consistent coding habits through daily practice.
- Apply ML techniques to real-world sales data scenarios.
This dataset pulls the projects posted by clients on PeoplePerHour. Data collection started on January 20th, 2023, and approximately 40 new projects are added to this dataset every hour.
Inspiration:
I have been a freelance Python developer since my graduation (2019), and I recently completed the Google Data Analytics Professional Certificate from Coursera.
Last week I saw a cool video from Luke Barousse on YouTube (here's the link). He created a pipeline to scrape Data Analyst jobs in the US on a daily basis and updates the dataset daily on Kaggle. Lately I have also not been winning many jobs as a freelancer, and I have started looking for a job in Data Analytics. So I thought a lot about it and concluded that this analysis would be a great project to add to my resume.
I hope this dataset proves to be useful to you.
License: http://opendatacommons.org/licenses/dbcl/1.0/
In this project, we used HR analytics data to build a Power BI dashboard that helps an organization improve employee performance and retention (reduce attrition).
You can complete the Power BI project using this dataset. The dashboard covers the following topics:
- Dashboard Overview
- Raw HR Analytics Data
- Dashboard Setup
- Data Cleaning and Processing in Power BI
- Importing Data into Power BI
- Power BI Dashboard: KPIs
- Power BI Dashboard: Charts & Tables
- Exporting or Sharing the Power BI Dashboard
- Insights from the Dashboard
- Measures and Calculations in Power BI
License: https://creativecommons.org/publicdomain/zero/1.0/
The dataset used in this project contains features extracted from various applications, aiming to detect malware using machine learning techniques. Malware detection is a critical task in cybersecurity, as it helps protect users and organizations from potential threats.
Source: The dataset was sourced from [insert dataset source here]. It consists of features extracted from a large number of Android applications, including permissions, API calls, and other attributes. The original dataset was collected for research purposes and is publicly available for download.
Inspiration: The inspiration behind this project came from the increasing prevalence of malware attacks on mobile devices and the need for effective detection methods. By leveraging machine learning algorithms, we aim to develop a model that can accurately classify applications as benign or malicious based on their features. This project is motivated by a desire to contribute to cybersecurity research and develop practical solutions for malware detection.
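As a hedged illustration of the modeling direction described above, a simple baseline classifier over the extracted features might look like this; the file and column names are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("malware_features.csv")  # assumed feature table
X = df.drop(columns=["label"])            # permissions, API calls, etc.
y = df["label"]                           # 0 = benign, 1 = malicious

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```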