100+ datasets found
  1. gdacp cs 1

    • kaggle.com
    zip
    Updated Jan 24, 2023
    Ahmet G. (2023). gdacp cs 1 [Dataset]. https://www.kaggle.com/datasets/burayamail/gdacp-cs-1
    Available download formats: zip (205,506,284 bytes)
    Dataset updated
    Jan 24, 2023
    Authors
    Ahmet G.
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset includes 12 files of monthly data from January 2022 to December 2022. The data is reliable because it is first-party data from the company, Cyclistic Bike Share. It contains all the information needed to conduct the analysis, so it is comprehensive. The data was assessed against the ROCCC criteria (Reliable, Original, Comprehensive, Current, Cited), and RStudio 2022.12.0+353 "Elsbeth Geranium" was used to evaluate it. Even though there are some missing values, data cleaning ensured that they did not affect the results for my main area of interest.

    My study focuses on how annual members and casual members use Cyclistic bikes differently. The dataset and notebook include a clear statement of the business task and a clear description of all the data sources I used. The notebook also contains a summary of the analysis. To make the analysis easier to follow, I supported it with visualisations and key findings.
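    The member-versus-casual comparison described above can be sketched with Python's standard library (the author worked in R). The column names `started_at`, `ended_at`, and `member_casual`, and the timestamp format, are assumptions based on typical bike-share trip exports, not details confirmed by this listing. Dropping rows with missing values or non-positive durations is one simple cleaning rule, not necessarily the one used in the notebook.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed timestamp format, e.g. "2022-01-13 11:59:47" (hypothetical).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def mean_ride_minutes(rows):
    """Return the mean ride length in minutes per rider type.

    Each row is a dict with the assumed keys 'started_at', 'ended_at'
    and 'member_casual'. Rows with missing values or non-positive
    durations are skipped (a simple cleaning rule, not the author's).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        start = row.get("started_at")
        end = row.get("ended_at")
        rider = row.get("member_casual")
        if not (start and end and rider):
            continue  # drop rows with missing values
        minutes = (datetime.strptime(end, TS_FORMAT)
                   - datetime.strptime(start, TS_FORMAT)).total_seconds() / 60
        if minutes <= 0:
            continue  # drop zero or negative durations
        totals[rider] += minutes
        counts[rider] += 1
    return {rider: totals[rider] / counts[rider] for rider in totals}


# Tiny inline sample standing in for the 12 monthly CSV files.
SAMPLE = """started_at,ended_at,member_casual
2022-01-01 08:00:00,2022-01-01 08:10:00,member
2022-01-01 09:00:00,2022-01-01 09:20:00,member
2022-07-01 14:00:00,2022-07-01 14:45:00,casual
2022-07-01 15:00:00,,casual
"""

result = mean_ride_minutes(csv.DictReader(io.StringIO(SAMPLE)))
print(result)  # {'member': 15.0, 'casual': 45.0}
```

    With the real files, the twelve monthly `csv.DictReader` objects could be chained with `itertools.chain` before calling `mean_ride_minutes`.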

    At the end of my notebook, you can find the recommendations I made based on the analysis. I would be happy to receive any feedback, advice, and comments.

  2. Exploratory data analysis of a clinical study group: Development of a...

    • plos.figshare.com
    txt
    Updated May 31, 2023
    Bogumil M. Konopka; Felicja Lwow; Magdalena Owczarz; Łukasz Łaczmański (2023). Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data [Dataset]. http://doi.org/10.1371/journal.pone.0201950
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Bogumil M. Konopka; Felicja Lwow; Magdalena Owczarz; Łukasz Łaczmański
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Thorough knowledge of the structure of the analyzed data allows one to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Because of the multitude of available methods, selecting those that work well together and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward’s algorithm, and 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex hormone attributes, so further analysis was carried out separately for male and female patients. The optimal partitioning of the male set resulted in five subgroups, two of which were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male set; no evidence of pathological patient subgroups was found. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also be used to assess the heterogeneity of a dataset. The case study showed that our procedure is well suited for identifying and visualizing biologically meaningful patient subgroups.
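    Steps 1 and 2 of the procedure can be illustrated with a small, dependency-free Python sketch. It assumes that "robust normalization" means centering each attribute on its median and scaling by its median absolute deviation (MAD); the paper's exact scheme may differ. Only the classical Mahalanobis distance (MD) is computed here. A robust distance (rMD) would replace the mean and covariance with robust estimates (the Minimum Covariance Determinant is one common choice), which this sketch does not implement. The patient data are hypothetical.

```python
import statistics
from math import sqrt


def robust_normalize(column):
    """Center on the median and scale by the median absolute deviation (MAD)."""
    med = statistics.median(column)
    mad = statistics.median(abs(x - med) for x in column)
    if mad == 0:
        raise ValueError("MAD is zero; column cannot be robustly scaled")
    return [(x - med) / mad for x in column]


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def mahalanobis_distances(rows):
    """Classical Mahalanobis distance of each row from the sample mean."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    distances = []
    for r in rows:
        diff = [r[j] - mean[j] for j in range(p)]
        z = solve(cov, diff)  # z = inverse(cov) @ diff
        distances.append(sqrt(sum(d * zi for d, zi in zip(diff, z))))
    return distances


# Hypothetical two-attribute data; the last patient is an obvious outlier.
data = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0], [5.0, 6.0], [20.0, 1.0]]
cols = [robust_normalize(list(col)) for col in zip(*data)]
normalized = [list(row) for row in zip(*cols)]
md = mahalanobis_distances(normalized)
print(max(range(len(md)), key=md.__getitem__))  # 5
```

    Classical MD is unaffected by rescaling individual attributes, so in a pipeline like this the normalization step matters mainly for the later clustering and PCA steps, which are scale-sensitive.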

  3. Sales Data (Project1 IIITD)

    • kaggle.com
    zip
    Updated Jan 16, 2022
    Rahul Sharma (2022). Sales Data (Project1 IIITD) [Dataset]. https://www.kaggle.com/datasets/rahultheogre/iiitd-project1/discussion
    Available download formats: zip (3,291,260 bytes)
    Dataset updated
    Jan 16, 2022
    Authors
    Rahul Sharma
    Description

    Dataset

    This dataset was created by Rahul Sharma


  4. Bike Rental Data

    • kaggle.com
    zip
    Updated Jan 20, 2023
    PrepInsta Technologies (2023). Bike Rental Data [Dataset]. https://www.kaggle.com/datasets/prepinstaprime/bike-rental-data
    Available download formats: zip (132,898 bytes)
    Dataset updated
    Jan 20, 2023
    Authors
    PrepInsta Technologies
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Problem Statement

    Bike-sharing systems let people rent a bicycle at one location and return it at a different one. This dataset comes from such a system in Washington, DC.

    You are provided with rental data spanning 2 years. Your task is to predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
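    Using only information available prior to the rental period implies a chronological train/test split rather than a random one. A minimal sketch, assuming the data can be reduced to hourly `(timestamp, count)` records (a hypothetical layout): the split keeps every training record strictly before a cutoff, and a naive baseline predicts each later hour with the training mean for that hour of day. The baseline is illustrative only, not a method prescribed by the dataset author.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def chronological_split(records, cutoff):
    """Split (timestamp, count) records so training data strictly precedes the cutoff."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def hour_of_day_baseline(train):
    """Mean rental count per hour of day, learned from training data only."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, count in train:
        sums[ts.hour] += count
        counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}


def predict(model, timestamps, fallback=0.0):
    """Predict counts; hours never seen in training fall back to a constant."""
    return [model.get(ts.hour, fallback) for ts in timestamps]


# Hypothetical hourly data: four days, two hours per day.
start = datetime(2011, 1, 1)
records = []
for day in range(4):
    for hour, count in ((8, 100 + day), (17, 200 + day)):
        records.append((start + timedelta(days=day, hours=hour), count))

train, test = chronological_split(records, cutoff=datetime(2011, 1, 3))
model = hour_of_day_baseline(train)
preds = predict(model, [ts for ts, _ in test])
print(preds)  # [100.5, 200.5, 100.5, 200.5]
```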

    This bike rental dataset is intended for practicing pandas profiling. It contains numerical values.

    Tasks to perform:
    1. Perform exploratory data analysis.
    2. Use pandas profiling.

    Then compare the pandas profiling report with your exploratory data analysis.

  5. EDA-US-Bankruptcy-Prediction

    • huggingface.co
    reef zehavi, EDA-US-Bankruptcy-Prediction [Dataset]. https://huggingface.co/datasets/reefzehavi/EDA-US-Bankruptcy-Prediction
    Authors
    reef zehavi
    License

    Other (https://choosealicense.com/licenses/other/)

    Area covered
    United States
    Description

    Assignment 1: EDA - US Company Bankruptcy Prediction

    Student Name: Reef Zehavi
    Date: November 10, 2025

      📹 Project Presentation Video: https://www.loom.com/share/6920e493e8654ef3bb4f67a10eb9b03d

      1. Overview and Project Goal
    

    The goal of this project is to perform Exploratory Data Analysis (EDA) on a fundamental dataset of American companies. The analysis focuses on understanding the financial characteristics that differentiate between companies that survived… See the full description on the dataset page: https://huggingface.co/datasets/reefzehavi/EDA-US-Bankruptcy-Prediction.

  6. maigurski-customer-personality-assignment1

    • huggingface.co
    Updated Nov 19, 2025
    maigurski (2025). maigurski-customer-personality-assignment1 [Dataset]. https://huggingface.co/datasets/maigurski/maigurski-customer-personality-assignment1
    Dataset updated
    Nov 19, 2025
    Authors
    maigurski
    Description

    Customer Personality Analysis – EDA Results

      1. Project Goal
    

    The goal of this project is to use numeric-focused Exploratory Data Analysis (EDA) on the Customer Personality Analysis dataset to understand:

    • Which customer characteristics are associated with higher spending.
    • How these characteristics differ between customers who responded to the last marketing campaign and those who did not.

    The main outcome variable is:

    Response (0 = no, 1 = yes) – did the customer respond… See the full description on the dataset page: https://huggingface.co/datasets/maigurski/maigurski-customer-personality-assignment1.
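    The second question amounts to comparing group statistics across `Response`. A minimal sketch on made-up rows: apart from `Response`, the column names (`Income`, `TotalSpend`) are hypothetical placeholders, not confirmed fields of the dataset.

```python
from statistics import mean


def group_means(rows, by, columns):
    """Mean of each numeric column within each level of the grouping column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row)
    return {
        level: {col: mean(r[col] for r in members) for col in columns}
        for level, members in sorted(groups.items())
    }


# Hypothetical customers; only 'Response' comes from the dataset description.
customers = [
    {"Response": 0, "Income": 40000, "TotalSpend": 300},
    {"Response": 0, "Income": 50000, "TotalSpend": 500},
    {"Response": 1, "Income": 70000, "TotalSpend": 1200},
    {"Response": 1, "Income": 80000, "TotalSpend": 1400},
]

summary = group_means(customers, by="Response", columns=["Income", "TotalSpend"])
print(summary)
```

    Group means are a first look only; with skewed spending data, comparing medians as well is often worthwhile.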

  7. diabetes_eda_analysis

    • huggingface.co
    Updated Nov 19, 2025
    GUY SHILO (2025). diabetes_eda_analysis [Dataset]. https://huggingface.co/datasets/guyshilo12/diabetes_eda_analysis
    Dataset updated
    Nov 19, 2025
    Authors
    GUY SHILO
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Diabetes Dataset — Exploratory Data Analysis (EDA)

    This repository contains a diabetes-related tabular dataset and a complete Exploratory Data Analysis (EDA). The main objective of this project was to learn how to conduct a structured EDA, apply best practices, and extract meaningful insights from real-world health data.
    The analysis includes correlations, distributions, group comparisons, class balance exploration, and statistical interpretations that illustrate how different… See the full description on the dataset page: https://huggingface.co/datasets/guyshilo12/diabetes_eda_analysis.

  8. DQLab Telco Final

    • kaggle.com
    zip
    Updated Mar 9, 2025
    Samuel Robert Ardi Nugraha (2025). DQLab Telco Final [Dataset]. https://www.kaggle.com/samran98/customer-churn-telco-final
    Available download formats: zip (113,195 bytes)
    Dataset updated
    Mar 9, 2025
    Authors
    Samuel Robert Ardi Nugraha
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Please upvote this dataset if it helps you. Forks are welcome!

    Background

    DQLab Telco is a telecommunications company with numerous locations all over the world. In order to ensure that customers are not left behind, DQLab Telco has consistently paid attention to the customer experience since its establishment in 2019.

    Even though DQLab Telco is only a little over a year old, many of its customers have already changed their subscriptions to rival companies. By using machine learning, management hopes to lower the number of customers who leave.

    After cleaning the data yesterday, it is now time for us to build the best model to forecast customer churn.

    Tasks & Steps

    Yesterday, we completed "Cleansing Data" as part 1 of the project. As a data scientist, you are now expected to develop the appropriate model.

    You will perform "Machine Learning Modeling" in this assignment using data from the previous month, specifically June 2020.

    The steps to take are:
    1. Perform exploratory data analysis.
    2. Pre-process the data.
    3. Apply machine learning modeling.
    4. Pick the ideal model.

  9. stroke-prediction-eda-yuval-malka

    • huggingface.co
    Updated Nov 17, 2025
    Yuval Malka (2025). stroke-prediction-eda-yuval-malka [Dataset]. https://huggingface.co/datasets/Yuvalos/stroke-prediction-eda-yuval-malka
    Dataset updated
    Nov 17, 2025
    Authors
    Yuval Malka
    License

    Other (https://choosealicense.com/licenses/other/)

    Description

    Stroke Prediction Dataset — Exploratory Data Analysis (EDA) By Yuval Malka

    Project Overview

    This project explores the Stroke Prediction Dataset from Kaggle, containing 5,110 rows and 12 features related to demographics, health indicators, and lifestyle factors. The goal is to understand which factors may be associated with the likelihood of having a stroke by performing a full Exploratory Data Analysis (EDA). The target variable is stroke (0 = No Stroke, 1 = Stroke). This README summarizes… See the full description on the dataset page: https://huggingface.co/datasets/Yuvalos/stroke-prediction-eda-yuval-malka.
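    With a binary target such as stroke, checking class balance early shows whether plain accuracy will be misleading: if one class dominates, always predicting it already scores well. A minimal sketch on hypothetical labels (the 95/5 split below is invented for illustration, not taken from the dataset):

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each class label, rounded to four decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(n / total, 4) for label, n in sorted(counts.items())}


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / sum(counts.values())


# Hypothetical labels: 0 = No Stroke, 1 = Stroke.
labels = [0] * 95 + [1] * 5
print(class_balance(labels))               # {0: 0.95, 1: 0.05}
print(majority_baseline_accuracy(labels))  # 0.95
```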

  10. Appalachian Basin Play Fairway Analysis: Thermal Quality Analysis in...

    • data.wu.ac.at
    zip
    Updated Mar 6, 2018
    HarvestMaster (2018). Appalachian Basin Play Fairway Analysis: Thermal Quality Analysis in Low-Temperature Geothermal Play Fairway Analysis (GPFA-AB) ThermalQualityAnalysisThermalResourceInterpolationResultsArcGISToolbox.zip [Dataset]. https://data.wu.ac.at/schema/geothermaldata_org/ODcxNmYzNDgtMTM2Zi00MGMxLWJiOTUtMzJhY2U1MTkzMDMz
    Available download formats: zip
    Dataset updated
    Mar 6, 2018
    Dataset provided by
    HarvestMaster
    Description

    This collection of files is part of a larger dataset uploaded in support of Low Temperature Geothermal Play Fairway Analysis for the Appalachian Basin (GPFA-AB, DOE Project DE-EE0006726). Phase 1 of the GPFA-AB project identified potential Geothermal Play Fairways within the Appalachian basin of Pennsylvania, West Virginia, and New York. This was accomplished through analysis of 4 key criteria: thermal quality, natural reservoir productivity, risk of seismicity, and heat utilization. Each of these analyses represents a distinct project task, with the fifth task encompassing the combination of the 4 risk factors. Supporting data for all five tasks has been uploaded into the Geothermal Data Repository node of the National Geothermal Data System (NGDS).

    This submission comprises the data for Thermal Quality Analysis (project task 1) and includes all of the necessary shapefiles, rasters, datasets, code, and references to code repositories that were used to create the thermal resource and risk factor maps as part of the GPFA-AB project. The identified Geothermal Play Fairways are also provided with the larger dataset. Figures (.png) are provided as examples of the shapefiles and rasters. The regional standardized 1 square km grid used in the project is also provided as points (cell centers), polygons, and as a raster. Two ArcGIS toolboxes are available: 1) RegionalGridModels.tbx for creating resource and risk factor maps on the standardized grid, and 2) ThermalRiskFactorModels.tbx for use in making the thermal resource maps and cross sections. These toolboxes contain item description documentation for each model within the toolbox, and for the toolbox itself. This submission also contains three R scripts: 1) AddNewSeisFields.R to add seismic risk data to attribute tables of seismic risk, 2) StratifiedKrigingInterpolation.R for the interpolations used in the thermal resource analysis, and 3) LeaveOneOutCrossValidation.R for the cross validations used in the thermal interpolations.

    Some file descriptions make reference to various 'memos'. These are contained within the final report submitted October 16, 2015.

    Each zipped file in the submission contains an 'about' document describing the full Thermal Quality Analysis content available, along with key sources, authors, citation, use guidelines, and assumptions, with the specific file(s) contained within the .zip file highlighted.

    UPDATE: A newer version of the Thermal Quality Analysis has been added here: https://gdr.openei.org/submissions/879. A newer version of the Combined Risk Factor Analysis has been added here: https://gdr.openei.org/submissions/880.

    This is one of sixteen associated .zip files relating to thermal resource interpolation results within the Thermal Quality Analysis task of the Low Temperature Geothermal Play Fairway Analysis for the Appalachian Basin. This file contains an ArcGIS Toolbox with 6 ArcGIS Models, including WellClipsToWormsSections, BufferedRasterToClippedRaster, ExtractThermalPropertiesToCrossSection, AddExtraInfoToCrossSection, and CrossSectionExtraction.

    The sixteen files contain the results of the thermal resource interpolation as binary grid (raster) files, images (.png) of the rasters, and toolbox of ArcGIS Models used. Note that raster files ending in “pred” are the predicted mean for that resource, and files ending in “err” are the standard error of the predicted mean for that resource. Leave one out cross validation results are provided for each thermal resource.

    Several models were built to process the well database with outliers removed. The ArcGIS toolbox ThermalRiskFactorModels contains the ArcGIS processing tools used. First, the WellClipsToWormSections model was used to clip the wells to the worm sections (interpolation regions). Then, the 1 square km gridded regions (see series of 14 Worm Based Interpolation Boundaries .zip files) along with the wells in those regions were loaded into R using the rgdal package. Then, a stratified kriging algorithm implemented in the R gstat package was used to create rasters of the predicted mean and the standard error of the predicted mean. The code used to make these rasters is StratifiedKrigingInterpolation.R. Details about the interpolation, and the exploratory data analysis of the well data, are provided in 9_GPFA-AB_InterpolationThermalFieldEstimation.pdf (Smith, 2015), contained within the final report.

    The output rasters from R are brought into ArcGIS for further spatial processing. First, the BufferedRasterToClippedRaster tool is used to clip the interpolations back to the Worm Sections. Then, the Mosaic tool in ArcGIS is used to merge all predicted mean rasters into a single raster, and all error rasters into a single raster for each thermal resource.

    A leave one out cross validation was performed on each of the thermal resources. The code used to implement the cross validation is provided in the R script LeaveOneOutCrossValidation.R. The results of the cross validation are given for each thermal resource.
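    The leave-one-out procedure can be sketched in a few lines. The project itself used stratified kriging in R's gstat package; to keep this example self-contained, the sketch swaps in inverse distance weighting (IDW) as the interpolator, so it illustrates the cross-validation loop, not the project's kriging model. The well coordinates and values are hypothetical.

```python
from math import hypot, sqrt


def idw_predict(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) points."""
    num = den = 0.0
    for px, py, value in points:
        d = hypot(px - x, py - y)
        if d == 0:
            return value  # exact hit on a known point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def leave_one_out(points, predictor=idw_predict):
    """Predict each point from all the others; return residuals and RMSE."""
    residuals = []
    for i, (x, y, value) in enumerate(points):
        others = points[:i] + points[i + 1:]
        residuals.append(value - predictor(others, x, y))
    rmse = sqrt(sum(r * r for r in residuals) / len(residuals))
    return residuals, rmse


# Hypothetical wells: (x_km, y_km, thermal value).
wells = [(0, 0, 20.0), (1, 0, 22.0), (0, 1, 21.0), (1, 1, 23.0), (0.5, 0.5, 21.5)]
residuals, rmse = leave_one_out(wells)
print(len(residuals), round(rmse, 3))
```

    Because each well is predicted without its own value, the residuals estimate how the interpolator performs at unsampled locations; gstat offers its own cross-validation routine for kriging models.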

    Other tools provided in this toolbox are useful for creating cross sections of the thermal resource. ExtractThermalPropertiesToCrossSection model extracts the predicted mean and the standard error of predicted mean to the attribute table of a line of cross section. The AddExtraInfoToCrossSection model is then used to add any other desired information, such as state and county boundaries, to the cross section attribute table. These two functions can be combined as a single function, as provided by the CrossSectionExtraction model.

  11. NYC_building_energy_data

    • kaggle.com
    zip
    Updated Mar 4, 2020
    Maksym Dubovyi (2020). NYC_building_energy_data [Dataset]. https://www.kaggle.com/maxbrain/nyc-building-energy-data
    Available download formats: zip (9,476,304 bytes)
    Dataset updated
    Mar 4, 2020
    Authors
    Maksym Dubovyi
    License

    Open Data Commons Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Area covered
    New York
    Description

    In this notebook, we will walk through solving a complete machine learning problem using a real-world dataset. This was a "homework" assignment given to me for a job application in summer 2018. The one-sentence summary of the assignment is:

    Use the provided building energy data to develop a model that can predict a building's Energy Star score, and then interpret the results to find the variables that are most predictive of the score.

    This is a supervised, regression machine learning task: given a set of data with targets (in this case the score) included, we want to train a model that can learn to map the features (also known as the explanatory variables) to the target.

    • Supervised problem: we are given both the features and the target.
    • Regression problem: the target is a continuous variable, in this case ranging from 0 to 100.

    During training, we want the model to learn the relationship between the features and the score, so we give it both the features and the answer. Then, to test how well the model has learned, we evaluate it on a testing set where it has never seen the answers!

    Machine Learning Workflow

    Although the exact implementation details can vary, the general structure of a machine learning project stays relatively constant:

    1. Data cleaning and formatting
    2. Exploratory data analysis
    3. Feature engineering and selection
    4. Establish a baseline and compare several machine learning models on a performance metric
    5. Perform hyperparameter tuning on the best model to optimize it for the problem
    6. Evaluate the best model on the testing set
    7. Interpret the model results to the extent possible
    8. Draw conclusions and write a well-documented report

    Setting up the structure of the pipeline ahead of time lets us see how one step flows into the other. However, the machine learning pipeline is an iterative procedure, so we don't always follow these steps in a linear fashion. We may revisit a previous step based on results from further down the pipeline. For example, while we may perform feature selection before building any models, we may use the modeling results to go back and select a different set of features. Or, the modeling may turn up unexpected results that mean we want to explore our data from another angle. Generally, you have to complete one step before moving on to the next, but don't feel like once you have finished one step the first time, you cannot go back and make improvements!
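    Step 4 of the workflow, establishing a baseline, can be as simple as predicting the median training target for every test example and measuring the mean absolute error (MAE); a model is only worth keeping if it beats this number. A minimal sketch with made-up Energy Star scores; median-plus-MAE is one reasonable choice, not necessarily the one used in the notebook.

```python
from statistics import median


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def median_baseline(train_targets, n_test):
    """Predict the training-set median for every test example."""
    guess = median(train_targets)
    return [guess] * n_test


# Hypothetical Energy Star scores (0-100).
train_scores = [10, 40, 60, 70, 95]
test_scores = [50, 80, 30]
baseline = median_baseline(train_scores, len(test_scores))
print(baseline, mae(test_scores, baseline))  # [60, 60, 60] 20.0
```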

    This notebook will cover the first three (and a half) steps of the pipeline, with the other parts discussed in two additional notebooks. Throughout this series, the objective is to show how all the different data science practices come together to form a complete project. I try to focus more on the implementations of the methods than on explaining them at a low level, but have provided resources for those who want to go deeper. For the single best book (in my opinion) for learning the basics and implementing machine learning practices in Python, check out Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron.

    With this outline in place to guide us, let's get started!

  12. student-performance-factors-analysis-michael-ozon

    • huggingface.co
    Updated Nov 14, 2025
    MICHAEL OZON (2025). student-performance-factors-analysis-michael-ozon [Dataset]. https://huggingface.co/datasets/michaelozon/student-performance-factors-analysis-michael-ozon
    Dataset updated
    Nov 14, 2025
    Authors
    MICHAEL OZON
    Description

    🎓 Student Performance Factors: EDA & Insights

    Michael Ozon, Assignment #1 (EDA & Dataset), Reichman University, Data Science Course

    🎥 Presentation Video: https://drive.google.com/drive/folders/1cAXLzcZflMgv12EDlVTeQoKxzVumOjbd?usp=drive_link

    📌 Project Overview

    This project explores the Student Performance Factors dataset, containing 6,607 student records and 20 academic, behavioral, lifestyle, and demographic features. The goal of this Exploratory Data Analysis (EDA) is to understand which… See the full description on the dataset page: https://huggingface.co/datasets/michaelozon/student-performance-factors-analysis-michael-ozon.

  13. Dixie Valley Engineered Geothermal System Exploration Methodology Project,...

    • catalog.data.gov
    • gdr.openei.org
    • +5 more
    Updated Jan 20, 2025
    AltaRock Energy Inc (2025). Dixie Valley Engineered Geothermal System Exploration Methodology Project, Baseline Conceptual Model Report [Dataset]. https://catalog.data.gov/dataset/dixie-valley-engineered-geothermal-system-exploration-methodology-project-baseline-concept-7bba8
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    AltaRock Energy (http://www.altarockenergy.com/)
    Area covered
    Dixie Valley
    Description

    The Engineered Geothermal System (EGS) Exploration Methodology Project is developing an exploration approach for EGS through the integration of geoscientific data. The overall project area is 2,500 km², and the Calibration Area (Dixie Valley Geothermal Wellfield) covers about 170 km². The Final Scientific Report (FSR) is submitted in two parts (I and II). FSR Part I presents (1) an assessment of the readily available public domain data and some proprietary data provided by Terra-Gen Power, LLC, (2) a re-interpretation of these data as required, (3) an exploratory geostatistical data analysis, (4) the baseline geothermal conceptual model, and (5) the EGS favorability/trust mapping. The conceptual model presented applies to both the hydrothermal system and EGS in the Dixie Valley region. FSR Part II presents (1) 278 new gravity stations; (2) enhanced gravity-magnetic modeling; (3) 42 new ambient seismic noise survey stations; (4) an integration of the new seismic noise data with a regional seismic network; (5) a new methodology and approach to interpret these data; (6) a novel method to predict rock type and temperature based on the newly interpreted data; (7) 70 new magnetotelluric (MT) stations; (8) an integrated interpretation of the enhanced MT data set; (9) the results of a 308-station soil CO2 gas survey; (10) new conductive thermal modeling in the project area; (11) new convective modeling in the Calibration Area; (12) pseudo-convective modeling in the Calibration Area; (13) enhanced data implications and qualitative geoscience correlations at three scales: (a) Regional, (b) Project, and (c) Calibration Area; (14) quantitative geostatistical exploratory data analysis; and (15) responses to nine questions posed in the proposal for this investigation. Enhanced favorability/trust maps were not generated because there was not a sufficient amount of new, fully vetted (see below) rock type, temperature, and stress data.

    The enhanced seismic data did generate a new method to infer rock type and temperature. However, in the opinion of the Principal Investigator for this project, this new methodology needs to be tested and evaluated at other sites in the Basin and Range before it is used to generate the referenced maps. As in the baseline conceptual model, the enhanced findings can be applied to both the hydrothermal system and EGS in the Dixie Valley region.

  14. meirneeman

    • huggingface.co
    MEIR NEEMAN, meirneeman [Dataset]. https://huggingface.co/datasets/meirnm13/meirneeman
    Authors
    MEIR NEEMAN
    Description

    🏙️ NYC Airbnb Price Analysis

    📘 Overview

    This project analyzes the Airbnb NYC Listings Dataset to explore which property attributes have the greatest influence on an apartment’s nightly rental price. The analysis includes:

    • Data Loading
    • Data Cleaning
    • Handling Missing Values
    • Outlier Detection
    • Feature Preparation
    • Exploratory Data Analysis (EDA)
    • Visualizations
    • Insights & Conclusions

    🗂️ 1. Data Loading

    The dataset was downloaded from Kaggle and contains:

    • Thousands of NYC Airbnb listings
    • 40+… See the full description on the dataset page: https://huggingface.co/datasets/meirnm13/meirneeman.
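    The listing does not say which outlier rule the project applied. One common choice for a skewed variable such as nightly price is the 1.5 × IQR rule, sketched below on hypothetical prices using the standard library's `statistics.quantiles` (default "exclusive" method).

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only values inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical nightly prices in USD, including one extreme listing.
prices = [50, 60, 70, 80, 90, 100, 110, 5000]
print(iqr_bounds(prices))     # (-5.0, 175.0)
print(drop_outliers(prices))  # [50, 60, 70, 80, 90, 100, 110]
```

    Quartiles are themselves robust to the extreme value, which is why the fences stay reasonable even with a 5,000 USD listing in the data.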

  15. Dixie Valley Engineered Geothermal System Exploration Methodology Project,...

    • catalog.data.gov
    • data.openei.org
    • +5 more
    Updated Jan 20, 2025
    AltaRock Energy Inc (2025). Dixie Valley Engineered Geothermal System Exploration Methodology Project, Baseline Conceptual Model Report [Dataset]. https://catalog.data.gov/dataset/dixie-valley-engineered-geothermal-system-exploration-methodology-project-baseline-concept-177a9
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    AltaRock Energy (http://www.altarockenergy.com/)
    Area covered
    Dixie Valley
    Description

    The Engineered Geothermal System (EGS) Exploration Methodology Project is developing an exploration approach for EGS through the integration of geoscientific data. The Project chose the Dixie Valley Geothermal System in Nevada as a field laboratory site for methodology calibration because it is a highly characterized geothermal system in the Basin and Range, with a considerable amount of public domain geoscience data and, most importantly, well data. This Baseline Conceptual Model report summarizes the results of the first three project tasks: (1) collect and assess the existing public domain geoscience data; (2) design and populate a GIS database; and (3) develop a baseline (existing data) geothermal conceptual model, evaluate geostatistical relationships, and generate baseline, coupled EGS favorability/trust maps from +1 km above sea level (asl) to -4 km asl for the Calibration Area (Dixie Valley Geothermal Wellfield) to identify EGS drilling targets at a scale of 5 km x 5 km. It presents (1) an assessment of the readily available public domain data and some proprietary data provided by Terra-Gen Power, LLC, (2) a re-interpretation of these data as required, (3) an exploratory geostatistical data analysis, (4) the baseline geothermal conceptual model, and (5) the EGS favorability/trust mapping. The conceptual model presented applies to both the hydrothermal system and EGS in the Dixie Valley region.

  16. Instagram Reach Analysis - Excel Project

    • kaggle.com
    zip
    Updated Jun 14, 2025
    Raghad Al-marshadi (2025). Instagram Reach Analysis - Excel Project [Dataset]. https://www.kaggle.com/datasets/raghadalmarshadi/instagram-reach-analysis-excel-project
    Available download formats: zip (291,841 bytes)
    Dataset updated
    Jun 14, 2025
    Authors
    Raghad Al-marshadi
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    📊 Instagram Reach Analysis

    An exploratory data analysis project using Excel to understand what influences Instagram post reach and engagement.

    📁 Project Description

    This project uses an Instagram dataset imported from Kaggle to explore how factors such as hashtags, saves, shares, and caption length influence impressions and engagement.

    🛠️ Tools Used | الأدوات المستخدمة

    • Microsoft Excel
    • Pivot Tables
    • TRIM, WRAP, and other Excel formulas
    • مايكروسوفت إكسل
    • الجداول المحورية
    • دوال مثل TRIM و WRAP وغيرها في Excel

    🧹 Data Cleaning | تنظيف البيانات

    • Removed unnecessary spaces using TRIM
    • Removed 17 duplicate rows, leaving 103 unique rows
    • Standardized formatting: froze the top row, wrapped text, and center-aligned cells
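    The two cleaning steps above can be reproduced outside Excel. This is a minimal sketch, not the author's workbook logic: it trims every cell the way TRIM does (strip the ends, collapse inner runs of spaces) and then drops exact duplicate rows, keeping the first occurrence. Trimming first matters, because rows that differ only in stray spaces only become duplicates after trimming. The function names are hypothetical.

```python
import csv
import io


def trim(cell):
    """Approximate Excel's TRIM: strip the ends and collapse inner runs of spaces.

    Note: str.split() also treats tabs and newlines as whitespace, whereas
    Excel's TRIM only targets the regular space character.
    """
    return " ".join(cell.split())


def clean_rows(rows):
    """Trim every cell, then drop exact duplicate rows (first occurrence wins)."""
    seen = set()
    cleaned = []
    for row in rows:
        trimmed = tuple(trim(cell) for cell in row)
        if trimmed not in seen:
            seen.add(trimmed)
            cleaned.append(trimmed)
    return cleaned


def clean_csv_text(text):
    """Parse CSV text, trim the header, and return (header, cleaned data rows)."""
    reader = csv.reader(io.StringIO(text))
    header = [trim(cell) for cell in next(reader)]
    return header, clean_rows(reader)
```

    Calling `clean_csv_text(raw_text)` on an exported CSV returns the header and the de-duplicated rows; `len(rows)` then gives the unique row count to compare against the figure reported above.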

    🔍 Key Analysis Highlights

    1. Impressions by Source

    • Highest reach: Home > Hashtags > Explore > Other
    • Some totals exceed 100% because the sources overlap

    2. Engagement Insights

    • Saves strongly correlate with higher impressions
    • Caption length is inversely related to likes
    • Shares have weak correlation with impressions

    3. Hashtag Patterns

    • Most used: #Thecleverprogrammer, #Amankharwal, #Python
    • Repeating hashtags does not guarantee higher reach
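    Statements such as "saves strongly correlate with higher impressions" rest on a correlation coefficient. A minimal sketch of Pearson correlation (the statistic Excel's `CORREL` function returns), plus a hypothetical helper that ranks columns by the strength of their correlation with a target such as impressions. The description does not say which method or exact columns the project used, so column names are left to the caller.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlations_with(target, columns):
    """Correlate each named column with `target`; strongest (by |r|) first."""
    results = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(results.items(), key=lambda item: abs(item[1]), reverse=True)
```

    A negative r (as reported for caption length versus likes) still ranks high here, because the sort uses the absolute value.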

    ✅ Conclusion

    Shorter captions and higher save counts contribute more to reach than repeated hashtags. Profile visits are often linked to new followers.

    👩‍💻 Author

    Raghad's LinkedIn

    🧠 Inspiration

    Inspired by content from TheCleverProgrammer, Aman Kharwal, and Kaggle datasets.

    💬 Feedback

    Feel free to open an issue or share suggestions on the project page!

  17. Daily Machine Learning Practice

    • kaggle.com
    zip
    Updated Nov 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Astrid Villalobos (2025). Daily Machine Learning Practice [Dataset]. https://www.kaggle.com/datasets/astridvillalobos/daily-machine-learning-practice
    Available download formats: zip (1019861 bytes)
    Dataset updated
    Nov 9, 2025
    Authors
    Astrid Villalobos
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Daily Machine Learning Practice – 1 Commit per Day

    Author: Astrid Villalobos
    Location: Montréal, QC
    LinkedIn: https://www.linkedin.com/in/astridcvr/

    Objective: The goal of this project is to strengthen machine learning and data analysis skills through small, consistent daily contributions. Each commit focuses on a specific aspect of data processing, feature engineering, or modeling using Python, Pandas, and Scikit-learn.

    Dataset
    • Source: Kaggle – Sample Sales Data
    • File: data/sales_data_sample.csv
    • Variables: ORDERNUMBER, QUANTITYORDERED, PRICEEACH, SALES, COUNTRY, etc.
    • Goal: Analyze e-commerce performance, predict sales trends, segment customers, and forecast demand.

    Project Rules
    • 🟩 1 Commit per Day: at least one line of code daily, to build consistency and discipline
    • 🌍 Bilingual Comments: code and documentation in English and French
    • 📈 Visible Progress: daily green squares = daily learning

    🧰 Tech Stack
    • Languages: Python
    • Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
    • Tools: Jupyter Notebook, GitHub, Kaggle

    Learning Outcomes
    By the end of this challenge:
    • Develop a stronger understanding of data preprocessing, modeling, and evaluation.
    • Build consistent coding habits through daily practice.
    • Apply ML techniques to real-world sales data scenarios.
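    The first goal, analyzing sales performance, can start with a per-country total. A minimal sketch using only the standard library (the project itself lists Pandas), assuming the CSV has `COUNTRY` and `SALES` columns as the variable list suggests and that `SALES` values parse as numbers. The `utf-8` default encoding in the hypothetical `load_sales` helper is an assumption; change it if the file does not decode.

```python
import csv
from collections import defaultdict


def sales_by_country(rows):
    """Sum the SALES column per COUNTRY, largest total first.

    `rows` is any iterable of dicts keyed by column name, such as csv.DictReader.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["COUNTRY"]] += float(row["SALES"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def load_sales(path, encoding="utf-8"):
    """Read the CSV into a list of dicts (the encoding default is an assumption)."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle))
```

    `sales_by_country(load_sales("data/sales_data_sample.csv"))[:5]` would list the five highest-grossing countries. In Pandas the equivalent is `df.groupby("COUNTRY")["SALES"].sum().sort_values(ascending=False)`.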

  18. Freelance Platform Projects

    • kaggle.com
    zip
    Updated Apr 29, 2023
    Cite
    Prtpl (2023). Freelance Platform Projects [Dataset]. https://www.kaggle.com/datasets/prtpljdj/freeelance-platform-projects/code
    Available download formats: zip (2972943 bytes)
    Dataset updated
    Apr 29, 2023
    Authors
    Prtpl
    Description

    This dataset collects the projects posted by clients on PeoplePerHour. Data collection started on January 20, 2023, and about 40 new projects are added to the dataset every hour.
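    An hourly collection job like this has to avoid storing the same project twice when consecutive scrapes overlap. A minimal sketch of that append step, assuming each scraped project is a dict with a unique ID under the hypothetical key `project_id`; the description does not say how the author's pipeline handles duplicates.

```python
def append_new_projects(existing, batch, key="project_id"):
    """Append projects from `batch` whose key is not already present.

    `existing` is not mutated; duplicates inside `batch` are also skipped.
    Returns (combined list, number of projects actually added).
    """
    seen = {project[key] for project in existing}
    combined = list(existing)
    added = 0
    for project in batch:
        if project[key] not in seen:
            seen.add(project[key])
            combined.append(project)
            added += 1
    return combined, added
```

    At about 40 new projects per hour, the dataset would grow by roughly 960 projects per day (40 × 24), so keying on an ID keeps the growth limited to genuinely new postings.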

    Inspiration:

    I have been a freelance Python developer since graduating in 2019, and I recently completed the Google Data Analytics Professional Certificate on Coursera.

    Last week I saw a video by Luke Barousse on YouTube in which he built a pipeline that scrapes Data Analyst job postings in the US every day and updates a dataset on Kaggle daily. Lately I have also not been winning many jobs as a freelancer, and I have started looking for a job in data analytics. After thinking it over, I concluded that this analysis would make a great project to add to my resume.

    I hope this dataset proves to be useful to you.

  19. HR Analytics Dataset

    • kaggle.com
    zip
    Updated Aug 2, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Saurav Mitra91 (2025). HR Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/sauravmitra91/hr-analytics-dataset
    Available download formats: zip (163599 bytes)
    Dataset updated
    Aug 2, 2025
    Authors
    Saurav Mitra91
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset supports a Power BI project in which we built an HR Analytics Dashboard from HR analytics data to help an organization improve employee performance and retention (that is, reduce attrition).

    Use this dataset to complete the Power BI project. The dashboard covers the following topics:

    • Dashboard overview
    • Raw HR analytics data
    • Dashboard setup
    • Data cleaning and processing in Power BI
    • Importing data into Power BI
    • Power BI dashboard: KPIs
    • Power BI dashboard: charts and tables
    • Exporting or sharing the Power BI dashboard
    • Insights from the dashboard
    • Measures and calculations in Power BI
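    Attrition rate is a typical KPI for a dashboard like this: employees who left divided by total headcount, usually shown as a percentage and broken down by department. A minimal sketch, assuming each employee record has an `Attrition` field with `"Yes"` for leavers and a `Department` field; both names are hypothetical, since the description does not list the dataset's columns. In Power BI itself this would be a DAX measure rather than Python.

```python
from collections import defaultdict


def attrition_rate(employees, field="Attrition", left_value="Yes"):
    """Percentage of employees whose `field` equals `left_value`."""
    if not employees:
        raise ValueError("no employees")
    left = sum(1 for employee in employees if employee[field] == left_value)
    return 100.0 * left / len(employees)


def attrition_by_group(employees, group_field="Department",
                       field="Attrition", left_value="Yes"):
    """Attrition percentage per group, e.g. per department."""
    counts = defaultdict(lambda: [0, 0])  # group -> [left, total]
    for employee in employees:
        tally = counts[employee[group_field]]
        tally[1] += 1
        if employee[field] == left_value:
            tally[0] += 1
    return {group: 100.0 * left / total for group, (left, total) in counts.items()}
```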

  20. malware_selected_features (1).csv

    • kaggle.com
    zip
    Updated May 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tehmina Raja (2024). malware_selected_features (1).csv [Dataset]. https://www.kaggle.com/datasets/tehminaasrar/malware-selected-features-1-csv
    Available download formats: zip (2284 bytes)
    Dataset updated
    May 11, 2024
    Authors
    Tehmina Raja
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset used in this project contains features extracted from various applications, aiming to detect malware using machine learning techniques. Malware detection is a critical task in cybersecurity, as it helps protect users and organizations from potential threats.

    Source: The dataset was sourced from [insert dataset source here]. It consists of features extracted from a large number of Android applications, including permissions, API calls, and other attributes. The original dataset was collected for research purposes and is publicly available for download.

    Inspiration: The inspiration behind this project came from the increasing prevalence of malware attacks on mobile devices and the need for effective detection methods. By leveraging machine learning algorithms, we aim to develop a model that can accurately classify applications as benign or malicious based on their features. This project is motivated by a desire to contribute to cybersecurity research and develop practical solutions for malware detection.
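    The description does not say which model the project uses. As an illustration only, the sketch below swaps in a nearest-centroid classifier, one of the simplest ways to label an application benign or malicious from numeric features: average the feature vectors of each class, then assign a new vector to the class whose average is closest in Euclidean distance. The feature encoding (for example, 0/1 flags for requested permissions) and the label strings are assumptions, not taken from the dataset.

```python
import math


def fit_centroids(features, labels):
    """Compute the mean feature vector for each class label."""
    sums = {}
    counts = {}
    for vector, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, vector):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


def accuracy(centroids, features, labels):
    """Fraction of vectors whose predicted label matches the true label."""
    correct = sum(predict(centroids, v) == y for v, y in zip(features, labels))
    return correct / len(labels)
```

    Accuracy measured on the same rows used to fit the centroids is optimistic; a fair estimate needs a held-out test split.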

Cite
Ahmet G. (2023). gdacp cs 1 [Dataset]. https://www.kaggle.com/datasets/burayamail/gdacp-cs-1

gdacp cs 1

Google Data Analytics Capstone Project Case 1

Available download formats: zip (205506284 bytes)
Dataset updated
Jan 24, 2023
Authors
Ahmet G.
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

This dataset includes 12 monthly files covering January 2022 through December 2022. The data is reliable because it is primary-source data from the company, Cyclistic Bike Share. It contains all the information needed to conduct the analysis, so the data is comprehensive. The data was evaluated against the ROCCC criteria (Reliable, Original, Comprehensive, Current, Cited), using RStudio 2022.12.0+353 ("Elsbeth Geranium"). Although there are some missing values, data cleaning ensured that they did not affect the results for my main area of interest.

The main focus of my study is how annual members and casual members use Cyclistic bikes differently. My dataset and notebook include a clear statement of the business task as well as a clear description of all the data sources I used. The notebook also includes a summary of the analysis. To make the analysis easier to understand, I supported it with visualisations and key findings.

At the end of my notebook, you can find the recommendations I made based on the analysis. I would be happy to receive any feedback, advice, and comments.
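Comparing annual and casual members often starts with average ride length per rider type. The author worked in RStudio; the following is only a Python sketch of that idea, not the notebook's code. It assumes the public Divvy trip-data column names `member_casual`, `started_at`, and `ended_at`, and timestamps formatted like `2022-01-31 17:45:00`; check both against the actual files. Rows with missing values or non-positive durations are skipped, one simple way to keep them from skewing the averages.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def mean_ride_minutes(rides):
    """Average ride length in minutes per rider type (e.g. member vs. casual).

    Rides with a missing rider type or timestamp, or a non-positive
    duration, are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ride in rides:
        rider = ride.get("member_casual")
        start, end = ride.get("started_at"), ride.get("ended_at")
        if not rider or not start or not end:
            continue
        duration = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
        minutes = duration.total_seconds() / 60
        if minutes <= 0:
            continue
        totals[rider] += minutes
        counts[rider] += 1
    return {rider: totals[rider] / counts[rider] for rider in totals}
```

The input can be rows from `csv.DictReader` over the 12 monthly files; the same grouping extends naturally to ride counts by weekday or by `rideable_type`.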
