4 datasets found

Precipitation Prediction in LA
kaggle.com
Updated Jan 22, 2022
Cite
Varun Nagpal Spyz (2022). Precipitation Prediction in LA [Dataset]. https://www.kaggle.com/datasets/varunnagpalspyz/precipitation-prediction-in-la/code
Dataset updated
Jan 22, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Varun Nagpal Spyz
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Los Angeles
Description
Context

This Dataset is part of a basic DIY Machine Learning project offered by my college, Indian Institute of Technology, Guwahati (IIT G). The main aim of this project was to get familiar with the workflow and various techniques involved in a Machine Learning project.

Content

The dataset is fairly simple and contains various features regarding precipitation.

PRCP = Precipitation (tenths of mm)
TMAX = Maximum temperature (tenths of degrees C)
TMIN = Minimum temperature (tenths of degrees C)
PGTM = Peak gust time (hours and minutes, i.e., HHMM)
AWND = Average daily wind speed (tenths of meters per second)
TAVG = Average temperature (tenths of degrees C)
WDFx = Direction of fastest x-minute wind (degrees)
WSFx = Fastest x-minute wind speed (tenths of meters per second)
WT = Weather Type
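Because most fields are listed in tenths of a unit, a small conversion step can make the values easier to read. The following is a minimal sketch, assuming the file really stores values in the units listed above (worth checking against the actual data before relying on it). The helper names and the sample row are hypothetical and not part of the dataset.

```python
# Columns listed above as "tenths of ..." (assumption: the file follows that listing).
TENTHS_COLUMNS = ("PRCP", "TMAX", "TMIN", "TAVG", "AWND")


def to_standard_units(row):
    """Return a copy of `row` with tenths-based fields divided by 10."""
    converted = dict(row)
    for name in TENTHS_COLUMNS:
        value = converted.get(name)
        if value is not None:  # leave missing readings untouched
            converted[name] = value / 10.0
    return converted


def parse_pgtm(hhmm):
    """Split an HHMM peak gust time such as 1435 into (hours, minutes)."""
    return divmod(int(hhmm), 100)


# Made-up sample row, for illustration only.
sample = {"PRCP": 25, "TMAX": 183, "TMIN": 94, "AWND": 31, "PGTM": 1435}
converted = to_standard_units(sample)
gust_hours, gust_minutes = parse_pgtm(sample["PGTM"])
```

Here `converted["PRCP"]` is in mm, the temperatures are in degrees C, and `AWND` is in meters per second; `PGTM` is left as-is and parsed separately.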

Acknowledgements

All Credits go to the Coding Club of Indian Institute of Technology, Guwahati (IIT Guwahati). Instagram: https://www.instagram.com/codingclubiitg/ LinkedIn : https://www.linkedin.com/company/coding-club-iitg/

Inspiration

Hope that this dataset + my notebook (https://www.kaggle.com/varunnagpalspyz/precipitation-prediction/notebook) helps all beginners like me.
‘Precipitation Prediction in LA’ analyzed by Analyst-2
analyst-2.ai
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Precipitation Prediction in LA’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-precipitation-prediction-in-la-8cce/f3c83692/?iid=002-283&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Precipitation Prediction in LA’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varunnagpalspyz/precipitation-prediction-in-la on 13 February 2022.

--- Original source retains full ownership of the source dataset ---
‘California Housing Data (1990)’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 12, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘California Housing Data (1990)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-data-1990-a0c5/b7389540/?iid=007-628&v=presentation
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
California
Description
Analysis of ‘California Housing Data (1990)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harrywang/housing on 12 November 2021.

--- Dataset description provided by original source is as follows ---

Source

This is the dataset used in this book: https://github.com/ageron/handson-ml/tree/master/datasets/housing to illustrate a sample end-to-end ML project workflow (pipeline). This is a great book - I highly recommend!

The data is based on the 1990 California census.

About the Data (from the book):

"This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.

The following is the description from the book author:

This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."
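The two differences described above (missing `total_bedrooms` values and the categorical `ocean_proximity` attribute) can be illustrated with a small standard-library sketch. Median imputation and one-hot encoding are shown here as one common option each, not necessarily what the book does; the helper names, the sample rows, and the category labels are illustrative assumptions.

```python
from statistics import median


def impute_median(rows, column):
    """Fill missing (None) values in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded


# Made-up rows; the category labels are illustrative only.
rows = [
    {"total_bedrooms": 100.0, "ocean_proximity": "INLAND"},
    {"total_bedrooms": None, "ocean_proximity": "NEAR BAY"},
    {"total_bedrooms": 300.0, "ocean_proximity": "ISLAND"},
]
cleaned = impute_median(rows, "total_bedrooms")
encoded = one_hot(cleaned, "ocean_proximity")
```

In practice the fill value should be computed on the training set only and then reused for the test set, so that no information leaks from the held-out data.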

About the Data (From Luís Torgo page):

http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html

This is a dataset obtained from the StatLib repository. Here is the included description:

"We collected information on the variables using all the block groups in California from the 1990 Census. In this sample a block group on average includes 1425.5 individuals living in a geographically compact area. Naturally, the geographical area included varies inversely with the population density. We computed distances among the centroids of each block group as measured in latitude and longitude. We excluded all the block groups reporting zero entries for the independent and dependent variables. The final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."

End-to-End ML Project Steps (Chapter 2 of the book)

1. Look at the big picture
2. Get the data
3. Discover and visualize the data to gain insights
4. Prepare the data for Machine Learning algorithms
5. Select a model and train it
6. Fine-tune your model
7. Present your solution
8. Launch, monitor, and maintain your system

The 10-Step Machine Learning Project Workflow (My Version)

1. Define the business objective
2. Make sense of the data from a high level
   - data types (number, text, object, etc.)
   - continuous/discrete
   - basic stats (min, max, std, median, etc.) using boxplot
   - frequency via histogram
   - scales and distributions of different features
3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified
4. Correlation analysis (pair-wise and attribute combinations)
5. Data cleaning (missing data, outliers, data errors)
6. Data transformation via pipelines (categorical text to number using one hot encoding, feature scaling via normalization/standardization, feature combinations)
7. Train and cross validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
8. Fine tune the model by trying different combinations of hyperparameters
9. Evaluate the model with best estimators in the test set
10. Launch, monitor, and refresh the model and system
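The random vs. stratified distinction in the sampling step can be sketched with the standard library alone. This is a minimal illustration, assuming each row can be assigned to a stratum by a key function; the `income_band` field and the toy rows are hypothetical and do not come from the dataset.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_ratio=0.2, seed=42):
    """Split rows so that each stratum contributes about test_ratio of its members to the test set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    train, test = [], []
    for stratum in sorted(groups):  # fixed order keeps the split reproducible
        members = groups[stratum][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_ratio)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: 25 "high" rows and 75 "low" rows (hypothetical strata).
rows = [{"id": i, "income_band": "high" if i % 4 == 0 else "low"} for i in range(100)]
train, test = stratified_split(rows, key=lambda r: r["income_band"])
```

With these toy rows the test set keeps the 1:3 ratio of "high" to "low" found in the full data, which a purely random split only approximates.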

--- Original source retains full ownership of the source dataset ---
Smart Water Leak Detection Dataset
kaggle.com
Updated Jul 3, 2025
Cite
Talha.97S (2025). Smart Water Leak Detection Dataset [Dataset]. https://www.kaggle.com/datasets/talha97s/smart-water-leak-detection-dataset
Dataset updated
Jul 3, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Talha.97S
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Water loss due to undetected pipeline leaks is a critical issue in urban infrastructure and smart utility networks. In water transport systems, small leaks can escalate into major inefficiencies, driving up operational costs and wasting precious resources—especially in arid or high-demand regions like the UAE, where this project was inspired.

This dataset simulates real-world IoT sensor data from a smart water transport network, combining geolocation (latitude, longitude) and telemetry values (pressure, flow rate, vibration, RPM, and operational hours) to detect potential pipeline leakage. It supports the development of machine learning models that can power real-time monitoring systems and interactive GIS dashboards.

📡 Sources:

This dataset is synthetically generated but carefully modeled after real-world industrial systems and smart utility practices. Sensor behaviors (e.g., pressure drops, abnormal flow rates) are crafted to mimic patterns observed in real leakage events.

Sensor types: Pressure, flow rate, temperature, vibration, RPM, operational hours

GPS values simulate pipeline segment locations in a grid-style zone system

Labels were generated using a rule-based thresholding logic to indicate leak conditions
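The description does not give the actual rule or threshold values used for labeling, so the following is only a sketch of what rule-based thresholding could look like. The `Reading` type, the three thresholds, and the way the conditions are combined are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    pressure: float
    flow_rate: float
    vibration: float


# Hypothetical thresholds; the dataset's real values are not documented here.
MIN_PRESSURE = 40.0
MAX_FLOW_RATE = 120.0
MAX_VIBRATION = 3.0


def label_leak(reading):
    """Return 1 if the reading matches the assumed leak rule, else 0."""
    low_pressure = reading.pressure < MIN_PRESSURE
    abnormal_flow = reading.flow_rate > MAX_FLOW_RATE
    high_vibration = reading.vibration > MAX_VIBRATION
    # Assumed rule: a pressure drop together with abnormal flow or vibration.
    return int(low_pressure and (abnormal_flow or high_vibration))


suspect = label_leak(Reading(pressure=35.0, flow_rate=130.0, vibration=1.0))
normal = label_leak(Reading(pressure=55.0, flow_rate=90.0, vibration=1.0))
```

Since labels of this kind are a deterministic function of the features, a classifier trained on them mostly learns to reproduce the rule, which is worth keeping in mind when reading accuracy figures.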

If you are working with actual utility providers or have IoT devices, this dataset can serve as a foundation for building real-time predictive models and dashboards.

💡 Inspiration:

This dataset was created to power a complete ML + API + Dashboard workflow, including:

A machine learning model using XGBoost for binary classification

A Flask API for real-time leakage prediction

A Streamlit dashboard with an interactive GIS map to visualize detected leaks

The goal was to build a portfolio-ready, real-world project for smart cities, IoT analytics, and geospatial machine learning—particularly targeting applications in water transport, infrastructure monitoring, and predictive maintenance.

Use Cases:

Build real-time ML pipelines for leakage detection

Visualize water transport failures on interactive maps

Experiment with anomaly detection in geospatial sensor data

Extend into MQTT or real sensor integration for smart cities

Precipitation Prediction in LA

Basic Dataset for a Basic Project


3 scholarly articles cite this dataset (View in Google Scholar)
