Average daily time spent on a phone, excluding voice calls, has increased in recent years, reaching 4 hours and 30 minutes as of April 2022. This figure is expected to reach around 4 hours and 39 minutes by 2024.
How much time do people spend on social media? As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.

Global social media usage
Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with friends and keep up with current events.

Global impact of social media
Social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased political polarization, and heightened everyday distractions.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset covers average time spent with others, measured in minutes per day and broken down by the age of the respondent. It is based on averages from surveys between 2009 and 2019 by the U.S. Bureau of Labor Statistics American Time Use Survey, accessed via Our World in Data.
Source: U.S. Bureau of Labor Statistics American Time Use Survey, accessed via Our World in Data
Open Government Licence - Canada 2.0
https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains 2376 series, with data for years 2015 - 2015 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (11 items: Canada; Newfoundland and Labrador; Prince Edward Island; Nova Scotia; ...); Age group (3 items: Total, 6 to 17 years; 6 to 11 years; 12 to 17 years); Sex (3 items: Both sexes; Males; Females); Children's screen time (3 items: Total population for the variable children's screen time; 2 hours or less of screen time per day; More than 2 hours of screen time per day); Characteristics (8 items: Number of persons; Low 95% confidence interval, number of persons; High 95% confidence interval, number of persons; Coefficient of variation for number of persons; ...).
Attribution 4.0 (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.
In this dataset:
We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided: one for regular workdays (Mon-Thu), one for Saturdays, and one for Sundays. The data are based on aggregated mobile phone data collected by the largest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings, and a time use survey. The data were validated by comparison against night-time population register data from Statistics Finland and against a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to, for instance, spatial accessibility analyses, crisis management, and planning.
Please cite this dataset as:
Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4
Organization of data
The dataset is packaged into a single zip file, Helsinki_dynpop_matrix.zip, which contains the following files:
HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for an average workday in the study area.
HMA_Dynamic_population_24H_sat.csv represents the dynamic population for an average Saturday in the study area.
HMA_Dynamic_population_24H_sun.csv represents the dynamic population for an average Sunday in the study area.
target_zones_grid250m_EPSG3067.geojson represents the statistical grid in the ETRS89/ETRS-TM35FIN projection; it can be used to visualize the data on a map using e.g. QGIS.
Column names
YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.
H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0 to 23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals 100 (i.e., 100% of the total population in each one-hour period).
To visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data, using the field YKR_ID as the common key between the datasets.
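As a minimal sketch (not part of the dataset itself), the join and a basic sanity check might look like the following, using pandas and geopandas; the file and column names follow the description above:

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Load one hourly population table and the statistical grid.
pop = pd.read_csv("HMA_Dynamic_population_24H_workdays.csv")
grid = gpd.read_file("target_zones_grid250m_EPSG3067.geojson")

# Sanity check: each hourly share column H0..H23 should sum to ~100,
# i.e. 100% of the total population in each one-hour period.
hour_cols = [f"H{h}" for h in range(24)]
print(pop[hour_cols].sum().round(2))

# Join the population shares onto the grid via the common YKR_ID key,
# then map the 08:00-08:59 distribution as a quick visual check.
joined = grid.merge(pop, on="YKR_ID", how="left")
joined.plot(column="H8", legend=True)
plt.show()
```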
License: Creative Commons Attribution 4.0 International.
Related datasets
Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612
Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564
Open Government Licence - Canada 2.0
https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Daily average time in hours and proportion of day spent on various activities by age group and sex, 15 years and over, Canada and provinces.
DATASET DESCRIPTION: This dataset includes the average response time by Call Priority across days of the week and hours of the day. Response times reflect the same information contained in the APD 911 Calls for Service 2019-2024 dataset.

AUSTIN POLICE DEPARTMENT DATA DISCLAIMER
1. The data provided is for informational use only and may differ from official Austin Police Department data. The Austin Police Department’s databases are continuously updated, and changes can be made due to a variety of investigative factors, including but not limited to offense reclassification and dates. Reports run at different times may produce different results. Care should be taken when comparing against other reports, as different data collection methods and different systems of record may have been used.
4. The Austin Police Department does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.

City of Austin Open Data Terms of Use: https://data.austintexas.gov/stories/s/ranj-cccq
The S3 dataset contains the behavior (sensors, statistics of applications, and voice) of 21 volunteers interacting with their smartphones for more than 60 days. The users are diverse: males and females ranging in age from 18 to 70 are represented in the dataset. This wide age range is a key aspect, given the impact of age on smartphone usage. To generate the dataset, the volunteers installed a prototype of the smartphone application on their Android mobile phones.
All attributes of the different kinds of data are stored in a vector. The dataset contains the following vector types:
Sensors:
This type of vector contains data from the smartphone sensors (accelerometer and gyroscope) acquired in a given window of time. Each vector is obtained every 20 seconds, and the monitored features are:
- Average of accelerometer and gyroscope values.
- Maximum and minimum of accelerometer and gyroscope values.
- Variance of accelerometer and gyroscope values.
- Peak-to-peak (max-min) of X, Y, Z coordinates.
- Magnitude for gyroscope and accelerometer.
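As a rough illustration (not the authors' code), per-window features of this kind could be computed from raw 3-axis samples as in the sketch below; the sampling rate and the exact feature order are assumptions:

```python
import numpy as np

def window_features(acc: np.ndarray, gyro: np.ndarray) -> np.ndarray:
    """Compute the per-window statistics described above for one
    20-second window. `acc` and `gyro` are (n_samples, 3) arrays of
    X, Y, Z readings; the feature order here is illustrative."""
    feats = []
    for signal in (acc, gyro):
        magnitude = np.linalg.norm(signal, axis=1)      # per-sample magnitude
        feats += [
            signal.mean(),                              # average value
            signal.max(), signal.min(),                 # maximum and minimum
            signal.var(),                               # variance
            *(signal.max(axis=0) - signal.min(axis=0)), # peak-to-peak per axis
            magnitude.mean(),                           # magnitude
        ]
    return np.array(feats)

# Example: 20 s of hypothetical 50 Hz readings -> 1000 samples per sensor.
rng = np.random.default_rng(0)
vec = window_features(rng.normal(size=(1000, 3)), rng.normal(size=(1000, 3)))
print(vec.shape)
```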
Statistics:
These vectors contain data about the applications recently used by the user. Each statistics vector is computed every 60 seconds and contains:
- Foreground application counters (number of distinct and total apps) for the last minute and the last day.
- Most common app ID and its number of usages in the last minute and the last day.
- ID of the currently active app.
- ID of the last active app prior to the current one.
- ID of the application most frequently used prior to the current application.
- Bytes transmitted and received through the network interfaces.
Voice:
This kind of vector is generated when the microphone is active during a call or voice note. The speaker vector is an embedding extracted from the audio that contains information about the user's identity. This vector is usually called an "x-vector" in the speaker recognition field, and it is calculated following the steps detailed in the "egs/sitw/v2" recipe of the Kaldi library, using the models available for extracting the embedding.
A summary of the collected database:
- Users: 21
- Sensor vectors: 417,128
- App usage statistics vectors: 151,034
- Speaker vectors: 2,720
- Call recordings: 629
- Voice messages: 2,091
Statewide Intake serves as the “front door to the front line” for all DFPS programs. As the central point of contact for reports of abuse, neglect and exploitation of vulnerable Texans, SWI staff are available 24 hours a day, 7 days per week, 365 days per year. SWI is the centralized point of intake for the entire State of Texas for child abuse and neglect; abuse, neglect or exploitation of people age 65 or older or adults with disabilities; clients served by DSHS or DADS employees in State Hospitals or State Supported Living Centers; and children in licensed child-care facilities or treatment centers. SWI provides daily reports on call volume and hold times per application, etc., and integrates hardware and software upgrades to phone and computer systems to reduce hold times and improve efficiency.

NOTE: Past Printed Data Books also included EBC, Re-Entry and Support Staff in the all-queues total. An abandoned call is a call that disconnects after completing navigation of the recorded message, but prior to being answered by an intake specialist.

Legislative Budget Board (LBB) Performance Measure Targets are set every two years during Legislative Sessions. LBB Average Hold Time Targets for the English Queue:
- 2010: 11.4 minutes
- 2011: 11.4 minutes
- 2012: 8.7 minutes
- 2013: 8.7 minutes
- 2014: 8.7 minutes
- 2015: 8.7 minutes
- 2016: 7.2 minutes
- 2017: 10.5 minutes
- 2018: 12.0 minutes
- 2019: 9.8 minutes

Visit dfps.state.tx.us for information on all DFPS programs.
Open Government Licence 3.0
http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Average daily time spent by adults on activities including paid work, unpaid household work, unpaid care, travel and entertainment. These are official statistics in development.
Open Government Licence - Canada 2.0
https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Daily average time and proportion of day spent on various activities, by age group and gender, 15 years and over, Canada, Geographical region of Canada, province or territory, 2022.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset presents detailed energy consumption records from various households over a one-month period. With 90,000 rows and multiple features such as temperature, household size, air conditioning usage, and peak-hour consumption, this dataset is well suited to time-series analysis, machine learning, and sustainability research.
Column Name | Data Type Category | Description |
---|---|---|
Household_ID | Categorical (Nominal) | Unique identifier for each household |
Date | Datetime | The date of the energy usage record |
Energy_Consumption_kWh | Numerical (Continuous) | Total energy consumed by the household in kWh |
Household_Size | Numerical (Discrete) | Number of individuals living in the household |
Avg_Temperature_C | Numerical (Continuous) | Average daily temperature in degrees Celsius |
Has_AC | Categorical (Binary) | Indicates if the household has air conditioning (Yes/No) |
Peak_Hours_Usage_kWh | Numerical (Continuous) | Energy consumed during peak hours in kWh |
Library | Purpose |
---|---|
pandas | Reading, cleaning, and transforming tabular data |
numpy | Numerical operations, working with arrays |
matplotlib | Creating static plots (line, bar, histograms, etc.) |
seaborn | Statistical visualizations, heatmaps, boxplots, etc. |
plotly | Interactive charts (time series, pie, bar, scatter, etc.) |
scikit-learn | Preprocessing, regression, classification, clustering |
xgboost / lightgbm | Gradient boosting models for better accuracy |
sklearn.preprocessing | Encoding categorical features, scaling, normalization |
datetime / pandas | Date-time conversion and manipulation |
sklearn.metrics | Accuracy, MAE, RMSE, R² score, confusion matrix, etc. |
✅ These libraries provide a complete toolkit for performing data analysis, modeling, and visualization tasks efficiently.
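As a hedged starting point, the sketch below ties a few of these libraries to the columns documented above; the CSV file name is hypothetical and should be adjusted to the actual download:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file name for the household energy dataset.
df = pd.read_csv("household_energy_consumption.csv", parse_dates=["Date"])

# Daily mean consumption, split by air-conditioning ownership (Yes/No).
daily = df.groupby(["Date", "Has_AC"])["Energy_Consumption_kWh"].mean().unstack()
daily.plot(title="Average daily consumption by AC ownership")
plt.ylabel("kWh")
plt.show()

# Share of consumption that falls in peak hours, per household size.
df["peak_share"] = df["Peak_Hours_Usage_kWh"] / df["Energy_Consumption_kWh"]
print(df.groupby("Household_Size")["peak_share"].mean().round(3))
```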
This dataset is ideal for a wide variety of analytics and machine learning projects.
This dataset contains the World Average Degree Days Database for the period 1964-2013. Follow datasource.kapsarc.org for timely data to advance energy economics research.
Summary_64-13_freq=1D Average Degree Days of various indices for respective countries for the period 1964-2013, converted to a 1 day frequency
Summary_64-13_freq=6hrs Average Degree Days of various indices for respective countries for the period 1964-2013, calculated at 6 hrs frequency
T2m.hdd.18C Calculation of Heating Degree Days using plain temperature at 2 m elevation at Tref=18°C and frequency of 6 hrs
T2m.cdd.18C Calculation of Cooling Degree Days using plain temperature at 2 m elevation at Tref=18°C and frequency of 6 hrs
t2m.hdd.15.6C Calculation of Heating Degree Days using plain temperature at 2 m elevation at Tref=15.6°C and frequency of 6 hrs
t2m.hdd.18.3C Calculation of Heating Degree Days using plain temperature at 2 m elevation at Tref=18.3°C and frequency of 6 hrs
t2m.hdd.21.1C Calculation of Heating Degree Days using plain temperature at 2 m elevation at Tref=21.1°C and frequency of 6 hrs
t2m.cdd.15.6C Calculation of Cooling Degree Days using plain temperature at 2 m elevation at Tref=15.6°C and frequency of 6 hrs
t2m.cdd.18.3C Calculation of Cooling Degree Days using plain temperature at 2 m elevation at Tref=18.3°C and frequency of 6 hrs
t2m.cdd.21.1C Calculation of Cooling Degree Days using plain temperature at 2 m elevation at Tref=21.1°C and frequency of 6 hrs
t2m.hdd.60F Calculation of Heating Degree Days using plain temperature at 2 m elevation at Tref=60°F and frequency of 6 hrs
t2m.hdd.65F Calculation of Heating Degree Days using plain temperature at 2 m elevation at Tref=65°F and frequency of 6 hrs
t2m.hdd.70F Calculation of Heating Degree Days using plain temperature at 2 m elevation at Tref=70°F and frequency of 6 hrs
t2m.cdd.60F Calculation of Cooling Degree Days using plain temperature at 2 m elevation at Tref=60°F and frequency of 6 hrs
t2m.cdd.65F Calculation of Cooling Degree Days using plain temperature at 2 m elevation at Tref=65°F and frequency of 6 hrs
t2m.cdd.70F Calculation of Cooling Degree Days using plain temperature at 2 m elevation at Tref=70°F and frequency of 6 hrs
HI.hdd.57.56F Calculation of Heating Degree Days using the Heat Index at Tref=57.56°F and frequency of 6 hrs
HI.hdd.63.08F Calculation of Heating Degree Days using the Heat Index at Tref=63.08°F and frequency of 6 hrs
HI.hdd.68.58F Calculation of Heating Degree Days using the Heat Index at Tref=68.58°F and frequency of 6 hrs
HI.cdd.57.56F Calculation of Cooling Degree Days using the Heat Index at Tref=57.56°F and frequency of 6 hrs
HI.cdd.63.08F Calculation of Cooling Degree Days using the Heat Index at Tref=63.08°F and frequency of 6 hrs
HI.cdd.68.58F Calculation of Cooling Degree Days using the Heat Index at Tref=68.58°F and frequency of 6 hrs
HUM.hdd.13.98C Calculation of Heating Degree Days using the Humidex at Tref=13.98°C and frequency of 6 hrs
HUM.hdd.17.4C Calculation of Heating Degree Days using the Humidex at Tref=17.40°C and frequency of 6 hrs
HUM.hdd.21.09C Calculation of Heating Degree Days using the Humidex at Tref=21.09°C and frequency of 6 hrs
HUM.cdd.13.98C Calculation of Cooling Degree Days using the Humidex at Tref=13.98°C and frequency of 6 hrs
HUM.cdd.17.4C Calculation of Cooling Degree Days using the Humidex at Tref=17.40°C and frequency of 6 hrs
HUM.cdd.21.09C Calculation of Cooling Degree Days using the Humidex at Tref=21.09°C and frequency of 6 hrs
ESI.hdd.12.6C Calculation of Heating Degree Days using the Environmental Stress Index at Tref=12.6°C and frequency of 6 hrs
ESI.hdd.14.9C Calculation of Heating Degree Days using the Environmental Stress Index at Tref=14.9°C and frequency of 6 hrs
ESI.hdd.17.2C Calculation of Heating Degree Days using the Environmental Stress Index at Tref=17.2°C and frequency of 6 hrs
ESI.cdd.12.6C Calculation of Cooling Degree Days using the Environmental Stress Index at Tref=12.6°C and frequency of 6 hrs
ESI.cdd.14.9C Calculation of Cooling Degree Days using the Environmental Stress Index at Tref=14.9°C and frequency of 6 hrs
ESI.cdd.17.2C Calculation of Cooling Degree Days using the Environmental Stress Index at Tref=17.2°C and frequency of 6 hrs
Note:
Divide Degree Days by 4 to convert from 6 hrs to daily frequency
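A minimal sketch of that conversion, assuming a 6-hourly summary column has been loaded into a pandas DataFrame (the column name is illustrative):

```python
import pandas as pd

# Hypothetical frame standing in for the 6-hourly summary table.
df = pd.DataFrame({"t2m.hdd.18C": [8.0, 6.4, 4.0]})

# Per the note above: divide by 4 (four 6-hour periods per day)
# to convert 6-hourly degree-day values to daily frequency.
df["t2m.hdd.18C_daily"] = df["t2m.hdd.18C"] / 4
print(df)
```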
Global Surface Summary of the Day is derived from the Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries. The online data files begin with 1929 and are, at the time of this writing, at the Version 8 software level. Over 9000 stations' data are typically available. The daily elements included in the dataset (as available from each station) are:
- Mean temperature (.1 Fahrenheit)
- Mean dew point (.1 Fahrenheit)
- Mean sea level pressure (.1 mb)
- Mean station pressure (.1 mb)
- Mean visibility (.1 miles)
- Mean wind speed (.1 knots)
- Maximum sustained wind speed (.1 knots)
- Maximum wind gust (.1 knots)
- Maximum temperature (.1 Fahrenheit)
- Minimum temperature (.1 Fahrenheit)
- Precipitation amount (.01 inches)
- Snow depth (.1 inches)
- Indicators for the occurrence of: fog, rain or drizzle, snow or ice pellets, hail, thunder, tornado/funnel cloud

Global summary of day data for 18 surface meteorological elements are derived from the synoptic/hourly observations contained in USAF DATSAV3 Surface data and Federal Climate Complex Integrated Surface Hourly (ISH). Historical data are generally available for 1929 to the present, with data from 1973 to the present being the most complete. For some periods, one or more countries' data may not be available due to data restrictions or communications problems. In deriving the summary of day data, a minimum of 4 observations for the day must be present (this allows for stations which report 4 synoptic observations per day). Since the data are converted to constant units (e.g., knots), slight rounding error from the originally reported values may occur (e.g., 9.9 instead of 10.0). The mean daily values described below are based on the hours of operation for the station. For some stations/countries, the visibility will sometimes 'cluster' around a value (such as 10 miles) due to the practice of not reporting visibilities greater than certain distances. The daily extremes and totals (maximum wind gust, precipitation amount, and snow depth) will only appear if the station reports the data sufficiently to provide a valid value; therefore, these three elements will appear less frequently than other values. Also, these elements are derived from the stations' reports during the day, and may comprise a 24-hour period which includes a portion of the previous day. The data are reported and summarized based on Greenwich Mean Time (GMT, 0000Z - 2359Z), since the original synoptic/hourly data are reported and based on GMT.
Abstract: Building health management is an important part of running an efficient and cost-effective building. Many problems in a building's systems can go undetected for long periods of time, leading to expensive repairs or wasted resources. This project aims to help detect and diagnose the building's health with data-driven methods throughout the day. Orca and IMS are two state-of-the-art algorithms that observe an array of building health sensors and provide feedback on the overall system's health, as well as localize the problem to one, or possibly two, components. With this level of feedback, the hope is to quickly identify problems and provide appropriate maintenance while reducing the number of complaints and service calls.

Introduction: To prepare these technologies for the new installation, the proposed methods are being tested on a current system that behaves similarly to the future green building. Building 241 was determined to best resemble the proposed building 232 and was therefore chosen for this study. Building 241 is currently outfitted with 34 sensors that monitor the heating and cooling temperatures for the air and water systems, as well as various other subsystem states. The daily sensor recordings were logged and sent to the IDU group for analysis. The period of analysis ran from July 1st through August 10th, 2009.

Methodology: The two algorithms used for analysis were Orca and IMS. Both methods look for anomalies using a distance-based scoring approach. Orca can take a single data set and find outliers within it; this tactic was applied to each day. After scoring each time sample throughout a given day, the Orca score profiles were compared by computing the correlation against all other days. Days with high overall correlations were considered normal, while days with lower overall correlations were more anomalous. IMS, on the other hand, needs a normal set of data to build a model, which can then be applied to a set of test data to assess how anomalous that data set is. The typical days identified by Orca were used as the reference/training set for IMS, while all the other days were passed through IMS, resulting in an anomaly score profile for each day. The mean of the IMS score profile was then calculated for each day to produce a summary IMS score. These summary scores were ranked and the top outliers were identified (see Figure 1). Once the anomalies were identified, the contributing parameters were ranked by the algorithm.

Analysis: The contributing parameters identified by IMS were localized to the return air temperature duct system.
- 7/03/09 (Figures 2 & 3): AHU-1 Return Air Temperature (RAT), Calculated Average Return Air Temperature
- 7/19/09 (Figures 3 & 4): AHU-2 Return Air Temperature (RAT), Calculated Average Return Air Temperature
IMS identified significantly higher temperatures compared to other days during the months of July and August.

Conclusion: The proposed algorithms Orca and IMS have shown that they were able to pick up significant anomalies in the building system, as well as diagnose each anomaly by identifying the sensor values that were anomalous. In the future, these methods can be used on live streaming data to produce a real-time anomaly score, helping building maintenance with detection and diagnosis of problems.
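The day-correlation step described in the methodology can be sketched as follows; this is an illustration of the idea with synthetic score profiles, not the Orca or IMS implementations:

```python
import numpy as np

# Hypothetical stand-in for per-day anomaly score profiles: one row per
# day, one column per time sample within the day (e.g. from Orca).
rng = np.random.default_rng(1)
profiles = rng.normal(size=(41, 96))          # 41 days, 96 samples/day

# Correlate each day's profile against all other days; days whose mean
# correlation with the rest is low are treated as more anomalous.
corr = np.corrcoef(profiles)
np.fill_diagonal(corr, np.nan)                # ignore self-correlation
mean_corr = np.nanmean(corr, axis=1)

# Lowest-correlation days are candidate anomalies; highest are "typical"
# days that could seed a reference/training set for a model like IMS.
print("most anomalous days:", np.argsort(mean_corr)[:5])
```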
Annual dataset covering the conterminous U.S., from 1981 to now. Contains spatially gridded annual average daily mean temperature at 4km grid cell resolution. Distribution of the point measurements to the spatial grid was accomplished using the PRISM model, developed and applied by Dr. Christopher Daly of the PRISM Climate Group at Oregon State University.
The Heart Attack Risk Prediction Dataset serves as a valuable resource for delving into the intricate dynamics of heart health and its predictors. Heart attacks, or myocardial infarctions, continue to be a significant global health issue, necessitating a deeper comprehension of their precursors and potential mitigating factors. This dataset encapsulates a diverse range of attributes including age, cholesterol levels, blood pressure, smoking habits, exercise patterns, dietary preferences, and more, aiming to elucidate the complex interplay of these variables in determining the likelihood of a heart attack. By employing predictive analytics and machine learning on this dataset, researchers and healthcare professionals can work towards proactive strategies for heart disease prevention and management. The dataset stands as a testament to collective efforts to enhance our understanding of cardiovascular health and pave the way for a healthier future.
This synthetic dataset provides a comprehensive array of features relevant to heart health and lifestyle choices, encompassing patient-specific details such as age, gender, cholesterol levels, blood pressure, heart rate, and indicators like diabetes, family history, smoking habits, obesity, and alcohol consumption. Additionally, lifestyle factors like exercise hours, dietary habits, stress levels, and sedentary hours are included. Medical aspects comprising previous heart problems, medication usage, and triglyceride levels are considered. Socioeconomic aspects such as income and geographical attributes like country, continent, and hemisphere are incorporated. The dataset, consisting of 8763 records from patients around the globe, culminates in a crucial binary classification feature denoting the presence or absence of a heart attack risk, providing a comprehensive resource for predictive analysis and research in cardiovascular health.
This dataset is a synthetic creation generated using ChatGPT to simulate a realistic experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world scenarios. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation.
Cover Photo by: brgfx on Freepik
Thumbnail by: vectorjuice on Freepik
Apache License, v2.0
https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Instructions:
Dataset Name: Podcast Listening Time Prediction
Dataset Description: The dataset contains information about various podcast episodes and their attributes. The goal is to analyze and predict the average listening duration of podcast episodes based on various features.
Columns in the Dataset:
Podcast_Name (Type: string) Description: Names of popular podcasts. Example Values: "Tech Talk", "Health Hour", "Comedy Central"
Episode_Title (Type: string) Description: Titles of the podcast episodes. Example Values: "The Future of AI", "Meditation Tips", "Stand-Up Special"
Episode_Length (Type: float, minutes) Description: Length of the episode in minutes. Example Values: 5.0, 10.0, 30.0, 45.0, 60.0, 90.0
Genre (Type: string) Description: Genre of the podcast episode. Possible Values: "Technology", "Education", "Comedy", "Health", "True Crime", "Business", "Sports", "Lifestyle", "News", "Music"
Host_Popularity (Type: float, scale 0-100) Description: A score indicating the popularity of the host. Example Values: 50.0, 75.0, 90.0
Publication_Day (Type: string) Description: Day of the week the episode was published. Possible Values: "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
Publication_Time (Type: string) Description: Time of the day the episode was published. Possible Values: "Morning", "Afternoon", "Evening", "Night"
Guest_Popularity (Type: float, scale 0-100) Description: A score indicating the popularity of the guest (if any). Example Values: 20.0, 50.0, 85.0
Number_of_Ads (Type: int) Description: Number of advertisements within the episode. Example Values: 0, 1, 2, 3
Episode_Sentiment (Type: string) Description: Sentiment of the episode's content. Possible Values: "Positive", "Neutral", "Negative"
Listening_Time (Type: float, minutes) Description: The actual average listening duration (target variable). Example Values: 4.5, 8.0, 30.0, 60.0
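A minimal modeling sketch for the stated prediction goal, using scikit-learn on the columns listed above (the CSV file name is hypothetical; this is one possible approach, not the dataset's prescribed method):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical file name; adjust to the actual download.
df = pd.read_csv("podcast_listening_time.csv")

categorical = ["Podcast_Name", "Genre", "Publication_Day",
               "Publication_Time", "Episode_Sentiment"]
numeric = ["Episode_Length", "Host_Popularity",
           "Guest_Popularity", "Number_of_Ads"]

X = df[categorical + numeric]
y = df["Listening_Time"]          # target: average listening duration

# One-hot encode the categorical columns, pass numeric columns through,
# then fit a random-forest regressor on the training split.
model = Pipeline([
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough")),
    ("reg", RandomForestRegressor(n_estimators=200, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print(f"R^2 on held-out episodes: {model.score(X_test, y_test):.3f}")
```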
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)
https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
📌 Context of the Dataset
The Healthcare Ransomware Dataset was created to simulate real-world cyberattacks in the healthcare industry. Hospitals, clinics, and research labs have become prime targets for ransomware due to their reliance on real-time patient data and legacy IT infrastructure. This dataset provides insight into attack patterns, recovery times, and cybersecurity practices across different healthcare organizations.
Why is this important?
Ransomware attacks on healthcare organizations can shut down entire hospitals, delay treatments, and put lives at risk. Understanding how different healthcare organizations respond to attacks can help develop better security strategies. The dataset allows cybersecurity analysts, data scientists, and researchers to study patterns in ransomware incidents and explore predictive modeling for risk mitigation.
📌 Sources and Research Inspiration
This simulated dataset was inspired by real-world cybersecurity reports and built using insights from official sources, including:
1️⃣ IBM Cost of a Data Breach Report (2024)
- The healthcare sector had the highest average cost of data breaches ($10.93 million per incident).
- On average, organizations recovered only 64.8% of their data after paying ransom.
- Healthcare breaches took 277 days on average to detect and contain.
2️⃣ Sophos State of Ransomware in Healthcare (2024)
- 67% of healthcare organizations were hit by ransomware in 2024, up from 60% in 2023.
- 66% of backup compromise attempts succeeded, making data recovery significantly more difficult.
- The most common attack vectors included exploited vulnerabilities (34%) and compromised credentials (34%).
3️⃣ Health & Human Services (HHS) Cybersecurity Reports
- Ransomware incidents in healthcare have doubled since 2016.
- Organizations that fail to monitor threats frequently experience higher infection rates.
4️⃣ Cybersecurity & Infrastructure Security Agency (CISA) Alerts
- Identified phishing, unpatched software, and exposed RDP ports as top ransomware entry points.
- Only 13% of healthcare organizations monitor cyber threats more than once per day, increasing the risk of undetected attacks.
5️⃣ Emsisoft 2020 Report on Ransomware in Healthcare
- The number of ransomware attacks in healthcare increased by 278% between 2018 and 2023.
- 560 healthcare facilities were affected in a single year, disrupting patient care and emergency services.
📌 Why is This a Simulated Dataset?
This dataset does not contain real patient data or actual ransomware cases. Instead, it was built using probabilistic modeling and structured randomness based on industry benchmarks and cybersecurity reports.
How It Was Created:
1️⃣ Defining the Dataset Structure
The dataset was designed to simulate realistic attack patterns in healthcare, using actual ransomware case studies as inspiration.
Columns were selected based on what real-world cybersecurity teams track, such as:
- Attack methods (phishing, RDP exploits, credential theft).
- Infection rates, recovery time, and backup compromise rates.
- Organization type (hospitals, clinics, research labs) and monitoring frequency.
2️⃣ Generating Realistic Data Using ChatGPT & Python
- ChatGPT assisted in defining relationships between attack factors, ensuring that key cybersecurity concepts were accurately reflected.
- Python's NumPy and pandas libraries were used to introduce randomized attack simulations based on real-world statistics (a minimal sketch of this step follows after this section).
- Data was validated against industry research to ensure it aligns with actual ransomware attack trends.
3️⃣ Ensuring Logical Relationships Between Data Points
- Hospitals take longer to recover due to larger infrastructure and compliance requirements.
- Organizations that track more cyber threats recover faster because they detect attacks earlier.
- Backup security significantly impacts recovery time, reflecting the real-world risk of backup encryption attacks.
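A minimal sketch of this kind of probabilistic generation, using NumPy and pandas with weights loosely echoing the figures cited above (all names, sizes, and parameters are illustrative, not the dataset's actual generator):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500  # number of simulated incidents; purely illustrative

# Draw entry points with weights loosely echoing the Sophos figures
# above (exploited vulnerabilities and credentials at ~34% each).
df = pd.DataFrame({
    "org_type": rng.choice(["hospital", "clinic", "research_lab"], size=n),
    "entry_point": rng.choice(
        ["exploited_vulnerability", "compromised_credentials",
         "phishing", "exposed_rdp"],
        size=n, p=[0.34, 0.34, 0.22, 0.10]),
    "backup_compromised": rng.random(n) < 0.66,
})

# Encode the stated logical relationships: hospitals recover more
# slowly, and a compromised backup lengthens recovery further.
base_days = np.where(df["org_type"] == "hospital", 14, 7)
df["recovery_days"] = rng.poisson(base_days + 10 * df["backup_compromised"])
print(df.head())
```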