21 datasets found

Data from: San Francisco Open Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasets/datasf/san-francisco
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
DataSF
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
San Francisco
Description
Context

DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

https://datasf.org/about/

Content

This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.

This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.

This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until roughly two weeks before the current date. The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).

This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

http://datasf.org/

https://cloud.google.com/bigquery/public-data/sfo-311

https://cloud.google.com/bigquery/public-data/sffd-service-calls

https://cloud.google.com/bigquery/public-data/sfpd-reports

https://cloud.google.com/bigquery/public-data/sfo-trees

Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @meric from Unsplash.

Inspiration

Which neighborhoods have the highest proportion of offensive graffiti?

Which complaint is most likely to be made using Twitter and in which neighborhood?

What are the most complained about Muni stops in San Francisco?

What are the top 10 incident types that the San Francisco Fire Department responds to?

How many medical incidents and structure fires are there in each neighborhood?

What’s the average response time for each type of dispatched vehicle?

Which category of police incidents have historically been the most common in San Francisco?

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

What is the average tree diameter?

What is the highest number of a particular species of tree planted in a single year?

Which San Francisco locations feature the largest number of trees?
NOAA GSOD
kaggle.com
zip
Updated Aug 30, 2019
+ more versions
Cite
NOAA (2019). NOAA GSOD [Dataset]. https://www.kaggle.com/datasets/noaa/gsod
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Aug 30, 2019
Dataset provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Authors
NOAA
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Overview

Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries.

Content

Over 9000 stations' data are typically available.

The daily elements included in the dataset (as available from each station) are:

Mean temperature (.1 Fahrenheit)
Mean dew point (.1 Fahrenheit)
Mean sea level pressure (.1 mb)
Mean station pressure (.1 mb)
Mean visibility (.1 miles)
Mean wind speed (.1 knots)
Maximum sustained wind speed (.1 knots)
Maximum wind gust (.1 knots)
Maximum temperature (.1 Fahrenheit)
Minimum temperature (.1 Fahrenheit)
Precipitation amount (.01 inches)
Snow depth (.1 inches)

Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel
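The units listed above are imperial, so a conversion step is often the first thing an analysis needs. The sketch below is illustrative only: the function names are hypothetical, and the 9999.9 missing-value sentinel is an assumption about the raw GSOD values that should be checked against the table actually queried.

```python
from typing import Optional

# Assumed sentinel for a missing reading; verify against the actual data.
MISSING = 9999.9


def fahrenheit_to_celsius(temp_f: float) -> Optional[float]:
    """Convert degrees Fahrenheit to Celsius, returning None for missing values."""
    if temp_f == MISSING:
        return None
    return (temp_f - 32.0) * 5.0 / 9.0


def knots_to_metres_per_second(speed_kn: float) -> float:
    """Convert knots to metres per second (1 knot = 1852 m per hour)."""
    return speed_kn * 1852.0 / 3600.0


def inches_to_millimetres(depth_in: float) -> float:
    """Convert inches to millimetres (1 inch = 25.4 mm)."""
    return depth_in * 25.4
```

Returning None for a missing temperature keeps the sentinel from being averaged into results by accident.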

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.noaa_gsod.[TABLENAME]. Fork this kernel to get started and to learn how to safely manage analyzing large BigQuery datasets.
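One way to analyze a large table safely is to ask BigQuery for a dry run first, which reports how many bytes a query would scan without executing it. The sketch below assumes the GSOD tables live under bigquery-public-data.noaa_gsod with one table per year (gsod2019 and so on), that the columns stn, mo, da and temp exist, and that Google Cloud credentials are configured. None of that is confirmed by the description above, and the helper names are hypothetical.

```python
def build_gsod_query(year: int, limit: int = 10) -> str:
    """Return a query for one yearly GSOD table (table and column names are assumptions)."""
    table = f"bigquery-public-data.noaa_gsod.gsod{year}"
    return (
        "SELECT stn, mo, da, temp\n"
        f"FROM `{table}`\n"
        f"LIMIT {limit}"
    )


def estimate_bytes(sql: str) -> int:
    """Dry-run the query so BigQuery reports the bytes it would scan, without running it."""
    # Imported here so build_gsod_query works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

Calling estimate_bytes(build_gsod_query(2019)) before running the real query gives a chance to narrow the selected columns or years if the estimate is larger than expected.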

Acknowledgements

This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and present, collected from over 9000 stations. Dataset Source: NOAA

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Photo by Allan Nygren on Unsplash
ChatGPT reviews [DAILY UPDATED]
kaggle.com
Updated Jan 10, 2025
Cite
The citation is currently not available for this dataset.
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 10, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ashish Kumar
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This dataset mainly consists of daily-updated user reviews and ratings for the ChatGPT Android App. It also contains data on the relevancy of these reviews and the dates they were posted.
Coronavirus (COVID-19) Mobility Report
data.europa.eu
unknown
Updated Mar 17, 2021
+ more versions
Cite
Chris Fairless (2021). Coronavirus (COVID-19) Mobility Report [Dataset]. https://data.europa.eu/data/datasets/coronavirus-covid-19-mobility-report-2?locale=en
Explore at:
Available download formats: unknown
Dataset updated
Mar 17, 2021
Dataset authored and provided by
Chris Fairless
Description
Due to changes in the collection and availability of data on COVID-19, this website will no longer be updated. The webpage will no longer be available as of 11 May 2023. On-going, reliable sources of data for COVID-19 are available via the COVID-19 dashboard and the UKHSA

GLA Covid-19 Mobility Report

Since March 2020, London has seen many different levels of restrictions, including three separate lockdowns and many other tiers/levels of restrictions, as well as easing of restrictions and even measures to actively encourage people to go to work, their high streets and local restaurants. This report gathers data from a number of sources, including Google, Apple, Citymapper, Purple WiFi and OpenTable, to assess the extent to which these levels of restrictions have translated into a reduction in Londoners' movements.

The data behind the charts below come from different sources. None of these data represent a direct measure of how well people are adhering to the lockdown rules, nor do they provide an exhaustive data set. Rather, they are measures of different aspects of mobility which, together, offer an overall impression of how Londoners are moving around the capital. The information is broken down by use of public transport, pedestrian activity, retail and leisure, and homeworking.

Public Transport

For the transport measures, we have included data from Google, Apple, Citymapper and Transport for London. They measure different aspects of public transport usage, depending on the data source. Each of the lines in the chart below represents a percentage of a pre-pandemic baseline.
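A "percentage of a pre-pandemic baseline" can be sketched as follows. This is an illustration only, not the method of any provider named here: the counts are made up, the function name is hypothetical, and taking the median over the same weekday is just one possible baseline definition (it mirrors how the Google figures are described in this report).

```python
from statistics import median


def percent_of_baseline(value: float, baseline: float) -> float:
    """Express a day's activity as a percentage of the baseline level."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * value / baseline


# Made-up counts for the same weekday across a five-week baseline period.
baseline_mondays = [980, 1010, 1000, 1020, 990]
baseline = median(baseline_mondays)

# A made-up count for a Monday during a lockdown.
lockdown_monday = 250
share = percent_of_baseline(lockdown_monday, baseline)
```

Comparing like-for-like weekdays keeps ordinary weekly rhythms, such as quieter weekends, from showing up as a change in mobility.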

Embedded image: https://cdn.datapress.cloud/london/img/dataset/60e5834b-68aa-48d7-a8c5-7ee4781bde05/2025-06-09T20%3A54%3A15/6b096426c4c582dc9568ed4830b4226d.webp

activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
Citymapper | Citymapper mobility index | 2021-09-05 | Compares trips planned and trips taken within its app to a baseline of the four weeks from 6 Jan 2020 | 7.9% | 28% | 19%
Google | Google Mobility Report | 2022-10-15 | Location data shared by users of Android smartphones, compared time and duration of visits to locations to the median values on the same day of the week in the five weeks from 3 Jan 2020 | 20.4% | 40% | 27%
TfL Bus | Transport for London | 2022-10-30 | Bus journey 'taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 | - | 34% | 24%
TfL Tube | Transport for London | 2022-10-30 | Tube journey 'taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 | - | 30% | 21%

Pedestrian activity

With the data we currently have it's harder to estimate pedestrian activity and high street busyness. A few indicators can give us information on how people are making trips out of the house:

Embedded image: https://cdn.datapress.cloud/london/img/dataset/60e5834b-68aa-48d7-a8c5-7ee4781bde05/2025-06-09T20%3A54%3A15/bcf082c07e4d7ff5202012f0a97abc3a.webp

activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
Walking | Apple Mobility Index | 2021-11-09 | Estimates the frequency of trips made on foot compared to baseline of 13 Jan '20 | 22% | 47% | 36%
Parks | Google Mobility Report | 2022-10-15 | Frequency of trips to parks. Changes in the weather mean this varies a lot. Compared to baseline of 5 weeks from 3 Jan '20 | 30% | 55% | 41%
Retail & Rec | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to shops/leisure locations. Compared to baseline of 5 weeks from 3 Jan '20 | 30% | 55% | 41%

Retail and recreation

In this section, we focus on estimated footfall to shops, restaurants, cafes, shopping centres and so on.

Embedded image: https://cdn.datapress.cloud/london/img/dataset/60e5834b-68aa-48d7-a8c5-7ee4781bde05/2025-06-09T20%3A54%3A16/b62d60f723eaafe64a989e4afec4c62b.webp

activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
Grocery/pharmacy | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to grocery shops and pharmacies. Compared to baseline of 5 weeks from 3 Jan '20 | 32% | 55.00% | 45.000%
Retail/rec <a href="https://ww
Day & night temperatures, 50yrs, 1666ws, TFRecord
kaggle.com
zip
Updated Nov 9, 2019
Cite
Martin Görner (2019). Day & night temperatures, 50yrs, 1666ws, TFRecord [Dataset]. https://www.kaggle.com/datasets/mgorner/day-night-temperatures-50yrs-1666ws-tfrecord
Explore at:
Available download formats: zip (160157825 bytes)
Dataset updated
Nov 9, 2019
Authors
Martin Görner
License
https://www.usa.gov/government-works/
Description
This dataset is a cleaned-up extract from the following public BigQuery dataset: https://console.cloud.google.com/marketplace/details/noaa-public/ghcn-d

The dataset contains daily min/max temperatures from a selection of 1666 weather stations. The data spans exactly 50 years. Missing values have been interpolated and are marked as such.
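The description does not say which interpolation method was used or how the markers are stored. As a purely illustrative sketch, assuming linear interpolation between the nearest known readings and a parallel list of boolean flags (both assumptions, as is the function name), gap filling with marking could look like this:

```python
from typing import List, Optional, Tuple


def interpolate_gaps(
    values: List[Optional[float]],
) -> Tuple[List[Optional[float]], List[bool]]:
    """Linearly fill interior gaps (None) and return (filled values, interpolated flags)."""
    filled = list(values)
    flags = [False] * len(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of known readings and fill what lies between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
            flags[i] = True
    return filled, flags
```

Gaps at the very start or end of a series have no reading on one side, so this sketch leaves them as None rather than extrapolating.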

This dataset is in TFRecord format.

About the original dataset: NOAA’s Global Historical Climatology Network (GHCN) is an integrated database of climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. The data are obtained from more than 20 sources. The GHCN-Daily is an integrated database of daily climate summaries from land surface stations across the globe, and is comprised of daily climate records from over 100,000 stations in 180 countries and territories, and includes some data from every year since 1763.
Google Safe Browsing Transparency Report Data
kaggle.com
Updated Nov 8, 2019
Cite
Rob Rose (2019). Google Safe Browsing Transparency Report Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/784868
Explore at:
Croissant
Unique identifier
https://doi.org/10.34740/kaggle/dsv/784868
Dataset updated
Nov 8, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rob Rose
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

I wanted to make this for potentially using as a helper dataset in the Microsoft Malware Prediction competition. I was also inspired by Kaggle's new ability to create datasets from the outputs of Kernels, which is something I leveraged here.

Content

The data is the full data found on the Google Safe Browsing Transparency Report web page. There is plenty of missing data, sometimes the source data doesn't start for a while and there are periodic gaps for unspecified reasons. It's up to you to determine what to do with those gaps. The reinfection rate has been multiplied by 100 and converted to an int in order to signify percentage.
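The reinfection-rate encoding described above can be sketched as follows. The description does not say whether the conversion truncates or rounds, so truncation is an assumption here, and both function names are hypothetical.

```python
def encode_rate(fraction: float) -> int:
    """Scale a fractional rate to a whole-number percentage, dropping the remainder."""
    return int(fraction * 100)


def decode_rate(percent: int) -> float:
    """Recover an approximate fraction from the stored integer percentage."""
    return percent / 100
```

Because the fractional part is discarded, a decoded value can differ from the original rate by up to one percentage point, which is worth keeping in mind when comparing small rates.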

Acknowledgements

Thanks to @rquintino for publishing the splits for the Microsoft competition that originally inspired me to gather this data. And @cdeotte who originally published some scraped datasets in the Microsoft competition, see this discussion post for details.

Inspiration

I hope some people find this useful! For the Microsoft challenge or any future challenges! Please leave an upvote here or on the source kernel if you found it useful! I plan to rerun the source kernel weekly on Fridays. I hope Kaggle in the future enables some way to automate that, but for now I just do it manually. If the data is stale, feel free to ping me in the discussions section or on the source kernel and I'll run it.
Chicago Crime
kaggle.com
zip
Updated Apr 17, 2018
Cite
City of Chicago (2018). Chicago Crime [Dataset]. https://www.kaggle.com/datasets/chicago/chicago-crime
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Apr 17, 2018
Dataset authored and provided by
City of Chicago
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Chicago
Description
Context

Approximately 10 people are shot on an average day in Chicago.

http://www.chicagotribune.com/news/data/ct-shooting-victims-map-charts-htmlstory.html
http://www.chicagotribune.com/news/local/breaking/ct-chicago-homicides-data-tracker-htmlstory.html
http://www.chicagotribune.com/news/local/breaking/ct-homicide-victims-2017-htmlstory.html

Content

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. This data includes unverified reports supplied to the Police Department. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time.

Update Frequency: Daily

Fork this kernel to get started.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:chicago_crime

https://cloud.google.com/bigquery/public-data/chicago-crime-data

Dataset Source: City of Chicago

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source (https://data.cityofchicago.org) and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by Ferdinand Stohr from Unsplash.

Inspiration

What categories of crime exhibited the greatest year-over-year increase between 2015 and 2016?

Which month generally has the greatest number of motor vehicle thefts?

How does temperature affect the incident rate of violent crime (assault or battery)?

https://cloud.google.com/bigquery/images/chicago-scatter.png
Google Stock History
kaggle.com
Updated Oct 25, 2023
Cite
PavanKalyan (2023). Google Stock History [Dataset]. https://www.kaggle.com/pavan9065/google-stock-history
Explore at:
Croissant
Dataset updated
Oct 25, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
PavanKalyan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Google is one of the greatest gifts to mankind. Any information that you need today is available on Google. Google is a household name and literally everyone is aware of what Google is. It helps you get resources for your school projects, helps you shop online and much more. Google has made getting an education a lot easier for people across the globe. No matter where you are, you can access Google provided you have internet. Every piece of info is available on Google and it's all one click away. Google has a parent company, Alphabet Inc., that is publicly traded, and here we have stock data for Alphabet Inc.

Content

This data set has 7 columns with all the necessary values, such as the opening price of the stock, its closing price, its highest price in the day and much more. It has date-wise data for the stock from 2004 to October 2023.
GOOGLE MOBILITY DATA
kaggle.com
Updated Oct 5, 2025
Cite
AiswaryaRamachandran (2025). GOOGLE MOBILITY DATA [Dataset]. https://www.kaggle.com/aiswaryaramachandran/google-mobility-data/code
Explore at:
Croissant
Dataset updated
Oct 5, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
AiswaryaRamachandran
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

As global communities respond to COVID-19, we've heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps could be helpful as they make critical decisions to combat COVID-19.

These Community Mobility Reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. (https://www.google.com/covid19/mobility/)

Content

The data contains aggregated and anonymised data per day for each country. For example, the data for India is in the files 2020_IN_Region_Mobility_Report.csv (2020 data) and 2021_IN_Region_Mobility_Report.csv (2021 data). The aggregated data is available not only at country level but also at state and district level, as given in sub_region_1 and sub_region_2.
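The file naming pattern and the sub-region columns can be illustrated with a short sketch. The sample rows are made up, the helper names are hypothetical, and two things are assumptions rather than facts from this description: that country-level rows leave both sub_region_1 and sub_region_2 empty, and that the other column names shown exist in the files.

```python
import csv
import io


def report_filename(year: int, country_code: str) -> str:
    """Build a file name following the pattern quoted in the description."""
    return f"{year}_{country_code}_Region_Mobility_Report.csv"


def country_level_rows(csv_text: str) -> list:
    """Keep rows whose sub_region_1 and sub_region_2 are both empty (assumed country totals)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if not row["sub_region_1"] and not row["sub_region_2"]
    ]


# Made-up sample rows; only the two sub_region column names come from the description.
sample = (
    "country_region_code,sub_region_1,sub_region_2,date,workplaces\n"
    "IN,,,2020-04-01,-60\n"
    "IN,Kerala,,2020-04-01,-55\n"
    "IN,Kerala,Ernakulam,2020-04-01,-58\n"
)

national = country_level_rows(sample)
```

Filtering on the sub-region columns first avoids mixing national, state and district rows in the same average.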

Acknowledgements

This data comes from a report published by Google. https://www.google.com/covid19/mobility/

Inspiration

Some Questions to answer

India is having its second wave, and one of the major causes is considered to be the election rallies held in different parts of the country. How does mobility impact COVID cases?

Comparing Mobility across different Countries
COVID19 - The New York Times
kaggle.com
zip
Updated May 18, 2020
+ more versions
Cite
Google BigQuery (2020). COVID19 - The New York Times [Dataset]. https://www.kaggle.com/bigquery/covid19-nyt
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
May 18, 2020
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
Description
Context

This is the US Coronavirus data repository from The New York Times. This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here. For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site.

Sample Queries

Query 1

Which US counties have the most confirmed cases per capita? This query determines which counties have the most cases per 100,000 residents. Note that this may differ from similar queries of other datasets because of differences in reporting lag, methodologies, or other dataset differences.

SELECT
  covid19.county,
  covid19.state_name,
  total_pop AS county_population,
  confirmed_cases,
  ROUND(confirmed_cases/total_pop *100000,2) AS confirmed_cases_per_100000,
  deaths,
  ROUND(deaths/total_pop *100000,2) AS deaths_per_100000
FROM `bigquery-public-data.covid19_nyt.us_counties` covid19
JOIN `bigquery-public-data.census_bureau_acs.county_2017_5yr` acs
  ON covid19.county_fips_code = acs.geo_id
WHERE date = DATE_SUB(CURRENT_DATE(),INTERVAL 1 day)
  AND covid19.county_fips_code != "00000"
ORDER BY confirmed_cases_per_100000 desc
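The per-100,000 arithmetic in Query 1 can be checked by hand with a small sketch. The county names and figures below are made up, and the function name is hypothetical; only the formula (count divided by population, times 100,000, rounded to two decimals) comes from the query.

```python
def per_100k(count: int, population: int) -> float:
    """Cases (or deaths) per 100,000 residents, rounded to two decimals."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(count / population * 100000, 2)


# Made-up counties: (name, confirmed cases, population).
counties = [
    ("County A", 50, 200_000),
    ("County B", 30, 20_000),
    ("County C", 500, 5_000_000),
]

# Rank by rate rather than by raw count, as the query's ORDER BY does.
ranked = sorted(counties, key=lambda c: per_100k(c[1], c[2]), reverse=True)
```

In this example the county with the fewest cases ranks first, which is why a per-capita ranking can look very different from a ranking by raw counts.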

Query 2

How do I calculate the number of new COVID-19 cases per day? This query determines the total number of new cases in each state for each day available in the dataset.

SELECT
  b.state_name,
  b.date,
  MAX(b.confirmed_cases - a.confirmed_cases) AS daily_confirmed_cases
FROM (
  SELECT
    state_name AS state,
    state_fips_code,
    confirmed_cases,
    DATE_ADD(date, INTERVAL 1 day) AS date_shift
  FROM `bigquery-public-data.covid19_nyt.us_states`
  WHERE confirmed_cases + deaths > 0
) a
JOIN `bigquery-public-data.covid19_nyt.us_states` b
  ON a.state_fips_code = b.state_fips_code
  AND a.date_shift = b.date
GROUP BY b.state_name, date
ORDER BY date desc
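Query 2 works by joining each day's cumulative total to the previous day's total (shifted forward by one day) and subtracting. The same idea for a single state can be sketched in plain Python. The totals are made up and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Dict


def daily_new_cases(cumulative: Dict[date, int]) -> Dict[date, int]:
    """Difference each day's cumulative total against the previous calendar day."""
    new_cases = {}
    for day, total in cumulative.items():
        previous = cumulative.get(day - timedelta(days=1))
        # Days with no record for the day before are skipped, as an inner join would.
        if previous is not None:
            new_cases[day] = total - previous
    return new_cases


# Made-up cumulative totals for one state.
totals = {
    date(2020, 3, 1): 2,
    date(2020, 3, 2): 5,
    date(2020, 3, 3): 9,
    date(2020, 3, 5): 20,  # 4 March is missing, so 5 March has no match
}

new = daily_new_cases(totals)
```

A gap in the reported dates silently drops the following day from the result, so it is worth checking for missing days before reading too much into the daily figures.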
Data from: Recognizing the importance of near-home contact with nature for...
data.niaid.nih.gov
datasetcatalog.nlm.nih.gov
+2more
zip
Updated Aug 29, 2023
Cite
Magdalena Lenda; Piotr Skórka; Małgorzata Jaźwa; Hsien-Yung Lin; Edward Nęcka; Piotr Tryjanowski; Dawid Moroń; Johannes M. H. Knops; Hugh P. Possingham (2023). Recognizing the importance of near-home contact with nature for mental well-being based on the COVID-19 lockdown experience [Dataset]. http://doi.org/10.5061/dryad.fn2z34v1h
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.fn2z34v1h
Dataset updated
Aug 29, 2023
Dataset provided by
The University of Queensland
Institute of Nature Conservation
University of Opole
Uniwersytet SWPS
Institute of Systematics and Evolution of Animals
Carleton University
University of Life Sciences in Poznań
Xi’an Jiaotong-Liverpool University
Authors
Magdalena Lenda; Piotr Skórka; Małgorzata Jaźwa; Hsien-Yung Lin; Edward Nęcka; Piotr Tryjanowski; Dawid Moroń; Johannes M. H. Knops; Hugh P. Possingham
License
https://spdx.org/licenses/CC0-1.0.html
Description
Several urban landscape planning solutions have been introduced around the world to find a balance between developing urban spaces, maintaining and restoring biodiversity, and enhancing quality of human life. Our global mini-review, combined with analysis of big data collected from Google Trends at global scale, reveals the importance of enjoying day-to-day contact with nature and engaging in such activities as nature observation and identification and gardening for the mental well-being of humans during the COVID-19 pandemic. Home-based activities, such as watching birds from one’s window, identifying species of plants and animals, backyard gardening, and collecting information about nature for citizen science projects, were popular during the first lockdown in spring 2020, when people could not easily venture out of their homes. In our mini-review, we found 37 articles from 28 countries with a total sample of 114,466 people. These papers suggest that home-based engagement with nature was an entertaining and pleasant distraction that helped preserve mental well-being during a challenging time. According to Google Trends, interest in such activities increased during lockdown compared to the previous five years. Millions of people worldwide are chronically or temporarily confined to their homes and neighborhoods because of illness, childcare chores, or elderly care responsibility, which makes it difficult for them to travel far to visit such places as national parks, created through land sparing, where people go to enjoy nature and relieve stress. This article posits that for such people, living in an urban landscape designed to facilitate effortless contact with small natural areas is a more effective way to receive the mental health benefits of contact with nature than visiting a sprawling nature park on rare occasions.

Methods

1. Identifying the most common types of activities related to nature observation, gardening, and taxa identification during the first lockdown based on scientific articles and non-scientific press

For scientific articles, in March 2023 we searched Scopus and Google Scholar. For countries where Google is restricted, such as China, similar results will be available from other scientific browsers, with the highest number of results from our database being available from Scopus. We used the Google Search browser to search for globally published non-scientific press articles. Some selection criteria were applied during article review. Specifically, we excluded articles that were not about the first lockdown; did not study activities at a local scale (from balcony, window, backyard) but rather in areas far away from home (e.g., visiting forests); studied the mental health effect of observing indoor potted plants and pet animals; or transiently mentioned the topic or keyword without going into any scientific detail. We included all papers that met our criteria, that is, studies that analyzed our chosen topic with experiments or planned observations. We included all research papers, but not letters that made claims without any data. Google Scholar automatically screened the title, abstract, keywords, and the whole text of each article for the keywords we entered. All articles that met our criteria were read and double-checked for keywords and content related to the keywords (e.g., synonyms or if they presented content about the relevant topic without using the specific keywords). We identified, from both types of articles, the major nature-based activities that people engaged in during the first lockdown in the spring of 2020.
Keywords used in this study were grouped into six main topics: (1) COVID-19 pandemic; (2) nature-oriented activity focused on nature observation, identification of different taxa, or gardening; (3) mental well-being; (4) activities performed from a balcony, window, or in gardens; (5) entertainment; and (6) citizen science (see Table 1 for all keywords).

2. Increase in global trends in interest in nature observation, gardening, and taxa identification during the first lockdown

We used the categorical cluster method, which was combined with big data from Google Trends (downloaded on 1 September 2020) and anomaly detection to identify trend anomalies globally in people’s interests. We used this combination of methods to examine whether interest in nature-based activities that were mentioned in scientific and nonscientific press articles increased during the first lockdown. Keywords linked with the main types of nature-oriented activities, as identified from press and scientific articles, and used according to the categorical clustering method were classified into the following six main categories: (1) global interest in bird-watching and bird identification combined with citizen science; (2) global interest in plant identification and gardening combined with citizen science; (3) global interest in butterfly watching; (4) local interest in early-spring (lockdown time), summer, or autumn flowering species that usually can be found in Central European (country: Poland) backyards; (5) global interest in traveling and social activities; and (6) global interest in nature areas and activities typically enjoyed during holidays and thus requiring traveling to land-spared nature reserves.
The six categories were divided into 15 subcategories so that we could attach relevant words or phrases belonging to the same cluster and typically related to the activity (according to Google Trends and Google browser’s automatic suggestions; e.g., people who searched for “bird-watching” typically also searched for “binoculars,” “bird feeder,” “bird nest,” and “birdhouse”). The subcategories and keywords used for data collection about trends in society’s interest in the studied topic from Google Trends are as follows.

Bird-watching: “binoculars,” “bird feeder,” “bird nest,” “birdhouse,” “bird-watching”
Bird identification: “bird app,” “bird identification,” “bird identification app,” “bird identifier,” “bird song app”
Bird-watching combined with citizen science: “bird guide,” “bird identification,” “eBird,” “feeding birds,” “iNaturalist”
Citizen science and bird-watching apps: “BirdNET,” “BirdSong ID,” “eBird,” “iNaturalist,” “Merlin Bird ID”
Gardening: “gardening,” “planting,” “seedling,” “seeds,” “soil”
Shopping for gardening: “garden shop,” “plant buy,” “plant ebay,” “plant sell,” “plant shop”
Plant identification apps: “FlowerChecker,” “LeafSnap,” “NatureGate,” “Plantifier,” “PlantSnap”
Citizen science and plant identification: “iNaturalist,” “plant app,” “plant check,” “plant identification app,” “plant identifier”
Flowers that were flowering in gardens during lockdown in Poland: “fiołek” (viola), “koniczyna” (shamrock), “mlecz” (dandelion), “pierwiosnek” (primrose), “stokrotka” (daisy). They are typical early-spring flowers growing in the gardens in Central Europe. We had to be more specific in this search because there are no plant species blooming across the world at the same time. These plant species have well-known biology; thus, we could easily interpret these results.
Flowers that were not flowering during lockdown in Poland: “chaber” (cornflower), “mak” (poppy), “nawłoć” (goldenrod), “róża” (rose), “rumianek” (chamomile). They are typical mid-summer flowering plants often planted in gardens.
Interest in traveling long distances and in social activities that involve many people: “airport,” “bus,” “café,” “driving,” “pub”
Single or mass commuting, and traveling: “bike,” “boat,” “car,” “flight,” “train”
Interest in distant places and activities for visiting natural areas: “forest,” “nature park,” “safari,” “trekking,” “trip”
Places and activities for holidays (typically located far away): “coral reef,” “rainforest,” “safari,” “savanna,” “snorkeling”
Butterfly watching: “butterfly watching,” “butterfly identification,” “butterfly app,” “butterfly net,” “butterfly guide”

In Google Trends, we set the following filters: global search, dates: July 2016–July 2020; language: English.
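The description mentions anomaly detection on Google Trends series but does not say which detector was used. As a rough illustration only, the sketch below flags weeks whose interest value deviates strongly from a pre-lockdown baseline using a simple z-score rule. The function name, the threshold, and the sample values are all assumptions made for this example and are not taken from the study.

```python
from statistics import mean, stdev


def flag_anomalies(series, baseline_len, threshold=3.0):
    """Return indices after the baseline whose z-score exceeds `threshold`.

    The z-score is computed against the mean and sample standard deviation
    of the first `baseline_len` values (an assumed "normal" period).
    """
    baseline = series[:baseline_len]
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return []  # a flat baseline gives no scale to measure deviations
    return [
        i
        for i, value in enumerate(series)
        if i >= baseline_len and abs(value - mu) / sigma > threshold
    ]


# Made-up weekly interest values on Google Trends' 0-100 scale.
weekly_interest = [40, 42, 38, 41, 39, 43, 40, 41, 75, 90, 44]
anomalous_weeks = flag_anomalies(weekly_interest, baseline_len=8)
print(anomalous_weeks)
```

A fixed baseline keeps the example short; a real analysis of a multi-year series would likely need to account for seasonality before flagging anything.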
Inflation Drives People to Google Negative Concepts (Forecast)
kappasignal.com
Updated Jun 11, 2023
KappaSignal (2023). Inflation Drives People to Google Negative Concepts (Forecast) [Dataset]. https://www.kappasignal.com/2023/06/inflation-drives-people-to-google.html
Dataset updated
Jun 11, 2023
Dataset authored and provided by
KappaSignal
License
https://www.kappasignal.com/p/legal-disclaimer.html
Description
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

Inflation Drives People to Google Negative Concepts

Financial data:

Historical daily stock prices (open, high, low, close, volume)

Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

Machine learning features:

Feature engineering based on financial data and technical indicators

Sentiment analysis data from social media and news articles

Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

Potential Applications:

Stock price prediction

Portfolio optimization

Algorithmic trading

Market sentiment analysis

Risk management

Use Cases:

Researchers investigating the effectiveness of machine learning in stock market prediction

Analysts developing quantitative trading Buy/Sell strategies

Individuals interested in building their own stock market prediction models

Students learning about machine learning and financial applications

Additional Notes:

The dataset may include different levels of granularity (e.g., daily, hourly)

Data cleaning and preprocessing are essential before model training

Regular updates are recommended to maintain the accuracy and relevance of the data
Data from: Fine-Scale Spatiotemporal Air Pollution Analysis Using Mobile...
tandf.figshare.com
datasetcatalog.nlm.nih.gov
pdf
Updated May 30, 2023
Yawen Guan; Margaret C. Johnson; Matthias Katzfuss; Elizabeth Mannshardt; Kyle P. Messier; Brian J. Reich; Joon J. Song (2023). Fine-Scale Spatiotemporal Air Pollution Analysis Using Mobile Monitors on Google Street View Vehicles [Dataset]. http://doi.org/10.6084/m9.figshare.10113239.v3
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.10113239.v3
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Yawen Guan; Margaret C. Johnson; Matthias Katzfuss; Elizabeth Mannshardt; Kyle P. Messier; Brian J. Reich; Joon J. Song
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
People are increasingly concerned with understanding their personal environment, including possible exposure to harmful air pollutants. To make informed decisions on their day-to-day activities, they are interested in real-time information on a localized scale. Publicly available, fine-scale, high-quality air pollution measurements acquired using mobile monitors represent a paradigm shift in measurement technologies. A methodological framework utilizing these increasingly fine-scale measurements to provide real-time air pollution maps and short-term air quality forecasts on a fine-resolution spatial scale could prove to be instrumental in increasing public awareness and understanding. The Google Street View study provides a unique source of data with spatial and temporal complexities, with the potential to provide information about commuter exposure and hot spots within city streets with high traffic. We develop a computationally efficient spatiotemporal model for these data and use the model to make short-term forecasts and high-resolution maps of current air pollution levels. We also show via an experiment that mobile networks can provide more nuanced information than an equally sized fixed-location network. This modeling framework has important real-world implications in understanding citizens’ personal environments, as data production and real-time availability continue to be driven by the ongoing development and improvement of mobile measurement technologies. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Data from: Novel Corona Virus 2019 Dataset
kaggle.com
zip
Updated Jan 30, 2020
SRK (2020). Novel Corona Virus 2019 Dataset [Dataset]. https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
Available download formats: zip (3155 bytes)
Dataset updated
Jan 30, 2020
Authors
SRK
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

Johns Hopkins University has made an excellent dashboard using the affected cases data. This data is extracted from the same link and made available in csv format.

Content

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus.

The data is available from 22 Jan 2020.

Acknowledgements

Johns Hopkins University has made the data available in Google Sheets format here. Sincere thanks to them.

Thanks to WHO, CDC, NHC and DXY for making the data available in the first place.

Picture courtesy: Johns Hopkins University dashboard

Inspiration

Some insights could be

Changes in number of affected cases over time

Change in cases over time at country level

Latest number of affected cases
Bellabeat Case Study Capstone Steps vs Sleep
kaggle.com
Updated Jun 13, 2022
Brennan Grout (2022). Bellabeat Case Study Capstone Steps vs Sleep [Dataset]. https://www.kaggle.com/datasets/brennangrout/bellabeat-case-study-capstone-steps-vs-sleep
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 13, 2022
Dataset provided by
Kaggle
Authors
Brennan Grout
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The data set and tools can be found at the GitHub link here: https://github.com/groutbrennan/cleaning-data-with-r/tree/master/capstone_data/working_data

This dataset contains:
- Data
- R Markdown
- R analysis and cleaning scripts
- Final ggplot scatterplot visualization image

This dataset was created as part of the Google data analysis course presented by Coursera, comparing how people use their smart devices to track their daily health.

After reviewing the initial data, my hypothesis was that people who walk more sleep longer.

However, after cleaning, transforming, and analyzing the data, I found that people who took more steps during the day actually slept fewer total minutes than people who took fewer steps. In other words, there was a correlation between more steps taken during the day and fewer minutes of sleep at night. However, I don't have proof that this relationship is causal. Further research is needed to confirm whether that is the case.
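The steps-versus-sleep relationship described above is the kind of thing a Pearson correlation coefficient summarizes. The author's analysis was done in R and is not reproduced here; the sketch below is only a minimal Python illustration, and the `steps` and `minutes_asleep` values are invented for the example rather than taken from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values for illustration only (not from the Bellabeat data).
steps = [3000, 6000, 9000, 12000]
minutes_asleep = [480, 450, 430, 400]
r = pearson_r(steps, minutes_asleep)
print(round(r, 3))
```

A negative coefficient only says the two quantities move in opposite directions in the sample; as the author notes, it does not show that one causes the other.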
Atlanta Crime Data 2009 - Present
kaggle.com
Updated Dec 11, 2020
Peng Chen charles (2020). Atlanta Crime Data 2009 - Present [Dataset]. https://www.kaggle.com/pengchencharles/atlanta-crime-data2020/tasks
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 11, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Peng Chen charles
Area covered
Atlanta
Description
Context

A majority of crime happened in Downtown and Midtown in 2020.

Content

This dataset reflects reported incidents of crime that occurred in the City of Atlanta from 2009 to present. Data is extracted from Atlanta Police Department's official website. This data includes unverified reports supplied to the Police Department. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, Atlanta Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time.

Update Frequency: Daily

Fork this kernel to get started.

Acknowledgements

https://www.atlantapd.org/i-want-to/crime-data-downloads

Dataset Source: City of Atlanta

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://www.atlantapd.org/i-want-to/crime-data-downloads - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by https://wallpapermemory.com/199170

Inspiration

What categories of crime exhibited the greatest year-over-year increase between 2015 and 2016?

Which month generally has the greatest number of motor vehicle thefts?

How does temperature affect the incident rate of violent crime (assault or battery)?

Facebook: distribution of global audiences 2024, by age and gender

statista.com
de.statista.com
Stacy Jo Dixon, Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second-largest audience base was men aged 18 to 24 years.

Facebook connects the world

Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known by the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.

Facebook users

The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also all have more than 100 million Facebook users each.

Covid19 Dataset (Worldwide cases 2019-20)
kaggle.com
zip
Updated Dec 31, 2020
Vivekkumar Gediya (2020). Covid19 Dataset (Worldwide cases 2019-20) [Dataset]. https://www.kaggle.com/vivekgediya/covid19-case-worldwide-cases-till-30th-dec20
Available download formats: zip (327132 bytes)
Dataset updated
Dec 31, 2020
Authors
Vivekkumar Gediya
Description
Context

From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

Johns Hopkins University has made an excellent dashboard using the affected cases data. The data is extracted from the associated Google Sheets and made available here.

Edited

The data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and to get insights from the broader DS community.

Content

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus. Please note that this is time series data, so the number of cases on any given day is the cumulative number.

The data is available from 22 Jan, 2020 to 30 Dec, 2020.
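Because the counts are cumulative, daily new cases have to be derived by differencing consecutive days. The sketch below shows one way to do that; the function name and the sample numbers are illustrative assumptions, not values from the dataset.

```python
def daily_from_cumulative(cumulative):
    """Convert a cumulative series into per-day increments.

    The first day's increment is taken to be the first cumulative value,
    since no earlier day is available to subtract.
    """
    if not cumulative:
        return []
    daily = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        daily.append(current - previous)
    return daily


# Illustrative cumulative counts for five consecutive days (made up).
cumulative_cases = [5, 8, 8, 15, 21]
daily_cases = daily_from_cumulative(cumulative_cases)
print(daily_cases)
```

A negative increment in real data usually signals a retroactive correction in the source, so it is worth checking for before modeling.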

Sources

JHU confirmed covid datasets.
Top 200 Youtubers Data (cleaned)
kaggle.com
Updated Jul 8, 2022
Syed Jafer (2022). Top 200 Youtubers Data (cleaned) [Dataset]. https://www.kaggle.com/syedjaferk/top-200-youtubers-cleaned/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 8, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Syed Jafer
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
YouTube is an American online video sharing and social media platform headquartered in San Bruno, California. It was launched on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim. It is owned by Google, and is the second most visited website, after Google Search. YouTube has more than 2.5 billion monthly users who collectively watch more than one billion hours of videos each day. As of May 2019, videos were being uploaded at a rate of more than 500 hours of content per minute.

YouTube is widely used to influence and educate its users and followers on specific issues (for me, it also serves as a free university), which can affect the social order in some ways.
Most valuable media & entertainment brands worldwide 2024
statista.com
es.statista.com
Julia Faria, Most valuable media & entertainment brands worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Dataset provided by
Statista (http://statista.com/)
Authors
Julia Faria
Description
In 2024, Google ranked as the most valuable media and entertainment brand worldwide, with a brand value of 683 billion U.S. dollars. Facebook ranked second, valued at around 167 billion dollars. Part of the Tencent Group, WeChat and v.qq.com (Tencent Video) had a brand value of 56 billion and 17.5 billion dollars, respectively.

DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasets/datasf/san-francisco

Data from: San Francisco Open Data

San Francisco Open Data (BigQuery Dataset)

Available download formats: zip (0 bytes)

Dataset updated

Mar 20, 2019

Dataset authored and provided by

DataSF

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

San Francisco

Description

Context

DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

https://datasf.org/about/

Content

This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.
This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.
This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).
This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.
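Since the fire department table has one record per responding unit rather than per call, counting rows overstates the number of calls. The sketch below groups records by call number to separate the two. The field names `call_number`, `unit_id`, and `call_type`, and the sample records, are hypothetical stand-ins chosen for this example; the actual column names in the table may differ.

```python
from collections import defaultdict


def units_per_call(responses):
    """Map each call number to the number of distinct units that responded."""
    units = defaultdict(set)
    for record in responses:
        units[record["call_number"]].add(record["unit_id"])
    return {call: len(unit_ids) for call, unit_ids in units.items()}


# Hypothetical response records: two units on one call, one unit on another.
responses = [
    {"call_number": "001", "unit_id": "E01", "call_type": "Structure Fire"},
    {"call_number": "001", "unit_id": "T02", "call_type": "Structure Fire"},
    {"call_number": "002", "unit_id": "M03", "call_type": "Medical Incident"},
]
counts = units_per_call(responses)
print(len(responses), "responses across", len(counts), "calls")
```

Questions such as "top incident types" are usually better answered per call than per response, otherwise calls that draw many units are over-weighted.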

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

http://datasf.org/

Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @meric from Unsplash.

Inspiration

Which neighborhoods have the highest proportion of offensive graffiti?

Which complaint is most likely to be made using Twitter and in which neighborhood?

What are the most complained about Muni stops in San Francisco?

What are the top 10 incident types that the San Francisco Fire Department responds to?

How many medical incidents and structure fires are there in each neighborhood?

What’s the average response time for each type of dispatched vehicle?

Which category of police incidents have historically been the most common in San Francisco?

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

What is the average tree diameter?

What is the highest number of a particular species of tree planted in a single year?

Which San Francisco locations feature the largest number of trees?
