36 datasets found

Data from: San Francisco Open Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasets/datasf/san-francisco
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
DataSF
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
San Francisco
Description
Context

DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

https://datasf.org/about/

Content

This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.

This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.

This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 up to two weeks before the current date. The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).

This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

http://datasf.org/

https://cloud.google.com/bigquery/public-data/sfo-311

https://cloud.google.com/bigquery/public-data/sffd-service-calls

https://cloud.google.com/bigquery/public-data/sfpd-reports

https://cloud.google.com/bigquery/public-data/sfo-trees

Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @meric from Unsplash.

Inspiration

Which neighborhoods have the highest proportion of offensive graffiti?

Which complaint is most likely to be made using Twitter and in which neighborhood?

What are the most complained about Muni stops in San Francisco?

What are the top 10 incident types that the San Francisco Fire Department responds to?

How many medical incidents and structure fires are there in each neighborhood?

What’s the average response time for each type of dispatched vehicle?

Which category of police incidents has historically been the most common in San Francisco?

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

What is the average tree diameter?

What is the highest number of a particular species of tree planted in a single year?

Which San Francisco locations feature the largest number of trees?
NOAA GSOD
kaggle.com
zip
Updated Aug 30, 2019
Cite
NOAA (2019). NOAA GSOD [Dataset]. https://www.kaggle.com/datasets/noaa/gsod
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Aug 30, 2019
Dataset provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Authors
NOAA
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Overview

Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries.

Content

Over 9000 stations' data are typically available.

The daily elements included in the dataset (as available from each station) are:

Mean temperature (.1 Fahrenheit)
Mean dew point (.1 Fahrenheit)
Mean sea level pressure (.1 mb)
Mean station pressure (.1 mb)
Mean visibility (.1 miles)
Mean wind speed (.1 knots)
Maximum sustained wind speed (.1 knots)
Maximum wind gust (.1 knots)
Maximum temperature (.1 Fahrenheit)
Minimum temperature (.1 Fahrenheit)
Precipitation amount (.01 inches)
Snow depth (.1 inches)

Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that the methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.noaa_gsod.[TABLENAME]. Fork this kernel to get started and learn how to safely manage analyzing large BigQuery datasets.
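
As a minimal sketch of the querying workflow described above, the snippet below builds a query against one yearly GSOD table and runs it with the BigQuery Python client under a cap on bytes billed. The dataset path `bigquery-public-data.noaa_gsod`, the table name pattern `gsod<year>`, and the column names `mo` and `temp` are assumptions not stated on this page; check the table schema first, and filter out any missing-value sentinels the schema documents before averaging.

```python
def build_monthly_mean_temp_query(year: int, limit: int = 12) -> str:
    """Return SQL for the average of the daily mean temperature per month.

    Assumed (not stated on this page): yearly tables named gsod<year> in
    `bigquery-public-data.noaa_gsod`, with columns `mo` and `temp`.
    """
    table = f"bigquery-public-data.noaa_gsod.gsod{int(year)}"
    return (
        "SELECT mo AS month, AVG(temp) AS mean_temp_f\n"
        f"FROM `{table}`\n"
        "GROUP BY month\n"
        "ORDER BY month\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str, max_gb: float = 1.0):
    """Run the query, refusing to bill more than `max_gb` gigabytes."""
    # Imported lazily so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(maximum_bytes_billed=int(max_gb * 10**9))
    return client.query(sql, job_config=config).to_dataframe()


if __name__ == "__main__":
    # Only prints the SQL; call run_query(...) where credentials are available.
    print(build_monthly_mean_temp_query(2018))
```

Capping `maximum_bytes_billed` is one way to follow the advice about safely analyzing large tables: a query that would scan more than the cap fails instead of running.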

Acknowledgements

This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and present, collected from over 9000 stations. Dataset Source: NOAA

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Photo by Allan Nygren on Unsplash
Chicago Crime
kaggle.com
zip
Updated Apr 17, 2018
Cite
City of Chicago (2018). Chicago Crime [Dataset]. https://www.kaggle.com/chicago/chicago-crime
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Apr 17, 2018
Dataset authored and provided by
City of Chicago
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Chicago
Description
Context

Approximately 10 people are shot on an average day in Chicago.

http://www.chicagotribune.com/news/data/ct-shooting-victims-map-charts-htmlstory.html http://www.chicagotribune.com/news/local/breaking/ct-chicago-homicides-data-tracker-htmlstory.html http://www.chicagotribune.com/news/local/breaking/ct-homicide-victims-2017-htmlstory.html

Content

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. This data includes unverified reports supplied to the Police Department. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time.

Update Frequency: Daily

Fork this kernel to get started.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:chicago_crime

https://cloud.google.com/bigquery/public-data/chicago-crime-data

Dataset Source: City of Chicago

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofchicago.org - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by Ferdinand Stohr from Unsplash.

Inspiration

What categories of crime exhibited the greatest year-over-year increase between 2015 and 2016?

Which month generally has the greatest number of motor vehicle thefts?

How does temperature affect the incident rate of violent crime (assault or battery)?

https://cloud.google.com/bigquery/images/chicago-scatter.png
Google Stock History
kaggle.com
Updated Oct 25, 2023
Cite
PavanKalyan (2023). Google Stock History [Dataset]. https://www.kaggle.com/pavan9065/google-stock-history
Dataset updated
Oct 25, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
PavanKalyan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Google is one of the greatest gifts to mankind. Any information that you need today is available on Google. Google is a household name, and practically everyone is aware of what Google is. It helps you get resources for your school projects, helps you shop online, and much more. Google has made getting an education a lot easier for people across the globe. No matter where you are, you can access Google provided you have internet. Every piece of information is available on Google, and it's all one click away. Google has a parent company, Alphabet Inc., that is publicly traded, and here we have stock data for Alphabet Inc.

Content

This data set has 7 columns with all the necessary values, such as the opening price of the stock, its closing price, its daily high, and more. It has date-wise data for the stock from 2004 to October 2023.

Wordle Answer Search Trends Dataset (2021–2025)

kaggle.com

Updated Jun 26, 2025

Cite

Ankush Kamboj (2025). Wordle Answer Search Trends Dataset (2021–2025) [Dataset]. https://www.kaggle.com/datasets/kambojankush/wordle-answer-search-trends-dataset-20212025/discussion

Dataset updated

Jun 26, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Ankush Kamboj

License

https://www.gnu.org/licenses/gpl-3.0.html

Description

This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.

It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.

🔍 Hypothesis

Each Wordle answer causes a spike in search volume on the day it appears — more so if the word is rare.

This dataset supports exploration of:

Wordle Answers
Trends for wordle answers
Correlation between wordle answer rarity and search interest

Columns

Column	Description
`date`	Date of the Wordle puzzle
`word`	Correct 5-letter Wordle answer
`game`	Wordle game number
`wordfreq_commonality`	Normalized frequency score using Python’s `wordfreq` library
`subtlex_commonality`	Normalized frequency score using SUBTLEX-US dataset
`trend_day_global`	Google search interest on the day (global, all categories)
`trend_avg_200_global`	200-day average search interest (global, all categories)
`trend_day_language`	Search interest on Wordle day (Language Resources category)
`trend_avg_200_language`	200-day average search interest (Language Resources category)

Note: All trend values are relative (0–100 scale, per Google Trends).
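
One way to explore the hypothesis with the columns above is to compare each day's search interest with its 200-day average and then correlate that ratio with a commonality score. The sketch below does this with the standard library only. The rows and their values are made up for illustration and are not taken from the dataset; only the column names come from the table above.

```python
import math

# Hypothetical rows for illustration only (not real dataset values).
rows = [
    {"word": "word_a", "wordfreq_commonality": 0.9,
     "trend_day_global": 20, "trend_avg_200_global": 20},
    {"word": "word_b", "wordfreq_commonality": 0.5,
     "trend_day_global": 50, "trend_avg_200_global": 25},
    {"word": "word_c", "wordfreq_commonality": 0.1,
     "trend_day_global": 100, "trend_avg_200_global": 25},
    {"word": "word_d", "wordfreq_commonality": 0.3,
     "trend_day_global": 10, "trend_avg_200_global": 0},
]


def spike_ratio(row):
    """Search interest on the Wordle day relative to the 200-day average."""
    baseline = row["trend_avg_200_global"]
    if baseline <= 0:
        return None  # no usable baseline, so the ratio is undefined
    return row["trend_day_global"] / baseline


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Keep only rows where the ratio is defined.
usable = [(r["wordfreq_commonality"], spike_ratio(r))
          for r in rows if spike_ratio(r) is not None]
commonality, ratios = zip(*usable)
r_value = pearson(list(commonality), list(ratios))
print(f"correlation between commonality and spike ratio: {r_value:.3f}")
```

A negative correlation on the real data would be consistent with rarer words producing larger spikes; because trend values are relative, the ratio is only comparable within the same trend series.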

🧮 Methodology

Wordle answers were scraped from wordfinder.yourdictionary.com
Commonality scores were computed using:
- wordfreq Python library
- SUBTLEX-US dataset (subtitle frequency, approximating spoken English)
Trend data was fetched using Google Trends API via pytrends

📊 Analysis

An analysis based on this data is available in the blog post.

FiveThirtyEight Daily Show Guests Dataset

kaggle.com

zip

Updated Jan 13, 2019

Cite

FiveThirtyEight (2019). FiveThirtyEight Daily Show Guests Dataset [Dataset]. https://www.kaggle.com/fivethirtyeight/fivethirtyeight-daily-show-guests-dataset

Explore at:

Available download formats: zip (37571 bytes)

Dataset updated

Jan 13, 2019

Dataset authored and provided by

FiveThirtyEight (https://abcnews.go.com/538)

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Content

Daily Show Guests

This folder contains data behind the story Every Guest Jon Stewart Ever Had On ‘The Daily Show’.

Header	Definition
`YEAR`	The year the episode aired
`GoogleKnowlege_Occupation`	Their occupation or office, according to Google's Knowledge Graph or, if they're not in there, how Stewart introduced them on the program.
`Show`	Air date of episode. Not unique, as some shows had more than one guest
`Group`	A larger group designation for the occupation. For instance, U.S. senators, U.S. presidents, and former presidents are all under "politicians".
`Raw_Guest_List`	The person or list of people who appeared on the show, according to Wikipedia. The GoogleKnowlege_Occupation only refers to one of them in a given row.

Source: Google Knowledge Graph, The Daily Show clip library, Wikipedia.

Context

This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using GitHub's API and Kaggle's API.

This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.

Cover photo by Oscar Nord on Unsplash
Unsplash Images are distributed under a unique Unsplash License.

PerCapita CO2 Footprint InDioceses FULL
hub.arcgis.com
catholic-geo-hub-cgisc.hub.arcgis.com
Updated Sep 23, 2019
Cite
burhansm2 (2019). PerCapita CO2 Footprint InDioceses FULL [Dataset]. https://hub.arcgis.com/content/95787df270264e6ea1c99ffa6ff844ff
Explore at:
Dataset updated
Sep 23, 2019
Dataset authored and provided by
burhansm2
License
Attribution-NoDerivs 4.0 (CC BY-ND 4.0) https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Area covered
Description
PerCapita_CO2_Footprint_InDioceses_FULL

Burhans, Molly A., Cheney, David M., Gerlt, R. “PerCapita_CO2_Footprint_InDioceses_FULL”. Scale not given. Version 1.0. MO and CT, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2019.

Methodology

This is the first global carbon footprint of the Catholic population. We will continue to improve and develop these data with our research partners over the coming years. While it is helpful, it should also be viewed and used as a "beta" prototype that we and our research partners will build from and improve. The years of carbon data are (2010) and (2015 - SHOWN). The year of Catholic data is 2018. The year of population data is 2016. Care should be taken during future developments to harmonize the years used for Catholic, population, and CO2 data.

1. Zonal Statistics: Esri Population Data and Dioceses --> Population per diocese, non-Vatican-based numbers
2. Zonal Statistics: FFDAS and Dioceses and Population dataset --> Mean CO2 per Diocese
3. Field Calculation: Population per Diocese and Mean CO2 per Diocese --> CO2 per Capita
4. Field Calculation: CO2 per Capita * Catholic Population --> Catholic Carbon Footprint

Assumption: PerCapita CO2

Deriving per-capita CO2 from mean CO2 in a geography assumes that people's footprint accounts for their personal lifestyle and involvement in local business and industries that contribute CO2.
Catholic CO2

Assumes that Catholics and non-Catholics have similar CO2 footprints from their lifestyles.

Derived from:

A multiyear, global gridded fossil fuel CO2 emission data product: Evaluation and analysis of results
http://ffdas.rc.nau.edu/About.html

Rayner et al., JGR, 2010 - This is the first FFDAS paper describing the version 1.0 methods and results published in the Journal of Geophysical Research.
Asefi et al., 2014 - This is the paper describing the methods and results of the FFDAS version 2.0 published in the Journal of Geophysical Research.
Readme version 2.2 - A simple readme file to assist in using the 10 km x 10 km, hourly gridded Vulcan version 2.2 results.
Liu et al., 2017 - A paper exploring the carbon cycle response to the 2015-2016 El Nino through the use of carbon cycle data assimilation with FFDAS as the boundary condition for FFCO2.

"S. Asefi‐Najafabady, P. J. Rayner, K. R. Gurney, A. McRobert, Y. Song, K. Coltin, J. Huang, C. Elvidge, K. Baugh
First published: 10 September 2014 https://doi.org/10.1002/2013JD021296 Cited by: 30
Link to FFDAS data retrieval and visualization: http://hpcg.purdue.edu/FFDAS/index.php

Abstract

High‐resolution, global quantification of fossil fuel CO2 emissions is emerging as a critical need in carbon cycle science and climate policy. We build upon a previously developed fossil fuel data assimilation system (FFDAS) for estimating global high‐resolution fossil fuel CO2 emissions. We have improved the underlying observationally based data sources, expanded the approach through treatment of separate emitting sectors including a new pointwise database of global power plants, and extended the results to cover a 1997 to 2010 time series at a spatial resolution of 0.1°. Long‐term trend analysis of the resulting global emissions shows subnational spatial structure in large active economies such as the United States, China, and India. These three countries, in particular, show different long‐term trends and exploration of the trends in nighttime lights, and population reveal a decoupling of population and emissions at the subnational level. Analysis of shorter‐term variations reveals the impact of the 2008–2009 global financial crisis with widespread negative emission anomalies across the U.S. and Europe. We have used a center of mass (CM) calculation as a compact metric to express the time evolution of spatial patterns in fossil fuel CO2 emissions. The global emission CM has moved toward the east and somewhat south between 1997 and 2010, driven by the increase in emissions in China and South Asia over this time period. Analysis at the level of individual countries reveals per capita CO2 emission migration in both Russia and India. The per capita emission CM holds potential as a way to succinctly analyze subnational shifts in carbon intensity over time. Uncertainties are generally lower than the previous version of FFDAS due mainly to an improved nightlight data set."

Global Diocesan Boundaries:

Burhans, M., Bell, J., Burhans, D., Carmichael, R., Cheney, D., Deaton, M., Emge, T. Gerlt, B., Grayson, J., Herries, J., Keegan, H., Skinner, A., Smith, M., Sousa, C., Trubetskoy, S. “Diocesean Boundaries of the Catholic Church” [Feature Layer]. Scale not given. Version 1.2. Redlands, CA, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2016.

Using: ArcGIS. 10.4. Version 10.0. Redlands, CA: Environmental Systems Research Institute, Inc., 2016.

Boundary Provenance

Statistics and Leadership Data

Cheney, D.M. “Catholic Hierarchy of the World” [Database]. Date Updated: August 2019. Catholic Hierarchy. Using: Paradox. Retrieved from Original Source.

Catholic Hierarchy

Annuario Pontificio per l’Anno .. Città del Vaticano: Tipografia Poliglotta Vaticana, Multiple Years.

The data for these maps was extracted from the gold standard of Church data, the Annuario Pontificio, published yearly by the Vatican. The collection and data development of the Vatican Statistics Office are unknown. GoodLands is not responsible for errors within this data. We encourage people to document and report errant information to us at data@good-lands.org or directly to the Vatican.

Additional information about regular changes in bishops and sees comes from a variety of public diocesan and news announcements.

GoodLands’ polygon data layers, version 2.0 for global ecclesiastical boundaries of the Roman Catholic Church:

Although care has been taken to ensure the accuracy, completeness and reliability of the information provided, due to this being the first developed dataset of global ecclesiastical boundaries curated from many sources it may have a higher margin of error than established geopolitical administrative boundary maps. Boundaries need to be verified with appropriate Ecclesiastical Leadership. The current information is subject to change without notice. No parties involved with the creation of this data are liable for indirect, special or incidental damage resulting from, arising out of or in connection with the use of the information. We referenced 1960 sources to build our global datasets of ecclesiastical jurisdictions. Often, they were isolated images of dioceses, historical documents and information about parishes that were cross checked. These sources can be viewed here: https://docs.google.com/spreadsheets/d/11ANlH1S_aYJOyz4TtG0HHgz0OLxnOvXLHMt4FVOS85Q/edit#gid=0

To learn more or contact us please visit: https://good-lands.org/

Esri Gridded Population Data 2016

Description

This layer is a global estimate of human population for 2016.
Esri created this estimate by modeling a footprint of where people live as a dasymetric settlement likelihood surface, and then assigned 2016 population estimates stored on polygons of the finest level of geography available onto the settlement surface. Where people live means where their homes are, as in where people sleep most of the time, and this is opposed to where they work. Another way to think of this estimate is a night-time estimate, as opposed to a day-time estimate.

Knowledge of population distribution helps us understand how humans affect the natural world and how natural events such as storms and earthquakes, and other phenomena affect humans. This layer represents the footprint of where people live, and how many people live there.

Dataset Summary

Each cell in this layer has an integer value with the estimated number of people likely to live in the geographic region represented by that cell. Esri additionally produced several additional layers:

World Population Estimate Confidence 2016: the confidence level (1-5) per cell for the probability of people being located and estimated correctly.
World Population Density Estimate 2016: this layer is represented as population density in units of persons per square kilometer.
World Settlement Score 2016: the dasymetric likelihood surface used to create this layer by apportioning population from census polygons to the settlement score raster.

To use this layer in analysis, there are several properties or geoprocessing environment settings that should be used:

Coordinate system: WGS_1984. This service and its underlying data are WGS_1984. We do this because projecting population count data actually will change the populations due to resampling and either collapsing or splitting cells to fit into another coordinate system.
Cell Size: 0.0013474728 degrees (approximately 150-meters) at the equator.
No Data: -1
Bit Depth: 32-bit signed

This layer has query, identify, pixel, and export image functions enabled, and is restricted to a maximum analysis size of 30,000 x 30,000 pixels - an area about the size of Africa.

Frye, C. et al., (2018). Using Classified and Unclassified Land Cover Data to Estimate the Footprint of Human Settlement. Data Science Journal. 17, p.20. DOI: http://doi.org/10.5334/dsj-2018-020.

What can you do with this layer?

This layer is unsuitable for mapping or cartographic use, and thus it does not include a convenient legend. Instead, this layer is useful for analysis, particularly for estimating counts of people living within watersheds, coastal areas, and other areas that do not have standard boundaries. Esri recommends using the Zonal Statistics tool or the Zonal Statistics to Table tool where you provide input zones as either polygons, or raster data, and the tool will summarize the count of population within those zones.

https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/data-management/2016-world-population-estimate-services-are-now-available/
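
The two field calculations in the methodology (steps 3 and 4) can be sketched as follows. The description names the inputs to step 3 but not the operator, so the division below is an assumption; step 4 (per-capita CO2 multiplied by Catholic population) is as stated. The `Diocese` class, its field names, and the numbers are hypothetical, and units follow whatever the source CO2 raster uses.

```python
from dataclasses import dataclass


@dataclass
class Diocese:
    name: str
    population: float           # step 1: zonal-statistics population
    co2: float                  # step 2: zonal-statistics CO2 for the diocese
    catholic_population: float  # from the Catholic statistics source


def co2_per_capita(d: Diocese) -> float:
    """Step 3 (assumed form): diocese CO2 divided by diocese population."""
    if d.population <= 0:
        raise ValueError(f"{d.name}: population must be positive")
    return d.co2 / d.population


def catholic_carbon_footprint(d: Diocese) -> float:
    """Step 4: per-capita CO2 multiplied by the Catholic population.

    This carries the stated assumption that Catholics and non-Catholics
    have similar per-person footprints.
    """
    return co2_per_capita(d) * d.catholic_population


# Made-up example values, for illustration only.
example = Diocese("Example Diocese", population=1000.0, co2=5000.0,
                  catholic_population=250.0)
print(co2_per_capita(example), catholic_carbon_footprint(example))
```

Because the Catholic, population, and CO2 inputs come from different years (2018, 2016, and 2010 or 2015), results from such a calculation inherit the year mismatch the description warns about.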
Usage metrics of the TousAntiCovid application
gimi9.com
Cite
Usage metrics of the TousAntiCovid application [Dataset]. https://gimi9.com/dataset/eu_5fa93b994b29f6390f150980_1
Explore at:
Description
The TousAntiCovid app

TousAntiCovid is an application that allows everyone to play a part in the fight against the epidemic. It is an additional barrier gesture, to be activated whenever you need to redouble your vigilance: at the restaurant, in the canteen, when you go to a gym, when you take part in a professional event, or whenever there is a risk that not everyone will respect the other barrier gestures. TousAntiCovid complements the work of doctors and the health insurance system, which aims to contain the spread of the virus by stopping chains of contamination as soon as possible. The principle is as follows: warn, while guaranteeing anonymity, people who have been close to a person who tested positive, so that they can get tested and receive care as soon as possible. It also makes it possible to stay informed about the evolution of the epidemic and the conduct to adopt, and thus to remain vigilant and take the right actions. It gives easy access to other tools available to citizens wishing to be involved in the fight against the epidemic: DepistageCovid, which gives a map of nearby labs and wait times, and MesConseilsCovid, which provides personalised advice to protect yourself and others. Installing the TousAntiCovid app is voluntary. Everyone is supported even if they choose not to use the app. The app is downloaded from the Apple Store and Google Play: bonjour.tousanticovid.gouv.fr/

Description of the data

For each day since the launch of the application on 2 June 2020, this dataset reports:
- Cumulative total of the number of registered applications minus the number of deregistrations.
- Cumulative total of users notified by the application: the number of users notified by the application as risk contacts following exposure to COVID-19, since 2 June 2020.
- Cumulative total of users reporting as COVID-19 cases per day: the number of users who reported as COVID-19 cases in the application, since 2 June 2020.
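
Since all three series are cumulative totals, day-by-day figures have to be derived by differencing. The sketch below shows that step on made-up numbers; it assumes the series starts on the launch day and has one value per consecutive day, neither of which is confirmed here.

```python
def daily_changes(cumulative):
    """Turn a cumulative series into per-day changes.

    The first value is kept as the first day's change, which assumes the
    series starts on the launch day with nothing counted before it.
    """
    if not cumulative:
        return []
    changes = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        changes.append(current - previous)
    return changes


# Made-up cumulative net registrations, for illustration only.
net_registrations = [100, 250, 240, 400]
print(daily_changes(net_registrations))
```

The net registration series can go down from one day to the next, because deregistrations are subtracted, so negative daily changes are expected there and are not necessarily data errors.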
Daily United States COVID-19 Testing and Outcomes Data By State, March 7,...
data.niaid.nih.gov
datadryad.org
zip
Updated Jul 28, 2021
Cite
The COVID Tracking Project at The Atlantic (2021). Daily United States COVID-19 Testing and Outcomes Data By State, March 7, 2020 to March 7, 2021 [Dataset]. http://doi.org/10.5061/dryad.9kd51c5hk
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.9kd51c5hk
Dataset updated
Jul 28, 2021
Authors
The COVID Tracking Project at The Atlantic
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
United States
Description
The COVID Tracking Project was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States. Our dataset was in use by national and local news organizations across the United States and by research projects and agencies worldwide.

Every day, we collected data on COVID-19 testing and patient outcomes from all 50 states, 5 territories, and the District of Columbia by visiting official public health websites for those jurisdictions and entering reported values in a spreadsheet. The files in this dataset represent the entirety of our COVID-19 testing and outcomes data collection from March 7, 2020 to March 7, 2021. This dataset includes official values reported by each state on each day of antigen, antibody, and PCR test result totals; the total number of probable and confirmed cases of COVID-19; the number of people currently hospitalized, in intensive care, and on a ventilator; the total number of confirmed and probable COVID-19 deaths; and more.

Methods

This dataset was compiled by about 300 volunteers with The COVID Tracking Project from official sources of state-level COVID-19 data such as websites and press conferences. Every day, a team of about a dozen available volunteers visited these official sources and recorded the publicly reported values in a shared Google Sheet, which was used as a data source to publish the full dataset each day between about 5:30pm and 7pm Eastern time. All our data came from state and territory public health authorities or official statements from state officials. We did not automatically scrape data or attempt to offer a live feed. Our data was gathered and double-checked by humans, and we emphasized accuracy and context over speed. Some data was corrected or backfilled from structured data provided by public health authorities. Additional information about our methods can be found in a series of posts at http://covidtracking.com/analysis-updates.

We offer thanks and heartfelt gratitude for the labor and sacrifice of our volunteers. Volunteers on the Data Entry, Data Quality, and Data Infrastructure teams who granted us permission to use their name publicly are listed in VOLUNTEERS.md.
Major Tech Stocks Time Series (2019-2024)
kaggle.com
Updated Aug 2, 2024
Cite
Alfredo (2024). Major Tech Stocks Time Series (2019-2024) [Dataset]. https://www.kaggle.com/datasets/alfredkondoro/major-tech-stocks-time-series-2019-2024
Dataset updated
Aug 2, 2024
Dataset provided by
Kaggle
Authors
Alfredo
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Description

Overview:

This dataset contains the historical stock prices and related financial information for five major technology companies: Apple (AAPL), Microsoft (MSFT), Amazon (AMZN), Google (GOOGL), and Tesla (TSLA). The dataset spans a five-year period from January 1, 2019, to January 1, 2024. It includes key stock metrics such as Open, High, Low, Close, Adjusted Close, and Volume for each trading day.

Data Collection:

The data was sourced using the yfinance library in Python, which provides convenient access to historical market data from Yahoo Finance.

Contents:

The dataset contains the following columns:

Date: The trading date.
Open: The opening price of the stock on that date.
High: The highest price of the stock on that date.
Low: The lowest price of the stock on that date.
Close: The closing price of the stock on that date.
Adj Close: The adjusted closing price, accounting for dividends and splits.
Volume: The number of shares traded on that date.
Ticker: The stock ticker symbol representing each company.
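
A common first step with these columns is computing daily returns per ticker from the adjusted close. The sketch below uses only the standard library and a tiny in-memory sample. It assumes the data is a single CSV in long format whose headers match the column names listed above and whose dates are ISO-formatted; the tickers `AAA` and `BBB` and all prices are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the assumed layout, for illustration only.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume,Ticker
2019-01-02,10.0,11.0,9.5,10.5,10.0,1000,AAA
2019-01-03,10.5,11.5,10.0,11.0,11.0,1200,AAA
2019-01-02,50.0,51.0,49.0,50.0,50.0,500,BBB
2019-01-03,50.0,50.5,47.0,48.0,47.5,700,BBB
"""


def daily_returns(csv_text):
    """Return {ticker: [(date, simple_return), ...]} from adjusted closes."""
    by_ticker = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_ticker[row["Ticker"]].append((row["Date"], float(row["Adj Close"])))

    returns = {}
    for ticker, series in by_ticker.items():
        series.sort()  # ISO dates sort chronologically as plain strings
        returns[ticker] = [
            (date, price / prev_price - 1.0)
            for (_, prev_price), (date, price) in zip(series, series[1:])
        ]
    return returns


result = daily_returns(SAMPLE_CSV)
for ticker, values in result.items():
    print(ticker, values)
```

Using Adj Close rather than Close keeps dividends and splits from showing up as artificial price jumps in the return series.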
Search Engines in Germany - Market Research Report (2015-2030)
ibisworld.com
Updated Jun 19, 2024
IBISWorld (2024). Search Engines in Germany - Market Research Report (2015-2030) [Dataset]. https://www.ibisworld.com/germany/industry/search-engines/935/
Explore at:
Dataset updated
Jun 19, 2024
Dataset authored and provided by
IBISWorld
License
https://www.ibisworld.com/about/termsofuse/
Time period covered
2014 - 2029
Area covered
Germany
Description
In the last five years, the web portal industry has recorded significant revenue growth. Industry revenue increased by an average of 3.8% per year between 2019 and 2024 and is expected to reach 12.6 billion euros in the current year. The web portal industry comprises a variety of platforms such as social networks, search engines, video platforms and email services that are used by millions of users every day. These portals enable the exchange of information and communication as well as entertainment. Web portals generate their revenue mainly through advertising, premium services and commission payments. User numbers are rising steadily as more and more people go online and everyday processes are increasingly digitalised.

In 2024, industry revenue is expected to increase by 3.2%. Although the industry is growing, it is also facing challenges, particularly in terms of data protection. Web portals are constantly collecting user data, which can lead to misuse of the collected data. The General Data Protection Regulation (GDPR) introduced in the European Union in 2018 has prompted web portal operators to review their data protection practices and amend their terms and conditions in order to avoid fines. The aim of this regulation is to improve the protection of personal data and prevent data misuse.

The industry's turnover is expected to increase by an average of 3.6% per year to 15 billion euros over the next five years. Video platforms such as YouTube often generate losses despite high user numbers. The reasons for this are the high costs of operation and infrastructure as well as expenses for copyright issues and compliance. Advertising on video platforms is perceived negatively by users, but is successful when it comes to attracting attention. Politicians are debating the taxation of revenues generated by internationally operating web portals based in tax havens. Another challenge is the copying of concepts, which inhibits innovation in the industry and can lead to legal problems.
Inflation Drives People to Google Negative Concepts (Forecast)
kappasignal.com
Updated Jun 11, 2023
KappaSignal (2023). Inflation Drives People to Google Negative Concepts (Forecast) [Dataset]. https://www.kappasignal.com/2023/06/inflation-drives-people-to-google.html
Explore at:
Dataset updated
Jun 11, 2023
Dataset authored and provided by
KappaSignal
License
https://www.kappasignal.com/p/legal-disclaimer.html
Description
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

Inflation Drives People to Google Negative Concepts

Financial data:

Historical daily stock prices (open, high, low, close, volume)

Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

Machine learning features:

Feature engineering based on financial data and technical indicators

Sentiment analysis data from social media and news articles

Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

Potential Applications:

Stock price prediction

Portfolio optimization

Algorithmic trading

Market sentiment analysis

Risk management

Use Cases:

Researchers investigating the effectiveness of machine learning in stock market prediction

Analysts developing quantitative trading Buy/Sell strategies

Individuals interested in building their own stock market prediction models

Students learning about machine learning and financial applications

Additional Notes:

The dataset may include different levels of granularity (e.g., daily, hourly)

Data cleaning and preprocessing are essential before model training

Regular updates are recommended to maintain the accuracy and relevance of the data
Twitch Reviews [DAILY UPDATED]
kaggle.com
Updated Aug 22, 2024
Ashish Kumar (2024). Twitch Reviews [DAILY UPDATED] [Dataset]. https://www.kaggle.com/datasets/ashishkumarak/twitch-reviews-daily-updated/data
Explore at:
Dataset updated
Aug 22, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ashish Kumar
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
The core components of this dataset are the user reviews and ratings of the Twitch App, updated every single day. Additional details such as the relevance of the reviews and the dates on which they were posted are also incorporated into the dataset.

Facebook: distribution of global audiences 2024, by age and gender

statista.com
davegsmith.com

Updated Jun 17, 2025

Stacy Jo Dixon (2025). Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset updated

Jun 17, 2025

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Additionally, Facebook's second largest audience base could be found with men aged 18 to 24 years.

Facebook connects the world

Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world with influence that goes beyond social media. It is widely considered as one of the Big Four tech companies, along with Google, Apple, and Amazon (all together known under the acronym GAFA). Facebook is the most popular social network worldwide and the company also owns three other billion-user properties: mobile messaging apps WhatsApp and Facebook Messenger, as well as photo-sharing app Instagram.

Facebook users

The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also all have more than 100 million Facebook users each.

Data from Time Travelling with Technology: a technology-based program for...
researchdata.edu.au
Updated Oct 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Li Weicong; Leahy Andrew; Jones Caroline; Radnan Maddie; Weicong Li; Caroline Jones (2024). Data from Time Travelling with Technology: a technology-based program for promoting relationships and engagement in aged care [Dataset]. http://doi.org/10.26183/RB4C-SS12
Explore at:
Unique identifier
https://doi.org/10.26183/RB4C-SS12
Dataset updated
Oct 23, 2024
Dataset provided by
Western Sydney University
Authors
Li Weicong; Leahy Andrew; Jones Caroline; Radnan Maddie; Weicong Li; Caroline Jones
Time period covered
Jul 29, 2019 - Dec 2, 2021
Description
This dataset contains transcripts of conversations between elderly people and a facilitator during group reminiscence therapy sessions in a day-respite aged care facility in Sydney, Australia. Each session consisted of 2-4 older adults, sometimes including family and carers, and ran for approximately 30 minutes.
Each session displayed locations of significance to the clients on a television using Google Maps and Google Street View in a program called Time Travelling with Technology (TTT). Half the sessions involved the High-Tech condition using dynamic images panning the environment and the other half the Low-Tech condition using static images.
The dataset also includes dyadic interviews between the facilitator and each individual. The interviews were carried out at initial, mid and final intervals and included discourse tasks and autobiographical discussions.
COVID19 - The New York Times
kaggle.com
zip
Updated May 18, 2020
Google BigQuery (2020). COVID19 - The New York Times [Dataset]. https://www.kaggle.com/bigquery/covid19-nyt
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
May 18, 2020
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
Description
Context

This is the US Coronavirus data repository from The New York Times. This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here. For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site.

Sample Queries

Query 1

Which US counties have the most confirmed cases per capita? This query determines which counties have the most cases per 100,000 residents. Note that this may differ from similar queries of other datasets because of differences in reporting lag, methodologies, or other dataset differences.

-- Table names are backtick-quoted because the project ID contains hyphens.
SELECT
  covid19.county,
  covid19.state_name,
  total_pop AS county_population,
  confirmed_cases,
  ROUND(confirmed_cases / total_pop * 100000, 2) AS confirmed_cases_per_100000,
  deaths,
  ROUND(deaths / total_pop * 100000, 2) AS deaths_per_100000
FROM `bigquery-public-data.covid19_nyt.us_counties` covid19
JOIN `bigquery-public-data.census_bureau_acs.county_2017_5yr` acs
  ON covid19.county_fips_code = acs.geo_id
WHERE date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND covid19.county_fips_code != "00000"
ORDER BY confirmed_cases_per_100000 DESC
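The per-100,000 arithmetic in Query 1 can be checked outside BigQuery. The following is a minimal Python sketch of the same calculation and ordering; the county records are hypothetical placeholders, and per_100k is a helper name introduced here, not part of the dataset or of BigQuery.

```python
def per_100k(count, population):
    """Rate per 100,000 residents, rounded to 2 decimals as in Query 1."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(count / population * 100_000, 2)


# Hypothetical counties: names and numbers are illustrative only.
counties = [
    {"county": "A", "total_pop": 50_000, "confirmed_cases": 125, "deaths": 5},
    {"county": "B", "total_pop": 1_200_000, "confirmed_cases": 1_800, "deaths": 60},
]

# Mirror ORDER BY confirmed_cases_per_100000 DESC.
ranked = sorted(
    counties,
    key=lambda c: per_100k(c["confirmed_cases"], c["total_pop"]),
    reverse=True,
)

if __name__ == "__main__":
    for c in ranked:
        print(
            c["county"],
            per_100k(c["confirmed_cases"], c["total_pop"]),
            per_100k(c["deaths"], c["total_pop"]),
        )
```

County B has far more cases in absolute terms, yet county A ranks first once population is taken into account, which is the point of the per-capita normalization.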

Query 2

How do I calculate the number of new COVID-19 cases per day? This query determines the total number of new cases in each state for each day available in the dataset.

-- Join each day's row (b) to the previous day's row (a) by shifting a's date forward one day.
SELECT
  b.state_name,
  b.date,
  MAX(b.confirmed_cases - a.confirmed_cases) AS daily_confirmed_cases
FROM (
  SELECT
    state_name AS state,
    state_fips_code,
    confirmed_cases,
    DATE_ADD(date, INTERVAL 1 DAY) AS date_shift
  FROM `bigquery-public-data.covid19_nyt.us_states`
  WHERE confirmed_cases + deaths > 0
) a
JOIN `bigquery-public-data.covid19_nyt.us_states` b
  ON a.state_fips_code = b.state_fips_code
  AND a.date_shift = b.date
GROUP BY b.state_name, date
ORDER BY date DESC
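The idea behind Query 2 is to difference a cumulative series against the previous calendar day. Below is a small Python sketch of that idea for a single state; the function name and the sample counts are hypothetical, and the query's filter on confirmed_cases + deaths > 0 is omitted for brevity.

```python
from datetime import date, timedelta


def daily_new_cases(cumulative):
    """Map each date to (cumulative count) - (previous calendar day's count).

    Dates whose previous day is missing are skipped, which matches the
    inner join on date_shift = date in Query 2.
    """
    new_cases = {}
    for day, total in cumulative.items():
        previous = cumulative.get(day - timedelta(days=1))
        if previous is not None:
            new_cases[day] = total - previous
    return new_cases


# Hypothetical cumulative counts for one state; 2020-03-04 is deliberately absent.
sample = {
    date(2020, 3, 1): 1,
    date(2020, 3, 2): 4,
    date(2020, 3, 3): 9,
    date(2020, 3, 5): 20,
}

if __name__ == "__main__":
    for day, count in sorted(daily_new_cases(sample).items(), reverse=True):
        print(day.isoformat(), count)
```

Because the join only matches consecutive dates, a gap in reporting produces no row for the day after the gap instead of a multi-day jump.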
Mobile internet users worldwide 2020-2029
statista.com
Updated Feb 5, 2025
Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
Explore at:
Dataset updated
Feb 5, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive year of growth, the smartphone user base is estimated to reach 6.1 billion users, a new peak, in 2029. Notably, the number of smartphone users has increased continuously over the past years.

Smartphone users here are limited to internet users of any age using a smartphone. The figures shown have been derived from survey data that has been processed to estimate missing demographics.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of smartphone users in regions such as Australia & Oceania and Asia.
Snapchat Reviews
kaggle.com
Updated Jun 8, 2025
Ashish Kumar (2025). Snapchat Reviews [Dataset]. https://www.kaggle.com/datasets/ashishkumarak/snapchat-reviews-daily-updated/versions/375
Explore at:
Dataset updated
Jun 8, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ashish Kumar
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
The primary elements of this dataset are the reviews and ratings given by users to the SnapChat App, updated every day. Additional information such as the relevancy of each review and the posting date is also included.
Day & night temperatures, 50yrs, 1666ws, TFRecord
kaggle.com
zip
Updated Nov 9, 2019
Martin Görner (2019). Day & night temperatures, 50yrs, 1666ws, TFRecord [Dataset]. https://www.kaggle.com/datasets/mgorner/day-night-temperatures-50yrs-1666ws-tfrecord
Explore at:
Available download formats: zip (160157825 bytes)
Dataset updated
Nov 9, 2019
Authors
Martin Görner
License
https://www.usa.gov/government-works/
Description
This dataset is a cleaned-up extract from the following public BigQuery dataset: https://console.cloud.google.com/marketplace/details/noaa-public/ghcn-d

The dataset contains daily min/max temperatures from a selection of 1666 weather stations. The data spans exactly 50 years. Missing values have been interpolated and are marked as such.
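The description does not say which interpolation method was used. As a sketch only, assuming simple linear interpolation between the nearest observed neighbours, filling gaps while keeping a flag for every filled value could look like this in Python (the function name and sample readings are hypothetical):

```python
def interpolate_missing(values):
    """Fill None gaps and report which positions were filled.

    Assumption: linear interpolation between the nearest observed values,
    with leading or trailing gaps copied from the nearest observation.
    Returns (filled_values, was_interpolated_flags).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed value")

    filled = list(values)
    flags = [v is None for v in values]
    for i, value in enumerate(values):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            step = (values[right] - values[left]) / (right - left)
            filled[i] = values[left] + step * (i - left)
    return filled, flags


if __name__ == "__main__":
    # Hypothetical daily maximum temperatures with gaps.
    temps, marks = interpolate_missing([10.0, None, None, 16.0, None])
    print(temps)
    print(marks)
```

Keeping the flags alongside the values lets downstream code exclude or down-weight interpolated points.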

This dataset is in TFRecord format.

About the original dataset: NOAA’s Global Historical Climatology Network (GHCN) is an integrated database of climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. The data are obtained from more than 20 sources. The GHCN-Daily is an integrated database of daily climate summaries from land surface stations across the globe, and is comprised of daily climate records from over 100,000 stations in 180 countries and territories, and includes some data from every year since 1763.
GOOGLE MOBILITY DATA
kaggle.com
zip
Updated Feb 2, 2022
AiswaryaRamachandran (2022). GOOGLE MOBILITY DATA [Dataset]. https://www.kaggle.com/aiswaryaramachandran/google-mobility-data
Explore at:
Available download formats: zip (70425096 bytes)
Dataset updated
Feb 2, 2022
Authors
AiswaryaRamachandran
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

As global communities respond to COVID-19, we've heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps could be helpful as they make critical decisions to combat COVID-19.

These Community Mobility Reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. (https://www.google.com/covid19/mobility/)

Content

The data contains aggregated, anonymised data per day for each country. For example, the data for India is in the files 2020_IN_Region_Mobility_Report.csv (2020) and 2021_IN_Region_Mobility_Report.csv (2021). The aggregated data is available not only at country level but also at state and district level, as given in sub_region_1 and sub_region_2.
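A short Python sketch of how the file naming and the region columns might be handled. It assumes that every country follows the pattern of the two India file names quoted above and that an empty sub_region_1 or sub_region_2 means the row is aggregated at a coarser level; both are assumptions, and the helper names are hypothetical.

```python
def report_filename(year, country_code):
    """Build a file name following the pattern of the India examples.

    Assumption: other countries and years use the same
    <year>_<country code>_Region_Mobility_Report.csv pattern.
    """
    return f"{year}_{country_code.upper()}_Region_Mobility_Report.csv"


def granularity(row):
    """Classify a row as country, state, or district level.

    Assumption: blank sub_region fields mark a coarser aggregation level.
    """
    if row.get("sub_region_2"):
        return "district"
    if row.get("sub_region_1"):
        return "state"
    return "country"


if __name__ == "__main__":
    print(report_filename(2020, "in"))
    # Hypothetical rows; only the two sub_region column names come from the text.
    print(granularity({"sub_region_1": "", "sub_region_2": ""}))
    print(granularity({"sub_region_1": "Kerala", "sub_region_2": ""}))
    print(granularity({"sub_region_1": "Kerala", "sub_region_2": "Kollam"}))
```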

Acknowledgements

This data is from a report published by Google. https://www.google.com/covid19/mobility/

Inspiration

Some Questions to answer

India is having its second wave, and one of the major causes is considered to be the election rallies held in different parts of the country. How does mobility impact COVID cases?

Comparing Mobility across different Countries

DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasets/datasf/san-francisco

Data from: San Francisco Open Data

San Francisco Open Data (BigQuery Dataset)

Explore at:

Available download formats: zip (0 bytes)

Dataset updated

Mar 20, 2019

Dataset authored and provided by

DataSF

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

San Francisco

Description

Context

DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

https://datasf.org/about/

Content

This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.
This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.
This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).
This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

http://datasf.org/

Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @meric from Unsplash.

Inspiration

Which neighborhoods have the highest proportion of offensive graffiti?

Which complaint is most likely to be made using Twitter and in which neighborhood?

What are the most complained about Muni stops in San Francisco?

What are the top 10 incident types that the San Francisco Fire Department responds to?

How many medical incidents and structure fires are there in each neighborhood?

What’s the average response time for each type of dispatched vehicle?

Which category of police incidents have historically been the most common in San Francisco?

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

What is the average tree diameter?

What is the highest number of a particular species of tree planted in a single year?

Which San Francisco locations feature the largest number of trees?
