53 datasets found

Data from: San Francisco Open Data
kaggle.com
zip
Updated Mar 20, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasf/san-francisco
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset authored and provided by
DataSF
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
San Francisco
Description
Context

DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

https://datasf.org/about/

Content

This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.

This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.

This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).

This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

http://datasf.org/

https://cloud.google.com/bigquery/public-data/sfo-311

https://cloud.google.com/bigquery/public-data/sffd-service-calls

https://cloud.google.com/bigquery/public-data/sfpd-reports

https://cloud.google.com/bigquery/public-data/sfo-trees

Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @meric from Unplash.

Inspiration

Which neighborhoods have the highest proportion of offensive graffiti?

Which complaint is most likely to be made using Twitter and in which neighborhood?

What are the most complained about Muni stops in San Francisco?

What are the top 10 incident types that the San Francisco Fire Department responds to?

How many medical incidents and structure fires are there in each neighborhood?

What’s the average response time for each type of dispatched vehicle?

Which category of police incidents have historically been the most common in San Francisco?

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

What is the average tree diameter?

What is the highest number of a particular species of tree planted in a single year?

Which San Francisco locations feature the largest number of trees?
DataForSEO Google Full (Keywords+SERP) database, historical data available
datarade.ai
.json, .csv
Updated Aug 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DataForSEO (2023). DataForSEO Google Full (Keywords+SERP) database, historical data available [Dataset]. https://datarade.ai/data-products/dataforseo-google-full-keywords-serp-database-historical-d-dataforseo
Explore at:
.json, .csvAvailable download formats
Dataset updated
Aug 17, 2023
Dataset provided by
Authors
DataForSEO
Area covered
Portugal, Bolivia (Plurinational State of), Sweden, United Kingdom, Côte d'Ivoire, Cyprus, Paraguay, Costa Rica, South Africa, Burkina Faso
Description
You can check the fields description in the documentation: current Full database: https://docs.dataforseo.com/v3/databases/google/full/?bash; Historical Full database: https://docs.dataforseo.com/v3/databases/google/history/full/?bash.

Full Google Database is a combination of the Advanced Google SERP Database and Google Keyword Database.

Google SERP Database offers millions of SERPs collected in 67 regions with most of Google’s advanced SERP features, including featured snippets, knowledge graphs, people also ask sections, top stories, and more.

Google Keyword Database encompasses billions of search terms enriched with related Google Ads data: search volume trends, CPC, competition, and more.

This database is available in JSON format only.

You don’t have to download fresh data dumps in JSON – we can deliver data straight to your storage or database. We send terrabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Eleasticsearch, and Google Big Query. Let us know if you’d like to get your data to any other storage or database.
Data from: Bellabeat Case Study
kaggle.com
Updated Jun 28, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cynthia Tatum (2021). Bellabeat Case Study [Dataset]. https://www.kaggle.com/cynthiatatum/bellabeat-case-study/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 28, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Cynthia Tatum
Description
Context

Business Task: Review the data on how consumers are using non-Bellabeat smart devices to point out any trends. With the insights, analyze how those trends could be applied to one of Bellabeat’s products. Use the top usage trends for a marketing strategy to drive growth for Bellabeat.

Content

The data shows the smart device is used to track minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

Acknowledgements

Resources - Kaggle Fitbit Fitness Tracker Data by Mobius Furberg, Robert; Brinton, Julia; Keating, Michael ; Ortiz, Alexa https://zenodo.org/record/53894#.YMoUpnVKiP9 https://bellabeat.com/ https://www.omnicalculator.com/sports/met-minutes-per-week

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
USA Names
console.cloud.google.com
Updated Aug 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Social%20Security%20Administration&hl=pt-BR&inv=1&invt=Ab4Asw (2023). USA Names [Dataset]. https://console.cloud.google.com/marketplace/product/social-security-administration/us-names?hl=pt-BR
Explore at:
Dataset updated
Aug 15, 2023
Dataset provided by
Googlehttp://google.com/
Area covered
United States
Description
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data. All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
Data from: arXiv Dataset
kaggle.com
Updated Jul 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cornell University (2025). arXiv Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7548853
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7548853
Dataset updated
Jul 5, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Cornell University
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
About ArXiv

For nearly 30 years, ArXiv has served the public and research communities by providing open access to scholarly articles, from the vast branches of physics to the many subdisciplines of computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics. This rich corpus of information offers significant, but sometimes overwhelming depth.

In these times of unique global challenges, efficient extraction of insights from data is essential. To help make the arXiv more accessible, we present a free, open pipeline on Kaggle to the machine-readable arXiv dataset: a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more.

Our hope is to empower new use cases that can lead to the exploration of richer machine learning techniques that combine multi-modal features towards applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction and semantic search interfaces.

The dataset is freely available via Google Cloud Storage buckets (more info here). Stay tuned for weekly updates to the dataset!

ArXiv is a collaboratively funded, community-supported resource founded by Paul Ginsparg in 1991 and maintained and operated by Cornell University.

The release of this dataset was featured further in a Kaggle blog post here.

https://storage.googleapis.com/kaggle-public-downloads/arXiv.JPG" alt="">

See here for more information.

ArXiv On Kaggle

Metadata

This dataset is a mirror of the original ArXiv data. Because the full dataset is rather large (1.1TB and growing), this dataset provides only a metadata file in the json format. This file contains an entry for each paper, containing: - id: ArXiv ID (can be used to access the paper, see below) - submitter: Who submitted the paper - authors: Authors of the paper - title: Title of the paper - comments: Additional info, such as number of pages and figures - journal-ref: Information about the journal the paper was published in - doi: https://www.doi.org - abstract: The abstract of the paper - categories: Categories / tags in the ArXiv system - versions: A version history

You can access each paper directly on ArXiv using these links: - https://arxiv.org/abs/{id}: Page for this paper including its abstract and further links - https://arxiv.org/pdf/{id}: Direct link to download the PDF

Bulk access

The full set of PDFs is available for free in the GCS bucket gs://arxiv-dataset or through Google API (json documentation and xml documentation).

You can use for example gsutil to download the data to your local machine. ```

List files:

gsutil cp gs://arxiv-dataset/arxiv/

Download pdfs from March 2020:

gsutil cp gs://arxiv-dataset/arxiv/arxiv/pdf/2003/ ./a_local_directory/

Download all the source files

gsutil cp -r gs://arxiv-dataset/arxiv/ ./a_local_directory/ ```

Update Frequency

We're automatically updating the metadata as well as the GCS bucket on a weekly basis.

License

Creative Commons CC0 1.0 Universal Public Domain Dedication applies to the metadata in this dataset. See https://arxiv.org/help/license for further details and licensing on individual papers.

Acknowledgements

The original data is maintained by ArXiv, huge thanks to the team for building and maintaining this dataset.

We're using https://github.com/mattbierbaum/arxiv-public-datasets to pull the original data, thanks to Matt Bierbaum for providing this tool.
United States Census
kaggle.com
zip
Updated Apr 17, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
US Census Bureau (2018). United States Census [Dataset]. https://www.kaggle.com/census/census-bureau-usa
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Apr 17, 2018
Dataset provided by
United States Census Bureauhttp://census.gov/
Authors
US Census Bureau
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census

Content

The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole.

The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.

Fork this kernel to get started.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa

https://cloud.google.com/bigquery/public-data/us-census

Dataset Source: United States Census Bureau

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by Steve Richey from Unsplash.

Inspiration

What are the ten most populous zip codes in the US in the 2010 census?

What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?

https://cloud.google.com/bigquery/images/census-population-map.png" alt="https://cloud.google.com/bigquery/images/census-population-map.png"> https://cloud.google.com/bigquery/images/census-population-map.png
Forest proximate people - 5km cutoff distance (Global - 100m)
data.amerigeoss.org
http, wmts
Updated Oct 24, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Food and Agriculture Organization (2022). Forest proximate people - 5km cutoff distance (Global - 100m) [Dataset]. https://data.amerigeoss.org/dataset/8ed893bd-842a-4866-a655-a0a0c02b79b5
Explore at:
http, wmtsAvailable download formats
Dataset updated
Oct 24, 2022
Dataset provided by
Food and Agriculture Organizationhttp://fao.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The "Forest Proximate People" (FPP) dataset is one of the data layers contributing to the development of indicator #13, “number of forest-dependent people in extreme poverty,” of the Collaborative Partnership on Forests (CPF) Global Core Set of forest-related indicators (GCS). The FPP dataset provides an estimate of the number of people living in or within 5 kilometers of forests (forest-proximate people) for the year 2019 with a spatial resolution of 100 meters at a global level.

For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L. Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: A new methodology and global estimates. Background Paper to The State of the World’s Forests 2022 report. Rome, FAO.

Contact points:

Maintainer: Leticia Pina

Maintainer: Sarah E., Castle

Data lineage:

The FPP data are generated using Google Earth Engine. Forests are defined by the Copernicus Global Land Cover (CGLC) (Buchhorn et al. 2020) classification system’s definition of forests: tree cover ranging from 15-100%, with or without understory of shrubs and grassland, and including both open and closed forests. Any area classified as forest sized ≥ 1 ha in 2019 was included in this definition. Population density was defined by the WorldPop global population data for 2019 (WorldPop 2018). High density urban populations were excluded from the analysis. High density urban areas were defined as any contiguous area with a total population (using 2019 WorldPop data for population) of at least 50,000 people and comprised of pixels all of which met at least one of two criteria: either the pixel a) had at least 1,500 people per square km, or b) was classified as “built-up” land use by the CGLC dataset (where “built-up” was defined as land covered by buildings and other manmade structures) (Dijkstra et al. 2020). Using these datasets, any rural people living in or within 5 kilometers of forests in 2019 were classified as forest proximate people. Euclidean distance was used as the measure to create a 5-kilometer buffer zone around each forest cover pixel. The scripts for generating the forest-proximate people and the rural-urban datasets using different parameters or for different years are published and available to users. For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: a new methodology and global estimates. Background Paper to The State of the World’s Forests 2022. Rome, FAO.

References:

Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.E., Herold, M., Fritz, S., 2020. Copernicus Global Land Service: Land Cover 100m: collection 3 epoch 2019. Globe.

Dijkstra, L., Florczyk, A.J., Freire, S., Kemper, T., Melchiorri, M., Pesaresi, M. and Schiavina, M., 2020. Applying the degree of urbanisation to the globe: A new harmonised definition reveals a different picture of global urbanisation. Journal of Urban Economics, p.103312.

WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University, 2018. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645

Online resources:

GEE asset for "Forest proximate people - 5km cutoff distance"
COVID-19 Community Mobility Reports
google.com
google.com.tr
+4more
csv, pdf
Updated Oct 17, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google (2022). COVID-19 Community Mobility Reports [Dataset]. https://www.google.com/covid19/mobility/
Explore at:
csv, pdfAvailable download formats
Dataset updated
Oct 17, 2022
Dataset authored and provided by
Googlehttp://google.com/
Description
As global communities responded to COVID-19, we heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps would be helpful as they made critical decisions to combat COVID-19. These Community Mobility Reports aimed to provide insights into what changed in response to policies aimed at combating COVID-19. The reports charted movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.
R
RECOD.ai events dataset
redu.unicamp.br
Updated Mar 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Repositório de Dados de Pesquisa da Unicamp (2025). RECOD.ai events dataset [Dataset]. http://doi.org/10.25824/redu/BLIYYR
Explore at:
Unique identifier
https://doi.org/10.25824/redu/BLIYYR
Dataset updated
Mar 21, 2025
Dataset provided by
Repositório de Dados de Pesquisa da Unicamp
Dataset funded by
Fundação de Amparo à Pesquisa do Estado de São Paulo
Description
Overview This data set consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 06th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube. Data Collection We used Social Tracker, along with the social medias' APIs, to gather most of the collections. For a minor part, we used Twint. In both cases, we provided keywords related to the event to receive the data. It is important to mention that, in procedures like this one, usually only a small fraction of the collected data is in fact related to the event and useful for a further forensic analysis. Content We have data from 34 events, and for each of them we provide the files: items_full.csv: It contains links to any social media post that was collected. images.csv: Enlists the images collected. In some files there is a field called "ItemUrl", that refers to the social network post (e.g., a tweet) that mentions that media. video.csv: Urls of YouTube videos that were gathered about the event. video_tweet.csv: This file contains IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content. In turn, the link of a Youtube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file. description.txt: Contains some standard information about the event, and possibly some comments about any specific issue related to it. In fact, most of the collections do not have all the files above. Such an issue is due to changes in our collection procedure throughout the time of this work. Events We divided the events into six groups. They are: Fire: Devastating fire is the main issue of the event, therefore most of the informative pictures show flames or burned constructions. 14 Events Collapse: Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire). 5 Events Shooting: Likely images of guns and police officers. Few or no destruction of the environment. 5 Events Demonstration: Plethora of people on the streets. Possibly some problem took place on that, but in most cases the demonstration is the actual event. 7 Events Collision: Traffic collision. Pictures of damaged vehicles on an urban landscape. Possibly there are images with victims on the street. 1 Event Flood: Events that range from fierce rain to a tsunami. Many pictures depict water. 2 Events Media Content Due to the terms of use from the social networks, we do not make publicly available the texts, images and videos that were collected. However, we can provide some extra piece of media content related to one (or more) events by contacting the authors.
census-bureau-usa
kaggle.com
zip
Updated May 18, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). census-bureau-usa [Dataset]. https://www.kaggle.com/bigquery/census-bureau-usa
Explore at:
zip(0 bytes)Available download formats
Dataset updated
May 18, 2020
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
Area covered
United States
Description
Context :

The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole. Update frequency: Historic (none)

Dataset source

United States Census Bureau

Sample Query

SELECT zipcode, population FROM bigquery-public-data.census_bureau_usa.population_by_zip_2010 WHERE gender = '' ORDER BY population DESC LIMIT 10

Terms of use

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data
Raportul privind mobilitatea privind coronavirusul (COVID-19)
data.europa.eu
Updated Nov 7, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Greater London Authority (2022). Raportul privind mobilitatea privind coronavirusul (COVID-19) [Dataset]. https://data.europa.eu/data/datasets/coronavirus-covid-19-mobility-report?locale=ro
Explore at:
Dataset updated
Nov 7, 2022
Dataset authored and provided by
Greater London Authorityhttp://www.london.gov.uk/
Description
Due to changes in the collection and availability of data on COVID-19, this website will no longer be updated. The webpage will no longer be available as of 11 May 2023. On-going, reliable sources of data for COVID-19 are available via the COVID-19 dashboard and the UKHSA

GLA Covid-19 Mobility Report

Since March 2020, London has seen many different levels of restrictions - including three separate lockdowns and many other tiers/levels of restrictions, as well as easing of restrictions and even measures to actively encourage people to go to work, their high streets and local restaurants. This reports gathers data from a number of sources, including google, apple, citymapper, purple wifi and opentable to assess the extent to which these levels of restrictions have translated to a reductions in Londoners' movements.

The data behind the charts below come from different sources. None of these data represent a direct measure of how well people are adhering to the lockdown rules - nor do they provide an exhaustive data set. Rather, they are measures of different aspects of mobility, which together, offer an overall impression of how people Londoners are moving around the capital. The information is broken down by use of public transport, pedestrian activity, retail and leisure, and homeworking.

Public Transport

For the transport measures, we have included data from google, Apple, CityMapper and Transport for London. They measure different aspects of public transport usage - depending on the data source. Each of the lines in the chart below represents a percentage of a pre-pandemic baseline.

https://cdn.datapress.cloud/london/img/dataset/60e5834b-68aa-48d7-a8c5-7ee4781bde05/2025-06-09T20%3A54%3A15/6b096426c4c582dc9568ed4830b4226d.webp" alt="Embedded Image" />

activity Source Latest Baseline Min value in Lockdown 1 Min value in Lockdown 2 Min value in Lockdown 3 Citymapper Citymapper mobility index 2021-09-05 Compares trips planned and trips taken within its app to a baseline of the four weeks from 6 Jan 2020 7.9% 28% 19% Google Google Mobility Report 2022-10-15 Location data shared by users of Android smartphones, compared time and duration of visits to locations to the median values on the same day of the week in the five weeks from 3 Jan 2020 20.4% 40% 27% TfL Bus Transport for London 2022-10-30 Bus journey ‘taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 - 34% 24% TfL Tube Transport for London 2022-10-30 Tube journey ‘taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 - 30% 21% Pedestrian activity

With the data we currently have it's harder to estimate pedestrian activity and high street busyness. A few indicators can give us information on how people are making trips out of the house:

https://cdn.datapress.cloud/london/img/dataset/60e5834b-68aa-48d7-a8c5-7ee4781bde05/2025-06-09T20%3A54%3A15/bcf082c07e4d7ff5202012f0a97abc3a.webp" alt="Embedded Image" />

activity Source Latest Baseline Min value in Lockdown 1 Min value in Lockdown 2 Min value in Lockdown 3 Walking Apple Mobility Index 2021-11-09 estimates the frequency of trips made on foot compared to baselie of 13 Jan '20 22% 47% 36% Parks Google Mobility Report 2022-10-15 Frequency of trips to parks. Changes in the weather mean this varies a lot. Compared to baseline of 5 weeks from 3 Jan '20 30% 55% 41% Retail & Rec Google Mobility Report 2022-10-15 Estimates frequency of trips to shops/leisure locations. Compared to baseline of 5 weeks from 3 Jan '20 30% 55% 41% Retail and recreation

In this section, we focus on estimated footfall to shops, restaurants, cafes, shopping centres and so on.

https://cdn.datapress.cloud/london/img/dataset/60e5834b-68aa-48d7-a8c5-7ee4781bde05/2025-06-09T20%3A54%3A16/b62d60f723eaafe64a989e4afec4c62b.webp" alt="Embedded Image" />

activity Source Latest Baseline Min value in Lockdown 1 Min value in Lockdown 2 Min value in Lockdown 3 Grocery/pharmacy Google Mobility Report 2022-10-15 Estimates frequency of trips to grovery shops and pharmacies. Compared to baseline of 5 weeks from 3 Jan '20 32% 55.00% 45.000% Retail/rec <a href="https://ww
San Francisco Ford GoBike Share
console.cloud.google.com
Updated Jul 10, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:City%20and%20County%20of%20San%20Francisco&inv=1&invt=Ab4A3w (2020). San Francisco Ford GoBike Share [Dataset]. https://console.cloud.google.com/marketplace/product/san-francisco-public-data/sf-bike-share
Explore at:
Dataset updated
Jul 10, 2020
Dataset provided by
Googlehttp://google.com/
Area covered
San Francisco
Description
San Francisco Ford GoBike , managed by Motivate, provides the Bay Area’s bike share system. Bike share is a convenient, healthy, affordable, and fun form of transportation. It involves a fleet of specially designed bikes that are locked into a network of docking stations. Bikes can be unlocked from one station and returned to any other station in the system. People use bike share to commute to work or school, run errands, get to appointments, and more. The dataset contains trip data from 2013-2018, including start time, end time, start station, end station, and latitude/longitude for each station. See detailed metadata for historical and real-time data . This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

ref_coco

tensorflow.org
opendatalab.com

Updated May 31, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

(2024). ref_coco [Dataset]. https://www.tensorflow.org/datasets/catalog/ref_coco

Explore at:

Dataset updated

May 31, 2024

Description

A collection of 3 referring expression datasets based off images in the COCO dataset. A referring expression is a piece of text that describes a unique object in an image. These datasets are collected by asking human raters to disambiguate objects delineated by bounding boxes in the COCO dataset.

RefCoco and RefCoco+ are from Kazemzadeh et al. 2014. RefCoco+ expressions are strictly appearance based descriptions, which they enforced by preventing raters from using location based descriptions (e.g., "person to the right" is not a valid description for RefCoco+). RefCocoG is from Mao et al. 2016, and has more rich description of objects compared to RefCoco due to differences in the annotation process. In particular, RefCoco was collected in an interactive game-based setting, while RefCocoG was collected in a non-interactive setting. On average, RefCocoG has 8.4 words per expression while RefCoco has 3.5 words.

Each dataset has different split allocations that are typically all reported in papers. The "testA" and "testB" sets in RefCoco and RefCoco+ contain only people and only non-people respectively. Images are partitioned into the various splits. In the "google" split, objects, not images, are partitioned between the train and non-train splits. This means that the same image can appear in both the train and validation split, but the objects being referred to in the image will be different between the two sets. In contrast, the "unc" and "umd" splits partition images between the train, validation, and test split. In RefCocoG, the "google" split does not have a canonical test set, and the validation set is typically reported in papers as "val*".

Stats for each dataset and split ("refs" is the number of referring expressions, and "images" is the number of images):

dataset	partition	split	refs	images
refcoco	google	train	40000	19213
refcoco	google	val	5000	4559
refcoco	google	test	5000	4527
refcoco	unc	train	42404	16994
refcoco	unc	val	3811	1500
refcoco	unc	testA	1975	750
refcoco	unc	testB	1810	750
refcoco+	unc	train	42278	16992
refcoco+	unc	val	3805	1500
refcoco+	unc	testA	1975	750
refcoco+	unc	testB	1798	750
refcocog	google	train	44822	24698
refcocog	google	val	5000	4650
refcocog	umd	train	42226	21899
refcocog	umd	val	2573	1300
refcocog	umd	test	5023	2600

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('ref_coco', split='train')
for ex in ds.take(4):
 print(ex)

See the guide for more informations on tensorflow_datasets.

https://storage.googleapis.com/tfds-data/visualization/fig/ref_coco-refcoco_unc-1.1.0.png" alt="Visualization" width="500px">

A
‘FAANG- Complete Stock Data’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 20, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘FAANG- Complete Stock Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-faang-complete-stock-data-53b0/latest
Explore at:
Dataset updated
Sep 20, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘FAANG- Complete Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Context

There are a few companies that are considered to be revolutionary. These companies also happen to be a dream place to work at for many many people across the world. These companies include - Facebook,Amazon,Apple,Netflix and Google also known as FAANG! These companies make ton of money and they help others too by giving them a chance to invest in the companies via stocks and shares. This data wass made targeting these stock prices.

Content

The data contains information such as opening price of a stock, closing price, how much of these stocks were sold and many more things. There are 5 different CSV files in the data for each company.

--- Original source retains full ownership of the source dataset ---
R
Accident Detection Model Dataset
universe.roboflow.com
zip
Updated Apr 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Accident detection model (2024). Accident Detection Model Dataset [Dataset]. https://universe.roboflow.com/accident-detection-model/accident-detection-model/model/1
Explore at:
zipAvailable download formats
Dataset updated
Apr 8, 2024
Dataset authored and provided by
Accident detection model
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Accident Bounding Boxes
Description
Accident-Detection-Model

Accident Detection Model is made using YOLOv8, Google Collab, Python, Roboflow, Deep Learning, OpenCV, Machine Learning, Artificial Intelligence. It can detect an accident on any accident by live camera, image or video provided. This model is trained on a dataset of 3200+ images, These images were annotated on roboflow.

Problem Statement

Road accidents are a major problem in India, with thousands of people losing their lives and many more suffering serious injuries every year.

According to the Ministry of Road Transport and Highways, India witnessed around 4.5 lakh road accidents in 2019, which resulted in the deaths of more than 1.5 lakh people.

The age range that is most severely hit by road accidents is 18 to 45 years old, which accounts for almost 67 percent of all accidental deaths.

Accidents survey

https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png" alt="Survey">

Literature Survey

Sreyan Ghosh in Mar-2019, The goal is to develop a system using deep learning convolutional neural network that has been trained to identify video frames as accident or non-accident.

Deeksha Gour Sep-2019, uses computer vision technology, neural networks, deep learning, and various approaches and algorithms to detect objects.

Research Gap

Lack of real-world data - We trained model for more then 3200 images.

Large interpretability time and space needed - Using google collab to reduce interpretability time and space required.

Outdated Versions of previous works - We aer using Latest version of Yolo v8.

Proposed methodology

We are using Yolov8 to train our custom dataset which has been 3200+ images, collected from different platforms.

This model after training with 25 iterations and is ready to detect an accident with a significant probability.

Model Set-up

Preparing Custom dataset

We have collected 1200+ images from different sources like YouTube, Google images, Kaggle.com etc.

Then we annotated all of them individually on a tool called roboflow.

During Annotation we marked the images with no accident as NULL and we drew a box on the site of accident on the images having an accident

Then we divided the data set into train, val, test in the ratio of 8:1:1

At the final step we downloaded the dataset in yolov8 format.
#### Using Google Collab

We are using google colaboratory to code this model because google collab uses gpu which is faster than local environments.

You can use Jupyter notebooks, which let you blend code, text, and visualisations in a single document, to write and run Python code using Google Colab.

Users can run individual code cells in Jupyter Notebooks and quickly view the results, which is helpful for experimenting and debugging. Additionally, they enable the development of visualisations that make use of well-known frameworks like Matplotlib, Seaborn, and Plotly.

In Google collab, First of all we Changed runtime from TPU to GPU.

We cross checked it by running command ‘!nvidia-smi’
#### Coding

First of all, We installed Yolov8 by the command ‘!pip install ultralytics==8.0.20’

Further we checked about Yolov8 by the command ‘from ultralytics import YOLO from IPython.display import display, Image’

Then we connected and mounted our google drive account by the code ‘from google.colab import drive drive.mount('/content/drive')’

Then we ran our main command to run the training process ‘%cd /content/drive/MyDrive/Accident Detection model !yolo task=detect mode=train model=yolov8s.pt data= data.yaml epochs=1 imgsz=640 plots=True’

After the training we ran command to test and validate our model ‘!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml’ ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt conf=0.25 source=data/test/images’

Further to get result from any video or image we ran this command ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt source="/content/drive/MyDrive/Accident-Detection-model/data/testing1.jpg/mp4"’

The results are stored in the runs/detect/predict folder.
Hence our model is trained, validated and tested to be able to detect accidents on any video or image.

Challenges I ran into

I majorly ran into 3 problems while making this model

I got difficulty while saving the results in a folder, as yolov8 is latest version so it is still underdevelopment. so i then read some blogs, referred to stackoverflow then i got to know that we need to writ an extra command in new v8 that ''save=true'' This made me save my results in a folder.

I was facing problem on cvat website because i was not sure what
USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Dataset provided by
Data.govhttps://data.gov/
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
IoTeX Cryptocurrency
console.cloud.google.com
Updated Aug 24, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Cloud%20Public%20Datasets%20-%20Finance&hl=fr&inv=1&invt=Ab3rIQ (2023). IoTeX Cryptocurrency [Dataset]. https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-iotex-dataset?hl=fr
Explore at:
Dataset updated
Aug 24, 2023
Dataset provided by
Googlehttp://google.com/
Description
IoTeX is a decentralized crypto system, a new generation of blockchain platform for the development of the Internet of things (IoT). The project team is sure that the users do not have such an application that would motivate to implement the technology of the Internet of things in life. And while this will not be created, people will not have the desire to spend money and time on IoT. The developers of IoTeX decided to implement not the application itself, but the platform for creation. It is through the platform that innovative steps in the space of the Internet of things will be encouraged. Learn more... This dataset is one of many crypto datasets that are available within the Google Cloud Public Datasets . As with other Google Cloud public datasets, you can query this dataset for free, up to 1TB/month of free processing, every month. Watch this short video to learn how to get started with the public datasets. Want to know how the data from these blockchains were brought into BigQuery, and learn how to analyze the data? En savoir plus
f
Travel time to cities and ports in the year 2015
figshare.com
tiff
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andy Nelson (2023). Travel time to cities and ports in the year 2015 [Dataset]. http://doi.org/10.6084/m9.figshare.7638134.v4
Explore at:
tiffAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7638134.v4
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Andy Nelson
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5

If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD

The following text is a summary of the information in the above Data Descriptor.

The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated), land-based travel time to the nearest city and nearest port for a range of city and port sizes.

The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.

These maps represent a unique global representation of physical access to essential services offered by cities and ports.

The datasets travel_time_to_cities_x.tif (where x has values from 1 to 12) The value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in year 2015 (see PDF report).

travel_time_to_ports_x (x ranges from 1 to 5)

The value of each pixel is the estimated travel time to the nearest port in 2015. There are 5 data layers based on different port sizes.

Format Raster Dataset, GeoTIFF, LZW compressed Unit Minutes

Data type Byte (16 bit Unsigned Integer)

No data value 65535

Flags None

Spatial resolution 30 arc seconds

Spatial extent

Upper left -180, 85

Lower left -180, -60 Upper right 180, 85 Lower right 180, -60 Spatial Reference System (SRS) EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)

Temporal resolution 2015

Temporal extent Updates may follow for future years, but these are dependent on the availability of updated inputs on travel times and city locations and populations.

Methodology Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to and (ii) a transition matrix that represents the cost or time to travel across a surface.

The set of locations were based on populated urban areas in the 2016 version of the Joint Research Centre’s Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1km distance around the perimeter of each urban area.

Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017) which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points

The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).

Code The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.

Validation The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes within an interquartile range of 48 to 143 minutes while the median journey time estimated by the Google API was 106 minutes within an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes within an interquartile range of −35.5 to 2.0 minutes while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% within an interquartile range of −30.6% to 2.7% while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.

This process and results are included in the validation zip file.

Usage Notes The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also by statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.

The nine layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, 10,000 and 20,000 and, 20,000 and 50,000 people.

The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.

The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated as part of the accumulative cost function and it is likely that locations that are further away from targets will have greater a divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although they will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.
NPPES Plan and Provider Enumeration System
kaggle.com
zip
Updated Mar 20, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Centers for Medicare & Medicaid Services (2019). NPPES Plan and Provider Enumeration System [Dataset]. https://www.kaggle.com/cms/nppes
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset authored and provided by
Centers for Medicare & Medicaid Services
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The CMS National Plan and Provider Enumeration System (NPPES) was developed as part of the Administrative Simplification provisions in the original HIPAA act. The primary purpose of NPPES was to develop a unique identifier for each physician that billed medicare and medicaid. This identifier is now known as the National Provider Identifier Standard (NPI) which is a required 10 digit number that is unique to an individual provider at the national level.

Once an NPI record is assigned to a healthcare provider, parts of the NPI record that have public relevance, including the provider’s name, speciality, and practice address are published in a searchable website as well as downloadable file of zipped data containing all of the FOIA disclosable health care provider data in NPPES and a separate PDF file of code values which documents and lists the descriptions for all of the codes found in the data file.

Content

The dataset contains the latest NPI downloadable file in an easy to query BigQuery table, npi_raw. In addition, there is a second table, npi_optimized which harnesses the power of Big Query’s next-generation columnar storage format to provide an analytical view of the NPI data containing description fields for the codes based on the mappings in Data Dissemination Public File - Code Values documentation as well as external lookups to the healthcare provider taxonomy codes . While this generates hundreds of columns, BigQuery makes it possible to process all this data effectively and have a convenient single lookup table for all provider information.

Fork this kernel to get started.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:nppes?_ga=2.117120578.-577194880.1523455401

https://console.cloud.google.com/marketplace/details/hhs/nppes?filter=category:science-research

Dataset Source: Center for Medicare and Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @rawpixel from Unplash.

Inspiration

What are the top ten most common types of physicians in Mountain View?

What are the names and phone numbers of dentists in California who studied public health?
American Community Survey (ACS)
console.cloud.google.com
Updated Jul 16, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:United%20States%20Census%20Bureau&inv=1&invt=Abyneg (2018). American Community Survey (ACS) [Dataset]. https://console.cloud.google.com/marketplace/product/united-states-census-bureau/acs
Explore at:
Dataset updated
Jul 16, 2018
Dataset provided by
Googlehttp://google.com/
Description
The American Community Survey (ACS) is an ongoing survey that provides vital information on a yearly basis about our nation and its people by contacting over 3.5 million households across the country. The resulting data provides incredibly detailed demographic information across the US aggregated at various geographic levels which helps determine how more than $675 billion in federal and state funding are distributed each year. Businesses use ACS data to inform strategic decision-making. ACS data can be used as a component of market research, provide information about concentrations of potential employees with a specific education or occupation, and which communities could be good places to build offices or facilities. For example, someone scouting a new location for an assisted-living center might look for an area with a large proportion of seniors and a large proportion of people employed in nursing occupations. Through the ACS, we know more about jobs and occupations, educational attainment, veterans, whether people own or rent their homes, and other topics. Public officials, planners, and entrepreneurs use this information to assess the past and plan the future. For more information, see the Census Bureau's ACS Information Guide . This public dataset is hosted in Google BigQuery as part of the Google Cloud Public Datasets Program , with Carto providing cleaning and onboarding support. It is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

Facebook

Twitter

Click to copy link

Link copied

Cite

DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasf/san-francisco

Data from: San Francisco Open Data

San Francisco Open Data (BigQuery Dataset)

Explore at:

zip(0 bytes)Available download formats

Dataset updated

Mar 20, 2019

Dataset authored and provided by

DataSF

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Area covered

San Francisco

Description

Context

DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

https://datasf.org/about/

Content

This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.
This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.
This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).
This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

http://datasf.org/

Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @meric from Unplash.

Inspiration

Which neighborhoods have the highest proportion of offensive graffiti?

Which complaint is most likely to be made using Twitter and in which neighborhood?

What are the most complained about Muni stops in San Francisco?

What are the top 10 incident types that the San Francisco Fire Department responds to?

How many medical incidents and structure fires are there in each neighborhood?

What’s the average response time for each type of dispatched vehicle?

Which category of police incidents have historically been the most common in San Francisco?

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

What is the average tree diameter?

What is the highest number of a particular species of tree planted in a single year?

Which San Francisco locations feature the largest number of trees?

Clear search

Close search

Google apps

Main menu

Data from: San Francisco Open Data

Context

Content

Acknowledgements

Inspiration

DataForSEO Google Full (Keywords+SERP) database, historical data available

Data from: Bellabeat Case Study

Context

Content

Acknowledgements

Inspiration

USA Names

Data from: arXiv Dataset

About ArXiv

ArXiv On Kaggle

Metadata

Bulk access

List files:

Download pdfs from March 2020:

Download all the source files

Update Frequency

License

Acknowledgements

United States Census

Context

Content

Acknowledgements

Inspiration

Forest proximate people - 5km cutoff distance (Global - 100m)

COVID-19 Community Mobility Reports

RECOD.ai events dataset

census-bureau-usa

Context :

Dataset source

Sample Query

Terms of use

Raportul privind mobilitatea privind coronavirusul (COVID-19)

GLA Covid-19 Mobility Report

Public Transport

San Francisco Ford GoBike Share

ref_coco

‘FAANG- Complete Stock Data’ analyzed by Analyst-2

Context

Content

Accident Detection Model Dataset

Accident-Detection-Model

Problem Statement

Accidents survey

Literature Survey

Research Gap

Proposed methodology

Model Set-up

Preparing Custom dataset

Challenges I ran into

I majorly ran into 3 problems while making this model

USA Name Data

Context

Content

Acknowledgements

Inspiration

IoTeX Cryptocurrency

Travel time to cities and ports in the year 2015

NPPES Plan and Provider Enumeration System

Context

Content

Acknowledgements

Inspiration

American Community Survey (ACS)

Data from: San Francisco Open Data

San Francisco Open Data (BigQuery Dataset)

Context

Content

Acknowledgements

Inspiration