76 datasets found

Website Traffic
kaggle.com
zip
Updated Aug 5, 2024
Cite
AnthonyTherrien (2024). Website Traffic [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/website-traffic/discussion
Explore at:
Available download formats: zip (65228 bytes)
Dataset updated
Aug 5, 2024
Authors
AnthonyTherrien
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Overview

This dataset provides detailed information on website traffic, including page views, session duration, bounce rate, traffic source, time spent on page, previous visits, and conversion rate.

Dataset Description

Page Views: The number of pages viewed during a session.

Session Duration: The total duration of the session in minutes.

Bounce Rate: The percentage of visitors who navigate away from the site after viewing only one page.

Traffic Source: The origin of the traffic (e.g., Organic, Social, Paid).

Time on Page: The amount of time spent on the specific page.

Previous Visits: The number of previous visits by the same visitor.

Conversion Rate: The percentage of visitors who completed a desired action (e.g., making a purchase).

Data Summary

Total Records: 2000

Total Features: 7

Key Features

Page Views: This feature indicates the engagement level of the visitors by showing how many pages they visit during their session.

Session Duration: This feature measures the length of time a visitor stays on the website, which can indicate the quality of the content.

Bounce Rate: A critical metric for understanding user behavior. A high bounce rate may indicate that visitors are not finding what they are looking for.

Traffic Source: Understanding where your traffic comes from can help in optimizing marketing strategies.

Time on Page: This helps in analyzing which pages are retaining visitors' attention the most.

Previous Visits: This can be used to analyze the loyalty of visitors and the effectiveness of retention strategies.

Conversion Rate: The ultimate metric for measuring the effectiveness of the website in achieving its goals.

Usage

This dataset can be used for various analyses such as:

Identifying key drivers of engagement and conversion.

Analyzing the effectiveness of different traffic sources.

Understanding user behavior patterns and optimizing the website accordingly.

Improving marketing strategies based on traffic source performance.

Enhancing user experience by analyzing time spent on different pages.
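
As a minimal sketch of the traffic-source analysis suggested above, the following compares the mean conversion rate per traffic source using only the Python standard library. The CSV header names are assumed from the field list in this description, and the four rows are made-up sample values rather than data from the actual file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows; header names are assumed from the field list above
# and may differ from the headers in the real download.
SAMPLE_CSV = """Page Views,Session Duration,Bounce Rate,Traffic Source,Time on Page,Previous Visits,Conversion Rate
5,11.0,0.25,Organic,3.5,2,1.0
4,3.0,0.40,Social,1.0,0,0.5
3,7.5,0.50,Organic,2.0,1,0.5
6,2.0,0.10,Paid,4.0,3,1.0
"""


def mean_by_source(csv_text, metric):
    """Return {traffic source: mean of `metric`} for the given CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Traffic Source"]].append(float(row[metric]))
    return {source: mean(values) for source, values in groups.items()}


conversion_by_source = mean_by_source(SAMPLE_CSV, "Conversion Rate")
for source, rate in sorted(conversion_by_source.items()):
    print(f"{source}: {rate:.2f}")
```

The same helper can be pointed at any other numeric column (for example "Bounce Rate") to compare sources on a different metric.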

Acknowledgments

This dataset was generated for educational purposes and is not from a real website. It serves as a tool for learning data analysis and machine learning techniques.
Daily website visitors (time series regression)
kaggle.com
zip
Updated Aug 20, 2020
Cite
Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors
Explore at:
Available download formats: zip (35736 bytes)
Dataset updated
Aug 20, 2020
Authors
Bob Nau
Description
Context

This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.
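
One possible starting point for these exercises is a seasonal-naive baseline, which repeats the value observed on the same weekday one week earlier. This is not the author's method, only a simple benchmark that any real forecasting model should beat; the visitor counts below are made-up illustrative numbers.

```python
def seasonal_naive_forecast(history, horizon, season_length=7):
    """Forecast `horizon` steps ahead by repeating the last full season.

    Step h (1-based) is predicted by the observation made exactly one
    season (or a whole number of seasons) earlier.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


# Two made-up weeks of daily unique-visitor counts (not from the dataset).
history = [120, 310, 330, 325, 300, 240, 130,
           125, 320, 340, 335, 310, 250, 135]

one_day_ahead = seasonal_naive_forecast(history, 1)[0]     # 1-day-ahead
seven_days_ahead = seasonal_naive_forecast(history, 7)[-1]  # 7-day-ahead
next_week = seasonal_naive_forecast(history, 7)             # entire next week

print(one_day_ahead, seven_days_ahead, next_week)
```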

Content

The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
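
Because unique visitors are defined as the sum of returning and first-time visitors, that identity makes a quick sanity check after loading the file. The sketch below uses hypothetical column names and made-up rows (the third row is deliberately inconsistent); adjust the names to match the real CSV headers.

```python
import csv
import io

# Made-up rows; the column names are hypothetical, not the file's real headers.
SAMPLE_CSV = """date,unique_visits,first_time_visits,returning_visits
2014-09-14,1430,1163,267
2014-09-15,2297,1863,434
2014-09-16,2352,1932,400
"""


def inconsistent_dates(csv_text):
    """Return dates where unique != first-time + returning visitors."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = int(row["first_time_visits"]) + int(row["returning_visits"])
        if total != int(row["unique_visits"]):
            bad.append(row["date"])
    return bad


mismatches = inconsistent_dates(SAMPLE_CSV)
print(mismatches)
```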

Inspiration

This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
Website Analytics
catalog.data.gov
data.nola.gov
+4 more
Updated Jun 28, 2025
Cite
data.nola.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics
Explore at:
Dataset updated
Jun 28, 2025
Dataset provided by
data.nola.gov
Description
This data about nola.gov provides a window into how people are interacting with the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.
Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=en_GB (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=en_GB
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Google: http://google.com/
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

The data is typical of what an ecommerce website would see and includes the following information:

Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions on the Google Merchandise Store website.

Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
Walmart.com Daily Traffic Statistics 2025
redstagfulfillment.com
html
Updated May 19, 2025
Cite
Red Stag Fulfillment (2025). Walmart.com Daily Traffic Statistics 2025 [Dataset]. https://redstagfulfillment.com/how-many-daily-visits-does-walmart-receive/
Explore at:
Available download formats: html
Dataset updated
May 19, 2025
Dataset authored and provided by
Red Stag Fulfillment
Time period covered
2020 - 2025
Area covered
United States
Variables measured
Daily website visits, Session duration metrics, Traffic source breakdown, Geographic traffic patterns, Seasonal traffic variations, Mobile vs desktop traffic distribution
Description
Comprehensive dataset analyzing Walmart.com's daily website traffic, including 16.7 million daily visits, device distribution, geographic patterns, and competitive benchmarking data.
Website Metrics
catalog.data.gov
datasets.ai
+2 more
Updated Jun 7, 2025
Cite
FEMA/Office of External Affairs/Communication Division (2025). Website Metrics [Dataset]. https://catalog.data.gov/dataset/website-metrics
Explore at:
Dataset updated
Jun 7, 2025
Dataset provided by
Federal Emergency Management Agency: http://www.fema.gov/
Description
Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regards to FEMA.gov.

Information in this dataset includes total visits, avg visit duration, pageviews, unique visitors, avg pages/visit, avg time/page, bounce rate, visits by source, visits by Social Media Platform, and metrics on new vs returning visitors.

External Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.
website_visit_webalizer
kaggle.com
zip
Updated Mar 24, 2024
Cite
Erin ÇOBAN (2024). website_visit_webalizer [Dataset]. https://www.kaggle.com/datasets/erinoban/website-visit-webalizer
Explore at:
Available download formats: zip (1082 bytes)
Dataset updated
Mar 24, 2024
Authors
Erin ÇOBAN
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset was obtained from real website visit data. It contains monthly visit information for the tr-metaverse.com website, which is hosted on Linux. It covers 30 days of visit data, from the beginning of the month to the end of the month, and consists of the fields Day, Hit, Hit%, Files, Files%, Pages, Pages%, Visit, Visit%, Sites, Sites%, Kbytes, and Kbytes%. Values with a % sign next to them are percentages.

Day: Day index number (which day of the month)
Hit: How many accesses (hits) there are overall
Hit%: Overall accesses, as a percentage
Files: How many visits have been made as files
Files%: Percentage for files
Pages, Pages%
Visit: Number of unique visitors
Visit%: Unique visitor rate
Sites, Sites%
Kbytes: How much data has been downloaded
Kbytes%: Percentage for data
Amazon Daily Traffic Statistics 2025
redstagfulfillment.com
html
Updated May 19, 2025
Cite
Red Stag Fulfillment (2025). Amazon Daily Traffic Statistics 2025 [Dataset]. https://redstagfulfillment.com/how-many-daily-visits-does-amazon-receive/
Explore at:
Available download formats: html
Dataset updated
May 19, 2025
Dataset authored and provided by
Red Stag Fulfillment
Time period covered
2019 - 2025
Area covered
Global
Variables measured
Daily website visits, Monthly traffic volume, Geographic distribution, Seasonal traffic patterns, Traffic sources breakdown, Mobile vs desktop traffic split
Description
Comprehensive dataset analyzing Amazon's daily website visits, traffic patterns, seasonal trends, and comparative analysis with other ecommerce platforms based on May 2025 data.
Collections (from American Folklife Center)
zenodo.org
csv
Updated Nov 13, 2024
Cite
Patrick Egan (2024). Collections (from American Folklife Center) [Dataset]. http://doi.org/10.5281/zenodo.14140570
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.14140570
Dataset updated
Nov 13, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Patrick Egan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2019
Description
Dataset originally created 03/01/2019 UPDATE: Packaged on 04/18/2019 UPDATE: Edited README on 04/18/2019

I. About this Data Set This data set is a snapshot of work that is ongoing as a collaboration between the Kluge Fellow in Digital Studies, Patrick Egan, and an intern at the Library of Congress in the American Folklife Center. It contains a combination of metadata from various collections that contain audio recordings of Irish traditional music. The development of this dataset is iterative, and it integrates visualizations that follow the key principles of trust and approachability. The project, entitled “Connections In Sound”, invites you to use and re-use this data.

The text available in the Items dataset is generated from multiple collections of audio material that were discovered at the American Folklife Center. Each instance of a performance was listed and “sets” or medleys of tunes or songs were split into distinct instances in order to allow machines to read each title separately (whilst still noting that they were part of a group of tunes). The work of the intern was then reviewed before publication, and cross-referenced with the tune index at www.irishtune.info. The Items dataset consists of just over 1000 rows, with new data being added daily in a separate file.

The collections dataset contains at least 37 rows of collections that were located by a reference librarian at the American Folklife Center. This search was complemented by searches of the collections by the scholar both on the internet at https://catalog.loc.gov and by using card catalogs.

Updates to these datasets will be announced and published as the project progresses.

II. What’s included? This data set includes:

The Items Dataset – a .CSV containing Media Note, OriginalFormat, On Website, Collection Ref, Missing In Duplication, Collection, Outside Link, Performer, Solo/multiple, Sub-item, type of tune, Tune, Position, Location, State, Date, Notes/Composer, Potential Linked Data, Instrument, Additional Notes, Tune Cleanup. This .CSV is the direct export of the Items Google Spreadsheet

III. How Was It Created? These data were created by a Kluge Fellow in Digital Studies and an intern on this program over the course of three months. By listening, transcribing, reviewing, and tagging audio recordings, these scholars improve access and connect sounds in the American Folklife Collections by focusing on Irish traditional music. Once transcribed and tagged, information in these datasets is reviewed before publication.

IV. Data Set Field Descriptions

a) Collections dataset field descriptions

ItemId – this is the identifier for the collection that was found at the AFC
Viewed – if the collection has been viewed, or accessed in any way by the researchers.
On LOC – whether or not there are audio recordings of this collection available on the Library of Congress website.
On Other Website – if any of the recordings in this collection are available elsewhere on the internet
Original Format – the format that was used during the creation of the recordings that were found within each collection
Search – this indicates the type of search that was performed and that resulted in locating recordings and collections within the AFC
Collection – the official title for the collection as noted on the Library of Congress website
State – The primary state where recordings from the collection were located
Other States – The secondary states where recordings from the collection were located
Era / Date – The decade or year associated with each collection
Call Number – This is the official reference number that is used to locate the collections, both in the urls used on the Library website, and in the reference search for catalog cards (catalog cards can be searched at this address: https://memory.loc.gov/diglib/ihas/html/afccards/afccards-home.html)
Finding Aid Online? – Whether or not a finding aid is available for this collection on the internet

b) Items dataset field descriptions

id – the specific identification of the instance of a tune, song or dance within the dataset
Media Note – Any information that is included with the original format, such as identification, name of physical item, additional metadata written on the physical item
Original Format – The physical format that was used when recording each specific performance. Note: this field is used in order to calculate the number of physical items that were created in each collection such as 32 wax cylinders.
On Website? – Whether or not each instance of a performance is available on the Library of Congress website
Collection Ref – The official reference number of the collection
Missing In Duplication – This column marks if parts of some recordings had been made available on other websites, but not all of the recordings were included in duplication (see recordings from Philadelphia Céilí Group on Villanova University website)
Collection – The official title of the collection given by the American Folklife Center
Outside Link – If recordings are available on other websites externally
Performer – The name of the contributor(s)
Solo/multiple – This field is used to calculate the number of solo performers vs group performers in each collection
Sub-item – In some cases, physical recordings contained extra details, the sub-item column was used to denote these details
Type of item – This column describes each individual item type, as noted by performers and collectors
Item – The item title, as noted by performers and collectors. If an item was not described, it was entered as “unidentified”
Position – The position on the recording (in some cases during playback, audio cassette player counter markers were used)
Location – Local address of the recording
State – The state where the recording was made
Date – The date that the recording was made
Notes/Composer – The stated composer or source of the item recorded
Potential Linked Data – If items may be linked to other recordings or data, this column was used to provide examples of potential relationships between them
Instrument – The instrument(s) that was used during the performance
Additional Notes – Notes about the process of capturing, transcribing and tagging recordings (for researcher and intern collaboration purposes)
Tune Cleanup – This column was used to tidy each item so that it could be read by machines, but also so that spelling mistakes from the Item column could be corrected, and as an aid to preserving iterations of the editing process

V. Rights statement The text in this data set was created by the researcher and intern and can be used in many different ways under creative commons with attribution. All contributions to Connections In Sound are released into the public domain as they are created. Anyone is free to use and re-use this data set in any way they want, provided reference is given to the creators of these datasets.

VI. Creator and Contributor Information

Creator: Connections In Sound

Contributors: Library of Congress Labs

VII. Contact Information Please direct all questions and comments to Patrick Egan via www.twitter.com/drpatrickegan or via his website at www.patrickegan.org. You can also get in touch with the Library of Congress Labs team via LC-Labs@loc.gov.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
The citation is currently not available for this dataset.
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Sep 19, 2019
Dataset provided by
BigQuery: https://cloud.google.com/bigquery
Google: http://google.com/
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
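
The real bounce rate question can be prototyped locally before writing a BigQuery query. The sketch below applies the definition given above (percentage of visits with a single pageview) to a small list of made-up sessions; the `(source, pageviews)` tuple layout and the source labels are illustrative assumptions, not the dataset's schema.

```python
from collections import Counter


def real_bounce_rate_by_source(sessions):
    """Return {source: percentage of visits with exactly one pageview}."""
    visits = Counter()
    bounces = Counter()
    for source, pageviews in sessions:
        visits[source] += 1
        if pageviews == 1:
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / visits[source] for source in visits}


# Made-up sessions as (traffic source, pageviews in the visit).
sessions = [
    ("google", 1), ("google", 3), ("google", 1), ("google", 2),
    ("(direct)", 1), ("(direct)", 1),
    ("youtube.com", 5),
]

bounce_rates = real_bounce_rate_by_source(sessions)
for source, rate in sorted(bounce_rates.items()):
    print(f"{source}: {rate:.1f}%")
```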
RÉ Logs Dataset
zenodo.org
Updated Oct 2, 2025
Cite
Mac Aodhgáin Pádraig (2025). RÉ Logs Dataset [Dataset]. http://doi.org/10.5281/zenodo.17249231
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.17249231
Dataset updated
Oct 2, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Mac Aodhgáin Pádraig
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Collation of data from Radio Éireann log books, at RTÉ, Donnybrook, Dublin 4.

Dataset originally created 2016 UPDATE: Packaged on 02/10/2025

I. About this Data Set

This data set is the result of a close reading, conducted by Patrick Egan (Pádraig Mac Aodhgáin), of Radio Teilifís Éireann log books relating to Seán Ó Riada.

Research was conducted between 2014 and 2018. It contains a combination of metadata from searches of the Boole Library catalogue and the Seán Ó Riada Collection finding aid (or "descriptive list"), relating to music-related projects that involved Seán Ó Riada. The PhD project, entitled “Exploring ethnography and digital visualisation: a study of musical practice through the contextualisation of music related projects from the Seán Ó Riada Collection”, was published in 2020, and a full listing of radio broadcasts is added to the dataset named "The Ó Riada Projects" at https://doi.org/10.5281/zenodo.15348617

You are invited to use and re-use this data with appropriate attribution.

The "RÉ Logs Dataset" dataset consists of 90 rows.

II. What’s included? This data set includes:

A search of log books of radio broadcasts to find all instances of shows that involved Seán Ó Riada.

III. How Was It Created? These data were created by daily visits to Radio Teilifís Éireann in Dublin, Ireland.

IV. Data Set Field Descriptions

Column headings have not been added to the dataset.

Column A - blank
Column B - type of broadcast
Column C - blank
Column D - date of broadcast
Column E - blank
Column F - blank
Column G - blank
Column H - blank
Column I - description of broadcast
Column J - blank
Column K - blank
Column L - length of broadcast
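
Since the file has no header row, columns have to be addressed by position. The sketch below shows one way to attach names to the non-blank columns with the standard `csv` module. The field names are hypothetical, the sample row is made up, and the position of the length column (index 11, i.e. the twelfth column) is an assumption that should be checked against the actual file.

```python
import csv
import io

# Zero-based column index -> hypothetical field name.
# B = 1, D = 3, I = 8; index 11 for the length column is an assumption.
COLUMN_NAMES = {
    1: "broadcast_type",
    3: "broadcast_date",
    8: "description",
    11: "length",
}


def read_log_rows(csv_text):
    """Parse header-less CSV text into dicts keyed by the names above."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        records.append({
            name: (row[index].strip() if index < len(row) else "")
            for index, name in COLUMN_NAMES.items()
        })
    return records


# Build one made-up 12-column row so the blank columns are easy to see.
cells = [""] * 12
cells[1], cells[3], cells[8], cells[11] = "Talk", "04/03/1962", "Example programme", "15 min"
sample_text = ",".join(cells) + "\n"

records = read_log_rows(sample_text)
print(records[0])
```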

V. Rights statement The text in this data set was created by the researcher and can be used in many different ways under creative commons with attribution. All contributions to this PhD project are released into the public domain as they are created. Anyone is free to use and re-use this data set in any way they want, provided reference is given to the creator of this dataset.

VI. Creator and Contributor Information

Creator: Patrick Egan (Pádraig Mac Aodhgáin)

VII. Contact Information Please direct all questions and comments to Patrick Egan via his website at www.patrickegan.org. You can also get in touch with the Library via UCC website.
Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant
datarade.ai
.csv, .xls
Updated Jun 27, 2023
+ more versions
Cite
Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
Explore at:
Available download formats: .csv, .xls
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Swash
Area covered
Latvia, Uzbekistan, Liechtenstein, Russian Federation, Jordan, India, Belarus, Jamaica, Monaco, Saint Vincent and the Grenadines
Description
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
Traffic Crashes Resulting in Injury (from DataSF, pulled monthly)
hub.arcgis.com
Updated Nov 5, 2025
+ more versions
Cite
City and County of San Francisco (2025). Traffic Crashes Resulting in Injury (from DataSF, pulled monthly) [Dataset]. https://hub.arcgis.com/maps/a24788281a484e08bd662828b4e0718e
Explore at:
Dataset updated
Nov 5, 2025
Dataset authored and provided by
City and County of San Francisco
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
Redirect Notice: The website https://transbase.sfgov.org/ is no longer in operation. Visitors to Transbase will be redirected to this page where they can view, visualize, and download Traffic Crash data.

A. SUMMARY

This table contains all crashes resulting in an injury in the City of San Francisco. Fatality year-to-date crash data is obtained from the Office of the Chief Medical Examiner (OME) death records, and only includes those cases that meet the San Francisco Vision Zero Fatality Protocol maintained by the San Francisco Department of Public Health (SFDPH), San Francisco Police Department (SFPD), and San Francisco Municipal Transportation Agency (SFMTA). Injury crash data is obtained from SFPD’s Interim Collision System for 2018 through the current year-to-date, Crossroads Software Traffic Collision Database (CR) for years 2013-2017 and the Statewide Integrated Transportation Record System (SWITRS) maintained by the California Highway Patrol for all years prior to 2013. Only crashes with valid geographic information are mapped. All geocodable crash data is represented on the simplified San Francisco street centerline model maintained by the Department of Public Works (SFDPW). Collision injury data is queried and aggregated on a quarterly basis. Crashes occurring at complex intersections with multiple roadways are mapped onto a single point and injury and fatality crashes occurring on highways are excluded.

The crash, party, and victim tables have a relational structure. The traffic crashes table contains information on each crash, one record per crash. The party table contains information from all parties involved in the crashes, one record per party. Parties are individuals involved in a traffic crash including drivers, pedestrians, bicyclists, and parked vehicles. The victim table contains information about each party injured in the collision, including any passengers. Injury severity is included in the victim table.

For example, a crash occurs (1 record in the crash table) that involves a driver party and a pedestrian party (2 records in the party table). Only the pedestrian is injured and thus is the only victim (1 record in the victim table). To learn more about the traffic injury datasets, see the TIMS documentation.

B. HOW THE DATASET IS CREATED

Traffic crash injury data is collected from the California Highway Patrol 555 Crash Report as submitted by the police officer within 30 days after the crash occurred. All fields that match the SWITRS data schema are programmatically extracted, de-identified, geocoded, and loaded into TransBASE. See Section D below for details regarding TransBASE.

C. UPDATE PROCESS

After review by SFPD and SFDPH staff, the data is made publicly available approximately a month after the end of the previous quarter (May for Q1, August for Q2, November for Q3, and February for Q4).

D. HOW TO USE THIS DATASET

This data is being provided as public information as defined under San Francisco and California public records laws. SFDPH, SFMTA, and SFPD cannot limit or restrict the use of this data or its interpretation by other parties in any way. Where the data is communicated, distributed, reproduced, mapped, or used in any other way, the user should acknowledge TransBASE.sfgov.org as the source of the data, provide a reference to the original data source where also applicable, include the date the data was pulled, and note any caveats specified in the associated metadata documentation provided. However, users should not attribute their analysis or interpretation of this data to the City of San Francisco. While the data has been collected and/or produced for the use of the City of San Francisco, it cannot guarantee its accuracy or completeness. Accordingly, the City of San Francisco, including SFDPH, SFMTA, and SFPD make no representation as to the accuracy of the information or its suitability for any purpose and disclaim any liability for omissions or errors that may be contained therein. As all data is associated with methodological assumptions and limitations, the City recommends that users review methodological documentation associated with the data prior to its analysis, interpretation, or communication.

This dataset can also be queried on the TransBASE Dashboard. TransBASE is a geospatially enabled database maintained by SFDPH that currently includes over 200 spatially referenced variables from multiple agencies and across a range of geographic scales, including infrastructure, transportation, zoning, sociodemographic, and collision data, all linked to an intersection or street segment. TransBASE facilitates a data-driven approach to understanding and addressing transportation-related health issues, informed by a large and growing evidence base regarding the importance of transportation system design and land use decisions for health. TransBASE’s purpose is to inform public and private efforts to improve transportation system safety, sustainability, community health and equity in San Francisco.

E. RELATED DATASETS

Traffic Crashes Resulting in Injury: Parties Involved
Traffic Crashes Resulting in Injury: Victims Involved
TransBASE Dashboard
iSWITRS
TIMS

Data pushed to ArcGIS Online on November 5, 2025 at 4:19 PM by SFGIS.
Data from: https://data.sfgov.org/d/ubvf-ztfx

Description of dataset columns:

Column Name	Details
unique_id	unique table row identifier
cnn_intrsctn_fkey	nearest intersection centerline node key
cnn_sgmt_fkey	nearest street centerline segment key (empty if crash occurred at intersection)
case_id_pkey	unique crash report number
tb_latitude	latitude of crash (WGS 84)
tb_longitude	longitude of crash (WGS 84)
geocode_source	geocode source
geocode_location	geocode location
collision_datetime	the date and time when the crash occurred
collision_date	the date when the crash occurred
collision_time	the time when the crash occurred (24 hour time)
accident_year	the year when the crash occurred
month	month crash occurred
day_of_week	day of the week crash occurred
time_cat	generic time categories
juris	jurisdiction
officer_id	officer ID
reporting_district	SFPD reporting district
beat_number	SFPD beat number
primary_rd	the road the crash occurred on
secondary_rd	a secondary reference road that DISTANCE and DIRECT are measured from
distance	offset distance from secondary road
direction	direction of offset distance
weather_1	the weather condition at the time of the crash
weather_2	the weather condition at the time of the crash, if a second description is necessary
collision_severity	the injury level severity of the crash (highest level of injury in crash)
type_of_collision	type of crash
mviw	motor vehicle involved with
ped_action	pedestrian action involved
road_surface	road surface
road_cond_1	road condition
road_cond_2	road condition, if a second description is necessary
lighting	lighting at time of crash
control_device	control device status
intersection	indicates whether the crash occurred in an intersection
vz_pcf_code	California vehicle code primary collision factor violated
vz_pcf_group	groupings of similar vehicle codes violated
vz_pcf_description	description of vehicle code violated
vz_pcf_link	link to California vehicle code section
number_killed	counts victims in the crash with degree of injury of fatal
number_injured	counts victims in the crash with degree of injury of severe, visible, or complaint of pain
street_view	link to Google Streetview
dph_col_grp	generic crash groupings based on parties involved
dph_col_grp_description	description of crash groupings
party_at_fault	party number indicated as being at fault
party1_type	party 1 vehicle type
party1_dir_of_travel	party 1 direction of travel
party1_move_pre_acc	party 1 movement preceding crash
party2_type	party 2 vehicle type (empty if no party 2)
party2_dir_of_travel	party 2 direction of travel (empty if no party 2)
party2_move_pre_acc	party 2 movement preceding crash (empty if no party 2)
point	geometry type of crash location
data_as_of	date data added to the source system
data_updated_at	date data last updated the source system
data_loaded_at	date data last loaded here (in the open data portal)
analysis_neighborhood
supervisor_district
police_district
Current Police Districts	This column was automatically created in order to record in what polygon from the dataset 'Current Police Districts' (qgnn-b9vv) the point in column 'point' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
Current Supervisor Districts	This column was automatically created in order to record in what polygon from the dataset 'Current Supervisor Districts' (26cr-cadq) the point in column 'point' is located. This
Data from: NeuroSense: A Novel EEG Dataset Utilizing Low-Cost, Sparse...
zenodo.org
data-staging.niaid.nih.gov
Updated Oct 30, 2024
Cite
Tommaso Colafiglio; Angela Lombardi; Paolo Sorino; Elvira Brattico; Domenico Lofù; Danilo Danese; Eugenio Di Sciascio; Tommaso Di Noia; Fedelucio Narducci (2024). NeuroSense: A Novel EEG Dataset Utilizing Low-Cost, Sparse Electrode Devices for Emotion Exploration [Dataset]. http://doi.org/10.5281/zenodo.14003181
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.14003181
Dataset updated
Oct 30, 2024
Dataset provided by
Zenodo
Authors
Tommaso Colafiglio; Angela Lombardi; Paolo Sorino; Elvira Brattico; Domenico Lofù; Danilo Danese; Eugenio Di Sciascio; Tommaso Di Noia; Fedelucio Narducci
Time period covered
Oct 28, 2024
Description
README

Link to the Publication

🔗 Read the Paper

Details related to access to the data

Data user agreement

The terms and conditions for using this dataset are specified in the [LICENCE](LICENCE) file included in this repository. Please review these terms carefully before accessing or using the data.

Contact person

For additional information about the dataset, please contact:
- Name: Angela Lombardi
- Affiliation: Department of Electrical and Information Engineering, Politecnico di Bari
- Email: angela.lombardi@poliba.it

Practical information to access the data

The dataset can be accessed through our dedicated web platform. To request access:

1. Visit the main dataset page at: https://sisinflab.poliba.it/neurosense-dataset-request/
2. Follow the instructions on the website to submit your access request
3. Upon approval, you will receive further instructions for downloading the data

Please ensure you have read and agreed to the terms in the data user agreement before requesting access.

Overview

EEG Emotion Recognition - Muse Headset
2023-2024

The experiment consists of 40 sessions per user. During each session, users are asked to watch a
music video with the aim of understanding their emotions.
Recordings are performed with a Muse EEG headset at a 256 Hz sampling rate.
Channels are recorded as follows:
- Channel 0: AF7
- Channel 1: TP9
- Channel 2: TP10
- Channel 3: AF8
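
The channel order above can be captured in a small lookup table. The following is only a sketch: the names `MUSE_CHANNELS`, `label_sample`, and `sample_time_s` are hypothetical, and it assumes each sample arrives as four values ordered by channel index, which the README does not state.

```python
# Channel order and sampling rate as stated in the README.
MUSE_CHANNELS = {0: "AF7", 1: "TP9", 2: "TP10", 3: "AF8"}
SAMPLING_RATE_HZ = 256


def label_sample(sample):
    """Attach electrode names to one multi-channel sample.

    Assumption (not from the README): `sample` holds one value per
    channel, ordered by channel index 0..3.
    """
    if len(sample) != len(MUSE_CHANNELS):
        raise ValueError(f"expected {len(MUSE_CHANNELS)} values, got {len(sample)}")
    return {MUSE_CHANNELS[i]: value for i, value in enumerate(sample)}


def sample_time_s(index, rate_hz=SAMPLING_RATE_HZ):
    """Nominal timestamp in seconds of the sample at position `index`."""
    return index / rate_hz
```

For instance, `label_sample([1.0, 2.0, 3.0, 4.0])` pairs the third value with `TP10`.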

The chosen songs have various Last.fm tags in order to evoke different feelings. The title of every track
can be found in the "TaskName" field of sub-ID***_ses-S***_task-Default_run-001_eeg.json, while the author,
the Last.fm tag and additional information are in "TaskDescription".
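
As a rough illustration of reading these two fields, the sketch below builds a sidecar file name and extracts "TaskName" and "TaskDescription" from its JSON content. The helper names are hypothetical, the three-digit zero padding is an assumption based on identifiers such as ID015 and S033 in this README, and the directory layout of the download is not described here, so the caller must supply the path.

```python
import json
from pathlib import Path


def sidecar_name(subject, session):
    """File name following the sub-ID***_ses-S*** pattern from the README.

    Assumption: the *** placeholders are three-digit, zero-padded numbers.
    """
    return f"sub-ID{subject:03d}_ses-S{session:03d}_task-Default_run-001_eeg.json"


def task_info(json_text):
    """Return (TaskName, TaskDescription) from a sidecar's JSON text.

    Missing fields come back as None rather than raising.
    """
    meta = json.loads(json_text)
    return meta.get("TaskName"), meta.get("TaskDescription")


def read_task_info(path):
    """Read a sidecar file from disk and return (TaskName, TaskDescription)."""
    return task_info(Path(path).read_text(encoding="utf-8"))
```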

Methods

Subjects

The subject pool is made of 30 college students, aged between 18 and 35. 16 of them are males, 14 females.

Apparatus

The experiment was performed using the same procedures as those used to create the
[DEAP dataset](https://www.eecs.qmul.ac.uk/mmv/datasets/deap/), which is a dataset for recognizing emotions via a Brain
Computer Interface (BCI).

Task organization

Firstly, music videos were selected. Once 40 songs were picked, the protocol was chosen and the self-assessment
questionnaire was created.

Task details

In order to evaluate the stimulus, Russell's VAD (Valence-Arousal-Dominance) scale was used.
In this scale, the valence-arousal space can be divided into four quadrants:
- Low Arousal/Low Valence (LALV);
- Low Arousal/High Valence (LAHV);
- High Arousal/Low Valence (HALV);
- High Arousal/High Valence (HAHV).
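
A minimal sketch of mapping a pair of self-assessment ratings to one of the four quadrants follows. The README does not give the rating range or the cut-off, so the midpoint of 5.0 (as on a 1 to 9 scale) and the rule that a rating equal to the midpoint counts as "high" are both assumptions; `quadrant` is a hypothetical helper name.

```python
def quadrant(valence, arousal, midpoint=5.0):
    """Return the quadrant label (LALV, LAHV, HALV or HAHV).

    Assumptions: ratings are numeric, `midpoint` separates low from
    high, and a rating equal to the midpoint is treated as high.
    """
    arousal_part = "HA" if arousal >= midpoint else "LA"
    valence_part = "HV" if valence >= midpoint else "LV"
    return arousal_part + valence_part
```

Passing a different `midpoint` adapts the sketch to another rating scale.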

Experimental location

The experiment was performed in a laboratory located at DEI Department of
[Politecnico di Bari](https://www.poliba.it/).

Missing data

Data recorded during session S019 - Session 2, ID021 - Session 23, user was corrupted and is therefore missing.
Sessions S033 and S038 of user ID015 show a calculated effective sampling rate lower than 256 Hz:
- ID015_ses-S033 has 226.1320 Hz
- ID015_ses-S038 has 216.9549 Hz
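
The README does not say how the effective sampling rate was calculated. One common approach, shown here purely as an assumption, is to divide the number of recorded samples by the recording duration and compare the result with the nominal 256 Hz. The function names and the 1 Hz tolerance are hypothetical.

```python
NOMINAL_RATE_HZ = 256.0


def effective_rate(n_samples, duration_s):
    """Effective sampling rate, assumed here to be samples per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_samples / duration_s


def is_below_nominal(rate_hz, tolerance_hz=1.0):
    """True if the rate falls short of the nominal rate by more than the tolerance."""
    return rate_hz < NOMINAL_RATE_HZ - tolerance_hz


# Rates reported in the README for the two affected sessions.
reported = {"ID015_ses-S033": 226.1320, "ID015_ses-S038": 216.9549}
flagged = [name for name, rate in reported.items() if is_below_nominal(rate)]
```

Sessions flagged this way may need resampling or exclusion, depending on the analysis.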
mCLOUD Metadatenkatalog
ckan.mobidatalab.eu
data.europa.eu
Updated Mar 6, 2023
+ more versions
Cite
Bundesministerium für Digitales und Verkehr (BMDV) (2023). mCLOUD Metadatenkatalog [Dataset]. https://ckan.mobidatalab.eu/dataset/mcloud-metadatenkatalog
Explore at:
Dataset updated
Mar 6, 2023
Dataset provided by
Federal Ministry of Transport and Digital Infrastructure http://www.bmvi.de/
License
Data licence Germany – Attribution – Version 2.0 https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
Description
The BMDV open data portal mCLOUD offers an export interface (REST API) through which the data can be exported as RDF according to the DCAT-AP.de specification or as CSV.

Export as DCAT-AP.de in RDF/XML:
Basic path: https://mcloud.de/export/datasets
Export as CSV:
Basic path: https://mcloud.de/export/csv/datasets
Parameters:

The request parameters mirror the parameters the portal uses in its search URLs.
The export link is always offered at the bottom of a results page in the portal. One option is to search normally via the portal and then copy the export URL from the bottom of the page.

Single data record
A single data record can be retrieved by appending the UUID.
E.g. https://mcloud.de/export/datasets/922e436b-2f0d-42d7-b3f4-528debab8b87
This export is also linked directly from the data record in mCLOUD as "link to the metadata".
Predefined filters:

All data sets added in the last 24 hours:
filter=newdatasets
https://mcloud.de/export/datasets?filter=newdatasets

All data sets that were changed in the last 24 hours (also includes newly added records):
filter=modifieddatasets
https://mcloud.de/export/datasets?filter=modifieddatasets

Paging (default):

pageSize=10 (number of records per page)
page=1 (display first page)
https://mcloud.de/export/datasets?page=1&pageSize=10

The DCAT-AP.de export always includes navigation information at the beginning:
itemsPerPage (= pageSize parameter)
totalItems (total number)
firstPage (= first page for page parameter)
lastPage (= last page for page parameter)
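
The paging parameters can be assembled programmatically. This sketch only builds URLs and sends no requests. `page_url` and `all_page_urls` are hypothetical helper names, and `first_page` and `last_page` are assumed to have been read from the navigation information by the caller.

```python
from urllib.parse import urlencode

# Base path as given in the description above.
EXPORT_BASE = "https://mcloud.de/export/datasets"


def page_url(page=1, page_size=10):
    """Build the export URL for one page of results."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    query = urlencode({"page": page, "pageSize": page_size})
    return f"{EXPORT_BASE}?{query}"


def all_page_urls(first_page, last_page, page_size=10):
    """URLs for every page from first_page to last_page inclusive.

    Assumption: the caller took both values from the firstPage and
    lastPage navigation fields of an earlier response.
    """
    return [page_url(p, page_size) for p in range(first_page, last_page + 1)]
```

Calling `page_url()` with the defaults reproduces the paging example URL shown above.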

Search term:
query=Vehicle
https://mcloud.de/export/datasets?query=Vehicle
Search facet:
aggs=...
The facet is then specified exactly as in the portal request. Please note the URL encoding:
format%3ACSV = type of access "CSV"
categories%3Aroads = category "road"
format%3ACSV%40%40categories%3Aroads = type of access "CSV" AND category "road"

Together:
aggs=format%3ACSV%40%40categories%3Aroads
https://mcloud.de/export/datasets?aggs=format%3ACSV%40%40categories%3Aroads
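
Decoded, `%3A` is a colon and `%40%40` is `@@`, so the facet string above reads `format:CSV@@categories:roads`. The sketch below produces that encoding with the standard library. `encode_facets` is a hypothetical helper, and joining more than two facets with `@@` is an extrapolation from the two-facet example.

```python
from urllib.parse import quote


def encode_facets(facets):
    """Percent-encode (name, value) facet pairs for the aggs parameter.

    Each pair becomes "name:value"; pairs are joined with "@@" and the
    whole string is percent-encoded, including ":" and "@".
    """
    raw = "@@".join(f"{name}:{value}" for name, value in facets)
    return quote(raw, safe="")


aggs = encode_facets([("format", "CSV"), ("categories", "roads")])
url = f"https://mcloud.de/export/datasets?aggs={aggs}"
```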

The corresponding search in the portal can serve as a guide:
https://mcloud.de/web/guest/suche/-/results/filter/auto/format%3ACSV%40%40categories%3Aroads/0
At the bottom of that page there is also the export link (as RDF):
https://mcloud.de/export/datasets?page=1&pageSize=1147&sortOrder=desc&sortField=latest&aggs=format%3ACSV%40%40categories%3Aroads
Sorting field:
No value given: sorted by ID of the data records
sortField=relevance (relevance)
sortField=latest (most recent)
Sort order:
sortOrder=asc (ascending, default)
sortOrder=desc (descending)
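
Putting paging, sorting, and facets together, a request URL might be assembled as in the following sketch. It validates the sort values against the ones listed above and builds the URL without contacting the service. `export_url` and its parameter names are hypothetical, and `aggs` is expected in its raw, unencoded form.

```python
from urllib.parse import urlencode

EXPORT_BASE = "https://mcloud.de/export/datasets"
SORT_FIELDS = {"relevance", "latest"}   # values listed in the description
SORT_ORDERS = {"asc", "desc"}


def export_url(page=1, page_size=10, sort_order=None, sort_field=None, aggs=None):
    """Build an export URL; optional parameters are omitted when None."""
    params = {"page": page, "pageSize": page_size}
    if sort_order is not None:
        if sort_order not in SORT_ORDERS:
            raise ValueError(f"unknown sortOrder: {sort_order}")
        params["sortOrder"] = sort_order
    if sort_field is not None:
        if sort_field not in SORT_FIELDS:
            raise ValueError(f"unknown sortField: {sort_field}")
        params["sortField"] = sort_field
    if aggs:
        # Raw facet string such as "format:CSV@@categories:roads";
        # urlencode percent-encodes ":" and "@" for us.
        params["aggs"] = aggs
    return f"{EXPORT_BASE}?{urlencode(params)}"
```

With `page_size=1147`, `sort_order="desc"`, `sort_field="latest"`, and the raw facet string, this reproduces the RDF export link quoted above.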
PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Mar 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Long, Phillip; Novack, Zachary; McAuley, Julian; Berg-Kirkpatrick, Taylor (2025). PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_13763755
Explore at:
Dataset updated
Mar 17, 2025
Dataset provided by
University of California, San Diego
UCSD
Authors
Long, Phillip; Novack, Zachary; McAuley, Julian; Berg-Kirkpatrick, Taylor
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing. Refer to our paper for more information, and our GitHub repository for any code-related details. Please cite both our paper and our collaborators' paper if you use this dataset (see our GitHub for more information).

Upon further use of the PDMX dataset, we discovered a discrepancy between the public-facing copyright metadata on the MuseScore website and the internal copyright data of the MuseScore files themselves, which affected 31,221 (12.29% of) songs. We have decided to proceed with the former given its public visibility on Musescore (i.e. this is what the MuseScore website presents its users with). We have noted files with conflicting internal licenses in the license_conflict column of PDMX. We recommend using the no_license_conflict subset of PDMX (which still includes 222,856 songs) moving forward.
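
The figures in this paragraph can be cross-checked with a few lines of arithmetic. The check assumes that the 12.29% is measured against the sum of the conflicting songs and the no_license_conflict subset; the variable names are illustrative only.

```python
conflicting = 31_221    # songs with conflicting internal licenses
no_conflict = 222_856   # size of the no_license_conflict subset

# Assumption: the two groups together make up the whole dataset.
total = conflicting + no_conflict

# Share of conflicting songs, as a percentage rounded to two decimals.
share_percent = round(100 * conflicting / total, 2)

print(total, share_percent)
```

Under that assumption the total comes to 254,077 songs, and the computed share matches the 12.29% quoted above.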

Additionally, for each song in PDMX, we not only provide the MusicRender and metadata JSON files, but we also try to include the associated compressed MusicXML (MXL), sheet music (PDF), and MIDI (MID) files when available. Due to the corruption of 42 of the original MuseScore files, these songs lack those associated files (since they could not be converted to those formats) and only include the MusicRender and metadata JSON files. The all_valid subset of PDMX describes the songs where all associated files are valid.

Recipe Site Traffic: Analysis & Prediction

kaggle.com

Updated Sep 21, 2025

Cite

Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction

Explore at:

Croissant
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Sep 21, 2025

Dataset provided by

Kaggle

Authors

Michael Matta

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.

Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation

Recipe Site Traffic

From: Head of Data Science
Received: Today
Subject: New project from the product team

Hey!

I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

You can find more details about what I expect you to do here. And information on the data here.

I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

Good Luck!

From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?

Hi,

We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

Can your team: - Predict which recipes will lead to high traffic? - Correctly predict high traffic recipes 80% of the time?

We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

Look forward to seeing your presentation.

About Tasty Bytes

Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

Example Recipe

This is an example of how a recipe may appear on the website, we haven't included all of the steps but you should get an idea of what visitors to the site see.

Tomato Soup

Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $

Nutritional Information (per serving) - Calories 123 - Carbohydrate 13g - Sugar 1g - Protein 4g

Ingredients: - Tomatoes - Onion - Carrot - Vegetable Stock

Method: 1. Cut the tomatoes into quarters….

Data Information

The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

As you will see, they haven't given us all of the information they have about each recipe.

You can find the data here.

I will let you decide how to process it, just make sure you include all your decisions in your report.

Don't forget to double check the data really does match what they say - it might not.

Column Name	Details
recipe	Numeric, unique identifier of recipe
calories	Numeric, number of calories
carbohydrate	Numeric, amount of carbohydrates in grams
sugar	Numeric, amount of sugar in grams
protein	Numeric, amount of prote...

Number of International Visitors to London - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Jun 9, 2025
+ more versions
Cite
ckan.publishing.service.gov.uk (2025). Number of International Visitors to London - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/number-of-international-visitors-to-london
Explore at:
Dataset updated
Jun 9, 2025
Dataset provided by
CKAN https://ckan.org/
Area covered
London
Description
Visit Britain publish data relating to international visitors to the UK. They produce the data in two formats - individual spreadsheets for each region that are updated annually, and a single spreadsheet for all regions, containing less detail but updated quarterly. Data shows London totals for nights, visits, and spend. Data is broken down by age, purpose, duration, mode and country. This data is also available from the Visit Britain website, including the latest quarterly data for other regions. All data is taken from the International Passenger Survey (IPS). Some additional data on domestic tourism can be found on the Visit Britain website, and on Visit England's overnight tourism and day visits pages. Data on accommodation occupancy levels is also available from Visit England. An overview of all tourism data for London can be found in this GLAE report 'Tourism in London'. Further information can be found on the London and Partners website. Comparisons of international tourist arrivals with other world cities are produced by Euromonitor and in Mastercard's Global Destination Cities Index of 2012, 2013, 2014, and 2015. This dataset is included in the Greater London Authority's Night Time Observatory. Click here to find out more.
Lithuania Number Dataset
listtodata.com
hmn.listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Lithuania Number Dataset [Dataset]. https://listtodata.com/lithuania-dataset
Explore at:
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Authors
List to Data
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Lithuania
Variables measured
phone numbers, email address, full name, address, city, state, gender, age, income, IP address
Description
Lithuania number dataset is a database of phone numbers collected from trusted sources. This means the numbers come from reliable places like government records, websites, or phone companies. The companies that provide this data work hard to ensure it is correct. They even offer source URLs, so you can see where the data came from. Moreover, you get 24/7 support, so if you have questions, help is always available. List to Data is a helpful website for finding important cell numbers quickly. Additionally, the phone numbers in the Lithuania number dataset follow an opt-in system. This means people agreed to share their phone numbers. This system is important because it keeps the data legal. It ensures that you are only contacting people who have given permission. Number data in Lithuania makes it easy to connect with the right people. Lithuania phone data is a special set of phone numbers that you can filter to meet your needs. You can easily filter the list by gender, age, and relationship status. For example, you can quickly sort the data to contact older adults or young singles easily. This flexibility makes it easier to communicate with the right audience. Therefore, you can connect with the people you want to reach. Also, the Lithuanian phone data follows strict GDPR rules. These rules protect people’s privacy and make sure their information stays safe. We collect and use the database of Lithuania in ways that respect everyone’s rights. Additionally, it removes any invalid numbers. You can find important phone numbers easily on our website, List to Data. Lithuania phone number list is a collection of phone numbers from people living in Lithuania. This list is completely correct and valid, meaning all numbers work properly. Companies check every phone number to ensure it is accurate. If you find a number that doesn’t work, you can get a new one for free. Moreover, Lithuania phone number list is about all numbers from authorized customers. 
People on this list agreed to share their numbers. As a result, you can use the data without worrying about legal issues. This makes the phonebook safe and useful for businesses that want to connect with people in Lithuania.
Traffic Crashes Resulting in Injury from DataSF pulled monthly points
hub.arcgis.com
Updated Nov 5, 2025
Cite
City and County of San Francisco (2025). Traffic Crashes Resulting in Injury from DataSF pulled monthly points [Dataset]. https://hub.arcgis.com/datasets/a24788281a484e08bd662828b4e0718e
Explore at:
Dataset updated
Nov 5, 2025
Dataset authored and provided by
City and County of San Francisco
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Area covered
Description
Redirect Notice: The website https://transbase.sfgov.org/ is no longer in operation. Visitors to Transbase will be redirected to this page where they can view, visualize, and download Traffic Crash data.

A. SUMMARY
This table contains all crashes resulting in an injury in the City of San Francisco. Fatality year-to-date crash data is obtained from the Office of the Chief Medical Examiner (OME) death records, and only includes those cases that meet the San Francisco Vision Zero Fatality Protocol maintained by the San Francisco Department of Public Health (SFDPH), San Francisco Police Department (SFPD), and San Francisco Municipal Transportation Agency (SFMTA). Injury crash data is obtained from SFPD’s Interim Collision System for 2018 through the current year-to-date, Crossroads Software Traffic Collision Database (CR) for years 2013-2017 and the Statewide Integrated Transportation Record System (SWITRS) maintained by the California Highway Patrol for all years prior to 2013. Only crashes with valid geographic information are mapped. All geocodable crash data is represented on the simplified San Francisco street centerline model maintained by the Department of Public Works (SFDPW). Collision injury data is queried and aggregated on a quarterly basis. Crashes occurring at complex intersections with multiple roadways are mapped onto a single point and injury and fatality crashes occurring on highways are excluded.

The crash, party, and victim tables have a relational structure. The traffic crashes table contains information on each crash, one record per crash. The party table contains information from all parties involved in the crashes, one record per party. Parties are individuals involved in a traffic crash including drivers, pedestrians, bicyclists, and parked vehicles. The victim table contains information about each party injured in the collision, including any passengers. Injury severity is included in the victim table.

For example, a crash occurs (1 record in the crash table) that involves a driver party and a pedestrian party (2 records in the party table). Only the pedestrian is injured and thus is the only victim (1 record in the victim table). To learn more about the traffic injury datasets, see the TIMS documentation.

B. HOW THE DATASET IS CREATED
Traffic crash injury data is collected from the California Highway Patrol 555 Crash Report as submitted by the police officer within 30 days after the crash occurred. All fields that match the SWITRS data schema are programmatically extracted, de-identified, geocoded, and loaded into TransBASE. See Section D below for details regarding TransBASE.

C. UPDATE PROCESS
After review by SFPD and SFDPH staff, the data is made publicly available approximately a month after the end of the previous quarter (May for Q1, August for Q2, November for Q3, and February for Q4).

D. HOW TO USE THIS DATASET
This data is being provided as public information as defined under San Francisco and California public records laws. SFDPH, SFMTA, and SFPD cannot limit or restrict the use of this data or its interpretation by other parties in any way. Where the data is communicated, distributed, reproduced, mapped, or used in any other way, the user should acknowledge TransBASE.sfgov.org as the source of the data, provide a reference to the original data source where also applicable, include the date the data was pulled, and note any caveats specified in the associated metadata documentation provided. However, users should not attribute their analysis or interpretation of this data to the City of San Francisco. While the data has been collected and/or produced for the use of the City of San Francisco, it cannot guarantee its accuracy or completeness. Accordingly, the City of San Francisco, including SFDPH, SFMTA, and SFPD make no representation as to the accuracy of the information or its suitability for any purpose and disclaim any liability for omissions or errors that may be contained therein. As all data is associated with methodological assumptions and limitations, the City recommends that users review methodological documentation associated with the data prior to its analysis, interpretation, or communication.

This dataset can also be queried on the TransBASE Dashboard. TransBASE is a geospatially enabled database maintained by SFDPH that currently includes over 200 spatially referenced variables from multiple agencies and across a range of geographic scales, including infrastructure, transportation, zoning, sociodemographic, and collision data, all linked to an intersection or street segment. TransBASE facilitates a data-driven approach to understanding and addressing transportation-related health issues, informed by a large and growing evidence base regarding the importance of transportation system design and land use decisions for health. TransBASE’s purpose is to inform public and private efforts to improve transportation system safety, sustainability, community health and equity in San Francisco.

E. RELATED DATASETS
Traffic Crashes Resulting in Injury: Parties Involved
Traffic Crashes Resulting in Injury: Victims Involved
TransBASE Dashboard
iSWITRS
TIMS

Data pushed to ArcGIS Online on December 2, 2025 at 4:11 AM by SFGIS.
Data from: https://data.sfgov.org/d/ubvf-ztfx
Description of dataset columns:

Column Name	Details
unique_id	unique table row identifier
cnn_intrsctn_fkey	nearest intersection centerline node key
cnn_sgmt_fkey	nearest street centerline segment key (empty if crash occurred at intersection)
case_id_pkey	unique crash report number
tb_latitude	latitude of crash (WGS 84)
tb_longitude	longitude of crash (WGS 84)
geocode_source	geocode source
geocode_location	geocode location
collision_datetime	the date and time when the crash occurred
collision_date	the date when the crash occurred
collision_time	the time when the crash occurred (24 hour time)
accident_year	the year when the crash occurred
month	month crash occurred
day_of_week	day of the week crash occurred
time_cat	generic time categories
juris	jurisdiction
officer_id	officer ID
reporting_district	SFPD reporting district
beat_number	SFPD beat number
primary_rd	the road the crash occurred on
secondary_rd	a secondary reference road that DISTANCE and DIRECT are measured from
distance	offset distance from secondary road
direction	direction of offset distance
weather_1	the weather condition at the time of the crash
weather_2	the weather condition at the time of the crash, if a second description is necessary
collision_severity	the injury level severity of the crash (highest level of injury in crash)
type_of_collision	type of crash
mviw	motor vehicle involved with
ped_action	pedestrian action involved
road_surface	road surface
road_cond_1	road condition
road_cond_2	road condition, if a second description is necessary
lighting	lighting at time of crash
control_device	control device status
intersection	indicates whether the crash occurred in an intersection
vz_pcf_code	California vehicle code primary collision factor violated
vz_pcf_group	groupings of similar vehicle codes violated
vz_pcf_description	description of vehicle code violated
vz_pcf_link	link to California vehicle code section
number_killed	counts victims in the crash with degree of injury of fatal
number_injured	counts victims in the crash with degree of injury of severe, visible, or complaint of pain
street_view	link to Google Streetview
dph_col_grp	generic crash groupings based on parties involved
dph_col_grp_description	description of crash groupings
party_at_fault	party number indicated as being at fault
party1_type	party 1 vehicle type
party1_dir_of_travel	party 1 direction of travel
party1_move_pre_acc	party 1 movement preceding crash
party2_type	party 2 vehicle type (empty if no party 2)
party2_dir_of_travel	party 2 direction of travel (empty if no party 2)
party2_move_pre_acc	party 2 movement preceding crash (empty if no party 2)
point	geometry type of crash location
data_as_of	date data added to the source system
data_updated_at	date data last updated the source system
data_loaded_at	date data last loaded here (in the open data portal)
analysis_neighborhood
supervisor_district
police_district
Current Police Districts	This column was automatically created in order to record in what polygon from the dataset 'Current Police Districts' (qgnn-b9vv) the point in column 'point' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
Current Supervisor Districts	This column was automatically created in order to record in what polygon from the dataset 'Current Supervisor Districts' (26cr-cadq) the point in column 'point' is located. This

AnthonyTherrien (2024). Website Traffic [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/website-traffic/discussion

Website Traffic

Website Traffic and User Engagement Metrics

Explore at:

Available download formats: zip (65228 bytes)

Dataset updated

Aug 5, 2024

Authors

AnthonyTherrien

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Dataset Overview

This dataset provides detailed information on website traffic, including page views, session duration, bounce rate, traffic source, time spent on page, previous visits, and conversion rate.

Dataset Description

Page Views: The number of pages viewed during a session.
Session Duration: The total duration of the session in minutes.
Bounce Rate: The percentage of visitors who navigate away from the site after viewing only one page.
Traffic Source: The origin of the traffic (e.g., Organic, Social, Paid).
Time on Page: The amount of time spent on the specific page.
Previous Visits: The number of previous visits by the same visitor.
Conversion Rate: The percentage of visitors who completed a desired action (e.g., making a purchase).

Data Summary

Total Records: 2000
Total Features: 7

Key Features

Page Views: This feature indicates the engagement level of the visitors by showing how many pages they visit during their session.
Session Duration: This feature measures the length of time a visitor stays on the website, which can indicate the quality of the content.
Bounce Rate: A critical metric for understanding user behavior. A high bounce rate may indicate that visitors are not finding what they are looking for.
Traffic Source: Understanding where your traffic comes from can help in optimizing marketing strategies.
Time on Page: This helps in analyzing which pages are retaining visitors' attention the most.
Previous Visits: This can be used to analyze the loyalty of visitors and the effectiveness of retention strategies.
Conversion Rate: The ultimate metric for measuring the effectiveness of the website in achieving its goals.

Usage

This dataset can be used for various analyses such as:

Identifying key drivers of engagement and conversion.
Analyzing the effectiveness of different traffic sources.
Understanding user behavior patterns and optimizing the website accordingly.
Improving marketing strategies based on traffic source performance.
Enhancing user experience by analyzing time spent on different pages.
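
As a rough illustration of the traffic-source analysis listed above, the sketch below averages the conversion rate per traffic source using only the Python standard library. The header names "Traffic Source" and "Conversion Rate" are assumed from the feature names in the description and may not match the actual CSV headers; the sample rows are invented placeholders, not values from the dataset; and `mean_conversion_by_source` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented placeholder rows, NOT taken from the dataset.
# Header names are assumed from the feature list in the description.
SAMPLE_CSV = """Traffic Source,Conversion Rate
Organic,1.0
Organic,0.5
Social,0.25
Paid,0.75
Social,0.75
"""


def mean_conversion_by_source(csv_file,
                              source_col="Traffic Source",
                              rate_col="Conversion Rate"):
    """Return {traffic source: mean conversion rate} for the rows in csv_file."""
    rates = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Group each row's conversion rate under its traffic source.
        rates[row[source_col]].append(float(row[rate_col]))
    return {source: mean(values) for source, values in rates.items()}


result = mean_conversion_by_source(io.StringIO(SAMPLE_CSV))
for source, rate in sorted(result.items()):
    print(f"{source}: {rate:.2f}")
```

To run this against the downloaded file, pass an open file handle (for example `open(path, newline="")`, where `path` is wherever the extracted CSV lives) instead of the `io.StringIO` sample, and adjust the column names if the real headers differ.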

Acknowledgments

This dataset was generated for educational purposes and is not from a real website. It serves as a tool for learning data analysis and machine learning techniques.
