48 datasets found
  1. Daily website visitors (time series regression)

    • kaggle.com
    Updated Aug 20, 2020
    Cite
    Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 20, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bob Nau
    Description

    Context

    This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.
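
    As a starting point for the forecasting exercises, a seasonal-naive baseline is one possible sketch: each future day is predicted with the value observed on the same weekday of the last full week. The function name and the toy counts below are made up for illustration and are not taken from the dataset.

```python
def seasonal_naive_forecast(history, horizon=7, season=7):
    """Forecast each future day with the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    forecast = []
    for step in range(horizon):
        # Index of the matching weekday in the last observed week.
        forecast.append(history[len(history) - season + (step % season)])
    return forecast


# Made-up toy counts for two weeks (Mon..Sun), not taken from the dataset.
toy_unique_visitors = [310, 330, 325, 320, 280, 150, 170,
                       300, 340, 335, 315, 290, 160, 180]

next_week = seasonal_naive_forecast(toy_unique_visitors, horizon=7)
one_day_ahead = next_week[0]      # 1-day-ahead forecast
seven_day_ahead = next_week[6]    # 7-day-ahead forecast
```

    Any real model for this data would be judged against a baseline of this kind, since the day-of-week pattern alone explains much of the variation in daily traffic series.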

    Content

    The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
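
    The stated identity (unique visitors = returning visitors + first-time visitors) can serve as a quick sanity check after loading the file. The sketch below assumes rows have already been parsed into dictionaries; the key names and the toy values are hypothetical and are not the file's actual column names or data.

```python
def identity_violations(rows):
    """Return rows whose unique count differs from first-time + returning."""
    return [
        row for row in rows
        if row["unique"] != row["first_time"] + row["returning"]
    ]


# Made-up toy rows; the key names are hypothetical, not the file's columns.
toy_rows = [
    {"date": "2020-01-06", "unique": 300, "first_time": 240, "returning": 60},
    {"date": "2020-01-07", "unique": 340, "first_time": 270, "returning": 65},
]

bad = identity_violations(toy_rows)  # the second toy row breaks the identity
```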

    Inspiration

    This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.

  2. Website Analytics

    • catalog.data.gov
    • data.nola.gov
    • +4 more
    Updated Jun 28, 2025
    Cite
    data.nola.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    data.nola.gov
    Description

    This data about nola.gov provides a window into how people are interacting with the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.

  3. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=pl&inv=1&invt=Ab3yJQ (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=pl
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google (http://google.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

    The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
    • Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions on the Google Merchandise Store website.

    Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
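
    Because removed STRING fields come back as the literal text “Not available in demo dataset”, it can help to map that placeholder to a missing value before analysis. The following is a minimal sketch that assumes query results have already been fetched as dictionaries; the function name and the toy field names are hypothetical, and only the placeholder string comes from the description above.

```python
PLACEHOLDER = "Not available in demo dataset"


def clean_row(row):
    """Map the demo placeholder string to None; leave other values untouched."""
    return {key: (None if value == PLACEHOLDER else value)
            for key, value in row.items()}


# Toy row: keys and values are illustrative, not real query output.
toy_row = {
    "visitor_id": "0001",
    "removed_string_field": "Not available in demo dataset",
    "removed_integer_field": None,  # INTEGER fields with no data arrive as null
}

cleaned = clean_row(toy_row)
```

    After this step both kinds of empty field are represented the same way, so a single missing-value check covers them.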

  4. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    • Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
    • Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
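
    The "real bounce rate" question above can be sketched in plain Python once sessions have been exported from BigQuery. The example assumes each session is a dictionary with a traffic source and a pageview count; those key names and the toy sessions are hypothetical and do not reflect the actual export schema.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Percentage of single-pageview sessions for each traffic source."""
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for session in sessions:
        totals[session["source"]] += 1
        if session["pageviews"] == 1:
            bounces[session["source"]] += 1
    return {src: 100.0 * bounces[src] / totals[src] for src in totals}


# Made-up toy sessions; key names are hypothetical.
toy_sessions = [
    {"source": "google", "pageviews": 1},
    {"source": "google", "pageviews": 4},
    {"source": "(direct)", "pageviews": 1},
    {"source": "(direct)", "pageviews": 1},
]

rates = bounce_rate_by_source(toy_sessions)
```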

  5. Exhibit of Datasets

    • ssh.datastations.nl
    pdf
    Updated Sep 2, 2024
    Cite
    P.K. Doorn; L. Breure (2024). Exhibit of Datasets [Dataset]. http://doi.org/10.17026/SS/TLTMIR
    Explore at:
    Available download formats: pdf(6387646), pdf(2009614), pdf(21694737), pdf(7119932), pdf(7368953), pdf(2266022), pdf(5957611), pdf(2372244), pdf(3506939), pdf(7233056), pdf(3825954), pdf(1165676), pdf(2683520), pdf(602628), pdf(1968819), pdf(12429754), pdf(1802813), pdf(8847011), pdf(8196391), pdf(559663), pdf(4024461), pdf(1992824), pdf(1541567), pdf(2404227)
    Dataset updated
    Sep 2, 2024
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    P.K. Doorn; L. Breure
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2016 - 2020
    Dataset funded by
    Data Archiving and Networked Services
    Description

    The Exhibit of Datasets was an experimental project with the aim of providing concise introductions to research datasets in the humanities and social sciences deposited in a trusted repository and thus made accessible for the long term. The Exhibit consists of so-called 'showcases', short webpages summarizing and supplementing the corresponding data papers, published in the Research Data Journal for the Humanities and Social Sciences. The showcase is a quick introduction to such a dataset, a bit longer than an abstract, with illustrations, interactive graphs and other multimedia (if available). As a rule it also offers the option to get acquainted with the data itself, through an interactive online spreadsheet, a data sample or link to the online database of a research project. Usually, access to these datasets requires several time consuming actions, such as downloading data, installing the appropriate software and correctly uploading the data into these programs. This makes it difficult for interested parties to quickly assess the possibilities for reuse in other projects.

    The Exhibit aimed to help visitors of the website to get the right information at a glance by:

    • Attracting attention to (recently) acquired deposits: showing why data are interesting.
    • Providing a concise overview of the dataset's scope and research background; more details are to be found, for example, in the associated data paper in the Research Data Journal (RDJ).
    • Bringing together references to the location of the dataset and to more detailed information elsewhere, such as the project website of the data producers.
    • Allowing visitors to explore (a sample of) the data without downloading and installing associated software at first (see below).
    • Publishing related multimedia content, such as videos, animated maps, slideshows etc., which are currently difficult to include in online journals as RDJ.
    • Making it easier to review the dataset.

    The Exhibit would also have been the right place to publish these reviews in the same way as a webshop publishes consumer reviews of a product, but this could not yet be achieved within the limited duration of the project.

    Note

    (1) The text of the showcase is a summary of the corresponding data paper in RDJ, and as such a compilation made by the Exhibit editor. In some cases a section 'Quick start in Reusing Data' is added, whose text is written entirely by the editor.
    (2) Various hyperlinks such as those to pages within the Exhibit website will no longer work. The interactive Zoho spreadsheets are also no longer available because this facility has been discontinued.

  6. FOI-01782 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Mar 21, 2024
    Cite
    (2024). FOI-01782 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-01782
    Dataset updated
    Mar 21, 2024
    Description

    Thank you for explaining that you don’t collect data on the number of abandoned applications. Alternatively, please could you share the website analytics which show the number of visitors to each webpage? From this information we can compare against form completion rates and see if there is a particular drop in traffic on certain pages/questions.

    Response

    A copy of the information is attached. Please read the below notes to ensure correct understanding of the data. Attached is raw data covering individual page hits from 19 February 2024 to 17 March 2024. Please be advised that our Data Analysts have viewed the Google Analytics for the Healthy Start website pages, and despite the search options including country, regions and town or city, the data provided within these fields is an approximation and cannot be guaranteed as a true location of a user. We believe that Google Analytics geolocation capabilities are based on IP (Internet Protocol) addresses which may not resolve to a true location, and instead could be based on the user's ISP (Internet Service Provider) server location. Therefore, please be aware that this raw data is not reliable.

  7. The Items Dataset

    • zenodo.org
    Updated Nov 13, 2024
    Cite
    Patrick Egan (2024). The Items Dataset [Dataset]. http://doi.org/10.5281/zenodo.10964134
    Dataset updated
    Nov 13, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Egan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset originally created 03/01/2019
    UPDATE: Packaged on 04/18/2019
    UPDATE: Edited README on 04/18/2019

    I. About this Data Set

    This data set is a snapshot of work that is ongoing as a collaboration between Kluge Fellow in Digital Studies Patrick Egan and an intern at the Library of Congress in the American Folklife Center. It contains a combination of metadata from various collections that contain audio recordings of Irish traditional music. The development of this dataset is iterative, and it integrates visualizations that follow the key principles of trust and approachability. The project, entitled “Connections In Sound”, invites you to use and re-use this data.

    The text available in the Items dataset is generated from multiple collections of audio material that were discovered at the American Folklife Center. Each instance of a performance was listed and “sets” or medleys of tunes or songs were split into distinct instances in order to allow machines to read each title separately (whilst still noting that they were part of a group of tunes). The work of the intern was then reviewed before publication, and cross-referenced with the tune index at www.irishtune.info. The Items dataset consists of just over 1000 rows, with new data being added daily in a separate file.

    The collections dataset contains at least 37 rows of collections that were located by a reference librarian at the American Folklife Center. This search was complemented by searches of the collections by the scholar both on the internet at https://catalog.loc.gov and by using card catalogs.

    Updates to these datasets will be announced and published as the project progresses.

    II. What’s included?

    This data set includes:

    • The Items Dataset – a .CSV containing Media Note, OriginalFormat, On Website, Collection Ref, Missing In Duplication, Collection, Outside Link, Performer, Solo/multiple, Sub-item, type of tune, Tune, Position, Location, State, Date, Notes/Composer, Potential Linked Data, Instrument, Additional Notes, Tune Cleanup. This .CSV is the direct export of the Items Google Spreadsheet

    III. How Was It Created?

    These data were created by a Kluge Fellow in Digital Studies and an intern on this program over the course of three months. By listening, transcribing, reviewing, and tagging audio recordings, these scholars improve access and connect sounds in the American Folklife Collections by focusing on Irish traditional music. Once transcribed and tagged, information in these datasets is reviewed before publication.

    IV. Data Set Field Descriptions

    a) Collections dataset field descriptions

    • ItemId – this is the identifier for the collection that was found at the AFC
    • Viewed – if the collection has been viewed, or accessed in any way by the researchers.
    • On LOC – whether or not there are audio recordings of this collection available on the Library of Congress website.
    • On Other Website – if any of the recordings in this collection are available elsewhere on the internet
    • Original Format – the format that was used during the creation of the recordings that were found within each collection
    • Search – this indicates the type of search that was performed and that resulted in locating recordings and collections within the AFC
    • Collection – the official title for the collection as noted on the Library of Congress website
    • State – The primary state where recordings from the collection were located
    • Other States – The secondary states where recordings from the collection were located
    • Era / Date – The decade or year associated with each collection
    • Call Number – This is the official reference number that is used to locate the collections, both in the urls used on the Library website, and in the reference search for catalog cards (catalog cards can be searched at this address: https://memory.loc.gov/diglib/ihas/html/afccards/afccards-home.html)
    • Finding Aid Online? – Whether or not a finding aid is available for this collection on the internet

    b) Items dataset field descriptions

    • id – the specific identification of the instance of a tune, song or dance within the dataset
    • Media Note – Any information that is included with the original format, such as identification, name of physical item, additional metadata written on the physical item
    • Original Format – The physical format that was used when recording each specific performance. Note: this field is used in order to calculate the number of physical items that were created in each collection such as 32 wax cylinders.
    • On Website? – Whether or not each instance of a performance is available on the Library of Congress website
    • Collection Ref – The official reference number of the collection
    • Missing In Duplication – This column marks if parts of some recordings had been made available on other websites, but not all of the recordings were included in duplication (see recordings from Philadelphia Céilí Group on Villanova University website)
    • Collection – The official title of the collection given by the American Folklife Center
    • Outside Link – If recordings are available on other websites externally
    • Performer – The name of the contributor(s)
    • Solo/multiple – This field is used to calculate the number of solo performers vs group performers in each collection
    • Sub-item – In some cases, physical recordings contained extra details, the sub-item column was used to denote these details
    • Type of item – This column describes each individual item type, as noted by performers and collectors
    • Item – The item title, as noted by performers and collectors. If an item was not described, it was entered as “unidentified”
    • Position – The position on the recording (in some cases during playback, audio cassette player counter markers were used)
    • Location – Local address of the recording
    • State – The state where the recording was made
    • Date – The date that the recording was made
    • Notes/Composer – The stated composer or source of the item recorded
    • Potential Linked Data – If items may be linked to other recordings or data, this column was used to provide examples of potential relationships between them
    • Instrument – The instrument(s) that was used during the performance
    • Additional Notes – Notes about the process of capturing, transcribing and tagging recordings (for researcher and intern collaboration purposes)
    • Tune Cleanup – This column was used to tidy each item so that it could be read by machines, but also so that spelling mistakes from the Item column could be corrected, and as an aid to preserving iterations of the editing process
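
    The Solo/multiple field is described as the basis for counting solo versus group performances per collection. A minimal sketch of that count with Python's standard csv module follows. The column names Collection, Performer and Solo/multiple come from the field list above, while the function name, the toy rows and the cell values "solo" and "multiple" are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def count_solo_multiple(csv_text):
    """Count Solo/multiple values per Collection in an Items-style CSV."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_collection = counts.setdefault(row["Collection"], Counter())
        # Normalise case and whitespace so "Solo" and "solo" count together.
        per_collection[row["Solo/multiple"].strip().lower()] += 1
    return counts


# Made-up rows; collection names and cell values are assumptions.
toy_csv = """Collection,Performer,Solo/multiple
Collection A,Performer 1,solo
Collection A,Performer 2 and 3,multiple
Collection A,Performer 4,Solo
Collection B,Performer 5,multiple
"""

counts = count_solo_multiple(toy_csv)
```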

    V. Rights statement

    The text in this data set was created by the researcher and intern and can be used in many different ways under Creative Commons with attribution. All contributions to Connections In Sound are released into the public domain as they are created. Anyone is free to use and re-use this data set in any way they want, provided reference is given to the creators of these datasets.

    VI. Creator and Contributor Information

    Creator: Connections In Sound

    Contributors: Library of Congress Labs

    VII. Contact Information

    Please direct all questions and comments to Patrick Egan via www.twitter.com/drpatrickegan or via his website at www.patrickegan.org. You can also get in touch with the Library of Congress Labs team via LC-Labs@loc.gov.

  8. OGD Portal: Daily usage by record (since January 2024) | gimi9.com

    • gimi9.com
    Cite
    OGD Portal: Daily usage by record (since January 2024) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_12610-kanton-basel-landschaft
    Description

    The data on the use of the datasets on the OGD portal BL (data.bl.ch) are collected and published by the specialist and coordination office OGD BL.

    • Contains the day the usage was measured.
    • dataset_title: The title of the record.
    • dataset_id: The technical ID of the dataset.
    • visitors: Specifies the number of daily visitors to the record. Visitors are recorded by counting the unique IP addresses that recorded access on the day of the survey. The IP address represents the network address of the device from which the portal was accessed.
    • interactions: Includes all interactions with any record on data.bl.ch. A visitor can trigger multiple interactions. Interactions include clicks on the website (searching datasets, filters, etc.) as well as API calls (downloading a dataset as a JSON file, etc.).

    Remarks

    • Only calls to publicly available datasets are shown.
    • IP addresses and interactions of users with a login of the Canton of Basel-Landschaft, in particular of employees of the specialist and coordination office OGD, are removed from the dataset before publication and therefore not shown.
    • Calls from actors that are clearly identifiable as bots by the user agent header are also not shown.
    • Combinations of dataset and date for which no use occurred (Visitors == 0 & Interactions == 0) are not shown.
    • Due to synchronization problems, data may be missing for individual days.
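
    The counting rules described above (unique IP addresses per day and record, every access counted as an interaction, bot traffic dropped by user agent, no rows for unused combinations) can be sketched as follows. This is an illustration only and not the portal's actual pipeline: the log structure, the function name, the bot markers and the toy entries are all assumptions.

```python
from collections import defaultdict


def daily_usage(access_log, bot_markers=("bot", "crawler", "spider")):
    """Aggregate (date, dataset_id) -> unique-IP visitors and interactions."""
    ips = defaultdict(set)
    interactions = defaultdict(int)
    for entry in access_log:
        agent = entry["user_agent"].lower()
        if any(marker in agent for marker in bot_markers):
            continue  # skip traffic identifiable as a bot via the user agent
        key = (entry["date"], entry["dataset_id"])
        ips[key].add(entry["ip"])
        interactions[key] += 1
    # Keys exist only for combinations with at least one access,
    # so date/dataset pairs without any use never appear.
    return {key: {"visitors": len(ips[key]), "interactions": interactions[key]}
            for key in ips}


# Made-up toy log; field names and values are hypothetical.
toy_log = [
    {"date": "2024-01-02", "dataset_id": "12610", "ip": "192.0.2.1", "user_agent": "Mozilla/5.0"},
    {"date": "2024-01-02", "dataset_id": "12610", "ip": "192.0.2.1", "user_agent": "Mozilla/5.0"},
    {"date": "2024-01-02", "dataset_id": "12610", "ip": "192.0.2.7", "user_agent": "curl/8.0"},
    {"date": "2024-01-02", "dataset_id": "12610", "ip": "192.0.2.9", "user_agent": "ExampleBot/1.0"},
]

usage = daily_usage(toy_log)
```

    In the toy log the repeated address counts as one visitor with two interactions, and the entry with a bot-like user agent is ignored entirely.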

  9. Website Metrics

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Jun 7, 2025
    Cite
    FEMA/Office of External Affairs/Communication Division (2025). Website Metrics [Dataset]. https://catalog.data.gov/dataset/website-metrics
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    Federal Emergency Management Agency (http://www.fema.gov/)
    Description

    Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regards to FEMA.gov.

    Information in this dataset includes total visits, avg visit duration, pageviews, unique visitors, avg pages/visit, avg time/page, bounce rate, visits by source, visits by Social Media Platform, and metrics on new vs returning visitors.

    External Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.

  10. E-commerce - Users of a French C2C fashion store

    • kaggle.com
    Updated Feb 24, 2024
    Cite
    Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    Kaggle
    Authors
    Jeffrey Mvutu Mabilama
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    Foreword

    This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc...).

    My Telegram bot will answer your queries and allow you to contact me.

    Context

    There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

    Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).

    This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try and understand what you can expect of your users and determine in advance what your growth may be.

    • For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

    If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.

    This dataset is part of a preview of a much larger dataset. Please contact me for more.

    Content

    The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

    Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

    Inspiration

    Questions you might want to answer using this dataset:

    • Are e-commerce users interested in social network features?
    • Are my users active enough (compared to those of this dataset)?
    • How likely are people from other countries to sign up on a C2C website?
    • How many users are likely to drop off after years of using my service?

    Example works:

    • Report(s) made using SQL queries can be found on the data.world page of the dataset.
    • Notebooks may be found on the Kaggle page of the dataset.

    License

    CC-BY-NC-SA 4.0

    For other licensing options, contact me.

  11. ‘Spotify Past Decades Songs Attributes’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Spotify Past Decades Songs Attributes’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-spotify-past-decades-songs-attributes-57a7/4e9b7dfe/?iid=011-638&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Spotify Past Decades Songs Attributes’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/cnic92/spotify-past-decades-songs-50s10s on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Why do we like some songs more than others? Is there something about a song that pleases our subconscious, making us listen to it on repeat? To understand this I collected various attributes from a selection of songs available in Spotify's "All out ..s" playlists, starting from the 50s up to the newly ended 10s. Can you find the secret sauce to make a song popular?

    Content

    This data repo contains 7 datasets (.csv files), each representing one of Spotify's "All out ..s" playlists. Those playlists collect the most popular/iconic songs from the decade. For each song, a set of attributes has been reported in order to perform some data analysis. The attributes have been scraped from this amazing website. In particular, according to the website the attributes are:

    • top genre: genre of the song
    • year: year of the song (due to re-releases, the year might not correspond to the release year of the original song)
    • bpm(beats per minute): beats per minute
    • nrgy(energy): energy of a song, the higher the value the more energetic the song is
    • dnce(danceability): the higher the value, the easier it is to dance to this song.
    • dB(loudness): the higher the value, the louder the song.
    • live(liveness): the higher the value, the more likely the song is a live recording.
    • val(valence): the higher the value, the more positive mood for the song.
    • dur(duration): the duration of the song.
    • acous(acousticness): the higher the value the more acoustic the song is.
    • spch(speechiness): the higher the value the more spoken word the song contains.
    • pop(popularity): the higher the value the more popular the song is.

    Acknowledgements

    I got inspired by the top-notch work by Leonardo Henrique in this dataset. Thanks to him I discovered this website, from which all the data collected here have been scraped.

    --- Original source retains full ownership of the source dataset ---

  12. PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music...

    • data.niaid.nih.gov
    Updated Mar 17, 2025
    + more versions
    Cite
    Berg-Kirkpatrick, Taylor (2025). PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13763755
    Dataset updated
    Mar 17, 2025
    Dataset provided by
    Novack, Zachary
    McAuley, Julian
    Long, Phillip
    Berg-Kirkpatrick, Taylor
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing. Refer to our paper for more information, and our GitHub repository for any code-related details. Please cite both our paper and our collaborators' paper if you use this dataset (see our GitHub for more information).

    Upon further use of the PDMX dataset, we discovered a discrepancy between the public-facing copyright metadata on the MuseScore website and the internal copyright data of the MuseScore files themselves, which affected 31,221 (12.29% of) songs. We have decided to proceed with the former given its public visibility on Musescore (i.e. this is what the MuseScore website presents its users with). We have noted files with conflicting internal licenses in the license_conflict column of PDMX. We recommend using the no_license_conflict subset of PDMX (which still includes 222,856 songs) moving forward.
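    The figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes (the description does not say so explicitly) that the songs with conflicting licenses and the no_license_conflict subset are disjoint and together make up the whole of PDMX; the variable names are illustrative only.

```python
# Counts quoted in the dataset description.
conflicting_songs = 31_221
no_conflict_songs = 222_856

# Assumption: the two groups are disjoint and together cover all of PDMX.
implied_total = conflicting_songs + no_conflict_songs

# Share of songs whose internal license conflicts with the website metadata.
conflict_share = 100 * conflicting_songs / implied_total

print(f"implied total number of songs: {implied_total}")
print(f"share with a license conflict: {conflict_share:.2f}%")
```

    Under that assumption the computed share agrees with the 12.29% stated in the description.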

    Additionally, for each song in PDMX, we not only provide the MusicRender and metadata JSON files, but we also try to include the associated compressed MusicXML (MXL), sheet music (PDF), and MIDI (MID) files when available. Due to the corruption of 42 of the original MuseScore files, these songs lack those associated files (since they could not be converted to those formats) and only include the MusicRender and metadata JSON files. The all_valid subset of PDMX describes the songs where all associated files are valid.

  13. Music Data Sharing Platform for Computational Musicology Research (CCMUSIC...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 15, 2021
    Cite
    Zhaorui Liu; Zijin Li (2021). Music Data Sharing Platform for Computational Musicology Research (CCMUSIC DATASET) [Dataset]. http://doi.org/10.5281/zenodo.5676893
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhaorui Liu; Zijin Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This platform is a multi-functional music data sharing platform for Computational Musicology research. It contains a wide range of music data, such as sound recordings of Chinese traditional musical instruments and labeling information for Chinese pop music, which is available for free use by computational musicology researchers.

    This platform is also a large-scale music data sharing platform specially used for Computational Musicology research in China, including 3 music databases: Chinese Traditional Instrument Sound Database (CTIS), Midi-wav Bi-directional Database of Pop Music and Multi-functional Music Database for MIR Research (CCMusic). All 3 databases are available for free use by computational musicology researchers. For the contents contained in the database, we will provide audio files recorded by the professional team of the conservatory of music, as well as corresponding labelled files, which have no commodity copyright problem and facilitate large-scale promotion. We hope that this music data sharing platform can meet the one-stop data needs of users and contribute to the research in the field of Computational Musicology.

    If you want to know more information or obtain complete files, please go to the official website of this platform:

    Music Data Sharing Platform for Academic Research

    • Chinese Traditional Instrument Sound Database (CTIS)

    This database has been developed over many years by Prof. Han Baoqiang's team and collects sound information about Chinese traditional musical instruments. It covers 287 Chinese national musical instruments, including traditional musical instruments, improved musical instruments and ethnic minority musical instruments.

    • Multi-functional Music Database for MIR Research

    This database collects sound materials of pop music, folk music and hundreds of national musical instruments, and makes comprehensive annotation to form a multi-purpose music database for MIR researchers.

    • Midi-wav Bi-directional Database of Pop Music

    This database contains hundreds of Chinese pop songs, each with corresponding MIDI, audio and lyric information. The vocal part and the accompaniment part of the audio are recorded independently, which helps in studying MIR tasks under ideal conditions. In addition, singing techniques aligned with the vocal part (such as breath sound, falsetto, breathing, vibrato, mute, slide, etc.) are marked in MuseScore, which makes this a bidirectional MIDI-WAV pop music database.

  14. ‘Spotify Top 2020 Songs’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Dec 29, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Spotify Top 2020 Songs’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-spotify-top-2020-songs-934e/latest
    Explore at:
    Dataset updated
    Dec 29, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Spotify Top 2020 Songs’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/heminp16/spotify-top-2020-songs on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    The top 50 songs from the year 2020, totaling 50 songs in the dataset.

    Content

    The attributes were scraped from this website.

    • Top Genre: the genre of the song.
    • Year: the release year of the song.
    • BPM (Beats Per Minute): the tempo of the song.
    • Energy: the energy of a song; the higher the value, the more energetic the song.
    • Danceability: how suitable a track is for dancing; the higher the value, the easier it is to dance to.
    • Loudness (dB): the loudness level in decibels; the higher the value, the louder the song.
    • Liveness: the higher the value, the more likely the song is a live recording.
    • Valence: a measure of the musical positiveness of the track; tracks with the highest values convey a positive mood.
    • Duration (sec): the duration of the song in seconds.
    • Acousticness: a measure of how acoustic the track is.
    • Speechiness: the higher the value, the more spoken words the track contains.
    • Popularity: the higher the value, the more popular the song is.

    Acknowledgements

    I got inspired by Leonardo Henrique in this dataset, and found the amazing website that helped me scrape this data!

    --- Original source retains full ownership of the source dataset ---

  15. Parking — Occupancy forecasting

    • data.qld.gov.au
    • researchdata.edu.au
    html
    Updated Jul 30, 2025
    + more versions
    Cite
    Brisbane City Council (2025). Parking — Occupancy forecasting [Dataset]. https://www.data.qld.gov.au/dataset/parking-occupancy-forecasting
    Explore at:
    Available download formats: html
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Brisbane City Council
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is available on Brisbane City Council’s open data website – data.brisbane.qld.gov.au. The site provides additional features for viewing and interacting with the data and for downloading the data in various formats.

    The Brisbane City Council parking occupancy forecasting data is provided to be accessed by third party web or app developers to develop tools to provide Brisbane residents and visitors with likely parking availability within a paid parking area.

    The parking occupancy forecasting data is compiled using advanced analytics and machine learning to estimate paid parking availability. The solution uses parking occupancy survey data, parking meter transaction data and other traffic and environmental data.

    This dataset is linked to the open data called Parking — Meter locations. The field called MOBILE_ZONE is used to link the datasets. MOBILE_ZONE is a seven-digit mobile payment zone number that may include one or many parking meter numbers.
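    The link between the two datasets can be sketched as a simple join on MOBILE_ZONE. In the example below only the MOBILE_ZONE field name comes from the description; the other field names (`forecast`, `meter_no`), the sample rows and the helper `link_by_zone` are hypothetical and stand in for whatever the real resources contain.

```python
from collections import defaultdict

def link_by_zone(forecast_rows, meter_rows):
    """Attach the meter numbers of each mobile zone to its forecast rows."""
    meters_by_zone = defaultdict(list)
    for meter in meter_rows:
        # One zone may include one or many parking meters.
        meters_by_zone[meter["MOBILE_ZONE"]].append(meter["meter_no"])
    return [
        {**row, "meters": meters_by_zone.get(row["MOBILE_ZONE"], [])}
        for row in forecast_rows
    ]

# Hypothetical sample rows, for illustration only.
forecasts = [
    {"MOBILE_ZONE": "1234567", "forecast": "likely available"},
    {"MOBILE_ZONE": "7654321", "forecast": "likely full"},
]
meters = [
    {"MOBILE_ZONE": "1234567", "meter_no": "M001"},
    {"MOBILE_ZONE": "1234567", "meter_no": "M002"},
    {"MOBILE_ZONE": "7654321", "meter_no": "M003"},
]

linked = link_by_zone(forecasts, meters)
for row in linked:
    print(row["MOBILE_ZONE"], row["forecast"], row["meters"])
```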

    Additional information on parking meters can be found on the Brisbane City Council website.

    The Brisbane City Council parking occupancy forecasting data includes parking data for all of Council’s parking meters. The data attributes used in this resource and their descriptions can be found in the Parking — Occupancy forecasting — metadata — CSV resource in this dataset.

    The Data and resources section of this dataset contains further information for this dataset.

  16. Data from: NeuroSense: A Novel EEG Dataset Utilizing Low-Cost, Sparse...

    • zenodo.org
    Updated Oct 29, 2024
    Cite
    Tommaso Colafiglio; Angela Lombardi; Paolo Sorino; Elvira Brattico; Domenico Lofù; Danilo Danese; Eugenio Di Sciascio; Tommaso Di Noia; Fedelucio Narducci (2024). NeuroSense: A Novel EEG Dataset Utilizing Low-Cost, Sparse Electrode Devices for Emotion Exploration [Dataset]. http://doi.org/10.5281/zenodo.14002375
    Explore at:
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodo
    Authors
    Tommaso Colafiglio; Angela Lombardi; Paolo Sorino; Elvira Brattico; Domenico Lofù; Danilo Danese; Eugenio Di Sciascio; Tommaso Di Noia; Fedelucio Narducci
    Time period covered
    Oct 28, 2024
    Description

    # README

    ## Details related to access to the data

    ### Data user agreement
    The terms and conditions for using this dataset are specified in the [LICENCE](LICENCE) file included in this repository. Please review these terms carefully before accessing or using the data.

    ### Contact person
    For additional information about the dataset, please contact:
    - **Name:** Angela Lombardi
    - **Affiliation:** Department of Electrical and Information Engineering, Politecnico di Bari
    - **Email:** angela.lombardi@poliba.it

    ### Practical information to access the data
    The dataset can be accessed through our dedicated web platform. To request access:

    1. Visit the main dataset page at: https://sisinflab.poliba.it/neurosense-dataset-request/
    2. Follow the instructions on the website to submit your access request
    3. Upon approval, you will receive further instructions for downloading the data

    Please ensure you have read and agreed to the terms in the data user agreement before requesting access.

    ## Overview

    ### EEG Emotion Recognition - Muse Headset
    #### 2023-2024

    The experiment consists of 40 sessions per user. During each session, users are asked to watch a
    music video with the aim of understanding their emotions.
    Recordings are performed with a Muse EEG headset at a 256 Hz sampling rate.
    Channels are recorded as follows:
    - Channel 0: AF7
    - Channel 1: TP9
    - Channel 2: TP10
    - Channel 3: AF8

    The chosen songs have various Last.fm tags in order to evoke different feelings. The title of every track
    can be found in the "TaskName" field of sub-ID***_ses-S***_task-Default_run-001_eeg.json, while the author,
    the Last.fm tag and additional information are in "TaskDescription".

    ## Methods

    ### Subjects

    The subject pool consists of 30 college students aged between 18 and 35: 16 male and 14 female.

    ### Apparatus

    The experiment was performed using the same procedures as those to create
    [Deap Dataset](https://www.eecs.qmul.ac.uk/mmv/datasets/deap/), which is a dataset to recognize emotions via a Brain
    Computer Interface (BCI).


    ### Task organization

    Firstly, music videos were selected. Once 40 songs were picked, the protocol was chosen and the self-assessment
    questionnaire was created.

    ### Task details

    In order to evaluate the stimulus, Russell's VAD (Valence-Arousal-Dominance) scale was used.
    On this scale, the valence-arousal space can be divided into four quadrants:
    - Low Arousal/Low Valence (LALV);
    - Low Arousal/High Valence (LAHV);
    - High Arousal/Low Valence (HALV);
    - High Arousal/High Valence (HAHV).
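
    As a minimal sketch of how a pair of self-assessment ratings could be mapped to one of these quadrants: the function name `va_quadrant`, the numeric rating scale and the midpoint of 5.0 are assumptions made for illustration, since this README does not state the scale or the threshold that was used.

```python
def va_quadrant(valence, arousal, midpoint=5.0):
    """Return LALV, LAHV, HALV or HAHV for one pair of ratings.

    Assumption: ratings strictly above `midpoint` count as "high";
    the README does not specify how ties are handled.
    """
    arousal_part = "HA" if arousal > midpoint else "LA"
    valence_part = "HV" if valence > midpoint else "LV"
    return arousal_part + valence_part

# Hypothetical ratings, for illustration only.
print(va_quadrant(valence=7, arousal=8))
print(va_quadrant(valence=2, arousal=3))
```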

    ### Experimental location

    The experiment was performed in a laboratory located at DEI Department of
    [Politecnico di Bari](https://www.poliba.it/).

    ### Missing data
    Data recorded during session S019 - Session 2, ID021 - Session 23, user was corrupted and is therefore missing.
    Sessions S033 and S038 of user ID015 show a calculated effective sampling rate lower than 256 Hz:
    - ID015_ses-S033 has 226.1320 Hz
    - ID015_ses-S038 has 216.9549 Hz
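
    The README does not say how the effective sampling rate was calculated. One common way, shown here purely as an assumption, is to divide the number of recorded samples by the recording duration; the helper `effective_rate` is hypothetical. The second part of the sketch only uses the two rates reported above to express the shortfall relative to the nominal 256 Hz.

```python
NOMINAL_HZ = 256.0

def effective_rate(n_samples, duration_s):
    """Samples actually received per second of recording (assumed definition)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return n_samples / duration_s

# Rates reported in the README for the two affected sessions.
reported_hz = {
    "ID015_ses-S033": 226.1320,
    "ID015_ses-S038": 216.9549,
}

# Fraction of samples missing relative to the nominal rate.
shortfall = {name: 1 - hz / NOMINAL_HZ for name, hz in reported_hz.items()}

for name, fraction in shortfall.items():
    print(f"{name}: {fraction:.1%} below {NOMINAL_HZ:.0f} Hz")
```

    Sessions like these may need resampling or exclusion before being mixed with recordings taken at the full rate.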

  17. +++Can I call Delta Airlines to get the best deal on flights? Dataset

    • paperswithcode.com
    Updated Jul 20, 2025
    Cite
    HUI ZHANG; Shenglong Zhou; Geoffrey Ye Li; Naihua Xiu (2025). +++Can I call Delta Airlines to get the best deal on flights? Dataset [Dataset]. https://paperswithcode.com/dataset/can-i-call-delta-airlines-to-get-the-best
    Explore at:
    Dataset updated
    Jul 20, 2025
    Authors
    HUI ZHANG; Shenglong Zhou; Geoffrey Ye Li; Naihua Xiu
    Description

    Yes, calling Delta Airlines can be an excellent way to find the best deals on flights, especially when you want real-time assistance. ☎️+1(855)-564-2526 By dialing ☎️+1(855)-564-2526, you’re connecting directly with a knowledgeable agent who can walk you through current promotions, ☎️+1(855)-564-2526 flexible travel dates, and unpublished fare options.

    Sometimes, deals are available that aren't advertised on the website. ☎️+1(855)-564-2526 These hidden or exclusive offers can be disclosed only by phone agents, ☎️+1(855)-564-2526 especially if you're flexible with departure times, travel days, or even nearby airports. ☎️+1(855)-564-2526 Calling lets you explore multiple scenarios in real time, which is harder to do when clicking through the website.

    Agents can also identify package discounts that combine flights with hotel stays, ☎️+1(855)-564-2526 rental cars, or other amenities that reduce your total cost. ☎️+1(855)-564-2526 While these may be viewable online, they’re often easier to customize and finalize over the phone. ☎️+1(855)-564-2526 Speaking with someone live gives you a direct line to insight, strategy, and flexibility.

    Another big advantage is price matching. ☎️+1(855)-564-2526 If you’ve found a cheaper fare on another platform or website, ☎️+1(855)-564-2526 calling ☎️+1(855)-564-2526 gives you the opportunity to present that price and see if Delta can match or even beat it. ☎️+1(855)-564-2526 While it’s not guaranteed, phone representatives often have discretion to offer discounts or incentives.

    Sometimes, the lowest fares come with certain restrictions—like no changes or no refunds. ☎️+1(855)-564-2526 These conditions may not always be clear online. ☎️+1(855)-564-2526 But when you speak to an agent, ☎️+1(855)-564-2526 they can explain all the fine print so you’re not hit with surprises later. ☎️+1(855)-564-2526 This is especially useful if you’re looking for budget-friendly fares with minimal risks.

    Travelers looking to use SkyMiles or promo codes also find value in calling. ☎️+1(855)-564-2526 If your miles don’t seem to apply online or you’re unsure how many points you need, ☎️+1(855)-564-2526 an agent can guide you. ☎️+1(855)-564-2526 Plus, they may even notify you of redemption bonuses or limited-time offers.

    Customer service agents are also trained to understand fare classes in detail. ☎️+1(855)-564-2526 They can explain whether it’s worth upgrading from Basic Economy to Main Cabin or Comfort+. ☎️+1(855)-564-2526 That type of advice is priceless when you're looking to maximize comfort without overpaying. ☎️+1(855)-564-2526

    If you’re booking for a group or a large family, calling may be your best strategy. ☎️+1(855)-564-2526 Delta often provides group discounts that aren’t easily accessible through the website. ☎️+1(855)-564-2526 You can also request special seating arrangements, coordinated check-in, or baggage bundling. ☎️+1(855)-564-2526 These features are especially helpful for team travel, corporate trips, or family reunions.

    Another benefit is real-time troubleshooting. ☎️+1(855)-564-2526 If your travel dates are flexible, ☎️+1(855)-564-2526 agents can quickly scroll through alternate days to find cheaper options. ☎️+1(855)-564-2526 On a website, this process is manual and time-consuming. ☎️+1(855)-564-2526 But with a quick conversation, you can save both time and money.

    Lastly, phone reps often have the most current inventory in front of them. ☎️+1(855)-564-2526 If a sale just launched or an error fare was published, ☎️+1(855)-564-2526 there's a chance the online systems haven’t fully updated. ☎️+1(855)-564-2526 A phone call ensures you're seeing the freshest data. ☎️+1(855)-564-2526

    To conclude, yes, calling Delta Airlines is a smart move when you're serious about finding the best deal. ☎️+1(855)-564-2526 Whether you're working with a specific budget or want insider tips on booking, ☎️+1(855)-564-2526 speaking to a live agent at ☎️+1(855)-564-2526 can be the key to unlocking better value.

  18. ISMIR04 Genre Identification task dataset

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Cite
    P. Cano; N. Wack; P. Herrera (2020). ISMIR04 Genre Identification task dataset [Dataset]. http://doi.org/10.5281/zenodo.1302992
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    P. Cano; N. Wack; P. Herrera
    Description

    This is a collection of audio used for the Genre Identification task of the ISMIR 2004 audio description contest organized by the Music Technology Group (Universitat Pompeu Fabra). The audio for the task was collected from Magnatune, which contains a large amount of music licensed under Creative Commons licenses. The task of the contest was to classify a set of songs into genres, using the genre labels that Magnatune provided in their database.

    Further information about the original contest and the contents of the dataset can be obtained from the following technical report:

    Cano P, Gómez E, Gouyon F, Herrera P, Koppenberger M, Ong B, Serra X, Streich S, Wack N. ISMIR 2004 audio description contest. Barcelona: Universitat Pompeu Fabra, Music technology Group; 2006. 20 p. Report No.: MTG-TR-2006-02

    http://hdl.handle.net/10230/34013

    The original contest website can be found at http://ismir2004.ismir.net/genre_contest/

    The dataset contains audio tracks from the following 8 genres: classical, electronic, jazz & blues, metal, punk, rock, pop, world.

    For the genre recognition contest, the data was grouped into 6 classes: classical, electronic, jazz-blues, metal-punk, rock-pop, world, where in some cases two genres were merged into a single class. Note that the ground-truth files use these 6 classes; however, in some cases the data is organised by original genre.

    Audio

    The audio is in MP3 format. It is divided into three folders, representing different subsets of the collection. Each folder has 729 files, split into classes. The number of files in each category reflects the proportion of files in each category in Magnatune when the dataset was created. No track appears in more than one folder.

    • Training: files for generating a classification model, arranged by class.

    • Development: A separate set of files for participants to test their model against.

    • Evaluation: originally a private subset, the files used to evaluate the accuracy of all submitted models

    The training and development set each consist of:

    • classical: 320 files

    • electronic: 115 files

    • jazz_blues: 26 files

    • metal_punk: 45 files

    • rock_pop: 101 files

    • world: 122 files

    The evaluation set consists of 729 tracks with a similar distribution.
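
    The class counts listed above can be checked against the stated folder size and turned into class proportions, which makes the imbalance between classes explicit. This is only an illustrative sketch; the variable names are not part of the dataset.

```python
# Per-class file counts for the training and development sets, as listed above.
class_counts = {
    "classical": 320,
    "electronic": 115,
    "jazz_blues": 26,
    "metal_punk": 45,
    "rock_pop": 101,
    "world": 122,
}

total_files = sum(class_counts.values())  # matches the 729 files per folder

# Share of each class, useful e.g. as a majority-class baseline.
proportions = {name: n / total_files for name, n in class_counts.items()}

for name, share in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{name:<11} {class_counts[name]:>4} files  {share:.1%}")
```

    Because the classes are this unbalanced, plain accuracy can be dominated by the largest class, so per-class metrics may be more informative.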

    Metadata

    Each folder of audio has a corresponding folder containing metadata of the files in that folder. The metadata is included in a file, tracklist.csv which has the following headers:

    class, artist, album, track, track number, file path

    The evaluation tracklist file has an additional column representing the magnatune track id of the recording.

    Due to the way that the data was collected and distributed for the challenge, the metadata for the development subset is anonymised.
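
    A minimal sketch of reading such a tracklist and counting tracks per class is given below. It assumes that tracklist.csv starts with a header row spelled exactly as listed above and that fields are comma-separated; the actual files may differ. The helper `class_counts_from_tracklist` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

def class_counts_from_tracklist(csv_file):
    """Count tracks per class in an open tracklist file.

    Assumes a header row containing a "class" column.
    """
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    return Counter(row["class"] for row in reader)

# Hypothetical in-memory sample standing in for a real tracklist.csv.
sample = io.StringIO(
    "class, artist, album, track, track number, file path\n"
    "classical, Artist A, Album X, Track 1, 1, classical/a.mp3\n"
    "classical, Artist A, Album X, Track 2, 2, classical/b.mp3\n"
    "world, Artist B, Album Y, Track 1, 1, world/c.mp3\n"
)

counts = class_counts_from_tracklist(sample)
print(counts)
```

    For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")`; the encoding is likewise an assumption.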

    Licensing

    The audio is licensed under a CC Attribution-NonCommercial-ShareAlike license (https://creativecommons.org/licenses/by-nc-sa/1.0/).

    Using this dataset

    We would highly appreciate it if scientific publications of works partly based on this dataset cite the above publication.

    We are interested in knowing if you find our datasets useful! If you use our dataset please email us at mtg-info@upf.edu and tell us about your research.

  19. 2015 – 2017 Glyphosate Testing Data

    • open.canada.ca
    • datasets.ai
    • +1more
    csv, html
    Updated Dec 9, 2024
    Cite
    Canadian Food Inspection Agency (2024). 2015 – 2017 Glyphosate Testing Data [Dataset]. https://open.canada.ca/data/en/dataset/906cd35c-d396-4999-9a9f-f5351796661f
    Explore at:
    Available download formats: html, csv
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Canadian Food Inspection Agency (https://inspection.canada.ca/)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Glyphosate data supporting scientific publication.

    Linking to non-government of Canada websites

    Links to Websites not under the control of the Government of Canada are provided solely for the convenience of the Website visitors. The Government of Canada is not responsible for the accuracy, currency or reliability of the content. The Government of Canada does not offer any guarantee in that regard and is not responsible for the information found through these links, nor does it endorse the sites and their content.

    Visitors should also be aware that information offered by non-Government of Canada sites to which this website links is not subject to the Privacy Act or the Official Languages Act and may not be accessible to persons with disabilities. The information offered may be available only in the language(s) used by the sites in question. With respect to privacy, visitors should research the privacy policies of these non-government Websites before providing personal information.

  20. Full dataset for: Diversifying environmental volunteers by engaging with...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated Nov 24, 2020
    Cite
    Anita Diaz; Kayleigh Winch; Richard Stafford; Pippa Gillingham; Einar Thorsen (2020). Full dataset for: Diversifying environmental volunteers by engaging with online communities [Dataset]. http://doi.org/10.5061/dryad.fxpnvx0qd
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 24, 2020
    Dataset provided by
    Bournemouth University
    Authors
    Anita Diaz; Kayleigh Winch; Richard Stafford; Pippa Gillingham; Einar Thorsen
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description
    1. Environmental volunteering can benefit participants and nature through improving physical and mental wellbeing while encouraging environmental stewardship. To enhance achievement of these outcomes, conservation organisations need to reach different groups of people to increase participation in environmental volunteering. This paper explores what engages communities searching online for environmental volunteering.
      
    2. We conducted a literature review of 1032 papers to determine key factors fostering participation by existing volunteers in environmental projects. We found the most important factor was to tailor projects to the motivations of participants. Also important were: promoting projects to people with relevant interests; meeting the perceived benefits of volunteers and removing barriers to participation.
      
    3. We then assessed the composition and factors fostering participation of the NatureVolunteers’s online community (n = 2216) of potential environmental volunteers and compared findings with those from the literature review. We asked whether projects advertised by conservation organisations meet motivations and interests of this online community.
      
    4. Using Facebook insights and Google Analytics we found that the online community were on average younger than extant communities observed in studies of environmental volunteering. Their motivations were also different as they were more interested in physical activity and using skills and less in social factors. They also exhibited preference for projects which are outdoor based, and which offer close contact with wildlife. Finally, we found that the online community showed a stronger preference for habitat improvement projects over those involving species-survey based citizen science.
      
    5. Our results demonstrate mis-matches between what our online community are looking for and what is advertised by conservation organisations. The online community are looking for projects which are more solitary, more physically active and more accessible by organised transport. We discuss how our results may be used by conservation organisations to better engage with more people searching for environmental volunteering opportunities online.
      
    6. We conclude that there is a pool of young people attracted to environmental volunteering projects whose interests are different to those of current volunteers. If conservation organisations can develop projects that meet these interests, they can engage larger and more diverse communities in nature volunteering.
      

    Methods

    The data set consists of separate sheets for each set of results presented in the paper. Each sheet contains the full data, summary descriptive statistics analysis and graphs presented in the paper. The method for collection and processing of the dataset in each sheet is as follows:

    The data set for results presented in Figure 1 in the paper - Sheet: "Literature"

    We conducted a review of literature on improving participation within nature conservation projects. This enabled us to determine what the most important factors were for participating in environmental projects, the composition of the populations sampled and the methods by which data were collected. The search terms used were (Environment* OR nature OR conservation) AND (Volunteer* OR “citizen science”) AND (Recruit* OR participat* OR retain* OR interest*). We reviewed all articles identified in the Web of Science database and the first 50 articles sorted for relevance in Google Scholar on the 22nd October 2019. Articles were first reviewed by title, secondly by abstract and thirdly by full text. They were retained or excluded according to criteria agreed by the authors of this paper. These criteria were as follows - that the paper topic was volunteering in the environment, including citizen science, community-based projects and conservation abroad, and included the study of factors which could improve participation in projects. Papers were excluded for topics irrelevant to this study, the most frequent being the outcomes of volunteering for participants (such as behavioural change and knowledge gain), improving citizen science data and the usefulness of citizen science data. The remaining final set of selected papers was then read to extract information on the factors influencing participation, the population sampled and the data collection methods. In total 1032 papers were reviewed of which 31 comprised the final selected set read in full. Four factors were identified in these papers which improve volunteer recruitment and retention. These were: tailoring projects to the motivations of participants, promoting projects to people with relevant hobbies and interests, meeting the perceived benefits of volunteers and removing barriers to participation.

    The data set for results presented in Figure 2 and Figure 3 in the paper - Sheet "Demographics"

    To determine if the motivations and interests expressed by volunteers in literature were representative of wider society, NatureVolunteers was exhibited at three UK public engagement events during May and June 2019; Hullabaloo Festival (Isle of Wight), The Great Wildlife Exploration (Bournemouth) and Festival of Nature (Bristol). This allowed us to engage with people who may not have ordinarily considered volunteering and encourage people to use the website. A combination of surveys and semi-structured interviews were used to collect information from the public regarding demographics and volunteering. In line with our ethics approval, no personal data were collected that could identify individuals and all participants gave informed consent for their anonymous information to be used for research purposes. The semi-structured interviews consisted of conducting the survey in a conversation with the respondent, rather than the respondent filling in the questionnaire privately and responses were recorded immediately by the interviewer. Hullabaloo Festival was a free discovery and exploration event where NatureVolunteers had a small display and surveys available. The Great Wildlife Exploration was a Bioblitz designed to highlight the importance of urban greenspaces where we had a stall with wildlife crafts promoting NatureVolunteers. The Festival of Nature was the UK’s largest nature-based festival in 2019 where we again had wildlife crafts available promoting NatureVolunteers. The surveys conducted at these events sampled a population of people who already expressed an interest in nature and the environment by attending the events and visiting the NatureVolunteers stand. In total 100 completed surveys were received from the events NatureVolunteers exhibited at; 21 from Hullabaloo Festival, 25 from the Great Wildlife Exploration and 54 from the Festival of Nature. 
At Hullabaloo Festival information on gender was not recorded for all responses and was consequently entered as “unrecorded”.

    OVERALL DESCRIPTION OF METHOD DATA COLLECTION FOR ALL OTHER RESULTS (Figures 4-7 and Tables 1-2)

    The remaining data were all collected from the NatureVolunteers website. The NatureVolunteers website https://www.naturevolunteers.uk/ was set up in 2018 with funding support from the Higher Education Innovation Fund to expand the range of people accessing nature volunteering opportunities in the UK. It is designed to particularly appeal to people who are new to nature volunteering including young adults wishing to expand their horizons, families looking for ways to connect with nature to enhance well-being and older people wishing to share their time and life experiences to help nature. In addition, it was designed to be helpful to professionals working in the countryside & wildlife conservation sectors who wish to enhance their skills through volunteering. As part of the website’s development we created and used an online project database, www.naturevolunteers.uk (hereafter referred to as NatureVolunteers), to assess the needs and interests of our online community. Our research work was granted ethical approval by the Bournemouth University Ethics Committee. The website collects entirely anonymous data on our online community of website users that enables us to evaluate what sort of projects and project attributes most appeal to our online community. Visitors using the website to find projects are informed as part of the guidance on using the search function that this fully anonymous information is collected by the website to enhance and share research understanding of how conservation organisations can tailor their future projects to better match the interests of potential volunteers.
    Our online community was built up over 2018-2019 through open advertising of the website nationally through the social media channels of our partner conservation organisations, through a range of public engagement in science events and nature-based festivals across southern England and through our extended network of friends and families, their own social media networks and the NatureVolunteers website’s own social network on Facebook and Twitter. There were 2216 searches for projects on NatureVolunteers from January 1st to October 25th, 2019.

    The data set for results presented in Figure 2 and Figure 3 in the paper - Sheet "Demographics"

    On the website, users searching for projects were firstly asked to specify their expectations of projects. These expectations encompass the benefits of volunteering by asking whether the project includes social interaction, whether particular skills are required or can be developed, and whether physical activity is involved. The barriers to participation are incorporated by asking whether the project is suitable for families, and whether organised transport is provided. Users were asked to rate the importance of the five project expectations on a Likert scale of 1 to 5 (Not at all = 1, Not really = 2, Neutral = 3, It

Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors/code

Daily website visitors (time series regression)

Predict tomorrow's number of website visitors from 5 years of daily data

4 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Aug 20, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bob Nau
Description

Context

This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.
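
As a rough starting point for the three exercises, a seasonal-naive baseline simply repeats the value seen on the same weekday one week earlier. The sketch below is an illustration only, not the dataset author's method; the function name `seasonal_naive_forecast` and the sample counts are made up for the example and are not taken from the data.

```python
from typing import List, Sequence


def seasonal_naive_forecast(history: Sequence[float], horizon: int, season: int = 7) -> List[float]:
    """Forecast each future day with the value observed on the same weekday
    in the most recent complete week of history."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = list(history[-season:])
    # Step h (1-based) repeats the value from `season` days earlier,
    # wrapping around when the horizon exceeds one season.
    return [last_season[(h - 1) % season] for h in range(1, horizon + 1)]


# Made-up two weeks of daily unique-visitor counts (hypothetical values).
visits = [120, 310, 330, 325, 300, 250, 110,
          130, 320, 335, 330, 310, 255, 115]

one_day_ahead = seasonal_naive_forecast(visits, horizon=1)   # tomorrow only
next_week = seasonal_naive_forecast(visits, horizon=7)       # entire next week
seven_day_ahead = next_week[-1]                              # the day 7 steps ahead
```

Any regression or time series model built for these exercises can be compared against this baseline; a model that cannot beat it is probably not capturing the day-of-week pattern.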

Content

The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
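
Two statements in the description above can be checked mechanically: the row count should match one row per calendar day over the stated date range, and unique visitors should equal first-time plus returning visitors. The following is a minimal sketch using only the Python standard library; the helper names `expected_daily_rows` and `is_consistent` are hypothetical, and how the counts are read from the file is left out because the column names are not given here.

```python
from datetime import date


def expected_daily_rows(start: date, end: date) -> int:
    """Number of rows if there is exactly one row per calendar day, endpoints included."""
    return (end - start).days + 1


def is_consistent(unique: int, first_time: int, returning: int) -> bool:
    """True when unique visitors equal first-time plus returning visitors."""
    return unique == first_time + returning


# Date range quoted in the description: September 14, 2014 to August 19, 2020.
rows = expected_daily_rows(date(2014, 9, 14), date(2020, 8, 19))
print(rows)  # 2167, matching the stated row count
```

A row for which `is_consistent` returns False would indicate a data-entry or parsing problem, since the description defines the three counts to add up.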

Inspiration

This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
