64 datasets found
  1. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
    Explore at:
    zip (0 bytes). Available download formats
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

    • Traffic source data: information about where website visitors originate, including organic traffic, paid search traffic, display traffic, etc.
    • Content data: information about the behavior of users on the site, including the URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
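    In practice these questions are answered with SQL against the BigQuery tables, but the first one can be sketched in plain Python over session records shaped like the export's `device.browser` and `totals.transactions` fields. The records below are synthetic, for illustration only:

```python
from collections import defaultdict

def transactions_per_browser(sessions):
    """Sum transactions per device browser over a list of session records.

    Each record loosely mirrors the GA360 BigQuery export fields
    device.browser and totals.transactions; missing totals count as 0.
    """
    totals = defaultdict(int)
    for s in sessions:
        totals[s["device"]["browser"]] += s["totals"].get("transactions") or 0
    return dict(totals)

# Synthetic sessions standing in for rows of ga_sessions_*.
sessions = [
    {"device": {"browser": "Chrome"}, "totals": {"transactions": 2}},
    {"device": {"browser": "Chrome"}, "totals": {}},
    {"device": {"browser": "Safari"}, "totals": {"transactions": 1}},
]
print(transactions_per_browser(sessions))  # {'Chrome': 2, 'Safari': 1}
```

    The same grouping in BigQuery would be a GROUP BY over `device.browser` with a SUM over `totals.transactions`, restricted to the July 2017 table suffixes.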

  2. MLB Leaderboard 2024

    • kaggle.com
    Updated Nov 3, 2024
    Cite
    yukawithdata (2024). MLB Leaderboard 2024 [Dataset]. https://www.kaggle.com/datasets/yukawithdata/mlb-statcast-leaderboard-2024
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 3, 2024
    Dataset provided by
    Kaggle
    Authors
    yukawithdata
    Description

    💁‍♀️Please take a moment to carefully read through this description and metadata to better understand the dataset and its nuances before proceeding to the Suggestions and Discussions section.

    Batter Performance Metrics Dataset

    This dataset focuses on a wide range of sabermetric metrics for analyzing batter performance in baseball. It provides a comprehensive view of a player's abilities in power, plate discipline, speed, and overall efficiency.

    NOTE that only qualified players are included in this data: players must reach the minimum number of plate appearances required for season-long leaderboards and rate stats. The data was retrieved on October 18th, 2024.

    • AB (At-Bats): The total number of times a batter has a turn at the plate, excluding walks and sacrifices.

    • PA (Plate Appearances): The total number of times a batter completes a turn at the plate, including all outcomes.

    • Home Run: The number of times the batter hits the ball out of the field, allowing them to round all bases and score.

    • K% (Strikeout Percentage): The percentage of plate appearances that result in a strikeout.

    • BB% (Walk Percentage): The percentage of plate appearances that result in the batter receiving a walk.

    • SLG% (Slugging Percentage): A measure of the batter's power, calculated as total bases per at-bat.

    • OBP (On-Base Percentage): The percentage of times the batter reaches base via hits, walks, or hit-by-pitch events.

    • OPS (On-Base Plus Slugging): The sum of OBP and SLG, providing a combined measure of a batter's ability to get on base and hit for power.

    • Isolated Power (ISO): A measure of a batter's raw power, calculated by subtracting batting average from slugging percentage to focus on extra-base hits.

    • BABIP (Batting Average on Balls in Play): The batting average on balls hit into play, excluding home runs and strikeouts.

    • Total Stolen Bases: The total number of bases a player has stolen successfully.

    • xwOBA (Expected Weighted On-Base Average): A predictive version of wOBA (Weighted On-base Average) based on the quality of contact, such as exit velocity and launch angle.

    • wOBAdiff (wOBA Differential): The difference between a batter’s actual wOBA and expected wOBA (xwOBA), indicating performance versus expectations.

    • Exit Velocity Avg (Average Exit Velocity): The average speed of the ball off the bat, providing insight into the quality of contact.

    • Sweet Spot Percentage: The percentage of batted balls hit with a launch angle between 8 and 32 degrees, which typically leads to better offensive results.

    • Barrel Batted Rate: The percentage of batted balls hit with ideal exit velocity and launch angle, maximizing chances for extra-base hits.

    • Hard-Hit Percentage: The percentage of batted balls hit with an exit velocity of 95 mph or higher, reflecting the strength of contact.

    • Average Hyper Speed: The batter's average sprint speed during short, high-intensity runs like reaching base.

    • Whiff Percentage: The percentage of swings in which the batter misses the ball entirely.

    • Swing Percentage: The percentage of pitches at which the batter swings, regardless of whether they make contact.

    • HP to 1B (Home Plate to First Base Speed): The time it takes for a batter to sprint from home plate to first base after hitting the ball.

    • Sprint Speed: The player’s top running speed, usually measured during base running or fielding.

    • WAR (Wins Above Replacement): measures a player's value in all facets of the game by deciphering how many more wins he's worth than a replacement-level player at his same position.

    This dataset is designed to simplify the process of analyzing batter performance using advanced sabermetric principles, providing key insights into offensive effectiveness and expected outcomes.
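    Several of the rate stats above are simple arithmetic on one another: OPS is OBP plus SLG, ISO is SLG minus batting average, and K% and BB% are per-plate-appearance rates. A minimal sketch with illustrative numbers, not values taken from the dataset:

```python
def ops(obp, slg):
    """On-Base Plus Slugging: the sum of OBP and SLG."""
    return obp + slg

def iso(slg, avg):
    """Isolated Power: slugging minus batting average, isolating extra bases."""
    return slg - avg

def rate(events, pa):
    """Generic per-plate-appearance rate, e.g. K% or BB% as a fraction."""
    return events / pa

# Illustrative numbers only -- not from the leaderboard.
obp, slg, avg = 0.390, 0.550, 0.300
print(round(ops(obp, slg), 3))   # 0.94
print(round(iso(slg, avg), 3))   # 0.25
print(round(rate(120, 600), 3))  # 0.2  (e.g. 120 strikeouts in 600 PA)
```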

    Data Usage and License

    The dataset was retrieved from the respective sources listed in the Provenance section. Users are urged to use this data responsibly and to respect the rights and guidelines specified by the original data providers. When utilizing or sharing insights derived from this dataset, ensure proper attribution to the sources.

  3. Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining

    • zenodo.org
    bin, zip
    Updated Jun 7, 2021
    Cite
    Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
    Explore at:
    zip, bin. Available download formats
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mariana O. Silva; Laís Mota; Mirella M. Moro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MusicOSet is an open and enhanced dataset of musical elements (artists, songs, and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS, and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical feature sources. Data from all three categories were initially collected between January and May 2019; the data was then updated and enhanced in June 2019.

    The attractive features of MusicOSet include:

    • Integration and centralization of different musical data sources
    • Calculation of popularity scores and classification of musical elements as hits or non-hits, covering the years 1962 to 2018
    • Enriched metadata for music, artists, and albums from the US popular music industry
    • Availability of acoustic and lyrical resources
    • Unrestricted access in two formats: SQL database and compressed .csv files
    | Data              | # Records |
    |:-----------------:|:---------:|
    | Songs             | 20,405    |
    | Artists           | 11,518    |
    | Albums            | 26,522    |
    | Lyrics            | 19,664    |
    | Acoustic Features | 20,405    |
    | Genres            | 1,561     |
  4. ‘K-Pop Hits Through The Years’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘K-Pop Hits Through The Years’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-k-pop-hits-through-the-years-0b70/be8b4573/?iid=032-298&v=presentation
    Explore at:
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘K-Pop Hits Through The Years’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sberj127/kpop-hits-through-the-years on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    What is the data?

    Each dataset contains the top songs of the era or year indicated in its name. Note that only the KPopHits90s dataset represents an era (1989-2001): although easily available and reliable sources for the actual K-Pop hits per year during the 90s are lacking, this era was still included because it is when the first generation of K-Pop stars appeared. Each of the other datasets represents a specific year after the 90s.

    How was it obtained?

    A song is considered to be a K-Pop hit during that era or year if it is included in the annual series of K-Pop Hits playlists, which is created officially by Apple Music. Note that for the dataset that represents the 90s, the playlist 90s K-Pop Essentials was used as the reference.

    1. These playlists were transferred into Spotify through the Tune My Music site. After transferring, the site also presented all the missing songs from each Spotify playlist when compared to the original Apple Music playlists.
      • Any data besides the names and artists of the hit songs were not directly obtained from Apple Music since these other details of songs in this music service are only available for those enrolled as members of the Apple Developer Program.
    2. The missing songs from each playlist were manually searched for and, if found, added to the respective Spotify playlist.
      • For the songs that were found, there are three types: (1) the song by the original artist, (2) the instrumental of the original song, and (3) a cover of the song. When the first type was not found, the other two types were searched for and compared to each other; the one that sounded most like the original song (from the Apple Music playlist) was chosen as the substitute in the Spotify playlist.
      • Presented is a link containing all the missing data per playlist (when the initial Spotify playlists were compared to the original Apple Music playlists) and the action done to each one.
    3. The necessary identification details and specific audio features of each track were obtained through the use of the Spotipy library and Spotify Web API documentation.

    Why did you make this?

    As someone with a particular curiosity about the field of data science and a genuine love for the musicality of the K-Pop scene, I created this dataset to make something out of my strong interest in these separate subjects.

    Acknowledgements

    I would like to express my sincere gratitude to Apple Music for creating the annual K-Pop playlists, Spotify for making their API very accessible, Spotipy for making it easier to get the desired data from the Spotify Web API, Tune My Music for automating the process of transferring one's library into another service's library and, of course, all those involved in the making of these songs and artists included in these datasets for creating such high quality music and concepts digestible even for the general public.

    --- Original source retains full ownership of the source dataset ---

  5. Spotify Top 50 Tracks 2023

    • kaggle.com
    Updated Feb 8, 2024
    Cite
    yuka_with_data (2024). Spotify Top 50 Tracks 2023 [Dataset]. https://www.kaggle.com/datasets/yukawithdata/spotify-top-tracks-2023
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Kaggle
    Authors
    yuka_with_data
    Description

    💁‍♀️Please take a moment to carefully read through this description and metadata to better understand the dataset and its nuances before proceeding to the Suggestions and Discussions section.

    Dataset Description:

    This dataset compiles the tracks from Spotify's official "Top Tracks of 2023" playlist, showcasing the most popular and influential music of the year according to Spotify's streaming data. It represents a wide array of genres, artists, and musical styles that defined the musical landscape of 2023. Each track in the dataset is detailed with a variety of audio features, popularity scores, and metadata. This dataset serves as an excellent resource for music enthusiasts, data analysts, and researchers aiming to explore music trends or develop music recommendation systems based on empirical data.

    Data Collection and Processing:

    Obtaining the Data:

    The data was obtained directly from the Spotify Web API, specifically from the "Top Tracks of 2023" official playlist curated by Spotify. The Spotify API provides detailed information about tracks, artists, and albums through various endpoints.

    Data Processing:

    To process and structure the data, I developed Python scripts using data science libraries such as pandas for data manipulation and spotipy for API interactions specifically for Spotify data retrieval.

    Workflow:

    1. Authentication
    2. API Requests
    3. Data Cleaning and Transformation
    4. Saving the Data
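    Step 3 (Data Cleaning and Transformation) amounts to flattening the nested playlist-item objects returned by the API into flat rows. A sketch of that step, using a synthetic payload modeled loosely on Spotify's track objects (the field names and values below are illustrative, not taken from the dataset):

```python
def flatten_track(item):
    """Flatten one playlist item into a flat row dict.

    The nested shape (track -> artists/album) loosely mirrors the
    playlist-items payload; treat all field names as illustrative.
    """
    track = item["track"]
    return {
        "artist_name": ", ".join(a["name"] for a in track["artists"]),
        "track_name": track["name"],
        "is_explicit": track["explicit"],
        "album_release_date": track["album"]["release_date"],
        "duration_ms": track["duration_ms"],
        "popularity": track["popularity"],
    }

# Synthetic item standing in for one element of an API response.
raw = {"track": {
    "name": "Example Song", "explicit": False, "duration_ms": 201000,
    "popularity": 95, "artists": [{"name": "Example Artist"}],
    "album": {"release_date": "2023-05-12"},
}}
row = flatten_track(raw)
print(row["artist_name"], row["popularity"])  # Example Artist 95
```

    A list of such rows maps directly onto the attribute columns described below and can be written out as the final CSV (step 4).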

    Attribute Descriptions:

    • artist_name: the artist name
    • track_name: the title of the track
    • is_explicit: Indicates whether the track contains explicit content
    • album_release_date: The date when the track was released
    • genres: A list of genres associated with the track's artist(s)
    • danceability: A measure from 0.0 to 1.0 indicating how suitable a track is for dancing based on a combination of musical elements
    • valence: A measure from 0.0 to 1.0 indicating the musical positiveness conveyed by a track
    • energy: A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity
    • loudness: The overall loudness of a track in decibels (dB)
    • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic
    • instrumentalness: Predicts whether a track contains no vocals
    • liveness: Detects the presence of an audience in the recordings
    • speechiness: Detects the presence of spoken words in a track
    • key: The key the track is in. Integers map to pitches using standard Pitch Class notation.
    • tempo: The overall estimated tempo of a track in beats per minute (BPM)
    • mode: Modality of the track
    • duration_ms: The length of the track in milliseconds
    • time_signature: An estimated overall time signature of a track
    • popularity: A score between 0 and 100, with 100 being the most popular
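    With the columns above, a simple analysis such as filtering for highly danceable tracks needs only the standard library. A sketch over synthetic CSV content standing in for the dataset's real file:

```python
import csv
import io

def danceable_tracks(csv_text, threshold=0.8):
    """Return (track_name, danceability) pairs above a threshold.

    Column names follow the attribute list above; the CSV content
    passed in here is synthetic, for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(r["track_name"], float(r["danceability"]))
            for r in reader if float(r["danceability"]) > threshold]

# Synthetic rows -- not actual values from the dataset.
sample = """track_name,danceability,energy
Song A,0.91,0.70
Song B,0.55,0.88
Song C,0.83,0.61
"""
print(danceable_tracks(sample))  # [('Song A', 0.91), ('Song C', 0.83)]
```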

    Possible Data Projects

    • Trends Analysis
    • Genre Popularity
    • Mood and Music
    • Comparison with other tracks

    Disclaimer and Responsible Use:

    • This dataset, derived from Spotify's "Top Tracks of 2023" playlist, is intended for educational, research, and analysis purposes only. Users are urged to use this data responsibly and ethically.
    • Users should comply with Spotify's Terms of Service and Developer Policies when using this dataset.
    • The dataset includes music track information such as names and artist details, which are subject to copyright. While the dataset presents this information for analytical purposes, it does not convey any rights to the music itself.
    • Users of the dataset must ensure that their use does not infringe on the rights of copyright holders. Any analysis, distribution, or derivative work should respect the intellectual property rights of all parties and comply with applicable laws.
    • The dataset is provided "as is," without warranty, and the creator disclaims any legal liability for the use of the dataset by others. Users are responsible for ensuring their use of the dataset is legal and ethical.
    • For the most accurate and up-to-date information regarding Spotify's music, playlists, and policies, users are encouraged to refer directly to Spotify's official website. This ensures that users have access to the latest details directly from the source.
    • The creator/maintainer of this dataset is not affiliated with Spotify, any third-party entities, or artists mentioned within the dataset. This project is independent and has not been authorized, sponsored, or otherwise approved by Spotify or any other mentioned entities.

    Contribution

    I encourage users who discover new insights, propose dataset enhancements, or craft analytics that illuminate aspects of the dataset's focus to share their findings with the community. - Kaggle Notebooks: To facilitate sharing and collaboration, users are encouraged to create and share their analyses through Kaggle notebooks. For ease of use, start your notebook by clicking "New Notebook" atop this dataset’s page on K...

  6. Project Sunroof

    • console.cloud.google.com
    Updated Aug 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Google%20Project%20Sunroof (2017). Project Sunroof [Dataset]. https://console.cloud.google.com/marketplace/product/project-sunroof/project-sunroof
    Explore at:
    Dataset updated
    Aug 15, 2017
    Dataset provided by
    Google (http://google.com/)
    Description

    As the price of installing solar has gotten less expensive, more homeowners are turning to it as a possible option for decreasing their energy bill. We want to make installing solar panels easy and understandable for anyone. Project Sunroof puts Google's expansive data in mapping and computing resources to use, helping calculate the best solar plan for you. How does it work? When you enter your address, Project Sunroof looks up your home in Google Maps and combines that information with other databases to create your personalized roof analysis. Don’t worry, Project Sunroof doesn't give the address to anybody else. Learn more and see the tool at Project Sunroof’s site. Project Sunroof computes how much sunlight hits roofs in a year, based on shading calculations, typical meteorological data, and estimates of the size and shape of the roofs. More details about how solar viability is determined are available in the project's methodology. This public dataset is hosted in Google BigQuery and is included in BigQuery's free tier: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  7. Coastal dataset including exposure and vulnerability layers, Deliverable 3.1...

    • zenodo.org
    Updated Nov 25, 2023
    + more versions
    Cite
    E. Ieronymidi; D. Grigoriadis (2023). Coastal dataset including exposure and vulnerability layers, Deliverable 3.1 - ECFAS Project (GA 101004211), www.ecfas.eu [Dataset]. http://doi.org/10.5281/zenodo.7319270
    Explore at:
    Dataset updated
    Nov 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    E. Ieronymidi; D. Grigoriadis
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The European Copernicus Coastal Flood Awareness System (ECFAS) project aimed at contributing to the evolution of the Copernicus Emergency Management Service (https://emergency.copernicus.eu/) by demonstrating the technical and operational feasibility of a European Coastal Flood Awareness System. Specifically, ECFAS provides a much-needed solution to bolster coastal resilience to climate risk and reduce population and infrastructure exposure by monitoring and supporting disaster preparedness, two factors that are fundamental to damage prevention and recovery if a storm hits.

    The ECFAS Proof-of-Concept development ran from January 2021 to December 2022. The ECFAS project was a collaboration between Scuola Universitaria Superiore IUSS di Pavia (Italy, ECFAS Coordinator), Mercator Ocean International (France), Planetek Hellas (Greece), Collecte Localisation Satellites (France), Consorzio Futuro in Ricerca (Italy), Universitat Politecnica de Valencia (Spain), University of the Aegean (Greece), and EurOcean (Portugal), and was funded by the European Commission H2020 Framework Programme within the call LC-SPACE-18-EO-2020 - Copernicus evolution: research activities in support of the evolution of the Copernicus services.

    Description of the files contained in the dataset.

    The ECFAS Coastal Dataset represents a single access point to publicly available pan-European datasets that provide key information for studying coastal areas. The publicly available datasets listed below have been clipped to the coastal area extent, quality-checked, and assessed for completeness and usability in terms of coverage, accuracy, specifications, and access. The dataset is divided at the European country level, except for the Adriatic area, which was extracted as a region rather than by country due to the small size of its countries. Each dataset was buffered 10 km inland so that it could be correlated with the new Copernicus Coastal Zone LU/LC product.

    Specifically, the dataset includes the new Coastal LU/LC product, which was implemented by the EEA and became available at the end of 2020. Additional information was collected on the location and characteristics of transport (road and railway) and utility networks (power plants), and on population density and its variability over time. Furthermore, some of the publicly available datasets used in CEMS for the above-mentioned assets were gathered, such as OpenStreetMap (building footprints, road and railway network infrastructures), GeoNames (populated places, but also names of administrative units, rivers and lakes, forests, hills and mountains, parks and recreational areas, etc.), and the Global Human Settlement Layer (GHS) and Global Human Settlement Population Grid (GHS-POP) generated by JRC. The dataset also contains two layers with statistical information on the population of Europe by sex and age, divided into administrative units at NUTS level 3: the first layer covers the whole of Europe, and the second only the coastal area. Finally, the dataset includes the global database of flood protection standards (FLOPROS). The tables below present the dataset.

    * Adriatic folder contains the countries: Slovenia, Croatia, Montenegro, Albania, Bosnia and Herzegovina

    * Malta was added to the dataset

    Copernicus Land Monitoring Service:

    Coastal LU/LC

    Scale 1:10.000; A Copernicus hotspot product to monitor landscape dynamics in coastal zones

    EU-Hydro - Coastline

    Scale 1:30.000; EU-Hydro is a dataset for all European countries providing the coastline

    Natura 2000

    Scale 1: 100000; A Copernicus hotspot product to monitor important areas for nature conservation

    European Settlement Map

    Resolution 10m; A spatial raster dataset that is mapping human settlements in Europe

    Imperviousness Density

    Resolution 10m; The percentage of sealed area

    Impervious Built-up

    Resolution 10m; The part of the sealed surfaces where buildings can be found

    Grassland 2018

    Resolution 10m; A binary grassland/non-grassland product

    Tree Cover Density 2018

    Resolution 10m; Level of tree cover density in a range from 0-100%

    Joint Research Center:

    Global Human Settlement Population Grid
    (GHS-POP)

    Resolution 250m; Residential population estimates for target year 2015

    GHS settlement model layer
    (GHS-SMOD)

    Resolution 1km; The GHS Settlement Model grid delineates and classifies settlement typologies via a logic of population size, population density, and built-up area density

    GHS-BUILT

    Resolution 10m; Built-up grid derived from Sentinel-2 global image composite for reference year 2018

    ENACT 2011 Population Grid

    (ENACT-POP R2020A)

    Resolution 1km; ENACT is a population density grid for the European Union that takes into account major daily and monthly population variations

    JRC Open Power Plants Database (JRC-PPDB-OPEN)

    Europe's open power plant database

    GHS functional urban areas
    (GHS-FUA R2019A)

    Resolution 1km; City and its commuting zone (area of influence of the city in terms of labour market flows)

    GHS Urban Centre Database
    (GHS-UCDB R2019A)

    Resolution 1km; Urban Centres defined by specific cut-off values on resident population and built-up surface

    Additional Data:

    Open Street Map (OSM)

    BF, Transportation Network, Utilities Network, Places of Interest

    CEMS

    Data from Rapid Mapping activations in Europe

    GeoNames

    Populated places, Adm. units, Hydrography, Forests, Hills/Mountains, Parks, etc.

    Global Administrative Areas

    Administrative areas of all countries, at all levels of sub-division

    NUTS3 Population Age/Sex Group

    Eurostat population by age and sex statistics intersected with the NUTS3 units

    FLOPROS

    A global database of FLOod PROtection Standards, which comprises information in the form of the flood return period associated with protection measures, at different spatial scales

    Disclaimer:

    ECFAS partners provide the data "as is" and "as available" without warranty of any kind. The ECFAS partners shall not be held liable for any consequences resulting from the use of the information and data provided.

    This project has received funding from the Horizon 2020 research and innovation programme under grant agreement No. 101004211

  8. The Items Dataset

    • zenodo.org
    Updated Nov 13, 2024
    Cite
    Patrick Egan (2024). The Items Dataset [Dataset]. http://doi.org/10.5281/zenodo.10964134
    Explore at:
    Dataset updated
    Nov 13, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Egan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset originally created 03/01/2019
    UPDATE: Packaged on 04/18/2019
    UPDATE: Edited README on 04/18/2019

    I. About this Data Set This data set is a snapshot of work that is ongoing as a collaboration between Kluge Fellow in Digital Studies, Patrick Egan and an intern at the Library of Congress in the American Folklife Center. It contains a combination of metadata from various collections that contain audio recordings of Irish traditional music. The development of this dataset is iterative, and it integrates visualizations that follow the key principles of trust and approachability. The project, entitled, “Connections In Sound” invites you to use and re-use this data.

    The text available in the Items dataset is generated from multiple collections of audio material that were discovered at the American Folklife Center. Each instance of a performance was listed and “sets” or medleys of tunes or songs were split into distinct instances in order to allow machines to read each title separately (whilst still noting that they were part of a group of tunes). The work of the intern was then reviewed before publication, and cross-referenced with the tune index at www.irishtune.info. The Items dataset consists of just over 1000 rows, with new data being added daily in a separate file.

    The collections dataset contains at least 37 rows of collections that were located by a reference librarian at the American Folklife Center. This search was complemented by searches of the collections by the scholar both on the internet at https://catalog.loc.gov and by using card catalogs.

    Updates to these datasets will be announced and published as the project progresses.

    II. What’s included? This data set includes:

    • The Items Dataset – a .CSV containing Media Note, OriginalFormat, On Website, Collection Ref, Missing In Duplication, Collection, Outside Link, Performer, Solo/multiple, Sub-item, type of tune, Tune, Position, Location, State, Date, Notes/Composer, Potential Linked Data, Instrument, Additional Notes, Tune Cleanup. This .CSV is the direct export of the Items Google Spreadsheet

    III. How Was It Created? These data were created by a Kluge Fellow in Digital Studies and an intern on this program over the course of three months. By listening, transcribing, reviewing, and tagging audio recordings, these scholars improve access and connect sounds in the American Folklife Collections by focusing on Irish traditional music. Once transcribed and tagged, information in these datasets is reviewed before publication.

    IV. Data Set Field Descriptions

    a) Collections dataset field descriptions

    • ItemId – this is the identifier for the collection that was found at the AFC
    • Viewed – if the collection has been viewed, or accessed in any way by the researchers.
    • On LOC – whether or not there are audio recordings of this collection available on the Library of Congress website.
    • On Other Website – if any of the recordings in this collection are available elsewhere on the internet
    • Original Format – the format that was used during the creation of the recordings that were found within each collection
    • Search – the type of search that was performed in order to locate recordings and collections within the AFC
    • Collection – the official title for the collection as noted on the Library of Congress website
    • State – The primary state where recordings from the collection were located
    • Other States – The secondary states where recordings from the collection were located
    • Era / Date – The decade or year associated with each collection
    • Call Number – This is the official reference number that is used to locate the collections, both in the urls used on the Library website, and in the reference search for catalog cards (catalog cards can be searched at this address: https://memory.loc.gov/diglib/ihas/html/afccards/afccards-home.html)
    • Finding Aid Online? – Whether or not a finding aid is available for this collection on the internet

    b) Items dataset field descriptions

    • id – the specific identification of the instance of a tune, song or dance within the dataset
    • Media Note – Any information that is included with the original format, such as identification, name of physical item, additional metadata written on the physical item
    • Original Format – The physical format that was used when recording each specific performance. Note: this field is used in order to calculate the number of physical items that were created in each collection such as 32 wax cylinders.
    • On Website? – Whether or not each instance of a performance is available on the Library of Congress website
    • Collection Ref – The official reference number of the collection
    • Missing In Duplication – This column marks if parts of some recordings had been made available on other websites, but not all of the recordings were included in duplication (see recordings from Philadelphia Céilí Group on Villanova University website)
    • Collection – The official title of the collection given by the American Folklife Center
    • Outside Link – If recordings are available on other websites externally
    • Performer – The name of the contributor(s)
    • Solo/multiple – This field is used to calculate the number of solo performers versus group performers in each collection
    • Sub-item – In some cases, physical recordings contained extra details, the sub-item column was used to denote these details
    • Type of item – This column describes each individual item type, as noted by performers and collectors
    • Item – The item title, as noted by performers and collectors. If an item was not described, it was entered as “unidentified”
    • Position – The position on the recording (in some cases during playback, audio cassette player counter markers were used)
    • Location – Local address of the recording
    • State – The state where the recording was made
    • Date – The date that the recording was made
    • Notes/Composer – The stated composer or source of the item recorded
    • Potential Linked Data – If items may be linked to other recordings or data, this column was used to provide examples of potential relationships between them
    • Instrument – The instrument(s) that was used during the performance
    • Additional Notes – Notes about the process of capturing, transcribing and tagging recordings (for researcher and intern collaboration purposes)
    • Tune Cleanup – This column was used to tidy each item so that it could be read by machines, but also so that spelling mistakes from the Item column could be corrected, and as an aid to preserving iterations of the editing process
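
    As an illustration, the Items .CSV export can be parsed with Python's csv module. The rows below are invented stand-ins (only the column names come from the descriptions above), so this is a sketch, not the project's actual tooling:

```python
import csv
import io
from collections import Counter

# Toy rows shaped like the Items dataset export; the column names follow
# the field descriptions above, the values are invented for illustration.
sample = io.StringIO(
    "Collection,Solo/multiple,Instrument\n"
    "AFC Example Collection,Solo,Fiddle\n"
    "AFC Example Collection,Multiple,Flute\n"
    "AFC Example Collection,Solo,Fiddle\n"
)

rows = list(csv.DictReader(sample))

# e.g. tally solo vs. group performances, as the Solo/multiple field intends
by_type = Counter(r["Solo/multiple"] for r in rows)
print(by_type)  # Counter({'Solo': 2, 'Multiple': 1})
```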

    V. Rights statement The text in this data set was created by the researcher and intern and can be used in many different ways under Creative Commons with attribution. All contributions to Connections In Sound are released into the public domain as they are created. Anyone is free to use and re-use this data set in any way they want, provided that reference is given to the creators of these datasets.

    VI. Creator and Contributor Information

    Creator: Connections In Sound

    Contributors: Library of Congress Labs

    VII. Contact Information Please direct all questions and comments to Patrick Egan via www.twitter.com/drpatrickegan or via his website at www.patrickegan.org. You can also get in touch with the Library of Congress Labs team via LC-Labs@loc.gov.

  9. Basketball Dataset

    • universe.roboflow.com
    zip
    Updated May 25, 2022
    Cite
    zaki (2022). Basketball Dataset [Dataset]. https://universe.roboflow.com/zaki-b86c6/basketball-jagmz/dataset/4
    Explore at:
    Available download formats: zip
    Dataset updated
    May 25, 2022
    Dataset authored and provided by
    zaki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Hit Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Sports Analysis: Coaches and analysts can use this computer vision model to track the performance of players during a game or practice session. They can get insights about precise ball movements, successful hits, and goal rates, leading to better training and strategic decisions.

    2. Highlight Generation: Sports media companies can implement the "basketball" model to automatically detect exciting moments like successful goals or impressive hits during a game. This can enable them to create instant highlights for social media, web portals, or live broadcasts, enhancing user engagement.

    3. Virtual Coaching: This model can be integrated into mobile applications or websites that offer virtual basketball coaching. Users would be able to upload their videos, and the model would provide them with feedback based on their technique, ball handling, and shooting accuracy.

    4. Smart Camera Systems: The "basketball" model can be embedded in smart cameras for sports facilities or courts. This would allow the cameras to follow the action as it happens, automatically zooming in on goals or exciting plays, thus enhancing the overall viewing experience for spectators.

    5. Basketball Simulation Games: Game developers can utilize the model's capability to recognize various aspects of a basketball game to create more realistic and engaging basketball simulation games. The AI-driven virtual players would exhibit authentic in-game actions and responses, providing a closer-to-real gaming experience to the users.

  10. spotify-million-song-dataset

    • huggingface.co
    Updated Jun 16, 2024
    Cite
    Vishnu Priya VR (2024). spotify-million-song-dataset [Dataset]. https://huggingface.co/datasets/vishnupriyavr/spotify-million-song-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 16, 2024
    Authors
    Vishnu Priya VR
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for Spotify Million Song Dataset

      Dataset Summary
    

    This is the Spotify Million Song Dataset. It contains song names, artist names, links to the songs, and lyrics. It can be used for recommending, classifying, or clustering songs.
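
    As a toy sketch of the clustering/recommendation use, two lyric strings can be compared with a bag-of-words cosine similarity. This function and the sample strings are purely illustrative and not part of the dataset; real systems would use TF-IDF or embeddings:

```python
from collections import Counter
from math import sqrt

def lyric_similarity(a, b):
    """Cosine similarity between two lyric strings using raw word counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

print(lyric_similarity("la la love", "love la"))  # ~0.949
print(lyric_similarity("abc", "xyz"))            # 0.0
```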

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    [More Information Needed]

      Data… See the full description on the dataset page: https://huggingface.co/datasets/vishnupriyavr/spotify-million-song-dataset.
    
  11. UMD-350MB: Refined MIDI Dataset for Symbolic Music Generation

    • zenodo.org
    • data.niaid.nih.gov
    Updated Dec 27, 2024
    Cite
    Patchbanks (2024). UMD-350MB: Refined MIDI Dataset for Symbolic Music Generation [Dataset]. http://doi.org/10.5281/zenodo.13126590
    Explore at:
    Dataset updated
    Dec 27, 2024
    Dataset provided by
    Patchbanks
    Description

    UMD-350MB

    The Universal MIDI Dataset 350MB (UMD-350MB) is a proprietary collection of 85,618 MIDI files curated for research and development within our organization. This collection is a subset sampled from a larger dataset developed for pretraining symbolic music models.

    The field of symbolic music generation is constrained by limited data compared to language models. Publicly available datasets, such as the Lakh MIDI Dataset, offer large collections of MIDI files sourced from the web. While the sheer volume of musical data might appear beneficial, the actual amount of valuable data is less than anticipated, as many songs contain less desirable melodies with erratic and repetitive events.

    The UMD-350MB employs an attention-based approach to achieve more desirable output generations by focusing on human-reviewed training examples of single-track melodies, chord progressions, leads and arpeggios with an average duration of 8 bars. This was achieved by refining the dataset over 24 months, ensuring consistent quality and tempo alignment. Moreover, the dataset is normalized by setting the timing information to 120 BPM with a tick resolution (PPQ) of 96 and transposing the musical scales to C major and A minor (natural scales).
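
    The transposition step described above can be sketched as follows. This is a minimal illustration of the idea, not the curators' actual tooling; the helper and its behavior are assumptions:

```python
# Shift MIDI note numbers so a major key's tonic maps to C
# (relative minors then land on A automatically).
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def transpose_to_c(notes, key_root):
    """Shift MIDI note numbers so `key_root` (e.g. 'D') maps to C,
    using the smaller of the upward/downward shifts."""
    shift = -NOTE_NAMES.index(key_root) % 12  # semitones up to reach C
    if shift > 6:
        shift -= 12  # prefer the nearer direction
    return [n + shift for n in notes]

print(transpose_to_c([62, 66, 69], 'D'))  # D major triad -> [60, 64, 67]
```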

    Melody Styles

    A major portion of the dataset is composed of newly produced private data to represent modern musical styles.

    • Pop: 1970s to 2020s Pop music
    • EDM: Trance, House, Synthwave, Dance, Arcade
    • Jazz: Bebop, Ballad, Latin-Jazz, Bossa-Jazz, Ragtime
    • Soul: 80s Classic, Neo-Soul, Latin-Soul
    • Urban: Pop, Hip-Hop, Trap, R&B, Afrobeat
    • World: Latin, Bossa Nova, European
    • Other: Film, Cinematic, Game music and piano references

    Actual MIDI files are unlabeled for unsupervised training.

    Dataset Access

    Please note that this is a closed-source dataset with very limited access. Considerations for access include proposals for data augmentation, chord extraction and other enhancement methods, whether through scripts, algorithmic techniques, manual editing in a DAW, or additional processing.

    For inquiries about this dataset, please email us.

  12. Million Song Dataset Subset

    • academictorrents.com
    bittorrent
    Updated Oct 12, 2015
    Cite
    Thierry Bertin-Mahieux and Daniel P.W. Ellis and Brian Whitman and Paul Lamere (2015). Million Song Dataset Subset [Dataset]. https://academictorrents.com/details/e0b6b5ff012fcda7c4a14e4991d8848a6a2bf52b
    Explore at:
    Available download formats: bittorrent (1994614463)
    Dataset updated
    Oct 12, 2015
    Dataset authored and provided by
    Thierry Bertin-Mahieux and Daniel P.W. Ellis and Brian Whitman and Paul Lamere
    License

    https://academictorrents.com/nolicensespecified

    Description

    To let you get a feel for the dataset without committing to a full download, we also provide a subset consisting of 10,000 songs (1%, 1.8 GB) selected at random. It contains "additional files" (SQLite databases) in the same format as those for the full set, but referring only to the 10K song subset. Therefore, you can develop code on the subset, then port it to the full dataset.

    The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are:

    • To encourage research on algorithms that scale to commercial sizes
    • To provide a reference dataset for evaluating research
    • To serve as a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
    • To help new researchers get started in the MIR field

    The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however…
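
    Since the subset's "additional files" are SQLite databases, a first step when developing code against them is simply listing their tables. The in-memory schema below is an invented stand-in, not the dataset's actual schema:

```python
import sqlite3

# In-memory stand-in for one of the subset's SQLite "additional files"
# (table name and columns are invented for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE songs (track_id TEXT, title TEXT)")
con.execute("INSERT INTO songs VALUES ('TR_example', 'Example')")

# Listing tables works the same way against the real files on disk
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['songs']
```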

  13. Data from: Da-TACOS: A Dataset for Cover Song Identification and...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Apr 27, 2021
    Cite
    Furkan Yesiler; Furkan Yesiler; Chris Tralie; Albin Correya; Diego F. Silva; Philip Tovstogan; Emilia Gómez; Xavier Serra; Chris Tralie; Albin Correya; Diego F. Silva; Philip Tovstogan; Emilia Gómez; Xavier Serra (2021). Da-TACOS: A Dataset for Cover Song Identification and Understanding [Dataset]. http://doi.org/10.5281/zenodo.4717628
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 27, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Furkan Yesiler; Furkan Yesiler; Chris Tralie; Albin Correya; Diego F. Silva; Philip Tovstogan; Emilia Gómez; Xavier Serra; Chris Tralie; Albin Correya; Diego F. Silva; Philip Tovstogan; Emilia Gómez; Xavier Serra
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We present Da-TACOS: a dataset for cover song identification and understanding. It contains two subsets, namely the benchmark subset (for benchmarking cover song identification systems) and the cover analysis subset (for analyzing the links among cover songs), with pre-extracted features and metadata for 15,000 and 10,000 songs, respectively. The annotations included in the metadata are obtained with the API of SecondHandSongs.com. All audio files we use to extract features are encoded in MP3 format and their sample rate is 44.1 kHz. Da-TACOS does not contain any audio files. For the results of our analyses on modifiable musical characteristics using the cover analysis subset and our initial benchmarking of 7 state-of-the-art cover song identification algorithms on the benchmark subset, you can look at our publication.

    For organizing the data, we use the structure of SecondHandSongs where each song is called a ‘performance’, and each clique (cover group) is called a ‘work’. Based on this, the file names of the songs are their unique performance IDs (PID, e.g. P_22), and their labels with respect to their cliques are their work IDs (WID, e.g. W_14).

    Metadata for each song includes

    • performance title,
    • performance artist,
    • work title,
    • work artist,
    • release year,
    • SecondHandSongs.com performance ID,
    • SecondHandSongs.com work ID,
    • whether the song is instrumental or not.

    In addition, we matched the original metadata with MusicBrainz to obtain MusicBrainz IDs (MBIDs), song lengths and genre/style tags. We would like to note that MusicBrainz-related information is not available for all the songs in Da-TACOS, and since we used just our metadata for matching, we include all possible MBIDs for a particular song.

    For facilitating reproducibility in cover song identification (CSI) research, we propose a framework for feature extraction and benchmarking in our supplementary repository: acoss. The feature extraction component is designed to help CSI researchers find the most commonly used features for CSI in a single place. The parameter values we used to extract the features in Da-TACOS are shared in the same repository. Moreover, the benchmarking component includes our implementations of 7 state-of-the-art CSI systems. We provide the performance results of an initial benchmarking of those 7 systems on the benchmark subset of Da-TACOS. We encourage other CSI researchers to contribute to acoss by implementing their favorite feature extraction algorithms and their CSI systems, building up a knowledge base where CSI research can reach larger audiences.

    The instructions for how to download and use the dataset are shared below. Please contact us if you have any questions or requests.

    1. Structure

    1.1. Metadata

    We provide two metadata files that contain information about the benchmark subset and the cover analysis subset. Both metadata files are stored as python dictionaries in .json format, and have the same hierarchical structure.

    An example to load the metadata files in python:

    import json
    
    with open('./da-tacos_metadata/da-tacos_benchmark_subset_metadata.json') as f:
      benchmark_metadata = json.load(f)
    

    The python dictionary obtained with the code above will have the respective WIDs as keys. Each key will provide the song dictionaries that contain the metadata regarding the songs that belong to their WIDs. An example can be seen below:

    "W_163992": { # work id
      "P_547131": { # performance id of the first song belonging to the clique 'W_163992'
        "work_title": "Trade Winds, Trade Winds",
        "work_artist": "Aki Aleong",
        "perf_title": "Trade Winds, Trade Winds",
        "perf_artist": "Aki Aleong",
        "release_year": "1961",
        "work_id": "W_163992",
        "perf_id": "P_547131",
        "instrumental": "No",
        "perf_artist_mbid": "9bfa011f-8331-4c9a-b49b-d05bc7916605",
        "mb_performances": {
          "4ce274b3-0979-4b39-b8a3-5ae1de388c4a": {
            "length": "175000"
          },
          "7c10ba3b-6f1d-41ab-8b20-14b2567d384a": {
            "length": "177653"
          }
        }
      },
      "P_547140": { # performance id of the second song belonging to the clique 'W_163992'
        "work_title": "Trade Winds, Trade Winds",
        "work_artist": "Aki Aleong",
        "perf_title": "Trade Winds, Trade Winds",
        "perf_artist": "Dodie Stevens",
        "release_year": "1961",
        "work_id": "W_163992",
        "perf_id": "P_547140",
        "instrumental": "No"
      }
    }
    
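    Building on the structure above, one can, for example, compute clique sizes from the loaded metadata. The sketch below uses a hand-made stand-in dict with the same shape rather than the real file:

```python
# Stand-in for the loaded benchmark metadata (same shape as the example above)
benchmark_metadata = {
    "W_163992": {
        "P_547131": {"perf_artist": "Aki Aleong", "release_year": "1961"},
        "P_547140": {"perf_artist": "Dodie Stevens", "release_year": "1961"},
    }
}

# Number of performances (covers) in each work (clique)
clique_sizes = {wid: len(perfs) for wid, perfs in benchmark_metadata.items()}
print(clique_sizes)  # {'W_163992': 2}
```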

    1.2. Pre-extracted features

    The list of features included in Da-TACOS can be seen below. All the features are extracted with acoss repository that uses open-source feature extraction libraries such as Essentia, LibROSA, and Madmom.

    To facilitate the use of the dataset, we provide two options regarding the file structure.

    1- In da-tacos_benchmark_subset_single_files and da-tacos_coveranalysis_subset_single_files folders, we organize the data based on their respective cliques, and one file contains all the features for that particular song.

    {
      "chroma_cens": numpy.ndarray,
      "crema": numpy.ndarray,
      "hpcp": numpy.ndarray,
      "key_extractor": {
        "key": numpy.str_,
        "scale": numpy.str_,
        "strength": numpy.float64
      },
      "madmom_features": {
        "novfn": numpy.ndarray,
        "onsets": numpy.ndarray,
        "snovfn": numpy.ndarray,
        "tempos": numpy.ndarray
      },
      "mfcc_htk": numpy.ndarray,
      "tags": list of (numpy.str_, numpy.str_),
      "label": numpy.str_,
      "track_id": numpy.str_
    }
    
    
    

    2- In da-tacos_benchmark_subset_FEATURE and da-tacos_coveranalysis_subset_FEATURE folders, the data is organized based on their cliques as well, but each of these folders contain only one feature per song. For instance, if you want to test your system that uses HPCP features, you can download da-tacos_benchmark_subset_hpcp to access the pre-computed HPCP features. An example for the contents in those files can be seen below:

    {
      "hpcp": numpy.ndarray,
      "label": numpy.str_,
      "track_id": numpy.str_
    }
    
    
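    Once such a file is deserialized into the dict shown above, the frame-level HPCP matrix can be collapsed into a single 12-bin global profile, a common first step before comparing two songs in cover identification. The shapes and values below are assumptions for illustration:

```python
import numpy as np

# Hypothetical per-song feature dict matching the structure above;
# an HPCP matrix is typically frames x 12 pitch classes.
song = {
    "hpcp": np.random.rand(200, 12),
    "label": "W_163992",
    "track_id": "P_547131",
}

# Average over frames, then normalize to the maximum bin
global_hpcp = song["hpcp"].mean(axis=0)
global_hpcp = global_hpcp / global_hpcp.max()
print(global_hpcp.shape)  # (12,)
```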

    2. Using the dataset

    2.1. Requirements

    • Python 3.6+
    • Create virtual environment and install requirements
      git clone https://github.com/MTG/da-tacos.git
      cd da-tacos
      python3 -m venv venv
      source venv/bin/activate
      pip install -r requirements.txt
      

    2.2. Downloading the data

    The dataset is currently stored only in Google Drive (it will be uploaded to Zenodo soon), and can be downloaded from this link. We also provide a python script that automatically downloads the folders you specify. Basic usage of this script can be seen below:

    python download_da-tacos.py -h
    

    usage: download_da-tacos.py [-h]
                  [--dataset {metadata,benchmark,coveranalysis,da-tacos}]
                  [--type {single_files,cens,crema,hpcp,key,madmom,mfcc,tags} [{single_files,cens,crema,hpcp,key,madmom,mfcc,tags} ...]]
                  [--source {gdrive,zenodo}]
                  [--outputdir OUTPUTDIR] [--unpack] [--remove]

    Download script for Da-TACOS

    optional arguments:
      -h, --help            show this help message and exit
      --dataset {metadata,benchmark,coveranalysis,da-tacos}
                            which subset to download. 'da-tacos' option downloads
                            both subsets. the options other than 'metadata' will
                            download the metadata as well. (default: metadata)
      --type {single_files,cens,crema,hpcp,key,madmom,mfcc,tags} [{single_files,cens,crema,hpcp,key,madmom,mfcc,tags} ...]
                            which folder to download. for downloading multiple
                            folders, you can enter multiple arguments (e.g.
                            '--type cens crema'). for detailed explanation, please
                            check https://mtg.github.io/da-tacos/
                            (default: single_files)
      --source {gdrive,zenodo}
                            from which source to download the files. you can
                            either download from Google Drive (gdrive) or from
                            Zenodo (zenodo) (default: gdrive)
      --outputdir OUTPUTDIR
                            directory to store the dataset (default: ./)

  14. MTG-QBH: Query By Humming dataset

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Cite
    J. Salamon; J. Salamon; J. Serrà; J. Serrà; E. Gómez; E. Gómez (2020). MTG-QBH: Query By Humming dataset [Dataset]. http://doi.org/10.5281/zenodo.1290712
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    J. Salamon; J. Salamon; J. Serrà; J. Serrà; E. Gómez; E. Gómez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes 118 recordings of sung melodies. The recordings were made as part of the experiments on Query-by-Humming (QBH) reported in the following article:

    J. Salamon, J. Serrà and E. Gómez, "Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming", International Journal of Multimedia Information Retrieval, special issue on Hybrid Music Information Retrieval, In Press (accepted Nov. 2012).

    The recordings were made by 17 different subjects, 9 female and 8 male, whose musical experience ranged from none at all to amateur musicians. Subjects were presented with a list of songs, out of which they were asked to select the ones they knew and sing part of the melody. The subjects were aware that the recordings would be used as queries in an experiment on QBH. There was no restriction as to how much of the melody should be sung or which part of the melody should be sung, and the subjects were allowed to sing the melody with or without lyrics. The subjects did not listen to the original songs before recording the queries, and the recordings were all sung a cappella, without any accompaniment or reference tone. To simulate a realistic QBH scenario, all recordings were made using a basic laptop microphone, and no post-processing was applied. The duration of the recordings ranges from 11 to 98 seconds, with an average recording length of 26.8 seconds.

    In addition to the query recordings, three meta-data files are included, one describing the queries and two describing the music collections against which the queries were tested in the experiments described in the aforementioned article. Whilst the query recordings are included in this dataset, audio files for the music collections listed in the meta-data files are NOT included in this dataset, as they are protected by copyright law. If you wish to reproduce the experiments reported in the aforementioned paper, it is up to you to obtain the original audio files of these songs.

    All subjects have given their explicit approval for this dataset to be made public.

    Please Acknowledge MTG-QBH in Academic Research

    Using this dataset

    When the MTG-QBH dataset is used for academic research, we would highly appreciate it if scientific publications of works partly based on the MTG-QBH dataset cited the above publication.

    We are interested in knowing if you find our datasets useful! If you use our dataset please email us at mtg-info@upf.edu and tell us about your research.

    https://www.upf.edu/web/mtg/mtg-qbh

  15. PanDDA-analyzed data of a crystallographic fragment screening of...

    • b2find.eudat.eu
    Updated Jul 20, 2025
    Cite
    (2025). PanDDA-analyzed data of a crystallographic fragment screening of F2X-Universal Library vs. AR - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/1643327e-c863-50c3-9a95-401f89073e68
    Explore at:
    Dataset updated
    Jul 20, 2025
    Description

    A crystallographic fragment screening (CFS) has been performed on a spliceosomal yeast protein-protein complex of Aar2 and the RNaseH-like domain of Prp8 (AR). The F2X-Universal Library is a fragment library representing the commercially available chemical space of fragments. 917 fragments have been individually screened via crystal soaking. The datasets that could be successfully auto-processed and auto-refined were subjected to a Pan-Dataset Density Analysis (PanDDA) (Pearce et al., 2017), which makes it possible to find low-occupancy binders.

    The data has been analyzed in two ways: once with all datasets given as input to PanDDA, and once with the data clustered via cluster4x (Ginn, 2020) and the individual clusters given as input to PanDDA. After the analysis, in total 269 hits could be identified in the PanDDA event maps. Most fragments cluster in certain regions on the protein surface, which are termed binding sites; ten binding sites were identified. Some of these binding sites overlap with known protein-protein interaction sites, while others have no known function. These novel binding sites could be potential interaction sites too. Furthermore, due to the repeated binding of individual fragment hits in the same binding site, certain structural overlaps between fragments could be observed. These confirm binding modes, which offers additional information for further compound development.

    The data provided here includes all folders of the different PanDDA runs in individual tar.gz files. For each run, the input (auto-refined structures, fragment structure files) and output data (PanDDA models and event and Z-maps) are given. Additionally, a directory overview as a PDF is provided to help navigate through the data. With this data, every identified fragment hit can be inspected individually.

  16. Spotify Million Playlist: Recsys Challenge 2018 Dataset

    • explore.openaire.eu
    Updated Sep 27, 2018
    Cite
    AIcrowd (2018). Spotify Million Playlist: Recsys Challenge 2018 Dataset [Dataset]. http://doi.org/10.5281/zenodo.6425593
    Explore at:
    Dataset updated
    Sep 27, 2018
    Authors
    AIcrowd
    Description

    Spotify Million Playlist Dataset Challenge

    Summary

    The Spotify Million Playlist Dataset Challenge consists of a dataset and an evaluation task to enable research in music recommendation. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or an initial set of tracks in a playlist, predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendation, and no prizes will be awarded (other than bragging rights).

    Background

    Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes. Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, states that 54% of consumers say that playlists are replacing albums in their listening habits. But our users don’t just love listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.

    The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe their playlists? By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.

    Dataset

    To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018 and received 1,467 submissions from 410 teams. A summary of the challenge and the top-scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.

    In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.

    Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.

    The challenge dataset contains 1,000 examples of each scenario:

    • Title only (no tracks)
    • Title and first track
    • Title and first 5 tracks
    • First 5 tracks only
    • Title and first 10 tracks
    • First 10 tracks only
    • Title and first 25 tracks
    • Title and 25 random tracks
    • Title and first 100 tracks
    • Title and 100 random tracks

    Full details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
    Download link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files

    References: C.W. Chen, P. Lamere, M. Schedl, and H. Zamani. RecSys Challenge 2018: Automatic Music Playlist Continuation. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18), 2018.
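    As a toy illustration of the automatic playlist continuation task, here is a minimal popularity baseline: it is a sketch of ours, not an official challenge baseline, and playlists are represented simply as lists of track IDs.

    ```python
    from collections import Counter

    def popularity_baseline(train_playlists, seed_tracks, k=5):
        """Recommend the k globally most popular tracks absent from the seed.

        A deliberately simple continuation baseline: count how often each
        track appears across the training playlists, then return the most
        frequent tracks that the seed playlist does not already contain.
        """
        counts = Counter(t for pl in train_playlists for t in pl)
        ranked = [t for t, _ in counts.most_common()]
        seen = set(seed_tracks)
        return [t for t in ranked if t not in seen][:k]

    # Tiny made-up training set of three playlists.
    train = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
    print(popularity_baseline(train, seed_tracks=["a"], k=2))  # ['b', 'c']
    ```

    Real submissions would, of course, also exploit playlist titles and track metadata; this only shows the shape of the prediction interface.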

  17. Dataset of psychophysiological data from children with learning difficulties who strengthen reading and math skills through assistive technology

    • openneuro.org
    Updated May 29, 2025
    Cite
    César E. Corona-González; Claudia Rebeca De Stefano-Ramos; Juan Pablo Rosado-Aíza; David I. Ibarra-Zarate; Fabiola R. Gómez-Velázquez; Luz María Alonso-Valerdi (2025). Dataset of psychophysiological data from children with learning difficulties who strengthen reading and math skills through assistive technology [Dataset]. http://doi.org/10.18112/openneuro.ds006260.v1.0.1
    Explore at:
    Dataset updated
    May 29, 2025
    Dataset provided by
    OpenNeurohttps://openneuro.org/
    Authors
    César E. Corona-González; Claudia Rebeca De Stefano-Ramos; Juan Pablo Rosado-Aíza; David I. Ibarra-Zarate; Fabiola R. Gómez-Velázquez; Luz María Alonso-Valerdi
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    README

    Authors

    César E. Corona-González, Claudia Rebeca De Stefano-Ramos, Juan Pablo Rosado-Aíza, Fabiola R Gómez-Velázquez, David I. Ibarra-Zarate, Luz María Alonso-Valerdi

    Contact person

    César E. Corona-González

    https://orcid.org/0000-0002-7680-2953

    a00833959@tec.mx

    Project name

    Psychophysiological data from Mexican children with learning difficulties who strengthen reading and math skills by assistive technology

    Year that the project ran

    2023

    Brief overview of the tasks in the experiment

    The current dataset consists of psychometric and electrophysiological data from children with reading or math learning difficulties. These data were collected to evaluate improvements in reading or math skills resulting from using an online learning method called Smartick.

    The psychometric evaluations for children with reading difficulties encompassed: spelling tests, in which 1) orthographic and 2) phonological errors were counted; 3) reading speed, expressed in words read per minute; and 4) reading comprehension, assessed with multiple-choice questions. The last two parameters were determined according to the standards of the Ministry of Public Education (Secretaría de Educación Pública in Spanish) in Mexico. The assessments for the math difficulties group comprised: 1) an assessment of general mathematical knowledge, 2) the hit percentage, and 3) the reaction time from an arithmetical task. Additionally, selective attention and intelligence quotient (IQ) were also evaluated.

    Then, individuals underwent an EEG experimental paradigm where two conditions were recorded: 1) a 3-minute eyes-open resting state and 2) performing either reading or mathematical activities. EEG recordings from the reading experiment consisted of reading a text aloud and then answering questions about the text. Alternatively, EEG recordings from the math experiment involved the solution of two blocks with 20 arithmetic operations (addition and subtraction). Subsequently, each child was randomly subcategorized as 1) the experimental group, who were asked to engage with Smartick for three months, and 2) the control group, who were not involved with the intervention. Once the 3-month period was over, every child was reassessed as described before.

    Description of the contents of the dataset

    The dataset contains a total of 76 subjects (sub-), where two study groups were assessed: 1) reading difficulties (R) and 2) math difficulties (M). Then, each individual was subcategorized as experimental subgroup (e), where children were compromised to engage with Smartick, or control subgroup (c), where they did not get involved with any intervention.

    Every subject was followed up on for three months. During this period, each subject underwent two EEG sessions, representing the PRE-intervention (ses-1) and the POST-intervention (ses-2).

    The EEG recordings from the reading difficulties group consisted of a resting state condition (run-1) and while performing active reading and reading comprehension activities (run-2). On the other hand, EEG data from the math difficulties group was collected from a resting state condition (run-1) and when solving two blocks of 20 arithmetic operations (run-2 and run-3). All EEG files were stored in .set format. The nomenclature and description from filenames are shown below:

    Nomenclature  Description
    sub-          Subject
    M             Math group
    R             Reading group
    c             Control subgroup
    e             Experimental subgroup
    ses-1         PRE-intervention
    ses-2         POST-intervention
    run-1         EEG for baseline
    run-2         EEG for reading activity, or the first block of math
    run-3         EEG for the second block of math

    Example: the file sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set corresponds to:
    • The 11th subject from the reading difficulties group, control subgroup (sub-Rc11).
    • The EEG recording from the PRE-intervention (ses-1) while performing the reading activity (run-2).
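    The filename nomenclature above can be unpacked mechanically. A small sketch (the helper name is ours, not part of the dataset):

    ```python
    def parse_bids_name(filename):
        """Split a BIDS-style EEG filename into its key-value entities.

        E.g. 'sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set' yields the
        subject, session, task, and run, plus the trailing modality suffix.
        """
        stem = filename.rsplit(".", 1)[0]            # drop the .set extension
        parts = stem.split("_")
        entities = dict(p.split("-", 1) for p in parts if "-" in p)
        entities["suffix"] = parts[-1]               # e.g. 'eeg'
        return entities

    info = parse_bids_name("sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set")
    # info["sub"] == "Rc11": reading group (R), control subgroup (c), subject 11
    # info["ses"] == "1":    PRE-intervention session
    # info["run"] == "2":    reading activity (or the first block of math)
    ```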

    Independent variables

    • Study groups:
      • Reading difficulties
        • Control: children did not follow any intervention
        • Experimental: Children used the reading program of Smartick for 3 months
      • Math difficulties
        • Control: children did not follow any intervention
        • Experimental: Children used the math program of Smartick for 3 months
    • Condition:
      • PRE-intervention: first psychological and electroencephalographic evaluation
      • POST-intervention: second psychological and electroencephalographic evaluation

    Dependent variables

    • Psychometric data from the reading difficulties group:

      • Orthographic_ERR: number of orthographic errors.
      • Phonological_ERR: number of phonological errors.
      • Selective_Attention: score from the selective attention test.
      • Reading_Speed: reading speed in words per minute.
      • Comprehension: score on a reading comprehension task.
      • GROUP: C for the control group, E for the experimental group.
      • GENDER: M for male, F for female.
      • AGE: age at the beginning of the study.
      • IQ: intelligence quotient.
    • Psychometric data from the math difficulties group:

      • WRAT4: score from the WRAT-4 test.
      • hits: hits during the EEG acquisition [%].
      • RT: reaction time during the EEG acquisition [s].
      • Selective_Attention: score from the selective attention test.
      • GROUP: C for the control Group, E for the experimental group.
      • GENDER: M for male, F for female.
      • AGE: age at the beginning of the study.
      • IQ: intelligence quotient.

    Psychometric data can be found in the 01_Psychometric_Data.xlsx file

    • Engagement percentage within Smartick (only for experimental group)
      • These values represent the engagement percentage through Smartick.
      • Students were asked to get involved with the online method for learning for 3 months, 5 days a week.
      • Greater values than 100% denote participants who regularly logged in more than 5 days weekly.
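    The engagement figures above can be reproduced under one plausible reading of the schedule. This is a sketch under stated assumptions: the dataset does not spell out the formula, so the 13-week horizon and the ratio below are our interpretation of "5 days a week for 3 months".

    ```python
    def engagement_percentage(days_logged_in, weeks=13, target_days_per_week=5):
        """Engagement relative to the requested 5-days-a-week schedule.

        Assumed formula: actual login days divided by the expected number
        of login days (5 days/week over a ~3-month, i.e. ~13-week, period).
        Values above 100% then correspond to children who logged in more
        than 5 days per week on average.
        """
        expected = weeks * target_days_per_week
        return 100.0 * days_logged_in / expected

    print(engagement_percentage(65))   # 100.0: exactly 5 days/week
    print(engagement_percentage(78))   # 120.0: more than 5 days/week
    ```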

    Engagement percentages can be found in the 05_SessionEngagement.xlsx file

    Methods

    Subjects

    Seventy-six Mexican children between 7 and 13 years old were enrolled in this study.

    Information about the recruitment procedure

    The sample was recruited through non-profit foundations that support learning and foster care programs.

    Apparatus

    g.USBamp RESEARCH amplifier

    Initial setup

    1. Explain the task to the participant.
    2. Sign informed consent.
    3. Set up electrodes.

    Task details

    The stimuli nested folder contains all stimuli employed in the EEG experiments.

    Level 1
    • Math: images used in the math experiment.
    • Reading: images used in the reading experiment.

    Level 2
    • Math
      • POST_Operations: arithmetic operations from the POST-intervention.
      • PRE_Operations: arithmetic operations from the PRE-intervention.
    • Reading
      • POST_Reading1: text 1 and text-related comprehension questions from the POST-intervention.
      • POST_Reading2: text 2 and text-related comprehension questions from the POST-intervention.
      • POST_Reading3: text 3 and text-related comprehension questions from the POST-intervention.
      • PRE_Reading1: text 1 and text-related comprehension questions from the PRE-intervention.
      • PRE_Reading2: text 2 and text-related comprehension questions from the PRE-intervention.
      • PRE_Reading3: text 3 and text-related comprehension questions from the PRE-intervention.

    Level 3
    • Math
      • Operation01.jpg to Operation20.jpg: arithmetical operations solved during the first block of the math

  18. Replication data for GPUZIP v2.0: accelerating checkpointing on GPUs with prefetching and compression

    • redu.unicamp.br
    bin, gif, png +3
    Updated Dec 5, 2024
    Cite
    Repositório de Dados de Pesquisa da Unicamp (2024). Replication data for GPUZIP v2.0: accelerating checkpointing on GPUs with prefetching and compression [Dataset]. http://doi.org/10.25824/redu/KJ9KVA
    Explore at:
    bin, gif, png, tsv, txt, text/x-python
    Available download formats
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    Repositório de Dados de Pesquisa da Unicamp
    Dataset funded by
    Fundação de Amparo à Pesquisa do Estado de São Paulo
    Petróleo Brasileiro
    Description

    GPUZIP v2.0 Reproducibility Dataset

    This dataset provides all the necessary materials to reproduce the results presented in the GPUZIP v2.0 article. It is organized into folders, each containing a README.md.txt file that describes its contents and explains how to interpret the files.

    Note: This dataset is organized as a directory structure, so for better visualization change the "View type" to "Tree" before exploring the dataset through this web application.

    Types of Files

    The repository contains the following file types:

    • .md.txt: Markdown-formatted README files. For optimal readability, use a Markdown viewer such as VSCode; however, as a straightforward approach, any text reader (e.g., Notepad, cat, vi, nano) can also read them.
    • .*.zipfile: Compressed files (usually called .zip). Files with the .extension.zipfile format (e.g., large-mod.su.zipfile) should be unzipped to access their original format (e.g., large-mod.su). Throughout the documentation, files are always referenced by their uncompressed extensions (e.g., .su). To ensure consistency and avoid confusion, it is recommended that all .zipfile files be unzipped before exploring the repository. Hint: see the scripts below for unzipping all files.
    • .xlsx: Excel files. Compatible with LibreOffice, Google Sheets, and Numbers.
    • .par: Configuration files for proprietary RTM runs. Readable with any text editor.
    • .hdr: Header files for velocity models. Refer to Datasets/HowToReadDatasetFiles.md.txt for details.
    • .bin: Raw binary data files containing velocity models in float format. See Datasets/HowToReadDatasetFiles.md.txt for parsing instructions.
    • .data: Binary data files, similar to .bin.
    • .su: Seismic Unix files containing seismic traces. Refer to Datasets/HowToReadDatasetFiles.md.txt for details.
    • .png, .jpg, .jpeg, .gif: Rendered visuals of velocity models or diagrams.
    • .qdrep: Nsight Systems profiling files. Compatible with Nsight Systems 2024.01.1.

    Root Directory Contents

    • Datasets/: Input datasets, including velocity models, seismic traces, and configurations. Detailed information is provided in Datasets/HowToReadDatasetFiles.md.txt.
    • DataWarmUp/: Results from compressor calibration experiments, including raw data, logs, and the compiled .xlsx summaries. Experiments were conducted with two shots. See DataWarmUp/README.md.txt for more information.
    • GeometryScript/: Utility script for rendering shot distributions in the datasets. Helpful for visualizing experiment setups.
    • NSight/: A subset of Nsight profiling files for the Marmousi3D dataset, covering all compressors and a cache size of two across all checkpointing algorithms. If needed, contact the authors for additional profiling data.
    • Quality/: Results for all shots for quality assessment (Section 7.6). See Quality/README.md.txt.
    • TimeBreakdown/: Complete results for Section 7.4 of the GPUZIP v2.0 article, including detailed breakdowns of two-shot experiments. See TimeBreakdown/README.md.txt for details.
    • SpeedupAndMemory.xlsx: Comprehensive data used to generate the charts in Figure 6 and Table 4 (Sections 7.2 and 7.1) of the article.

    Extra: Util for Unzipping All Files

    We provide a simple script to unzip all files so that data exploration can be more fluid. Feel free to use it.

    Windows (.bat):

        @echo off
        setlocal enabledelayedexpansion
        for /r %%f in (*.zipfile) do (
            echo Decompressing: %%f
            powershell -Command "Expand-Archive -Path '%%f' -DestinationPath '%%~dpf' -Force"
            if not errorlevel 1 (
                echo Decompressed successfully: %%f
                del "%%f"
            ) else (
                echo Failed to decompress: %%f
            )
        )
        echo All zip files processed.
        pause

    Shell script (macOS, Linux, Unix):

        #!/bin/bash
        find . -type f -name "*.zipfile" | while read -r zipfile; do
            echo "Decompressing: $zipfile"
            unzip -o "$zipfile" -d "$(dirname "$zipfile")"
            if [ $? -eq 0 ]; then
                echo "Successfully decompressed: $zipfile"
                rm "$zipfile"
            else
                echo "Failed to decompress: $zipfile"
            fi
        done
        echo "All zip files processed."

    How Do I Read .bin, .data, and .su Files?
    See: Datasets/HowToReadDatasetFiles.md.txt

    How Do I Read .par and .hdr Files?
    See: Datasets/HowToReadDatasetFiles.md.txt

    How to Interpret Log Files?

    To analyze cache hits, misses, and memory consumption, refer to the logs in the TimeBreakdown folder (decom-*.txt files). Key metrics can be extracted as follows:

    • Cache hits: search for RET_HIT.
    • Cache misses: search for RET_MIS.
    • Prefetched items: search for ===> Prefetching:.
    • Prefetch Action Vector (PAV): search for PAV:.
    • Memory consumption: search for [MEM_TRACK].
    • Checkpoint pool size: search for Checkpoint Pool Size.

    Each log file concludes with a summary from Nsight.
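    The marker-based log inspection described above is easy to script. A minimal sketch (the sample log lines are illustrative, not taken from the dataset):

    ```python
    from collections import Counter

    # Markers documented for the TimeBreakdown decom-*.txt logs.
    MARKERS = {
        "cache_hits": "RET_HIT",
        "cache_misses": "RET_MIS",
        "prefetched": "===> Prefetching:",
        "memory_track": "[MEM_TRACK]",
    }

    def summarize_log(text):
        """Count how many log lines contain each documented marker."""
        counts = Counter()
        for line in text.splitlines():
            for name, marker in MARKERS.items():
                if marker in line:
                    counts[name] += 1
        return counts

    sample = "RET_HIT block 3\nRET_MIS block 4\nRET_HIT block 5\n[MEM_TRACK] 12 MiB\n"
    print(summarize_log(sample))  # cache_hits=2, cache_misses=1, memory_track=1
    ```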

  19. mCLOUD Metadatenkatalog

    • processor1.francecentral.cloudapp.azure.com
    • data.europa.eu
    Updated Aug 8, 2022
    + more versions
    Cite
    Federal Ministry for Digital Affairs and Transport (BMDV) (2022). mCLOUD Metadatenkatalog [Dataset]. http://processor1.francecentral.cloudapp.azure.com/dataset/mcloud-metadatenkatalog
    Explore at:
    Dataset updated
    Aug 8, 2022
    Dataset provided by
    Federal Ministry of Transport and Digital Infrastructurehttp://www.bmvi.de/
    License

    Data licence Germany – Attribution – Version 2.0https://www.govdata.de/dl-de/by-2-0
    License information was derived automatically

    Description

    The BMDV open data portal mCLOUD offers an export interface (REST API) through which the data can be exported as RDF according to the DCAT-AP.de specification, or as CSV.

    Export as DCAT-AP.de in RDF/XML:
    Basic path: https://mcloud.de/export/datasets
    Export as CSV:
    Basic path: https://mcloud.de/export/csv/datasets
    Parameters:

    The parameters in the requests are based on the parameters in the portal for a remote search (URL).
    At the end of a hit page in the portal, the export is always offered. So one possibility is to search the portal as normal and then copy the export URL at the end of a page.

    Individual data set
    A single data set can be retrieved by appending the UUID.
    E.g. https://mcloud.de/export/datasets/922e436b-2f0d-42d7-b3f4-528debab8b87
    This export is directly available in the mCLOUD in the data record as a "link to the metadata".
    Predefined filters:

    All data sets that have been added in the last 24 hours:
    filter=newdatasets
    https://mcloud.de/export/datasets?filter=newdatasets

    All datasets that were changed in the last 24 hours (also includes newly added sets):
    filter=modifieddatasets
    https://mcloud.de/export/datasets?filter=modifieddatasets

    Paging (default):

    pageSize=10 (number of sentences on one page)
    page=1 (display first page)
    https://mcloud.de/export/datasets?page=1&pageSize=10

    The DCAT-AP.de export always includes navigation information at the beginning:
    itemsPerPage (= pageSize parameter)
    totalItems (total number)
    firstPage (= first page for page parameter)
    lastPage (= last page for the page parameter)

    Search term:
    query=vehicle
    https://mcloud.de/export/datasets?query=Vehicle
    Search facet:
    aggs=...
    After that, the facet is specified exactly as in the portal request. Please note the coding:
    format%3ACSV = type of access "CSV"
    categories%3Aroads = category "road"
    format%3ACSV%40%40categories%3Aroads = type of access "CSV" AND category "road"

    Together:
    aggs=format%3ACSV%40%40categories%3Aroads
    https://mcloud.de/export/datasets?aggs=format%3ACSV%40%40categories%3Aroads

    Here is the corresponding search in the portal, which you can use as a guide:
    https://mcloud.de/web/guest/suche/-/results/filter/auto/format%3ACSV%40%40categories%3Aroads/0
    At the end of the page there is also the link (as RDF):
    https://mcloud.de/export/datasets?page=1&pageSize=1147&sortOrder=desc&sortField=latest&aggs=format%3ACSV%40%40categories%3Aroads
    Sort field:
    If not specified, results are sorted by dataset ID
    sortField=relevance (relevance)
    sortField=latest (most recent)
    Sort order:
    sortOrder=asc (ascending, default)
    sortOrder=desc (descending)
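    Putting the documented parameters together, a small helper for building export URLs might look like this. A sketch under stated assumptions: the function name and defaults are ours; the base path, parameter names, and the '@@' facet separator (which URL-encodes to %40%40) come from the description above.

    ```python
    from urllib.parse import urlencode

    BASE = "https://mcloud.de/export/datasets"

    def export_url(page=1, page_size=10, query=None, preset_filter=None, aggs=None):
        """Build an mCLOUD DCAT-AP.de export URL from the documented parameters."""
        params = {"page": page, "pageSize": page_size}
        if query:
            params["query"] = query
        if preset_filter:                      # e.g. "newdatasets", "modifieddatasets"
            params["filter"] = preset_filter
        if aggs:
            # Facets are joined with '@@', which urlencode escapes to %40%40.
            params["aggs"] = "@@".join(aggs)
        return BASE + "?" + urlencode(params)

    # All CSV-format datasets in the "roads" category, as in the example above:
    print(export_url(aggs=["format:CSV", "categories:roads"]))
    ```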

  20. One Direction All Songs With Lyrics

    • kaggle.com
    Updated Nov 25, 2024
    Cite
    Saksham Nanda (2024). One Direction All Songs With Lyrics [Dataset]. https://www.kaggle.com/datasets/mllion/one-direction-all-songs-with-lyrics
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Kaggle
    Authors
    Saksham Nanda
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    In the Loving Memory 💐 of Liam Payne (1993-2024)


    Dataset Description:

    1. S.No.
       • Description: Serial number representing the index of each song entry in the dataset.
       • Usability: Acts as a unique identifier for each row but does not provide analytical value for exploration or modeling.
    2. Song
       • Description: Name of the song by One Direction.
       • Usability: Essential for song-level analysis, such as identifying trends or filtering data by specific songs.
    3. Artist(s)
       • Description: The performing artist(s) for the song.
       • Usability: Useful for verifying the artist's contributions if the dataset expands to include collaborations with other artists. Can be used for grouping or filtering data.
    4. Writer(s)
       • Description: Names of the writers who contributed to the song.
       • Usability: Provides insight into creative contributors, allowing for analysis of recurring writers or studying patterns in songwriting styles.
    5. Album(s)
       • Description: Album(s) in which the song was included.
       • Usability: Useful for album-level aggregation, release patterns, or analyzing thematic or stylistic evolution across albums.
    6. Year
       • Description: The year the song was released.
       • Usability: Critical for temporal analysis, such as studying trends over time, release frequency, or the evolution of lyrics and themes.
    7. Lyrics
       • Description: Full text of the song's lyrics.
       • Usability: Valuable for natural language processing tasks like sentiment analysis, thematic exploration, keyword extraction, or lyrical comparison across songs.

    Acknowledgements Wikipedia, ChatGPT, genius.com

    OPEN FOR COLLABORATION: - I am open to collaborating with anyone who wants to add features to this dataset or knows how to collect data using APIs (for instance, the Spotify API for developers)

Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample

Google Analytics Sample

Google Analytics Sample (BigQuery)

Explore at:
20 scholarly articles cite this dataset (View in Google Scholar)
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Googlehttp://google.com/
Authors
Google BigQuery
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

• Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
• Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
• Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
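    The real-bounce-rate definition above (the percentage of visits with a single pageview, per traffic source) can be checked on toy data. A sketch: the (traffic_source, pageviews) pair representation is illustrative, not the BigQuery session schema.

    ```python
    from collections import defaultdict

    def real_bounce_rate(sessions):
        """Real bounce rate per traffic source.

        `sessions` is a list of (traffic_source, pageviews) pairs. For each
        source, the rate is 100 * (visits with exactly one pageview) / visits.
        """
        totals = defaultdict(int)
        bounces = defaultdict(int)
        for source, pageviews in sessions:
            totals[source] += 1
            if pageviews == 1:
                bounces[source] += 1
        return {s: 100.0 * bounces[s] / totals[s] for s in totals}

    toy = [("google", 1), ("google", 5), ("direct", 1), ("direct", 1)]
    print(real_bounce_rate(toy))  # {'google': 50.0, 'direct': 100.0}
    ```

    On the actual dataset, the same aggregation would be expressed in SQL over the ga_sessions_* tables rather than computed in Python.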
