63 datasets found
  1. Recipe Site Traffic: Analysis & Prediction

    • kaggle.com
    Updated Sep 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    Kaggle
    Authors
    Michael Matta
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.

    Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
    PPTX Presentation

    Recipe Site Traffic

    From: Head of Data Science
    Received: Today
    Subject: New project from the product team

    Hey!

    I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

    I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

    They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

    You can find more details about what I expect you to do here. And information on the data here.

    I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

    Good Luck!

    From: Product Manager - Recipe Discovery
    To: Head of Data Science
    Received: Yesterday
    Subject: Can you help us predict popular recipes?

    Hi,

    We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

    At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

    Can your team: - Predict which recipes will lead to high traffic? - Correctly predict high traffic recipes 80% of the time?

    We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

    Look forward to seeing your presentation.

    About Tasty Bytes

    Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

    Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

    Example Recipe

    This is an example of how a recipe may appear on the website, we haven't included all of the steps but you should get an idea of what visitors to the site see.

    Tomato Soup

    Servings: 4
    Time to make: 2 hours
    Category: Lunch/Snack
    Cost per serving: $

    Nutritional Information (per serving) - Calories 123 - Carbohydrate 13g - Sugar 1g - Protein 4g

    Ingredients: - Tomatoes - Onion - Carrot - Vegetable Stock

    Method: 1. Cut the tomatoes into quarters….

    Data Information

    The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

    As you will see, they haven't given us all of the information they have about each recipe.

    You can find the data here.

    I will let you decide how to process it, just make sure you include all your decisions in your report.

    Don't forget to double check the data really does match what they say - it might not.

    Column NameDetails
    recipeNumeric, unique identifier of recipe
    caloriesNumeric, number of calories
    carbohydrateNumeric, amount of carbohydrates in grams
    sugarNumeric, amount of sugar in grams
    proteinNumeric, amount of prote...
  2. E-commerce - Users of a French C2C fashion store

    • kaggle.com
    Updated Feb 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    Kaggle
    Authors
    Jeffrey Mvutu Mabilama
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    Foreword

    This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc...).

    My Telegram bot will answer your queries and allow you to contact me.

    Context

    There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

    Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other user's articles).

    This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try and understand what you can expect of your users and determine in advance how your grows may be.

    • For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

    If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to suits more your needs.

    This dataset is part of a preview of a much larger dataset. Please contact me for more.

    Content

    The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

    Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Questions you might want to answer using this dataset:

    • Are e-commerce users interested in social network feature ?
    • Are my users active enough (compared to those of this dataset) ?
    • How likely are people from other countries to sign up in a C2C website ?
    • How many users are likely to drop off after years of using my service ?

    Example works:

    • Report(s) made using SQL queries can be found on the data.world page of the dataset.
    • Notebooks may be found on the Kaggle page of the dataset.

    License

    CC-BY-NC-SA 4.0

    For other licensing options, contact me.

  3. d

    COVID-19 Test Sites

    • catalog.data.gov
    • s.cnmilf.com
    Updated Mar 31, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    City of Philadelphia (2025). COVID-19 Test Sites [Dataset]. https://catalog.data.gov/dataset/covid-19-test-sites
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    City of Philadelphia
    Description

    A dataset of COVID-19 testing sites. A dataset of COVID-19 testing sites. If looking for a test, please use the Testing Sites locator app. You will be asked for identification and will also be asked for health insurance information. Identification will be required to receive a test. If you don’t have health insurance, you may still be able to receive a test by paying out-of-pocket. Some sites may also: - Limit testing to people who meet certain criteria. - Require an appointment. - Require a referral from your doctor. Check a location’s specific details on the map. Then, call or visit the provider’s website before going for a test.

  4. Facebook users worldwide 2017-2027

    • statista.com
    • de.statista.com
    • +2more
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    The global number of Facebook users was forecast to continuously increase between 2023 and 2027 by in total 391 million users (+14.36 percent). After the fourth consecutive increasing year, the Facebook user base is estimated to reach 3.1 billion users and therefore a new peak in 2027. Notably, the number of Facebook users was continuously increasing over the past years. User figures, shown here regarding the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  5. g

    GiGL Spaces to Visit

    • gimi9.com
    • ckan.publishing.service.gov.uk
    • +1more
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GiGL Spaces to Visit [Dataset]. https://gimi9.com/dataset/uk_gigl-spaces-to-visit/
    Explore at:
    Description

    🇬🇧 United Kingdom English Introduction The GiGL Spaces to Visit dataset provides locations and boundaries for open space sites in Greater London that are available to the public as destinations for leisure, activities and community engagement. It includes green corridors that provide opportunities for walking and cycling. The dataset has been created by Greenspace Information for Greater London CIC (GiGL). As London’s Environmental Records Centre, GiGL mobilises, curates and shares data that underpin our knowledge of London’s natural environment. We provide impartial evidence to support informed discussion and decision making in policy and practice. GiGL maps under licence from the Greater London Authority. Description This dataset is a sub-set of the GiGL Open Space dataset, the most comprehensive dataset available of open spaces in London. Sites are selected for inclusion in Spaces to Visit based on their public accessibility and likelihood that people would be interested in visiting. The dataset is a mapped Geographic Information System (GIS) polygon dataset where one polygon (or multi-polygon) represents one space. As well as site boundaries, the dataset includes information about a site’s name, size and type (e.g. park, playing field etc.). GiGL developed the Spaces to Visit dataset to support anyone who is interested in London’s open spaces - including community groups, web and app developers, policy makers and researchers - with an open licence data source. More detailed and extensive data are available under GiGL data use licences for GIGL partners, researchers and students. Information services are also available for ecological consultants, biological recorders and community volunteers – please see www.gigl.org.uk for more information. Please note that access and opening times are subject to change (particularly at the current time) so if you are planning to visit a site check on the local authority or site website that it is open. The dataset is updated on a quarterly basis. If you have questions about this dataset please contact GiGL’s GIS and Data Officer. Data sources The boundaries and information in this dataset, are a combination of data collected during the London Survey Method habitat and open space survey programme (1986 – 2008) and information provided to GiGL from other sources since. These sources include London borough surveys, land use datasets, volunteer surveys, feedback from the public, park friends’ groups, and updates made as part of GiGL’s on-going data validation and verification process. Due to data availability, some areas are more up-to-date than others. We are continually working on updating and improving this dataset. If you have any additional information or corrections for sites included in the Spaces to Visit dataset please contact GiGL’s GIS and Data Officer. NOTE: The dataset contains OS data © Crown copyright and database rights 2025. The site boundaries are based on Ordnance Survey mapping, and the data are published under Ordnance Survey's 'presumption to publish'. When using these data please acknowledge GiGL and Ordnance Survey as the source of the information using the following citation: ‘Dataset created by Greenspace Information for Greater London CIC (GiGL), 2025 – Contains Ordnance Survey and public sector information licensed under the Open Government Licence v3.0 ’

  6. g

    Alexa, International Top 100 Websites, Global, 10.12.2007

    • geocommons.com
    Updated Apr 29, 2008
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alexa (2008). Alexa, International Top 100 Websites, Global, 10.12.2007 [Dataset]. http://geocommons.com/search.html
    Explore at:
    Dataset updated
    Apr 29, 2008
    Dataset provided by
    data
    Alexa
    Description

    This Dataset shows the Alexa Top 100 International Websites, and provides metrics on the volume of traffic that these sites were able to handle. The Alexa top 100 lists the 100 most visited websites in the world and measures various statistical information. I have looked up the Headquarters, either through alexa, or a Whois Lookup to get street address with i was then able to geocode. I was only able to successfully geocode 85 of the top 100 sites throughout the world. Source of Data was Alexa.com, Source URL: http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none Data was from October 12, 2007. Alexa is updated daily so to get more up to date information visit their site directly. they don't have maps though.

  7. Average daily time spent on social media worldwide 2012-2024

    • statista.com
    • es.statista.com
    • +2more
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stacy Jo Dixon, Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    How much time do people spend on social media?

                  As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in
                  the U.S. was just two hours and 16 minutes. Global social media usageCurrently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively.
                  People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events friends. Global impact of social mediaSocial media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general.
                  During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased a polarization in politics and heightened everyday distractions.
    
  8. Number of internet users worldwide 2014-2029

    • statista.com
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    World
    Description

    The global number of internet users in was forecast to continuously increase between 2024 and 2029 by in total 1.3 billion users (+23.66 percent). After the fifteenth consecutive increasing year, the number of users is estimated to reach 7 billion users and therefore a new peak in 2029. Notably, the number of internet users of was continuously increasing over the past years.Depicted is the estimated number of individuals in the country or region at hand, that use the internet. As the datasource clarifies, connection quality and usage frequency are distinct aspects, not taken into account here.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of internet users in countries like the Americas and Asia.

  9. p

    Lithuania Number Dataset

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    List to Data (2025). Lithuania Number Dataset [Dataset]. https://listtodata.com/lithuania-dataset
    Explore at:
    .csv, .xls, .txtAvailable download formats
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Lithuania
    Variables measured
    phone numbers, Email Address, full name, Address, City, State, gender,age,income,ip address,
    Description

    Lithuania number dataset is a database of phone numbers collected from trusted sources. This means the numbers come from reliable places like government records, websites, or phone companies. The companies that provide this data work hard to ensure it is correct. They even offer source URLs, so you can see where the data came from. Moreover, you get 24/7 support, so if you have questions, help is always available. List to Data is a helpful website for finding important cell numbers quickly. Additionally, the phone numbers in the Lithuania number dataset follow an opt-in system. This means people agreed to share their phone numbers. This system is important because it keeps the data legal. It ensures that you are only contacting people who have given permission. Number data in Lithuania makes it easy to connect with the right people. Lithuania phone data is a special set of phone numbers that you can filter to meet your needs. You can easily filter the list by gender, age, and relationship status. For example, you can quickly sort the data to contact older adults or young singles easily. This flexibility makes it easier to communicate with the right audience. Therefore, you can connect with the people you want to reach. Also, the Lithuanian phone data follows strict GDPR rules. These rules protect people’s privacy and make sure their information stays safe. We collect and use the database of Lithuania in ways that respect everyone’s rights. Additionally, it removes any invalid numbers. You can find important phone numbers easily on our website, List to Data. Lithuania phone number list is a collection of phone numbers from people living in Lithuania. This list is completely correct and valid, meaning all numbers work properly. Companies check every phone number to ensure it is accurate. If you find a number that doesn’t work, you can get a new one for free. Moreover, Lithuania phone number list is about all numbers from authorized customers. People on this list agreed to share their numbers. As a result, you can use the data without worrying about legal issues. This makes the phonebook safe and useful for businesses that want to connect with people in Lithuania.

  10. p

    Italy Number Dataset

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    List to Data (2025). Italy Number Dataset [Dataset]. https://listtodata.com/italy-dataset
    Explore at:
    .csv, .xls, .txtAvailable download formats
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Belgium, Italy
    Variables measured
    phone numbers, Email Address, full name, Address, City, State, gender,age,income,ip address,
    Description

    Italy number dataset includes phone numbers that businesses can trust. The dataset comes from reliable sources, ensuring accuracy. These sources collect numbers from various places, such as public records and directories. You can also find source URLs, which help you verify where the data came from. This adds another layer of credibility to the information. Additionally, this data provides 24/7 support. This is important for businesses that need quick answers. Furthermore, this Italy number dataset follows an opt-in process. This means every person whose number appears in the list agreed to have their number shared. They understand how we will use their information, making it safe to contact them. With this number dataset, businesses gain access to trustworthy and reliable information. List to Data is a website that helps you quickly find important phone numbers. Italy phone data is a valuable database that allows businesses to filter information based on specific needs. This means you can filter the data by gender, age, and relationship status. For example, businesses can easily find numbers for younger people to reach that age group. This ability to filter information makes communication more effective. You can focus on the audience that matters most to you. Moreover, you can remove invalid Italy phone data from the list. That means if any number becomes inactive, you can take it out. Keeping only active numbers helps ensure that your contacts are always up-to-date. This process makes it easy to get up-to-date info regularly. The ability to filter, remove invalid data, and stay GDPR compliant makes this data powerful for organizations. Italy phone number list is a collection of phone numbers from people living in Italy. This list is very useful for businesses and organizations that want to reach out to these individuals. The numbers in this list are 100% correct and valid. This means that every number works, so businesses can call confidently. If any number does not work, you receive a replacement guarantee. Furthermore, every number in the Italy phone number list comes from a customer permission basis. This means that people on the list agreed to have their phone numbers shared. By using this list, businesses can effectively connect with the right people while keeping everything legal and safe. The valid numbers and replacement guarantee make this list an excellent tool for outreach.

  11. Audio Commons Estimation Results Data for deliverables D4.4, D4.10 and D4.12...

    • zenodo.org
    • data.europa.eu
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Frederic Font; Frederic Font (2020). Audio Commons Estimation Results Data for deliverables D4.4, D4.10 and D4.12 [Dataset]. http://doi.org/10.5281/zenodo.2546643
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Frederic Font; Frederic Font
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of running the automatic audio annotation algorithms for pitch, tempo and key used for the evaluation of algorithms developed during the AudioCommons H2020 EU project and which are part of the Audio Commons Audio Extractor tool. It also includes estimation results information for the single-eventness audio descriptor also developed for the same tool.

    These estimation results data has been used to generate the following documents:

    • Deliverable D4.4: Evaluation report on the first prototype tool for the automatic semantic description of music samples
    • Deliverable D4.10: Evaluation report on the second prototype tool for the automatic semantic description of music samples
    • Deliverable D4.12: Release of tool for the automatic semantic description of music samples

    All these documents are available in the materials section of the AudioCommons website.

    All data in this repository is provided in the form of CSV files. Each CSV file corresponds to the analysis results of one musical task and one of the individual datasets used in the aforementioned deliverables. This repository does not include the audio files of each individual dataset, but includes references to the audio files. The following paragraphs describe the structure of the CSV files and give some notes about how to obtain the audio files in case these would be needed.


    Structure of the CSV files

    All the CSV files in this repository (with the sole exception of SINGLE EVENT - Estimation Results Truth.csv) are named according to the following convention: "DATASET_NAME - ESTIMATION_TASK Estimation Results.csv". Therefore, estimation results for pitch, tempo and tonality music tasks are separated in different files. All these files share the same structure for the first 2 CSV columns:

    1. Audio reference: reference to the corresponding audio file. This will either be a string withe the filename, or the Freesound ID (for one dataset based on Freesound content). See below for details about how to obtain those files.
    2. Audio reference type: will be one of Filename or Freesound ID, and specifies how the previous column should be interpreted.

    The rest of the columns include the estimation results for each one of the algorithms included in the evaluation of each music facet. For each algorithms two columns are reserved, the first one containing the actual estimation and the second one the confidence of this estimation (see CSV file previews below). The format of actual estimations depends on the musical task, check the description of the corresponding ground truth dataset for more information on that. The confidence value is a float number, typically in the range from 0.0 to 1.0. It can happen that one or both columns are empty for a given analysis algorithm and CSV row. This will be the case if the algorithm could not successfully produce an estimation for the audio file row corresponding to the CSV row.

    The remaining CSV file, SINGLE EVENT - Estimation Results.csv, has the following 4 columns:

    • Freesound ID: sound ID used in Freesound to identify the audio clip.
    • ACExtractorV2: single-eventness estimation of the algorithm included in the second version of the Audio Commons Audio Extractor tool (bool).
    • ACExtractorV2-opt: single-eventness estimation of the algorithm included in the second version of the Audio Commons Audio Extractor tool with optimized parameters (bool).
    • ACExtractorV3: single-eventness estimation of the algorithm included in the third version of the Audio Commons Audio Extractor tool (bool).

    How to get the audio data

    In this section we provide some notes about how to obtain the audio files corresponding to the estimation results provided here. Note that due to licensing restrictions we are not allowed to re-distribute the audio data corresponding to most of these automatic annotations.

    • Apple Loops (APPL): This dataset includes some of the music loops included in Apple's music software such as Logic or GarageBand. Access to these loops requires owning a license for the software. Detailed instructions about how to set up this dataset are provided here.
    • Carlos Vaquero Instruments Dataset (CVAQ): This dataset includes single instrument recordings carried out by Carlos Vaqueroas part of this master thesis. Sounds are available as Freesound packs and can be downloaded at this page: https://freesound.org/people/Carlos_Vaquero/packs
    • Freesound Loops 4k (FSL4): This dataset set includes a selection of music loops taken from Freesound. Detailed instructions about how to set up this dataset are provided here.
    • Giant Steps Key Dataset (GSKY): This dataset includes a selection of previews from Beatport annotated by key. Audio and original annotations available here.
    • Good-sounds Dataset (GSND): This dataset contains monophonic recordings of instrument samples. Full description, original annotations and audio are available here.
    • University of IOWA Musical Instrument Samples (IOWA): This dataset was created by the Electronic Music Studios of the University of IOWA and contains recordings of instrument samples. The dataset is available upon request by visiting this website.
    • Mixcraft Loops (MIXL): This dataset includes some of the music loops included in Acoustica's Mixcraft music software. Access to these loops requires owning a license for the software. Detailed instructions about how to set up this dataset are provided here.
    • NSynth Dataset Test and Validation sets (NSYT and NSYV): NSynth is a large-scale and high-quality dataset of annotated musical notes built with synthesized sounds by Google's Magenta team. Full dataset description including original annotations and audio files is available here.
    • Philarmonia Orchestra Sound Samples Dataset (PHIL): This includes thousands of free, downloadable sound samples specially recorded by Philharmonia Orchestra players. Audio files are freely downloadable from the philarmonia orchestra website.
    • Freesound Single Events Dataset (SINGLE EVENT): This includes a selection of Freesound audio clips representing audio signals containing either a single audio eventor multiple ones. Original audio files can be retrieved by downloading individual audio clips from Freesound using the ID identifier provided in the CSV file. A similar procedure to that described here could be followed.
  12. Council building information - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 17, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ckan.publishing.service.gov.uk (2016). Council building information - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/council-building-information
    Explore at:
    Dataset updated
    Jun 17, 2016
    Dataset provided by
    CKANhttps://ckan.org/
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A dataset providing information about local council services in Leeds. Leeds City Council uses this information to populate the Knowledge Panels on the Google search website. The dataset includes type of service, contact information and opening times. What is a Knowledge Panel? When people search for a business on Google, they may see information about that business in a box that appears to the right of their search results. The information in the box, called the Knowledge Panel, can help customers discover and contact your business. Is the information correct?

  13. Access to Mental Health

    • hub.arcgis.com
    • share-open-data-njtpa.hub.arcgis.com
    Updated Dec 4, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Urban Observatory by Esri (2018). Access to Mental Health [Dataset]. https://hub.arcgis.com/maps/07f70065653b4386b5c87cbe9b50b314
    Explore at:
    Dataset updated
    Dec 4, 2018
    Dataset provided by
    Esrihttp://esri.com/
    Authors
    Urban Observatory by Esri
    Area covered
    Description

    This map shows the access to mental health providers in every county and state in the United States according to the 2024 County Health Rankings & Roadmaps data for counties, states, and the nation. It translates the numbers to explain how many additional mental health providers are needed in each county and state. According to the data, in the United States overall there are 319 people per mental health provider in the U.S. The maps clearly illustrate that access to mental health providers varies widely across the country.The data comes from this County Health Rankings 2024 layer. An updated layer is usually published each year, which allows comparisons from year to year. This map contains layers for 2024 and also for 2022 as a comparison. County Health Rankings & Roadmaps (CHR&R), a program of the University of Wisconsin Population Health Institute with support provided by the Robert Wood Johnson Foundation, draws attention to why there are differences in health within and across communities by measuring the health of nearly all counties in the nation. This map's layers contain 2024 CHR&R data for nation, state, and county levels. The CHR&R Annual Data Release is compiled using county-level measures from a variety of national and state data sources. CHR&R provides a snapshot of the health of nearly every county in the nation. A wide range of factors influence how long and how well we live, including: opportunities for education, income, safe housing and the right to shape policies and practices that impact our lives and futures. Health Outcomes tell us how long people live on average within a community, and how people experience physical and mental health in a community. Health Factors represent the things we can improve to support longer and healthier lives. They are indicators of the future health of our communities. Some example measures are:Life ExpectancyAccess to Exercise OpportunitiesUninsuredFlu VaccinationsChildren in PovertySchool Funding AdequacySevere Housing Cost BurdenBroadband AccessTo see a full list of variables, definitions and descriptions, explore the Fields information by clicking the Data tab here in the Item Details of this layer. For full documentation, visit the Measures page on the CHR&R website. Notable changes in the 2024 CHR&R Annual Data Release:Measures of birth and death now provide more detailed race categories including a separate category for ‘Native Hawaiian or Other Pacific Islander’ and a ‘Two or more races’ category where possible. Find more information on the CHR&R website.Ranks are no longer calculated nor included in the dataset. CHR&R introduced a new graphic to the County Health Snapshots on their website that shows how a county fares relative to other counties in a state and nation. Data Processing:County Health Rankings data and metadata were prepared and formatted for Living Atlas use by the CHR&R team. 2021 U.S. boundaries are used in this dataset for a total of 3,143 counties. Analytic data files can be downloaded from the CHR&R website.

  14. c

    COSMO-Bench

    • kilthub.cmu.edu
    txt
    Updated Sep 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Daniel McGann; Easton Potokar; Michael Kaess (2025). COSMO-Bench [Dataset]. http://doi.org/10.1184/R1/29652158.v3
    Explore at:
    txtAvailable download formats
    Dataset updated
    Sep 15, 2025
    Dataset provided by
    Carnegie Mellon University
    Authors
    Daniel McGann; Easton Potokar; Michael Kaess
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Recent years have seen a focus on research into distributed optimization algorithms for multi-robot Collaborative Simultaneous Localization and Mapping (C-SLAM). Research in this domain, however, is made difficult by a lack of standard benchmark datasets. Such datasets have been used to great effect in the field of single-robot SLAM, and researchers focused on multi-robot problems would benefit greatly from dedicated benchmark datasets. To address this gap we design and release the Collaborative Open-Source Multi-robot Optimization Benchmark (COSMO-Bench) -- a suite of 24 datasets derived from a state-of-the-art C-SLAM front-end and real-world LiDAR data. For additional details please see our associated publication: https://arxiv.org/abs/2508.16731This entry, hosted through Carnegie Mellon University libraries, serves to host the official dataset release in perpetuity. However, we also support a website that provides a somewhat nicer user interface at cosmobench.comNOTE - Shortly after making this data available we were notified of some issues with the groundtruth of the CU-Multi data on which the kittredge and main_campus datasets are based. This issue has since been resolved and new versions of the affected datasets have been uploaded. If you are one of the handful of people that downloaded these datasets before September 15th 2025, please update to the corrected versions. To verify that you have the correct versions please see instructions in README.md

  15. Find Ryan White HIV/AIDS Medical Care Providers

    • catalog.data.gov
    • data.virginia.gov
    • +3more
    Updated Jul 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Health Resources and Services Administration, Department of Health & Human Services (2025). Find Ryan White HIV/AIDS Medical Care Providers [Dataset]. https://catalog.data.gov/dataset/find-ryan-white-hiv-aids-medical-care-providers
    Explore at:
    Dataset updated
    Jul 25, 2025
    Description

    The Find Ryan White HIV/AIDS Medical Care Providers tool is a locator that helps people living with HIV/AIDS access medical care and related services. Users can search for Ryan White-funded medical care providers near a specific complete address, city and state, state and county, or ZIP code. Search results are sorted by distance away and include the Ryan White HIV/AIDS facility name, address, approximate distance from the search point, telephone number, website address, and a link for driving directions. HRSA's Ryan White program funds an array of grants at the state and local levels in areas where most needed. These grants provide medical and support services to more than a half million people who otherwise would be unable to afford care.

  16. Number of global social network users 2017-2028

    • statista.com
    • es.statista.com
    • +2more
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    How many people use social media?

                  Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
    
                  Who uses social media?
                  Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions
                  when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
    
                  How much time do people spend on social media?
                  Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. On average, internet users in Latin America had the highest average time spent per day on social media.
    
                  What are the most popular social media platforms?
                  Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
    
  17. TfL Live Traffic Cameras - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ckan.publishing.service.gov.uk (2025). TfL Live Traffic Cameras - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/tfl-live-traffic-cameras
    Explore at:
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    CKANhttps://ckan.org/
    Description

    The live traffic camera feed provides images from 177 cameras at key sites across the Capital, showing what's happening on London's streets. All images are TfL branded, have a location description, and date and time-stamp. They are refreshed at least every three minutes. Individual feeds may be interrupted if there is a system fault or if a camera is being serviced. Images are not captured when a camera is in use for managing traffic, when a camera is being maintained or in the event of a camera or system fault. Some ideas for re-use include: Freight or delivery services could use the live feed to follow traffic traffic conditions and plan routes accordingly Radio stations could add a live camera feed to a traffic news page Organisations with staff intranets could add the traffic camera feed so people can plan their journeys home Find out more about the feeds available from Transport for London. The BBC use TFL camera images for the live camera feeds on their website. Visit the BBC website to see live camera images.

  18. ChatQA-Training-Data

    • huggingface.co
    Updated Jun 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NVIDIA (2023). ChatQA-Training-Data [Dataset]. https://huggingface.co/datasets/nvidia/ChatQA-Training-Data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 30, 2023
    Dataset provided by
    Nvidiahttp://nvidia.com/
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/

    Description

    Data Description

    We release the training dataset of ChatQA. It is built and derived from existing datasets: DROP, NarrativeQA, NewsQA, Quoref, ROPES, SQuAD1.1, SQuAD2.0, TAT-QA, a SFT dataset, as well as a our synthetic conversational QA dataset by GPT-3.5-turbo-0613. The SFT dataset is built and derived from: Soda, ELI5, FLAN, the FLAN collection, Self-Instruct, Unnatural Instructions, OpenAssistant, and Dolly. For more information about ChatQA, check the website!

      Other… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA-Training-Data.
    
  19. Mobile internet users worldwide 2020-2029

    • statista.com
    Updated Feb 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Description

    The global number of smartphone users in was forecast to continuously increase between 2024 and 2029 by in total 1.8 billion users (+42.62 percent). After the ninth consecutive increasing year, the smartphone user base is estimated to reach 6.1 billion users and therefore a new peak in 2029. Notably, the number of smartphone users of was continuously increasing over the past years.Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of smartphone users in countries like Australia & Oceania and Asia.

  20. h

    fineweb-edu

    • huggingface.co
    Updated Jan 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FineData (2025). fineweb-edu [Dataset]. http://doi.org/10.57967/hf/2497
    Explore at:
    Dataset updated
    Jan 3, 2025
    Dataset authored and provided by
    FineData
    License

    https://choosealicense.com/licenses/odc-by/https://choosealicense.com/licenses/odc-by/

    Description

    📚 FineWeb-Edu

    1.3 trillion tokens of the finest educational data the 🌐 web has to offer

    Paper: https://arxiv.org/abs/2406.17557

      What is it?
    

    📚 FineWeb-Edu dataset consists of 1.3T tokens and 5.4T tokens (FineWeb-Edu-score-2) of educational web pages filtered from 🍷 FineWeb dataset. This is the 1.3 trillion version. To enhance FineWeb's quality, we developed an educational quality classifier using annotations generated by LLama3-70B-Instruct. We then… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction
Organization logo

Recipe Site Traffic: Analysis & Prediction

Practice End-to-End Analysis of Recipe Data for Traffic Prediction

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 21, 2025
Dataset provided by
Kaggle
Authors
Michael Matta
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.

Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation

Recipe Site Traffic

From: Head of Data Science
Received: Today
Subject: New project from the product team

Hey!

I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

You can find more details about what I expect you to do here. And information on the data here.

I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

Good Luck!

From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?

Hi,

We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

Can your team: - Predict which recipes will lead to high traffic? - Correctly predict high traffic recipes 80% of the time?

We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

Look forward to seeing your presentation.

About Tasty Bytes

Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

Example Recipe

This is an example of how a recipe may appear on the website, we haven't included all of the steps but you should get an idea of what visitors to the site see.

Tomato Soup

Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $

Nutritional Information (per serving) - Calories 123 - Carbohydrate 13g - Sugar 1g - Protein 4g

Ingredients: - Tomatoes - Onion - Carrot - Vegetable Stock

Method: 1. Cut the tomatoes into quarters….

Data Information

The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

As you will see, they haven't given us all of the information they have about each recipe.

You can find the data here.

I will let you decide how to process it, just make sure you include all your decisions in your report.

Don't forget to double check the data really does match what they say - it might not.

Column NameDetails
recipeNumeric, unique identifier of recipe
caloriesNumeric, number of calories
carbohydrateNumeric, amount of carbohydrates in grams
sugarNumeric, amount of sugar in grams
proteinNumeric, amount of prote...
Search
Clear search
Close search
Google apps
Main menu