63 datasets found

Recipe Site Traffic: Analysis & Prediction

kaggle.com

Updated Sep 21, 2025

Cite

Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction


Dataset updated

Sep 21, 2025

Dataset provided by

Kaggle

Authors

Michael Matta

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset section so that readers of my notebook, or of any other notebook, can fully understand how the data was intended to be used and how the problem was framed.

Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation

Recipe Site Traffic

From: Head of Data Science
Received: Today
Subject: New project from the product team

Hey!

I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

You can find more details about what I expect you to do here, and information on the data here.

I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

Good Luck!

From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?

Hi,

We haven't met before, but I am responsible for choosing which recipes to display on the home page each day. I have heard about what the data science team is capable of, and I was wondering if you could help me choose which recipes we should display on the home page.

At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions, so this is really important to the company.

Can your team:
- Predict which recipes will lead to high traffic?
- Correctly predict high traffic recipes 80% of the time?

We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

Look forward to seeing your presentation.
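The request to correctly predict high traffic recipes 80% of the time, together with the goal of minimizing the chance of showing unpopular recipes, reads most naturally as a precision target: of the recipes flagged as high traffic, at least 80% should really be high traffic. That reading is an assumption, not something the brief states. A minimal Python sketch, assuming high-traffic recipes are labelled with the string "High" and everything else with a hypothetical "Low" label (the column table below is truncated, so the real encoding is not confirmed here):

```python
def precision(y_true, y_pred, positive="High"):
    """Fraction of recipes predicted as `positive` that truly are `positive`."""
    # Keep only the true labels of recipes the model flagged as high traffic.
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # nothing flagged: treat precision as 0 to stay conservative
    return sum(t == positive for t in flagged) / len(flagged)


# Hypothetical labels for five featured recipes (not taken from the dataset).
y_true = ["High", "High", "Low", "High", "Low"]
y_pred = ["High", "High", "High", "High", "Low"]

p = precision(y_true, y_pred)
print(f"precision = {p:.2f}, meets 80% target: {p >= 0.8}")
# prints: precision = 0.75, meets 80% target: False
```

With these made-up labels, three of the four recipes flagged as high traffic really were, so precision is 0.75 and the target is missed, even though plain accuracy over all five recipes is 0.8. Agreeing on the metric before reporting avoids that ambiguity.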

About Tasty Bytes

Tasty Bytes was founded in 2020 in the midst of the Covid pandemic. The world wanted inspiration, so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

Example Recipe

This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.

Tomato Soup

Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $

Nutritional Information (per serving):
- Calories 123
- Carbohydrate 13g
- Sugar 1g
- Protein 4g

Ingredients:
- Tomatoes
- Onion
- Carrot
- Vegetable Stock

Method:
1. Cut the tomatoes into quarters…

Data Information

The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

As you will see, they haven't given us all of the information they have about each recipe.

You can find the data here.

I will let you decide how to process it, just make sure you include all your decisions in your report.

Don't forget to double check the data really does match what they say - it might not.

Column Name	Details
recipe	Numeric, unique identifier of recipe
calories	Numeric, number of calories
carbohydrate	Numeric, amount of carbohydrates in grams
sugar	Numeric, amount of sugar in grams
protein	Numeric, amount of prote...
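The brief asks us to double check that the data really matches its description. One way to start, sketched here in plain Python with the standard `csv` module, is to test each row against the column table: recipe IDs should be numeric and unique, and the nutrition columns should parse as numbers. The table is truncated after `protein`, so only these columns are checked; the sample rows are hypothetical, and in practice you would pass a `csv.DictReader` over the downloaded file instead.

```python
import csv
import io

# Columns from the table above; the table is truncated after `protein`,
# so any further columns are not checked here.
NUMERIC_COLUMNS = ["calories", "carbohydrate", "sugar", "protein"]


def validate(rows):
    """Return human-readable problems found in an iterable of CSV row dicts."""
    problems = []
    seen_ids = set()
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        recipe_id = row.get("recipe") or ""
        if not recipe_id.isdigit():
            problems.append(f"line {line_no}: recipe id {recipe_id!r} is not numeric")
        elif recipe_id in seen_ids:
            problems.append(f"line {line_no}: duplicate recipe id {recipe_id}")
        else:
            seen_ids.add(recipe_id)
        for column in NUMERIC_COLUMNS:
            value = row.get(column) or ""
            if value == "":
                problems.append(f"line {line_no}: {column} is missing")
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {column}={value!r} is not numeric")
    return problems


# Hypothetical sample rows standing in for the real file.
sample = io.StringIO(
    "recipe,calories,carbohydrate,sugar,protein\n"
    "1,123,13,1,4\n"
    "2,,35.5,2.1,0.9\n"
    "2,310.4,abc,5.0,12.2\n"
)
for problem in validate(csv.DictReader(sample)):
    print(problem)
```

Missing values are reported rather than rejected, since deciding how to handle them is one of the processing decisions the brief asks you to document.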

E-commerce - Users of a French C2C fashion store
kaggle.com
Updated Feb 24, 2024
Cite
Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks
Dataset updated
Feb 24, 2024
Dataset provided by
Kaggle
Authors
Jeffrey Mvutu Mabilama
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
Foreword

This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc.).

My Telegram bot will answer your queries and allow you to contact me.

Context

There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).

This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you can try to understand what to expect from your users and estimate in advance how your store may grow.

For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.

This dataset is part of a preview of a much larger dataset. Please contact me for more.

Content

The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.


Inspiration

Questions you might want to answer using this dataset:

Are e-commerce users interested in social network features?

Are my users active enough (compared to those of this dataset)?

How likely are people from other countries to sign up on a C2C website?

How many users are likely to drop off after years of using my service?

Example works:

Report(s) made using SQL queries can be found on the data.world page of the dataset.

Notebooks may be found on the Kaggle page of the dataset.

License

CC-BY-NC-SA 4.0

For other licensing options, contact me.
COVID-19 Test Sites
catalog.data.gov
s.cnmilf.com
Updated Mar 31, 2025
Cite
City of Philadelphia (2025). COVID-19 Test Sites [Dataset]. https://catalog.data.gov/dataset/covid-19-test-sites
Dataset updated
Mar 31, 2025
Dataset provided by
City of Philadelphia
Description
A dataset of COVID-19 testing sites. If looking for a test, please use the Testing Sites locator app. You will be asked for identification and will also be asked for health insurance information. Identification will be required to receive a test. If you don’t have health insurance, you may still be able to receive a test by paying out-of-pocket. Some sites may also:
- Limit testing to people who meet certain criteria.
- Require an appointment.
- Require a referral from your doctor.
Check a location’s specific details on the map. Then, call or visit the provider’s website before going for a test.
Facebook users worldwide 2017-2027
statista.com
de.statista.com
Cite
Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by the same person only once. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
GiGL Spaces to Visit
gimi9.com
ckan.publishing.service.gov.uk
Cite
GiGL Spaces to Visit [Dataset]. https://gimi9.com/dataset/uk_gigl-spaces-to-visit/
Description
Introduction

The GiGL Spaces to Visit dataset provides locations and boundaries for open space sites in Greater London that are available to the public as destinations for leisure, activities and community engagement. It includes green corridors that provide opportunities for walking and cycling. The dataset has been created by Greenspace Information for Greater London CIC (GiGL). As London’s Environmental Records Centre, GiGL mobilises, curates and shares data that underpin our knowledge of London’s natural environment. We provide impartial evidence to support informed discussion and decision making in policy and practice. GiGL maps under licence from the Greater London Authority.

Description

This dataset is a subset of the GiGL Open Space dataset, the most comprehensive dataset available of open spaces in London. Sites are selected for inclusion in Spaces to Visit based on their public accessibility and the likelihood that people would be interested in visiting. The dataset is a mapped Geographic Information System (GIS) polygon dataset where one polygon (or multi-polygon) represents one space. As well as site boundaries, the dataset includes information about a site’s name, size and type (e.g. park, playing field etc.). GiGL developed the Spaces to Visit dataset to support anyone who is interested in London’s open spaces, including community groups, web and app developers, policy makers and researchers, with an open licence data source. More detailed and extensive data are available under GiGL data use licences for GiGL partners, researchers and students. Information services are also available for ecological consultants, biological recorders and community volunteers; please see www.gigl.org.uk for more information. Please note that access and opening times are subject to change (particularly at the current time), so if you are planning to visit a site, check on the local authority or site website that it is open. The dataset is updated on a quarterly basis. If you have questions about this dataset, please contact GiGL’s GIS and Data Officer.

Data sources

The boundaries and information in this dataset are a combination of data collected during the London Survey Method habitat and open space survey programme (1986 – 2008) and information provided to GiGL from other sources since. These sources include London borough surveys, land use datasets, volunteer surveys, feedback from the public, park friends’ groups, and updates made as part of GiGL’s ongoing data validation and verification process. Due to data availability, some areas are more up-to-date than others. We are continually working on updating and improving this dataset. If you have any additional information or corrections for sites included in the Spaces to Visit dataset, please contact GiGL’s GIS and Data Officer.

NOTE: The dataset contains OS data © Crown copyright and database rights 2025. The site boundaries are based on Ordnance Survey mapping, and the data are published under Ordnance Survey's 'presumption to publish'. When using these data, please acknowledge GiGL and Ordnance Survey as the source of the information using the following citation: ‘Dataset created by Greenspace Information for Greater London CIC (GiGL), 2025 – Contains Ordnance Survey and public sector information licensed under the Open Government Licence v3.0’
Alexa, International Top 100 Websites, Global, 10.12.2007
geocommons.com
Updated Apr 29, 2008
Cite
Alexa (2008). Alexa, International Top 100 Websites, Global, 10.12.2007 [Dataset]. http://geocommons.com/search.html
Dataset updated
Apr 29, 2008
Dataset provided by
data
Alexa
Description
This dataset shows the Alexa Top 100 International Websites and provides metrics on the volume of traffic that these sites were able to handle. The Alexa Top 100 lists the 100 most visited websites in the world and measures various statistical information. I looked up each site's headquarters, either through Alexa or a WHOIS lookup, to get a street address that I was then able to geocode. I was only able to successfully geocode 85 of the top 100 sites throughout the world. The source of the data was Alexa.com (source URL: http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none). Data was from October 12, 2007. Alexa is updated daily, so to get more up-to-date information, visit their site directly. They don't have maps, though.

Average daily time spent on social media worldwide 2012-2024

statista.com
es.statista.com
Cite

Stacy Jo Dixon, Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

How much time do people spend on social media?

As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.

Global social media usage

Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively.

People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

Global impact of social media

Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics and heightened everyday distractions.

Number of internet users worldwide 2014-2029
statista.com
Updated Apr 11, 2025
Cite
Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
Dataset updated
Apr 11, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
World
Description
The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion, a new peak, in 2029. Notably, the number of internet users has increased continuously over the past years. Depicted is the estimated number of individuals in the country or region at hand that use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects, not taken into account here. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of internet users in regions like the Americas and Asia.
Lithuania Number Dataset
listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Lithuania Number Dataset [Dataset]. https://listtodata.com/lithuania-dataset
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Dataset authored and provided by
List to Data
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Lithuania
Variables measured
phone numbers, Email Address, full name, Address, City, State, gender, age, income, ip address
Description
Lithuania number dataset is a database of phone numbers collected from trusted sources. This means the numbers come from reliable places like government records, websites, or phone companies. The companies that provide this data work hard to ensure it is correct. They even offer source URLs, so you can see where the data came from. Moreover, you get 24/7 support, so if you have questions, help is always available. List to Data is a helpful website for finding important cell numbers quickly. Additionally, the phone numbers in the Lithuania number dataset follow an opt-in system. This means people agreed to share their phone numbers. This system is important because it keeps the data legal. It ensures that you are only contacting people who have given permission. Number data in Lithuania makes it easy to connect with the right people. Lithuania phone data is a special set of phone numbers that you can filter to meet your needs. You can easily filter the list by gender, age, and relationship status. For example, you can quickly sort the data to contact older adults or young singles easily. This flexibility makes it easier to communicate with the right audience. Therefore, you can connect with the people you want to reach. Also, the Lithuanian phone data follows strict GDPR rules. These rules protect people’s privacy and make sure their information stays safe. We collect and use the database of Lithuania in ways that respect everyone’s rights. Additionally, it removes any invalid numbers. You can find important phone numbers easily on our website, List to Data. Lithuania phone number list is a collection of phone numbers from people living in Lithuania. This list is completely correct and valid, meaning all numbers work properly. Companies check every phone number to ensure it is accurate. If you find a number that doesn’t work, you can get a new one for free. Moreover, Lithuania phone number list is about all numbers from authorized customers. 
People on this list agreed to share their numbers. As a result, you can use the data without worrying about legal issues. This makes the phonebook safe and useful for businesses that want to connect with people in Lithuania.
Italy Number Dataset
listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Italy Number Dataset [Dataset]. https://listtodata.com/italy-dataset
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Dataset authored and provided by
List to Data
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Belgium, Italy
Variables measured
phone numbers, Email Address, full name, Address, City, State, gender, age, income, ip address
Description
Italy number dataset includes phone numbers that businesses can trust. The dataset comes from reliable sources, ensuring accuracy. These sources collect numbers from various places, such as public records and directories. You can also find source URLs, which help you verify where the data came from. This adds another layer of credibility to the information. Additionally, this data provides 24/7 support. This is important for businesses that need quick answers. Furthermore, this Italy number dataset follows an opt-in process. This means every person whose number appears in the list agreed to have their number shared. They understand how we will use their information, making it safe to contact them. With this number dataset, businesses gain access to trustworthy and reliable information. List to Data is a website that helps you quickly find important phone numbers. Italy phone data is a valuable database that allows businesses to filter information based on specific needs. This means you can filter the data by gender, age, and relationship status. For example, businesses can easily find numbers for younger people to reach that age group. This ability to filter information makes communication more effective. You can focus on the audience that matters most to you. Moreover, you can remove invalid Italy phone data from the list. That means if any number becomes inactive, you can take it out. Keeping only active numbers helps ensure that your contacts are always up-to-date. This process makes it easy to get up-to-date info regularly. The ability to filter, remove invalid data, and stay GDPR compliant makes this data powerful for organizations. Italy phone number list is a collection of phone numbers from people living in Italy. This list is very useful for businesses and organizations that want to reach out to these individuals. The numbers in this list are 100% correct and valid. This means that every number works, so businesses can call confidently. 
If any number does not work, you receive a replacement guarantee. Furthermore, every number in the Italy phone number list comes from a customer permission basis. This means that people on the list agreed to have their phone numbers shared. By using this list, businesses can effectively connect with the right people while keeping everything legal and safe. The valid numbers and replacement guarantee make this list an excellent tool for outreach.
Audio Commons Estimation Results Data for deliverables D4.4, D4.10 and D4.12...
zenodo.org
data.europa.eu
Updated Jan 24, 2020
Cite
Frederic Font (2020). Audio Commons Estimation Results Data for deliverables D4.4, D4.10 and D4.12 [Dataset]. http://doi.org/10.5281/zenodo.2546643
Unique identifier
https://doi.org/10.5281/zenodo.2546643
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Frederic Font
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the results of running the automatic audio annotation algorithms for pitch, tempo and key used for the evaluation of algorithms developed during the AudioCommons H2020 EU project and which are part of the Audio Commons Audio Extractor tool. It also includes estimation results information for the single-eventness audio descriptor also developed for the same tool.

These estimation results have been used to generate the following documents:

Deliverable D4.4: Evaluation report on the first prototype tool for the automatic semantic description of music samples

Deliverable D4.10: Evaluation report on the second prototype tool for the automatic semantic description of music samples

Deliverable D4.12: Release of tool for the automatic semantic description of music samples

All these documents are available in the materials section of the AudioCommons website.

All data in this repository is provided in the form of CSV files. Each CSV file corresponds to the analysis results of one musical task and one of the individual datasets used in the aforementioned deliverables. This repository does not include the audio files of each individual dataset, but includes references to the audio files. The following paragraphs describe the structure of the CSV files and give some notes about how to obtain the audio files in case these would be needed.

Structure of the CSV files

All the CSV files in this repository (with the sole exception of SINGLE EVENT - Estimation Results Truth.csv) are named according to the following convention: "DATASET_NAME - ESTIMATION_TASK Estimation Results.csv". Estimation results for the pitch, tempo and tonality music tasks are therefore stored in separate files. All these files share the same structure for the first 2 CSV columns:

Audio reference: reference to the corresponding audio file. This will either be a string with the filename, or the Freesound ID (for one dataset based on Freesound content). See below for details about how to obtain those files.

Audio reference type: will be one of Filename or Freesound ID, and specifies how the previous column should be interpreted.
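Because the results files follow the naming convention above, the dataset and task can be recovered from each filename. A minimal Python sketch using a regular expression; the example filenames are hypothetical, since the exact dataset labels and task spellings used in the repository are not listed here. The single-event file is the documented exception to the convention, and the function returns `None` for it.

```python
import re

# "DATASET_NAME - ESTIMATION_TASK Estimation Results.csv"
NAME_PATTERN = re.compile(
    r"^(?P<dataset>.+?) - (?P<task>.+?) Estimation Results\.csv$"
)


def parse_results_filename(filename):
    """Return (dataset, task) for a conventionally named file, else None."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("dataset"), match.group("task")


# Hypothetical filenames, for illustration only.
for name in [
    "APPL - Tempo Estimation Results.csv",
    "Freesound Loops 4k - Pitch Estimation Results.csv",
    "SINGLE EVENT - Estimation Results.csv",
]:
    print(name, "->", parse_results_filename(name))
```

The non-greedy `dataset` group stops at the first " - ", so dataset names containing spaces are kept whole.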

The rest of the columns include the estimation results for each one of the algorithms included in the evaluation of each music facet. For each algorithm, two columns are reserved: the first contains the actual estimation and the second the confidence of this estimation (see CSV file previews below). The format of the actual estimations depends on the musical task; check the description of the corresponding ground truth dataset for more information. The confidence value is a float, typically in the range 0.0 to 1.0. One or both columns may be empty for a given analysis algorithm and CSV row; this happens when the algorithm could not produce an estimation for the audio file corresponding to that row.
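This column layout (two reference columns followed by one estimate/confidence pair per algorithm, with empty cells when an algorithm produced no estimate) can be read with the standard `csv` module. In this sketch, the algorithm name is assumed to be the header of the first column in each pair; the real header names and the sample values below are hypothetical. Estimates are kept as strings because their format depends on the task.

```python
import csv
import io


def read_estimations(lines):
    """Yield (audio_reference, reference_type, results) for each CSV row.

    `results` maps an algorithm name to an (estimate, confidence) tuple, with
    None standing in for an empty cell (no estimate or no confidence).
    """
    reader = csv.reader(lines)
    header = next(reader)
    algorithm_columns = header[2:]  # everything after the two reference columns
    for row in reader:
        results = {}
        # Walk the algorithm columns two at a time: estimate, then confidence.
        for i in range(0, len(algorithm_columns) - 1, 2):
            name = algorithm_columns[i]
            estimate = row[2 + i] if 2 + i < len(row) else ""
            confidence = row[3 + i] if 3 + i < len(row) else ""
            results[name] = (
                estimate or None,
                float(confidence) if confidence else None,
            )
        yield row[0], row[1], results


# Hypothetical file contents: AlgoB produced nothing for the first clip.
sample = io.StringIO(
    "Audio reference,Audio reference type,AlgoA,AlgoA confidence,AlgoB,AlgoB confidence\n"
    "loop_001.wav,Filename,120,0.9,,\n"
    "12345,Freesound ID,95,,96,0.4\n"
)
for reference, reference_type, results in read_estimations(sample):
    print(reference, reference_type, results)
```

The `Audio reference type` value tells you whether `reference` is a filename or a Freesound ID, so downstream code can branch on it when fetching audio.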

The remaining CSV file, SINGLE EVENT - Estimation Results.csv, has the following 4 columns:

Freesound ID: sound ID used in Freesound to identify the audio clip.

ACExtractorV2: single-eventness estimation of the algorithm included in the second version of the Audio Commons Audio Extractor tool (bool).

ACExtractorV2-opt: single-eventness estimation of the algorithm included in the second version of the Audio Commons Audio Extractor tool with optimized parameters (bool).

ACExtractorV3: single-eventness estimation of the algorithm included in the third version of the Audio Commons Audio Extractor tool (bool).

How to get the audio data

In this section we provide some notes about how to obtain the audio files corresponding to the estimation results provided here. Note that due to licensing restrictions we are not allowed to re-distribute the audio data corresponding to most of these automatic annotations.

Apple Loops (APPL): This dataset includes some of the music loops included in Apple's music software such as Logic or GarageBand. Access to these loops requires owning a license for the software. Detailed instructions about how to set up this dataset are provided here.

Carlos Vaquero Instruments Dataset (CVAQ): This dataset includes single instrument recordings carried out by Carlos Vaquero as part of his master's thesis. Sounds are available as Freesound packs and can be downloaded at this page: https://freesound.org/people/Carlos_Vaquero/packs

Freesound Loops 4k (FSL4): This dataset includes a selection of music loops taken from Freesound. Detailed instructions about how to set up this dataset are provided here.

Giant Steps Key Dataset (GSKY): This dataset includes a selection of previews from Beatport annotated by key. Audio and original annotations available here.

Good-sounds Dataset (GSND): This dataset contains monophonic recordings of instrument samples. Full description, original annotations and audio are available here.

University of IOWA Musical Instrument Samples (IOWA): This dataset was created by the Electronic Music Studios of the University of IOWA and contains recordings of instrument samples. The dataset is available upon request by visiting this website.

Mixcraft Loops (MIXL): This dataset includes some of the music loops included in Acoustica's Mixcraft music software. Access to these loops requires owning a license for the software. Detailed instructions about how to set up this dataset are provided here.

NSynth Dataset Test and Validation sets (NSYT and NSYV): NSynth is a large-scale and high-quality dataset of annotated musical notes built with synthesized sounds by Google's Magenta team. Full dataset description including original annotations and audio files is available here.

Philharmonia Orchestra Sound Samples Dataset (PHIL): This includes thousands of free, downloadable sound samples specially recorded by Philharmonia Orchestra players. Audio files are freely downloadable from the Philharmonia Orchestra website.

Freesound Single Events Dataset (SINGLE EVENT): This dataset includes a selection of Freesound audio clips representing audio signals that contain either a single audio event or multiple ones. Original audio files can be retrieved by downloading individual audio clips from Freesound using the ID provided in the CSV file. A similar procedure to that described here could be followed.

Council building information - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Jun 17, 2016
ckan.publishing.service.gov.uk (2016). Council building information - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/council-building-information
Dataset updated
Jun 17, 2016
Dataset provided by
CKAN (https://ckan.org/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
A dataset providing information about local council services in Leeds. Leeds City Council uses this information to populate the Knowledge Panels on the Google search website. The dataset includes type of service, contact information and opening times. What is a Knowledge Panel? When people search for a business on Google, they may see information about that business in a box that appears to the right of their search results. The information in the box, called the Knowledge Panel, can help customers discover and contact your business. Is the information correct?
Access to Mental Health
hub.arcgis.com
share-open-data-njtpa.hub.arcgis.com
Updated Dec 4, 2018
Urban Observatory by Esri (2018). Access to Mental Health [Dataset]. https://hub.arcgis.com/maps/07f70065653b4386b5c87cbe9b50b314
Dataset updated
Dec 4, 2018
Dataset provided by
Esri (http://esri.com/)
Authors
Urban Observatory by Esri
Description
This map shows the access to mental health providers in every county and state in the United States according to the 2024 County Health Rankings & Roadmaps data for counties, states, and the nation. It translates the numbers to explain how many additional mental health providers are needed in each county and state. According to the data, in the United States overall there are 319 people per mental health provider. The maps clearly illustrate that access to mental health providers varies widely across the country. The data comes from this County Health Rankings 2024 layer. An updated layer is usually published each year, which allows comparisons from year to year. This map contains layers for 2024 and also for 2022 as a comparison.

County Health Rankings & Roadmaps (CHR&R), a program of the University of Wisconsin Population Health Institute with support provided by the Robert Wood Johnson Foundation, draws attention to why there are differences in health within and across communities by measuring the health of nearly all counties in the nation. This map's layers contain 2024 CHR&R data for nation, state, and county levels. The CHR&R Annual Data Release is compiled using county-level measures from a variety of national and state data sources. CHR&R provides a snapshot of the health of nearly every county in the nation.

A wide range of factors influence how long and how well we live, including opportunities for education, income, safe housing and the right to shape policies and practices that impact our lives and futures. Health Outcomes tell us how long people live on average within a community, and how people experience physical and mental health in a community. Health Factors represent the things we can improve to support longer and healthier lives. They are indicators of the future health of our communities.
Some example measures are:
- Life Expectancy
- Access to Exercise Opportunities
- Uninsured
- Flu Vaccinations
- Children in Poverty
- School Funding Adequacy
- Severe Housing Cost Burden
- Broadband Access

To see a full list of variables, definitions and descriptions, explore the Fields information by clicking the Data tab here in the Item Details of this layer. For full documentation, visit the Measures page on the CHR&R website.

Notable changes in the 2024 CHR&R Annual Data Release:
- Measures of birth and death now provide more detailed race categories, including a separate category for ‘Native Hawaiian or Other Pacific Islander’ and a ‘Two or more races’ category where possible. Find more information on the CHR&R website.
- Ranks are no longer calculated or included in the dataset. CHR&R introduced a new graphic to the County Health Snapshots on their website that shows how a county fares relative to other counties in a state and nation.

Data Processing: County Health Rankings data and metadata were prepared and formatted for Living Atlas use by the CHR&R team. 2021 U.S. boundaries are used in this dataset for a total of 3,143 counties. Analytic data files can be downloaded from the CHR&R website.
COSMO-Bench
kilthub.cmu.edu
Updated Sep 15, 2025
Daniel McGann; Easton Potokar; Michael Kaess (2025). COSMO-Bench [Dataset]. http://doi.org/10.1184/R1/29652158.v3
Available download formats: txt
Unique identifier
https://doi.org/10.1184/R1/29652158.v3
Dataset updated
Sep 15, 2025
Dataset provided by
Carnegie Mellon University
Authors
Daniel McGann; Easton Potokar; Michael Kaess
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Recent years have seen a focus on research into distributed optimization algorithms for multi-robot Collaborative Simultaneous Localization and Mapping (C-SLAM). Research in this domain, however, is made difficult by a lack of standard benchmark datasets. Such datasets have been used to great effect in the field of single-robot SLAM, and researchers focused on multi-robot problems would benefit greatly from dedicated benchmark datasets. To address this gap we design and release the Collaborative Open-Source Multi-robot Optimization Benchmark (COSMO-Bench), a suite of 24 datasets derived from a state-of-the-art C-SLAM front-end and real-world LiDAR data. For additional details please see our associated publication: https://arxiv.org/abs/2508.16731

This entry, hosted through Carnegie Mellon University Libraries, serves as the official dataset release in perpetuity. However, we also support a website that provides a somewhat nicer user interface at cosmobench.com.

NOTE: Shortly after making this data available we were notified of some issues with the ground truth of the CU-Multi data on which the kittredge and main_campus datasets are based. This issue has since been resolved and new versions of the affected datasets have been uploaded. If you are one of the handful of people who downloaded these datasets before September 15th 2025, please update to the corrected versions. To verify that you have the correct versions, please see the instructions in README.md.
Find Ryan White HIV/AIDS Medical Care Providers
catalog.data.gov
data.virginia.gov
Updated Jul 25, 2025
Health Resources and Services Administration, Department of Health & Human Services (2025). Find Ryan White HIV/AIDS Medical Care Providers [Dataset]. https://catalog.data.gov/dataset/find-ryan-white-hiv-aids-medical-care-providers
Dataset updated
Jul 25, 2025
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Health Resources and Services Administration (https://www.hrsa.gov/)
Description
The Find Ryan White HIV/AIDS Medical Care Providers tool is a locator that helps people living with HIV/AIDS access medical care and related services. Users can search for Ryan White-funded medical care providers near a specific complete address, city and state, state and county, or ZIP code. Search results are sorted by distance and include the Ryan White HIV/AIDS facility name, address, approximate distance from the search point, telephone number, website address, and a link for driving directions. HRSA's Ryan White program funds an array of grants at the state and local levels in areas where they are most needed. These grants provide medical and support services to more than half a million people who otherwise would be unable to afford care.

Number of global social network users 2017-2028

statista.com
es.statista.com

Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

How many people use social media?

Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

Who uses social media?
Social networking is one of the most popular digital activities worldwide, and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as less developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America had the highest average time spent per day on social media.

What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.

TfL Live Traffic Cameras - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Jun 9, 2025
ckan.publishing.service.gov.uk (2025). TfL Live Traffic Cameras - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/tfl-live-traffic-cameras
Dataset updated
Jun 9, 2025
Dataset provided by
CKAN (https://ckan.org/)
Description
The live traffic camera feed provides images from 177 cameras at key sites across the Capital, showing what's happening on London's streets. All images are TfL branded and have a location description and a date and time stamp. They are refreshed at least every three minutes. Individual feeds may be interrupted if there is a system fault or if a camera is being serviced. Images are not captured when a camera is in use for managing traffic, when a camera is being maintained, or in the event of a camera or system fault.

Some ideas for re-use include:
- Freight or delivery services could use the live feed to follow traffic conditions and plan routes accordingly.
- Radio stations could add a live camera feed to a traffic news page.
- Organisations with staff intranets could add the traffic camera feed so people can plan their journeys home.

Find out more about the feeds available from Transport for London. The BBC use TfL camera images for the live camera feeds on their website. Visit the BBC website to see live camera images.
ChatQA-Training-Data
huggingface.co
Updated Jun 30, 2023
NVIDIA (2023). ChatQA-Training-Data [Dataset]. https://huggingface.co/datasets/nvidia/ChatQA-Training-Data
Dataset updated
Jun 30, 2023
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
Other (https://choosealicense.com/licenses/other/)
Description
Data Description

We release the training dataset of ChatQA. It is built and derived from existing datasets: DROP, NarrativeQA, NewsQA, Quoref, ROPES, SQuAD1.1, SQuAD2.0, TAT-QA, an SFT dataset, as well as our synthetic conversational QA dataset generated by GPT-3.5-turbo-0613. The SFT dataset is built and derived from: Soda, ELI5, FLAN, the FLAN collection, Self-Instruct, Unnatural Instructions, OpenAssistant, and Dolly. For more information about ChatQA, check the website!

Other… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA-Training-Data.
Mobile internet users worldwide 2020-2029
statista.com
Updated Feb 5, 2025
Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
Dataset updated
Feb 5, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive year of growth, the smartphone user base is estimated to reach 6.1 billion users and therefore a new peak in 2029. Notably, the number of smartphone users has been increasing continuously over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics.

The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of smartphone users in regions like Australia & Oceania and Asia.
fineweb-edu
huggingface.co
Updated Jan 3, 2025
FineData (2025). fineweb-edu [Dataset]. http://doi.org/10.57967/hf/2497
Unique identifier
https://doi.org/10.57967/hf/2497
Dataset updated
Jan 3, 2025
Dataset authored and provided by
FineData
License
ODC-BY (https://choosealicense.com/licenses/odc-by/)
Description
📚 FineWeb-Edu

1.3 trillion tokens of the finest educational data the 🌐 web has to offer

Paper: https://arxiv.org/abs/2406.17557

What is it?

The 📚 FineWeb-Edu dataset comes in two versions of educational web pages filtered from the 🍷 FineWeb dataset: 1.3T tokens and 5.4T tokens (FineWeb-Edu-score-2). This is the 1.3 trillion token version. To enhance FineWeb's quality, we developed an educational quality classifier using annotations generated by Llama3-70B-Instruct. We then… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.

Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction

Recipe Site Traffic: Analysis & Prediction

Practice End-to-End Analysis of Recipe Data for Traffic Prediction

Dataset updated

Sep 21, 2025

Dataset provided by

Kaggle

Authors

Michael Matta

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

Recipe Site Traffic

From: Head of Data Science
Received: Today
Subject: New project from the product team

Hey!

I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

You can find more details about what I expect you to do here, and information on the data here.

I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

Good Luck!

From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?

Hi,

Can your team:
- Predict which recipes will lead to high traffic?
- Correctly predict high traffic recipes 80% of the time?

We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

Look forward to seeing your presentation.

About Tasty Bytes

Example Recipe

This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.

Tomato Soup

Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $

Nutritional Information (per serving)
- Calories 123
- Carbohydrate 13g
- Sugar 1g
- Protein 4g

Ingredients:
- Tomatoes
- Onion
- Carrot
- Vegetable Stock

Method:
1. Cut the tomatoes into quarters….

Data Information

The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

As you will see, they haven't given us all of the information they have about each recipe.

You can find the data here.

I will let you decide how to process it, just make sure you include all your decisions in your report.

Don't forget to double check the data really does match what they say - it might not.

Column Name	Details
recipe	Numeric, unique identifier of recipe
calories	Numeric, number of calories
carbohydrate	Numeric, amount of carbohydrates in grams
sugar	Numeric, amount of sugar in grams
protein	Numeric, amount of prote...
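The brief asks the analyst to double check that the data really matches the documented columns. As a hedged illustration only, the Python sketch below checks a recipe CSV against the column table above using the standard library `csv` module. It assumes the file has a header row with exactly these column names and one record per line. Because the table is truncated after `protein`, only `recipe`, `calories`, `carbohydrate`, `sugar` and `protein` are checked; the function name `check_recipe_rows` and the sample rows are made up for this example and are not part of the dataset.

```python
import csv
import io
from typing import Iterable, List

# Numeric columns documented in the table above. The table is truncated after
# "protein", so any further columns are deliberately not checked here.
NUMERIC_COLUMNS = ["calories", "carbohydrate", "sugar", "protein"]


def check_recipe_rows(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems found in a recipe CSV.

    Assumes a header row using the documented column names and one record
    per physical line (line numbers count the header as line 1).
    """
    problems: List[str] = []
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []

    # Stop early if documented columns are absent: per-row checks would fail.
    for col in ["recipe", *NUMERIC_COLUMNS]:
        if col not in header:
            problems.append(f"missing column: {col}")
    if problems:
        return problems

    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):
        # Short rows yield None for absent fields, so normalise to "".
        raw_id = (row["recipe"] or "").strip()
        try:
            recipe_id = int(raw_id)
        except ValueError:
            problems.append(f"line {line_no}: recipe id {raw_id!r} is not an integer")
        else:
            # "recipe" is documented as a unique identifier.
            if recipe_id in seen_ids:
                problems.append(f"line {line_no}: duplicate recipe id {recipe_id}")
            seen_ids.add(recipe_id)

        for col in NUMERIC_COLUMNS:
            value = (row[col] or "").strip()
            if value == "":
                # Report blanks as missing values rather than type errors.
                problems.append(f"line {line_no}: {col} is missing")
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {col} value {value!r} is not numeric")
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration; for a real file, pass something like
    # open("recipes.csv", newline="") (hypothetical filename).
    sample = (
        "recipe,calories,carbohydrate,sugar,protein\n"
        "1,123,13,1,4\n"
        "2,,10.5,2,3\n"
        "2,200,abc,5,6\n"
    )
    for problem in check_recipe_rows(io.StringIO(sample)):
        print(problem)
    # line 3: calories is missing
    # line 4: duplicate recipe id 2
    # line 4: carbohydrate value 'abc' is not numeric
```

Blank cells are reported as missing rather than as type errors, so one report separates gaps in the data from values that contradict the documented types. How to handle each kind of problem is one of the decisions the brief asks to be recorded in the report.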
