100+ datasets found

Recipe Site Traffic: Analysis & Prediction

kaggle.com

Updated Sep 21, 2025

Cite

Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction

Explore at:

Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Sep 21, 2025

Dataset provided by

Kaggle

Authors

Michael Matta

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.

Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation

Recipe Site Traffic

From: Head of Data Science
Received: Today
Subject: New project from the product team

Hey!

I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

You can find more details about what I expect you to do here. And information on the data here.

I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

Good Luck!

From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?

Hi,

We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

Can your team:
- Predict which recipes will lead to high traffic?
- Correctly predict high traffic recipes 80% of the time?

We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

Look forward to seeing your presentation.
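
One way to read the 80% target in the request above is as precision on the high-traffic class: of the recipes a model would put on the home page, at least 80% should turn out to be high traffic. That reading is an assumption, since the brief does not name a metric. A minimal Python sketch with made-up labels (the `precision` helper and the sample data are hypothetical):

```python
def precision(y_true, y_pred):
    """Share of predicted-high recipes that really were high traffic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p and t)      # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)  # false positives
    if tp + fp == 0:
        return 0.0  # nothing was predicted high, so there is nothing to score
    return tp / (tp + fp)


# Hypothetical labels: True means "high traffic".
actual = [True, True, False, True, False, True]
predicted = [True, True, True, True, False, False]

score = precision(actual, predicted)
print(f"precision = {score:.2f}, meets 80% target: {score >= 0.80}")
```

Precision also matches the second half of the request (minimizing the chance of showing unpopular recipes), because every false positive is an unpopular recipe shown on the home page.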

About Tasty Bytes

Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

Example Recipe

This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.

Tomato Soup

Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $

Nutritional Information (per serving)
- Calories 123
- Carbohydrate 13g
- Sugar 1g
- Protein 4g

Ingredients:
- Tomatoes
- Onion
- Carrot
- Vegetable Stock

Method:
1. Cut the tomatoes into quarters….

Data Information

The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

As you will see, they haven't given us all of the information they have about each recipe.

You can find the data here.

I will let you decide how to process it, just make sure you include all your decisions in your report.

Don't forget to double check the data really does match what they say - it might not.

Column Name	Details
recipe	Numeric, unique identifier of recipe
calories	Numeric, number of calories
carbohydrate	Numeric, amount of carbohydrates in grams
sugar	Numeric, amount of sugar in grams
protein	Numeric, amount of prote...
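
The advice to double check that the data matches its description can be sketched as a small validation pass. This is only an illustration in Python: it covers the five columns visible in the table above, the sample rows are invented, and `find_problems` is a hypothetical helper. Treating an empty value as a problem is also an assumption; the real data may legitimately contain missing values that need a different decision.

```python
import csv
import io

# Columns the table above describes as numeric (besides the recipe id).
NUMERIC_COLUMNS = ["calories", "carbohydrate", "sugar", "protein"]


def find_problems(rows):
    """Return human-readable mismatches between the rows and the column spec."""
    problems = []
    seen_ids = set()
    for line_no, row in enumerate(rows, start=1):
        recipe_id = row.get("recipe") or ""
        if recipe_id in seen_ids:
            problems.append(f"row {line_no}: duplicate recipe id {recipe_id}")
        seen_ids.add(recipe_id)
        for col in NUMERIC_COLUMNS:
            value = row.get(col) or ""
            try:
                float(value)
            except ValueError:
                problems.append(f"row {line_no}: {col} is not numeric ({value!r})")
    return problems


# Invented sample data, not taken from the real CSV.
SAMPLE = """recipe,calories,carbohydrate,sugar,protein
1,123,13,1,4
2,,20.5,3,7
2,300,abc,5,9
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for problem in find_problems(rows):
    print(problem)
```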

first-impressions-dataset
huggingface.co
Updated Mar 28, 2024
Cite
Unique Data (2024). first-impressions-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/first-impressions-dataset
Explore at:
Dataset updated
Mar 28, 2024
Authors
Unique Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
First Impressions Dataset

The dataset contains 20,000 images of people. For each person, a first impression of them was created. The first impression is a text consisting of several sentences.

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on our website to buy the dataset

Content

The dataset includes a folder with images of 20,000 people. The .csv file consists of columns:

image_id - the… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/first-impressions-dataset.
Data for The Eclectic Reader
dataverse.harvard.edu
search.dataone.org
Updated Sep 25, 2025
Cite
J.D. Porter; James English (2025). Data for The Eclectic Reader [Dataset]. http://doi.org/10.7910/DVN/QHLDXA
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/QHLDXA
Dataset updated
Sep 25, 2025
Dataset provided by
Harvard Dataverse
Authors
J.D. Porter; James English
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains two files related to our article about reader eclecticism. One file contains metadata about books, derived from their landing pages on Goodreads.com. It's formatted as JSON and structured like a Python dictionary, where the keys are urls for each book's works page on Goodreads. The values include the book's title (as a string), the author (string), the average rating (float), the number of ratings (integer), and some shelves (dictionary). The last of these refers to the shelf data available on each book's landing page; at the time of the scrape (fall 2021), Goodreads showed up to 10 of these, and included information about how many people had tagged the book with each shelf. They no longer do this, and reconstructing the weights is non-trivial (you can find detailed information about all of a book's shelves, but Goodreads sometimes groups shelves into an overarching category for the landing page). The information collected here does reflect user interaction with the book, but these are caveats worth considering. In any case, the sub-dictionary uses the shelves as keys and has their weights as values. The file contains information about 884,722 books. The second file shows how we've sorted all of the shelves in our dataset into just a few clusters. This file is very simple—just a two-column csv with the name of the shelf and its cluster—but producing it was complicated. First, we made a network out of our shelves. Each shelf is a node, and we draw an edge between two shelves if they appear in the same book. As we see additional books that combine those shelves, we add to the edge weight. In the end we got a network that shows how all 1,194 shelves in our network are used relative to each other. When we had the network, we used community detection to see how the shelves cluster together. There are many ways to do this, but we used the Louvain method. 
This approach is non-deterministic and sensitive to various decisions, like the granularity of the community detection. To shore up our sense of the community structure (sometimes called "modularity") of this network, we spent a lot of time on this process. We ran community detection 10,000 times each at a few different granularities. We examined the resulting communities to see which ones tended to show up often and which emerged rarely, and we also observed how shelves tended to show up together. In the end we settled on the eight communities you see in this spreadsheet. We picked the names of each community ourselves. If you want to repeat this process, you will probably wind up with a somewhat different picture. We request that any outputs resulting from use of this dataset acknowledge the Price Lab / J.D. Porter. We have chosen not to share data about specific Goodreads users, in order to protect their privacy. We are, however, open to corresponding with researchers about sharing and collaboration.
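
The network-building step described above (one node per shelf, an edge between shelves that appear in the same book, and a heavier edge for each additional book that combines them) can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the records, the `"shelves"` key name, and the `shelf_cooccurrence` helper are assumptions, and adding 1 to the weight per shared book is one plausible reading of "we add to the edge weight".

```python
from collections import Counter
from itertools import combinations

# Hypothetical records in the shape the description gives: book URL -> metadata,
# with a sub-dictionary mapping shelf name -> weight. Key names are assumed.
books = {
    "https://example.org/work/1": {
        "title": "Book A",
        "shelves": {"fiction": 120, "classics": 80, "fantasy": 10},
    },
    "https://example.org/work/2": {
        "title": "Book B",
        "shelves": {"fiction": 60, "classics": 5},
    },
    "https://example.org/work/3": {
        "title": "Book C",
        "shelves": {"fantasy": 40, "fiction": 30},
    },
}


def shelf_cooccurrence(books):
    """Count, for each pair of shelves, how many books carry both."""
    edges = Counter()
    for record in books.values():
        shelves = sorted(record.get("shelves", {}))  # sorted so each pair has one key
        for a, b in combinations(shelves, 2):
            edges[(a, b)] += 1
    return edges


edges = shelf_cooccurrence(books)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted edge list is the kind of input a community-detection routine such as a Louvain implementation would take. Because that method is non-deterministic, as the description notes, repeated runs on the same edges can return different clusters.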
Internet and Computer use, London - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Jun 9, 2025
Cite
ckan.publishing.service.gov.uk (2025). Internet and Computer use, London - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/internet-and-computer-use-london
Explore at:
Dataset updated
Jun 9, 2025
Dataset provided by
CKAN https://ckan.org/
Area covered
London
Description
Statistics of how many adults access the internet and use different types of technology, covering:
- home internet access
- how people connect to the web
- how often people use the web/computers
- whether people use mobile devices
- whether people buy goods over the web
- whether people carried out specified activities over the internet

For more information see the ONS website and the UKDS website.

Number of global social network users 2017-2028

statista.com
de.statista.com

Cite

Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista http://statista.com/

Authors

Stacy Jo Dixon

Description

How many people use social media?

Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

Who uses social media?

Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

How much time do people spend on social media?

Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. On average, internet users in Latin America had the highest average time spent per day on social media.

What are the most popular social media platforms?

Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.

Keras video classification example with a subset of UCF101 - Action...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated May 11, 2023
Cite
Mikolaj Buchwald (2023). Keras video classification example with a subset of UCF101 - Action Recognition Data Set (top 10 videos) [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7882860
Explore at:
Dataset updated
May 11, 2023
Dataset provided by
Poznan Supercomputing and Networking Center, PAS
Authors
Mikolaj Buchwald
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Classify video clips with natural scenes of actions performed by people visible in the videos.

See the UCF101 Dataset web page: https://www.crcv.ucf.edu/data/UCF101.php#Results_on_UCF101

This example dataset consists of the 10 most numerous videos from the UCF101 dataset. For the top 5 version, see: https://doi.org/10.5281/zenodo.7924745 .

Based on this code: https://keras.io/examples/vision/video_classification/ (needs to be updated, if it has not been already; see the issue: https://github.com/keras-team/keras-io/issues/1342).

Testing if data can be downloaded from figshare with wget, see: https://github.com/mojaveazure/angsd-wrapper/issues/10

For generating the subset, see this notebook: https://colab.research.google.com/github/sayakpaul/Action-Recognition-in-TensorFlow/blob/main/Data_Preparation_UCF101.ipynb . However, it also needs to be adjusted (if it has not been already; I will then post a link to the notebook here or elsewhere, e.g., in the corrected notebook with the Keras example).

I would like to thank Sayak Paul for contacting me about his example in the Keras documentation being out of date.

Cite this dataset as:

Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402. https://doi.org/10.48550/arXiv.1212.0402

To download the dataset via the command line, please use:

wget -q https://zenodo.org/record/7882861/files/ucf101_top10.tar.gz -O ucf101_top10.tar.gz
tar xf ucf101_top10.tar.gz
Data from: stereoset
huggingface.co
opendatalab.com
Updated Dec 8, 2021
Cite
McGill NLP Group (2021). stereoset [Dataset]. https://huggingface.co/datasets/McGill-NLP/stereoset
Explore at:
Dataset updated
Dec 8, 2021
Dataset authored and provided by
McGill NLP Group
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for StereoSet

Dataset Summary

StereoSet is a dataset that measures stereotype bias in language models. StereoSet consists of 17,000 sentences that measure model preferences across gender, race, religion, and profession.

Supported Tasks and Leaderboards

multiple-choice question answering

Languages

English (en)

Dataset Structure

Data Instances

intersentence

{'bias_type': 'race', 'context': 'Many people live in… See the full description on the dataset page: https://huggingface.co/datasets/McGill-NLP/stereoset.
hispanic-people-liveness-detection-video-dataset
huggingface.co
Updated Apr 24, 2024
Cite
Unique Data (2024). hispanic-people-liveness-detection-video-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/hispanic-people-liveness-detection-video-dataset
Explore at:
Dataset updated
Apr 24, 2024
Authors
Unique Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Biometric Attack Dataset, Hispanic People

A similar dataset that includes all ethnicities - Anti Spoofing Real Dataset

The dataset for face anti-spoofing and face recognition includes images and videos of Hispanic people: 32,600+ photos and videos of 16,300 people from 20 countries. The dataset helps in enhancing the performance of the model by providing a wider range of data for a specific ethnic group. The videos were gathered by capturing faces of genuine individuals… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/hispanic-people-liveness-detection-video-dataset.
Facebook users worldwide 2017-2027
statista.com
de.statista.com
Cite
Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista http://statista.com/
Authors
Stacy Jo Dixon
Description
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
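
The three headline figures in this description (an increase of 391 million, growth of 14.36 percent, a 2027 total of 3.1 billion) can be checked against each other with a little arithmetic. The Python sketch below assumes the percentage is measured relative to the 2023 user base; `implied_base` is a hypothetical helper name.

```python
def implied_base(increase, growth_rate):
    """Starting value implied by an absolute increase and a relative growth rate."""
    return increase / growth_rate


increase_millions = 391   # forecast increase, 2023 to 2027
growth_rate = 0.1436      # +14.36 percent

base_2023 = implied_base(increase_millions, growth_rate)
forecast_2027 = base_2023 + increase_millions

print(f"implied 2023 base: {base_2023:,.0f} million")
print(f"implied 2027 total: {forecast_2027 / 1000:.1f} billion")
```

Under that assumption the implied 2027 total rounds to 3.1 billion, so the figures are mutually consistent.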
Coursera Specialization Dataset 2023-SEP
kaggle.com
zip
Updated Sep 5, 2023
Cite
Tomas Uždavinys (2023). Coursera Specialization Dataset 2023-SEP [Dataset]. https://www.kaggle.com/datasets/uzdavinys/coursera-specialization-dataset-2023-sep
Explore at:
Available download formats: zip (7458817 bytes)
Dataset updated
Sep 5, 2023
Authors
Tomas Uždavinys
Description
The dataset was collected via web scraping from Coursera's website and contains six .csv tables with rich information on specializations/professional certificates, courses, and weekly study materials for all available courses. The source code used for web scraping has also been made available online (see GitHub link https://github.com/TK-Problem/Coursera-scrapper). Just keep in mind that Coursera's website can change in the future and the scraper may no longer be fully functional. Also, read the README.md file for the explanation of why the number of reviews doesn't match between different .csv tables.

The data was scraped on 2023-09-03, so it might not be up to date in the future.

All tables can be joined using SpecializationURL and CourseURL columns.
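
As a rough illustration of joining on those columns, the Python sketch below performs an inner join on `SpecializationURL` using only the standard library. The rows, the extra column names (`SpecializationName`, `CourseName`), and the `inner_join` helper are invented for the example; which of the six tables carries which key column should be checked against the README.md.

```python
# Invented rows standing in for two of the six tables.
specializations = [
    {"SpecializationURL": "/specializations/a", "SpecializationName": "Spec A"},
]
courses = [
    {"SpecializationURL": "/specializations/a", "CourseURL": "/learn/x", "CourseName": "Course X"},
    {"SpecializationURL": "/specializations/a", "CourseURL": "/learn/y", "CourseName": "Course Y"},
    {"SpecializationURL": "/specializations/missing", "CourseURL": "/learn/z", "CourseName": "Course Z"},
]


def inner_join(left, right, key):
    """Join two lists of dict rows on a shared key column (inner join)."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for right_row in right:
        for left_row in index.get(right_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


joined = inner_join(specializations, courses, "SpecializationURL")
for row in joined:
    print(row["SpecializationName"], "->", row["CourseName"])
```

Rows whose key has no match on the other side (Course Z here) drop out, which is a quick way to spot specializations or courses that were not scraped consistently.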
TibetanQA: Tibetan Dataset for Machine Reading Comprehension
scidb.cn
Updated Feb 11, 2022
Cite
Yuan Sun; Zhengcuo Dan; Sisi Liu; Xiaobing Zhao (2022). TibetanQA: Tibetan Dataset for Machine Reading Comprehension [Dataset]. http://doi.org/10.11922/sciencedb.j00001.00351
Explore at:
Unique identifier
https://doi.org/10.11922/sciencedb.j00001.00351
Dataset updated
Feb 11, 2022
Dataset provided by
Science Data Bank
Authors
Yuan Sun; Zhengcuo Dan; Sisi Liu; Xiaobing Zhao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper constructs a dataset for Tibetan machine reading comprehension. The data comes from Yunzang website, and covers 12 fields of nature, culture, education, geography, history, life, society, art, technology, people, science and sports. The questions and answers of the dataset are manually entered and marked by 20 Tibetan professionals. It contains 631 articles, 903 paragraphs, and 2,000 question-and-answer pairs constructed based on the paragraphs. Data items mainly include article ID, title, paragraph, question and answer. The publication of this dataset is of great value for promoting the development of Tibetan information processing.
RCS Data Switzerland
listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). RCS Data Switzerland [Dataset]. https://listtodata.com/rcs-data-switzerland
Explore at:
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Authors
List to Data
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Switzerland
Variables measured
phone numbers, Email Address, full name, Address, City, State, gender, age, income, ip address
Description
RCS Data Switzerland can help you connect with many people and grow your business. This dataset is perfect for getting probable RCS users number all across the country. Also, people can always use this for easy communication or direct marketing. Besides, the RCS Data Switzerland is a simple method for talking directly through SMS to interested people. If you want to boost your business easily, this database website is just suitable for you. Moreover, our RCS Data Switzerland is an excellent tool for marketing in this country. In addition, RCS messaging lets businesses send large, high-quality content to users, while SMS has fewer features but works on more devices. SMS became popular first, but RCS can improve its limited abilities. With this trustworthy number list, you can easily follow your marketing techniques. Most importantly, the best part is that everyone can enjoy a remarkable return on investment (ROI). Switzerland RCS Data will make your marketing more successful. The RCS system displays when a message is read or received. In fact, users can share files and high-quality photos. Also, this verified list is perfect for sending messages. However, you can reach people in different parts of the country. Our Switzerland RCS Data has over 95% accurate and up-to-date mobile numbers. Our special team confirms all the numbers to make sure they are the latest and active. Hence, our website presents customizable packages to fit your requirements. Additionally, the Switzerland RCS Data helps you reach the right people in your marketing efforts. By using this data correctly, you can develop your business across the nation. All data was created by obeying GDPR rules. Moreover, you get this dataset in an Excel or CSV file. In other words, this data allows you to share special offers, news, or reminders. In the end, you can buy this RCS Data from our website.
Dataset - Understanding the software and data used in the social sciences
eprints.soton.ac.uk
Updated Mar 30, 2023
Cite
Chue Hong, Neil; Aragon, Selina; Antonioletti, Mario; Walker, Johanna (2023). Dataset - Understanding the software and data used in the social sciences [Dataset]. http://doi.org/10.5281/zenodo.7785710
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7785710
Dataset updated
Mar 30, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Chue Hong, Neil; Aragon, Selina; Antonioletti, Mario; Walker, Johanna
Description
This is a repository for a UKRI Economic and Social Research Council (ESRC) funded project to understand the software used to analyse social sciences data. Any software produced has been made available under a BSD 2-Clause license and any data and other non-software derivative is made available under a CC-BY 4.0 International License. Note that the software that analysed the survey is provided for illustrative purposes - it will not work on the decoupled anonymised data set.

Exceptions to this are:
- Data from the UKRI ESRC is mostly made available under a CC BY-NC-SA 4.0 Licence.
- Data from Gateway to Research is made available under an Open Government Licence (Version 3.0).

Contents
- Survey data & analysis: esrc_data-survey-analysis-data.zip
- Other data: esrc_data-other-data.zip
- Transcripts: esrc_data-transcripts.zip
- Data Management Plan: esrc_data-dmp.zip

Survey data & analysis

The survey ran from 3rd February 2022 to 6th March 2023 during which 168 responses were received. Of these responses, three were removed because they were supplied by people from outside the UK without a clear indication of involvement with the UK or associated infrastructure. A fourth response was removed because it duplicated another response from the same person, which leaves 164 responses in the data. The survey responses, Question (Q) Q1-Q16, have been decoupled from the demographic data, Q17-Q23. Questions Q24-Q28 are for follow-up and have been removed from the data. The institutions (Q17) and funding sources (Q18) have been provided in a separate file as this could be used to identify respondents. Q17, Q18 and Q19-Q23 have all been independently shuffled. The data has been made available as Comma Separated Values (CSV) with the question number as the header of each column and the encoded responses in the column below. To see what the question and the responses correspond to you will have to consult the survey-results-key.csv which decodes the question and responses accordingly. A pdf copy of the survey questions is available on GitHub.

The survey data has been decoupled into:
- survey-results-key.csv - maps a question number and the responses to the actual question values.
- q1-16-survey-results.csv - the non-demographic component of the survey responses (Q1-Q16).
- q19-23-demographics.csv - the demographic part of the survey (Q19-Q21, Q23).
- q17-institutions.csv - the institution/location of the respondent (Q17).
- q18-funding.csv - funding sources within the last 5 years (Q18).

Please note the code that has been used to do the analysis will not run with the decoupled survey data.

Other data files included
- CleanedLocations.csv - normalised version of the institutions that the survey respondents volunteered.
- DTPs.csv - information on the UKRI Doctoral Training Partnerships (DTPs) scraped from the UKRI DTP contacts web page in October 2021.
- projectsearch-1646403729132.csv.gz - data snapshot from the UKRI Gateway to Research released on the 24th February 2022 made available under an Open Government Licence.
- locations.csv - latitude and longitude for the institutions in the cleaned locations.
- subjects.csv - research classifications for the ESRC projects for the 24th February data snapshot.
- topics.csv - topic classification for the ESRC projects for the 24th February data snapshot.

Interview transcripts

The interview transcripts have been anonymised and converted to markdown so that they are easier to process in general.

List of interview transcripts: 1269794877.md, 1578450175.md, 1792505583.md, 2964377624.md, 3270614512.md, 40983347262.md, 4288358080.md, 4561769548.md, 4938919540.md, 5037840428.md, 5766299900.md, 5996360861.md, 6422621713.md, 6776362537.md, 7183719943.md, 7227322280.md, 7336263536.md, 75909371872.md, 7869268779.md, 8031500357.md, 9253010492.md

Data Management Plan

The study's Data Management Plan is provided in PDF format and shows the different data sets used throughout the duration of the study and where they have been deposited, as well as how long the SSI will keep these records.
Active People Survey KPI Data, Borough - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Jun 9, 2025
Cite
ckan.publishing.service.gov.uk (2025). Active People Survey KPI Data, Borough - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/active-people-survey-kpi-data-borough
Explore at:
Dataset updated
Jun 9, 2025
Dataset provided by
CKAN https://ckan.org/
Description
Key Performance Indicators from Active People Survey (APS). Data on volunteering, club membership, tuition, organised sport, competition, satisfaction with local sports provision, for local authorities, based on Active People Survey.

- KPI 1 Participation is defined as taking part on at least 3 days a week in moderate intensity sport and active recreation (at least 12 days in the last 4 weeks) for at least 30 minutes continuously in any one session. Participation includes recreational walking and cycling.
- KPI 2 Volunteering is defined as ‘Volunteering to support sport for at least one hour a week’.
- KPI 3 Club membership is defined as ‘being a member of a club particularly so that you can participate in sport or recreational activity in the last 4 weeks’.
- KPI 4 Receiving tuition is defined as ‘having received tuition from an instructor or coach to improve your performance in any sport or recreational activity in the last 12 months’.
- KPI 5 Organised Competition is defined as ‘having taken part in any organised competition in any sport or recreational activity in the last 12 months’.
- KPI 6 Satisfaction is the percentage of adults who are very or fairly satisfied with sports provision in their local area.

Organised sport is defined as the percentage of adults who have done at least one of the following: received tuition in the last 12 months, taken part in organised competition in the last 12 months or been a member of a club to play sport.

A statistically significant change is indicated by 'increase' or 'decrease' and this means that we are 95% certain that there has been a real change (increase or decrease). For more information on measuring statistically significant change within Active People, see the briefing note on Sport England’s website. The 'Base' refers to the sample size, i.e. the number of respondents. http://activepeople.sportengland.org/
black-people-liveness-detection-video-dataset
huggingface.co
Updated Apr 11, 2024
Cite
Unique Data (2024). black-people-liveness-detection-video-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/black-people-liveness-detection-video-dataset
Explore at:
Dataset updated
Apr 11, 2024
Authors
Unique Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Biometric Attack Dataset, Black People

A similar dataset that includes all ethnicities: Anti Spoofing Real Dataset

The dataset for face anti-spoofing and face recognition includes images and videos of black people. The dataset helps enhance model performance by providing a wider range of data for a specific ethnic group. The videos were gathered by capturing faces of genuine individuals presenting spoofs, using facial presentations. Our dataset proposes… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/black-people-liveness-detection-video-dataset.
mmlu
huggingface.co
Updated May 10, 2023
Cite
Center for AI Safety (2023). mmlu [Dataset]. https://huggingface.co/datasets/cais/mmlu
Explore at:
Dataset updated
May 10, 2023
Dataset authored and provided by
Center for AI Safety https://safe.ai/
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for MMLU

Dataset Summary

Measuring Massive Multitask Language Understanding by Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt (ICLR 2021). This is a massive multitask test consisting of multiple-choice questions from various branches of knowledge. The test spans subjects in the humanities, social sciences, hard sciences, and other areas that are important for some people to learn. This covers 57 tasks… See the full description on the dataset page: https://huggingface.co/datasets/cais/mmlu.
web-camera-people-behavior
huggingface.co
Updated Mar 30, 2025
Cite
Unidata (2025). web-camera-people-behavior [Dataset]. https://huggingface.co/datasets/UniDataPro/web-camera-people-behavior
Explore at:
Dataset updated
Mar 30, 2025
Authors
Unidata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Web Camera People Behavior Dataset for computer vision tasks

Dataset includes 2,300+ individuals, contributing to a total of 53,800+ videos and 9,300+ images captured via webcams. It is designed to study social interactions and behaviors in various remote meetings, including video calls, video conferencing, and online meetings. By leveraging this dataset, developers and researchers can enhance their understanding of human behavior in digital communication settings, contributing to… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/web-camera-people-behavior.
crowd-counting-dataset
huggingface.co
Updated Feb 16, 2024
Cite
Unique Data (2024). crowd-counting-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/crowd-counting-dataset
Explore at:
Dataset updated
Feb 16, 2024
Authors
Unique Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Crowd Counting Dataset

The dataset includes images featuring crowds of people ranging from 0 to 5000 individuals. The dataset includes a diverse range of scenes and scenarios, capturing crowds in various settings. Each image in the dataset is accompanied by a corresponding JSON file containing detailed labeling information for each person in the crowd for crowd count and classification.

Types of crowds in the dataset: 0-1000, 1000-2000, 2000-3000, 3000-4000 and 4000-5000. This… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/crowd-counting-dataset.
COSMO-Bench
kilthub.cmu.edu
txt
Updated Sep 15, 2025
Cite
Daniel McGann; Easton Potokar; Michael Kaess (2025). COSMO-Bench [Dataset]. http://doi.org/10.1184/R1/29652158.v3
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.1184/R1/29652158.v3
Dataset updated
Sep 15, 2025
Dataset provided by
Carnegie Mellon University
Authors
Daniel McGann; Easton Potokar; Michael Kaess
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Recent years have seen a focus on research into distributed optimization algorithms for multi-robot Collaborative Simultaneous Localization and Mapping (C-SLAM). Research in this domain, however, is made difficult by a lack of standard benchmark datasets. Such datasets have been used to great effect in the field of single-robot SLAM, and researchers focused on multi-robot problems would benefit greatly from dedicated benchmark datasets. To address this gap we design and release the Collaborative Open-Source Multi-robot Optimization Benchmark (COSMO-Bench) -- a suite of 24 datasets derived from a state-of-the-art C-SLAM front-end and real-world LiDAR data. For additional details please see our associated publication: https://arxiv.org/abs/2508.16731

This entry, hosted through Carnegie Mellon University libraries, serves to host the official dataset release in perpetuity. However, we also support a website that provides a somewhat nicer user interface at cosmobench.com

NOTE - Shortly after making this data available we were notified of some issues with the groundtruth of the CU-Multi data on which the kittredge and main_campus datasets are based. This issue has since been resolved and new versions of the affected datasets have been uploaded. If you are one of the handful of people that downloaded these datasets before September 15th 2025, please update to the corrected versions. To verify that you have the correct versions please see instructions in README.md
selfie_and_video
huggingface.co
Updated Oct 19, 2023
Cite
Unique Data (2023). selfie_and_video [Dataset]. https://huggingface.co/datasets/UniqueData/selfie_and_video
Explore at:
Dataset updated
Oct 19, 2023
Authors
Unique Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Selfies and video dataset

The dataset includes 4000 people. Each person took a selfie on a webcam and a selfie on a mobile phone. In addition, each person recorded video from the phone and from the webcam while pronouncing a given set of numbers. The dataset includes a folder for each person, and each folder contains 8 files (4 images and 4 videos).

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/selfie_and_video.

Cite

Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction

Recipe Site Traffic: Analysis & Prediction

Practice End-to-End Analysis of Recipe Data for Traffic Prediction

Explore at:

Dataset updated

Sep 21, 2025

Dataset provided by

Kaggle

Authors

Michael Matta

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

Recipe Site Traffic

From: Head of Data Science
Received: Today
Subject: New project from the product team

Hey!

I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

You can find more details about what I expect you to do here. And information on the data here.

I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

Good Luck!

From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?

Hi,

Can your team:
- Predict which recipes will lead to high traffic?
- Correctly predict high traffic recipes 80% of the time?

We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

Look forward to seeing your presentation.
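
One way to make the 80% request above concrete is to decide which metric it refers to. The sketch below is only an illustration and not part of the original brief: it assumes the target means precision (of the recipes predicted to be high traffic, the share that really were) and also reports recall for comparison. The function name `precision_recall` and the sample labels are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (True = high traffic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # predicted high, was high
    fp = sum(1 for t, p in pairs if not t and p)    # predicted high, was not
    fn = sum(1 for t, p in pairs if t and not p)    # missed a high traffic recipe
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels, purely for illustration (not taken from the dataset).
y_true = [True, True, False, True, False, True, False, True]
y_pred = [True, True, True, True, False, False, False, True]

precision, recall = precision_recall(y_true, y_pred)
meets_target = precision >= 0.80  # assumed reading of "80% of the time"
print(f"precision={precision:.2f} recall={recall:.2f} meets_target={meets_target}")
```

Precision matches the goal of not showing unpopular recipes on the home page; if the product team instead meant catching 80% of all high traffic recipes, recall would be the figure to report.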

About Tasty Bytes

Example Recipe

This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.

Tomato Soup

Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $

Nutritional Information (per serving)
- Calories 123
- Carbohydrate 13g
- Sugar 1g
- Protein 4g

Ingredients:
- Tomatoes
- Onion
- Carrot
- Vegetable Stock

Method:
1. Cut the tomatoes into quarters….

Data Information

The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

As you will see, they haven't given us all of the information they have about each recipe.

You can find the data here.

I will let you decide how to process it, just make sure you include all your decisions in your report.

Don't forget to double check the data really does match what they say - it might not.

Column Name	Details
recipe	Numeric, unique identifier of recipe
calories	Numeric, number of calories
carbohydrate	Numeric, amount of carbohydrates in grams
sugar	Numeric, amount of sugar in grams
protein	Numeric, amount of prote...
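
The advice to double check the data can be sketched as a small validation pass. This is a minimal example under stated assumptions, not the dataset author's method: it covers only the columns visible in the table above (`recipe`, `calories`, `carbohydrate`, `sugar`, `protein`), the helper name `check_recipes` is hypothetical, and the inline CSV rows are made up for illustration.

```python
import csv
import io

# Only the numeric columns visible in the (truncated) column table.
NUMERIC_COLUMNS = ["calories", "carbohydrate", "sugar", "protein"]


def check_recipes(rows):
    """Return a list of (line_number, column, problem) tuples."""
    problems = []
    seen_ids = set()
    for line_no, row in enumerate(rows, start=2):  # the header is line 1
        recipe_id = row.get("recipe", "")
        if recipe_id in seen_ids:
            problems.append((line_no, "recipe", "duplicate id"))
        seen_ids.add(recipe_id)
        for col in NUMERIC_COLUMNS:
            value = (row.get(col) or "").strip()
            if value == "":
                problems.append((line_no, col, "missing"))
                continue
            try:
                number = float(value)
            except ValueError:
                problems.append((line_no, col, "not numeric"))
                continue
            if number < 0:
                problems.append((line_no, col, "negative"))
    return problems


# Made-up rows standing in for the real CSV file.
sample = io.StringIO(
    "recipe,calories,carbohydrate,sugar,protein\n"
    "1,123,13,1,4\n"
    "2,,20,3,5\n"
    "2,250,abc,2,8\n"
)
problems = check_recipes(csv.DictReader(sample))
for problem in problems:
    print(problem)
```

With a real file, `csv.DictReader(open(path, newline=""))` could be passed in place of the in-memory sample; each reported problem would then be a decision to document in the report.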
