100+ datasets found
  1. Recipe Site Traffic: Analysis & Prediction

    • kaggle.com
    Updated Sep 21, 2025
    Cite
    Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    Kaggle
    Authors
    Michael Matta
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.

    Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
    PPTX Presentation

    Recipe Site Traffic

    From: Head of Data Science
    Received: Today
    Subject: New project from the product team

    Hey!

    I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

    I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

    They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

    You can find more details about what I expect you to do here. And information on the data here.

    I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

    Good Luck!

    From: Product Manager - Recipe Discovery
    To: Head of Data Science
    Received: Yesterday
    Subject: Can you help us predict popular recipes?

    Hi,

    We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

    At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

    Can your team:
    - Predict which recipes will lead to high traffic?
    - Correctly predict high traffic recipes 80% of the time?

    We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

    Look forward to seeing your presentation.

    About Tasty Bytes

    Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

    Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

    Example Recipe

    This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.

    Tomato Soup

    Servings: 4
    Time to make: 2 hours
    Category: Lunch/Snack
    Cost per serving: $

    Nutritional Information (per serving):
    - Calories 123
    - Carbohydrate 13g
    - Sugar 1g
    - Protein 4g

    Ingredients:
    - Tomatoes
    - Onion
    - Carrot
    - Vegetable Stock

    Method:
    1. Cut the tomatoes into quarters….

    Data Information

    The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

    As you will see, they haven't given us all of the information they have about each recipe.

    You can find the data here.

    I will let you decide how to process it, just make sure you include all your decisions in your report.

    Don't forget to double check the data really does match what they say - it might not.

    Column Name     Details
    recipe          Numeric, unique identifier of recipe
    calories        Numeric, number of calories
    carbohydrate    Numeric, amount of carbohydrates in grams
    sugar           Numeric, amount of sugar in grams
    protein         Numeric, amount of prote...
  2. first-impressions-dataset

    • huggingface.co
    Updated Mar 28, 2024
    Cite
    Unique Data (2024). first-impressions-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/first-impressions-dataset
    Dataset updated
    Mar 28, 2024
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    First Impressions Dataset

    The dataset contains 20,000 images of people. For each person, a first impression of them was created. The first impression is a text consisting of several sentences.

    💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on our website.

    Content

    The dataset includes a folder with images of 20,000 people. The .csv file consists of columns:

    image_id - the… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/first-impressions-dataset.

  3. Data for The Eclectic Reader

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Sep 25, 2025
    Cite
    J.D. Porter; James English (2025). Data for The Eclectic Reader [Dataset]. http://doi.org/10.7910/DVN/QHLDXA
    Explore at:
    Croissant
    Dataset updated
    Sep 25, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    J.D. Porter; James English
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains two files related to our article about reader eclecticism.

    One file contains metadata about books, derived from their landing pages on Goodreads.com. It's formatted as JSON and structured like a Python dictionary, where the keys are urls for each book's works page on Goodreads. The values include the book's title (as a string), the author (string), the average rating (float), the number of ratings (integer), and some shelves (dictionary). The last of these refers to the shelf data available on each book's landing page; at the time of the scrape (fall 2021), Goodreads showed up to 10 of these, and included information about how many people had tagged the book with each shelf. They no longer do this, and reconstructing the weights is non-trivial (you can find detailed information about all of a book's shelves, but Goodreads sometimes groups shelves into an overarching category for the landing page). The information collected here does reflect user interaction with the book, but these are caveats worth considering. In any case, the sub-dictionary uses the shelves as keys and has their weights as values. The file contains information about 884,722 books.

    The second file shows how we've sorted all of the shelves in our dataset into just a few clusters. This file is very simple (just a two-column csv with the name of the shelf and its cluster), but producing it was complicated. First, we made a network out of our shelves. Each shelf is a node, and we draw an edge between two shelves if they appear in the same book. As we see additional books that combine those shelves, we add to the edge weight. In the end we got a network that shows how all 1,194 shelves in our network are used relative to each other. When we had the network, we used community detection to see how the shelves cluster together. There are many ways to do this, but we used the Louvain method.

    This approach is non-deterministic and sensitive to various decisions, like the granularity of the community detection. To shore up our sense of the community structure (sometimes called "modularity") of this network, we spent a lot of time on this process. We ran community detection 10,000 times each at a few different granularities. We examined the resulting communities to see which ones tended to show up often and which emerged rarely, and we also observed how shelves tended to show up together. In the end we settled on the eight communities you see in this spreadsheet. We picked the names of each community ourselves. If you want to repeat this process, you will probably wind up with a somewhat different picture.

    We request that any outputs resulting from use of this dataset acknowledge the Price Lab / J.D. Porter. We have chosen not to share data about specific Goodreads users, in order to protect their privacy. We are, however, open to corresponding with researchers about sharing and collaboration.
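
    The network-building step described above (one node per shelf, an edge whenever two shelves appear on the same book, with the weight growing as more books combine them) might be sketched as follows. This is only an illustration under stated assumptions: the key name "shelves", the example URLs, and the choice to add 1 per shared book are hypothetical, since the description does not give the exact JSON key names or weighting rule. Community detection (the Louvain step) is not included.

```python
from collections import Counter
from itertools import combinations


def shelf_edge_weights(books):
    """Count, for each unordered pair of shelves, how many books carry both."""
    weights = Counter()
    for record in books.values():
        # Assumption: shelf names are the keys of a per-book "shelves" dict.
        shelves = sorted(record["shelves"])
        for a, b in combinations(shelves, 2):
            weights[(a, b)] += 1  # assumption: each shared book adds 1
    return weights


# Hypothetical records shaped like the description: URL -> metadata dict.
books = {
    "https://example.org/work/1": {
        "title": "A",
        "shelves": {"fantasy": 120, "young-adult": 40},
    },
    "https://example.org/work/2": {
        "title": "B",
        "shelves": {"fantasy": 75, "young-adult": 10, "romance": 5},
    },
    "https://example.org/work/3": {
        "title": "C",
        "shelves": {"romance": 300},
    },
}

edges = shelf_edge_weights(books)
for (a, b), w in sorted(edges.items()):
    print(a, b, w)
```

    Sorting the shelf names before pairing keeps each pair in one canonical order, so ("fantasy", "young-adult") and ("young-adult", "fantasy") are never counted as separate edges.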

  4. Internet and Computer use, London - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 9, 2025
    Cite
    ckan.publishing.service.gov.uk (2025). Internet and Computer use, London - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/internet-and-computer-use-london
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    CKAN (https://ckan.org/)
    Area covered
    London
    Description

    Statistics of how many adults access the internet and use different types of technology, covering:
    - home internet access
    - how people connect to the web
    - how often people use the web/computers
    - whether people use mobile devices
    - whether people buy goods over the web
    - whether people carried out specified activities over the internet

    For more information see the ONS website and the UKDS website.

  5. Number of global social network users 2017-2028

    • statista.com
    • de.statista.com
    Cite
    Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    How many people use social media?

    Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

    Who uses social media?
    Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

    How much time do people spend on social media?
    Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. On average, internet users in Latin America had the highest average time spent per day on social media.

    What are the most popular social media platforms?
    Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
    
  6. Keras video classification example with a subset of UCF101 - Action...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated May 11, 2023
    Cite
    Mikolaj Buchwald (2023). Keras video classification example with a subset of UCF101 - Action Recognition Data Set (top 10 videos) [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7882860
    Dataset updated
    May 11, 2023
    Dataset provided by
    Poznan Supercomputing and Networking Center, PAS
    Authors
    Mikolaj Buchwald
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classify video clips with natural scenes of actions performed by people visible in the videos.

    See the UCF101 Dataset web page: https://www.crcv.ucf.edu/data/UCF101.php#Results_on_UCF101

    This example dataset consists of the 10 most numerous videos from the UCF101 dataset. For the top 5 version, see: https://doi.org/10.5281/zenodo.7924745 .

    Based on this code: https://keras.io/examples/vision/video_classification/ (the code needs to be updated, if it has not been already; see the issue: https://github.com/keras-team/keras-io/issues/1342).

    Testing if data can be downloaded from figshare with wget, see: https://github.com/mojaveazure/angsd-wrapper/issues/10

    For generating the subset, see this notebook: https://colab.research.google.com/github/sayakpaul/Action-Recognition-in-TensorFlow/blob/main/Data_Preparation_UCF101.ipynb -- however, it also needs to be adjusted (if it has not been already, in which case I will post a link to the notebook here or elsewhere, e.g., in the corrected notebook with the Keras example).

    I would like to thank Sayak Paul for contacting me about his example in the Keras documentation being out of date.

    Cite this dataset as:

    Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402. https://doi.org/10.48550/arXiv.1212.0402

    To download the dataset via the command line, please use:

    wget -q https://zenodo.org/record/7882861/files/ucf101_top10.tar.gz -O ucf101_top10.tar.gz
    tar xf ucf101_top10.tar.gz

  7. Data from: stereoset

    • huggingface.co
    • opendatalab.com
    Updated Dec 8, 2021
    Cite
    McGill NLP Group (2021). stereoset [Dataset]. https://huggingface.co/datasets/McGill-NLP/stereoset
    Explore at:
    Croissant
    Dataset updated
    Dec 8, 2021
    Dataset authored and provided by
    McGill NLP Group
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for StereoSet

    Dataset Summary

    StereoSet is a dataset that measures stereotype bias in language models. StereoSet consists of 17,000 sentences that measure model preferences across gender, race, religion, and profession.

    Supported Tasks and Leaderboards

    multiple-choice question answering

    Languages

    English (en)

    Dataset Structure

    Data Instances

    intersentence

    {'bias_type': 'race', 'context': 'Many people live in… See the full description on the dataset page: https://huggingface.co/datasets/McGill-NLP/stereoset.

  8. hispanic-people-liveness-detection-video-dataset

    • huggingface.co
    Updated Apr 24, 2024
    Cite
    Unique Data (2024). hispanic-people-liveness-detection-video-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/hispanic-people-liveness-detection-video-dataset
    Dataset updated
    Apr 24, 2024
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Biometric Attack Dataset, Hispanic People

    A similar dataset that includes all ethnicities - Anti Spoofing Real Dataset

    The dataset for face anti-spoofing and face recognition includes images and videos of Hispanic people: 32,600+ photos and videos of 16,300 people from 20 countries. The dataset helps in enhancing the performance of the model by providing a wider range of data for a specific ethnic group. The videos were gathered by capturing faces of genuine individuals… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/hispanic-people-liveness-detection-video-dataset.

  9. Facebook users worldwide 2017-2027

    • statista.com
    • de.statista.com
    Cite
    Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  10. Coursera Specialization Dataset 2023-SEP

    • kaggle.com
    zip
    Updated Sep 5, 2023
    Cite
    Tomas Uždavinys (2023). Coursera Specialization Dataset 2023-SEP [Dataset]. https://www.kaggle.com/datasets/uzdavinys/coursera-specialization-dataset-2023-sep
    Explore at:
    Available download formats: zip (7458817 bytes)
    Dataset updated
    Sep 5, 2023
    Authors
    Tomas Uždavinys
    Description

    The dataset was collected via web scraping from Coursera's website and contains six .csv tables with rich information on specializations/professional certificates, courses, and weekly study materials for all available courses. The source code used for web scraping has also been made available online (see GitHub link https://github.com/TK-Problem/Coursera-scrapper). Just keep in mind that Coursera's website can change in the future, so the scraper may not remain fully functional. Also, read the README.md file for the explanation of why the number of reviews doesn't match between different .csv tables.

    The data was scraped on 2023-09-03, so it might not be up to date in the future.

    All tables can be joined using SpecializationURL and CourseURL columns.
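
    As a rough illustration of joining tables on a shared URL column, the sketch below performs an inner join in plain Python. Only the column name CourseURL comes from the description above; the two miniature tables, the other column names, and the helper inner_join are hypothetical stand-ins, not the dataset's actual files or schema.

```python
import csv
import io

# Hypothetical miniature tables; only "CourseURL" is taken from the description.
courses_csv = """CourseURL,CourseName
/learn/a,Course A
/learn/b,Course B
"""
reviews_csv = """CourseURL,Rating
/learn/a,4.8
/learn/a,4.5
/learn/c,3.9
"""


def inner_join(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key column."""
    index = {}
    for right in right_rows:
        index.setdefault(right[key], []).append(right)
    # Keep only left rows with at least one match; one output row per match.
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[key], [])
    ]


courses = list(csv.DictReader(io.StringIO(courses_csv)))
reviews = list(csv.DictReader(io.StringIO(reviews_csv)))
joined = inner_join(courses, reviews, "CourseURL")
for row in joined:
    print(row)
```

    An inner join drops rows whose key appears in only one table, so comparing row counts before and after the join is a quick way to spot URLs that do not match across files.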

  11. TibetanQA: Tibetan Dataset for Machine Reading Comprehension

    • scidb.cn
    Updated Feb 11, 2022
    Cite
    Yuan Sun; Zhengcuo Dan; Sisi Liu; Xiaobing Zhao (2022). TibetanQA: Tibetan Dataset for Machine Reading Comprehension [Dataset]. http://doi.org/10.11922/sciencedb.j00001.00351
    Explore at:
    Croissant
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    Science Data Bank
    Authors
    Yuan Sun; Zhengcuo Dan; Sisi Liu; Xiaobing Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper constructs a dataset for Tibetan machine reading comprehension. The data comes from Yunzang website, and covers 12 fields of nature, culture, education, geography, history, life, society, art, technology, people, science and sports. The questions and answers of the dataset are manually entered and marked by 20 Tibetan professionals. It contains 631 articles, 903 paragraphs, and 2,000 question-and-answer pairs constructed based on the paragraphs. Data items mainly include article ID, title, paragraph, question and answer. The publication of this dataset is of great value for promoting the development of Tibetan information processing.

  12. RCS Data Switzerland

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). RCS Data Switzerland [Dataset]. https://listtodata.com/rcs-data-switzerland
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Authors
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Switzerland
    Variables measured
    phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    RCS Data Switzerland can help you connect with many people and grow your business. This dataset is perfect for getting probable RCS users number all across the country. Also, people can always use this for easy communication or direct marketing. Besides, the RCS Data Switzerland is a simple method for talking directly through SMS to interested people. If you want to boost your business easily, this database website is just suitable for you. Moreover, our RCS Data Switzerland is an excellent tool for marketing in this country. In addition, RCS messaging lets businesses send large, high-quality content to users, while SMS has fewer features but works on more devices. SMS became popular first, but RCS can improve its limited abilities. With this trustworthy number list, you can easily follow your marketing techniques. Most importantly, the best part is that everyone can enjoy a remarkable return on investment (ROI). Switzerland RCS Data will make your marketing more successful. The RCS system displays when a message is read or received. In fact, users can share files and high-quality photos. Also, this verified list is perfect for sending messages. However, you can reach people in different parts of the country. Our Switzerland RCS Data has over 95% accurate and up-to-date mobile numbers. Our special team confirms all the numbers to make sure they are the latest and active. Hence, our website presents customizable packages to fit your requirements. Additionally, the Switzerland RCS Data helps you reach the right people in your marketing efforts. By using this data correctly, you can develop your business across the nation. All data was created by obeying GDPR rules. Moreover, you get this dataset in an Excel or CSV file. In other words, this data allows you to share special offers, news, or reminders. In the end, you can buy this RCS Data from our website.

  13. Dataset - Understanding the software and data used in the social sciences

    • eprints.soton.ac.uk
    Updated Mar 30, 2023
    Cite
    Chue Hong, Neil; Aragon, Selina; Antonioletti, Mario; Walker, Johanna (2023). Dataset - Understanding the software and data used in the social sciences [Dataset]. http://doi.org/10.5281/zenodo.7785710
    Dataset updated
    Mar 30, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chue Hong, Neil; Aragon, Selina; Antonioletti, Mario; Walker, Johanna
    Description

    This is a repository for a UKRI Economic and Social Research Council (ESRC) funded project to understand the software used to analyse social sciences data. Any software produced has been made available under a BSD 2-Clause license and any data and other non-software derivative is made available under a CC-BY 4.0 International License. Note that the software that analysed the survey is provided for illustrative purposes - it will not work on the decoupled anonymised data set.

    Exceptions to this are:
    - Data from the UKRI ESRC is mostly made available under a CC BY-NC-SA 4.0 Licence.
    - Data from Gateway to Research is made available under an Open Government Licence (Version 3.0).

    Contents
    - Survey data & analysis: esrc_data-survey-analysis-data.zip
    - Other data: esrc_data-other-data.zip
    - Transcripts: esrc_data-transcripts.zip
    - Data Management Plan: esrc_data-dmp.zip

    Survey data & analysis

    The survey ran from 3rd February 2022 to 6th March 2023 during which 168 responses were received. Of these responses, three were removed because they were supplied by people from outside the UK without a clear indication of involvement with the UK or associated infrastructure. A fourth response was removed because two responses came from the same person, which leaves us with 164 responses in the data.

    The survey responses, Question (Q) Q1-Q16, have been decoupled from the demographic data, Q17-Q23. Questions Q24-Q28 are for follow-up and have been removed from the data. The institutions (Q17) and funding sources (Q18) have been provided in a separate file as this could be used to identify respondents. Q17, Q18 and Q19-Q23 have all been independently shuffled.

    The data has been made available as Comma Separated Values (CSV) with the question number as the header of each column and the encoded responses in the column below. To see what the question and the responses correspond to you will have to consult the survey-results-key.csv which decodes the question and responses accordingly. A pdf copy of the survey questions is available on GitHub.

    The survey data has been decoupled into:
    - survey-results-key.csv - maps a question number and the responses to the actual question values.
    - q1-16-survey-results.csv - the non-demographic component of the survey responses (Q1-Q16).
    - q19-23-demographics.csv - the demographic part of the survey (Q19-Q21, Q23).
    - q17-institutions.csv - the institution/location of the respondent (Q17).
    - q18-funding.csv - funding sources within the last 5 years (Q18).

    Please note the code that has been used to do the analysis will not run with the decoupled survey data.

    Other data files included
    - CleanedLocations.csv - normalised version of the institutions that the survey respondents volunteered.
    - DTPs.csv - information on the UKRI Doctoral Training Partnerships (DTPs) scraped from the UKRI DTP contacts web page in October 2021.
    - projectsearch-1646403729132.csv.gz - data snapshot from the UKRI Gateway to Research released on the 24th February 2022 made available under an Open Government Licence.
    - locations.csv - latitude and longitude for the institutions in the cleaned locations.
    - subjects.csv - research classifications for the ESRC projects for the 24th February data snapshot.
    - topics.csv - topic classification for the ESRC projects for the 24th February data snapshot.

    Interview transcripts

    The interview transcripts have been anonymised and converted to markdown so that it's easier to process in general.

    List of interview transcripts: 1269794877.md, 1578450175.md, 1792505583.md, 2964377624.md, 3270614512.md, 40983347262.md, 4288358080.md, 4561769548.md, 4938919540.md, 5037840428.md, 5766299900.md, 5996360861.md, 6422621713.md, 6776362537.md, 7183719943.md, 7227322280.md, 7336263536.md, 75909371872.md, 7869268779.md, 8031500357.md, 9253010492.md

    Data Management Plan

    The study's Data Management Plan is provided in PDF format and shows the different data sets used throughout the duration of the study and where they have been deposited, as well as how long the SSI will keep these records.

  14. Active People Survey KPI Data, Borough - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 9, 2025
    Cite
    ckan.publishing.service.gov.uk (2025). Active People Survey KPI Data, Borough - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/active-people-survey-kpi-data-borough
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    Key Performance Indicators from Active People Survey (APS). Data on volunteering, club membership, tuition, organised sport, competition, satisfaction with local sports provision, for local authorities, based on Active People Survey.

    - KPI 1 Participation is defined as taking part on at least 3 days a week in moderate intensity sport and active recreation (at least 12 days in the last 4 weeks) for at least 30 minutes continuously in any one session. Participation includes recreational walking and cycling.
    - KPI 2 Volunteering is defined as ‘Volunteering to support sport for at least one hour a week’.
    - KPI 3 Club membership is defined as ‘being a member of a club particularly so that you can participate in sport or recreational activity in the last 4 weeks’.
    - KPI 4 Receiving tuition is defined as ‘having received tuition from an instructor or coach to improve your performance in any sport or recreational activity in the last 12 months’.
    - KPI 5 Organised Competition is defined as ‘having taken part in any organised competition in any sport or recreational activity in the last 12 months’.
    - KPI 6 Satisfaction is the percentage of adults who are very or fairly satisfied with sports provision in their local area.

    Organised sport is defined as the percentage of adults who have done at least one of the following: received tuition in the last 12 months, taken part in organised competition in the last 12 months or been a member of a club to play sport.

    A statistically significant change is indicated by 'increase' or 'decrease' and this means that we are 95% certain that there has been a real change (increase or decrease). For more information on measuring statistically significant change within Active People, see the briefing note on Sport England’s website. The 'Base' refers to the sample size, i.e. the number of respondents.

    http://activepeople.sportengland.org/

  15. black-people-liveness-detection-video-dataset

    • huggingface.co
    Updated Apr 11, 2024
    + more versions
    Cite
    Unique Data (2024). black-people-liveness-detection-video-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/black-people-liveness-detection-video-dataset
    Explore at:
    Dataset updated
    Apr 11, 2024
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Biometric Attack Dataset, Black People

      A similar dataset that includes all ethnicities: Anti Spoofing Real Dataset
    

    The dataset for face anti-spoofing and face recognition includes images and videos of black people. The dataset helps in enhancing the performance of the model by providing a wider range of data for a specific ethnic group. The videos were gathered by capturing faces of genuine individuals presenting spoofs, using facial presentations. Our dataset proposes… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/black-people-liveness-detection-video-dataset.

  16. mmlu

    • huggingface.co
    Updated May 10, 2023
    + more versions
    Cite
    Center for AI Safety (2023). mmlu [Dataset]. https://huggingface.co/datasets/cais/mmlu
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 10, 2023
    Dataset authored and provided by
    Center for AI Safety (https://safe.ai/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for MMLU

      Dataset Summary
    

    Measuring Massive Multitask Language Understanding by Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt (ICLR 2021). This is a massive multitask test consisting of multiple-choice questions from various branches of knowledge. The test spans subjects in the humanities, social sciences, hard sciences, and other areas that are important for some people to learn. This covers 57 tasks… See the full description on the dataset page: https://huggingface.co/datasets/cais/mmlu.

  17. web-camera-people-behavior

    • huggingface.co
    Updated Mar 30, 2025
    + more versions
    Cite
    Unidata (2025). web-camera-people-behavior [Dataset]. https://huggingface.co/datasets/UniDataPro/web-camera-people-behavior
    Explore at:
    Dataset updated
    Mar 30, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Web Camera People Behavior Dataset for computer vision tasks

    Dataset includes 2,300+ individuals, contributing to a total of 53,800+ videos and 9,300+ images captured via webcams. It is designed to study social interactions and behaviors in various remote meetings, including video calls, video conferencing, and online meetings. By leveraging this dataset, developers and researchers can enhance their understanding of human behavior in digital communication settings, contributing to… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/web-camera-people-behavior.

  18. crowd-counting-dataset

    • huggingface.co
    Updated Feb 16, 2024
    + more versions
    Cite
    Unique Data (2024). crowd-counting-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/crowd-counting-dataset
    Explore at:
    Dataset updated
    Feb 16, 2024
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Crowd Counting Dataset

    The dataset includes images featuring crowds of people ranging from 0 to 5000 individuals. The dataset includes a diverse range of scenes and scenarios, capturing crowds in various settings. Each image in the dataset is accompanied by a corresponding JSON file containing detailed labeling information for each person in the crowd for crowd count and classification.

    Types of crowds in the dataset: 0-1000, 1000-2000, 2000-3000, 3000-4000 and 4000-5000 This… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/crowd-counting-dataset.

  19. COSMO-Bench

    • kilthub.cmu.edu
    txt
    Updated Sep 15, 2025
    Cite
    Daniel McGann; Easton Potokar; Michael Kaess (2025). COSMO-Bench [Dataset]. http://doi.org/10.1184/R1/29652158.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 15, 2025
    Dataset provided by
    Carnegie Mellon University
    Authors
    Daniel McGann; Easton Potokar; Michael Kaess
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Recent years have seen a focus on research into distributed optimization algorithms for multi-robot Collaborative Simultaneous Localization and Mapping (C-SLAM). Research in this domain, however, is made difficult by a lack of standard benchmark datasets. Such datasets have been used to great effect in the field of single-robot SLAM, and researchers focused on multi-robot problems would benefit greatly from dedicated benchmark datasets. To address this gap we design and release the Collaborative Open-Source Multi-robot Optimization Benchmark (COSMO-Bench) -- a suite of 24 datasets derived from a state-of-the-art C-SLAM front-end and real-world LiDAR data. For additional details please see our associated publication: https://arxiv.org/abs/2508.16731

    This entry, hosted through Carnegie Mellon University libraries, serves to host the official dataset release in perpetuity. However, we also support a website that provides a somewhat nicer user interface at cosmobench.com

    NOTE - Shortly after making this data available we were notified of some issues with the groundtruth of the CU-Multi data on which the kittredge and main_campus datasets are based. This issue has since been resolved and new versions of the affected datasets have been uploaded. If you are one of the handful of people that downloaded these datasets before September 15th 2025, please update to the corrected versions. To verify that you have the correct versions please see instructions in README.md

  20. selfie_and_video

    • huggingface.co
    Updated Oct 19, 2023
    Cite
    Unique Data (2023). selfie_and_video [Dataset]. https://huggingface.co/datasets/UniqueData/selfie_and_video
    Explore at:
    Dataset updated
    Oct 19, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Selfies and video dataset

    The dataset contains 4000 people. Each person took a selfie on a webcam and a selfie on a mobile phone. In addition, each person recorded video from the phone and from the webcam, pronouncing a given set of numbers. The dataset includes a folder for each person, and each folder includes 8 files (4 images and 4 videos).

      💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/selfie_and_video.
    
Cite
Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction

Recipe Site Traffic: Analysis & Prediction

Practice End-to-End Analysis of Recipe Data for Traffic Prediction

Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 21, 2025
Dataset provided by
Kaggle
Authors
Michael Matta
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.

Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation

Recipe Site Traffic

From: Head of Data Science
Received: Today
Subject: New project from the product team

Hey!

I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

You can find more details about what I expect you to do here. And information on the data here.

I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

Good Luck!

From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?

Hi,

We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

Can your team:
- Predict which recipes will lead to high traffic?
- Correctly predict high traffic recipes 80% of the time?

We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

Look forward to seeing your presentation.
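
One way to make the 80% request measurable is to treat it as precision on the high-traffic class: of the recipes a model flags as high traffic, what share really were. This is an interpretation, not something the emails state, and the sketch below rests on that assumption. The function name `precision_high_traffic` and the label lists are hypothetical.

```python
def precision_high_traffic(actual, predicted):
    """Share of recipes predicted as high traffic that really were high traffic.

    actual, predicted: sequences of booleans, True meaning high traffic.
    Returns None when nothing was predicted as high traffic.
    """
    true_pos = sum(1 for a, p in zip(actual, predicted) if p and a)
    pred_pos = sum(1 for p in predicted if p)
    return true_pos / pred_pos if pred_pos else None

# Hypothetical labels for eight recipes (not taken from the dataset).
actual    = [True, True, False, True, False, True,  False, True]
predicted = [True, True, False, True, True,  False, False, True]

precision = precision_high_traffic(actual, predicted)
meets_target = precision is not None and precision >= 0.80
print(precision, meets_target)
```

Precision is a natural fit for "minimize the chance of showing unpopular recipes", since every false positive is an unpopular recipe shown on the home page. Recall would answer a different question: how many popular recipes the model misses.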

About Tasty Bytes

Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

Example Recipe

This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.

Tomato Soup

Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $

Nutritional Information (per serving)
- Calories 123
- Carbohydrate 13g
- Sugar 1g
- Protein 4g

Ingredients:
- Tomatoes
- Onion
- Carrot
- Vegetable Stock

Method:
1. Cut the tomatoes into quarters….

Data Information

The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

As you will see, they haven't given us all of the information they have about each recipe.

You can find the data here.

I will let you decide how to process it, just make sure you include all your decisions in your report.

Don't forget to double check the data really does match what they say - it might not.

Column Name | Details
recipe | Numeric, unique identifier of recipe
calories | Numeric, number of calories
carbohydrate | Numeric, amount of carbohydrates in grams
sugar | Numeric, amount of sugar in grams
protein | Numeric, amount of prote...
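
The advice to double check that the data matches its description can be turned into a small validation pass. The sketch below is a minimal example under stated assumptions. It covers only the five columns visible in the truncated table above. The sample rows are invented for illustration and are not values from the dataset. The helper name `check_recipes` and the rule that nutritional values must be non-negative are also assumptions.

```python
import csv
import io

# Columns described as numeric in the visible part of the table.
NUMERIC_COLUMNS = ["calories", "carbohydrate", "sugar", "protein"]

def check_recipes(rows):
    """Return (row_number, problem) pairs for rows that contradict the description."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows, start=1):
        rid = row.get("recipe") or ""
        if not rid.isdigit():
            problems.append((i, "recipe is not numeric"))
        elif rid in seen_ids:
            problems.append((i, "recipe is not unique"))
        seen_ids.add(rid)
        for col in NUMERIC_COLUMNS:
            value = row.get(col) or ""
            try:
                # Assumption: nutritional amounts should never be negative.
                if float(value) < 0:
                    problems.append((i, f"{col} is negative"))
            except ValueError:
                problems.append((i, f"{col} is missing or not numeric"))
    return problems

# Hypothetical sample: a missing value, a duplicate id, a negative amount.
SAMPLE = """recipe,calories,carbohydrate,sugar,protein
1,123,13,1,4
2,,38.56,0.66,0.92
2,914.28,42.68,3.09,-2.88
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = check_recipes(rows)
for row_number, problem in problems:
    print(row_number, problem)
```

Collecting every problem, rather than stopping at the first, makes it easier to record each data-cleaning decision in the report, as the brief asks.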