5 datasets found
  1. F

    English Open Ended Classification Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). English Open Ended Classification Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/english-open-ended-classification-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreementhttps://www.futurebeeai.com/data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the English Open Ended Classification Prompt-Response Dataset—an extensive collection of 3000 meticulously curated prompt and response pairs. This dataset is a valuable resource for training Language Models (LMs) to classify input text accurately, a crucial aspect in advancing generative AI.

    Dataset Content: This open-ended classification dataset comprises a diverse set of prompts and responses where the prompt contains input text to be classified and may also contain task instruction, context, constraints, and restrictions while completion contains the best classification category as response. Both these prompts and completions are available in English language. As this is an open-ended dataset, there will be no options given to choose the right classification category as a part of the prompt.

    These prompt and completion pairs cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more. Each prompt is accompanied by a response, providing valuable information and insights to enhance the language model training process. Both the prompt and response were manually curated by native English people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.

    This open-ended classification prompt and completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains prompts and responses with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Prompt Diversity: To ensure diversity, this open-ended classification dataset includes prompts with varying complexity levels, ranging from easy to medium and hard. Additionally, prompts are diverse in terms of length from short to medium and long, creating a comprehensive variety. The classification dataset also contains prompts with constraints and persona restrictions, which makes it even more useful for LLM training.Response Formats: To accommodate diverse learning experiences, our dataset incorporates different types of responses depending on the prompt. These formats include single-word, short phrase, and single sentence type of response. These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.Data Format and Annotation Details: This fully labeled English Open Ended Classification Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt length, prompt complexity, domain, response, response type, and rich text presence.Quality and Accuracy: Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The English version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom open-ended classification prompt and completion data tailored to specific needs, providing flexibility and customization options.License: The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy English Open Ended Classification Prompt-Completion Dataset to enhance the classification abilities and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

  2. Anti-Spoofing Replay Dataset

    • kaggle.com
    Updated Mar 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Axon Labs (2025). Anti-Spoofing Replay Dataset [Dataset]. https://www.kaggle.com/datasets/axondata/replay-attack-mobile-dataset-ibeta-1
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 27, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Axon Labs
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Replay Attack Dataset consists of 1,500 individuals who provided selfies, followed by 3,000 replay display attacks executed across 15 different mobile devices. These attacks are captured from a diverse range of devices, spanning low, medium, and high-end mobile phones, providing extensive variation in screen types, lighting, and environmental conditions

    Dataset Description:

    • 1,500 individuals: Diverse mix of genders and ethnicities, each contributing one selfie.
    • 3,000 replay display attacks: Crafted from the provided selfies, representing a wide range of mobile devices.
    • 15 mobile devices: Devices span low, medium, and high-end spectrums, ensuring a broad variety of screen qualities and device types.
    • Real-life selfies: High-quality selfies with no filters, at least 720p resolution, and clear facial features.
    • Replay display attacks: Attacks include zoom-in, zoom-out phases for multiple frames, with no visible phone borders.

    Real Life Selfies Description: - Selfie quality: 720p or higher resolution - Face clarity: Clear facial images with no filters or alterations - Diverse contributors: Representing a wide range of ethnicities and genders to ensure balanced data

    Replay Display Attacks Description: - Attack duration: Each attack video lasts approximately 5 seconds. - Device diversity: Captured across 15 mobile devices, ensuring high diversity of device types. - Zoom in movement: Replay videos include zoom-in and zoom-out phases for multiple frames, mimicking realistic attack behaviors - Phone borders: The attacks are designed such that phone borders are non-visible, focusing on the display content

    Full version of dataset is availible for commercial usage - leave a request on our website Axonlabs to purchase the dataset 💰

    Potential Use Cases:

    • Liveness Detection: Ideal for training and evaluating liveness detection models to differentiate between real selfies and replay attacks
    • Spoof Detection: Enables the development of anti-spoofing technology and security measures for facial recognition systems in mobile and biometric authentication Biometric Authentication: Useful for improving security in systems that rely on mobile face recognition

    Why This Dataset Is Important:

    • Tailored for iBeta Certification Level 1: Meets the standards for iBeta's Level 1 biometric testing, ensuring high relevance for real-world applications
    • Replay Attacks Are Common PAD Threats: Replay attacks are among the most prevalent Presentation Attack Detection (PAD) threats, making this dataset crucial for developing robust countermeasures in biometric security

    Why This Dataset is Better Than Competitors:

    • Zoom-In, Zoom-Out Phase: Unique inclusion of zooming effects throughout the attack sequence, providing a more realistic and dynamic simulation of replay attacks
    • No Visible Phone Borders: Unlike other datasets, this dataset removes visible phone borders, making the attack scenarios more similar to real-world threats Diverse Device Range: Provides attacks from a broad spectrum of mobile devices, enhancing the dataset’s applicability across different device types and qualities

    Other Datasets for iBeta 1:

    iBeta level 1 certification Dataset

    Photo Print Attacks Dataset

    Cutout 2D attacks Dataset

    Display Replay Attacks Dataset

    Keywords: Mobile Devices, Liveness Detection, Replay Attack Dataset, Biometric Authentication, Spoof Detection, Facial Recognition, Anti-Spoofing Technology, iBeta Certification, Presentation Attack Detection (PAD), AI Dataset, Deep Learning, Machine Learning, Security Systems, Replay Display Attacks

  3. w

    Global Financial Inclusion (Global Findex) Database 2021 - India

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Dec 16, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2022). Global Financial Inclusion (Global Findex) Database 2021 - India [Dataset]. https://microdata.worldbank.org/index.php/catalog/4653
    Explore at:
    Dataset updated
    Dec 16, 2022
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2021
    Area covered
    India
    Description

    Abstract

    The fourth edition of the Global Findex offers a lens into how people accessed and used financial services during the COVID-19 pandemic, when mobility restrictions and health policies drove increased demand for digital services of all kinds.

    The Global Findex is the world's most comprehensive database on financial inclusion. It is also the only global demand-side data source allowing for global and regional cross-country analysis to provide a rigorous and multidimensional picture of how adults save, borrow, make payments, and manage financial risks. Global Findex 2021 data were collected from national representative surveys of about 128,000 adults in more than 120 economies. The latest edition follows the 2011, 2014, and 2017 editions, and it includes a number of new series measuring financial health and resilience and contains more granular data on digital payment adoption, including merchant and government payments.

    The Global Findex is an indispensable resource for financial service practitioners, policy makers, researchers, and development professionals.

    Geographic coverage

    Excluded populations living in Northeast states and remote islands and Jammu and Kashmir. The excluded areas represent less than 10 percent of the total population.

    Analysis unit

    Individual

    Kind of data

    Observation data/ratings [obs]

    Sampling procedure

    In most developing economies, Global Findex data have traditionally been collected through face-to-face interviews. Surveys are conducted face-to-face in economies where telephone coverage represents less than 80 percent of the population or where in-person surveying is the customary methodology. However, because of ongoing COVID-19 related mobility restrictions, face-to-face interviewing was not possible in some of these economies in 2021. Phone-based surveys were therefore conducted in 67 economies that had been surveyed face-to-face in 2017. These 67 economies were selected for inclusion based on population size, phone penetration rate, COVID-19 infection rates, and the feasibility of executing phone-based methods where Gallup would otherwise conduct face-to-face data collection, while complying with all government-issued guidance throughout the interviewing process. Gallup takes both mobile phone and landline ownership into consideration. According to Gallup World Poll 2019 data, when face-to-face surveys were last carried out in these economies, at least 80 percent of adults in almost all of them reported mobile phone ownership. All samples are probability-based and nationally representative of the resident adult population. Phone surveys were not a viable option in 17 economies that had been part of previous Global Findex surveys, however, because of low mobile phone ownership and surveying restrictions. Data for these economies will be collected in 2022 and released in 2023.

    In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households. Each eligible household member is listed, and the hand-held survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.

    In traditionally phone-based economies, respondent selection follows the same procedure as in previous years, using random digit dialing or a nationally representative list of phone numbers. In most economies where mobile phone and landline penetration is high, a dual sampling frame is used.

    The same respondent selection procedure is applied to the new phone-based economies. Dual frame (landline and mobile phone) random digital dialing is used where landline presence and use are 20 percent or higher based on historical Gallup estimates. Mobile phone random digital dialing is used in economies with limited to no landline presence (less than 20 percent).

    For landline respondents in economies where mobile phone or landline penetration is 80 percent or higher, random selection of respondents is achieved by using either the latest birthday or household enumeration method. For mobile phone respondents in these economies or in economies where mobile phone or landline penetration is less than 80 percent, no further selection is performed. At least three attempts are made to reach a person in each household, spread over different days and times of day.

    Sample size for India is 3000.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Questionnaires are available on the website.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar. 2022. The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19. Washington, DC: World Bank.

  4. first-impressions

    • kaggle.com
    Updated Dec 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Muhammad Waqas786 (2024). first-impressions [Dataset]. https://www.kaggle.com/datasets/muhammadwaqas786/first-impressions
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Muhammad Waqas786
    Description

    Researchers mostly use the dataset launched for ChaLearn Looking At People First Impression Challenge (ECCV Challenge). The CVPR’17 dataset (an extension to the ECCV challenge dataset) consists of video files labelled with Big Five Personality Traits. The dataset consists of 3,000 high-definition YouTube videos featuring YouTubers speaking in English. To create the dataset, selected videos were divided into 10,000 clips with an average duration of 15 seconds. The dataset comprises three sets for training, validation, and testing, with a ratio of 3:1:1. The videos feature YouTubers from various nationalities, genders, and age groups. To label the videos with Big-Five personality traits, Amazon Mechanical Turk (AMT) was used. Each video clip was assigned a label corresponding to its Big-Five values, which range from 0 to 1.

  5. May 1960 Puerto Montt, Valdivia, Chile Images

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Dec 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NOAA National Centers for Environmental Information (Point of Contact) (2024). May 1960 Puerto Montt, Valdivia, Chile Images [Dataset]. https://catalog.data.gov/dataset/may-1960-puerto-montt-valdivia-chile-images2
    Explore at:
    Dataset updated
    Dec 6, 2024
    Dataset provided by
    National Centers for Environmental Informationhttps://www.ncei.noaa.gov/
    National Oceanic and Atmospheric Administrationhttp://www.noaa.gov/
    Area covered
    Valdivia, Puerto Montt, Chile, Valdivia
    Description

    On May 22, 1960, a Mw 9.5 earthquake, the largest earthquake ever instrumentally recorded, occurred in southern Chile. The series of earthquakes that followed ravaged southern Chile and ruptured over a period of days a 1,000 km section of the fault, one of the longest ruptures ever reported. The number of fatalities associated with both the earthquake and tsunami has been estimated to be between 490 and 5,700. Reportedly there were 3,000 injured, and initially there were 717 missing in Chile. The Chilean government estimated 2,000,000 people were left homeless and 58,622 houses were completely destroyed. Damage (including tsunami damage) was more than $500 million U.S. dollars.

  6. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
FutureBee AI (2022). English Open Ended Classification Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/english-open-ended-classification-text-dataset

English Open Ended Classification Prompt & Response Dataset

Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License

https://www.futurebeeai.com/data-license-agreementhttps://www.futurebeeai.com/data-license-agreement

Dataset funded by
FutureBeeAI
Description

What’s Included

Welcome to the English Open Ended Classification Prompt-Response Dataset—an extensive collection of 3000 meticulously curated prompt and response pairs. This dataset is a valuable resource for training Language Models (LMs) to classify input text accurately, a crucial aspect in advancing generative AI.

Dataset Content: This open-ended classification dataset comprises a diverse set of prompts and responses where the prompt contains input text to be classified and may also contain task instruction, context, constraints, and restrictions while completion contains the best classification category as response. Both these prompts and completions are available in English language. As this is an open-ended dataset, there will be no options given to choose the right classification category as a part of the prompt.

These prompt and completion pairs cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more. Each prompt is accompanied by a response, providing valuable information and insights to enhance the language model training process. Both the prompt and response were manually curated by native English people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.

This open-ended classification prompt and completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains prompts and responses with different types of rich text, including tables, code, JSON, etc., with proper markdown.

Prompt Diversity: To ensure diversity, this open-ended classification dataset includes prompts with varying complexity levels, ranging from easy to medium and hard. Additionally, prompts are diverse in terms of length from short to medium and long, creating a comprehensive variety. The classification dataset also contains prompts with constraints and persona restrictions, which makes it even more useful for LLM training.Response Formats: To accommodate diverse learning experiences, our dataset incorporates different types of responses depending on the prompt. These formats include single-word, short phrase, and single sentence type of response. These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.Data Format and Annotation Details: This fully labeled English Open Ended Classification Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt length, prompt complexity, domain, response, response type, and rich text presence.Quality and Accuracy: Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

The English version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom open-ended classification prompt and completion data tailored to specific needs, providing flexibility and customization options.License: The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy English Open Ended Classification Prompt-Completion Dataset to enhance the classification abilities and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

Search
Clear search
Close search
Google apps
Main menu