19 datasets found
  1. Dating App Fame & Behavior

    • kaggle.com
    Updated May 16, 2023
    Cite
    Utkarsh Singh (2023). Dating App Fame & Behavior [Dataset]. https://www.kaggle.com/utkarshx27/lovoo-dating-app-dataset/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Utkarsh Singh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    ➡️ There are 3 datasets in total, containing valuable information.
    ➡️ Understand people's fame and behavior on a dating app platform.

    | Column Name            | Description                                          |
    |------------------------|------------------------------------------------------|
    | Age                    | The age of the user.                                 |
    | Number of Users        | The total number of users.                           |
    | Percent Want Chats     | Percentage of users who want chats.                  |
    | Percent Want Friends   | Percentage of users who want friendships.            |
    | Percent Want Dates     | Percentage of users who want romantic dates.         |
    | Mean Kisses Received   | Average number of kisses received by users.          |
    | Mean Visits Received   | Average number of profile visits received by users.  |
    | Mean Followers         | Average number of followers for each user.           |
    | Mean Languages Known   | Average number of languages known by users.          |
    | Total Want Chats       | Total count of users interested in chats.            |
    | Total Want Friends     | Total count of users looking for friendships.        |
    | Total Want Dates       | Total count of users seeking romantic dates.         |
    | Total Kisses Received  | Overall count of kisses received by users.           |
    | Total Visits Received  | Overall count of profile visits received by users.   |
    | Total Followers        | Overall count of followers for all users.            |
    | Total Languages Spoken | Total count of languages spoken by all users.        |

    SUMMARY

    When dating apps like Tinder were going viral, people wanted to have the best profile in order to get more matches and more potential encounters. Unlike previous dating platforms, the new ones emphasized mutual attraction before allowing any two people to get in touch and chat. This made it all the more important to create the best profile in order to make the best first impression.

    In parallel, we humans have always been in awe of charismatic and inspiring people. The more charismatic people tend to be followed and listened to by more people. Through metrics such as the number of friends or followers, social networks give some ways of "measuring" the potential charisma of some people.

    In regard to all that, one can then think:

    • What makes a great user profile?
    • How to make the best first impression in order to get more matches (and ultimately find love, or new friendships)?
    • What makes a person charismatic?
    • How do charismatic people present themselves?

    In order to try and understand those different social questions, I decided to create a dataset of user profile information using the social network Lovoo when it came out. By using different methodologies, I was able to gather user profile data, as well as some usually unavailable metrics (such as the number of profile visits).

    Content

    The dataset contains profile information for users of the website Lovoo.

    The dataset was gathered during spring 2015 (April, May). At that time, Lovoo was expanding in European countries (among others), while Tinder was trending both in America and in Europe. At that time the iOS version of the Lovoo app was in version 3.

    Accessory image data

    The dataset references pictures (field pictureId) of user profiles. These pictures are also available for a fraction of users but have not been uploaded and must be requested separately.

    The idea when gathering the profile pictures was to determine whether some correlations could be identified between a profile picture and the reputation or success of a given profile. Since first impressions matter, a sound hypothesis is that the profile picture might have a great influence on the number of profile visits, matches and so on. Do not forget that only a fraction of a user's profile is seen when browsing through a list of users.

    Details about collection methodology

    In order to gather the data, I developed a set of tools that would save the data while browsing through profiles and doing searches. Because of this approach (and the constraints that forced me to develop it), I could only gather user profiles that were recommended by Lovoo's algorithm for 2 profiles I created for this purpose (male, open to friends & chats & dates). That is why there are only female users in the dataset. Further work could be done to fetch similar data for both genders or other age ranges.

    Regarding the number of user profiles

    It turned out that the recommendation algorithm always seemed to output the same set of user profiles. This meant Lovoo's algorithm was probably heavily relying on settings like location (to recommend more people nearby than people in different places or countries) and maybe cookies. This diminished the number of different user profiles that would be pr...

  2. Dating App Sentiment Analysis Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Dating App Sentiment Analysis Dataset [Dataset]. https://www.opendatabay.com/data/consumer/77355978-301e-414e-8094-a205b7a505b6
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset provides a collection of user reviews and ratings for dating applications, primarily sourced from the Google Play Store for the Indian region between 2017 and 2022. It offers valuable insights into user sentiment, evolving trends, and common feedback regarding dating apps. The data is particularly useful for practising Natural Language Processing (NLP) tasks such as sentiment analysis, topic modelling, and identifying user concerns.

    Columns

    • Index: A unique identifier for each review entry.
    • Name: The name of the user who left the review.
    • Username: The username of the reviewer.
    • Review: The textual content of the review left by the user.
    • Rating: The numerical rating given by the user to the app, indicating their satisfaction level.
    • #ThumbsUp: A measure of how useful the review was perceived to be by other users.
    • Date&Time: The specific date and time when the review was posted.
    • App: The name of the dating application being reviewed.
    • Label Count: A numerical label, the specific purpose of which is not detailed in the provided information, but it appears to relate to ranges of index or other numerical values within the dataset.

    Distribution

    The dataset is typically provided in a CSV file format. It contains a substantial number of records, estimated to be around 527,000 individual reviews. This makes it suitable for large-scale data analysis and machine learning projects. The dataset structure is tabular, with clearly defined columns for review content, metadata, and user feedback. Specific row/record counts are not exact but are indicated by the extensive range of index labels.

    Usage

    This dataset is ideally suited for a variety of analytical and machine learning applications:

    • Analysing trends in dating app usage and perception over the years.
    • Determining which dating applications receive more favourable responses and whether this has changed over time.
    • Identifying common issues reported by users who give low ratings (below 3/5).
    • Investigating the correlation between user enthusiasm and their app ratings.
    • Performing sentiment analysis on review texts to gauge overall user sentiment.
    • Developing Natural Language Processing (NLP) models for text classification, entity recognition, or summarisation.
    • Examining the perceived usefulness of top-rated reviews.
    • Understanding user behaviour and preferences across different dating apps.
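As a rough illustration of the low-rating use case, the sketch below filters reviews rated below 3 and counts them per app. Only the column names (Review, Rating, App) come from the listing; the sample rows, the app names and the helper `low_rated_reviews` are hypothetical, and the real file may need different parsing.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the column names come from the listing.
SAMPLE_CSV = """Index,Review,Rating,App
0,Great app,5,AppA
1,Too many fake profiles,1,AppA
2,Crashes on login,2,AppB
3,Works fine,4,AppB
"""


def low_rated_reviews(csv_text, threshold=3):
    """Return the rows whose Rating is strictly below `threshold`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["Rating"]) < threshold]


low = low_rated_reviews(SAMPLE_CSV)
# Count low-rated reviews per app to see where complaints concentrate.
per_app = Counter(row["App"] for row in low)
```

On the real CSV, the same function could be fed the file contents instead of the in-memory sample.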

    Coverage

    The dataset primarily covers user reviews from the Google Play Store, specifically for the Indian country region ('in'), despite being titled as "all regions" in some contexts. The data spans a time range from 2017 to 2022, offering a multi-year perspective on dating app trends and user feedback. There are no specific demographic details for the reviewers themselves beyond their reviews and ratings.

    License

    CC0

    Who Can Use It

    This dataset is suitable for:

    • Data Scientists and Analysts: For conducting deep dives into user sentiment, trend analysis, and predictive modelling.
    • NLP Practitioners and Researchers: As a practical dataset for training and evaluating natural language processing models, especially for text classification and sentiment analysis tasks.
    • App Developers and Product Managers: To understand user feedback, identify areas for improvement in their own or competing dating applications, and inform product development strategies.
    • Market Researchers: To gain insights into consumer behaviour and preferences within the online dating market.
    • Students and Beginners: It is tagged as 'Beginner' friendly, making it a good resource for those new to data analysis or NLP projects.

    Dataset Name Suggestions

    • Google Play Dating App Reviews (India, 2017-2022)
    • Indian Dating App User Reviews
    • Mobile Dating App Reviews & Ratings
    • Dating App Sentiment Analysis Dataset
    • Google Play Dating App Feedback

    Attributes

    Original Data Source: Dating Apps Reviews 2017-2022 (all regions)

  3. Effect of Online Dating on Assortative Mating: Evidence from South Korea...

    • journaldata.zbw.eu
    • jda-test.zbw.eu
    stata do, txt
    Updated Dec 7, 2022
    Cite
    Soohyung Lee (2022). Effect of Online Dating on Assortative Mating: Evidence from South Korea (replication data) [Dataset]. http://doi.org/10.15456/jae.2022326.0659077625
    Available download formats: txt(1209), stata do(8522)
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Soohyung Lee
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    South Korea
    Description

    Online dating services have increased in popularity around the world, but a lack of quality data hinders our understanding of their role in family formation. This paper studies the effect of online dating services on marital sorting, using a novel dataset with verified information on people and their spouses. Estimates based on matching techniques suggest that, relative to other spouse search methods, online dating promotes marriages that exhibit weaker sorting along occupation and geographical proximity but stronger sorting along education and other demographic traits. Sensitivity analysis, including the Rosenbaum Bounds approach, suggests that online dating's impact on marital sorting is robust to potential selection bias.

  4. Data_Sheet_1_Polar Similars: Using Massive Mobile Dating Data to Predict...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Jon Levy; Devin Markell; Moran Cerf (2023). Data_Sheet_1_Polar Similars: Using Massive Mobile Dating Data to Predict Synchronization and Similarity in Dating Preferences.docx [Dataset]. http://doi.org/10.3389/fpsyg.2019.02010.s001
    Available download formats: docx
    Dataset provided by
    Frontiers
    Authors
    Jon Levy; Devin Markell; Moran Cerf
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Leveraging a massive dataset of over 421 million potential matches between single users on a leading mobile dating application, we were able to identify numerous characteristics of effective matching. Effective matching is defined as the exchange of contact information with the likely intent to meet in person. The characteristics of effective match include alignment of psychological traits (i.e., extroversion), physical traits (i.e., height), personal choices (i.e., desiring the same relationship type), and shared experiences. For nearly all characteristics, the more similar the individuals were, the higher the likelihood was of them finding each other desirable and opting to meet in person. The only exception was introversion, where introverts rarely had an effective match with other introverts. When investigating the preliminary stages of the choice process we looked at the consistency between the choice of men/women, the time it took users to make these binary choices, and the tendency of yes/no decisions. We used a biologically inspired choice model to estimate the decision process and could predict the selection and response time with nearly 60% accuracy. Given that people make their initial selection in no more than 11 s, and ultimately prefer a partner who shares numerous attributes with them, we suggest that users are less selective in their early preferences and gradually, during their conversation, converge onto clusters that share a high degree of similarity in characteristics.

  5. Bumble Dating App Reviews Dataset

    • opendatabay.com
    Updated Jul 4, 2025
    Cite
    Datasimple (2025). Bumble Dating App Reviews Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/75525cb3-a9aa-42fe-b336-09411e9d2f7b
    Dataset authored and provided by
    Datasimple
    Area covered
    Reviews & Ratings
    Description

    This dataset contains user reviews and comments from the Bumble dating application on the Google Play Store. Bumble is an online dating app where, in heterosexual matches, female users typically initiate the first contact. Beyond romantic connections, Bumble also facilitates finding friends through "BFF mode" and business networking via "Bumble Bizz". This dataset is valuable for understanding user experiences and sentiment towards the app.

    Columns

    • reviewId: A unique identifier for each user's review.
    • userName: The name of the user who posted the review.
    • userImage: A URL to the user's profile image.
    • content: The textual comment or feedback provided by the user.
    • score: The rating given by the user, ranging from 1 to 5.
    • thumbsUpCount: The number of 'thumbs up' or likes a specific comment received.
    • reviewCreatedVersion: The version number of the app on which the review was created.
    • at: The date and time when the review was created.
    • replyContent: The content of any reply made by the Bumble company to the user's comment.
    • repliedAt: The date and time when the company's reply was posted.

    Distribution

    The dataset is typically provided as a data file, often in CSV format. It appears to contain a substantial number of records, with reviewId having 168,651 unique values. The data quality is rated as 5 out of 5, and the version of this dataset is 1.0.

    Usage

    This dataset is ideal for:

    • Natural Language Processing (NLP) tasks, such as sentiment analysis of user comments.
    • Market research to gain insights into user satisfaction and preferences regarding dating apps.
    • Analysing app performance based on user ratings and feedback.
    • Studying trends in social networks and popular culture related to online dating.
    • Identifying common user issues or popular features within the Bumble app.

    Coverage

    The dataset is global in its geographic scope. The reviews span a time period from 29 November 2015 to 28 June 2025. It primarily covers the experiences of Google Play Store users of the Bumble app. As of June 2016, 46.2% of Bumble's users were female.

    License

    CC-BY

    Who Can Use It

    • Data scientists and machine learning engineers interested in text analysis and sentiment modelling.
    • App developers seeking direct user feedback to improve application features and user experience.
    • Researchers focusing on online dating dynamics, social media behaviour, and popular culture.
    • Businesses aiming to understand consumer sentiment and competitive landscapes in the social networking and dating industries.

    Dataset Name Suggestions

    • Bumble Google Play Reviews
    • Bumble App User Feedback
    • Bumble Play Store Ratings
    • Bumble Dating App Reviews Dataset

    Attributes

    Original Data Source: Bumble Dating App - Google Play Store Review

  6. Anonymised dataset.

    • plos.figshare.com
    xlsx
    Updated May 8, 2025
    Cite
    Angelica Emery-Rhowbotham; Helen Killaspy; Sharon Eager; Brynmor Lloyd-Evans (2025). Anonymised dataset. [Dataset]. http://doi.org/10.1371/journal.pmen.0000184.s004
    Available download formats: xlsx
    Dataset provided by
    PLOS Mental Health
    Authors
    Angelica Emery-Rhowbotham; Helen Killaspy; Sharon Eager; Brynmor Lloyd-Evans
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Most people seek to establish romantic or intimate relationships in life, including people with mental health problems. However, this has been a neglected topic in mental health practice and research. This study aimed to investigate views of mental health and social care staff about the appropriateness of helping service users with romantic relationships, barriers to doing this, and suggestions for useful ways to support this. An online survey comprising both closed, multiple response and free-text questions was circulated to mental health organisations across the U.K. via social media, professional networks and use of snowballing sampling. A total of 63 responses were received. Quantitative data were analysed using descriptive statistics, and are reported as frequencies and percentages. Qualitative data were interpreted using thematic analysis, using an inductive approach. Although most participants reported that ‘finding a relationship’ conversations were appropriate in their job role, many barriers to supporting service users were identified, including: a lack of training; concerns about professional boundaries; concerns about service user capacity and vulnerability; and concerns about being intrusive. Participant suggestions for future support included educating service users on safe dating behaviours, and practical interventions such as assisting service users to use dating sites and engage with social activities to develop social skills and meet others. Staff were willing to help service users seek an intimate relationship but may need specific training or guidance to facilitate this confidently and safely. This study elucidates the need for further research in this area, particularly in understanding service user perspectives, and in developing resources to support staff in this work.

  7. World Marriage Data (UN Population Division)

    • kaggle.com
    Updated Dec 19, 2020
    Cite
    Nitish Jaipuria (2020). World Marriage Data (UN Population Division) [Dataset]. https://www.kaggle.com/datasets/thirstysoul/world-marriage-data-un-population-division
    Dataset provided by
    Kaggle
    Authors
    Nitish Jaipuria
    Area covered
    United Nations
    Description

    Content

    World Marriage Data 2012 provides a comparable and up-to-date set of data on the marital status of the population for all countries and areas of the world. Data are presented for the closest date available around five reference dates: the years closest to 1970, 1985, 1995, 2005 and the most recent data available.

    Acknowledgements

    I pulled this data from the United Nations Data portal and did some simple post-processing to make it more user-friendly.

    Inspiration

    I primarily feel this data will be useful in conjunction with other datasets related to different disciplines wherein understanding the marriage trends will add value to the analysis.

  8. How Couples Meet and Stay Together (HCMST)

    • redivis.com
    application/jsonl +7
    Updated Nov 3, 2022
    Cite
    Stanford University Libraries (2022). How Couples Meet and Stay Together (HCMST) [Dataset]. http://doi.org/10.57761/ktkz-wg93
    Available download formats: spss, arrow, application/jsonl, stata, avro, sas, parquet, csv
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford University Libraries
    Description

    Abstract

    How Couples Meet and Stay Together (HCMST) is a study of how Americans meet their spouses and romantic partners.

    • The study is a nationally representative study of American adults.
    • 4,002 adults responded to the survey; 3,009 of those had a spouse or main romantic partner.
    • The study oversamples self-identified gay, lesbian, and bisexual adults.
    • Follow-up surveys were implemented one and two years after the main survey, to study couple dissolution rates. Version 3.0 of the dataset includes two follow-up surveys, waves 2 and 3.
    • Waves 4 and 5 are provided as separate data files that can be linked back to the main file via variable caseid_new.
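Linking a later wave back to the main file could look roughly like the sketch below, a plain left join on caseid_new. Only the key name caseid_new comes from the description; the other field names, the sample records and the helper `link_wave` are hypothetical.

```python
def link_wave(main_rows, wave_rows, key="caseid_new"):
    """Left-join wave rows onto main rows using the shared case identifier."""
    wave_by_id = {row[key]: row for row in wave_rows}
    linked = []
    for row in main_rows:
        # Respondents missing from the later wave keep only their main-file fields.
        extra = wave_by_id.get(row[key], {})
        merged = dict(row)
        merged.update({k: v for k, v in extra.items() if k != key})
        linked.append(merged)
    return linked


# Hypothetical records: field names other than caseid_new are illustrative.
main = [
    {"caseid_new": 1, "partnered": True},
    {"caseid_new": 2, "partnered": False},
]
wave4 = [{"caseid_new": 1, "w4_status": "still together"}]

linked = link_wave(main, wave4)
```

A left join keeps every main-file respondent, which matters here because follow-up waves have lower response rates than the main survey.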

    The study will provide answers to the following research questions:

    1. Do traditional couples and nontraditional couples meet in the same way? What kinds of couples are more likely to have met online?
    2. Have the most recent marriage cohorts (especially the traditional heterosexual same-race married couples) met in the same way their parents and grandparents did?
    3. Does meeting online lead to greater or less couple stability?
    4. How do the couple dissolution rates of nontraditional couples compare to the couple dissolution rates of more traditional same-race heterosexual couples?
    5. How does the availability of civil union, domestic partnership or same-sex marriage rights affect couple stability for same-sex couples? This study will provide the first nationally representative data on the couple dissolution rates of same-sex couples.

    Methodology

    Universe:

    The universe for the HCMST survey is English literate adults in the U.S.

    Unit of Analysis:

    Individual

    Type of data collection:

    Survey Data

    Time of data collection:

    Wave 1, the main survey, was fielded between February 21 and April 2, 2009. Wave 2 was fielded March 12, 2010 to June 8, 2010. Wave 3 was fielded March 22, 2011 to August 29, 2011. Wave 4 was fielded between March and November of 2013. Wave 5 was fielded between November 2014 and March 2015. Dates for the background demographic surveys are described in the User's Guide, under documentation below.

    Geographic coverage:

    United States of America

    Smallest geographic unit:

    US region

    Sample description:

    The survey was carried out by survey firm Knowledge Networks (now called GfK). The survey respondents were recruited from an ongoing panel. Panelists are recruited via random digit dial phone survey. Survey questions were mostly answered online; some follow-up surveys were conducted by phone. Panelists who did not have internet access at home were given an internet access device (WebTV). For further information about how the Knowledge Networks hybrid phone-internet survey compares to other survey methodology, see attached documentation.

    The dataset contains variables that are derived from several sources. There are variables from the Main Survey Instrument, there are variables generated from the investigators which were created after the Main Survey, and there are demographic background variables from Knowledge Networks which pre-date the Main Survey. Dates for main survey and for the prior background surveys are included in the dataset for each respondent. The source for each variable is identified in the codebook, and in notes appended within the dataset itself (notes may only be available for the Stata version of the dataset).

    Respondents who had no spouse or main romantic partner were dropped from the Main Survey. Unpartnered respondents remain in the dataset, and demographic background variables are available for them.

    Sample response rate:

    Response to the main survey in 2009 from subjects, all of whom were already in the Knowledge Networks panel, was 71%. If we include the prior initial Random Digit Dialing phone contact and agreement to join the Knowledge Networks panel (participation rate 32.6%), and the respondents’ completion of the initial demographic survey (56.8% completion), the composite overall response rate is a much lower 0.326 × 0.568 × 0.71 ≈ 13%. For further information on the calculation of response rates, and relevant citations, see the Note on Response Rates in the documentation. Response rates for the subsequent waves of the HCMST survey are simpler, using the denominator of people who completed wave 1 and who were eligible for follow-up. Response to wave 2 was 84.5%. Response rate to wave 3 was 72.9%. Response rate to wave 4 was 60.0%. Response rate to wave 5 was 46%. Response to wave 6 was 91.3%. Wave 6 was Internet only, so people who had left the GfK KnowledgePanel were not contacted.
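The composite rate quoted above is just the product of the three stage rates. A minimal sketch of that arithmetic, where the function name `composite_rate` is illustrative and not part of the study's documentation:

```python
from math import prod


def composite_rate(*stage_rates):
    """Multiply per-stage response rates into one overall rate."""
    return prod(stage_rates)


# Panel recruitment (32.6%), demographic survey (56.8%), main survey (71%).
overall = composite_rate(0.326, 0.568, 0.71)
print(f"{overall:.1%}")  # prints 13.1%
```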

    Weights:

    See "Notes on the Weights" in the Documentation section.

    Usage

    When you use the data, you agree to the following conditions:

    1. I will not use the data to identify individuals.
    2. I will not charge a fee for the data if I distribute it to others.
    3. I will inform the contact person abo
  9. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Available download formats: .json, .xml, .csv, .xls
    Dataset authored and provided by
    Oxylabs
    Area covered
    Taiwan, Tunisia, Isle of Man, Canada, Moldova (Republic of), Bangladesh, Nepal, Andorra, British Indian Ocean Territory, Northern Mariana Islands
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  10. Predict the Match Percentage

    • kaggle.com
    Updated Dec 26, 2020
    Cite
    Aditya Mittal (2020). Predict the Match Percentage [Dataset]. https://www.kaggle.com/datasets/mittalvasu95/predict-the-match-percentage/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aditya Mittal
    Description

    Context

    In an era where technology plays a significant role in people’s lives, one cannot deny that it changes the way people interact and communicate with others. Today, technology has caused some significant changes in the dating world as well. Online dating is a new trend that is influencing many people around the world.

    As a data scientist, you are required to predict the match percentage between the users in a matrix format based on the attributes provided by the user on a dating website.

    Note

    Based on the user’s sexual orientation, you are required to perform the following:

    1. If a user is heterosexual (prefers the opposite sex), then the match percentage must be 0 for this user with respect to other users of the same gender if the other users have the same behavior.
    2. If a user is homosexual (prefers the same sex), then the match percentage must be 0 for this user with respect to other users of the opposite gender if the other users have the same behavior.
    3. The match percentage of a user with her/himself must be zero.
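
    The three rules above can be sketched as a mask applied to an already computed match matrix. This is a minimal illustration, not the competition's reference solution: the function name, the orientation labels "straight" and "gay", the gender codes, and the reading of "same behavior" as "same orientation" are all assumptions made for the example.

```python
def apply_orientation_rules(scores, genders, orientations):
    """Return a copy of the match matrix with the three rules enforced.

    scores[i][j] is the raw match percentage between user i and user j.
    The labels "straight"/"gay" are hypothetical stand-ins for the
    dataset's orientation values.
    """
    n = len(scores)
    masked = [row[:] for row in scores]  # do not mutate the input
    for i in range(n):
        for j in range(n):
            if i == j:
                masked[i][j] = 0.0  # rule 3: no match with oneself
                continue
            # Assumption: "same behavior" means the same orientation.
            if orientations[i] != orientations[j]:
                continue
            same_gender = genders[i] == genders[j]
            if orientations[i] == "straight" and same_gender:
                masked[i][j] = 0.0  # rule 1
            elif orientations[i] == "gay" and not same_gender:
                masked[i][j] = 0.0  # rule 2
    return masked


# Made-up scores for three users, all labelled "straight".
raw = [[100.0, 80.0, 60.0],
       [80.0, 100.0, 40.0],
       [60.0, 40.0, 100.0]]
result = apply_orientation_rules(raw, ["m", "m", "f"], ["straight"] * 3)
```

    Masking after scoring keeps the similarity computation independent of the orientation constraints, so either part can be changed without touching the other.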

    Content

    The data is of a dating site that describes the user from various attributes like sex, orientation, relationship status, smokes or not, languages known, etc.

    Acknowledgements

    This is a competition on HackerEarth. https://www.hackerearth.com/problem/machine-learning/predict-the-match-percentage-25-818cf487/

    Inspiration

    The idea is to find the match percentage between each user and every other user, while taking the rules in the Note section into account.

  11. General domain Human-Human conversation chats in English

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). General domain Human-Human conversation chats in English [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-general-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    This training dataset comprises more than 10,000 text conversations between two native English speakers in the general domain. The chats cover a variety of daily-life topics, services, and issues, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., which makes the dataset diverse.

    These chats include language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains various attributes, such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, in various formats to make the text data unbiased.

    These chat scripts have between 300 and 700 words and up to 50 turns. 150 people that are a part of the FutureBeeAI crowd community contributed to this dataset. You will also receive chat metadata, such as participant age, gender, and country information, along with the chats. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.

    This dataset is being expanded with new chats all the time. We are able to produce text data in a variety of languages to meet your unique requirements. Check out the FutureBeeAI community for a custom collection.

    This training dataset's licence belongs to FutureBeeAI!

  12. Dataset of individuals using the Internet and tax revenue of countries per...

    • workwithdata.com
    Updated Apr 9, 2025
    Cite
    Work With Data (2025). Dataset of individuals using the Internet and tax revenue of countries per year in El Salvador and in 2021 (Historical) [Dataset]. https://www.workwithdata.com/datasets/countries-yearly?col=country%2Cdate%2Cinternet_pct%2Ctax_revenue_pct_gdp&f=2&fcol0=country&fcol1=date&fop0=%3D&fop1=%3D&fval0=El+Salvador&fval1=2021
    Dataset updated
    Apr 9, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    El Salvador
    Description

    This dataset is about countries per year, restricted to El Salvador. It has 1 row and is filtered where the date is 2021. It features 4 columns: country, date, individuals using the Internet, and tax revenue.

  13. Data from the experimental project 'Love or politics? Political views...

    • data.mendeley.com
    Updated Aug 27, 2024
    + more versions
    Cite
    Anna Beloborodova (2024). Data from the experimental project 'Love or politics? Political views regarding the war in Ukraine in an online dating experiment' by Anna Beloborodova [Dataset]. http://doi.org/10.17632/629wv9zm8p.3
    Dataset updated
    Aug 27, 2024
    Authors
    Anna Beloborodova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ukraine
    Description

    This dataset contains data from the experiment and python code for the project titled “Love or politics? Political views regarding the war in Ukraine in an online dating experiment”.

    Paper abstract: Political views affect various behaviors, including relationship formation. This study conducts a field experiment on a large Russian dating site and gathers data from over 3,000 profile evaluations. The findings reveal significant penalties for those who express pro-war or anti-war positions on their dating profiles. Age emerges as the most polarizing factor: younger individuals are less likely to approach pro-war profiles but not anti-war ones, whereas older individuals are less likely to respond positively to profiles indicating anti-war views but not pro-war ones. The results align with survey evidence of a positive relationship between respondents' age and expressed support for the war in Russia, although the experiment indicates a higher degree of polarization. Overall, the experimental findings demonstrate that survey data can reveal trends and relationships between individuals' characteristics and their opinions, but may overstate the levels of support for government agendas in non-democratic states.

    The experiment was conducted in October - November, 2022, on a large online dating site in Russia in three Russian regions: Moscow, Saint Petersburg, and Sverdlovskaya oblast. There are three separate data files, one for each region. Each file contains information on dating site users that have been liked by and/or have viewed the experimental profiles.

    File ExperimentDataMainLikedUsers.csv contains data on the main sample of liked users. The hair color of these users was recorded from profile photos whenever possible. Weights have also been added to enable analysis with adjustment for differences in age distribution between dating site users and a subset of the Russian population that shares similar observable characteristics.

    The folder also contains python code for data analysis.

    The description of the study is available at https://mpra.ub.uni-muenchen.de/120731/

  14. Dataset of individuals using the Internet and tax revenue of countries per...

    • workwithdata.com
    Updated Apr 9, 2025
    Cite
    Work With Data (2025). Dataset of individuals using the Internet and tax revenue of countries per year in Maldives and in 2021 (Historical) [Dataset]. https://www.workwithdata.com/datasets/countries-yearly?col=country%2Cdate%2Cinternet_pct%2Ctax_revenue_pct_gdp&f=2&fcol0=country&fcol1=date&fop0=%3D&fop1=%3D&fval0=Maldives&fval1=2021
    Dataset updated
    Apr 9, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Maldives
    Description

    This dataset is about countries per year, restricted to Maldives. It has 1 row and is filtered where the date is 2021. It features 4 columns: country, date, individuals using the Internet, and tax revenue.

  15. Data from: #PraCegoVer dataset

    • zenodo.org
    Updated Jan 20, 2023
    + more versions
    Cite
    Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila (2023). #PraCegoVer dataset [Dataset]. http://doi.org/10.5281/zenodo.7548638
    Dataset updated
    Jan 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila
    Description

    Automatically describing images using natural sentences is an essential task for visually impaired people's inclusion on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.

    PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer, and add a short description of their content. Inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

    #PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

    New Release

    We release pracegover_400k.json which contains 403,337 examples from the original dataset.json after preprocessing and duplication removal. It is split into train, validation, and test with 242036, 80628, and 80673 examples, respectively.

    Dataset Structure

    The #PraCegoVer dataset comprises a main file, dataset.json, and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json contains a list of JSON objects with the attributes:

    • user: anonymized user that made the post;
    • filename: image file name;
    • raw_caption: raw caption;
    • caption: clean caption;
    • date: post date.

    Each instance in dataset.json is associated with exactly one image in the images directory, whose file name is given by the filename attribute. We also provide a sample with five instances, so that users can download the sample to get an overview of the dataset before downloading it completely.
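
    As a rough sketch of how the structure described above could be read and checked, the snippet below parses a JSON list and verifies that each record carries the five listed attributes. The helper name load_records and the sample values are invented for illustration; the real value formats in dataset.json may differ.

```python
import json

# The five attributes listed in the dataset description.
REQUIRED_KEYS = {"user", "filename", "raw_caption", "caption", "date"}


def load_records(json_text):
    """Parse a dataset.json-style list and check each record's attributes."""
    records = json.loads(json_text)
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {index} is missing {sorted(missing)}")
    return records


# Hypothetical one-record sample; the values are made up.
sample_text = json.dumps([
    {
        "user": "user_0001",
        "filename": "img_0001.jpg",
        "raw_caption": "#PraCegoVer Foto de um gato preto.",
        "caption": "Foto de um gato preto.",
        "date": "2020-01-01",
    }
])
records = load_records(sample_text)
image_names = [record["filename"] for record in records]
```

    For the real file, the same function could be called on the text returned by open("dataset.json", encoding="utf-8").read().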

    Download Instructions

    If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k, 173k, and 400k), you must download all the files and run the following commands to uncompress and join the files:

    cat images.tar.gz.part* > images.tar.gz
    tar -xzvf images.tar.gz

    Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in the PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

    python download_dataset.py --access_token=

  16. Data from: Twitter Big Data as A Resource For Exoskeleton Research: A...

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Thakur, Nirmalya (2023). Twitter Big Data as A Resource For Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions [Dataset]. http://doi.org/10.7910/DVN/VPPTRF
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Thakur, Nirmalya
    Description

    Please cite the following paper when using this dataset: N. Thakur, “Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions,” Preprints, 2022, DOI: 10.20944/preprints202206.0383.v1

    Abstract

    The exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and use cases in assisted living, military, healthcare, firefighting, and industries. With the projected increase in the diverse uses of exoskeletons in the next few years in these application domains and beyond, it is crucial to study, interpret, and analyze user perspectives, public opinion, reviews, and feedback related to exoskeletons, for which a dataset is necessary. The Internet of Everything era of today's living, characterized by people spending more time on the Internet than ever before, holds the potential for developing such a dataset by mining relevant web behavior data from social media communications, which have increased exponentially in the last few years. Twitter, one such social media platform, is highly popular amongst all age groups, who communicate on diverse topics including but not limited to news, current events, politics, emerging technologies, family, relationships, and career opportunities, via tweets, while sharing their views, opinions, perspectives, and feedback towards the same. Therefore, this work presents a dataset of about 140,000 Tweets related to exoskeletons that were mined over a period of 5 years, from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons.

    Instructions

    This dataset contains about 140,000 Tweets related to exoskeletons that were mined over a period of 5 years, from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons. The dataset contains only tweet identifiers (Tweet IDs) due to Twitter's terms and conditions, which allow Twitter data to be re-distributed only for research purposes. The Tweet IDs need to be hydrated to be used. The process of retrieving a tweet's complete information (such as the text of the tweet, username, user ID, date and time, etc.) using its ID is known as the hydration of a tweet ID. The Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) or any similar application may be used for hydrating this dataset.

    Data Description

    This dataset consists of 7 .txt files. The following shows the number of Tweet IDs and the date range (of the associated tweets) in each of these files.

    • Exoskeleton_TweetIDs_Set1.txt: 22945 Tweet IDs, tweets dated July 20, 2021 – May 21, 2022
    • Exoskeleton_TweetIDs_Set2.txt: 19416 Tweet IDs, tweets dated Dec 1, 2020 – July 19, 2021
    • Exoskeleton_TweetIDs_Set3.txt: 16673 Tweet IDs, tweets dated April 29, 2020 – Nov 30, 2020
    • Exoskeleton_TweetIDs_Set4.txt: 16208 Tweet IDs, tweets dated Oct 5, 2019 – Apr 28, 2020
    • Exoskeleton_TweetIDs_Set5.txt: 17983 Tweet IDs, tweets dated Feb 13, 2019 – Oct 4, 2019
    • Exoskeleton_TweetIDs_Set6.txt: 34009 Tweet IDs, tweets dated Nov 9, 2017 – Feb 12, 2019
    • Exoskeleton_TweetIDs_Set7.txt: 11351 Tweet IDs, tweets dated May 21, 2017 – Nov 8, 2017

    Here, the last date for May is May 21, as it was the most recent date at the time of data collection. The dataset would be updated soon to incorporate more recent tweets.
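
    As a quick sanity check on the file listing, the sketch below sums the per-file counts stated above and shows one way to read Tweet IDs before passing them to a hydration tool. It assumes each .txt file holds one Tweet ID per line, which the description does not state explicitly; read_tweet_ids is a hypothetical helper, and the sample IDs are placeholders.

```python
# Per-file Tweet ID counts copied from the data description.
FILE_COUNTS = {
    "Exoskeleton_TweetIDs_Set1.txt": 22945,
    "Exoskeleton_TweetIDs_Set2.txt": 19416,
    "Exoskeleton_TweetIDs_Set3.txt": 16673,
    "Exoskeleton_TweetIDs_Set4.txt": 16208,
    "Exoskeleton_TweetIDs_Set5.txt": 17983,
    "Exoskeleton_TweetIDs_Set6.txt": 34009,
    "Exoskeleton_TweetIDs_Set7.txt": 11351,
}

# Total number of Tweet IDs across the 7 files ("about 140,000").
total_ids = sum(FILE_COUNTS.values())


def read_tweet_ids(lines):
    """Return non-empty, stripped Tweet IDs from an iterable of lines.

    Assumption: one Tweet ID per line.
    """
    return [line.strip() for line in lines if line.strip()]


# Placeholder lines standing in for the contents of one file.
example_ids = read_tweet_ids(["1000000000000000001\n", "\n", "1000000000000000002\n"])
```

    With a real file, the same helper accepts an open file object, for example read_tweet_ids(open(path, encoding="utf-8")).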

  17. Satellite (VIIRS) Thermal Hotspots and Fire Activity - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    + more versions
    Cite
    (2024). Satellite (VIIRS) Thermal Hotspots and Fire Activity - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/satellite-viirs-thermal-hotspots-and-fire-activity
    Dataset updated
    Feb 28, 2024
    Description

    This layer presents detectable thermal activity from VIIRS satellites for the last 7 days. VIIRS Thermal Hotspots and Fire Activity is a product of NASA’s Land, Atmosphere Near real-time Capability for EOS (LANCE) Earth Observation Data, part of NASA's Earth Science Data.

    Consumption Best Practices: As a service that is subject to Viral loads (very high usage), avoid adding Filters that use a Date/Time type field. These queries are not cacheable and WILL be subject to Rate Limiting by ArcGIS Online. To accommodate filtering events by Date/Time, we encourage using the included "Age" fields that maintain the number of Days or Hours since a record was created or last modified compared to the last service update. These queries fully support the ability to cache a response, allowing common query results to be supplied to many users without adding load on the service. When ingesting this service in your applications, avoid using POST requests; these requests are not cacheable and will also be subject to Rate Limiting measures.

    Source: NASA LANCE - VNP14IMG_NRT active fire detection - World
    Scale/Resolution: 375-meter
    Update Frequency: Hourly using the aggregated live feed methodology
    Area Covered: World

    What can I do with this layer?

    This layer represents the most frequently updated and most detailed global remotely sensed wildfire information. Detection attributes include time, location, and intensity. It can be used to track the location of fires from the recent past, a few hours up to seven days behind real time. This layer also shows the location of wildfire over the past 7 days as a time-enabled service so that the progress of fires over that timeframe can be reproduced as an animation.

    The VIIRS thermal activity layer can be used to visualize and assess wildfires worldwide. However, it should be noted that this dataset contains many “false positives” (e.g., oil/natural gas wells or volcanoes) since the satellite will detect any large thermal signal.

    Fire points in this service are generally available within 3 1/4 hours after detection by a VIIRS device. LANCE estimates availability at around 3 hours after detection, and esri livefeeds updates this feature layer every 15 minutes from LANCE.

    Even though these data display as point features, each point in fact represents a pixel that is >= 375 m high and wide. A point feature means somewhere in this pixel at least one "hot" spot was detected which may be a fire.

    VIIRS is a scanning radiometer device aboard the Suomi NPP and NOAA-20 satellites that collects imagery and radiometric measurements of the land, atmosphere, cryosphere, and oceans in several visible and infrared bands. The VIIRS Thermal Hotspots and Fire Activity layer is a livefeed from a subset of the overall VIIRS imagery, in particular from NASA's VNP14IMG_NRT active fire detection product. The downloads are automatically downloaded from LANCE, NASA's near real time data and imagery site, every 15 minutes.

    The 375-m data complements the 1-km Moderate Resolution Imaging Spectroradiometer (MODIS) Thermal Hotspots and Fire Activity layer; they both show good agreement in hotspot detection but the improved spatial resolution of the 375 m data provides a greater response over fires of relatively small areas and provides improved mapping of large fire perimeters.

    Attribute information

    • Latitude and Longitude: The center point location of the 375 m (approximately) pixel flagged as containing one or more fires/hotspots.
    • Satellite: Whether the detection was picked up by the Suomi NPP satellite (N) or NOAA-20 satellite (1). For best results, use the virtual field WhichSatellite, redefined by an arcade expression, that gives the complete satellite name.
    • Confidence: The detection confidence is a quality flag of the individual hotspot/active fire pixel. This value is based on a collection of intermediate algorithm quantities used in the detection process. It is intended to help users gauge the quality of individual hotspot/fire pixels. Confidence values are set to low, nominal and high. Low confidence daytime fire pixels are typically associated with areas of sun glint and lower relative temperature anomaly (<15K) in the mid-infrared channel I4. Nominal confidence pixels are those free of potential sun glint contamination during the day and marked by strong (>15K) temperature anomaly in either day or nighttime data. High confidence fire pixels are associated with day or nighttime saturated pixels. Please note: Low confidence nighttime pixels occur only over the geographic area extending from 11 deg E to 110 deg W and 7 deg N to 55 deg S. This area describes the region of influence of the South Atlantic Magnetic Anomaly which can cause spurious brightness temperatures in the mid-infrared channel I4 leading to potential false positive alarms. These have been removed from the NRT data distributed by FIRMS.
    • FRP: Fire Radiative Power. Depicts the pixel-integrated fire radiative power in MW (MegaWatts). FRP provides information on the measured radiant heat output of detected fires. The amount of radiant heat energy liberated per unit time (the Fire Radiative Power) is thought to be related to the rate at which fuel is being consumed (Wooster et. al. (2005)).
    • DayNight: D = Daytime fire, N = Nighttime fire
    • Hours Old: Derived field that provides age of record in hours between Acquisition date/time and latest update date/time. 0 = less than 1 hour ago, 1 = less than 2 hours ago, 2 = less than 3 hours ago, and so on.

    Additional information can be found on the NASA FIRMS site FAQ.

    Note about near real time data: Near real time data is not checked thoroughly before it's posted on LANCE or downloaded and posted to the Living Atlas. NASA's goal is to get vital fire information to its customers within three hours of observation time. However, the data is screened by a confidence algorithm which seeks to help users gauge the quality of individual hotspot/fire points. Low confidence daytime fire pixels are typically associated with areas of sun glint and lower relative temperature anomaly (<15K) in the mid-infrared channel I4. Medium confidence pixels are those free of potential sun glint contamination during the day and marked by strong (>15K) temperature anomaly in either day or nighttime data. High confidence fire pixels are associated with day or nighttime saturated pixels.

    Revisions

    • September 15, 2022: Updated to include 'Hours_Old' field. Time series has been disabled by default, but still available.
    • July 5, 2022: Terms of Use updated to Esri Master License Agreement, no longer stating that a subscription is required!

    This layer is provided for informational purposes and is not monitored 24/7 for accuracy and currency. If you would like to be alerted to potential issues or simply see when this Service will update next, please visit our Live Feed Status Page!
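
    The "Hours Old" definition (0 = less than 1 hour ago, 1 = less than 2 hours ago, and so on) amounts to taking the whole number of hours between acquisition and the latest update. The sketch below illustrates that derivation and a client-side filter on the derived age plus the confidence flag. The record layout and key names are hypothetical, not the service's actual schema, and the timestamps are invented.

```python
from datetime import datetime, timezone


def hours_old(acquired: datetime, updated: datetime) -> int:
    """Whole hours elapsed between acquisition and the latest update."""
    elapsed_seconds = (updated - acquired).total_seconds()
    return int(elapsed_seconds // 3600)


# Hypothetical records; the key names are illustrative only.
update_time = datetime(2024, 2, 28, 12, 0, tzinfo=timezone.utc)
detections = [
    {"id": "a", "acquired": datetime(2024, 2, 28, 11, 30, tzinfo=timezone.utc), "confidence": "high"},
    {"id": "b", "acquired": datetime(2024, 2, 28, 9, 15, tzinfo=timezone.utc), "confidence": "low"},
    {"id": "c", "acquired": datetime(2024, 2, 27, 12, 0, tzinfo=timezone.utc), "confidence": "nominal"},
]
for detection in detections:
    detection["hours_old"] = hours_old(detection["acquired"], update_time)

# Keep detections younger than 6 hours that are not flagged low confidence.
recent_reliable = [
    d["id"] for d in detections
    if d["hours_old"] < 6 and d["confidence"] != "low"
]
```

    Filtering on an integer age rather than on a Date/Time value mirrors the consumption advice above, since age-based queries are the ones described as cacheable.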

  18. MODIS Thermal (Last 48 hours) - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    + more versions
    Cite
    (2024). MODIS Thermal (Last 48 hours) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/modis-thermal-last-48-hours
    Dataset updated
    Feb 28, 2024
    Description

    This layer presents detectable thermal activity from MODIS satellites for the last 7 days. MODIS Global Fires is a product of NASA’s Earth Observing System Data and Information System (EOSDIS), part of NASA's Earth Science Data. EOSDIS integrates remote sensing and GIS technologies to deliver global MODIS hotspot/fire locations to natural resource managers and other stakeholders around the World.

    Consumption Best Practices: As a service that is subject to Viral loads (very high usage), avoid adding Filters that use a Date/Time type field. These queries are not cacheable and WILL be subject to Rate Limiting by ArcGIS Online. To accommodate filtering events by Date/Time, we encourage using the included "Age" fields that maintain the number of Days or Hours since a record was created or last modified compared to the last service update. These queries fully support the ability to cache a response, allowing common query results to be supplied to many users without adding load on the service. When ingesting this service in your applications, avoid using POST requests; these requests are not cacheable and will also be subject to Rate Limiting measures.

    Source: NASA FIRMS - Active Fire Data - for World
    Scale/Resolution: 1km
    Update Frequency: 1/2 Hour (every 30 minutes) using the Aggregated Live Feed Methodology
    Area Covered: World

    What can I do with this layer?

    The MODIS thermal activity layer can be used to visualize and assess wildfires worldwide. However, it should be noted that this dataset contains many “false positives” (e.g., oil/natural gas wells or volcanoes) since the satellite will detect any large thermal signal.

    Additional Information

    MODIS stands for MODerate resolution Imaging Spectroradiometer. The MODIS instrument is on board NASA’s Earth Observing System (EOS) Terra (EOS AM) and Aqua (EOS PM) satellites. The orbit of the Terra satellite goes from north to south across the equator in the morning and Aqua passes south to north over the equator in the afternoon resulting in global coverage every 1 to 2 days. The EOS satellites have a ±55 degree scanning pattern and orbit at 705 km with a 2,330 km swath width.

    It takes approximately 2 – 4 hours after satellite overpass for MODIS Rapid Response to process the data, and for the Fire Information for Resource Management System (FIRMS) to update the website. Occasionally, hardware errors can result in processing delays beyond the 2-4 hour range. Additional information on the MODIS system status can be found at MODIS Rapid Response.

    Attribute Information

    • Latitude and Longitude: The center point location of the 1km (approx.) pixel flagged as containing one or more fires/hotspots (fire size is not 1km, but variable). Stored by Point Geometry. See What does a hotspot/fire detection mean on the ground?
    • Brightness: The brightness temperature measured (in Kelvin) using the MODIS channels 21/22 and channel 31.
    • Scan and Track: The actual spatial resolution of the scanned pixel. Although the algorithm works at 1km resolution, the MODIS pixels get bigger toward the edge of the scan. See What does scan and track mean?
    • Date and Time: Acquisition date of the hotspot/active fire pixel and time of satellite overpass in UTC (client presentation in local time). Stored by Acquisition Date.
    • Acquisition Date: Derived Date/Time field combining Date and Time attributes.
    • Satellite: Whether the detection was picked up by the Terra or Aqua satellite.
    • Confidence: The detection confidence is a quality flag of the individual hotspot/active fire pixel.
    • Version: Version refers to the processing collection and source of data. The number before the decimal refers to the collection (e.g. MODIS Collection 6). The number after the decimal indicates the source of Level 1B data; data processed in near-real time by MODIS Rapid Response will have the source code “CollectionNumber.0”. Data sourced from MODAPS (with a 2-month lag) and processed by FIRMS using the standard MOD14/MYD14 Thermal Anomalies algorithm will have a source code “CollectionNumber.x”. For example, data with the version listed as 5.0 is collection 5, processed by MRR; data with the version listed as 5.1 is collection 5 data processed by FIRMS using Level 1B data from MODAPS.
    • Bright.T31: Channel 31 brightness temperature (in Kelvins) of the hotspot/active fire pixel.
    • FRP: Fire Radiative Power. Depicts the pixel-integrated fire radiative power in MW (MegaWatts). FRP provides information on the measured radiant heat output of detected fires. The amount of radiant heat energy liberated per unit time (the Fire Radiative Power) is thought to be related to the rate at which fuel is being consumed (Wooster et. al. (2005)).
    • DayNight: The standard processing algorithm uses the solar zenith angle (SZA) to threshold the day/night value; if the SZA exceeds 85 degrees it is assigned a night value. SZA values less than 85 degrees are assigned a day time value. For the NRT algorithm the day/night flag is assigned by ascending (day) vs descending (night) observation. It is expected that the NRT assignment of the day/night flag will be amended to be consistent with the standard processing.
    • Hours Old: Derived field that provides age of record in hours between Acquisition date/time and latest update date/time. 0 = less than 1 hour ago, 1 = less than 2 hours ago, 2 = less than 3 hours ago, and so on.

    Revisions

    • June 22, 2022: Added 'HOURS_OLD' field to enhance Filtering data. Added 'Last 7 days' Layer to extend data to match time range of VIIRS offering. Added Field level descriptions.

    This map is provided for informational purposes and is not monitored 24/7 for accuracy and currency. If you would like to be alerted to potential issues or simply see when this Service will update next, please visit our Live Feed Status Page!
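
    Two of the attribute rules above are mechanical enough to sketch in code: splitting a Version string into collection and data source, and the standard-processing day/night threshold on the solar zenith angle. The function names and return labels are invented for illustration, and treating an SZA of exactly 85 degrees as day is an assumption, since the description only covers values above and below 85.

```python
def parse_version(version: str):
    """Split a version such as "5.0" into (collection, source label).

    A source code of 0 after the decimal marks near-real-time data from
    MODIS Rapid Response; any other code marks MODAPS Level 1B data
    processed by FIRMS. The labels are illustrative, not official values.
    """
    collection, separator, source_code = version.partition(".")
    if not separator or not source_code:
        raise ValueError(f"expected 'Collection.Source', got {version!r}")
    source = "near-real time" if source_code == "0" else "standard"
    return int(collection), source


def day_night_standard(sza_degrees: float) -> str:
    """Day/night value under the standard processing rule.

    Assumption: an SZA of exactly 85 degrees is treated as day.
    """
    return "night" if sza_degrees > 85 else "day"


examples = [parse_version("5.0"), parse_version("5.1")]
flags = [day_night_standard(40.0), day_night_standard(92.5)]
```

    This covers only the standard processing rule; the NRT flag described above depends on ascending versus descending observation and cannot be derived from the SZA alone.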

  19. Tweet Sentiment's Impact on Stock Returns

    • kaggle.com
    Updated Jan 16, 2023
    Cite
    The Devastator (2023). Tweet Sentiment's Impact on Stock Returns [Dataset]. https://www.kaggle.com/datasets/thedevastator/tweet-sentiment-s-impact-on-stock-returns
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Tweet Sentiment's Impact on Stock Returns

    862,231 Labeled Instances

    About this dataset

    This dataset contains 862,231 labeled tweets and associated stock returns, providing a comprehensive look into the impact of social media on company-level stock market performance. For each tweet, researchers have extracted data such as the date of the tweet and its associated stock symbol, along with metrics such as last price and various returns (1-day return, 2-day return, 3-day return, 7-day return). Also recorded are volatility scores for both 10-day and 30-day intervals. Finally, sentiment scores from both Long Short-Term Memory (LSTM) and TextBlob models have been included to quantify the overall tone in which these messages were delivered. With this dataset you will be able to explore how tweets can affect a company's share price in both the short term and the long term by leveraging all of these data points for analysis!

    How to use the dataset

    To use this dataset, apply descriptive statistics (for example, histograms) or regression techniques to relate tweet content and sentiment to the corresponding stock returns, such as the 1-day and 7-day returns.

    The primary fields used for analysis are the tweet text (TWEET), stock symbol (STOCK), date (DATE), and closing price at the time of the tweet (LAST_PRICE), along with two volatility measures for each stock, 10-day volatility (VOLATILITY_10D) and 30-day volatility (VOLATILITY_30D), which capture market fluctuation over different periods around the time of the tweet. Sentiment polarity scores from two models, LSTM polarity (LSTM_POLARITY) and TextBlob polarity, indicate whether people are expressing positive or negative sentiment about each company at a given time, which could in turn influence stock prices over shorter horizons such as 1-day returns (1_DAY_RETURN) and 2-day returns (2_DAY_RETURN) or longer horizons such as 7-day returns. Finally, the MENTION field indicates whether names or acronyms associated with a company were specifically mentioned in each tweet, which shows whether company-specific context was present in the tweet (“company relevancy”).
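
As a minimal sketch of the kind of analysis described above, the following computes the Pearson correlation between a sentiment score and a next-day return. It assumes the column names LSTM_POLARITY and 1_DAY_RETURN quoted in the description. The rows are made-up placeholder values, not taken from the dataset, and the helper names `pearson` and `load_columns` are hypothetical.

```python
import csv
import io
import math

# Made-up placeholder rows for illustration only (not real dataset values).
# Column names follow the field list in the dataset description.
SAMPLE_CSV = """STOCK,LSTM_POLARITY,1_DAY_RETURN
AAA,-1.0,-0.02
AAA,-0.5,-0.01
BBB,0.0,0.00
BBB,0.5,0.01
CCC,1.0,0.02
"""


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def load_columns(text, x_col, y_col):
    """Read two numeric columns from CSV text, skipping rows that lack them."""
    xs, ys = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            x, y = float(row[x_col]), float(row[y_col])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or non-numeric cell
        xs.append(x)
        ys.append(y)
    return xs, ys


polarity, returns = load_columns(SAMPLE_CSV, "LSTM_POLARITY", "1_DAY_RETURN")
r = pearson(polarity, returns)
print(f"Pearson r = {r:.3f}")
```

To try this on the real file, the contents of reduced_dataset-release.csv could be passed in place of `SAMPLE_CSV`; with several hundred thousand rows a dataframe library would likely be more practical. A correlation on its own does not show that tweets cause price movements.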

    Research Ideas

    • Analyzing the degree to which tweets can influence stock prices. By analyzing relationships between variables such as tweet sentiment and stock returns, correlations can be identified that could be used to inform investment decisions.
    • Exploring natural language processing (NLP) models for predicting future market trends based on textual data such as tweets. Through testing and evaluating different text-based models using this dataset, better predictive models may emerge that can give investors advance warning of upcoming market shifts due to news or other events.
    • Investigating the impact of different types of tweets (positive/negative, factual/opinionated) on stock prices over specific time frames. By studying correlations between the sentiment or nature of a tweet and its effect on stocks, insights may be gained into what sort of news or events have a greater impact on markets in general.
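
One way to start on the last idea is to bucket tweets by the sign of their sentiment score and compare average returns per bucket across horizons. The sketch below assumes LSTM_POLARITY is a signed score (positive values meaning positive sentiment), which the description does not confirm. The rows are made-up placeholders and the function names are hypothetical.

```python
from collections import defaultdict

# Made-up placeholder records for illustration only (not real dataset values).
ROWS = [
    {"LSTM_POLARITY": 1.0, "1_DAY_RETURN": 0.02, "2_DAY_RETURN": 0.03},
    {"LSTM_POLARITY": 1.0, "1_DAY_RETURN": 0.00, "2_DAY_RETURN": 0.01},
    {"LSTM_POLARITY": -1.0, "1_DAY_RETURN": -0.01, "2_DAY_RETURN": -0.03},
    {"LSTM_POLARITY": -1.0, "1_DAY_RETURN": -0.03, "2_DAY_RETURN": -0.01},
    {"LSTM_POLARITY": 0.0, "1_DAY_RETURN": 0.01, "2_DAY_RETURN": 0.00},
]


def sentiment_label(score):
    """Map a signed polarity score to a coarse label (assumed convention)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def mean_return_by_sentiment(rows, score_col, return_cols):
    """Average each return column within each sentiment bucket."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        label = sentiment_label(row[score_col])
        counts[label] += 1
        for col in return_cols:
            sums[label][col] += row[col]
    return {
        label: {col: sums[label][col] / counts[label] for col in return_cols}
        for label in counts
    }


summary = mean_return_by_sentiment(
    ROWS, "LSTM_POLARITY", ["1_DAY_RETURN", "2_DAY_RETURN"]
)
for label in sorted(summary):
    means = ", ".join(f"{col}={val:+.3f}" for col, val in summary[label].items())
    print(f"{label}: {means}")
```

Comparing bucket means like this is only descriptive. Any real analysis would also need to account for overall market movement and for the volatility columns.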

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: reduced_dataset-release.csv

    | Column name | Description |
    |:------------|:----------------------------------------------------------|
    | TWEET       | Text of the tweet. (String)                               |
    | STOCK       | Company's stock mentioned in the tweet. (String)          |
    | DATE        | Date the tweet was posted. (Date)                         |
    | LAST_PRICE  | Company's last price at the time of tweeting. (Float)     |
    ...


Utkarsh Singh (2023). Dating App Fame & Behavior [Dataset]. https://www.kaggle.com/utkarshx27/lovoo-dating-app-dataset/discussion

Dating App Fame & Behavior

Understand people's fame and behavior on a dating app platform

Dataset updated
May 16, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Utkarsh Singh
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

➡️ There are 3 datasets in total, containing valuable information.
➡️ Understand people's fame and behavior on a dating app platform.

| Column Name            | Description                                          |
|------------------------|------------------------------------------------------|
| Age                    | The age of the user.                                 |
| Number of Users        | The total number of users.                           |
| Percent Want Chats     | Percentage of users who want chats.                  |
| Percent Want Friends   | Percentage of users who want friendships.            |
| Percent Want Dates     | Percentage of users who want romantic dates.         |
| Mean Kisses Received   | Average number of kisses received by users.          |
| Mean Visits Received   | Average number of profile visits received by users.  |
| Mean Followers         | Average number of followers for each user.           |
| Mean Languages Known   | Average number of languages known by users.          |
| Total Want Chats       | Total count of users interested in chats.            |
| Total Want Friends     | Total count of users looking for friendships.        |
| Total Want Dates       | Total count of users seeking romantic dates.         |
| Total Kisses Received  | Overall count of kisses received by users.           |
| Total Visits Received  | Overall count of profile visits received by users.   |
| Total Followers        | Overall count of followers for all users.            |
| Total Languages Spoken | Total count of languages spoken by all users.        |

SUMMARY

When dating apps like Tinder were going viral, people wanted to have the best profile in order to get more matches and more potential encounters. Unlike earlier dating platforms, the new ones required mutual attraction before allowing any two people to get in touch and chat. This made it all the more important to create the best profile in order to make the best first impression.

In parallel, we humans have always been in awe of charismatic and inspiring people. More charismatic people tend to be followed and listened to by more people. Through metrics such as the number of friends or followers, social networks offer some ways of "measuring" a person's potential charisma.

With all that in mind, one can ask:

• What makes a great user profile?
• How do you make the best first impression in order to get more matches (and ultimately find love or new friendships)?
• What makes a person charismatic?
• How do charismatic people present themselves?

To try to understand these social questions, I decided to create a dataset of user profile information using the social network Lovoo when it came out. By using different methodologies, I was able to gather user profile data, as well as some usually unavailable metrics (such as the number of profile visits).

Content

The dataset contains profile information for users of the website Lovoo.

The dataset was gathered during spring 2015 (April and May). At that time, Lovoo was expanding in European countries (among others), while Tinder was trending in both America and Europe. The iOS version of the Lovoo app was then at version 3.

Accessory image data

The dataset references pictures of user profiles (field pictureId). These pictures are available for a fraction of users, but they have not been uploaded and must be requested separately.

The idea behind gathering the profile pictures was to determine whether correlations could be identified between a profile picture and the reputation or success of a given profile. Since first impressions matter, a sound hypothesis is that the profile picture has a strong influence on the number of profile visits, matches and so on. Keep in mind that only a fraction of a user's profile is seen when browsing through a list of users.

Details about collection methodology

In order to gather the data, I developed a set of tools that saved the data while I browsed through profiles and ran searches. Because of this approach (and the constraints that forced me to adopt it), I could only gather user profiles that Lovoo's algorithm recommended to the 2 profiles I created for this purpose (male, open to friends, chats and dates). That is why there are only female users in the dataset. Further work could fetch similar data for both genders or other age ranges.

Regarding the number of user profiles

It turned out that the recommendation algorithm always seemed to output the same set of user profiles. This suggests Lovoo's algorithm was probably relying heavily on settings like location (to recommend people nearby rather than people in different places or countries) and maybe cookies. This diminished the number of different user profiles that would be pr...
