62 datasets found
  1. E-commerce - Users of a French C2C fashion store

    • kaggle.com
    Updated Feb 24, 2024
    Cite
    Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    Kaggle
    Authors
    Jeffrey Mvutu Mabilama
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    Foreword

    This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc.).

    My Telegram bot will answer your queries and allow you to contact me.

    Context

    There are a lot of unknowns when running an e-commerce store, even when you have analytics to guide your decisions.

    Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).

    This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try to understand what you can expect of your users and determine in advance how your store may grow.

    • For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

    If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.

    This dataset is part of a preview of a much larger dataset. Please contact me for more.

    Content

    The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

    Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

    Inspiration

    Questions you might want to answer using this dataset:

    • Are e-commerce users interested in social network features?
    • Are my users active enough (compared to those of this dataset)?
    • How likely are people from other countries to sign up on a C2C website?
    • How many users are likely to drop off after years of using my service?

    Example works:

    • Report(s) made using SQL queries can be found on the data.world page of the dataset.
    • Notebooks may be found on the Kaggle page of the dataset.

    License

    CC-BY-NC-SA 4.0

    For other licensing options, contact me.

  2. How to make google plus posts private - Dataset - openAFRICA

    • open.africa
    Updated Jan 4, 2018
    Cite
    (2018). How to make google plus posts private - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/how-to-make-google-plus-posts-private
    Dataset updated
    Jan 4, 2018
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    So if you have to have a G+ account (for YouTube, location services, or other reasons), here's how you can make it totally private! No one will be able to add you, send you spammy links, or otherwise annoy you. You need to visit the "Audience Settings" page (https://plus.google.com/u/0/settings/audience). There you can set a "custom audience"; usually you would use this to restrict your account to people from a specific geographic location, or within a specific age range. In this case, we're going to choose a custom audience of "No-one". Check the box and hit save. Now, when people try to visit your Google+ profile, they'll see this "restricted" message. You can visit my G+ profile if you want to see this working (https://plus.google.com/114725651137252000986). If you are not able to understand, you can follow this website: http://www.livehuntz.com/google-plus/support-phone-number

  3. Insightful & Vast USA Statistics

    • kaggle.com
    Updated May 19, 2018
    Cite
    Golden Oak Research Group (2018). Insightful & Vast USA Statistics [Dataset]. https://www.kaggle.com/forums/f/6032/insightful-vast-usa-statistics
    Dataset updated
    May 19, 2018
    Dataset provided by
    Kaggle
    Authors
    Golden Oak Research Group
    Area covered
    United States
    Description

    Very Important

    • Check out the new must-see kernel for this dataset Click Here
    • Make sure to upvote for more datasets and kernels :D

    Overview:

    Explore the dataset and potentially gain valuable insight into your data science project through interesting features. The dataset was developed for a portfolio optimization graduate project I was working on. The goal was to monetize the risk of company deleveraging associated with changes in economic data. To see the data in action, visit my analytics page (Analytics Page & Dashboard); to access all 295,000+ records, click here. Applications of the dataset may include:

    • Mortgage-Backed Securities
    • Geographic Business Investment
    • Real Estate Analysis

    For any questions, you may reach us at research_development@goldenoakresearch.com. For immediate assistance, you may reach me at 585-626-2965. Please note: this is my personal number, and email is preferred.

    Statistical Themes:

    Note: there are 75 fields in total; the following are just the themes the fields fall under.

    • Second Mortgage: Households with a second mortgage statistics.
    • Home Equity Loan: Households with a Home equity Loan statistics.
    • Debt: Households with any type of debt statistics.
    • Mortgage Costs: Statistics regarding mortgage payments, home equity loans, utilities and property taxes
    • Home Owner Costs: Sum of utilities, property taxes statistics
    • Gross Rent: Contract rent plus the estimated average monthly cost of utility features
    • Gross Rent as Percent of Income: Gross rent as a percentage of income (very interesting).
    • High school Graduation: High school graduation statistics.
    • Population Demographics: Population demographic statistics.
    • Age Demographics: Age demographic statistics.
    • Household Income: Total income of people residing in the household.
    • Family Income: Total income of people related to the householder.
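
    The two rent themes above combine in a simple way. The sketch below assumes ACS-style definitions that match the wording of the themes: monthly gross rent is contract rent plus estimated monthly utility costs, and the income share annualizes that rent (times 12) before dividing by annual household income. The function names are hypothetical, not fields of this dataset.

```python
from typing import Optional


def gross_rent(contract_rent: float, monthly_utilities: float) -> float:
    """Monthly gross rent: contract rent plus estimated monthly utility costs."""
    return contract_rent + monthly_utilities


def gross_rent_pct_of_income(monthly_gross_rent: float,
                             annual_household_income: float) -> Optional[float]:
    """Annualized gross rent as a percentage of annual household income.

    Returns None when income is zero or negative, since the ratio is undefined.
    """
    if annual_household_income <= 0:
        return None
    return 100 * monthly_gross_rent * 12 / annual_household_income


rent = gross_rent(900, 150)
share = gross_rent_pct_of_income(rent, 63000)
print(rent, share)  # 1050 20.0
```

    A household paying 1,050 a month on an income of 63,000 a year therefore spends 20% of its income on gross rent.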

    Sources, if you wish to get the data yourself :)

    2012-2016 ACS 5-Year Documentation was provided by the U.S. Census Reports. Retrieved May 2, 2018, from

    Access All 325,258 Locations of Our Most Complete Database Ever:

    Providing you the potential to monetize risk and optimize your investment portfolio through quality economic features at an unbeatable price. Access all 295,000+ records on an incredibly small scale; see the links below for more details:

  4. LinkedIn Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 17, 2021
    Cite
    Bright Data (2021). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Dec 17, 2021
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

    Dataset Features

    • Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
    • Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
    • Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and job market dynamics.

    Customizable Subsets for Specific Needs

    Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

    Popular Use Cases

    • Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
    • Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
    • Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
    • Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
    • AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

    Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.

  5. fineweb

    • huggingface.co
    Cite
    FineData, fineweb [Dataset]. http://doi.org/10.57967/hf/2493
    Dataset authored and provided by
    FineData
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    🍷 FineWeb

    15 trillion tokens of the finest data the 🌐 web has to offer

      What is it?
    

    The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large-scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.

  6. Texas County Voting Website Data

    • kaggle.com
    zip
    Updated Sep 1, 2020
    Cite
    Emily Russell (2020). Texas County Voting Website Data [Dataset]. https://www.kaggle.com/mewbius/lwv-oct-2017
    Available download formats: zip (40,347 bytes)
    Dataset updated
    Sep 1, 2020
    Authors
    Emily Russell
    Area covered
    Texas
    Description

    General

    The League of Women Voters conducts surveys of Texas county voting websites. The data and further reading are available here (under County Website Reports). Any mistakes or errors found here are mine, and the data on the LWV website is the authoritative data. I have no affiliation with the LWV but wanted to make the datasets more accessible.

    Data Changes

    I cleaned some of the data (split numeric and text ratings from one column into two columns) and made a few edits to values that appeared to be typos based on context; these are noted in the description of each set. Column names were shortened in some cases and "NA" was added to empty cells. Each survey used slightly different questions, though both 2016 sets appear to use the same ones and the 2017 set is very similar.

    Commonalities

    Abbreviations used include SOS for the Texas Secretary of State website and 203 refers to Section 203 of the federal Voting Rights Act (for information, see this 2016 report).

    Each dataset has at least these columns: county name, fips, date, total points, overall evaluation, perc calc na, and perc calc num.

    • The county name was changed to match the name listed in the FIPS set; there were some typos and variations with hyphens.
    • The FIPS (Federal Information Processing Standards) code is from here.
    • The date is the month and year associated with the survey.
    • The total number of points is the sum of all points a county received.
    • The overall evaluation is the category associated with the number of points - these varied between sets and for 2020 the categories from the report were added to the dataset.
    • I added two columns to the end of each set, perc_calc_na and perc_calc_num that represent the percent of total points for that county out of the possible points for that dataset - the first has "NA" for any county without a website and the second has "-1" for those counties. Some of the surveys included bonus points - these were included in the total possible points for the calculation.
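
    The two percentage columns described above could be computed along these lines. This is a sketch under stated assumptions: the function name is hypothetical, the percentage is taken on a 0 to 100 scale, and Python's None stands in for the "NA" text written to the CSV.

```python
from typing import Optional, Tuple


def perc_calc(total_points: float, possible_points: float,
              has_website: bool) -> Tuple[Optional[float], float]:
    """Return (perc_calc_na, perc_calc_num) for one county.

    possible_points should already include any bonus points the survey offered.
    Counties without a website get None (written as "NA") in the first column
    and -1 in the second.
    """
    if not has_website:
        return None, -1.0
    if possible_points <= 0:
        raise ValueError("possible_points must be positive")
    pct = 100.0 * total_points / possible_points
    return pct, pct


print(perc_calc(18, 24, True))   # (75.0, 75.0)
print(perc_calc(0, 24, False))   # (None, -1.0)
```

    Keeping a numeric -1 sentinel in the second column lets tools that cannot handle missing values still sort or filter the counties.
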
  7. @##Can I add my frequent flyer number after booking? Dataset

    • paperswithcode.com
    Updated Jun 28, 2025
    Cite
    (2025). @##Can I add my frequent flyer number after booking? Dataset [Dataset]. https://paperswithcode.com/dataset/can-i-add-my-frequent-flyer-number-after-1
    Dataset updated
    Jun 28, 2025
    Description

    Over 90% of Lufthansa travelers book flights online, and many forget to include their frequent flyer number. ☎️+1 (844) 459-5676 Fortunately, adding it afterward is very simple.

    If you’ve already completed your Lufthansa booking and forgot your Miles & More number, don’t worry. You can still claim the miles you’ve earned.

    The fastest way to add your frequent flyer number is by visiting Lufthansa’s official website. Log in to “My Bookings” with your credentials.

    Use your six-digit booking code and last name to locate your itinerary. Once inside, you’ll see an option labeled “Add Frequent Flyer.”

    Click that section, enter your Miles & More number, and save the changes. Lufthansa automatically updates the reservation and assigns mileage points.

    If you booked your Lufthansa ticket through a third-party site, this method still works. Just make sure your name matches the frequent flyer profile.

    Not a Miles & More member yet? Over 30 million people have joined for free benefits. Visit Lufthansa.com and enroll instantly to start.

    Adding a frequent flyer number before departure ensures you don’t miss out on mileage points. These points can be redeemed for upgrades or rewards.

    Some travelers forget to update partner programs. Lufthansa partners with airlines in Star Alliance, so you can add other loyalty numbers.

    To use United MileagePlus, Singapore KrisFlyer, or Air Canada Aeroplan, follow the same process. Enter the number in your booking details section.

    You can also call Lufthansa’s customer service to have a live agent help. Just dial and request to link your loyalty program.

    Be ready to provide your reservation code and frequent flyer number during the call. The agent will confirm if the change was successful.

    If you’ve already taken the flight but forgot to include your number, it’s still okay. Lufthansa allows retroactive credit for up to 6 months.

    Visit the Miles & More website and navigate to “Claim Missing Miles.” Enter flight details, ticket number, and your membership ID.

    Processing typically takes 3–4 weeks, and your account reflects miles earned from past trips. Lufthansa will notify you when points are posted.

    Using your frequent flyer number not only earns miles, but also enables perks. These include early check-in, baggage allowance, and access to lounges.

    Higher-tier members also get benefits like priority boarding and mileage multipliers. If you’re loyal to Lufthansa, don’t skip the number.

    You can add your frequent flyer number at the airport, too. Just head to a Lufthansa kiosk or speak to a representative.

    Show them your ID and loyalty card. They’ll update the booking with your number. Don’t forget to ask for confirmation before departure.

    Another method is using the Lufthansa mobile app. Go to your upcoming trip and tap “Edit traveler info” to enter the number.

    To avoid forgetting again, save your number in your Lufthansa profile. This ensures future bookings auto-populate with your loyalty data.

    Also note that miles can only be credited once per flight. You cannot earn miles on multiple programs for the same trip.

    If you accidentally entered the wrong frequent flyer number, call Lufthansa support. An agent can delete the incorrect info and correct it.

    In summary, yes, you can definitely add your frequent flyer number after booking. Just use online tools or call Lufthansa for help.

  8. male-selfie-image-dataset

    • huggingface.co
    Updated May 2, 2024
    Cite
    Training Data (2024). male-selfie-image-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/male-selfie-image-dataset
    Dataset updated
    May 2, 2024
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Face Recognition, Face Detection, Male Photo Dataset 👨

      If you are interested in biometric data - visit our website to learn more and buy the dataset :)
    

    110,000+ photos of 74,000+ men from 141 countries. The dataset includes photos of people's faces. All people presented in the dataset are men. The dataset contains a variety of images capturing individuals from diverse backgrounds and age groups. Our dataset will diversify your data by adding more photos of men of… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/male-selfie-image-dataset.

  9. US Gross Rent ACS Statistics

    • kaggle.com
    Updated Aug 23, 2017
    Cite
    Golden Oak Research Group (2017). US Gross Rent ACS Statistics [Dataset]. https://www.kaggle.com/datasets/goldenoakresearch/acs-gross-rent-us-statistics/versions/3
    Dataset updated
    Aug 23, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Golden Oak Research Group
    Area covered
    United States
    Description

    What you get:

    Upvote! The database contains 40,000+ records on US Gross Rent & Geo Locations. The field description of the database is documented in the attached PDF file. To access all 325,272 records on a scale roughly equivalent to a neighborhood (census tract), see the link below, and make sure to upvote. Upvote right now, please. Enjoy!

    Get the full database free with coupon code FreeDatabase; see directions at the bottom of the description. And make sure to upvote :) The coupon ends at 2:00 pm on 8-23-2017.

    Gross Rent & Geographic Statistics:

    • Mean Gross Rent (double)
    • Median Gross Rent (double)
    • Standard Deviation of Gross Rent (double)
    • Number of Samples (double)
    • Square area of land at location (double)
    • Square area of water at location (double)
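
    How the four per-location rent statistics relate to raw observations can be sketched as follows. This is an illustration under assumptions: the dataset ships already aggregated, its attached documentation defines the exact estimators, and survey weighting may apply; here a plain sample standard deviation is used, and all names are hypothetical.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GrossRentSummary:
    mean: float
    median: float
    stdev: float
    samples: float  # stored as a double, like "Number of Samples"


def summarize_gross_rent(rents: Sequence[float]) -> GrossRentSummary:
    """Summarize raw gross rent observations for one location.

    Uses the sample standard deviation, which needs at least two values.
    """
    values = [float(r) for r in rents]
    if len(values) < 2:
        raise ValueError("need at least two observations")
    return GrossRentSummary(
        mean=statistics.mean(values),
        median=statistics.median(values),
        stdev=statistics.stdev(values),
        samples=float(len(values)),
    )


print(summarize_gross_rent([800, 950, 1100, 1250]))
```

    Reporting the median next to the mean is useful because rent distributions are often skewed by a few expensive units.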

    Geographic Location:

    • Longitude (double)
    • Latitude (double)
    • State Name (character)
    • State abbreviated (character)
    • State_Code (character)
    • County Name (character)
    • City Name (character)
    • Name of city, town, village or CDP (character)
    • Primary: defines whether the location is a tract or block group.
    • Zip Code (character)
    • Area Code (character)

    Abstract

    The data set was originally developed for real estate and business investment research. Income is a vital element when determining both quality and socioeconomic features of a given geographic location. The following data was derived from over 36,000 files and covers 348,893 location records.

    License

    Only proper citation is required; please see the documentation for details. Have fun!

    Golden Oak Research Group, LLC. “U.S. Income Database Kaggle”. Publication: 5, August 2017. Accessed, day, month year.

    For any questions, you may reach us at research_development@goldenoakresearch.com. For immediate assistance, you may reach me at 585-626-2965.

    Please note: this is my personal number, and email is preferred.

    Check our data's accuracy: Census Fact Checker

    Access all 325,272 locations with the Free Database coupon code:

    Don't settle. Go big and win big. Optimize your potential. Access all gross rent records and more on a scale roughly equivalent to a neighborhood; see the link below:

    A small startup with big dreams, giving the everyday, up-and-coming data scientist professional-grade data at affordable prices. It's what we do.

  10. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 19, 2024
    Cite
    Steven R. Livingstone; Frank A. Russo (2024). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [Dataset]. http://doi.org/10.5281/zenodo.1188976
    Available download formats: zip
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Steven R. Livingstone; Frank A. Russo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.

    The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo who leads the SMART Lab.

    Citing the RAVDESS

    The RAVDESS is released under a Creative Commons Attribution license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS1 paper. Personal works, such as machine learning projects/blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS1 paper would also be appreciated.

    Academic paper citation

    Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

    Personal use citation

    Include a link to this Zenodo page - https://zenodo.org/record/1188976

    Commercial Licenses

    Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

    Contact Information

    If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

    Example Videos

    Watch a sample of the RAVDESS speech and song videos.

    Emotion Classification Users

    If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].

    Construction and Validation

    Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.

    The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.

    Contents

    Audio-only files

    Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):

    • Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.
    • Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

    Audio-Visual and Video-only files

    Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:

    • Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.
    • Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.

    File Summary

    In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
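
    The file totals follow from the design described above. The arithmetic below recomputes them, assuming (as the filename identifiers below state) that neutral has no strong-intensity version; the constant and function names are illustrative only.

```python
# Emotions other than neutral, as listed in the RAVDESS description.
SPEECH_EMOTIONS = ["calm", "happy", "sad", "angry", "fearful", "surprise", "disgust"]
SONG_EMOTIONS = ["calm", "happy", "sad", "angry", "fearful"]

INTENSITIES = 2       # normal, strong
STATEMENTS = 2        # "Kids are talking...", "Dogs are sitting..."
REPETITIONS = 2
VIDEO_MODALITIES = 2  # full-AV and video-only


def trials_per_actor(emotions: list) -> int:
    # Each emotion at two intensities, plus neutral at normal intensity only,
    # crossed with both statements and both repetitions.
    conditions = len(emotions) * INTENSITIES + 1
    return conditions * STATEMENTS * REPETITIONS


speech_trials = trials_per_actor(SPEECH_EMOTIONS)  # 60
song_trials = trials_per_actor(SONG_EMOTIONS)      # 44
speech_actors, song_actors = 24, 23                # no song files for Actor_18

counts = {
    "audio_speech": speech_trials * speech_actors,
    "audio_song": song_trials * song_actors,
    "video_speech": speech_trials * VIDEO_MODALITIES * speech_actors,
    "video_song": song_trials * VIDEO_MODALITIES * song_actors,
}
print(counts, sum(counts.values()))
# {'audio_speech': 1440, 'audio_song': 1012, 'video_speech': 2880, 'video_song': 2024} 7356
```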

    File naming convention

    Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:

    Filename identifiers

    • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
    • Vocal channel (01 = speech, 02 = song).
    • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
    • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
    • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
    • Repetition (01 = 1st repetition, 02 = 2nd repetition).
    • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).


    Filename example: 02-01-06-01-02-01-12.mp4

    1. Video-only (02)
    2. Speech (01)
    3. Fearful (06)
    4. Normal intensity (01)
    5. Statement "dogs" (02)
    6. 1st Repetition (01)
    7. 12th Actor (12)
    8. Female, as the actor ID number is even.
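
    Because the identifiers are fixed-width and hyphen-separated, decoding a file name reduces to table lookups. A minimal Python sketch; the function name and dictionary layout are illustrative and not part of the RAVDESS distribution.

```python
from pathlib import PurePath

# Lookup tables transcribed from the filename identifiers listed above.
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


def parse_ravdess_filename(path: str) -> dict:
    """Decode the 7-part identifier of a RAVDESS file name (directories are ignored)."""
    parts = PurePath(path).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 identifiers, got {len(parts)}: {path}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    actor_id = int(actor)
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),
        "actor": actor_id,
        # Odd-numbered actors are male, even-numbered actors are female.
        "sex": "male" if actor_id % 2 == 1 else "female",
    }


info = parse_ravdess_filename("02-01-06-01-02-01-12.mp4")
print(info["modality"], info["emotion"], info["actor"], info["sex"])
# video-only fearful 12 female
```

    An unknown code raises a KeyError rather than silently mislabeling a file, which is usually what you want when building training labels.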

    License information

    The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0

    Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

    Related Data sets

  11. California Medical Fee-for-Service

    • kaggle.com
    Updated Jan 9, 2023
    Cite
    The Devastator (2023). California Medical Fee-for-Service [Dataset]. https://www.kaggle.com/datasets/thedevastator/california-medi-cal-fee-for-service-provider-inf
    Dataset updated
    Jan 9, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Area covered
    California
    Description

    California Medical Fee-for-Service

    Geographic, Legal and Specialty Data for In-State and Out-of-State Providers

    By California Health and Human Services [source]

    About this dataset

    Welcome to the California Health and Human Services Agency's Open Data Portal! Here, you can explore and utilize information from one of the state's most valuable assets: the non-confidential data set of Medi-Cal Fee-for-Service (FFS) program providers.

    This dataset provides insight into Medi-Cal FFS enrollment. The information was retrieved from the Provider Master File (PMF), which is maintained by the Provider Enrollment Division (PED). With this dataset, you can explore the provider number, legal name, type description and specialty description, as well as geographic data points such as county code, attention line and address parts, coordinates (longitude/latitude) and more.

    The goal with this Open Data Portal initiative is to empower Californians with:

    • Increased public access to high quality health & human service data;
    • Stimulated creativity and innovation in research;
    • The ability to make informed decisions about our health & services providers;
    • Transparency in government policy expenditure measures.

    Our hope is that you'll use these tools for responsible data analysis, not just of Medi-Cal FFS provision but of any related subject matter that interests and benefits your community at large. Good luck and happy researching!


    How to use the dataset

    Research Ideas

    • Creating a mobile application or website to help people easily and quickly find their nearest Medi-Cal FFS providers based on location, specialty and provider type.
    • Developing analytics tools to help organizations understand the concentrations of providers across the state in order to inform decision making when considering regional expansion and improving service accessibility.
    • Developing a tool that visualizes specialty diversity across the state to identify areas with low provider density, helping inform strategies aimed at increasing access to care for communities with high-need populations.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Open Database License (ODbL) v1.0. You are free to:

    • Share - copy and redistribute the material in any medium or format.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:

    • Give appropriate credit - Provide a link to the license, and indicate if changes were made.
    • ShareAlike - You must distribute your contributions under the same license as the original.
    • Keep intact - all notices that refer to this license, including copyright notices.
    • No Derivatives - If you remix, transform, or build upon the material, you may not distribute the modified material.
    • No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

    Columns

    File: Profile_of_Enrolled_Medi-Cal_Fee-for-Service_FFS_Providers_as_of_May_1_2016.csv

    • NPI - National Provider Identifier (Number)
    • SERVICE LOCATION NUMBER - Unique identifier for the provider's service location (Number)
    • LEGAL NAME - Legal name of the provider (Text)
    • TYPE DESCRIPTION - Type of provider (Text)
    • SPECIALTY DESCRIPTION - Specialty of the provider (Text)
    • OUT OF STATE INDICATOR - Indicates if the provider is located out of state (Boolean)
    • IN/OUT OF STATE - Indicates if the provider is located in or out of state (Text)
    • COUNTY CODE - County code of the provider's service location (Number)
    • COUNTY NAME - County name of the provider's service location (Text)
    • ADDRESS ATTENTION - Attention line of the provider's address (Text)
    • ADDRESS LINE 1 - First l...
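The third research idea above (specialty diversity by county) can start from a simple aggregation over two of these columns. The sketch below uses only Python's standard library. It assumes the CSV has a header row containing the `COUNTY NAME` and `SPECIALTY DESCRIPTION` columns listed above. The function name and the toy rows in the usage example are hypothetical and are not taken from the real file.

```python
import csv
from collections import Counter, defaultdict
from typing import Iterable


def county_summary(lines: Iterable[str]) -> dict[str, tuple[int, int]]:
    """Return {county: (row_count, distinct_specialties)} from CSV text lines.

    Assumes a header row with 'COUNTY NAME' and 'SPECIALTY DESCRIPTION'.
    Each row is counted once, so a provider with several service
    locations contributes several rows.
    """
    rows_per_county: Counter = Counter()
    specialties: defaultdict = defaultdict(set)
    for row in csv.DictReader(lines):
        county = row["COUNTY NAME"].strip()
        rows_per_county[county] += 1
        specialties[county].add(row["SPECIALTY DESCRIPTION"].strip())
    return {c: (rows_per_county[c], len(specialties[c])) for c in rows_per_county}


# Hypothetical toy rows, for illustration only.
sample = [
    "COUNTY NAME,SPECIALTY DESCRIPTION",
    "ALAMEDA,PHARMACY",
    "ALAMEDA,PHARMACY",
    "ALAMEDA,DENTIST",
    "FRESNO,PHARMACY",
]
print(county_summary(sample))  # {'ALAMEDA': (3, 2), 'FRESNO': (1, 1)}
```

For the real file, pass an open handle instead of a list, e.g. `with open(path, newline="", encoding="utf-8") as f: county_summary(f)`. The UTF-8 encoding is an assumption to verify against the download.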

  12. Lead Scoring Dataset

    • kaggle.com
    Updated Aug 17, 2020
    Amrita Chatterjee (2020). Lead Scoring Dataset [Dataset]. https://www.kaggle.com/amritachatterjee09/lead-scoring-dataset
    Available download formats: zip (411,028 bytes)
    Dataset updated
    Aug 17, 2020
    Authors
    Amrita Chatterjee
    Description

    Context

    An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.

    The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses, fill out a form for a course, or watch some videos. When these people fill out a form providing their email address or phone number, they are classified as a lead. The company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X Education is around 30%.

    Although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if it acquires 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most promising leads, also known as ‘Hot Leads’. If it successfully identifies this set of leads, the lead conversion rate should go up, as the sales team will focus on communicating with the promising leads rather than calling everyone.

    Many leads are generated in the initial stage (top), but only a few of them come out at the bottom as paying customers. In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, constantly communicating, etc.) in order to get a higher lead conversion.

    X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model that assigns a lead score to each lead, such that customers with a higher lead score have a higher conversion chance and customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark target lead conversion rate of around 80%.
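One common way to turn a classifier into the lead score described above is to scale its predicted conversion probability to 0-100 and treat leads above a cut-off as ‘Hot Leads’. The sketch below assumes you already have per-lead probabilities from some model, for example a logistic regression on the variables listed under Content. The function names, the 0-100 scale and the cut-off value are illustrative assumptions and are not part of the case study.

```python
from typing import Sequence


def lead_scores(probabilities: Sequence[float]) -> list[int]:
    """Scale predicted conversion probabilities (0..1) to integer scores (0..100)."""
    scores = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        scores.append(round(100 * p))
    return scores


def hot_lead_conversion_rate(scores: Sequence[int], converted: Sequence[int],
                             cutoff: int) -> float:
    """Share of leads scoring >= cutoff whose Converted flag is 1."""
    picked = [c for s, c in zip(scores, converted) if s >= cutoff]
    return sum(picked) / len(picked) if picked else 0.0


# Hypothetical model outputs and outcomes for four leads.
scores = lead_scores([0.9, 0.25, 0.05, 0.75])
print(scores)                                                     # [90, 25, 5, 75]
print(hot_lead_conversion_rate(scores, [1, 0, 0, 0], cutoff=70))  # 0.5
```

Raising the cut-off trades volume for precision. The sales team calls fewer leads, and if the model ranks leads well, the conversion rate among those called should move toward the 80% target.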

    Content

    Variables description

    • Prospect ID - A unique ID with which the customer is identified.
    • Lead Number - A lead number assigned to each lead procured.
    • Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
    • Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
    • Do Not Email - An indicator variable selected by the customer indicating whether or not they want to be emailed about the course.
    • Do Not Call - An indicator variable selected by the customer indicating whether or not they want to be called about the course.
    • Converted - The target variable. Indicates whether a lead has been successfully converted or not.
    • TotalVisits - The total number of visits made by the customer on the website.
    • Total Time Spent on Website - The total time spent by the customer on the website.
    • Page Views Per Visit - Average number of pages on the website viewed during the visits.
    • Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
    • Country - The country of the customer.
    • Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization', which means the customer did not select this option while filling out the form.
    • How did you hear about X Education - The source from which the customer heard about X Education.
    • What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
    • What matters most to you in choosing this course - An option selected by the customer indicating their main motive for taking this course.
    • Search, Magazine, Newspaper Article, X Education Forums, Newspaper, Digital Advertisement - Indicate whether the customer had seen the ad in the corresponding item.
    • Through Recommendations - Indicates whether the customer came in through recommendations.
    • Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
    • Tags - Tags assigned to customers indicating the current status of the lead.
    • Lead Quality - Indicates the quality of the lead based on the data and the intuition of the employee who has been assigned to the lead.
    • Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
    • Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
    • Lead Profile - A lead level assigned to each customer based on their profile.
    • City - The city of the customer.
    • Asymmetric Activity Index, Asymmetric Profile Index, Asymmetric Activity Score, Asymmetric Profile Score - Indices and scores assigned to each customer based on their activity and their profile.
    • I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
    • a free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
    • Last Notable Activity - The last notable activity performed by the student.

    Acknowledgements

    UpGrad Case Study

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  13. San Francisco Crime Classification

    • kaggle.com
    Updated Nov 18, 2019
    Kaggle (2019). San Francisco Crime Classification [Dataset]. https://www.kaggle.com/datasets/kaggle/san-francisco-crime-classification/code
    Dataset updated
    Nov 18, 2019
    Dataset authored and provided by
    Kaggle
    Area covered
    San Francisco
    Description

    Context

    From 1934 to 1963, San Francisco was infamous for housing some of the world's most notorious criminals on the inescapable island of Alcatraz. Today, the city is known more for its tech scene than its criminal past. But, with rising wealth inequality, housing shortages, and a proliferation of expensive digital toys riding BART to work, there is no scarcity of crime in the city by the bay. From Sunset to SOMA, and Marina to Excelsior, this dataset provides nearly 12 years of crime reports from across all of San Francisco's neighborhoods.

    This dataset was featured in our completed playground competition entitled San Francisco Crime Classification. The goals of the competition were to:

    Content

    This dataset contains incidents derived from the SFPD Crime Incident Reporting system. The data ranges from 1/1/2003 to 5/13/2015. The training set and test set rotate every week: weeks 1, 3, 5, 7, ... belong to the test set, and weeks 2, 4, 6, 8, ... belong to the training set. There are 9 variables:

    • Dates - timestamp of the crime incident

    • Category - category of the crime incident (only in train.csv).

    • Descript - detailed description of the crime incident (only in train.csv)

    • DayOfWeek - the day of the week

    • PdDistrict - name of the Police Department District

    • Resolution - how the crime incident was resolved (only in train.csv)

    • Address - the approximate street address of the crime incident

    • X - Longitude

    • Y - Latitude
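The alternating-week split described above can be reproduced along the following lines. This sketch rests on two assumptions that the source does not state: that week 1 starts on 1/1/2003 (the first day of the data range), and that weeks are consecutive 7-day blocks from that date. It also assumes the `Dates` column uses the `YYYY-MM-DD HH:MM:SS` format. The helper names are hypothetical.

```python
from datetime import date, datetime

# Assumption: week 1 begins on the first day of the data range.
WEEK_ONE_START = date(2003, 1, 1)


def parse_dates(value: str) -> datetime:
    """Parse a 'Dates' value, assuming the 'YYYY-MM-DD HH:MM:SS' format."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def week_number(ts: datetime) -> int:
    """1-based index of the 7-day block containing ts."""
    return (ts.date() - WEEK_ONE_START).days // 7 + 1


def split_for(ts: datetime) -> str:
    """Odd weeks go to the test set, even weeks to the training set."""
    return "test" if week_number(ts) % 2 == 1 else "train"


for stamp in ["2003-01-01 08:00:00", "2003-01-08 12:30:00", "2003-01-15 23:59:00"]:
    print(stamp, split_for(parse_dates(stamp)))  # test, train, test
```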

    Acknowledgements

    This dataset is part of our completed playground competition entitled San Francisco Crime Classification. Visit the competition page if you are interested in checking out past discussions, competition leaderboard, or more details regarding the competition. If you are curious to see how your results rank compared to others', you can still make a submission at the competition submission page!

    The original dataset is from SF OpenData, the central clearinghouse for data published by the City and County of San Francisco.

  14. School information and student demographics

    • data.ontario.ca
    • datasets.ai
    Updated Jul 8, 2025
    Education (2025). School information and student demographics [Dataset]. https://data.ontario.ca/dataset/school-information-and-student-demographics
    Available download formats: xlsx (14 files: 1565910, 1550796, 1566878, 1565304, 1562805, 1459001, 1462006, 1460629, 1500842, 1482917, 1547704, 1567330, 1580734, 1462064)
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    Education
    License

    Open Government Licence - Ontario: https://www.ontario.ca/page/open-government-licence-ontario

    Time period covered
    Jun 6, 2025
    Area covered
    Ontario
    Description

    Data includes: board and school information, grade 3 and 6 EQAO student achievements for reading, writing and mathematics, and grade 9 mathematics EQAO and OSSLT. Data excludes private schools, Education and Community Partnership Programs (ECPP), summer, night and continuing education schools.

    How Are We Protecting Privacy?

    Results for OnSIS and Statistics Canada variables are suppressed based on school population size to better protect student privacy. In order to achieve this additional level of protection, the Ministry has used a methodology that randomly rounds a percentage either up or down depending on school enrolment. In order to protect privacy, the ministry does not publicly report on data when there are fewer than 10 individuals represented.

    • Percentages shown as 0 may not always be 0: in certain situations the values have been randomly rounded down, or there are no reported results at a school for the respective indicator.
    • Percentages shown as 100 are not always 100: in certain situations the values have been randomly rounded up.

    The school enrolment totals have been rounded to the nearest 5 in order to better protect and maintain student privacy.
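The two deterministic rules above can be expressed as follows: suppress data when fewer than 10 individuals are represented, and round enrolment totals to the nearest 5. The Ministry's random-rounding method for percentages is not described in enough detail to reproduce, so this sketch leaves it out. The function names are hypothetical, and treating the indicator's denominator as the "individuals represented" is an assumption.

```python
from typing import Optional

# From the description: no public reporting when fewer than 10 individuals are represented.
SUPPRESSION_THRESHOLD = 10


def rounded_enrolment(total: int) -> int:
    """Round a non-negative enrolment count to the nearest multiple of 5."""
    if total < 0:
        raise ValueError("enrolment cannot be negative")
    # Integer arithmetic; an exact tie (remainder 2.5) is impossible for whole numbers.
    return (total + 2) // 5 * 5


def reportable_percentage(count: int, individuals: int) -> Optional[float]:
    """Return count as a percentage of individuals, or None if suppressed."""
    if individuals < SUPPRESSION_THRESHOLD:
        return None
    return 100 * count / individuals


print(rounded_enrolment(452), rounded_enrolment(453))             # 450 455
print(reportable_percentage(3, 9), reportable_percentage(5, 20))  # None 25.0
```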

    The information in the School Information Finder is the most current available to the Ministry of Education at this time, as reported by schools, school boards, EQAO and Statistics Canada. The information is updated as frequently as possible.

    This information is also available on the Ministry of Education's School Information Finder website by individual school.

    Descriptions for some of the data types can be found in our glossary.

    School/school board and school authority contact information are updated and maintained by school boards and may not be the most current version. For the most recent information please visit: https://data.ontario.ca/dataset/ontario-public-school-contact-information.

  15. Asylum and resettlement - Historic datasets

    • gov.uk
    Updated Aug 24, 2023
    Home Office (2023). Asylum and resettlement - Historic datasets [Dataset]. https://www.gov.uk/government/statistical-data-sets/asylum-and-resettlement-datasets
    Dataset updated
    Aug 24, 2023
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Home Office
    Description

    This page contains data for the immigration system statistics up to March 2023.

    For current immigration system data, visit ‘Immigration system statistics data tables’.

    Asylum applications, decisions and resettlement

    Asylum applications, initial decisions and resettlement (MS Excel Spreadsheet, 9.13 MB): https://assets.publishing.service.gov.uk/media/64625e6894f6df0010f5eaab/asylum-applications-datasets-mar-2023.xlsx
    Asy_D01: Asylum applications raised, by nationality, age, sex, UASC, applicant type, and location of application
    Asy_D02: Outcomes of asylum applications at initial decision, and refugees resettled in the UK, by nationality, age, sex, applicant type, and UASC
    This is not the latest data

    Asylum applications awaiting a decision (MS Excel Spreadsheet, 1.26 MB): https://assets.publishing.service.gov.uk/media/64625ec394f6df0010f5eaac/asylum-applications-awaiting-decision-datasets-mar-2023.xlsx
    Asy_D03: Asylum applications awaiting an initial decision or further review, by nationality and applicant type
    This is not the latest data

    Outcome analysis of asylum applications (MS Excel Spreadsheet, 410 KB): https://assets.publishing.service.gov.uk/media/62fa17698fa8f50b54374371/outcome-analysis-asylum-applications-datasets-jun-2022.xlsx
    Asy_D04: The initial decision and final outcome of all asylum applications raised in a period, by nationality
    This is not the latest data

    Age disputes

    Age disputes (MS Excel Spreadsheet, 178 KB): https://assets.publishing.service.gov.uk/media/64625ef1427e41000cb437cb/age-disputes-datasets-mar-2023.xlsx
    Asy_D05: Age disputes raised and outcomes of age disputes
    This is not the latest data

    Asylum appeals

    Asylum appeals lodged and determined (MS Excel Spreadsheet, 817 KB): https://assets.publishing.service.gov.uk/media/64625f0ca09dfc000c3c17cf/asylum-appeals-lodged-datasets-mar-2023.xlsx
    Asy_D06: Asylum appeals raised at the First-Tier Tribunal, by nationality and sex
    Asy_D07: Outcomes of asylum appeals raised at the First-Tier Tribunal, by nationality and sex
    This is not the latest data

    Asylum claims certified under Section 94 (MS Excel Spreadsheet, 150 KB): https://assets.publishing.service.gov.uk/media/64625f29427e41000cb437cd/asylum-claims-certified-section-94-datasets-mar-2023.xlsx
    Asy_D08: Initial decisions on asylum applications certified under Section 94, by nationality
    This is not the latest data

    Asylum support

    Asylum seekers in receipt of support (MS Excel Spreadsheet, 2.16 MB): https://assets.publishing.service.gov.uk/media/6463a618d3231e000c32da99/asylum-seekers-receipt-support-datasets-mar-2023.xlsx
    Asy_D09: Asylum seekers in receipt of support at end of period, by nationality, support type, accommodation type, and UK region
    This is not the latest data

    Applications for section 95 su...: https://assets.publishing.service.gov.uk/media/63ecd7388fa8f5612a396c40/applications-section-95-support-datasets-dec-2022.xlsx

  16. UCI Communities and Crime Unnormalized Data Set

    • kaggle.com
    Updated Feb 21, 2018
    Kavitha (2018). UCI Communities and Crime Unnormalized Data Set [Dataset]. https://www.kaggle.com/kkanda/communities%20and%20crime%20unnormalized%20data%20set/notebooks
    Dataset updated
    Feb 21, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kavitha
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Introduction: The dataset used for this experiment is real and authentic. It was acquired from the UCI Machine Learning Repository website [13], where it is titled ‘Crime and Communities’. It combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR [13]. The dataset contains 147 attributes and 2216 instances.

    The per capita crimes variables were calculated using population values included in the 1995 FBI data (which differ from the 1990 Census values).

    Content

    The variables included in the dataset describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The 18 crime attributes that could be predicted are the 8 crimes considered 'Index Crimes' by the FBI (murders, rape, robbery, ...), per capita (actually per 100,000 population) versions of each, and per capita violent crimes and per capita nonviolent crimes.

    Predictive variables: 125; non-predictive variables: 4; potential goal/response variables: 18.
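The per capita crime attributes are rates per 100,000 people, computed with the 1995 FBI population figures rather than the 1990 Census values, as noted above. The following minimal sketch shows that arithmetic. It also checks that the three variable groups add up to the 147 attributes mentioned in the introduction. `per_100k` is a hypothetical helper name.

```python
# Attribute groups as listed above; they should account for all 147 attributes.
ATTRIBUTE_GROUPS = {"predictive": 125, "non-predictive": 4, "goal/response": 18}
assert sum(ATTRIBUTE_GROUPS.values()) == 147


def per_100k(count: float, population: float) -> float:
    """Crimes per 100,000 residents, given a raw count and a population."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to reduce floating-point rounding for integer inputs.
    return count * 100_000 / population


print(per_100k(50, 250_000))  # 20.0
```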

    Acknowledgements

    http://archive.ics.uci.edu/ml/datasets/Communities%20and%20Crime%20Unnormalized

    U. S. Department of Commerce, Bureau of the Census, Census Of Population And Housing 1990 United States: Summary Tape File 1a & 3a (Computer Files),

    U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)

    U.S. Department of Justice, Bureau of Justice Statistics, Law Enforcement Management And Administrative Statistics (Computer File) U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)

    U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States (Computer File) (1995)

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

    Data available in the dataset may not act as a complete source of information for identifying factors that contribute to more violent and non-violent crimes as many relevant factors may still be missing.

    However, I would like to try to answer the following questions.

    1. Analyze whether the number of vacant and occupied houses, and the period of time houses were vacant, contributed to any significant change in violent and non-violent crime rates in communities.

    2. How has unemployment changed the crime rate (violent and non-violent) in the communities?

    3. Were people from a particular age group more vulnerable to crime?

    4. Does ethnicity play a role in crime rate?

    5. Has education played a role in bringing down the crime rate?

  17. Brazilian Federal Legislative activity

    • kaggle.com
    Updated Dec 27, 2017
    Irio Musskopf (2017). Brazilian Federal Legislative activity [Dataset]. https://www.kaggle.com/iriomk/brazilian-federal-legislative-activity
    Available download formats: zip (54,447,105 bytes)
    Dataset updated
    Dec 27, 2017
    Authors
    Irio Musskopf
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Brazil
    Description

    Brazilian? You can read a Portuguese version of this article here.

    Context

    Last year, while I was attending a data science course in Germany, my country was impeaching its president. My colleagues asked me to explain what was happening in Brazil and the possible political outcomes in South America. Although I was able to give a general context and present multiple arguments for and against the impeachment, deep inside, my answer was "I really don't know".

    Understanding what happens in politics takes a lot of effort and research. When I decided I had to use my tech skills to make myself a better citizen, I dived into government data and started Operation Serenata de Amor.

    After reporting hundreds of politicians for small acts of corruption and learning how to encourage the population to engage in the democratic processes, my studies drove me to understand the legislative activity.

    Brazilians elect 594 citizens to be their representatives in the National Congress. How can we be sure that they are not defending their own interests or those of whoever paid for their campaigns? My way, as a data scientist, is to ask the data.

    Content

    The National Congress of Brazil is composed of a Lower (Chamber of Deputies) and an Upper House (Federal Senate). In the first version of this dataset, you are going to find data only from the Chamber of Deputies. With 513 representatives, 86% of the congresspeople, I hope you have enough data to explore for some time.

    It would have been impossible for me, a citizen without government ties, to collect this data without the help of public servants. I processed 9,717 fixed-width files and 73 XML files made officially available by the Chamber of Deputies and created 5 CSV files containing the same information. Redundant fields within the same file that describe the same thing (e.g. body_id, body_name and body_abbreviation) were removed.

    Data on session attendance, votes, and propositions dating back to the last century were collected with reproducible scripts. The data collection and pre-processing scripts are available in a GitHub repository under an open source license.

    Everything was collected from the Chamber of Deputies website on December 27, 2017, covering the whole legislative activity of that year. Attendance and vote records date from 1999; propositions go back as far as 1946.

    For questions about the legislative process and how sessions work in practice, the Internal Regulation of the Chamber of Deputies is the best documentation for research (in Portuguese). It's free!

    Acknowledgements

    Since the data was collected from a government website and the Brazilian law states that access to this information is free to any citizen, I am placing my own work published here in Public Domain.

    I'd like to thank the hundreds of people financially supporting the work of Operation Serenata de Amor and those responsible for passing the Information Access bill in 2011.

    Inspiration

    The legislative activity should tell the history while it's happening. How much has the Congress changed over the past decades? Do the congresspeople maintain the same political views, or do they vary on a weekly basis? Do people vote together with their state or party peers? How often? Can you model an algorithm to tell us the real parties inside the Brazilian Congress?

  18. Thorsten-Voice Dataset 2022.10

    • explore.openaire.eu
    Updated Oct 30, 2022
    Thorsten Müller; Dominik Kreutz (2022). Thorsten-Voice Dataset 2022.10 [Dataset]. http://doi.org/10.5281/zenodo.7265580
    Dataset updated
    Oct 30, 2022
    Authors
    Thorsten Müller; Dominik Kreutz
    Description

    The goal of the "Thorsten-Voice" project is to provide voice datasets and TTS models for a free, high-quality German artificial voice. This dataset, "Thorsten-Voice dataset 2022.10", is a neutrally spoken voice dataset recorded by Thorsten Müller, audio-optimized by Dominik Kreutz, and licensed under CC0 so that anybody can use it without financial or licensing hurdles. "I contribute my personal voice as a person believing in a world where all people are equal. No matter of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome on any place on this planet and open and free knowledge and education is available to everyone." (Thorsten Müller)

    Dataset details:

    • LJSpeech file and directory structure
    • 12,450 recorded phrases (WAV files)
    • more than 11 hours of pure audio
    • sample rate 22,050 Hz, mono
    • normalized to -24 dB
    • no silence at the beginning/end
    • average spoken characters per second: 17.5

    See more details on my GitHub page or the Thorsten-Voice project website.
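Because the dataset follows the LJSpeech layout, a loader typically reads a pipe-separated `metadata.csv` (one `id|text` line per clip, sometimes with a third normalized-text field) and a `wavs/` folder. The sketch below assumes that layout and UTF-8 encoding, both of which you should verify against the actual download. It uses only the standard library, and the helper names are hypothetical. The last function shows how an average such as the quoted 17.5 characters per second could be computed.

```python
import wave
from pathlib import Path
from typing import Iterable, Sequence


def read_metadata(lines: Iterable[str]) -> dict[str, str]:
    """Map clip id -> transcript from LJSpeech-style 'id|text[|normalized]' lines."""
    entries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        parts = line.split("|")
        if len(parts) < 2:
            raise ValueError(f"malformed metadata line: {line!r}")
        # Prefer the normalized text (last field) when a third field exists.
        entries[parts[0]] = parts[-1]
    return entries


def wav_duration(path: Path) -> float:
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def chars_per_second(texts: Sequence[str], durations: Sequence[float]) -> float:
    """Total transcript characters divided by total audio seconds."""
    return sum(len(t) for t in texts) / sum(durations)


# Hypothetical usage against a local copy (paths are assumptions):
# root = Path("thorsten-voice")
# with open(root / "metadata.csv", encoding="utf-8") as f:
#     meta = read_metadata(f)
# secs = [wav_duration(root / "wavs" / f"{clip_id}.wav") for clip_id in meta]
# print(chars_per_second(list(meta.values()), secs))
```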

  19. Dataset for: The Evolution of the Manosphere Across the Web

    • data.niaid.nih.gov
    Updated Aug 30, 2020
    Emiliano De Cristofaro (2020). Dataset for: The Evolution of the Manosphere Across the Web [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4007912
    Dataset updated
    Aug 30, 2020
    Dataset provided by
    Gianluca Stringhini
    Jeremy Blackburn
    Summer Long
    Savvas Zannettou
    Stephanie Greenberg
    Barry Bradlyn
    Manoel Horta Ribeiro
    Emiliano De Cristofaro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Evolution of the Manosphere Across the Web

    We make available data from subreddits and standalone forums in the manosphere.

    We also make available Perspective API annotations for all posts.

    You can find the code in GitHub.

    Please cite this paper if you use this data:

    @article{ribeiroevolution2021, title={The Evolution of the Manosphere Across the Web}, author={Ribeiro, Manoel Horta and Blackburn, Jeremy and Bradlyn, Barry and De Cristofaro, Emiliano and Stringhini, Gianluca and Long, Summer and Greenberg, Stephanie and Zannettou, Savvas}, booktitle = {{Proceedings of the 15th International AAAI Conference on Weblogs and Social Media (ICWSM'21)}}, year={2021} }

    1. Reddit data

    We make available data for forums and for relevant subreddits (56 of them, as described in subreddit_descriptions.csv). The Reddit data is in /ndjson/reddit.ndjson, one line per post. For example:

    { "author": "Handheld_Gaming", "date_post": 1546300852, "id_post": "abcusl", "number_post": 9.0, "subreddit": "Braincels", "text_post": "Its been 2019 for almost 1 hour And I am at a party with 120 people, half of them being foids. The last year had been the best in my life. I actually was happy living hope because I was redpilled to the death.

    Now that I am blackpilled I see that I am the shortest of all men and that I am the only one with a recessed jaw.

    Its over. Its only thanks to my age old friendship with chads and my social skills I had developed in the past year that a lot of men like me a lot as a friend.

    No leg lengthening syrgery is gonna save me. Ignorance was a bliss. Its just horror now seeing that everyone can make out wirth some slin hoe at the party.

    I actually feel so unbelivably bad for turbomanlets. Life as an unattractive manlet is a pain, I cant imagine the hell being an ugly turbomanlet is like. I would have roped instsntly if I were one. Its so unfair.

    Tallcels are fakecels and they all can (and should) suck my cock.

    If I were 17cm taller my life would be a heaven and I would be the happiest man alive.

    Just cope and wait for affordable body tranpslants.", "thread": "t3_abcusl" }
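Since /ndjson/reddit.ndjson stores one JSON object per line, it can be streamed without loading the whole file. This minimal sketch assumes that every line is a complete JSON object with the fields shown in the sample. It also assumes `date_post` is a Unix timestamp in seconds; under that reading, 1546300852 corresponds to 2019-01-01 00:00:52 UTC. The helper names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Iterator


def iter_posts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one post dict per non-empty ndjson line, adding a UTC 'posted_at'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        post = json.loads(line)
        post["posted_at"] = datetime.fromtimestamp(post["date_post"], tz=timezone.utc)
        yield post


def posts_per_subreddit(lines: Iterable[str]) -> Counter:
    """Count posts by their 'subreddit' field."""
    return Counter(post["subreddit"] for post in iter_posts(lines))


# Hypothetical single-line example mirroring the fields of the sample above.
line = json.dumps({"author": "x", "date_post": 1546300852, "id_post": "abcusl",
                   "number_post": 9.0, "subreddit": "Braincels",
                   "text_post": "...", "thread": "t3_abcusl"})
post = next(iter_posts([line]))
print(post["posted_at"].isoformat())      # 2019-01-01T00:00:52+00:00
print(posts_per_subreddit([line, line]))  # Counter({'Braincels': 2})
```

With the real file, pass the open handle directly, e.g. `with open("ndjson/reddit.ndjson", encoding="utf-8") as f: counts = posts_per_subreddit(f)`. The relative path and the encoding are assumptions.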

    2. Forums

    We here describe the .sqlite and .ndjson files that contain the data from the following forums.

    • avfm: https://d2ec906f9aea-003845.vbulletin.net
    • incels: https://incels.co/
    • love_shy: http://love-shy.com/lsbb/
    • redpilltalk: https://redpilltalk.com/
    • mgtow: https://www.mgtow.com/forums/
    • rooshv: https://www.rooshvforum.com/
    • pua_forum: https://www.pick-up-artist-forum.com/
    • the_attraction: http://www.theattractionforums.com/

    The files are in the folders /sqlite/ and /ndjson/.

    2.1 .sqlite

    All the tables in the .sqlite datasets follow a very simple {key: value} format. Each key is a thread name (for example, /threads/housewife-is-like-a-job.123835/) and each value is a Python dictionary or a list. Each file contains three tables:

    idx: each key is the relative address of a thread and maps to a dict describing that thread:

    "type": (list) in some forums you can add a descriptor such as [RageFuel] to each topic, and there may also be special thread types, such as sticky/pool/locked threads;
    "title": (str) title of the thread;
    "link": (str) link to the thread;
    "author_topic": (str) username of the user who created the thread;
    "replies": (int) number of replies, which may differ from the number of posts because of differences in crawling dates;
    "views": (int) number of views;
    "subforum": (str) name of the subforum;
    "collected": (bool) whether the raw posts have been collected;
    "crawled_idx_at": (str) datetime of the collection.

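As an illustration of how the idx metadata can be used, the sketch below selects the threads whose "type" list contains a given value (a bracketed tag such as [RageFuel], or a special type), optionally keeping only threads whose raw posts were collected. It accepts any mapping shaped like the idx table, such as a SqliteDict opened with tablename="idx" or a plain dict; the function name `threads_with_type` is hypothetical.

```python
from typing import List, Mapping


def threads_with_type(idx: Mapping[str, dict], tag: str,
                      collected_only: bool = False) -> List[str]:
    """Return the keys of threads whose "type" list contains `tag`."""
    keys = []
    for key, meta in idx.items():
        if collected_only and not meta.get("collected", False):
            continue  # raw posts for this thread were not collected
        if tag in (meta.get("type") or []):
            keys.append(key)
    return keys


# Possible usage with the sqlitedict package (read-only handle):
# from sqlitedict import SqliteDict
# with SqliteDict("./data/forums/incels.sqlite", tablename="idx", flag="r") as idx:
#     print(threads_with_type(idx, "[RageFuel]", collected_only=True))
```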
    processed_posts: each key is the relative address of a thread and maps to a list of its posts (in order). Each post is represented by a dict:

    "author": (str) author's username;
    "resume_author": (str) author's short description;
    "joined_author": (str) date the author joined;
    "messages_author": (int) number of messages the author has posted;
    "text_post": (str) main text of the post;
    "number_post": (int) position of the post in the thread;
    "id_post": (str) post identifier (its scope depends on the forum), guaranteed to be unique within the thread;
    "id_post_interaction": (list) ids of the other posts this post quotes;
    "date_post": (str) datetime of the post;
    "links": (tuple) the parsed URL, e.g. ('https', 'www.youtube.com', '/S5t6K9iwcdw');
    "thread": (str) same as the key;
    "crawled_at": (str) datetime of the collection.

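The id_post_interaction field makes it possible to build a quotation graph inside a thread. A minimal sketch, assuming the quoted ids are stored in the same form as id_post: because id_post is only guaranteed to be unique within a thread, edges are built per thread, and quotes pointing to ids not present in the thread are dropped. `quote_edges` is a hypothetical helper.

```python
from typing import List, Tuple


def quote_edges(posts: List[dict]) -> List[Tuple[str, str]]:
    """Return (quoting_id, quoted_id) pairs for the posts of one thread."""
    known = {p["id_post"] for p in posts}  # ids that exist in this thread
    edges = []
    for p in posts:
        for quoted in p.get("id_post_interaction") or []:
            if quoted in known:  # ignore quotes we cannot resolve locally
                edges.append((p["id_post"], quoted))
    return edges
```

Feeding each value of the processed_posts table to this function yields one edge list per thread, which can then be loaded into any graph library.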
    raw_posts: each key is the relative address of a thread and maps to a list of its unprocessed posts (in order). Each post is represented by a dict:

    "post_raw": (binary) raw HTML as bytes;
    "crawled_at": (str) datetime of the collection.

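If you want to re-parse a post yourself from raw_posts, the HTML bytes first have to be decoded and stripped of markup. The sketch below uses only the standard-library html.parser module; the UTF-8 default encoding is an assumption (forums may use other encodings), and `raw_post_text` is a hypothetical helper that keeps visible text only, skipping script and style blocks.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring script/style content."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so &amp; becomes &
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def raw_post_text(post_raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a post_raw blob (encoding assumed) and return its visible text."""
    parser = _TextExtractor()
    parser.feed(post_raw.decode(encoding, errors="replace"))
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)
```

This is only a rough text extraction; it does not separate quoted blocks from the author's own words, which is what processed_posts already provides.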
    2.2 .ndjson

    Each line is a JSON object representing one post, with the following fields:

    "author": (str) author's username;
    "resume_author": (str) author's short description;
    "joined_author": (str) date the author joined;
    "messages_author": (int) number of messages the author has posted;
    "text_post": (str) main text of the post;
    "number_post": (int) position of the post in the thread;
    "id_post": (str) post identifier (its scope depends on the forum), guaranteed to be unique within the thread;
    "id_post_interaction": (list) ids of the other posts this post quotes;
    "date_post": (str) datetime of the post;
    "links": (tuple) the parsed URL, e.g. ('https', 'www.youtube.com', '/S5t6K9iwcdw');
    "thread": (str) relative address of the thread (the same as the key in the .sqlite tables);
    "crawled_at": (str) datetime of the collection.

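Because each line of a forum .ndjson file holds a single post, threads have to be reassembled from the "thread" field. A minimal sketch, assuming number_post gives the order of posts within a thread; `group_by_thread` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_thread(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group ndjson post lines by their "thread" field, each group ordered by number_post."""
    threads: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            post = json.loads(line)
            threads[post["thread"]].append(post)
    for posts in threads.values():
        posts.sort(key=lambda p: p["number_post"])
    return dict(threads)
```

This keeps every post in memory; for the larger forums, iterating over the processed_posts table, which is already grouped by thread, may be more economical.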
    3. Perspective

    We also ran each forum post and Reddit post through Perspective; the output files are in the /perspective/ folder and are compressed with gzip. An example output:

    { "id_post": 5200, "hate_output": { "text": "I still can\u2019t wrap my mind around both of those articles about these c~~~s sleeping with poor Haitian Men. Where\u2019s the uproar?, where the hell is the outcry?, the \u201cpig\u201d comments or the \u201ccreeper comments\u201d. F~~~ing hell, if roles were reversed and it was an article about Men going to Europe where under 18 sex in legal, you better believe they would crucify the writer of that article and DEMAND an apology by the paper that wrote it.. This is exactly what I try and explain to people about the double standards within our modern society. A bunch of older women, wanna get their kicks off by sleeping with poor Men, just before they either hit or are at menopause age. F~~~ing unreal, I\u2019ll never forget going to Sweden and Norway a few years ago with one of my buddies and his girlfriend who was from there, the legal age of consent in Norway is 16 and in Sweden it\u2019s 15. I couldn\u2019t believe it, but my friend told me \u201c hey, it\u2019s normal here\u201d . Not only that but the age wasn\u2019t a big different in other European countries as well. One thing i learned very quickly was how very Misandric Sweden as well as Denmark were.", "TOXICITY": 0.6079781, "SEVERE_TOXICITY": 0.53744453, "INFLAMMATORY": 0.7279288, "PROFANITY": 0.58842486, "INSULT": 0.5511079, "OBSCENE": 0.9830818, "SPAM": 0.17009115 } }
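The description only says that the Perspective files are gzip-compressed, so the sketch below assumes each file holds one JSON object per line, shaped like the example above (an id_post plus a hate_output dict of scores). It streams the records and keeps the posts whose score for one attribute exceeds a threshold. The 0.8 default, the function names, and the file path in the usage comment are illustrative assumptions, not values used by the authors.

```python
import gzip
import json
from typing import Iterable, Iterator, Tuple


def iter_records(path: str) -> Iterator[dict]:
    """Stream records from a gzip file assumed to contain one JSON object per line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def above_threshold(records: Iterable[dict], attribute: str = "TOXICITY",
                    threshold: float = 0.8) -> Iterator[Tuple[object, float]]:
    """Yield (id_post, score) for records whose score for `attribute` exceeds `threshold`."""
    for record in records:
        score = (record.get("hate_output") or {}).get(attribute)
        if score is not None and score > threshold:
            yield record["id_post"], score


# Hypothetical usage (the actual file names in /perspective/ are not given here):
# for id_post, score in above_threshold(iter_records("perspective/some_file.gz"), "SEVERE_TOXICITY", 0.5):
#     print(id_post, score)
```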

    4. Working with sqlite

    A convenient way to read the .sqlite files of the dataset is with SqliteDict (from the sqlitedict Python package), for example:

    from sqlitedict import SqliteDict

    # Read-only handle on the processed_posts table of one forum file
    processed_posts = SqliteDict("./data/forums/incels.sqlite", tablename="processed_posts", flag="r")

    for key, posts in processed_posts.items():
        for post in posts:
            # here you could do something with each post in the dataset
            pass

    processed_posts.close()

    5. Helpers

    Additionally, we provide two .sqlite helper files used in the analyses. These relate to Reddit, not to the forums. They are:

    channel_dict.sqlite: a SQLite file in which each key is a subreddit and each value is a list of dictionaries describing the users who posted in it, along with timestamps.

    author_dict.sqlite: a SQLite file in which each key is an author and each value is a list of dictionaries describing the subreddits they posted in, along with timestamps.
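For migration-style questions, a natural first step is to find when each author first appeared in each subreddit. The sketch below works on one author's entry list from author_dict.sqlite; the field names "subreddit" and "timestamp" are hypothetical placeholders (the exact keys of the stored dictionaries are not given here), so they are exposed as parameters. This is an illustration, not the procedure used in the paper.

```python
from typing import Dict, List


def first_seen(entries: List[dict], sub_key: str = "subreddit",
               time_key: str = "timestamp") -> Dict[str, int]:
    """Map each subreddit to the earliest timestamp in one author's entries.

    sub_key and time_key are hypothetical field names; adjust them to the
    keys actually stored in author_dict.sqlite.
    """
    first: Dict[str, int] = {}
    for entry in entries:
        sub, ts = entry[sub_key], entry[time_key]
        if sub not in first or ts < first[sub]:
            first[sub] = ts
    return first


def migration_order(entries: List[dict], sub_key: str = "subreddit",
                    time_key: str = "timestamp") -> List[str]:
    """Subreddits sorted by when the author first posted in them."""
    first = first_seen(entries, sub_key, time_key)
    return sorted(first, key=lambda sub: first[sub])
```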

    These are used in the paper for the migration analyses.

    6. Examples and particularities for forums

    Although we did our best to clean the data and be consistent across forums, this was not always possible. In the following subsections we describe the particularities of each forum and possible improvements to the parsing that were not pursued, and we give some examples of how things work in each forum.

    6.1 incels

    Check out an archived version of the front page, the thread page and a post page, as well as a dump of the data stored for a thread page and a post page.

    types: for the incels forum, the special types associated with each thread in the idx table are “Sticky”, “Pool”, “Closed”, and the custom types added by users, such as [LifeFuel]. The custom types are all in brackets. You can see some examples of these on the example thread page.
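Since user-added types are written in square brackets while the special types are plain words, the two kinds can be separated with a simple pattern. A minimal sketch, assuming a custom type is an entire string wrapped in brackets; `split_types` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# A user-added type is a whole string wrapped in square brackets, e.g. "[LifeFuel]"
BRACKET_TAG = re.compile(r"\[[^\]]+\]")


def split_types(types: List[str]) -> Tuple[List[str], List[str]]:
    """Split an idx "type" list into (special thread types, user bracket tags)."""
    special: List[str] = []
    tags: List[str] = []
    for raw in types:
        value = raw.strip()
        if BRACKET_TAG.fullmatch(value):
            tags.append(value)
        else:
            special.append(value)
    return special, tags
```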

    quotes: quotes in this forum were cleanly structured, so all quotations are resolved deterministically.

    6.2 LoveShy

    Check out an archived version of the front page, the thread page and a post page, as well as a dump of the data stored for a thread page and a post page.

    types: no types were parsed. There are some rules in the forum, but they are not significant.

    quotes: quotes were obtained from exact text+author match, or author match + a jaccard

  20. How do I book multi-passenger flights on KLM Airlines? Dataset

    • paperswithcode.com
    Updated Jun 18, 2022
    The citation is currently not available for this dataset.
    Dataset updated
    Jun 18, 2022
    Authors
    HUI ZHANG; Shenglong Zhou; Geoffrey Ye Li; Naihua Xiu
    Description

    Booking multi-passenger flights on KLM Airlines is easier than many people expect, especially if you plan your process well. Begin by visiting the official KLM Airlines website or using their mobile app to start the flight search. Select your departure and destination cities, then choose the number of travelers. This is an essential step, since the KLM system adapts fare availability based on the total number of passengers selected.

    When you're entering details for a group, ensure that all passengers' names match exactly as they appear on identification. Passport names are crucial to avoid rebooking complications later. KLM allows up to nine passengers in one reservation through the standard booking interface. If you're booking for more than nine, the system will redirect you to the group travel section.

    This feature is perfect for families, friends, or small business groups traveling together. When selecting flights, try to be flexible with travel times, especially if the group is large. Seat availability may vary, and a slight adjustment in departure time can allow your entire group to be on the same flight. KLM always aims to seat passengers in the same section, but that depends on early booking and current occupancy.

    Adding traveler information takes time, so gather everyone's full name, date of birth, passport number, and special requirements ahead of time. During this step, the website will prompt you to enter passenger details one by one. Double-check each entry before clicking continue. Errors can result in delays or changes later on. Once passenger info is added, proceed to seat selection.

    KLM offers the option to pre-select seats. This is highly recommended for group travelers. Choosing seats early ensures that everyone sits together, especially on long-haul or international flights. For a smoother experience, consider Standard or Comfort seats, which offer extra space and legroom. These upgrades are affordable and beneficial for groups who want a unified and comfortable journey.

    Once seat selection is complete, review the baggage options available for each passenger. Not all fare categories include checked baggage. It’s common for economy fares to exclude it, so verify whether anyone in the group needs to add luggage. Doing this ahead of time is usually cheaper than adding bags at the airport. You'll also want to consider whether meal preferences, wheelchair assistance, or special services need to be added for any group member.

    After customizing the reservation, it’s time to move on to payment. KLM accepts multiple payment options, including credit cards, debit cards, and, in some countries, bank transfers. When booking for several travelers, the total price may appear higher than expected. This is normal, as airline pricing is tiered based on seat availability and group size. Booking early helps reduce the total cost significantly.

    Use loyalty points or a travel card if you have one. Flying Blue members can earn and redeem miles for all group passengers under the same booking. You’ll need the member’s number at the time of booking. This is a great way to maximize benefits across multiple passengers. Be sure to keep track of each individual’s miles after the trip.

    Once booked, KLM will email a single itinerary for the whole group. Save this email or print it for easy access. Changes can still be made after booking, depending on fare rules. If someone in the group can no longer travel, you'll need to contact support for modification policies.

    Speaking of which, calling KLM directly can simplify group bookings, especially if your trip is complex. Phone support can assist with round-trip planning, multiple destinations, and fare combinations that aren’t visible online. You can even request special discounts or ask about group pricing when more than ten passengers are traveling.

    After booking, make sure each passenger downloads the KLM app. The app allows mobile check-in, digital boarding passes, live updates, and gate alerts. Encourage your group to check in online 24 hours before the flight for a smoother airport experience. Digital boarding saves time, especially for larger travel parties.

    If any changes are needed, most economy and flexible fares allow free rebooking or limited cancellation. Always review the ticket class to avoid surprises later. Even if you're modifying one passenger, the whole group’s itinerary might need adjustment. That’s why group leaders should stay in control of all booking emails and receipts.

Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks

E-commerce - Users of a French C2C fashion store

Explore user behaviour of 98K users of a successful website

2 scholarly articles cite this dataset.
Dataset updated
Feb 24, 2024
Dataset provided by
Kaggle
Authors
Jeffrey Mvutu Mabilama
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
French
Description

Foreword

This user dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc.).

My Telegram bot will answer your queries and allow you to contact me.

Context

There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).

This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try to understand what you can expect of your users and estimate in advance what your growth may look like.

  • For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.

This dataset is part of a preview of a much larger dataset. Please contact me for more.

Content

The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009, then expanded worldwide.

Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Questions you might want to answer using this dataset:

  • Are e-commerce users interested in social network features?
  • Are my users active enough (compared to those of this dataset)?
  • How likely are people from other countries to sign up on a C2C website?
  • How many users are likely to drop off after years of using my service?

Example works:

  • Report(s) made using SQL queries can be found on the data.world page of the dataset.
  • Notebooks may be found on the Kaggle page of the dataset.

License

CC-BY-NC-SA 4.0

For other licensing options, contact me.
