Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc...).
My Telegram bot will answer your queries and allow you to contact me.
There are a lot of unknowns when running an e-commerce store, even when you have analytics to guide your decisions.
Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).
This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try to understand what you can expect of your users and determine in advance what your growth may be.
If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.
This dataset is part of a preview of a much larger dataset. Please contact me for more.
The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.
Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.
Questions you might want to answer using this dataset:
Example works:
For other licensing options, contact me.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
so if you have to have a G+ account (for YouTube, location services, or other reasons) - here's how you can make it totally private! No one will be able to add you, send you spammy links, or otherwise annoy you. You need to visit the "Audience Settings" page - https://plus.google.com/u/0/settings/audience You can then set a "custom audience" - usually you would use this to restrict your account to people from a specific geographic location, or within a specific age range. In this case, we're going to choose a custom audience of "No-one" Check the box and hit save. Now, when people try to visit your Google+ profile - they'll see this "restricted" message. You can visit my G+ Profile if you want to see this working. (https://plus.google.com/114725651137252000986) If you are not able to understand you can follow this website : http://www.livehuntz.com/google-plus/support-phone-number
Explore the dataset and potentially gain valuable insight into your data science project through interesting features. The dataset was developed for a portfolio optimization graduate project I was working on. The goal was to monetize the risk of company deleveraging associated with changes in economic data. Applications of the dataset may include: To see the data in action, visit my analytics page (Analytics Page & Dashboard); to access all 295,000+ records, click here.
For any questions, you may reach us at research_development@goldenoakresearch.com. For immediate assistance, you may reach me at 585-626-2965. Please note: the number is my personal number and email is preferred.
Note: in total there are 75 fields; the following are just the themes the fields fall under. Home Owner Costs: sum of utilities, property taxes.
2012-2016 ACS 5-Year Documentation was provided by the U.S. Census Reports. Retrieved May 2, 2018, from
Providing you the potential to monetize risk and optimize your investment portfolio through quality economic features at an unbeatable price. Access all 295,000+ records on an incredibly small scale; see links below for more details:
https://brightdata.com/license
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.
Dataset Features
Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.
Customizable Subsets for Specific Needs
Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.
Popular Use Cases
Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
https://choosealicense.com/licenses/odc-by/
🍷 FineWeb
15 trillion tokens of the finest data the 🌐 web has to offer
What is it?
The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and was run on the 🏭 datatrove library, our large scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.
The League of Women Voters conducts surveys of Texas County voting websites. The data and further reading are available here (under County Website Reports). Any mistakes or errors found here are mine and the data on the LWV website is the authoritative data - I have no affiliation with the LWV but wanted to make the datasets more accessible.
I cleaned some of the data (split numeric and text ratings from one column into two columns) and made a few edits to values that appeared to be typos based on context - these will be noted in the description of each set. Column names were shortened in some cases and "NA" was added to empty cells. Each survey used slightly different questions, though both 2016 sets appear to use the same ones and the 2017 set is very similar.
Abbreviations used include SOS for the Texas Secretary of State website and 203 refers to Section 203 of the federal Voting Rights Act (for information, see this 2016 report).
Each dataset has at least these columns: county name, fips, date, total points, overall evaluation, perc calc na, and perc calc num.
Over 90% of Lufthansa travelers book flights online, and many forget to include their frequent flyer number. ☎️+1 (844) 459-5676 Fortunately, adding it afterward is very simple.
If you’ve already completed your Lufthansa booking and forgot your Miles & More number, don’t worry. You can still claim the miles you’ve earned.
The fastest way to add your frequent flyer number is by visiting Lufthansa’s official website. Log in to “My Bookings” with your credentials.
Use your six-digit booking code and last name to locate your itinerary. Once inside, you’ll see an option labeled “Add Frequent Flyer.”
Click that section, enter your Miles & More number, and save the changes. Lufthansa automatically updates the reservation and assigns mileage points.
If you booked your Lufthansa ticket through a third-party site, this method still works. Just make sure your name matches the frequent flyer profile.
Not a Miles & More member yet? Over 30 million people have joined for free benefits. Visit Lufthansa.com and enroll instantly to start.
Adding a frequent flyer number before departure ensures you don’t miss out on mileage points. These points can be redeemed for upgrades or rewards.
Some travelers forget to update partner programs. Lufthansa partners with airlines in Star Alliance, so you can add other loyalty numbers.
To use United MileagePlus, Singapore KrisFlyer, or Air Canada Aeroplan, follow the same process. Enter the number in your booking details section.
You can also call Lufthansa’s customer service to have a live agent help. Just dial and request to link your loyalty program.
Be ready to provide your reservation code and frequent flyer number during the call. The agent will confirm if the change was successful.
If you’ve already taken the flight but forgot to include your number, it’s still okay. Lufthansa allows retroactive credit for up to 6 months.
Visit the Miles & More website and navigate to “Claim Missing Miles.” Enter flight details, ticket number, and your membership ID.
Processing typically takes 3–4 weeks, and your account reflects miles earned from past trips. Lufthansa will notify you when points are posted.
Using your frequent flyer number not only earns miles, but also enables perks. These include early check-in, baggage allowance, and access to lounges.
Higher tier members also get benefits like priority boarding and mileage multipliers. If you’re loyal to Lufthansa, don’t skip the number.
You can add your frequent flyer number at the airport, too. Just head to a Lufthansa kiosk or speak to a representative.
Show them your ID and loyalty card. They’ll update the booking with your number. Don’t forget to ask for confirmation before departure.
Another method is using the Lufthansa mobile app. Go to your upcoming trip and tap “Edit traveler info” to enter the number.
To avoid forgetting again, save your number in your Lufthansa profile. This ensures future bookings auto-populate with your loyalty data.
Also note that miles can only be credited once per flight. You cannot earn miles on multiple programs for the same trip.
If you accidentally entered the wrong frequent flyer number, call Lufthansa support. An agent can delete the incorrect info and correct it.
In summary, yes, you can definitely add your frequent flyer number after booking. Just use online tools or call Lufthansa for help.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Face Recognition, Face Detection, Male Photo Dataset 👨
If you are interested in biometric data - visit our website to learn more and buy the dataset :)
110,000+ photos of 74,000+ men from 141 countries. The dataset includes photos of people's faces. All people presented in the dataset are men. The dataset contains a variety of images capturing individuals from diverse backgrounds and age groups. Our dataset will diversify your data by adding more photos of men of… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/male-selfie-image-dataset.
Upvote! The database contains 40,000+ records on US Gross Rent & Geo Locations. The field description of the database is documented in the attached pdf file. To access all 325,272 records on a scale roughly equivalent to a neighborhood (census tract), see the link below and make sure to upvote. Upvote right now, please. Enjoy!
Get the full free database with coupon code: FreeDatabase, See directions at the bottom of the description... And make sure to upvote :) coupon ends at 2:00 pm 8-23-2017
The data set was originally developed for real estate and business investment research. Income is a vital element when determining both quality and socioeconomic features of a given geographic location. The following data was derived from over 36,000 files and covers 348,893 location records.
Only proper citation is required; please see the documentation for details. Have Fun!!!
Golden Oak Research Group, LLC. “U.S. Income Database Kaggle”. Publication: 5, August 2017. Accessed, day, month year.
For any questions, you may reach us at research_development@goldenoakresearch.com. For immediate assistance, you may reach me at 585-626-2965
please note: it is my personal number and email is preferred
Check our data's accuracy: Census Fact Checker
Don't settle. Go big and win big. Optimize your potential. Access all gross rent records and more on a scale roughly equivalent to a neighborhood; see link below:
A small startup with big dreams, giving the everyday, up-and-coming data scientist professional-grade data at affordable prices. It's what we do.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.
The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo who leads the SMART Lab.
Citing the RAVDESS
The RAVDESS is released under a Creative Commons Attribution license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS1 paper. Personal works, such as machine learning projects/blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS1 paper would also be appreciated.
Academic paper citation
Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.
Personal use citation
Include a link to this Zenodo page - https://zenodo.org/record/1188976
Commercial Licenses
Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license page of fees, or contact us at ravdess@gmail.com.
Contact Information
If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.
Example Videos
Watch a sample of the RAVDESS speech and song videos.
Emotion Classification Users
If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].
Construction and Validation
Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.
The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.
Contents
Audio-only files
Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):
Audio-Visual and Video-only files
Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:
File Summary
In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
File naming convention
Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:
Filename identifiers
Filename example: 02-01-06-01-02-01-12.mp4
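As a rough illustration of the naming convention, a filename can be split into its seven numeric parts as sketched below. The field names are an assumption based on the published RAVDESS documentation, since the identifier table is not reproduced in this text; `RavdessName` and `parse_ravdess_filename` are hypothetical helper names, not part of any RAVDESS tooling.

```python
from typing import NamedTuple


class RavdessName(NamedTuple):
    # Field names are an assumption taken from the published RAVDESS
    # documentation; the identifier table is not reproduced in this text.
    modality: int
    vocal_channel: int
    emotion: int
    intensity: int
    statement: int
    repetition: int
    actor: int
    extension: str


def parse_ravdess_filename(filename: str) -> RavdessName:
    """Split a name such as '02-01-06-01-02-01-12.mp4' into its seven numeric parts."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        raise ValueError(f"missing file extension: {filename!r}")
    parts = stem.split("-")
    if len(parts) != 7 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected 7 numeric parts, got: {filename!r}")
    return RavdessName(*(int(p) for p in parts), extension)


info = parse_ravdess_filename("02-01-06-01-02-01-12.mp4")
print(info.actor, info.extension)  # prints: 12 mp4
```

Keeping the parts as integers makes it easy to group or filter files by any one position, for example selecting all files for a single actor.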
License information
The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0
Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.
Related Data sets
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
By California Health and Human Services [source]
Welcome to the California Health and Human Services Agency's Open Data Portal! Here, you can explore and utilize information from one of the state's most valuable assets: the non-confidential data set of Medi-Cal Fee-for-Service (FFS) program providers.
This dataset provides insight into Medi-Cal FFS enrollment. The information was retrieved from the Provider Master File (PMF), which is maintained by the Provider Enrollment Division (PED). With this dataset, you will gain insights into provider number, legal name, type description, specialty description and other geographical data points such as county code, attention line address parts, landmark coordinate points (longitude/latitude) and more!
The goal with this Open Data Portal initiative is to empower Californians with:
- Increased public access to high quality health & human service data;
- Stemmed creativity & innovation in research;
- The ability to make informed decisions about our health & services providers;
- Transparency in government policy expenditure measures.
Our hope is that you'll use these tools for responsible data analytics exploration on not just Medi-Cal FFS provision but on any related subject matter that interests and benefits your community at large. Good luck & happy researching!
For more datasets, click here.
- Creating a mobile application or website to help people easily and quickly find their nearest Medi-Cal FFS providers based on location, specialty and provider type.
- Developing analytics tools to help organizations understand the concentrations of providers across the state in order to inform decision making when considering regional expansion and improving service accessibility.
- Developing a tool that visualizes specialty diversity across the state to identify areas with low provider density while helping inform strategies aimed at increasing access to care for communities with high needs populations
If you use this dataset in your research, please credit the original authors. Data Source
License: Open Database License (ODbL) v1.0
You are free to:
- Share - copy and redistribute the material in any medium or format.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.
- No Derivatives - If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
File: Profile_of_Enrolled_Medi-Cal_Fee-for-Service_FFS_Providers_as_of_May_1_2016.csv
| Column name | Description |
|:----------------------------|:---------------------------------------------------------------|
| NPI | National Provider Identifier (Number) |
| SERVICE LOCATION NUMBER | Unique identifier for the provider's service location (Number) |
| LEGAL NAME | Legal name of the provider (Text) |
| TYPE DESCRIPTION | Type of provider (Text) |
| SPECIALTY DESCRIPTION | Specialty of the provider (Text) |
| OUT OF STATE INDICATOR | Indicates if the provider is located out of state (Boolean) |
| IN/OUT OF STATE | Indicates if the provider is located in or out of state (Text) |
| COUNTY CODE | County code of the provider's service location (Number) |
| COUNTY NAME | County name of the provider's service location (Text) |
| ADDRESS ATTENTION | Attention line of the provider's address (Text) |
| ADDRESS LINE 1 | First l...
An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.
The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses or fill up a form for the course or watch some videos. When these people fill up a form providing their email address or phone number, they are classified to be a lead. Moreover, the company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X education is around 30%.
Now, although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if, say, they acquire 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most potential leads, also known as ‘Hot Leads’. If they successfully identify this set of leads, the lead conversion rate should go up as the sales team will now be focusing more on communicating with the potential leads rather than making calls to everyone.
There are a lot of leads generated in the initial stage (top) but only a few of them come out as paying customers from the bottom. In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, constantly communicating, etc.) in order to get a higher lead conversion.
X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model wherein you need to assign a lead score to each of the leads such that the customers with a higher lead score have a higher conversion chance and the customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark of the target lead conversion rate to be around 80%.
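One way to connect lead scores to the 80% target is to ask how high the score cutoff must be before the contacted leads convert at that rate. The sketch below is only an illustration under stated assumptions: scores are taken to lie on a 0-100 scale, the function names are hypothetical, and the sample scores and outcomes are made up rather than taken from the dataset.

```python
def conversion_rate_above(scores, converted, cutoff):
    """Conversion rate among leads whose score is at least `cutoff`.

    Returns None when no lead reaches the cutoff.
    """
    selected = [c for s, c in zip(scores, converted) if s >= cutoff]
    if not selected:
        return None
    return sum(selected) / len(selected)


def lowest_cutoff_reaching(scores, converted, target_rate):
    """Smallest cutoff (0-100) whose selected leads convert at `target_rate` or better."""
    for cutoff in range(0, 101):
        rate = conversion_rate_above(scores, converted, cutoff)
        if rate is not None and rate >= target_rate:
            return cutoff
    return None


# Made-up scores (0-100) and outcomes (1 = converted), for illustration only.
scores = [95, 90, 85, 80, 70, 60, 40, 30, 20, 10]
converted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(conversion_rate_above(scores, converted, 0))     # rate when every lead is contacted
print(lowest_cutoff_reaching(scores, converted, 0.8))  # cutoff needed for an 80% rate
```

Raising the cutoff trades volume for precision: fewer leads are called, but a larger share of them convert.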
Variables Description
* Prospect ID - A unique ID with which the customer is identified.
* Lead Number - A lead number assigned to each lead procured.
* Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
* Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
* Do Not Email - An indicator variable selected by the customer wherein they select whether or not they want to be emailed about the course.
* Do Not Call - An indicator variable selected by the customer wherein they select whether or not they want to be called about the course.
* Converted - The target variable. Indicates whether a lead has been successfully converted or not.
* TotalVisits - The total number of visits made by the customer on the website.
* Total Time Spent on Website - The total time spent by the customer on the website.
* Page Views Per Visit - Average number of pages on the website viewed during the visits.
* Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
* Country - The country of the customer.
* Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization' which means the customer had not selected this option while filling the form.
* How did you hear about X Education - The source from which the customer heard about X Education.
* What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
* What matters most to you in choosing this course - An option selected by the customer indicating their main motive for doing this course.
* Search - Indicating whether the customer had seen the ad in any of the listed items.
* Magazine
* Newspaper Article
* X Education Forums
* Newspaper
* Digital Advertisement
* Through Recommendations - Indicates whether the customer came in through recommendations.
* Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
* Tags - Tags assigned to customers indicating the current status of the lead.
* Lead Quality - Indicates the quality of the lead based on the data and the intuition of the employee who has been assigned to the lead.
* Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
* Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
* Lead Profile - A lead level assigned to each customer based on their profile.
* City - The city of the customer.
* Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile
* Asymmetric Profile Index
* Asymmetric Activity Score
* Asymmetric Profile Score
* I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
* a free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
* Last Notable Activity - The last notable activity performed by the student.
UpGrad Case Study
From 1934 to 1963, San Francisco was infamous for housing some of the world's most notorious criminals on the inescapable island of Alcatraz. Today, the city is known more for its tech scene than its criminal past. But, with rising wealth inequality, housing shortages, and a proliferation of expensive digital toys riding BART to work, there is no scarcity of crime in the city by the bay. From Sunset to SOMA, and Marina to Excelsior, this dataset provides nearly 12 years of crime reports from across all of San Francisco's neighborhoods.
This dataset was featured in our completed playground competition entitled San Francisco Crime Classification. The goals of the competition were to:
predict the category of crime that occurred, given the time and location
visualize the city and crimes (see Mapping and Visualizing Violent Crime for inspiration)
This dataset contains incidents derived from the SFPD Crime Incident Reporting system. The data ranges from 1/1/2003 to 5/13/2015. The training set and test set rotate every week, meaning weeks 1, 3, 5, 7, ... belong to the test set and weeks 2, 4, 6, 8, ... belong to the training set. There are 9 variables:
Dates - timestamp of the crime incident
Category - category of the crime incident (only in train.csv).
Descript - detailed description of the crime incident (only in train.csv)
DayOfWeek - the day of the week
PdDistrict - name of the Police Department District
Resolution - how the crime incident was resolved (only in train.csv)
Address - the approximate street address of the crime incident
X - Longitude
Y - Latitude
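The weekly train/test rotation described above can be sketched as a function of the incident date. This is a sketch under an assumption the description does not confirm: weeks are counted as 7-day blocks starting from the first date in the data (1 January 2003), with that block as week 1. The function names are hypothetical.

```python
from datetime import date

# Assumption: weeks are counted in 7-day blocks starting from the first date
# in the data (1 January 2003), with week 1 being the first block.
FIRST_DAY = date(2003, 1, 1)


def week_number(day: date) -> int:
    """1-based index of the 7-day block containing `day`."""
    return (day - FIRST_DAY).days // 7 + 1


def assigned_set(day: date) -> str:
    """Odd weeks go to the test set, even weeks to the training set."""
    return "test" if week_number(day) % 2 == 1 else "train"


print(assigned_set(date(2003, 1, 1)))  # falls in week 1
print(assigned_set(date(2003, 1, 8)))  # falls in week 2
```

If the competition counted calendar weeks instead, only `week_number` would need to change.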
This dataset is part of our completed playground competition entitled San Francisco Crime Classification. Visit the competition page if you are interested in checking out past discussions, competition leaderboard, or more details regarding the competition. If you are curious to see how your results rank compared to others', you can still make a submission at the competition submission page!
The original dataset is from SF OpenData, the central clearinghouse for data published by the City and County of San Francisco.
https://www.ontario.ca/page/open-government-licence-ontario
Data includes: board and school information, grade 3 and 6 EQAO student achievements for reading, writing and mathematics, and grade 9 mathematics EQAO and OSSLT. Data excludes private schools, Education and Community Partnership Programs (ECPP), summer, night and continuing education schools.
How Are We Protecting Privacy?
Results for OnSIS and Statistics Canada variables are suppressed based on school population size to better protect student privacy. In order to achieve this additional level of protection, the Ministry has used a methodology that randomly rounds a percentage either up or down depending on school enrolment. In order to protect privacy, the ministry does not publicly report on data when there are fewer than 10 individuals represented.
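The combination of suppression and random rounding might look roughly like the sketch below. It is not the Ministry's actual method: the threshold of 10 comes from the text, but the rounding step of 5, the 50/50 choice between rounding up and down, and the function name are all assumptions made for illustration, and the dependence on school enrolment is not modeled.

```python
import math
import random

MIN_REPORTABLE = 10  # from the text: fewer than 10 individuals is not reported
ROUNDING_BASE = 5    # hypothetical: the actual rounding step is not stated


def publishable_percentage(count, total, rng=random):
    """Return a randomly rounded percentage, or None when the value is suppressed."""
    if count < MIN_REPORTABLE or total <= 0:
        return None
    exact = 100 * count / total
    lower = math.floor(exact / ROUNDING_BASE) * ROUNDING_BASE
    if exact == lower:
        return lower
    # Round up or down at random so the exact value cannot be recovered.
    return lower + ROUNDING_BASE if rng.random() < 0.5 else lower


print(publishable_percentage(5, 100))   # suppressed, so None
print(publishable_percentage(37, 100))  # either 35 or 40
```

Passing a seeded `random.Random` instance as `rng` makes the rounding reproducible when testing.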
The information in the School Information Finder is the most current available to the Ministry of Education at this time, as reported by schools, school boards, EQAO and Statistics Canada. The information is updated as frequently as possible.
This information is also available on the Ministry of Education's School Information Finder website by individual school.
Descriptions for some of the data types can be found in our glossary.
School/school board and school authority contact information are updated and maintained by school boards and may not be the most current version. For the most recent information please visit: https://data.ontario.ca/dataset/ontario-public-school-contact-information.
This page contains data for the immigration system statistics up to March 2023.
For current immigration system data, visit ‘Immigration system statistics data tables’.
Asylum applications, initial decisions and resettlement (MS Excel Spreadsheet, 9.13 MB): https://assets.publishing.service.gov.uk/media/64625e6894f6df0010f5eaab/asylum-applications-datasets-mar-2023.xlsx
Asy_D01: Asylum applications raised, by nationality, age, sex, UASC, applicant type, and location of application
Asy_D02: Outcomes of asylum applications at initial decision, and refugees resettled in the UK, by nationality, age, sex, applicant type, and UASC
This is not the latest data
Asylum applications awaiting a decision (MS Excel Spreadsheet, 1.26 MB): https://assets.publishing.service.gov.uk/media/64625ec394f6df0010f5eaac/asylum-applications-awaiting-decision-datasets-mar-2023.xlsx
Asy_D03: Asylum applications awaiting an initial decision or further review, by nationality and applicant type
This is not the latest data
Outcome analysis of asylum applications (MS Excel Spreadsheet, 410 KB): https://assets.publishing.service.gov.uk/media/62fa17698fa8f50b54374371/outcome-analysis-asylum-applications-datasets-jun-2022.xlsx
Asy_D04: The initial decision and final outcome of all asylum applications raised in a period, by nationality
This is not the latest data
Age disputes (MS Excel Spreadsheet, 178 KB): https://assets.publishing.service.gov.uk/media/64625ef1427e41000cb437cb/age-disputes-datasets-mar-2023.xlsx
Asy_D05: Age disputes raised and outcomes of age disputes
This is not the latest data
Asylum appeals lodged and determined (MS Excel Spreadsheet, 817 KB): https://assets.publishing.service.gov.uk/media/64625f0ca09dfc000c3c17cf/asylum-appeals-lodged-datasets-mar-2023.xlsx
Asy_D06: Asylum appeals raised at the First-Tier Tribunal, by nationality and sex
Asy_D07: Outcomes of asylum appeals raised at the First-Tier Tribunal, by nationality and sex
This is not the latest data
Asylum claims certified under Section 94 (MS Excel Spreadsheet, 150 KB): https://assets.publishing.service.gov.uk/media/64625f29427e41000cb437cd/asylum-claims-certified-section-94-datasets-mar-2023.xlsx
Asy_D08: Initial decisions on asylum applications certified under Section 94, by nationality
This is not the latest data
Asylum seekers in receipt of support (MS Excel Spreadsheet, 2.16 MB): https://assets.publishing.service.gov.uk/media/6463a618d3231e000c32da99/asylum-seekers-receipt-support-datasets-mar-2023.xlsx
Asy_D09: Asylum seekers in receipt of support at end of period, by nationality, support type, accommodation type, and UK region
This is not the latest data
https://assets.publishing.service.gov.uk/media/63ecd7388fa8f5612a396c40/applications-section-95-support-datasets-dec-2022.xlsx Applications for section 95 su
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Introduction: The dataset used for this experiment is real and authentic. The dataset was acquired from the UCI machine learning repository website [13]. The title of the dataset is ‘Crime and Communities’. It combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR [13]. This dataset contains a total of 147 attributes and 2216 instances.
The per capita crimes variables were calculated using population values included in the 1995 FBI data (which differ from the 1990 Census values).
The variables included in the dataset describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The crime attributes (N=18) that could be predicted are the 8 crimes considered 'Index Crimes' by the FBI (Murders, Rape, Robbery, ...), per capita (actually per 100,000 population) versions of each, and Per Capita Violent Crimes and Per Capita Nonviolent Crimes.
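Since the "per capita" crime variables are actually expressed per 100,000 population, the conversion can be sketched as follows. This is only an illustration of the arithmetic; the function name `per_100k` and the example community figures are hypothetical and not taken from the dataset.

```python
def per_100k(incidents: int, population: int) -> float:
    """Convert a raw incident count into a rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return incidents / population * 100_000


# Hypothetical community: 42 robberies in a population of 35,000.
rate = per_100k(42, 35_000)
print(round(rate, 1))
```

Note that the population used in the denominator should be the 1995 FBI value, since the text states that these differ from the 1990 Census values.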
predictive variables: 125
non-predictive variables: 4
potential goal/response variables: 18
http://archive.ics.uci.edu/ml/datasets/Communities%20and%20Crime%20Unnormalized
U. S. Department of Commerce, Bureau of the Census, Census Of Population And Housing 1990 United States: Summary Tape File 1a & 3a (Computer Files),
U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)
U.S. Department of Justice, Bureau of Justice Statistics, Law Enforcement Management And Administrative Statistics (Computer File) U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)
U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States (Computer File) (1995)
Data available in the dataset may not act as a complete source of information for identifying factors that contribute to more violent and non-violent crimes as many relevant factors may still be missing.
However, I would like to try to answer the following questions.
Did the number of vacant and occupied houses, and the length of time houses were vacant, contribute to any significant change in violent and non-violent crime rates in communities?
How has unemployment changed the crime rate (violent and non-violent) in the communities?
Were people from a particular age group more vulnerable to crime?
Does ethnicity play a role in crime rate?
Has education played a role in bringing down the crime rate?
https://creativecommons.org/publicdomain/zero/1.0/
Brazilian? You can read a Portuguese version of this article here.
Last year, while I was attending a data science course in Germany, my country was impeaching its president. My colleagues asked me to explain what was happening in Brazil and the possible political outcomes in South America. Although I was able to give a general context and present multiple arguments for and against the impeachment, deep inside, my answer was "I really don't know".
Understanding what happens in Politics is something that takes a lot of effort and research. When I decided I had to use my tech skills to make myself a better citizen, I dived into government data and started Operation Serenata de Amor.
After reporting hundreds of politicians for small acts of corruption and learning how to encourage the population to engage in the democratic processes, my studies drove me to understand the legislative activity.
Brazilians elect 594 citizens to be their representatives in the National Congress. How can we be sure that they are not defending their own interests or those who paid for their campaigns? My way, as a data scientist, is to ask the data.
The National Congress of Brazil is composed of a Lower (Chamber of Deputies) and an Upper House (Federal Senate). In the first version of this dataset, you are going to find data only from the Chamber of Deputies. With 513 representatives, 86% of the congresspeople, I hope you have enough data to explore for some time.
It would be impossible for me, a citizen without government ties, to collect this data without the help of public servants. I processed 9,717 fixed-width files and 73 XML files made officially available by the Chamber of Deputies and created 5 CSV files containing the same information. Multiple fields of the same file telling the same thing (e.g. body_id, body_name and body_abbreviation) were removed.
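To give an idea of what converting a fixed-width file into CSV involves, here is a minimal sketch. The column layout, field names and sample rows are hypothetical; the official files define their own widths, and the author's actual scripts live in the GitHub repository mentioned below.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (field name, start column, end column).
LAYOUT: List[Tuple[str, int, int]] = [
    ("deputy_name", 0, 20),
    ("party", 20, 26),
    ("vote", 26, 32),
]


def parse_fixed_width(line: str, layout=LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


def to_csv(lines: Iterable[str], layout=LAYOUT) -> str:
    """Convert fixed-width lines into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[name for name, _, _ in layout],
                            lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_fixed_width(line, layout))
    return buffer.getvalue()


# Made-up rows padded to the hypothetical widths above.
sample = [
    f"{'Example Deputy A':<20}{'PXX':<6}{'Sim':<6}",
    f"{'Example Deputy B':<20}{'PYY':<6}{'Nao':<6}",
]
print(to_csv(sample))
```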
Data on session attendance, votes, and propositions since the last century were collected and scripted in a reproducible manner. The data collection and pre-processing scripts are available in a GitHub repository, under an open source license.
Everything was collected from the Chamber of Deputies website on December 27, 2017, and covers the whole legislative activity of that year. Attendance and vote records date from 1999; propositions go back as far as 1946.
If you have questions about the legislative process and how the sessions work in the real world, the Internal Regulation of the Chamber of Deputies is the best Portuguese documentation for research. It's free!
Since the data was collected from a government website and Brazilian law states that access to this information is free to any citizen, I am placing my own work published here in the public domain.
I'd like to thank the hundreds of people financially supporting the work of Operation Serenata de Amor and those responsible for passing the Information Access bill in 2011.
The legislative activity should tell the history while it's happening. How much has the Congress changed over the past decades? Do the congresspeople maintain the same political views, or do they vary on a weekly basis? Do people vote together with their state or party peers? How often? Can you design an algorithm to tell us the real parties inside the Brazilian Congress?
The goal of the "Thorsten-Voice" project is to provide voice datasets and TTS models for a free, high-quality German artificial voice. This dataset, "Thorsten-Voice dataset 2022.10", is a neutrally spoken voice dataset recorded by Thorsten Müller, audio-optimized by Dominik Kreutz, and licensed under CC0 so that anybody can use it without financial or licensing obstacles.
"I contribute my personal voice as a person believing in a world where all people are equal. No matter of gender, sexual orientation, religion, skin color and geocoordinates of birth location. A global world where everybody is warmly welcome on any place on this planet and open and free knowledge and education is available to everyone." (Thorsten Müller)
Dataset details:
LJSpeech file and directory structure
12,450 recorded phrases (wav files)
more than 11 hours of pure audio
sample rate 22,050 Hz
mono
normalized to -24dB
no silence at beginning/ending
average spoken characters per second: 17.5
See more details on my Github page or Thorsten-Voice project website.
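Because the dataset follows the LJSpeech layout, its transcripts are expected in a pipe-separated metadata file next to a folder of wav files. The sketch below parses such lines into (wav path, text) pairs. The clip ids and sentences are made up for illustration, and the `wavs` folder name follows the LJSpeech convention rather than anything confirmed for this particular release.

```python
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    wav_path: str
    text: str


def parse_metadata(lines: Iterable[str], wav_dir: str = "wavs") -> List[Utterance]:
    """Parse LJSpeech-style lines of the form `clip_id|text[|normalized text]`."""
    utterances = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        fields = line.split("|")
        if len(fields) < 2:
            raise ValueError(f"expected at least 2 fields, got: {line!r}")
        clip_id, text = fields[0], fields[1]
        utterances.append(Utterance(f"{wav_dir}/{clip_id}.wav", text))
    return utterances


# Hypothetical metadata lines; the real ids and sentences differ.
sample = ["clip_0001|Guten Morgen.", "clip_0002|Wie geht es dir?|Wie geht es dir?"]
for utterance in parse_metadata(sample):
    print(utterance.wav_path, "->", utterance.text)
```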
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Evolution of the Manosphere Across the Web
We make available data related to subreddit and standalone forums from the manosphere.
We also make available Perspective API annotations for all posts.
You can find the code in GitHub.
Please cite this paper if you use this data:
@article{ribeiroevolution2021,
  title={The Evolution of the Manosphere Across the Web},
  author={Ribeiro, Manoel Horta and Blackburn, Jeremy and Bradlyn, Barry and De Cristofaro, Emiliano and Stringhini, Gianluca and Long, Summer and Greenberg, Stephanie and Zannettou, Savvas},
  booktitle = {{Proceedings of the 15th International AAAI Conference on Weblogs and Social Media (ICWSM'21)}},
  year={2021}
}
We make available data for forums and for relevant subreddits (56 of them, as described in subreddit_descriptions.csv). The Reddit data is available in /ndjson/reddit.ndjson, with one line per post. A sample entry:
{ "author": "Handheld_Gaming", "date_post": 1546300852, "id_post": "abcusl", "number_post": 9.0, "subreddit": "Braincels", "text_post": "Its been 2019 for almost 1 hour And I am at a party with 120 people, half of them being foids. The last year had been the best in my life. I actually was happy living hope because I was redpilled to the death.
Now that I am blackpilled I see that I am the shortest of all men and that I am the only one with a recessed jaw.
Its over. Its only thanks to my age old friendship with chads and my social skills I had developed in the past year that a lot of men like me a lot as a friend.
No leg lengthening syrgery is gonna save me. Ignorance was a bliss. Its just horror now seeing that everyone can make out wirth some slin hoe at the party.
I actually feel so unbelivably bad for turbomanlets. Life as an unattractive manlet is a pain, I cant imagine the hell being an ugly turbomanlet is like. I would have roped instsntly if I were one. Its so unfair.
Tallcels are fakecels and they all can (and should) suck my cock.
If I were 17cm taller my life would be a heaven and I would be the happiest man alive.
Just cope and wait for affordable body tranpslants.", "thread": "t3_abcusl" }
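A file in this format can be read one JSON object per line. The following is a minimal sketch that also converts `date_post` into a datetime, under the assumption that the value is a Unix timestamp in seconds interpreted as UTC. The `datetime_post` key and the placeholder text are added for illustration and are not part of the dataset.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator


def read_posts(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per non-empty NDJSON line, with a parsed timestamp."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        post = json.loads(line)
        # Assumption: date_post is a Unix timestamp in seconds (UTC).
        post["datetime_post"] = datetime.fromtimestamp(post["date_post"],
                                                       tz=timezone.utc)
        yield post


# One line shaped like the sample above, with placeholder author and text.
sample_line = ('{"author": "example_user", "date_post": 1546300852, '
               '"id_post": "abcusl", "number_post": 9.0, '
               '"subreddit": "Braincels", "text_post": "...", '
               '"thread": "t3_abcusl"}')
posts = list(read_posts([sample_line]))
print(posts[0]["subreddit"], posts[0]["datetime_post"].isoformat())
```

With a real file, the same function can be fed an open file handle, for example `read_posts(open("reddit.ndjson", encoding="utf-8"))`.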
We here describe the .sqlite and .ndjson files that contain the data from the following forums.
(avfm) --- https://d2ec906f9aea-003845.vbulletin.net
(incels) --- https://incels.co/
(love_shy) --- http://love-shy.com/lsbb/
(redpilltalk) --- https://redpilltalk.com/
(mgtow) --- https://www.mgtow.com/forums/
(rooshv) --- https://www.rooshvforum.com/
(pua_forum) --- https://www.pick-up-artist-forum.com/
(the_attraction) --- http://www.theattractionforums.com/
The files are in folders /sqlite/ and /ndjson.
2.1 .sqlite
All the tables in the .sqlite datasets follow a very simple {key:value} format. Each key is a thread name (for example /threads/housewife-is-like-a-job.123835/) and each value is a Python dictionary or a list. Each .sqlite file contains three tables:
idx each key is the relative address to a thread and maps to a post. Each post is represented by a dict:
"type": (list) in some forums you can add a descriptor such as
[RageFuel] to each topic, and you may also have special
types of posts, like sticked/pool/locked posts.
"title": (str) title of the thread;
"link": (str) link to the thread;
"author_topic": (str) username that created the thread;
"replies": (int) number of replies, may differ from number of
posts due to difference in crawling date;
"views": (int) number of views;
"subforum": (str) name of the subforum;
"collected": (bool) indicates if raw posts have been collected;
"crawled_idx_at": (str) datetime of the collection.
processed_posts each key is the relative address to a thread and maps to a list with posts (in order). Each post is represented by a dict:
"author": (str) author's username; "resume_author": (str) author's little description; "joined_author": (str) date author joined; "messages_author": (int) number of messages the author has; "text_post": (str) text of the main post; "number_post": (int) number of the post in the thread; "id_post": (str) unique post identifier (depends), for sure unique within thread; "id_post_interaction": (list) list with other posts ids this post quoted; "date_post": (str) datetime of the post, "links": (tuple) nice tuple with the url parsed, e.g. ('https', 'www.youtube.com', '/S5t6K9iwcdw'); "thread": (str) same as key; "crawled_at": (str) datetime of the collection.
raw_posts each key is the relative address to a thread and maps to a list with unprocessed posts (in order). Each post is represented by a dict:
"post_raw": (binary) raw html binary; "crawled_at": (str) datetime of the collection.
2.2 .ndjson
Each line consists of a json object representing a different comment with the following fields:
"author": (str) author's username; "resume_author": (str) author's little description; "joined_author": (str) date author joined; "messages_author": (int) number of messages the author has; "text_post": (str) text of the main post; "number_post": (int) number of the post in the thread; "id_post": (str) unique post identifier (depends), for sure unique within thread; "id_post_interaction": (list) list with other posts ids this post quoted; "date_post": (str) datetime of the post, "links": (tuple) nice tuple with the url parsed, e.g. ('https', 'www.youtube.com', '/S5t6K9iwcdw'); "thread": (str) same as key; "crawled_at": (str) datetime of the collection.
We also ran each forum post and Reddit post through the Perspective API. The output files are located in the /perspective/ folder and are compressed with gzip. One example output:
{ "id_post": 5200, "hate_output": { "text": "I still can\u2019t wrap my mind around both of those articles about these c~~~s sleeping with poor Haitian Men. Where\u2019s the uproar?, where the hell is the outcry?, the \u201cpig\u201d comments or the \u201ccreeper comments\u201d. F~~~ing hell, if roles were reversed and it was an article about Men going to Europe where under 18 sex in legal, you better believe they would crucify the writer of that article and DEMAND an apology by the paper that wrote it.. This is exactly what I try and explain to people about the double standards within our modern society. A bunch of older women, wanna get their kicks off by sleeping with poor Men, just before they either hit or are at menopause age. F~~~ing unreal, I\u2019ll never forget going to Sweden and Norway a few years ago with one of my buddies and his girlfriend who was from there, the legal age of consent in Norway is 16 and in Sweden it\u2019s 15. I couldn\u2019t believe it, but my friend told me \u201c hey, it\u2019s normal here\u201d . Not only that but the age wasn\u2019t a big different in other European countries as well. One thing i learned very quickly was how very Misandric Sweden as well as Denmark were.", "TOXICITY": 0.6079781, "SEVERE_TOXICITY": 0.53744453, "INFLAMMATORY": 0.7279288, "PROFANITY": 0.58842486, "INSULT": 0.5511079, "OBSCENE": 0.9830818, "SPAM": 0.17009115 } }
A nice way to read some of the files of the dataset is using SqliteDict, for example:
from sqlitedict import SqliteDict

processed_posts = SqliteDict("./data/forums/incels.sqlite", tablename="processed_posts")

for key, posts in processed_posts.items():
    for post in posts:
        # here you could do something with each post in the dataset
        pass
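As one example of what the loop above could do with each post, the sketch below counts posts per author. It is written against any mapping of thread key to list of post dicts, so it runs on a plain dictionary here and should work the same way on the `processed_posts` table opened with SqliteDict. The toy thread keys and usernames are made up.

```python
from collections import Counter
from typing import Dict, List, Mapping


def posts_per_author(processed_posts: Mapping[str, List[Dict]]) -> Counter:
    """Count how many posts each author wrote across all threads."""
    counts: Counter = Counter()
    for _thread, posts in processed_posts.items():
        for post in posts:
            counts[post["author"]] += 1
    return counts


# Toy stand-in for the processed_posts table.
toy_table = {
    "/threads/example-a.1/": [{"author": "user_a"}, {"author": "user_b"}],
    "/threads/example-b.2/": [{"author": "user_a"}],
}
print(posts_per_author(toy_table).most_common(1))
```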
Additionally, we provide two .sqlite files that are helpers used in the analyses. These are related to reddit, and not to the forums! They are:
channel_dict.sqlite a sqlite where each key corresponds to a subreddit and values are lists of dictionaries of users who posted on it, along with timestamps.
author_dict.sqlite a sqlite where each key corresponds to an author and values are lists of dictionaries of the subreddits they posted on, along with timestamps.
These are used in the paper for the migration analyses.
Although we did our best to clean the data and be consistent across forums, this was not always possible. In the following subsections we describe the particularities of each forum, note directions for improving the parsing that were not pursued, and give some examples of how things work in each forum.
6.1 incels
Check out an archived version of the front page, the thread page and a post page, as well as a dump of the data stored for a thread page and a post page.
types: for the incels forum, the special types associated with each thread in the idx table are “Sticky”, “Pool”, “Closed”, and the custom types added by users, such as [LifeFuel]. These last ones are all in brackets. You can see some examples of these on the example thread page.
quotes: quotes in this forum were quite nice and thus, all quotations are deterministic.
6.2 LoveShy
Check out an archived version of the front page, the thread page and a post page, as well as a dump of the data stored for a thread page and a post page.
types: no types were parsed. There are some rules in the forum, but not significant.
quotes: quotes were obtained from exact text+author match, or author match + a jaccard
Booking multi-passenger flights on KLM Airlines is easier than many people expect, especially if you plan your process well. Begin by visiting the official KLM Airlines website or using their mobile app to start the flight search. Select your departure and destination cities, then choose the number of travelers. This is an essential step since the KLM system adapts fare availability based on the total number of passengers selected.
When you're entering details for a group, ensure that all passengers' names match exactly as they appear on identification. Passport names are crucial to avoid rebooking complications later. KLM allows up to nine passengers in one reservation through the standard booking interface. If you're booking for more than nine, the system will redirect you to the group travel section.
This feature is perfect for families, friends, or small business groups traveling together. When selecting flights, try to be flexible with travel times, especially if the group is large. Seat availability may vary, and a slight adjustment in departure time can allow your entire group to be on the same flight. KLM always aims to seat passengers in the same section, but that depends on early booking and current occupancy.
Adding traveler information takes time, so gather everyone's full name, date of birth, passport number, and special requirements ahead. During this step, the website will prompt you to enter passenger details one by one. Double-check each entry before clicking continue. Errors can result in delays or changes later on. Once passenger info is added, proceed to seat selection.
KLM offers the option to pre-select seats. This is highly recommended for group travelers. Choosing seats early ensures that everyone sits together, especially on long-haul or international flights. For a smoother experience, consider Standard or Comfort seats which offer extra space and legroom. These upgrades are affordable and beneficial for groups who want a unified and comfortable journey.
Once seat selection is complete, review the baggage options available for each passenger. Not all fare categories include checked baggage. It’s common for economy fares to exclude it, so verify if anyone in the group needs to add luggage. Doing this ahead of time is usually cheaper than adding bags at the airport. You'll also want to consider whether meal preferences, wheelchair assistance, or special services need to be added for any group member.
After customizing the reservation, it’s time to move on to payment. KLM accepts multiple payment options including credit cards, debit cards, and in some countries, bank transfers. When booking for several travelers, the total price may appear higher than expected. This is normal, as airline pricing is tiered based on seat availability and group size. Booking early helps reduce the total cost significantly.
Use loyalty points or a travel card if you have one. Flying Blue members can earn and redeem miles for all group passengers under the same booking. You’ll need the member’s number at the time of booking. This is a great way to maximize benefits across multiple passengers. Be sure to keep track of each individual’s miles after the trip.
Once booked, KLM will email a single itinerary for the whole group. Save this email or print it for easy access. Changes can still be made after booking, depending on fare rules. If someone in the group can no longer travel, you'll need to contact support for modification policies.
Speaking of which, calling KLM directly can simplify group bookings, especially if your trip is complex. Phone support can assist with round-trip planning, multiple destinations, and fare combinations that aren’t visible online. You can even request special discounts or ask about group pricing when more than ten passengers are traveling.
After booking, make sure each passenger downloads the KLM app. The app allows mobile check-in, digital boarding passes, live updates, and gate alerts. Encourage your group to check in online 24 hours before the flight for a smoother airport experience. Digital boarding saves time, especially for larger travel parties.
If any changes are needed, most economy and flexible fares allow free rebooking or limited cancellation. Always review the ticket class to avoid surprises later. Even if you're modifying one passenger, the whole group’s itinerary might need adjustment. That’s why group leaders should stay in control of all booking emails and receipts.