14 datasets found

LinkedIn Usage and Revenue Statistics (2026)
businessofapps.com
content.clixoni.com
Updated Nov 12, 2018
Cite
Business of Apps (2018). LinkedIn Usage and Revenue Statistics (2026) [Dataset]. https://www.businessofapps.com/data/linkedin-statistics/
Explore at:
Dataset updated
Nov 12, 2018
Dataset authored and provided by
Business of Apps
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
LinkedIn is the world’s preeminent social network for professionals. Members create CVs, list their current and previous job roles, skills and education. The business network is also a recruiting...
LinkedIn Compatibility Dataset: 50K Profiles
kaggle.com
zip
Updated Dec 20, 2025
Cite
Likitha Gedipudi (2025). LinkedIn Compatibility Dataset: 50K Profiles [Dataset]. https://www.kaggle.com/datasets/likithagedipudi/linkedin-compatibility-dataset-50k-profiles
Explore at:
zip (291244458 bytes)
Dataset updated
Dec 20, 2025
Authors
Likitha Gedipudi
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Context: Professional networking is inefficient - 90% of LinkedIn connections provide minimal mutual value. This dataset enables AI models to predict networking compatibility and recommend high-value connections before they're made.

Dataset Overview: 50,000 professional profiles paired into 500,000+ compatibility-scored combinations with detailed feature breakdowns. Synthetically generated, ML-ready data for building recommendation systems.

Files & Schema:

- profiles.csv (50,000 rows)
  - profile_id - Unique identifier
  - name, email, location - Demographics
  - current_role, current_company - Current position
  - industry - Industry category
  - years_experience - Total years of experience
  - seniority_level - entry/mid/senior/executive
  - skills - JSON array of skills
  - experience - JSON work history
  - education - JSON education history
  - connections - Network size
  - goals, needs, can_offer - Professional objectives (JSON)
- compatibility_pairs.csv (500,000+ rows)
  - pair_id - Unique pair identifier
  - profile_a_id, profile_b_id - Profile IDs
  - compatibility_score - Overall match (0-100)
  - skill_match_score - Skill overlap
  - skill_complementarity_score - How skills complement
  - network_value_a_to_b - Network value A→B provides
  - network_value_b_to_a - Network value B→A provides
  - career_alignment_score - Mentorship/learning potential
  - experience_gap - Years experience difference
  - industry_match - Industry similarity
  - geographic_score - Location proximity
  - seniority_match - Seniority compatibility
  - mutual_benefit_explanation - Human-readable reasoning
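As a rough illustration of how the two files in the schema above relate, the sketch below joins scored pairs back to their profiles and decodes the JSON-encoded skills column. The sample rows are invented and the helper names (load_profiles, join_pairs) are hypothetical; only the column names come from the listing, and the real files may need different parsing.

```python
import csv
import io
import json

# Tiny in-memory stand-ins for profiles.csv and compatibility_pairs.csv.
# The rows are invented; only the column names follow the schema.
PROFILES_CSV = '''profile_id,name,industry,seniority_level,skills
p1,Ada,Software,senior,"[""python"", ""sql""]"
p2,Ben,Finance,mid,"[""excel"", ""sql""]"
'''

PAIRS_CSV = '''pair_id,profile_a_id,profile_b_id,compatibility_score
x1,p1,p2,72.5
'''


def load_profiles(text):
    """Index profiles by profile_id and decode the JSON skills column."""
    profiles = {}
    for row in csv.DictReader(io.StringIO(text)):
        row["skills"] = json.loads(row["skills"])
        profiles[row["profile_id"]] = row
    return profiles


def join_pairs(pairs_text, profiles):
    """Attach both profiles to each scored pair and list shared skills."""
    joined = []
    for row in csv.DictReader(io.StringIO(pairs_text)):
        a = profiles[row["profile_a_id"]]
        b = profiles[row["profile_b_id"]]
        joined.append({
            "pair_id": row["pair_id"],
            "score": float(row["compatibility_score"]),
            "names": (a["name"], b["name"]),
            "shared_skills": sorted(set(a["skills"]) & set(b["skills"])),
        })
    return joined


profiles = load_profiles(PROFILES_CSV)
pairs = join_pairs(PAIRS_CSV, profiles)
print(pairs[0])
```

With 500,000+ pairs, indexing the profiles once in a dictionary keeps each pair lookup cheap.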
Number of LinkedIn users in the United Kingdom 2019-2028
statista.com
Updated Dec 17, 2025
Cite
Statista Research Department (2025). Number of LinkedIn users in the United Kingdom 2019-2028 [Dataset]. https://www.statista.com/topics/3236/social-media-usage-in-the-uk/
Explore at:
Dataset updated
Dec 17, 2025
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Area covered
United Kingdom
Description
The number of LinkedIn users in the United Kingdom was forecast to increase continuously between 2024 and 2028, by a total of 1.5 million users (+4.51 percent). After the eighth consecutive year of growth, the LinkedIn user base is estimated to reach 34.7 million users in 2028, a new peak. User figures for LinkedIn have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count persons with multiple accounts only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
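The quoted figures can be cross-checked with a little arithmetic. The sketch below assumes the 34.7 million peak and the 1.5 million increase both refer to the same 2024 to 2028 span, so the implied 2024 base is their difference; the function name implied_base is hypothetical, and small deviations from the quoted percentage are expected because the inputs are rounded.

```python
def implied_base(end_value_millions, increase_millions):
    """Return the implied starting value and the growth rate in percent."""
    base = end_value_millions - increase_millions
    growth_pct = increase_millions / base * 100
    return base, growth_pct


# Figures quoted in the description: 34.7 million in 2028, +1.5 million overall.
base, growth = implied_base(34.7, 1.5)
print(f"implied 2024 base: {base:.1f} million, growth: {growth:.2f} percent")
```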
News Popularity in Multiple Social Media Platforms
kaggle.com
zip
Updated Oct 28, 2020
Cite
Nikhil John (2020). News Popularity in Multiple Social Media Platforms [Dataset]. https://www.kaggle.com/nikhiljohnk/news-popularity-in-multiple-social-media-platforms
Explore at:
zip (10881978 bytes)
Dataset updated
Oct 28, 2020
Authors
Nikhil John
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Social media has taken over much of the Internet. People use it to get the latest news, find useful resources, even meet a life partner. In a world where social media plays a big role in delivering news, news that affects our sentiments tends to spread like wildfire. Given the headline and title of each news item, its publication date and its popularity on the social media platforms, you have to predict the sentiment scores, i.e. the columns “SentimentTitle” and “SentimentHeadline”.

Content

This is a subset of the dataset of the same name available in the UCI Machine Learning Repository. The collected data relates to a period of 8 months, between November 2015 and July 2016, accounting for about 100,000 news items on four different topics: economy, microsoft, obama and palestine.

Dataset Information

The attributes of each dataset are:

- IDLink (numeric): Unique identifier of news items
- Title (string): Title of the news item according to the official media sources
- Headline (string): Headline of the news item according to the official media sources
- Source (string): Original news outlet that published the news item
- Topic (string): Query topic used to obtain the items in the official media sources
- Publish-Date (timestamp): Date and time of the news items' publication
- Facebook (numeric): Final value of the news items' popularity according to the social media source Facebook
- Google-Plus (numeric): Final value of the news items' popularity according to the social media source Google+
- LinkedIn (numeric): Final value of the news items' popularity according to the social media source LinkedIn
- SentimentTitle: Sentiment score of the title. The higher the score, the more positive the sentiment, and vice versa. (Target Variable 1)
- SentimentHeadline: Sentiment score of the text in the news items' headline. The higher the score, the more positive the sentiment. (Target Variable 2)
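As a first look at the target variables described above, one might average a sentiment column per topic. The sketch below uses invented rows and a hypothetical helper, mean_sentiment_by_topic; only the column names (Topic, SentimentTitle and so on) come from the attribute list, and the real file may need extra cleaning.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows using a subset of the listed attributes.
SAMPLE = """IDLink,Title,Topic,Facebook,LinkedIn,SentimentTitle
1,Economy grows,economy,10,2,0.20
2,Markets wobble,economy,4,0,-0.10
3,New release,microsoft,5,1,0.05
"""


def mean_sentiment_by_topic(text, column="SentimentTitle"):
    """Average the chosen sentiment column for each query topic."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        sums[row["Topic"]] += float(row[column])
        counts[row["Topic"]] += 1
    return {topic: sums[topic] / counts[topic] for topic in sums}


means = mean_sentiment_by_topic(SAMPLE)
print(means)
```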
How to make data reusable
nfdimetaportal.fokus.fraunhofer.de
meta4ds.fokus.fraunhofer.de
html
Updated May 15, 2024
Cite
NFDI4Cat (2024). How to make data reusable [Dataset]. https://nfdimetaportal.fokus.fraunhofer.de/datasets/b3qjcey2ne0?locale=en
Explore at:
html
Dataset updated
May 15, 2024
Dataset authored and provided by
NFDI4Cat
Description
To promote progress in catalysis-related research, sharing FAIR (findable, accessible, interoperable, reusable) data is essential. Shared data can inspire new understanding, prevent duplication of work and even allow for new insights through artificial intelligence-based approaches. While not all data in catalysis-related research can be shared openly, it is important to make sure that the data can be understood independently of the individual research project, so that it retains its long-term value. This requires a comprehensive description with metadata, but technical aspects such as data formats also need to be kept in mind. Join us on an excursion into making data FAIR and discover the first steps to ensure that your research data will retain its value. Stay tuned for more exciting content, and thank you for being a part of our growing community!

Check out our website: https://nfdi4cat.org/

Follow us: https://in.linkedin.com/company/nfdi4cat https://twitter.com/NFDI4Cat

NFDI4Cat #catalysis #fairdata #metadata
Billionaires Statistics Dataset (2023)
kaggle.com
zip
Updated Sep 29, 2023
Cite
Nidula Elgiriyewithana ⚡ (2023). Billionaires Statistics Dataset (2023) [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/billionaires-statistics-dataset
Explore at:
zip (142700 bytes)
Dataset updated
Sep 29, 2023
Authors
Nidula Elgiriyewithana ⚡
Description
This dataset contains statistics on the world's billionaires, including information about their businesses, industries, and personal details. It provides insights into the wealth distribution, business sectors, and demographics of billionaires worldwide.

Key Features

rank: The ranking of the billionaire in terms of wealth.

finalWorth: The final net worth of the billionaire in U.S. dollars.

category: The category or industry in which the billionaire's business operates.

personName: The full name of the billionaire.

age: The age of the billionaire.

country: The country in which the billionaire resides.

city: The city in which the billionaire resides.

source: The source of the billionaire's wealth.

industries: The industries associated with the billionaire's business interests.

countryOfCitizenship: The country of citizenship of the billionaire.

organization: The name of the organization or company associated with the billionaire.

selfMade: Indicates whether the billionaire is self-made (True/False).

status: "D" represents self-made billionaires (Founders/Entrepreneurs) and "U" indicates inherited or unearned wealth.

gender: The gender of the billionaire.

birthDate: The birthdate of the billionaire.

lastName: The last name of the billionaire.

firstName: The first name of the billionaire.

title: The title or honorific of the billionaire.

date: The date of data collection.

state: The state in which the billionaire resides.

residenceStateRegion: The region or state of residence of the billionaire.

birthYear: The birth year of the billionaire.

birthMonth: The birth month of the billionaire.

birthDay: The birth day of the billionaire.

cpi_country: Consumer Price Index (CPI) for the billionaire's country.

cpi_change_country: CPI change for the billionaire's country.

gdp_country: Gross Domestic Product (GDP) for the billionaire's country.

gross_tertiary_education_enrollment: Enrollment in tertiary education in the billionaire's country.

gross_primary_education_enrollment_country: Enrollment in primary education in the billionaire's country.

life_expectancy_country: Life expectancy in the billionaire's country.

tax_revenue_country_country: Tax revenue in the billionaire's country.

total_tax_rate_country: Total tax rate in the billionaire's country.

population_country: Population of the billionaire's country.

latitude_country: Latitude coordinate of the billionaire's country.

longitude_country: Longitude coordinate of the billionaire's country.

Potential Use Cases

Wealth distribution analysis: Explore the distribution of billionaires' wealth across different industries, countries, and regions.

Demographic analysis: Investigate the age, gender, and birthplace demographics of billionaires.

Self-made vs. inherited wealth: Analyze the proportion of self-made billionaires and those who inherited their wealth.

Economic indicators: Study correlations between billionaire wealth and economic indicators such as GDP, CPI, and tax rates.

Geospatial analysis: Visualize the geographical distribution of billionaires and their wealth on a map.

Trends over time: Track changes in billionaire demographics and wealth over the years.
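The self-made vs. inherited use case listed above can be sketched as a per-country share. The rows below are invented, the helper name self_made_share is hypothetical, and the sketch assumes selfMade is stored as the text "True" or "False"; the real file may encode it differently.

```python
import csv
import io
from collections import Counter

# Invented rows; column names follow the Key Features list.
SAMPLE = """personName,country,selfMade,finalWorth
Person A,United States,True,1000
Person B,United States,False,2000
Person C,France,False,3000
"""


def self_made_share(text):
    """Return the fraction of self-made billionaires for each country."""
    total = Counter()
    self_made = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        total[row["country"]] += 1
        # Assumption: the flag is serialized as the text "True"/"False".
        if row["selfMade"].strip().lower() == "true":
            self_made[row["country"]] += 1
    return {country: self_made[country] / n for country, n in total.items()}


shares = self_made_share(SAMPLE)
print(shares)
```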

If this was helpful, a vote is appreciated 🙂❤️

Borsa Italiana Listino 2023-07-11, upd 2025-11-26

kaggle.com

zip

Updated Nov 26, 2025

Cite

Roberto Lofaro (2025). Borsa Italiana Listino 2023-07-11, upd 2025-11-26 [Dataset]. https://www.kaggle.com/datasets/robertolofaro/borsa-italiana-listino-as-of-20221119

Explore at:

zip (40589 bytes)

Dataset updated

Nov 26, 2025

Authors

Roberto Lofaro

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

This dataset contains information in preparation of a forthcoming publication:
- extracted from public open data accessible via web (see "Production Notes" section at the end for details)
- overall aim: comparing company data pre- and post-COVID, i.e. evolution from 2019 to 2022 (balance sheet due July 2023)

As the project progresses, more material will be added both to this dataset and to the dedicated GitHub repository.

On 2025-06-02, as part of a side project related to the same data source, a new script and associated list focused on the companies within the MIB40 index were added, derived from the scripts created previously to retrieve YahooFinance data.

Please refer to GitHub for more information, and to access the CSV and associated Jupyter Notebook.

General description: see LinkedIn post

Rationale of dataset and the associated project: Reading pre- and post-COVID corporate narratives, the Italian case: a dataset in fieri

See associated notebook (more charts will be added as further information is integrated)

Structure of the file: listino_catalog_kaggle.csv

The first file contained in this dataset is the list of stocks and warrants presented on the website of Borsa Italiana as of 2023-07-11, with the following structure (structure last updated on 2025-11-26, see notes below):

column	name	datatype	description
1	#	numeric	position index
2	stock	text	name of the company, as per Borsa Italiana website
3	link	URL	URL link to the page
4	market	text	subsection of the "listino", as per Borsa Italiana website
5	ISIN	text	stock identification code, starting with a 2-char country code, followed by 10 digits
6	profile	URL	URL link to the profile page for the stock (if filled by the company)
7	detailspresent	char	Y=if a page with details was linked, N=details page not present
8	withinstudy	string	only for ISINs starting with IT where there was a value within the profile URL: blank if retained within the study, "MissingReports" if financial reports are partial or not available, "NotCoveringPeriod" if some financial reports 2019-2021 are missing
9	covidstudy	string	within those selected in column 8, further restricted, based on data available, companies for a study comparing pre- and post-Covid financial and operational information; values: Y = within the study / N = excluded due to data / outofscope = not within the scope
10	industry	string	na = not available: if a value is present = as listed by industry on BorsaItaliana.it
11	subindustry	string	na = not available: if a value is present = as listed by subindustry within the industry on BorsaItaliana.it
12	2019accounts	string	languages of the 2019 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed
13	2021accounts	string	languages of the 2021 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed
14	UsedforENG	string	string: Y if used for the text-based part of the study, i.e. those that have EN in both "2019accounts" and "2021accounts"
15	YahooFinanceURL	URL	using the ISIN as main point of reference, the link to YahooFinance page presenting financials; where none was available, "na"
16	checkvs2021yahoo	string	included=data reconciliation successful and company included in sample; bankassfin=company excluded but included in future study on bank/assurance/finance; excluded=company excluded for other reasons
17	MIB40	string	string: Y if within the MIB40 Index; otherwise null
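To show how the flag columns in the table above narrow the sample, the sketch below filters a few invented rows down to the COVID study and then to its text-based part. The company names and ISIN values are placeholders, the helper select_rows is hypothetical, and the sketch assumes the CSV header uses the column names exactly as listed.

```python
import csv
import io

# Invented rows; ISIN values are placeholders, not real identifiers.
SAMPLE = (
    "stock,ISIN,covidstudy,2019accounts,2021accounts,UsedforENG,MIB40\n"
    "Alpha SpA,IT0000000001,Y,EN,EN,Y,Y\n"
    "Beta SpA,IT0000000002,Y,IT,EN,,\n"
    "Gamma NV,NL0000000003,outofscope,,,,\n"
)


def select_rows(text, conditions):
    """Keep the rows whose columns equal every value in `conditions`."""
    rows = csv.DictReader(io.StringIO(text))
    return [r for r in rows if all(r[c] == v for c, v in conditions.items())]


# Column 9: companies retained for the pre-/post-COVID comparison.
covid = select_rows(SAMPLE, {"covidstudy": "Y"})
# Column 14: of those, the ones with English accounts for both years.
text_study = select_rows(SAMPLE, {"covidstudy": "Y", "UsedforENG": "Y"})

print([r["stock"] for r in covid])
print([r["stock"] for r in text_study])
```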

Notes:
* this table is kept as a CSV source, which was built on 2023-07-12 using the information extracted on 2023-07-11 from the Borsa Italiana website (specifically, the 30 pages of the "listino A-Z")
* only the latest version of this dataset is visible
* it was updated on 2023-08-04, adding column 8 ("withinstudy") after retrieving the financial reports for all the companies on Borsa Milano that fulfill the condition described in the table above for column 8
* it was updated on 2023-09-03, adding column 9 ("covidstudy") after identifying which companies are part of the study (i.e. besides the other conditions, annual reports for 2019 and 2021 are available)
* it was updated on 2023-11-02, adding column 10 ("industry") and 11 ("subi...

Pinterest users in the United Kingdom 2019-2028
statista.com
Updated Dec 17, 2025
Cite
Statista Research Department (2025). Pinterest users in the United Kingdom 2019-2028 [Dataset]. https://www.statista.com/topics/3236/social-media-usage-in-the-uk/
Explore at:
Dataset updated
Dec 17, 2025
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Area covered
United Kingdom
Description
The number of Pinterest users in the United Kingdom was forecast to increase continuously between 2024 and 2028, by a total of 0.3 million users (+3.14 percent). After the ninth consecutive year of growth, the Pinterest user base is estimated to reach 9.88 million users in 2028, a new peak. Notably, the number of Pinterest users has increased continuously over the past years. User figures for Pinterest have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count persons with multiple accounts only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Instagram users in the United Kingdom 2019-2028
statista.com
Updated Dec 17, 2025
Cite
Statista Research Department (2025). Instagram users in the United Kingdom 2019-2028 [Dataset]. https://www.statista.com/topics/3236/social-media-usage-in-the-uk/
Explore at:
Dataset updated
Dec 17, 2025
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Area covered
United Kingdom
Description
The number of Instagram users in the United Kingdom was forecast to increase continuously between 2024 and 2028, by a total of 2.1 million users (+7.02 percent). After the ninth consecutive year of growth, the Instagram user base is estimated to reach 32 million users in 2028, a new peak. Notably, the number of Instagram users has increased continuously over the past years. User figures for Instagram have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count persons with multiple accounts only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
SAP FI Anomaly Detection - Prepared Data & Models
kaggle.com
zip
Updated Apr 30, 2025
Cite
aidsmlProjects (2025). SAP FI Anomaly Detection - Prepared Data & Models [Dataset]. https://www.kaggle.com/datasets/aidsmlprojects/sap-fi-anomaly-detection-prepared-data-and-models
Explore at:
zip (9285 bytes)
Dataset updated
Apr 30, 2025
Authors
aidsmlProjects
Description
Intelligent SAP Financial Integrity Monitor

Project Status: Proof-of-Concept (POC) - Capstone Project

Overview

This project demonstrates a proof-of-concept system for detecting financial document anomalies within core SAP FI/CO data, specifically leveraging the New General Ledger table (FAGLFLEXA) and document headers (BKPF). It addresses the challenge that standard SAP reporting and rule-based checks often struggle to identify subtle, complex, or novel irregularities in high-volume financial postings.

The solution employs a Hybrid Anomaly Detection strategy, combining unsupervised Machine Learning models with expert-defined SAP business rules. Findings are prioritized using a multi-faceted scoring system and presented via an interactive dashboard built with Streamlit for efficient investigation.

This project was developed as a capstone, showcasing the application of AI/ML techniques to enhance financial controls within an SAP context, bridging deep SAP domain knowledge with modern data science practices.

Author: Anitha R (https://www.linkedin.com/in/anithaswamy)

Dataset Origin: Kaggle SAP Dataset by Sunitha Siva. License: Other (specified in description); no description available.

Motivation

Financial integrity is critical. Undetected anomalies in SAP FI/CO postings can lead to:
* Inaccurate financial reporting
* Significant reconciliation efforts
* Potential audit failures or compliance issues
* Masking of operational errors or fraud

Standard SAP tools may not catch all types of anomalies, especially complex or novel patterns. This project explores how AI/ML can augment traditional methods to provide more robust and efficient financial monitoring.

Key Features

Data Cleansing & Preparation: Rigorous process to handle common SAP data extract issues (duplicates, financial imbalance), prioritizing FAGLFLEXA for reliability.

Exploratory Data Analysis (EDA): Uncovered baseline patterns in posting times, user activity, amounts, and process context.

Feature Engineering: Created 16 context-aware features (FE_...) to quantify potential deviations from normalcy based on EDA and SAP knowledge.

Hybrid Anomaly Detection:

Ensemble ML: Utilized unsupervised models: Isolation Forest (IF), Local Outlier Factor (LOF) (via Scikit-learn), and an Autoencoder (AE) (via TensorFlow/Keras).

Expert Rules (HRFs): Implemented highly customizable High-Risk Flags based on percentile thresholds and SAP logic (e.g., weekend posting, missing cost center).

Multi-Faceted Prioritization: Combined ML model consensus (Model_Anomaly_Count) and HRF counts (HRF_Count) into a Priority_Tier for focusing investigation efforts.

Contextual Anomaly Reason: Generated a Review_Focus text description summarizing why an item was flagged.

Interactive Dashboard (Streamlit):

File upload for anomaly/feature data.

Overview KPIs (including multi-currency "Value at Risk by CoCode").

Comprehensive filtering capabilities.

Dynamic visualizations (User/Doc Type/HRF frequency, Time Trends).

Interactive AgGrid table for anomaly list investigation.

Detailed drill-down view for selected anomalies.
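The hybrid idea in the features above, expert High-Risk Flags combined with model consensus into a Priority_Tier, might look roughly like the sketch below. The posting fields (posting_date, cost_center), the flag names and every threshold are assumptions made for illustration; the project's actual rules and cut-offs are not given in this description.

```python
from datetime import date


def hrf_flags(posting):
    """Return the hypothetical High-Risk Flags raised by one posting."""
    flags = []
    # weekday() is 5 for Saturday and 6 for Sunday.
    if posting["posting_date"].weekday() >= 5:
        flags.append("weekend_posting")
    if not posting.get("cost_center"):
        flags.append("missing_cost_center")
    return flags


def priority_tier(model_anomaly_count, hrf_count):
    """Map model consensus and HRF count to a tier (thresholds are assumed)."""
    if model_anomaly_count >= 2 and hrf_count >= 1:
        return "High"
    if model_anomaly_count >= 1 or hrf_count >= 2:
        return "Medium"
    if hrf_count == 1:
        return "Low"
    return "None"


posting = {"posting_date": date(2024, 3, 2), "cost_center": ""}  # a Saturday
flags = hrf_flags(posting)
# Suppose two of the three models (IF, LOF, AE) flagged this posting.
tier = priority_tier(model_anomaly_count=2, hrf_count=len(flags))
print(flags, tier)
```

Keeping the rules and the tier mapping in separate functions makes it easy to adjust thresholds without touching the flag logic.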

Methodology Overview

The project followed a structured approach:

Phase 1: Data Quality Assessment & Preparation: Cleaned and validated raw BKPF and FAGLFLEXA data extracts. Discarded BSEG due to imbalances. Removed duplicates.

Phase 2: Exploratory Data Analysis & Feature Engineering: Analyzed cleaned data patterns and engineered 16 features quantifying anomaly indicators. Resulted in sap_engineered_features.csv.

Phase 3: Baseline Anomaly Detection & Evaluation: Scaled features, applied IF and LOF models, evaluated initial results.

Phase 4: Advanced Modeling & Prioritization: Trained Autoencoder model, combined all model outputs and HRFs, implemented prioritization logic, generated context, and created the final anomaly list.

Phase 5: UI Development: Built the Streamlit dashboard for interactive analysis and investigation.
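The scaling step in Phase 3 can be illustrated without any ML library. The sketch below standardizes one feature to zero mean and unit variance using the population standard deviation, which is the same transformation scikit-learn's StandardScaler applies by default; the sample amounts and the function name standardize are invented for the example.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Scale values to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    # Guard against constant features: leave them centred but unscaled.
    scale = std if std > 0 else 1.0
    return [(v - mean) / scale for v in values]


amounts = [100.0, 200.0, 300.0]  # invented posting amounts
z = standardize(amounts)
print(z)
```

Distance-based detectors such as LOF are sensitive to feature scale, which is why a step like this usually precedes them.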

(For the detailed methodology, please refer to Comprehensive_Project_Report.pdf in the /docs folder, if included.)

Technology Stack

Core Language: Python 3.x

Data Manipulation & Analysis: Pandas, NumPy

Machine Learning: Scikit-learn (IsolationForest, LocalOutlierFactor, StandardScaler), TensorFlow/Keras (Autoencoder)

Visualization: Matplotlib, Seaborn, Plotly Express

Dashboard: Streamlit, streamlit-aggrid

Utilities: Joblib (for saving scaler)

Libraries:

Model/Scaler Saving

joblib==1.4.2

Data I/O Efficiency (Optional but good practice if used)

pyarrow==19.0.1

Machine L...
All T20 International Matches Datasets since 2005
kaggle.com
zip
Updated May 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aditya Azad (2023). All T20 International Matches Datasets since 2005 [Dataset]. https://www.kaggle.com/adityaazad79/all-t20-international-datasets
Explore at:
zip (1651578 bytes)
Dataset updated
May 11, 2023
Authors
Aditya Azad
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This repo contains datasets of T20 International matches from 2005 to 11 May 2023.

All the datasets can also be viewed on GitHub.

Connect with and follow me on LinkedIn.

You may use the dataset for your EDA, visualisations, BI projects, Performance analysis, Predictive modeling, Sponsorship and marketing, etc.

There are three types of datasets in the repo.

Batting_PerMatchData_T20*.csv - The dataset contains information about the performance of each batsman in each inning of the match.

Bowling_PerMatchData_T20*.csv - The dataset contains information about the performance of each bowler in each inning of the match.

Summary_T20*.csv - The dataset contains information about every T20 cricket match played.

The datasets have been categorised year-wise and are present in their respective yearwise_folders.

I have also combined the datasets of each type into a single file, which is present at the root of the repo.

All the column names and their explanations.

1. Dataset : Batting_PerMatchData_T20*.csv

match : This column represents name of the teams playing the match.

teamInnings: This column represents the team batting in the match.

battingPos: This column represents the batting position of the batsman in the innings. The batsman at the top of the order usually has a lower batting position, and the batsman at the bottom of the order has a higher batting position.

batsmanName: This column represents the name of the batsman who is currently batting in the match.

runs: This column represents the number of runs scored by the batsman in the current innings.

balls: This column represents the number of balls faced by the batsman in the current innings.

4s: This column represents the number of boundaries hit by the batsman that have crossed the boundary rope and scored four runs.

6s: This column represents the number of sixes hit by the batsman.

SR: This column represents the batting strike rate of the batsman in the current innings. It is calculated as the number of runs scored by the batsman per 100 balls faced.

out/not_out: This column represents whether the batsman is out or not out. If the batsman is not out at the end of the innings, the value in this column would be "not out" else "out".

match_id: This column represents the unique identifier of the cricket match being played, which may be used to join this table with other tables containing additional information about the match.
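The SR definition above (runs scored per 100 balls faced) is simple enough to check against the runs and balls columns. In this minimal sketch the function name strike_rate and the choice to return None for zero balls faced are assumptions; the dataset may represent that case differently.

```python
def strike_rate(runs, balls):
    """Batting strike rate: runs scored per 100 balls faced."""
    if balls == 0:
        # No balls faced: the rate is undefined (assumed handling).
        return None
    return runs / balls * 100


print(strike_rate(45, 30))
```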

2. Dataset : Bowling_PerMatchData_T20*.csv

match : This column represents name of the teams playing the match.

bowlingTeam: This column represents the team that is currently bowling in the match.

bowlerName: This column represents the name of the bowler who is currently bowling in the match.

overs: This column represents the number of overs bowled by the bowler in the match. One over consists of six legal deliveries (excluding wides and no-balls).

maiden: This column represents the number of maiden overs bowled by the bowler.

runs: This column represents the total number of runs conceded by the bowler in the match.

wickets: This column represents the total number of wickets taken by the bowler in the match.

economy: This column represents the economy rate of the bowler in the match. It is calculated as the average number of runs conceded per over.

0s: This column represents the number of dot balls bowled by the bowler in the match.

4s: This column represents the number of boundaries hit by the batsman off the bowler that have crossed the boundary rope and scored four runs.

6s: This column represents the number of sixes hit by the batsman off the bowler.

wides: This column represents the number of deliveries bowled by the bowler outside the batsman's reach and judged to be too wide for the batsman to play.

noBalls: This column represents the number of deliveries bowled by the bowler that are illegal for some reason, such as the bowler overstepping the crease, throwing the ball rather than bowling it, or bowling a bouncer that goes above the batsman's head.

match_id: This column represents the unique identifier of the cricket match being played.
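The economy definition above (average runs conceded per over) needs care when a spell ends mid-over. The sketch below assumes the overs column follows the usual scorecard notation, where "3.4" means 3 overs and 4 balls rather than 3.4 decimal overs; that is an assumption about this dataset, and the helper names are hypothetical.

```python
def overs_to_balls(overs):
    """Convert scorecard overs notation (e.g. "3.4") to legal balls bowled."""
    whole, _, part = str(overs).partition(".")
    extra = int(part) if part else 0
    if extra >= 6:
        raise ValueError(f"invalid overs value: {overs!r}")
    return int(whole) * 6 + extra


def economy(runs_conceded, overs):
    """Economy rate: runs conceded per six-ball over."""
    balls = overs_to_balls(overs)
    if balls == 0:
        return None
    return runs_conceded * 6 / balls


print(overs_to_balls("3.4"), economy(22, "3.4"))
```

Treating "3.4" as a decimal would divide by 3.4 overs instead of 3 overs and 4 balls (about 3.67 overs) and overstate the economy rate.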

3. Dataset : Summary_T20*.csv

Team 1: This column represents one of the teams playing in the cricket match.

Team 2: This column represents the other team playing in the cricket match.

Winner: This column represents the winning team of the cricket match. It could be either Team 1 or Team 2.

Margin: This column represents the margin of victory for the winning team. It could be represented in terms of runs, wickets or balls remaining ...
Pakistan Automobile Market: PakWheels Dataset 2024
kaggle.com
zip
Updated May 11, 2024
Cite
Azhar Saleem (2024). Pakistan Automobile Market: PakWheels Dataset 2024 [Dataset]. https://www.kaggle.com/azharsaleem/pakistan-automobile-market-pakwheels-dataset
Explore at:
zip (1143918 bytes)
Dataset updated
May 11, 2024
Authors
Azhar Saleem
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Pakistan
Description
👨‍💻 Author: Azhar Saleem

Dataset Description

This dataset represents a comprehensive collection of vehicle listings from PakWheels.com, Pakistan's largest automobile website, as of 2024. It includes detailed information about various aspects of vehicles available for sale across Pakistan, including their prices, models, mileage, engine capacity, and age. This data offers a snapshot of the current automobile market in Pakistan, providing insights into vehicle valuation trends, consumer preferences, and market dynamics.

The dataset is designed for anyone interested in the Pakistani automobile market, whether they are buyers, sellers, car enthusiasts, analysts, or researchers. It provides a foundational dataset for a wide range of analytical and predictive tasks.

Columns Description

url: The URL of the vehicle listing on PakWheels.com. This can be used to reference the original listing for more detailed information or verification.

title: The title of the listing, which typically includes the make, model, and year of the vehicle.

price: The listed price of the vehicle in Pakistani Rupees. This is crucial for market analysis and price prediction models.

city: The city in Pakistan where the vehicle is located, useful for regional market analysis.

model: The manufacturing year of the vehicle. This is often a determinant of price depreciation.

mileage: The total mileage of the vehicle in kilometers, which affects the vehicle’s condition and price.

fuel_type: Type of fuel the vehicle uses (e.g., Petrol, Diesel, Hybrid), important for segment analysis based on fuel economy and environmental impact.

transmission: The type of transmission (e.g., Automatic, Manual), which influences the driving experience and market value.

registered: The city in which the vehicle is registered. This can impact the resale value and legality of the sale.

color: The color of the vehicle, which can be a factor in buyer preference.

assembly: Whether the vehicle was locally assembled or imported, affecting taxes, duties, and price.

engine_capacity: The engine capacity of the vehicle in cubic centimeters (cc), which is a key indicator of performance.

post_date: The date the listing was posted, useful for tracking market trends over time.

price_category: Categorical representation of the vehicle's price range (e.g., Low, Medium, High).

mileage_category: Categorical assessment of the vehicle's mileage (e.g., Low, Medium, High).

post_day_of_week: The day of the week the listing was posted, which might influence how quickly a vehicle sells.

vehicle_age: The age of the vehicle calculated from the model year to the current year, affecting depreciation and market value.
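
The derived columns described above (vehicle_age and mileage_category) could be reproduced roughly as follows. This is only a sketch, not the dataset author's actual method: it assumes the model column holds a four-digit year, and the reference year, the mileage thresholds, and both function names are illustrative assumptions.

```python
from datetime import date
from typing import Optional


def vehicle_age(model_year: int, reference_year: Optional[int] = None) -> int:
    """Return the age in years from the model year to the reference year.

    If no reference year is given, the current calendar year is used.
    """
    if reference_year is None:
        reference_year = date.today().year
    # Clamp at zero so a listing for next year's model does not go negative.
    return max(reference_year - model_year, 0)


def mileage_category(mileage_km: int,
                     low_max: int = 50_000,
                     high_min: int = 150_000) -> str:
    """Bucket mileage into Low / Medium / High.

    The thresholds are hypothetical; the dataset description does not
    state the cut-offs that were actually used.
    """
    if mileage_km < low_max:
        return "Low"
    if mileage_km >= high_min:
        return "High"
    return "Medium"


# Placeholder listing, not a row from the dataset.
listing = {"model": 2018, "mileage": 62_000}
listing["vehicle_age"] = vehicle_age(listing["model"], reference_year=2024)
listing["mileage_category"] = mileage_category(listing["mileage"])
```

Passing the reference year explicitly keeps the result reproducible; relying on the current year means the same listing ages every time the calculation is rerun.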

Usage and Potential Applications in Data Analysis

Market Analysis

Pricing Trends: Analyze how the prices of vehicles change based on age, mileage, and other factors. This can help in un...
Turkish agriculture dataset for LLM finetuning
kaggle.com
zip
Updated Oct 16, 2024
Cite
batuhan kalem (2024). Turkish agriculture dataset for LLM finetuning [Dataset]. https://www.kaggle.com/datasets/batuhankalem/turkish-agriculture-dataset-for-llm-finetuning
Available download formats: zip (3265927 bytes)
Dataset updated
Oct 16, 2024
Authors
batuhan kalem
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Turkish Agriculture LLM Finetune Dataset

This dataset is designed to fine-tune the T3 AI Turkish LLM. It was created by Barathan Aslan, Ömer Faruk Çelik, and Batuhan Kalem for the T3 AI Hackathon. The dataset focuses on Turkish Agriculture.

Contributors: Barathan Aslan (https://www.linkedin.com/in/barathan-aslan-715897218/) Batuhan Kalem (https://www.linkedin.com/in/batuhankalem/) Ömer Faruk Çelik (https://www.linkedin.com/in/ömerfarukçelik/)

Dataset Creation

Question-answer pairs were generated using Gemini 1.5 Flash with multiple chains of prompts. Scoring and quality assessment were performed using Gemini 1.5 Pro.

Recommendation: For optimal fine-tuning results, we suggest excluding rows with a score value lower than 6.

How to Use

Dataset provided can be used for:

Fine-tuning the T3 AI Turkish LLM.

Natural language processing (NLP) tasks focused on the Turkish language.

The datasets are scored based on the quality and relevance of the content, with higher scores indicating better quality.

Additionally it should be noted that:

A score of -1 represents the "Safety" category.

A score of -2 indicates rows that were "Not Scored."
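
The scoring recommendation can be sketched as a small row filter. The column names (question, answer, score), the function name, and the keep_safety option are assumptions made for illustration; the description confirms only the threshold of 6 and the meaning of -1 and -2. A plain "score lower than 6" rule also drops the -1 and -2 rows, so the sketch treats those sentinel values explicitly.

```python
SAFETY = -1      # row belongs to the "Safety" category
NOT_SCORED = -2  # row was never scored
MIN_SCORE = 6    # recommended minimum score for fine-tuning


def keep_for_finetuning(row: dict,
                        min_score: int = MIN_SCORE,
                        keep_safety: bool = False) -> bool:
    """Decide whether a row should be kept for fine-tuning.

    keep_safety is a hypothetical option: the dataset description does
    not say whether "Safety" rows should be kept or dropped.
    """
    score = row["score"]
    if score == NOT_SCORED:
        return False
    if score == SAFETY:
        return keep_safety
    return score >= min_score


# Placeholder rows, not taken from the dataset.
rows = [
    {"question": "q1", "answer": "a1", "score": 8},
    {"question": "q2", "answer": "a2", "score": 5},
    {"question": "q3", "answer": "a3", "score": -1},
    {"question": "q4", "answer": "a4", "score": -2},
    {"question": "q5", "answer": "a5", "score": 6},
]

kept = [r for r in rows if keep_for_finetuning(r)]
kept_with_safety = [r for r in rows if keep_for_finetuning(r, keep_safety=True)]
```

With the placeholder rows, the default call keeps q1 and q5, and keep_safety=True additionally keeps q3.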
Turkish education dataset for LLM finetuning
kaggle.com
zip
Updated Oct 16, 2024
Cite
batuhan kalem (2024). Turkish education dataset for LLM finetuning [Dataset]. https://www.kaggle.com/datasets/batuhankalem/turkish-education-dataset-for-llm-finetuning
Available download formats: zip (4715873 bytes)
Dataset updated
Oct 16, 2024
Authors
batuhan kalem
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Turkish Education LLM Finetune Dataset

This dataset is designed to fine-tune the T3 AI Turkish LLM. It was created by Barathan Aslan, Ömer Faruk Çelik, and Batuhan Kalem for the T3 AI Hackathon. The dataset focuses on the Turkish education system.

Contributors: Barathan Aslan (https://www.linkedin.com/in/barathan-aslan-715897218/) Batuhan Kalem (https://www.linkedin.com/in/batuhankalem/) Ömer Faruk Çelik (https://www.linkedin.com/in/ömerfarukçelik/)

Dataset Creation

Question-answer pairs were generated using Gemini 1.5 Flash with multiple chains of prompts. Scoring and quality assessment were performed using Gemini 1.5 Pro.

Recommendation: For optimal fine-tuning results, we suggest excluding rows with a score value lower than 6.

How to Use

Dataset provided can be used for:

Fine-tuning the T3 AI Turkish LLM.

Natural language processing (NLP) tasks focused on the Turkish language.

The datasets are scored based on the quality and relevance of the content, with higher scores indicating better quality.

Additionally it should be noted that:

A score of -1 represents the "Safety" category.

A score of -2 indicates rows that were "Not Scored."

LinkedIn Usage and Revenue Statistics (2026)

45 scholarly articles cite this dataset (View in Google Scholar)
