14 datasets found
  1. b

    LinkedIn Usage and Revenue Statistics (2026)

    • businessofapps.com
    • content.clixoni.com
    Updated Nov 12, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Business of Apps (2018). LinkedIn Usage and Revenue Statistics (2026) [Dataset]. https://www.businessofapps.com/data/linkedin-statistics/
    Explore at:
    Dataset updated
    Nov 12, 2018
    Dataset authored and provided by
    Business of Apps
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    LinkedIn is the world’s preeminent social network for professionals. Members create CVs, list their current and previous job roles, skills and education. The business network is also a recruiting...

  2. LinkedIn Compatibility Dataset: 50K Profiles

    • kaggle.com
    zip
    Updated Dec 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Likitha Gedipudi (2025). LinkedIn Compatibility Dataset: 50K Profiles [Dataset]. https://www.kaggle.com/datasets/likithagedipudi/linkedin-compatibility-dataset-50k-profiles
    Explore at:
    zip(291244458 bytes)Available download formats
    Dataset updated
    Dec 20, 2025
    Authors
    Likitha Gedipudi
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context: Professional networking is inefficient - 90% of LinkedIn connections provide minimal mutual value. This dataset enables AI models to predict networking compatibility and recommend high-value connections before they're made.

    Dataset Overview: 50,000 professional profiles paired into 500,000+ compatibility-scored combinations with detailed feature breakdowns. Synthetically generated, ML-ready data for building recommendation systems.

    Files & Schema: - profiles.csv (50,000 rows) - profile_id - Unique identifier - name, email, location - Demographics - current_role, current_company - Current position - industry - Industry category - years_experience - Total years of experience - seniority_level - entry/mid/senior/executive - skills - JSON array of skills - experience - JSON work history - education - JSON education history - connections - Network size - goals, needs, can_offer - Professional objectives (JSON) - compatibility_pairs.csv (500,000+ rows) - pair_id - Unique pair identifier - profile_a_id, profile_b_id - Profile IDs - compatibility_score - Overall match (0-100) - skill_match_score - Skill overlap - skill_complementarity_score - How skills complement - network_value_a_to_b - Network value A→B provides - network_value_b_to_a - Network value B→A provides - career_alignment_score - Mentorship/learning potential - experience_gap - Years experience difference - industry_match - Industry similarity - geographic_score - Location proximity - seniority_match - Seniority compatibility - mutual_benefit_explanation - Human-readable reasoning

  3. Number of LinkedIn users in the United Kingdom 2019-2028

    • statista.com
    Updated Dec 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Number of LinkedIn users in the United Kingdom 2019-2028 [Dataset]. https://www.statista.com/topics/3236/social-media-usage-in-the-uk/
    Explore at:
    Dataset updated
    Dec 17, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    United Kingdom
    Description

    The number of LinkedIn users in the United Kingdom was forecast to continuously increase between 2024 and 2028 by in total 1.5 million users (+4.51 percent). After the eighth consecutive increasing year, the LinkedIn user base is estimated to reach 34.7 million users and therefore a new peak in 2028. User figures, shown here with regards to the platform LinkedIn, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  4. News Popularity in Multiple Social Media Platforms

    • kaggle.com
    zip
    Updated Oct 28, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nikhil John (2020). News Popularity in Multiple Social Media Platforms [Dataset]. https://www.kaggle.com/nikhiljohnk/news-popularity-in-multiple-social-media-platforms
    Explore at:
    zip(10881978 bytes)Available download formats
    Dataset updated
    Oct 28, 2020
    Authors
    Nikhil John
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Social Media has been taking up everything on the Internet. People getting the latest news, useful resources, life partner and what not. In a world where Social media plays a big role in giving news, we must also know that news which affects our sentiments are going to get spread like a wildfire. Based on the Headline and the title, and according to the date given and the Social media platforms, you have to predict how it has affected the human sentiment scores. You have to predict the column “SentimentTitle” and “SentimentHeadline”.

    Content

    This is a subset of the dataset of the same name available in the UCI Machine Learning Repository The collected data relates to a period of 8 months, between November 2015 and July 2016, accounting for about 100,000 news items on four different topics: economy, microsoft, obama and palestine.

    Dataset Information

    The attributes for each of the dataset are : - IDLink (numeric): Unique identifier of news items - Title (string): Title of the news item according to the official media sources - Headline (string): Headline of the news item according to the official media sources - Source (string): Original news outlet that published the news item - Topic (string): Query topic used to obtain the items in the official media sources - Publish-Date (timestamp): Date and time of the news items' publication - Facebook (numeric): Final value of the news items' popularity according to the social media source Facebook - Google-Plus (numeric): Final value of the news items' popularity according to the social media source Google+ - LinkedIn (numeric): Final value of the news items' popularity according to the social media source LinkedIn - SentimentTitle: Sentiment score of the title, Higher the score, better is the impact or +ve sentiment and vice-versa. (Target Variable 1) - SentimentHeadline: Sentiment score of the text in the news items' headline. Higher the score, better is the impact or +ve sentiment. (Target Variable 2)

  5. f

    How to make data reusable

    • nfdimetaportal.fokus.fraunhofer.de
    • meta4ds.fokus.fraunhofer.de
    • +1more
    html
    Updated May 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NFDI4Cat (2024). How to make data reusable [Dataset]. https://nfdimetaportal.fokus.fraunhofer.de/datasets/b3qjcey2ne0?locale=en
    Explore at:
    htmlAvailable download formats
    Dataset updated
    May 15, 2024
    Dataset authored and provided by
    NFDI4Cat
    Description

    To promote progress in catalysis-related research, sharing FAIR (findable, accessible, interoperable, reusable) data is essential. Shared data can inspire new understanding, prevent duplication of work and even allows for new insights through artificial intelligence-based approaches. While not all data in catalysis-related research can be shared openly, it is important to make sure that the data can be understood independent from the individual research to retain its longterm value. This requires a comprehensive description with metadata, but also technical aspects such as data formats need to be kept in mind. Join us on an excursion into making data FAIR and discover the first steps to ensure that your research data will retain its value. Stay tuned for more exciting content, and thank you for being a part of our growing community!

    Check out our website: https://nfdi4cat.org/

    Follow us: https://in.linkedin.com/company/nfdi4cat https://twitter.com/NFDI4Cat

    NFDI4Cat #catalysis #fairdata #metadata

  6. Billionaires Statistics Dataset (2023)

    • kaggle.com
    zip
    Updated Sep 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Billionaires Statistics Dataset (2023) [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/billionaires-statistics-dataset
    Explore at:
    zip(142700 bytes)Available download formats
    Dataset updated
    Sep 29, 2023
    Authors
    Nidula Elgiriyewithana ⚡
    Description

    Description

    This dataset contains statistics on the world's billionaires, including information about their businesses, industries, and personal details. It provides insights into the wealth distribution, business sectors, and demographics of billionaires worldwide.

    DOI

    Key Features

    • rank: The ranking of the billionaire in terms of wealth.
    • finalWorth: The final net worth of the billionaire in U.S. dollars.
    • category: The category or industry in which the billionaire's business operates.
    • personName: The full name of the billionaire.
    • age: The age of the billionaire.
    • country: The country in which the billionaire resides.
    • city: The city in which the billionaire resides.
    • source: The source of the billionaire's wealth.
    • industries: The industries associated with the billionaire's business interests.
    • countryOfCitizenship: The country of citizenship of the billionaire.
    • organization: The name of the organization or company associated with the billionaire.
    • selfMade: Indicates whether the billionaire is self-made (True/False).
    • status: "D" represents self-made billionaires (Founders/Entrepreneurs) and "U" indicates inherited or unearned wealth.
    • gender: The gender of the billionaire.
    • birthDate: The birthdate of the billionaire.
    • lastName: The last name of the billionaire.
    • firstName: The first name of the billionaire.
    • title: The title or honorific of the billionaire.
    • date: The date of data collection.
    • state: The state in which the billionaire resides.
    • residenceStateRegion: The region or state of residence of the billionaire.
    • birthYear: The birth year of the billionaire.
    • birthMonth: The birth month of the billionaire.
    • birthDay: The birth day of the billionaire.
    • cpi_country: Consumer Price Index (CPI) for the billionaire's country.
    • cpi_change_country: CPI change for the billionaire's country.
    • gdp_country: Gross Domestic Product (GDP) for the billionaire's country.
    • gross_tertiary_education_enrollment: Enrollment in tertiary education in the billionaire's country.
    • gross_primary_education_enrollment_country: Enrollment in primary education in the billionaire's country.
    • life_expectancy_country: Life expectancy in the billionaire's country.
    • tax_revenue_country_country: Tax revenue in the billionaire's country.
    • total_tax_rate_country: Total tax rate in the billionaire's country.
    • population_country: Population of the billionaire's country.
    • latitude_country: Latitude coordinate of the billionaire's country.
    • longitude_country: Longitude coordinate of the billionaire's country.

    Potential Use Cases

    • Wealth distribution analysis: Explore the distribution of billionaires' wealth across different industries, countries, and regions.
    • Demographic analysis: Investigate the age, gender, and birthplace demographics of billionaires.
    • Self-made vs. inherited wealth: Analyze the proportion of self-made billionaires and those who inherited their wealth.
    • Economic indicators: Study correlations between billionaire wealth and economic indicators such as GDP, CPI, and tax rates.
    • Geospatial analysis: Visualize the geographical distribution of billionaires and their wealth on a map.
    • Trends over time: Track changes in billionaire demographics and wealth over the years.

    If this was helpful, a vote is appreciated 🙂❤️

  7. Borsa Italiana Listino 2023-07-11, upd 2025-11-26

    • kaggle.com
    zip
    Updated Nov 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Roberto Lofaro (2025). Borsa Italiana Listino 2023-07-11, upd 2025-11-26 [Dataset]. https://www.kaggle.com/datasets/robertolofaro/borsa-italiana-listino-as-of-20221119
    Explore at:
    zip(40589 bytes)Available download formats
    Dataset updated
    Nov 26, 2025
    Authors
    Roberto Lofaro
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains information in preparation of forthcoming publication: - extracted from public open data accessible via web (see "Production Notes" section at the end for details) - overall aim: comparing company data pre- and post-COVID, i.e. evolution from 2019 to 2022 (balance sheet due July 2023)

    As the project progresses, more material will both added to this dataset, and within the dedicated GitHub repository

    On 2025-06-02, as part of a side-project related to the same data source, derived from the scripts created previously to retrieve YahooFinance data, a new script and associated list focused on the companies within the MIB40 index.

    Please refer to Github for more information, and to access the CSV and associated Jupyter Notebook.

    General description: see linkedin post

    Rationale of dataset and the associated project: Reading pre- and post-COVID corporate narratives, the Italian case: a dataset in fieri

    See associated notebook (more charts will be added as further information willl be integrated)

    Structure of the file: listino_catalog_kaggle.csv

    The first file contained in this dataset is the list of stocks and warrants presented on the website of Borsa Italian as of 2023-07-11, specifically the following structure (structure latest updated on 2025-11-26, see notes below):

    columnnamedatatypedescription
    1#numericposition index
    2stocktextname of the company, as per Borsa Italiana website
    3linkURLURL link to the page
    4markettextsubsection of the "listino", as per Borsa Italiana website
    5ISINtextstock identification code, starting with a 2-char country code, followed by 10 digits
    6profileURLURL link to the profile page for the stock (if filled by the company)
    7detailspresentcharY=if a page with details was linked, N=details page not present
    8withinstudystringonly for ISINs starting with IT where there was a value within the profile URL: blank if retained within the study, "MissingReports" if financial reports are partial or not available, "NotCoveringPeriod" if some financial reports 2019-2021 are missing
    9covidstudystringwithin those selected in column 8, further restricted, based on data available, companies for a study comparing pre- and post-Covid financial and operational information; values: Y = within the study / N = excluded due to data / outofscope = not within the scope
    10industrystringna = not available: if a value is present = as listed by industry on BorsaItaliana.it
    11subindustrystringna = not available: if a value is present = as listed by subindustry within the industry on BorsaItaliana.it
    122019accountsstringlanguages of the 2019 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed
    132021accountsstringlanguages of the 2021 accounts for companies whose "covidstudy" (column 9) is "Y"; if both English and Italian are available, EN is listed
    14UsedforENGstringstring: Y if used for the text-based part of the study, i.e. those that have EN in both "2019accounts" and "2021accounts"
    15YahooFinanceURLURLusing the ISIN as main point of reference, the link to YahooFinance page presenting financials; where non was available, "na"
    16checkvs2021yahoostringincluded=data reconciliation successful and company included in sample; bankassfin=company excluded but included in future study on bank/assurance/finance; excluded=company excluded for other reasons
    17MIB40stringstring: Y if within the MIB40 Index; otherwise null

    Note: * this table is kept as a CSV source, which was build on 2023-07-12 using the information extracted on 2023-07-11 from the Borsa Italiana website (specifically, the "listino A-Z" 30 pages available) * only the latest version of this dataset is always visible * it has been updated on 2023-08-04, adding column 8 ("withinstudy") after retrieving the financial reports for all the companies on Borsa Milano that fulfill the condition described in the table able for column 8 * it has been updated on 2023-09-03, adding column 9 ("covidstudy") after identifying which companies are part of the study (i.e. beside the other conditions, annual reports for 2019 and 2021 are available) * it has been updated on 2023-11-02, adding column 10 ("industry") and 11 ("subi...

  8. Pinterest users in the United Kingdom 2019-2028

    • statista.com
    Updated Dec 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Pinterest users in the United Kingdom 2019-2028 [Dataset]. https://www.statista.com/topics/3236/social-media-usage-in-the-uk/
    Explore at:
    Dataset updated
    Dec 17, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    United Kingdom
    Description

    The number of Pinterest users in the United Kingdom was forecast to continuously increase between 2024 and 2028 by in total 0.3 million users (+3.14 percent). After the ninth consecutive increasing year, the Pinterest user base is estimated to reach 9.88 million users and therefore a new peak in 2028. Notably, the number of Pinterest users of was continuously increasing over the past years.User figures, shown here regarding the platform pinterest, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  9. Instagram users in the United Kingdom 2019-2028

    • statista.com
    Updated Dec 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Instagram users in the United Kingdom 2019-2028 [Dataset]. https://www.statista.com/topics/3236/social-media-usage-in-the-uk/
    Explore at:
    Dataset updated
    Dec 17, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    United Kingdom
    Description

    The number of Instagram users in the United Kingdom was forecast to continuously increase between 2024 and 2028 by in total 2.1 million users (+7.02 percent). After the ninth consecutive increasing year, the Instagram user base is estimated to reach 32 million users and therefore a new peak in 2028. Notably, the number of Instagram users of was continuously increasing over the past years.User figures, shown here with regards to the platform instagram, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  10. SAP FI Anomaly Detection - Prepared Data & Models

    • kaggle.com
    zip
    Updated Apr 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    aidsmlProjects (2025). SAP FI Anomaly Detection - Prepared Data & Models [Dataset]. https://www.kaggle.com/datasets/aidsmlprojects/sap-fi-anomaly-detection-prepared-data-and-models
    Explore at:
    zip(9285 bytes)Available download formats
    Dataset updated
    Apr 30, 2025
    Authors
    aidsmlProjects
    Description

    Intelligent SAP Financial Integrity Monitor

    Project Status: Proof-of-Concept (POC) - Capstone Project

    Overview

    This project demonstrates a proof-of-concept system for detecting financial document anomalies within core SAP FI/CO data, specifically leveraging the New General Ledger table (FAGLFLEXA) and document headers (BKPF). It addresses the challenge that standard SAP reporting and rule-based checks often struggle to identify subtle, complex, or novel irregularities in high-volume financial postings.

    The solution employs a Hybrid Anomaly Detection strategy, combining unsupervised Machine Learning models with expert-defined SAP business rules. Findings are prioritized using a multi-faceted scoring system and presented via an interactive dashboard built with Streamlit for efficient investigation.

    This project was developed as a capstone, showcasing the application of AI/ML techniques to enhance financial controls within an SAP context, bridging deep SAP domain knowledge with modern data science practices.

    Author: Anitha R (https://www.linkedin.com/in/anithaswamy)

    Dataset Origin: Kaggle SAP Dataset by Sunitha Siva License:Other (specified in description)-No description available.

    Motivation

    Financial integrity is critical. Undetected anomalies in SAP FI/CO postings can lead to: * Inaccurate financial reporting * Significant reconciliation efforts * Potential audit failures or compliance issues * Masking of operational errors or fraud

    Standard SAP tools may not catch all types of anomalies, especially complex or novel patterns. This project explores how AI/ML can augment traditional methods to provide more robust and efficient financial monitoring.

    Key Features

    • Data Cleansing & Preparation: Rigorous process to handle common SAP data extract issues (duplicates, financial imbalance), prioritizing FAGLFLEXA for reliability.
    • Exploratory Data Analysis (EDA): Uncovered baseline patterns in posting times, user activity, amounts, and process context.
    • Feature Engineering: Created 16 context-aware features (FE_...) to quantify potential deviations from normalcy based on EDA and SAP knowledge.
    • Hybrid Anomaly Detection:
      • Ensemble ML: Utilized unsupervised models: Isolation Forest (IF), Local Outlier Factor (LOF) (via Scikit-learn), and an Autoencoder (AE) (via TensorFlow/Keras).
      • Expert Rules (HRFs): Implemented highly customizable High-Risk Flags based on percentile thresholds and SAP logic (e.g., weekend posting, missing cost center).
    • Multi-Faceted Prioritization: Combined ML model consensus (Model_Anomaly_Count) and HRF counts (HRF_Count) into a Priority_Tier for focusing investigation efforts.
    • Contextual Anomaly Reason: Generated a Review_Focus text description summarizing why an item was flagged.
    • Interactive Dashboard (Streamlit):
      • File upload for anomaly/feature data.
      • Overview KPIs (including multi-currency "Value at Risk by CoCode").
      • Comprehensive filtering capabilities.
      • Dynamic visualizations (User/Doc Type/HRF frequency, Time Trends).
      • Interactive AgGrid table for anomaly list investigation.
      • Detailed drill-down view for selected anomalies.

    Methodology Overview

    The project followed a structured approach:

    1. Phase 1: Data Quality Assessment & Preparation: Cleaned and validated raw BKPF and FAGLFLEXA data extracts. Discarded BSEG due to imbalances. Removed duplicates.
    2. Phase 2: Exploratory Data Analysis & Feature Engineering: Analyzed cleaned data patterns and engineered 16 features quantifying anomaly indicators. Resulted in sap_engineered_features.csv.
    3. Phase 3: Baseline Anomaly Detection & Evaluation: Scaled features, applied IF and LOF models, evaluated initial results.
    4. Phase 4: Advanced Modeling & Prioritization: Trained Autoencoder model, combined all model outputs and HRFs, implemented prioritization logic, generated context, and created the final anomaly list.
    5. Phase 5: UI Development: Built the Streamlit dashboard for interactive analysis and investigation.

    (For detailed methodology, please refer to the Comprehensive_Project_Report.pdf in the /docs folder - if you include it).

    Technology Stack

    • Core Language: Python 3.x
    • Data Manipulation & Analysis: Pandas, NumPy
    • Machine Learning: Scikit-learn (IsolationForest, LocalOutlierFactor, StandardScaler), TensorFlow/Keras (Autoencoder)
    • Visualization: Matplotlib, Seaborn, Plotly Express
    • Dashboard: Streamlit, streamlit-aggrid
    • Utilities: Joblib (for saving scaler)

    Libraries:

    Model/Scaler Saving

    joblib==1.4.2

    Data I/O Efficiency (Optional but good practice if used)

    pyarrow==19.0.1

    Machine L...

  11. All T20 International Matches Datasets since 2005

    • kaggle.com
    zip
    Updated May 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aditya Azad (2023). All T20 International Matches Datasets since 2005 [Dataset]. https://www.kaggle.com/adityaazad79/all-t20-international-datasets
    Explore at:
    zip(1651578 bytes)Available download formats
    Dataset updated
    May 11, 2023
    Authors
    Aditya Azad
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This repo contains dataset of the T20 International matches since 2005 to 11th May 2023.

    All the datasets can also be viewed over github.

    Connect and follow me over linkedIn

    You may use the dataset for your EDA, visualisations, BI projects, Performance analysis, Predictive modeling, Sponsorship and marketing, etc.

    There are three types of datasets in the repo.

    1. Batting_PerMatchData_T20*.csv - The dataset contains information about the performance of each batsman in each inning of the match.

    2. Bowling_PerMatchData_T20*.csv - The dataset contains information about the performance of each bowler in each inning of the match.

    3. Summary_T20*.csv - The dataset contains information every T20 cricket match played.

    The datasets have been categorised yearwise and are presnt in their respective yearwise_folders.

    I have also combined the datasets into a single one for each type of datasets which is present at the home of the repo.

    All the column names and their explanations.

    1. Dataset : Batting_PerMatchData_T20*.csv

    • match : This column represents name of the teams playing the match.

    • teamInnings: This column represents the team batting in the match.

    • battingPos: This column represents the batting position of the batsman in the innings. The batsman at the top of the order usually has a lower batting position, and the batsman at the bottom of the order has a higher batting position.

    • batsmanName: This column represents the name of the batsman who is currently batting in the match.

    • runs: This column represents the number of runs scored by the batsman in the current innings.

    • balls: This column represents the number of balls faced by the batsman in the current innings.

    • 4s: This column represents the number of boundaries hit by the batsman that have crossed the boundary rope and scored four runs.

    • 6s: This column represents the number of sixes hit by the batsman.

    • SR: This column represents the batting strike rate of the batsman in the current innings. It is calculated as the number of runs scored by the batsman per 100 balls faced.

    • out/not_out: This column represents whether the batsman is out or not out. If the batsman is not out at the end of the innings, the value in this column would be "not out" else "out".

    • match_id: This column represents the unique identifier of the cricket match being played, which may be used to join this table with other tables containing additional information about the match.

    2. Dataset : Bowling_PerMatchData_T20*.csv

    • match : This column represents name of the teams playing the match.

    • bowlingTeam: This column represents the team that is currently bowling in the match.

    • bowlerName: This column represents the name of the bowler who is currently bowling in the match.

    • overs: This column represents the number of overs bowled by the bowler in the match. One over consists of six legal deliveries (excluding wides and no-balls).

    • maiden: This column represents the number of maiden overs bowled by the bowler.

    • runs: This column represents the total number of runs conceded by the bowler in the match.

    • wickets: This column represents the total number of wickets taken by the bowler in the match.

    • economy: This column represents the economy rate of the bowler in the match. It is calculated as the average number of runs conceded per over.

    • 0s: This column represents the number of dot balls bowled by the bowler in the match.

    • 4s: This column represents the number of boundaries hit by the batsman off the bowler that have crossed the boundary rope and scored four runs.

    • 6s: This column represents the number of sixes hit by the batsman off the bowler.

    • wides: This column represents the number of deliveries that is bowled by the bowler outside the batsman's reach and is judged to be too wide for the batsman to play.

    • noBalls: This column represents the number of deliveries bowled by the bowler that is illegal for some reason, such as the bowler overstepping the crease, throwing the ball rather than bowling it, or bowling a bouncer that goes above the batsman's head.

    • match_id: This column represents the unique identifier of the cricket match being played.

    3. Dataset : Summary_T20*.csv

    • Team 1: This column represents one of the teams playing in the cricket match.

    • Team 2: This column represents the other team playing in the cricket match.

    • Winner: This column represents the winning team of the cricket match. It could be either Team 1 or Team 2.

    • Margin: This column represents the margin of victory for the winning team. It could be represented in terms of runs, wickets or balls remaining ...

  12. Pakistan Automobile Market: PakWheels Dataset 2024

    • kaggle.com
    zip
    Updated May 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Azhar Saleem (2024). Pakistan Automobile Market: PakWheels Dataset 2024 [Dataset]. https://www.kaggle.com/azharsaleem/pakistan-automobile-market-pakwheels-dataset
    Explore at:
    zip(1143918 bytes)Available download formats
    Dataset updated
    May 11, 2024
    Authors
    Azhar Saleem
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Pakistan
    Description

    👨‍💻 Author: Azhar Saleem

    "https://github.com/azharsaleem18" target="_blank"> https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github" alt="GitHub Profile"> "https://www.kaggle.com/azharsaleem" target="_blank"> https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle" alt="Kaggle Profile"> "https://www.linkedin.com/in/azhar-saleem/" target="_blank"> https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin" alt="LinkedIn Profile">
    "https://www.youtube.com/@AzharSaleem19" target="_blank"> https://img.shields.io/badge/YouTube-Profile-red?style=for-the-badge&logo=youtube" alt="YouTube Profile"> "https://www.facebook.com/azhar.saleem1472/" target="_blank"> https://img.shields.io/badge/Facebook-Profile-blue?style=for-the-badge&logo=facebook" alt="Facebook Profile"> "https://www.tiktok.com/@azhar_saleem18" target="_blank"> https://img.shields.io/badge/TikTok-Profile-blue?style=for-the-badge&logo=tiktok" alt="TikTok Profile">
    "https://twitter.com/azhar_saleem18" target="_blank"> https://img.shields.io/badge/Twitter-Profile-blue?style=for-the-badge&logo=twitter" alt="Twitter Profile"> "https://www.instagram.com/azhar_saleem18/" target="_blank"> https://img.shields.io/badge/Instagram-Profile-blue?style=for-the-badge&logo=instagram" alt="Instagram Profile"> "mailto:azharsaleem6@gmail.com"> https://img.shields.io/badge/Email-Contact%20Me-red?style=for-the-badge&logo=gmail" alt="Email Contact">

    Dataset Description

    This dataset represents a comprehensive collection of vehicle listings from PakWheels.com, Pakistan's largest automobile website, as of 2024. It includes detailed information about various aspects of vehicles available for sale across Pakistan, including their prices, models, mileage, engine capacity, and age. This data offers a snapshot of the current automobile market in Pakistan, providing insights into vehicle valuation trends, consumer preferences, and market dynamics.

    The dataset is designed for anyone interested in the Pakistani automobile market, whether they are buyers, sellers, car enthusiasts, analysts, or researchers. It provides a foundational dataset for a wide range of analytical and predictive tasks.

    Columns Description

    1. url: The URL of the vehicle listing on PakWheels.com. This can be used to reference the original listing for more detailed information or verification.
    2. title: The title of the listing, which typically includes the make, model, and year of the vehicle.
    3. price: The listed price of the vehicle in Pakistani Rupees. This is crucial for market analysis and price prediction models.
    4. city: The city in Pakistan where the vehicle is located, useful for regional market analysis.
    5. model: The manufacturing year of the vehicle. This is often a determinant of price depreciation.
    6. mileage: The total mileage of the vehicle in kilometers, which affects the vehicle’s condition and price.
    7. fuel_type: Type of fuel the vehicle uses (e.g., Petrol, Diesel, Hybrid), important for segment analysis based on fuel economy and environmental impact.
    8. transmission: The type of transmission (e.g., Automatic, Manual), which influences the driving experience and market value.
    9. registered: The city in which the vehicle is registered. This can impact the resale value and legality of the sale.
    10. color: The color of the vehicle, which can be a factor in buyer preference.
    11. assembly: Whether the vehicle was locally assembled or imported, affecting taxes, duties, and price.
    12. engine_capacity: The engine capacity of the vehicle in cubic centimeters (cc), which is a key indicator of performance.
    13. post_date: The date the listing was posted, useful for tracking market trends over time.
    14. price_category: Categorical representation of the vehicle's price range (e.g., Low, Medium, High).
    15. mileage_category: Categorical assessment of the vehicle's mileage (e.g., Low, Medium, High).
    16. post_day_of_week: The day of the week the listing was posted, which might influence how quickly a vehicle sells.
    17. vehicle_age: The age of the vehicle calculated from the model year to the current year, affecting depreciation and market value.

    Usage and Potential Applications in Data Analysis

    Market Analysis

    • Pricing Trends: Analyze how the prices of vehicles change based on age, mileage, and other factors. This can help in un...
  13. Turkish agriculture dataset for LLM finetuning

    • kaggle.com
    zip
    Updated Oct 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    batuhan kalem (2024). Turkish agriculture dataset for LLM finetuning [Dataset]. https://www.kaggle.com/datasets/batuhankalem/turkish-agriculture-dataset-for-llm-finetuning
    Explore at:
    zip(3265927 bytes)Available download formats
    Dataset updated
    Oct 16, 2024
    Authors
    batuhan kalem
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Turkish Education LLM Finetune Dataset

    This dataset is designed to fine-tune the T3 AI Turkish LLM. It was created by Barathan Aslan, Ömer Faruk Çelik, and Batuhan Kalem for the T3 AI Hackathon. The dataset focuses on Turkish Agriculture.

    Contributors: Barathan Aslan (https://www.linkedin.com/in/barathan-aslan-715897218/) Batuhan Kalem (https://www.linkedin.com/in/batuhankalem/) Ömer Faruk Çelik (https://www.linkedin.com/in/ömerfarukçelik/)

    Dataset Creation

    Question-answer pairs were generated using Gemini 1.5 Flash with multiple chains of prompts. Scoring and quality assessment were performed using Gemini 1.5 Pro.

    Recommendation: For optimal fine-tuning results, we suggest excluding rows with a score value lower than 6.

    How to Use

    Dataset provided can be used for:

    Fine-tuning the T3 AI Turkish LLM. Natural language processing (NLP) tasks focused on the Turkish language. The datasets are scored based on the quality and relevance of the content, with higher scores indicating better quality.

    Additionally it should be noted that:

    -1 represents the "Safety" category. -2 indicates rows that were "Not Scored."

  14. Turkish education dataset for LLM finetuning

    • kaggle.com
    zip
    Updated Oct 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    batuhan kalem (2024). Turkish education dataset for LLM finetuning [Dataset]. https://www.kaggle.com/datasets/batuhankalem/turkish-education-dataset-for-llm-finetuning
    Explore at:
    zip(4715873 bytes)Available download formats
    Dataset updated
    Oct 16, 2024
    Authors
    batuhan kalem
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Turkish Education LLM Finetune Dataset

    This dataset is designed to fine-tune the T3 AI Turkish LLM. It was created by Barathan Aslan, Ömer Faruk Çelik, and Batuhan Kalem for the T3 AI Hackathon. The dataset focuses on Turkish Education Sytem.

    Contributors: Barathan Aslan (https://www.linkedin.com/in/barathan-aslan-715897218/) Batuhan Kalem (https://www.linkedin.com/in/batuhankalem/) Ömer Faruk Çelik (https://www.linkedin.com/in/ömerfarukçelik/)

    Dataset Creation

    Question-answer pairs were generated using Gemini 1.5 Flash with multiple chains of prompts. Scoring and quality assessment were performed using Gemini 1.5 Pro.

    Recommendation: For optimal fine-tuning results, we suggest excluding rows with a score value lower than 6.

    How to Use

    Dataset provided can be used for:

    Fine-tuning the T3 AI Turkish LLM. Natural language processing (NLP) tasks focused on the Turkish language. The datasets are scored based on the quality and relevance of the content, with higher scores indicating better quality.

    Additionally it should be noted that:

    -1 represents the "Safety" category. -2 indicates rows that were "Not Scored."

  15. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Business of Apps (2018). LinkedIn Usage and Revenue Statistics (2026) [Dataset]. https://www.businessofapps.com/data/linkedin-statistics/

LinkedIn Usage and Revenue Statistics (2026)

Explore at:
45 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Nov 12, 2018
Dataset authored and provided by
Business of Apps
License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

LinkedIn is the world’s preeminent social network for professionals. Members create CVs, list their current and previous job roles, skills and education. The business network is also a recruiting...

Search
Clear search
Close search
Google apps
Main menu