34 datasets found
  1. Countries with the most Facebook users 2024

    • statista.com
    • ai-chatbox.pro
    Cite
    Stacy Jo Dixon, Countries with the most Facebook users 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Which country has the most Facebook users?

    There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country, it would rank third in terms of population worldwide. Apart from India, several other markets have more than 100 million Facebook users each: the United States, Indonesia, and Brazil, with 193.8 million, 119.05 million, and 112.55 million Facebook users, respectively.
    
    Facebook – the most used social media
    
    Meta, the company previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media platform worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.
    
    Facebook usage by device
    As of July 2021, 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, around 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is available not only through mobile browsers: the company has also published several mobile apps for users to access its products and services. As of the third quarter of 2021, the four core Meta products led the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.
    
  2. Facebook users in the United States 2019-2028

    • statista.com
    • ai-chatbox.pro
    Updated Dec 12, 2024
    Cite
    Statista (2024). Facebook users in the United States 2019-2028 [Dataset]. https://www.statista.com/statistics/408971/number-of-us-facebook-users/
    Dataset updated
    Dec 12, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    The number of Facebook users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 12.6 million users (+5.04 percent). After the ninth consecutive year of growth, the Facebook user base is estimated to reach a new peak of 262.8 million users in 2028. Notably, the number of Facebook users has increased continuously over the past years.

    User figures for Facebook shown here have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once.

    The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
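
    The forecast figures are internally consistent: a 12.6 million increase ending at 262.8 million implies a 2024 base of about 250.2 million, and 12.6 / 250.2 is roughly 5.04 percent. A minimal check in Python, using only the numbers stated in the description:

```python
def percent_growth(end_value: float, increase: float) -> float:
    """Return growth in percent, given the final value and the absolute increase."""
    start_value = end_value - increase  # implied starting value
    return increase / start_value * 100


# Figures from the description above, in millions of users.
end_2028 = 262.8
increase_2024_2028 = 12.6

start_2024 = end_2028 - increase_2024_2028
growth = percent_growth(end_2028, increase_2024_2028)
print(f"Implied 2024 base: {start_2024:.1f} million")
print(f"Growth 2024-2028: {growth:.2f} percent")
```

    Deriving the start value from the end value and the increase avoids relying on a 2024 figure that the description does not state directly.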

  3. Meta updated stocks complete dataset

    • kaggle.com
    Updated Mar 15, 2025
    Cite
    M Atif Latif (2025). Meta updated stocks complete dataset [Dataset]. https://www.kaggle.com/datasets/matiflatif/meta-stocks-complete-data-set
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    M Atif Latif
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset contains daily stock data for Meta Platforms, Inc. (META), formerly Facebook Inc., from May 19, 2012, to January 20, 2025. It offers a comprehensive view of Meta’s stock performance and market fluctuations during a period of significant growth, acquisitions, and technological advancements. This dataset is valuable for financial analysis, market prediction, machine learning projects, and evaluating the impact of Meta’s business decisions on its stock price.

    Content

    The dataset includes the following key features:

    • Open: Stock price at the start of the trading day.
    • High: Highest stock price during the trading day.
    • Low: Lowest stock price during the trading day.
    • Close: Stock price at the end of the trading day.
    • Adj Close: Adjusted closing price, accounting for corporate actions like stock splits, dividends, and other financial adjustments.
    • Volume: Total number of shares traded during the trading day.

    Variables

    • Date: The date of the trading day, formatted as YYYY-MM-DD.
    • Open: The stock price at the start of the trading day.
    • High: The highest price reached by the stock during the trading day.
    • Low: The lowest price reached by the stock during the trading day.
    • Close: The stock price at the end of the trading day.
    • Adj Close: The adjusted closing price, which reflects corporate actions like stock splits and dividend payouts.
    • Volume: The total number of shares traded on that specific day.
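
    Because Adj Close accounts for splits and dividends, it is the usual column for computing returns across long periods. A minimal standard-library sketch, assuming a CSV with the columns listed above, sorted by ascending Date (an assumption; sort first if your copy differs). The file name is not given, so the sample rows below are hypothetical:

```python
import csv
import io


def daily_returns(csv_text: str) -> list[tuple[str, float]]:
    """Compute simple daily returns from the 'Adj Close' column.

    Assumes rows are already sorted by Date in ascending order.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    returns = []
    for prev, curr in zip(rows, rows[1:]):
        prev_close = float(prev["Adj Close"])
        curr_close = float(curr["Adj Close"])
        returns.append((curr["Date"], curr_close / prev_close - 1))
    return returns


# Hypothetical three-day sample using the column layout described above.
sample = """Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,100.0,102.0,99.0,101.0,100.0,1000
2024-01-03,101.0,106.0,100.5,105.0,105.0,1200
2024-01-04,105.0,105.5,98.0,99.75,99.75,900
"""

for date, r in daily_returns(sample):
    print(date, f"{r:+.2%}")
```

    Using Close instead of Adj Close would show artificial jumps on dates with corporate actions, which is why the sketch reads the adjusted column.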

    Acknowledgements

    This dataset was sourced from reliable public APIs such as Yahoo Finance or Alpha Vantage. It is provided for educational and research purposes and is not affiliated with Meta Platforms, Inc. Users are encouraged to adhere to the terms of use of the original data provider.

  4. 📊 Meta Kaggle| Kaggle Users' Stats

    • kaggle.com
    zip
    Updated Jun 4, 2025
    Cite
    BwandoWando (2025). 📊 Meta Kaggle| Kaggle Users' Stats [Dataset]. https://www.kaggle.com/datasets/bwandowando/meta-kaggle-users-stats/suggestions
    Available download formats: zip (0 bytes)
    Dataset updated
    Jun 4, 2025
    Authors
    BwandoWando
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description


    History

    • 03Mar2025: When determining the last content shared, I now use the latest version of each Model, Dataset, and Notebook rather than the creation date of its very first version. I also added reaction counts, which come from a new CSV added to the Meta Kaggle dataset. The discussion can be found here. I also added counts of versions created for Models, Notebooks, and Datasets to properly track users who update their datasets.
    • 04Feb2025: Fixed an issue where the ModelUpvotesGiven and ModelUpvotesReceived values were identical.
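
    The 03Mar2025 change swaps the date used for "last content shared" from each item's first version to its latest version. A minimal sketch of the difference, assuming hypothetical (user, item, version date) records rather than the actual Meta Kaggle tables:

```python
from collections import defaultdict
from datetime import date

# Hypothetical version records: (user, item_id, version_date).
versions = [
    ("alice", "dataset-1", date(2023, 1, 10)),
    ("alice", "dataset-1", date(2025, 2, 20)),  # later update of the same dataset
    ("alice", "notebook-7", date(2024, 6, 1)),
    ("bob", "model-3", date(2024, 11, 5)),
]


def last_shared(records, use_latest_version: bool) -> dict[str, date]:
    """Per user, the most recent date across their items.

    Each item is dated by its first version (old behaviour) or by its
    latest version (new behaviour).
    """
    per_item = defaultdict(list)
    for user, item, d in records:
        per_item[(user, item)].append(d)
    result: dict[str, date] = {}
    for (user, _item), dates in per_item.items():
        item_date = max(dates) if use_latest_version else min(dates)
        result[user] = max(result.get(user, item_date), item_date)
    return result


print(last_shared(versions, use_latest_version=False))  # alice: 2024-06-01
print(last_shared(versions, use_latest_version=True))   # alice: 2025-02-20
```

    Under the old rule, alice's 2025 update to an existing dataset would not count as recent activity; under the new rule it does.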

    Context

    Aggregated per-user stats and data, computed from the official Meta Kaggle dataset.

    Note

    Expect some discrepancies from the counts shown on your profile: there is a lag of one to two days before a new version of the dataset is published, and some information, such as Kaggle staff upvotes and private competitions, is not included. For almost all members, however, the figures should reconcile.

    Notebook updater

    📊 (Scheduled) Meta Kaggle Users' Stats


  5. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Available download formats: zip (151,045,619,431 bytes)
    Dataset updated
    Jul 31, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle, which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is that Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions the code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
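
    Because each code file is named after its KernelVersions id, joining reduces to matching the file-name stem against that id column. A minimal sketch, assuming the KernelVersions CSV has an Id column and that code files carry a language extension such as .ipynb; the ScriptId and Title values and the file paths below are hypothetical:

```python
import csv
import io
from pathlib import PurePosixPath


def index_kernel_versions(csv_text: str) -> dict[int, dict[str, str]]:
    """Map KernelVersions Id -> row. Assumes an 'Id' column (check your copy)."""
    return {int(row["Id"]): row for row in csv.DictReader(io.StringIO(csv_text))}


def join_code_files(paths: list[str], versions: dict[int, dict[str, str]]):
    """Pair each code file with its KernelVersions row via the file-name stem."""
    joined = []
    for p in paths:
        version_id = int(PurePosixPath(p).stem)  # file name matches the KernelVersions id
        row = versions.get(version_id)  # None if the id is absent from the CSV
        joined.append((p, row))
    return joined


# Hypothetical excerpt of the KernelVersions CSV.
sample_csv = """Id,ScriptId,Title
123456789,42,Hypothetical notebook A
123456790,43,Hypothetical notebook B
"""

versions = index_kernel_versions(sample_csv)
files = ["123/456/123456789.ipynb", "123/456/123456001.py"]
for path, row in join_code_files(files, versions):
    print(path, row["Title"] if row else "<no matching KernelVersions row>")
```

    Since Meta Kaggle Code holds only a subset of KernelVersions (commit sessions), every code file should normally find a row, while many KernelVersions rows will have no file.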

    File organization

    The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1,000 files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each subfolder will have far fewer than 1,000 files due to private and interactive sessions.
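
    The two-level layout maps any version id to its folder with integer division. A minimal sketch, assuming folder names are plain unpadded integers (the description does not say whether they are zero-padded):

```python
def version_dir(version_id: int) -> str:
    """Return the 'top/sub' folder that holds a given kernel version id.

    Top level: id // 1,000,000 (e.g. 123 for 123,000,000-123,999,999).
    Sub level: (id // 1,000) % 1,000 (e.g. 456 for 123,456,000-123,456,999).
    """
    if version_id < 0:
        raise ValueError("version ids are non-negative")
    top = version_id // 1_000_000
    sub = (version_id // 1_000) % 1_000
    return f"{top}/{sub}"


print(version_dir(123_456_789))  # 123/456
```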

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  6. Supporting data for "A Meta-Intervention: Quantifying the Impact of Social...

    • datahub.hku.hk
    • figshare.com
    Updated May 23, 2025
    Cite
    Mingzhe Quan (2025). Supporting data for "A Meta-Intervention: Quantifying the Impact of Social Media Information on Adherence to Non-Pharmaceutical Interventions" [Dataset]. http://doi.org/10.25442/hku.29068061.v1
    Dataset updated
    May 23, 2025
    Dataset provided by
    HKU Data Repository
    Authors
    Mingzhe Quan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset supports a research project in the field of digital medicine, which aims to quantify the impact of disseminating scientific information on social media, as a form of "meta-intervention", on public adherence to Non-Pharmaceutical Interventions (NPIs) during health crises such as the COVID-19 pandemic. The research encompasses multiple sub-studies and pilot experiments, drawing data from various global and China-specific social media platforms.

    The data included in this submission have been collected from several sources:

    • From Sina Weibo and Tencent WeChat, 189 online poll datasets were collected, involving a total of 1,391,706 participants, all users of Sina Weibo or Tencent WeChat.
    • From Twitter, 187 tweets related to COVID-19 published by scientists (verified with a blue checkmark) were collected.
    • From Xiaohongshu and Bilibili, textual content from 143 user posts/videos concerning COVID-19 was gathered, along with associated user comments and specific user responses to a question.

    While the broader research project also used a 3TB Reddit corpus hosted on Academic Torrents (academictorrents.com), that Reddit dataset is publicly available directly from Academic Torrents and is not included in this DataHub submission.

    The submitted dataset comprises publicly available data, formatted as Excel files (.xlsx), and includes the following files:

    • Filename: scientists' discourse (source from screenshot of tweets)
      Description: This file contains screenshots of tweets published by scientists on Twitter concerning COVID-19 research, its current status, and related topics. It also includes a coded analysis of the textual content of these tweets. Details of the coding scheme can be found in the readme.txt file.
    • Filename: The links of online polls (Weibo & WeChat)
      Description: This file includes information from online polls conducted on Weibo and WeChat after December 7, 2022. These polls, often initiated by verified users (who may or may not be science popularizers), aimed to track the self-reported proportion of participants testing positive for COVID-19 (via PCR or rapid antigen test) or remaining negative, particularly during periods of rapid Omicron spread. The file contains links to the original polls, links to the social media accounts that published them, and relevant metadata about both the poll-creating accounts and the polls themselves.
    • Filename: Online posts & comments (From Xiaohongshu & Bilibili)
      Description: This file contains textual content from COVID-19-related posts and videos published by users on Xiaohongshu and Bilibili. It also includes user comments reacting to these posts/videos, as well as user responses to a specific question posed within the original content.

    Key features of this dataset:

    • Data type: Mixed, including textual data, screenshots of social media posts, web links to original sources, and coded metadata.
    • Source platforms: Twitter (global), Weibo/WeChat (primarily China), Xiaohongshu (China), and Bilibili (video-sharing platform, primarily China).
    • Use case: Analysis of public discourse, the dissemination of scientific information, and user engagement patterns across different cultural contexts and social media platforms, particularly in relation to public health information.

  7. US Stock Market and Commodities Data (2020-2024)

    • kaggle.com
    Updated Sep 1, 2024
    Cite
    Muhammad Ehsan (2024). US Stock Market and Commodities Data (2020-2024) [Dataset]. https://www.kaggle.com/datasets/muhammadehsan02/us-stock-market-and-commodities-data-2020-2024/code
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhammad Ehsan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The US_Stock_Data.csv dataset offers a comprehensive view of the US stock market and related financial instruments, spanning from January 2, 2020, to February 2, 2024. This dataset includes 39 columns, covering a broad spectrum of financial data points such as prices and volumes of major stocks, indices, commodities, and cryptocurrencies. The data is presented in a structured CSV file format, making it easily accessible and usable for various financial analyses, market research, and predictive modeling. This dataset is ideal for anyone looking to gain insights into the trends and movements within the US financial markets during this period, including the impact of major global events.

    Key Features and Data Structure

    The dataset captures daily financial data across multiple assets, providing a well-rounded perspective of market dynamics. Key features include:

    • Commodities: Prices and trading volumes for natural gas, crude oil, copper, platinum, silver, and gold.
    • Cryptocurrencies: Prices and volumes for Bitcoin and Ethereum, including detailed 5-minute interval data for Bitcoin.
    • Stock Market Indices: Data for major indices such as the S&P 500 and Nasdaq 100.
    • Individual Stocks: Prices and volumes for major companies including Apple, Tesla, Microsoft, Google, Nvidia, Berkshire Hathaway, Netflix, Amazon, and Meta.

    The dataset’s structure is designed for straightforward integration into various analytical tools and platforms. Each column is dedicated to a specific asset's daily price or volume, enabling users to perform a wide range of analyses, from simple trend observations to complex predictive models. The inclusion of intraday data for Bitcoin provides a detailed view of market movements.

    Applications and Usability

    This dataset is highly versatile and can be utilized for various financial research purposes:

    • Market Analysis: Track the performance of key assets, compare volatility, and study correlations between different financial instruments.
    • Risk Assessment: Analyze the impact of commodity price movements on related stock prices and evaluate market risks.
    • Educational Use: Serve as a resource for teaching market trends, asset correlation, and the effects of global events on financial markets.
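
    Correlations between instruments are usually computed on returns rather than raw prices, since trending prices correlate spuriously. A minimal standard-library sketch of the Pearson correlation of daily returns for two price columns; the column names (Gold_Price, Silver_Price) and values are hypothetical, so check the real header of US_Stock_Data.csv, and the sketch assumes rows are in chronological order:

```python
import csv
import io
import math


def to_float(value: str) -> float:
    """Parse a price that may contain thousands separators, e.g. '1,234.5'."""
    return float(value.replace(",", ""))


def simple_returns(prices: list[float]) -> list[float]:
    """Day-over-day simple returns."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def return_correlation(csv_text: str, col_a: str, col_b: str) -> float:
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    a = simple_returns([to_float(r[col_a]) for r in rows])
    b = simple_returns([to_float(r[col_b]) for r in rows])
    return pearson(a, b)


# Hypothetical columns and values; sort by date first if your copy is not chronological.
sample = """Date,Gold_Price,Silver_Price
2024-01-02,"2,000.0",25.0
2024-01-03,"2,020.0",25.5
2024-01-04,"1,999.8",25.0
2024-01-05,"2,039.8",26.0
"""
print(round(return_correlation(sample, "Gold_Price", "Silver_Price"), 3))
```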

    The dataset’s daily granularity supports detailed time-series analysis within the covered period. Whether for academic research, market analysis, or financial modeling, the US_Stock_Data.csv dataset provides a valuable foundation for exploring the complexities of financial markets over the specified period.

    Acknowledgements:

    This dataset would not be possible without the contributions of Dhaval Patel, who initially curated the US stock market data spanning from 2020 to 2024. Full credit goes to Dhaval Patel for creating and maintaining the dataset. You can find the original dataset here: US Stock Market 2020 to 2024.

  8. Crossroad Camera Dataset - Mobility Aid Users

    • repository.tugraz.at
    zip
    Updated May 13, 2025
    Cite
    Ludwig Mohr; Nadezda Kirillova; Horst Possegger; Horst Bischof (2025). Crossroad Camera Dataset - Mobility Aid Users [Dataset]. http://doi.org/10.3217/2gat1-pev27
    Available download formats: zip
    Dataset updated
    May 13, 2025
    Dataset provided by
    Graz University of Technology
    Authors
    Ludwig Mohr; Nadezda Kirillova; Horst Possegger; Horst Bischof
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Oct 2022
    Description

    Pedestrians using mobility aids are the most vulnerable group of traffic participants. While there has been significant progress in the robustness and reliability of camera-based general pedestrian detection systems, pedestrians reliant on mobility aids are highly underrepresented in common datasets for object detection and classification.

    To bridge this gap and enable research towards robust and reliable detection systems which may be employed in traffic monitoring, scheduling, and planning, we present this dataset of a pedestrian crossing scenario taken from an elevated traffic monitoring perspective together with ground truth annotations (Yolo format [1]). Classes present in the dataset are pedestrian (without mobility aids), as well as pedestrians using wheelchairs, rollators/wheeled walkers, crutches, and walking canes. The dataset comes with official training, validation, and test splits.
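
    In the Yolo label format, each label file holds one line per object: a class index followed by the box center x, center y, width, and height, all normalized by the image size. A minimal sketch that converts such lines to pixel corner coordinates; the class indices and image size in the example are hypothetical, and the actual class order is presumably defined in the dataset's dataset.yaml:

```python
from dataclasses import dataclass


@dataclass
class Box:
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_label(text: str, img_w: int, img_h: int) -> list[Box]:
    """Parse lines of 'class x_center y_center width height' (normalized to [0, 1])."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cls = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:5])
        boxes.append(Box(
            class_id=cls,
            x_min=(xc - w / 2) * img_w,
            y_min=(yc - h / 2) * img_h,
            x_max=(xc + w / 2) * img_w,
            y_max=(yc + h / 2) * img_h,
        ))
    return boxes


# Hypothetical label content for a 1000 x 500 image.
for box in parse_yolo_label("0 0.5 0.5 0.2 0.4\n3 0.25 0.75 0.1 0.1\n", 1000, 500):
    print(box)
```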

    An in-depth description of the dataset can be found in [2]. If you make use of this dataset in your work, research or publication, please cite this work as:

    @inproceedings{mohr2023mau,
    author = {Mohr, Ludwig and Kirillova, Nadezda and Possegger, Horst and Bischof, Horst},
    title = {{A Comprehensive Crossroad Camera Dataset of Mobility Aid Users}},
    booktitle = {Proceedings of the 34th British Machine Vision Conference ({BMVC}2023)},
    year = {2023}
    }

    Archive mobility.zip contains the full detection dataset in Yolo format with images, ground truth labels, and metadata. Archive mobility_class_hierarchy.zip contains labels and meta files (Yolo format) for training with a class hierarchy, using, e.g., the modified version of Yolo v5/v8 available under [3].
    To use this dataset with Yolo, download and extract the zip archive, then change the path entry in dataset.yaml to the directory where you extracted the archive.
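
    The path change can also be scripted. A minimal sketch that treats dataset.yaml as plain text and rewrites only the top-level path line; it assumes that key starts at column 0 on a single line, and the sample contents and paths below are hypothetical. A real YAML parser (e.g. PyYAML) would be more robust but is not in the standard library:

```python
import re


def set_dataset_path(yaml_text: str, new_path: str) -> str:
    """Replace the value of the top-level 'path:' key, leaving other lines untouched.

    Plain-text edit, not a YAML parser: assumes 'path:' starts at column 0.
    """
    pattern = re.compile(r"^path:.*$", re.MULTILINE)
    if not pattern.search(yaml_text):
        raise ValueError("no top-level 'path:' entry found")
    # A function replacement avoids backslash escapes in Windows paths.
    return pattern.sub(lambda _m: f"path: {new_path}", yaml_text, count=1)


# Hypothetical dataset.yaml contents.
original = "path: /original/location\ntrain: images/train\nval: images/val\n"
print(set_dataset_path(original, "/data/mobility"))
```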

    [1] https://github.com/ultralytics/ultralytics
    [2] coming soon
    [3] coming soon

  9. Data from: Using multiple imputation to estimate missing data in...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Nov 25, 2015
    Cite
    E. Hance Ellington; Guillaume Bastille-Rousseau; Cayla Austin; Kristen N. Landolt; Bruce A. Pond; Erin E. Rees; Nicholas Robar; Dennis L. Murray (2015). Using multiple imputation to estimate missing data in meta-regression [Dataset]. http://doi.org/10.5061/dryad.m2v4m
    Available download formats: zip
    Dataset updated
    Nov 25, 2015
    Dataset provided by
    Trent University
    University of Prince Edward Island
    Authors
    E. Hance Ellington; Guillaume Bastille-Rousseau; Cayla Austin; Kristen N. Landolt; Bruce A. Pond; Erin E. Rees; Nicholas Robar; Dennis L. Murray
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description
    1. There is a growing need for scientific synthesis in ecology and evolution. In many cases, meta-analytic techniques can be used to complement such synthesis. However, missing data is a serious problem for any synthetic efforts and can compromise the integrity of meta-analyses in these and other disciplines. Currently, the prevalence of missing data in meta-analytic datasets in ecology and the efficacy of different remedies for this problem have not been adequately quantified.

    2. We generated meta-analytic datasets based on literature reviews of experimental and observational data and found that missing data were prevalent in meta-analytic ecological datasets. We then tested the performance of complete case removal (a widely used method when data are missing) and multiple imputation (an alternative method for data recovery) and assessed model bias, precision, and multi-model rankings under a variety of simulated conditions using published meta-regression datasets.

    3. We found that complete case removal led to biased and imprecise coefficient estimates and yielded poorly specified models. In contrast, multiple imputation provided unbiased parameter estimates with only a small loss in precision. The performance of multiple imputation, however, was dependent on the type of data missing. It performed best when missing values were weighting variables, but performance was mixed when missing values were predictor variables. Multiple imputation performed poorly when imputing raw data that were then used to calculate effect size and the weighting variable.

    4. We conclude that complete case removal should not be used in meta-regression, and that multiple imputation has the potential to be an indispensable tool for meta-regression in ecology and evolution. However, we recommend that users assess the performance of multiple imputation by simulating missing data on a subset of their data before implementing it to recover actual missing data.
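
    The abstract does not spell out how results from the multiple imputed datasets are combined. A common choice is Rubin's rules, shown below only as an illustration of that standard pooling step, not as the authors' exact procedure: with m imputations, the pooled estimate is the mean of the estimates, W is the mean of their squared standard errors, B is the sample variance of the estimates, and the total variance is W + (1 + 1/m) B.

```python
import math
import statistics


def pool_rubin(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Pool per-imputation estimates with Rubin's rules.

    variances are the squared standard errors from each imputed dataset.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching variances")
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (n - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


est, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(se ** 2, 4))  # 2.0 1.8333
```

    The (1 + 1/m) factor inflates the between-imputation term to account for using a finite number of imputations, which is where the "small loss in precision" of imputation shows up.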
  10. AIT Alert Data Set

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 14, 2024
    Cite
    Landauer, Max (2024). AIT Alert Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8263180
    Dataset updated
    Oct 14, 2024
    Dataset provided by
    Landauer, Max
    Skopik, Florian
    Wurzenberger, Markus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the AIT Alert Data Set (AIT-ADS), a collection of synthetic alerts suitable for evaluating alert aggregation, alert correlation, alert filtering, and attack graph generation approaches. The alerts were forensically generated from the AIT Log Data Set V2 (AIT-LDSv2) and originate from three intrusion detection systems, namely Suricata, Wazuh, and AMiner. The data sets comprise eight scenarios, each of which has been targeted by a multi-step attack with attack steps such as scans, web application exploits, password cracking, remote command execution, privilege escalation, etc. Each scenario and attack chain has certain variations, so attack manifestations and resulting alert sequences vary across scenarios; this means that the data set allows researchers to develop and evaluate approaches that compute similarities of attack chains or merge them into meta-alerts. Since only a few benchmark alert data sets are publicly available, the AIT-ADS was developed to address common issues in the research domain of multi-step attack analysis; specifically, the alert data set contains many false positives caused by normal user behavior (e.g., user login attempts or software updates), heterogeneous alert formats (although all alerts are in JSON format, their fields differ for each IDS), repeated executions of attacks according to an attack plan, collection of alerts from diverse log sources (application logs and network traffic) and all components in the network (mail server, web server, DNS, firewall, file share, etc.), and labels for attack phases. For more information on how this alert data set was generated, check out our paper accompanying this data set [1] or our GitHub repository. More information on the original log data set, including a detailed description of scenarios and attacks, can be found in [2].

    The alert data set contains two files for each of the eight scenarios, and a file for their labels:

    _aminer.json contains alerts from AMiner IDS

    _wazuh.json contains alerts from Wazuh IDS and Suricata IDS

    labels.csv contains the start and end times of attack phases in each scenario
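
    Since labels.csv gives the start and end times of attack phases, an alert can be assigned a phase by checking which window contains its timestamp. A minimal sketch, assuming hypothetical column names (scenario, attack, start, end) holding numeric timestamps; check the actual header of labels.csv before adapting it:

```python
import csv
import io

# Hypothetical layout of labels.csv; the real column names and time format may differ.
LABELS = """scenario,attack,start,end
fox,scan,1000,1100
fox,webshell,1200,1300
harrison,scan,5000,5100
"""


def load_phases(csv_text: str, scenario: str) -> list[tuple[str, float, float]]:
    """Read (phase, start, end) windows for one scenario."""
    phases = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["scenario"] == scenario:
            phases.append((row["attack"], float(row["start"]), float(row["end"])))
    return phases


def label_alert(timestamp: float, phases: list[tuple[str, float, float]]) -> str:
    """Return the phase whose [start, end] window contains the timestamp, else 'none'."""
    for name, start, end in phases:
        if start <= timestamp <= end:
            return name
    return "none"


phases = load_phases(LABELS, "fox")
for ts in (1050, 1250, 1500):
    print(ts, label_alert(ts, phases))
```

    Note that a time window alone does not prove an alert was caused by the attack: false positives from normal user behavior can also fall inside a phase window.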

    Besides false positive alerts, the alerts in the AIT-ADS correspond to the following attacks:

    Scans (nmap, WPScan, dirb)

    Webshell upload (CVE-2020-24186)

    Password cracking (John the Ripper)

    Privilege escalation

    Remote command execution

    Data exfiltration (DNSteal) and stopped service

    The data set involves a total of 2,655,821 alerts, of which 2,293,628 originate from Wazuh, 306,635 from Suricata, and 55,558 from AMiner. The numbers of alerts per scenario are as follows: fox: 473,104; harrison: 593,948; russellmitchell: 45,544; santos: 130,779; shaw: 70,782; wardbeck: 91,257; wheeler: 616,161; wilson: 634,246.
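
    Both breakdowns of the alert counts add up to the stated total. A short consistency check, using only the figures given above:

```python
# Alert counts as stated in the description.
by_ids = {"Wazuh": 2_293_628, "Suricata": 306_635, "AMiner": 55_558}
by_scenario = {
    "fox": 473_104, "harrison": 593_948, "russellmitchell": 45_544,
    "santos": 130_779, "shaw": 70_782, "wardbeck": 91_257,
    "wheeler": 616_161, "wilson": 634_246,
}
TOTAL = 2_655_821

# Both breakdowns must reproduce the stated total.
assert sum(by_ids.values()) == TOTAL
assert sum(by_scenario.values()) == TOTAL

for name, count in by_ids.items():
    print(f"{name}: {count / TOTAL:.1%}")
```

    The shares make the imbalance explicit: Wazuh alerts dominate, which matters when evaluating filtering or aggregation approaches per IDS.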

    Acknowledgements: Partially funded by the European Defence Fund (EDF) projects AInception (101103385) and NEWSROOM (101121403), and the FFG project PRESENT (FO999899544). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. The European Union cannot be held responsible for them.

    If you use the AIT-ADS, please cite the following publications:

    [1] Landauer, M., Skopik, F., Wurzenberger, M. (2024): Introducing a New Alert Data Set for Multi-Step Attack Analysis. Proceedings of the 17th Cyber Security Experimentation and Test Workshop.

    [2] Landauer, M., Skopik, F., Frank, M., Hotwagner, W., Wurzenberger, M., Rauber, A. (2023): Maintainable Log Datasets for Evaluation of Intrusion Detection Systems. IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482.

  11. Data from: KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 19, 2024
    Cite
    Quaranta, Luigi (2024). KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4468522
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Calefato, Fabio
    Quaranta, Luigi
    Lanubile, Filippo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    KGTorrent is a dataset of Python Jupyter notebooks from the Kaggle platform.

    The dataset is accompanied by a MySQL database containing metadata about the notebooks and the activity of Kaggle users on the platform. The information to build the MySQL database has been derived from Meta Kaggle, a publicly available dataset containing Kaggle metadata.

    In this package, we share the complete KGTorrent dataset (consisting of the dataset itself plus its companion database), as well as the specific version of Meta Kaggle used to build the database.

    More specifically, the package comprises the following three compressed archives:

    KGT_dataset.tar.bz2, the dataset of Jupyter notebooks;

    KGTorrent_dump_10-2020.sql.tar.bz2, the dump of the MySQL companion database;

    MetaKaggle27Oct2020.tar.bz2, a copy of the Meta Kaggle version used to build the database.

    Moreover, we include KGTorrent_logical_schema.pdf, the logical schema of the KGTorrent MySQL database.
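    The three archives listed above are bzip2-compressed tarballs. The sketch below, which assumes the archives have been downloaded into a single local directory (the directory layout and the helper names extract_archive and extract_all are hypothetical), unpacks whichever of them are present using only Python's standard tarfile module:

```python
import tarfile
from pathlib import Path

# Archive names as listed in the package description.
ARCHIVES = [
    "KGT_dataset.tar.bz2",
    "KGTorrent_dump_10-2020.sql.tar.bz2",
    "MetaKaggle27Oct2020.tar.bz2",
]


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract one .tar.bz2 archive into dest and return its member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:bz2") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter when this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names


def extract_all(download_dir: Path, dest: Path) -> dict[str, list[str]]:
    """Extract every KGTorrent archive found in download_dir into its own folder."""
    results = {}
    for name in ARCHIVES:
        archive = download_dir / name
        if archive.exists():
            target = dest / name.removesuffix(".tar.bz2")
            results[name] = extract_archive(archive, target)
    return results
```

    The notebook archive is large, so extracting it to a disk with ample free space is advisable; the SQL dump would then still need to be imported into MySQL separately.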

  12. Standardized Hudup dataset based on Film Trust data

    • data.mendeley.com
    • dataverse.harvard.edu
    • +1 more
    Updated Feb 16, 2021
    Loc Nguyen (2021). Standardized Hudup dataset based on Film Trust data [Dataset]. http://doi.org/10.17632/2jbwdpz2ty.1
    Dataset updated
    Feb 16, 2021
    Authors
    Loc Nguyen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The standardized Hudup dataset receives information from raw data and is composed of ten units: “hdp_config”, “hdp_account”, “hdp_attribute_map”, “hdp_nominal”, “hdp_user”, “hdp_item”, “hdp_rating”, “hdp_context_template”, “hdp_context”, and “hdp_sample”. Each unit has particular functions, which are described in the data description section. The Hudup dataset is metadata that models any raw data at an abstract level. Here, the raw data source for the Hudup dataset is Film Trust data, so the Hudup dataset can be considered secondary data and Film Trust primary data. The raw Film Trust rating data contains 35,497 ratings from 1,508 users on 2,071 films (items) and is available at https://guoguibing.github.io/librec/datasets/filmtrust.zip.
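    The counts quoted above imply a very sparse user-item rating matrix. As a small worked check (the helper name rating_density is hypothetical), the density is the number of ratings divided by the number of possible user-item pairs:

```python
def rating_density(n_ratings: int, n_users: int, n_items: int) -> float:
    """Fraction of the user-item matrix that holds an observed rating."""
    return n_ratings / (n_users * n_items)


# Figures quoted for the raw Film Trust rating data.
density = rating_density(35_497, 1_508, 2_071)
print(f"density = {density:.4%}, sparsity = {1 - density:.4%}")
```

    With 1,508 × 2,071 = 3,123,068 possible pairs, only about 1.1% of them carry a rating.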

  13. Standardized Hudup dataset based on Movielens 1m

    • dataverse.harvard.edu
    • data.mendeley.com
    Updated Feb 16, 2021
    + more versions
    Loc Nguyen (2021). Standardized Hudup dataset based on Movielens 1m [Dataset]. http://doi.org/10.7910/DVN/F1VQFJ
    Available in Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
    Dataset updated
    Feb 16, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Loc Nguyen
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The standardized Hudup dataset receives information from raw data and is composed of ten units: “hdp_config”, “hdp_account”, “hdp_attribute_map”, “hdp_nominal”, “hdp_user”, “hdp_item”, “hdp_rating”, “hdp_context_template”, “hdp_context”, and “hdp_sample”. Each unit has particular functions, which are described in the data description section. The Hudup dataset is metadata that models any raw data at an abstract level. Here, the default raw data source for the Hudup dataset is Movielens 1M, so the Hudup dataset can be considered secondary data and Movielens primary data. The raw Movielens 1M rating data (GroupLens, 1998) contains 1,000,209 ratings from 6,040 users on 3,900 movies (items) and is available at https://files.grouplens.org/datasets/movielens/ml-1m.zip.
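    The ml-1m.zip archive stores its ratings as ratings.dat, one rating per line in the form UserID::MovieID::Rating::Timestamp. The sketch below parses that format and counts ratings, users, and items; it is an illustration of reading the raw primary data, not of how Hudup itself ingests it, and the helper names are hypothetical:

```python
from collections import namedtuple
from typing import Iterable, Iterator

Rating = namedtuple("Rating", "user_id movie_id rating timestamp")


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Parse lines of the form UserID::MovieID::Rating::Timestamp."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, rating, ts = line.split("::")
        yield Rating(int(user), int(movie), int(rating), int(ts))


def summarize(ratings: Iterable[Rating]) -> dict[str, int]:
    """Count ratings, distinct users, and distinct rated items."""
    users, items, n = set(), set(), 0
    for r in ratings:
        users.add(r.user_id)
        items.add(r.movie_id)
        n += 1
    return {"ratings": n, "users": len(users), "items": len(items)}
```

    Assuming the archive has been unzipped into ml-1m/, running summarize(parse_ratings(open("ml-1m/ratings.dat", encoding="latin-1"))) should report the 1,000,209 ratings and 6,040 users quoted above; the item count covers only movies that actually received a rating.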

  14. Coastal final ecosystem goods and services (FEGS) and habitats meta-analysis data file

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    U.S. EPA Office of Research and Development (ORD) (2020). Coastal final ecosystem goods and services (FEGS) and habitats meta-analysis data file [Dataset]. https://catalog.data.gov/dataset/coastal-final-ecosystem-goods-and-services-fegs-and-habitats-meta-analysis-data-file
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency, http://www.epa.gov/
    Description

    Coastal ecosystem goods and services (EGS) have steadily gained traction in the scientific literature over the last few decades, providing a wealth of information about underlying coastal habitat dependencies. This meta-analysis summarizes relationships between coastal habitats and final ecosystem goods and services (FEGS) users. Through a “weight of evidence” approach synthesizing information from published literature, we assessed the habitat classes most relevant to coastal users. Approximately 2,800 coastal EGS journal articles were identified by online search engines, of which 16% addressed linkages between specific coastal habitats and FEGS users and were retained for subsequent analysis. Recreational (83%) and industrial (35%) users were the most cited in the literature, with experiential users/hikers and commercial fishermen, respectively, the most prominent in each category. Recreational users were linked to the widest diversity of coastal habitat subclasses (22 of 26), whereas mangroves and emergent wetlands were most relevant for property owners. We urge EGS studies to continue surveying local users and identifying habitat dependencies, as these steps are important precursors for developing appropriate coastal FEGS metrics and facilitating local valuation. In addition, understanding how habitats contribute to human well-being may help communities prioritize restoration and evaluate development scenarios in the context of future ecosystem service delivery. This dataset is associated with the following publication: Littles, C., C. Jackson, T. DeWitt, and M. Harwell. Linking People to Coastal Habitats: A meta-analysis of final ecosystem goods and services (FEGS) on the coast. Ocean & Coastal Management. Elsevier, Shannon, Ireland, 165: 356–369 (2018).

  15. The summary of datasets.

    • plos.figshare.com
    xlsx
    Updated Apr 11, 2024
    Yixiao Zhai; Jiannan Chao; Yizheng Wang; Pinglu Zhang; Furong Tang; Quan Zou (2024). The summary of datasets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011988.s004
    Available download formats: xlsx
    Dataset updated
    Apr 11, 2024
    Dataset provided by
    PLOS Computational Biology
    Authors
    Yixiao Zhai; Jiannan Chao; Yizheng Wang; Pinglu Zhang; Furong Tang; Quan Zou
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accurate multiple sequence alignment (MSA) is imperative for the comprehensive analysis of biological sequences. However, no single MSA tool consistently outperforms its counterparts across diverse datasets, so users often have to try multiple MSA tools to achieve optimal alignment results, which can be time-consuming and memory-intensive. Even when the overall accuracy of a given MSA result is lower, it may contain local regions with the highest alignment scores, which prompts researchers to seek a tool capable of merging these locally optimal results from multiple initial alignments into a globally optimal alignment. In this study, we introduce Two Pointers Meta-Alignment (TPMA), a novel tool designed for the integration of nucleic acid sequence alignments. TPMA employs two pointers to partition the initial alignments into blocks containing identical sequence fragments. It selects blocks with high sum-of-pairs (SP) scores and concatenates them into an alignment whose overall SP score is superior to that of the initial alignments. In tests on simulated and real datasets, TPMA outperforms M-Coffee in terms of aSP, Q, and total column (TC) scores across most datasets. Even where TPMA’s scores are comparable to those of M-Coffee, TPMA has significantly lower running time and memory consumption. Furthermore, we comprehensively assessed all the MSA tools used in the experiments, considering accuracy, time, and memory consumption. We propose accurate and fast combination strategies for small and large datasets, which streamline the user's tool selection process and facilitate large-scale dataset integration. The dataset and source code of TPMA are available on GitHub (https://github.com/malabz/TPMA).
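    To make the block-selection idea concrete, here is a simplified sketch of sum-of-pairs scoring and of picking the best-scoring block per segment. It is not TPMA's implementation: the scoring scheme (match +1, mismatch −1, residue–gap −2, gap–gap 0) is an assumption, the two-pointer partitioning step is not shown, and the input is assumed to be already split into segments whose candidate blocks contain identical ungapped fragments in each row, so concatenating any choice of blocks yields a valid alignment. All function names are hypothetical.

```python
from itertools import combinations

# Assumed scoring scheme; TPMA's actual parameters may differ.
MATCH, MISMATCH, GAP, GAP_GAP = 1, -1, -2, 0


def pair_score(a: str, b: str) -> int:
    """Score one pair of characters from the same alignment column."""
    if a == "-" and b == "-":
        return GAP_GAP
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sp_score(block: list[str]) -> int:
    """Sum-of-pairs score: add pair scores over every column and every row pair."""
    if len({len(row) for row in block}) > 1:
        raise ValueError("all rows in a block must have the same length")
    total = 0
    for column in zip(*block):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


def merge_best_blocks(segments: list[list[list[str]]]) -> list[str]:
    """Keep the highest-SP candidate block in each segment (first one wins ties),
    then concatenate the chosen blocks row by row."""
    chosen = [max(candidates, key=sp_score) for candidates in segments]
    n_rows = len(chosen[0])
    return ["".join(block[i] for block in chosen) for i in range(n_rows)]
```

    For example, given the two candidate blocks ["AC-", "AGC"] (SP −2) and ["A-C", "AGC"] (SP 0) for one segment followed by ["T", "T"], merge_best_blocks returns ["A-CT", "AGCT"], whose SP score of 1 is the sum of the chosen blocks' scores. Because SP is additive over columns, choosing the best block per segment maximizes the total under these assumptions.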

  16. Top-ranked kaggler DAILY user activity (updated)

    • kaggle.com
    Updated Jul 22, 2020
    + more versions
    piby4 (2020). Top-ranked kaggler DAILY user activity (updated) [Dataset]. https://www.kaggle.com/tomtillo/top-ranked-kaggle-user-activity-1-1000-ranks/code
    Available in Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
    Dataset updated
    Jul 22, 2020
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    piby4
    License

    CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Last updated: 20 July 2020

    Context

    • Do the top Kagglers comment more?
    • Do they make competition submissions mostly during weekends?
    • Who are the most active Kagglers among the top-ranked users?

    A user activity is defined as any of the following:

    • Making a competition submission
    • Running a script
    • Commenting on a topic
    • Creating or updating a dataset

    [Figure: activity map]

    Content

    This dataset consists of the main file, USER_ACTIVITY.csv, and four sub-datasets:

    • USER_ACTIVITY.csv: user activity at the day-username level (submissions, comments, script runs, dataset updates)
    • competitions_1000_ranks.csv: top 1,000 ranked Kagglers (competitions), with username and rank
    • discussion_top1000_ranks.csv: top 1,000 ranked Kagglers (discussions), with username and rank
    • scripts_top1000_ranks.csv: top 1,000 ranked Kagglers (kernels), with username and rank
    • userid_username_mapping.csv: mapping from Kaggle ID to Kaggle username

    Frequency of Update

    This dataset will be updated every Monday.

    Acknowledgements

    The main USER_ACTIVITY dataset was acquired from the user activity tab on each Kaggle user's home page. Other metadata was acquired from Meta Kaggle (a public dataset).

    Inspiration

    Do the top Kagglers show patterns in their submissions, comments, dataset updates, or script runs?

  17. Ethiopia - Facebook Users

    • data.humdata.org
    xlsx
    Updated Mar 26, 2025
    3iS (2025). Ethiopia - Facebook Users [Dataset]. https://data.humdata.org/dataset/ethiopia-facebook-users
    Available download formats: xlsx (48620)
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    3iS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ethiopia
    Description

    This database contains regional estimates of Facebook users based on data from the Facebook Marketing API. It includes information on the number of individuals aged 18 and older who have accessed Facebook in the past month, with data separated by region. These estimates are intended for trend identification and triangulation purposes and are not designed to match official census data or other government sources.

    This data can be used as a proxy for internet access.

    Note that users may be duplicated across regions, and the data is anonymized by Meta.

  18. Whites writing whiteness dataset

    • datacatalogue.cessda.eu
    Updated Jun 1, 2025
    Stanley, L (2025). Whites writing whiteness dataset [Dataset]. http://doi.org/10.5255/UKDA-SN-852673
    Dataset updated
    Jun 1, 2025
    Dataset provided by
    University of Edinburgh
    Authors
    Stanley, L
    Time period covered
    Apr 1, 2013 - Dec 31, 2016
    Area covered
    South Africa, United Kingdom
    Variables measured
    Event/process, Group, Individual, Text unit
    Measurement technique
    The principal data collection method has been archival research. It has involved detailed work on over 30 major family and related collections, working on entire collections as well as in close detail on a sample of one in five documents across these collections.
    Description

    Linked databases of research records of primary documents in named archive collections. Some 30 major collections have been worked on, producing a dataset of over 47,000 records of letters in family and related collections, with the dataset consisting of these 30 interrelated databases. A purpose-designed Virtual Research Environment (VRE) manages the epistolary data and provides tools to assist its analysis. Research questions include: In what ways were whiteness and its ‘others’ re/configured over time? How did people represent such things over time in their letter writing? What resistances and accommodations occurred in different areas of the country, and from which people and networks? An important meta-concern is how long-term social change can best be investigated, and what problems and possibilities letter writing presents for this. In addition to scholarly publications arising from the WWW research, the complete dataset with an extensive editorial apparatus is provided for secondary analysis purposes, published through HRI Online at the University of Sheffield, the U.K.'s leading publisher of primary research materials in the humanities and social sciences (see Related Resources).

    Whites Writing Whiteness investigates how ideas about ‘race’ in South Africa changed from the 1770s to the 1970s and the role of whiteness in this. It is a qualitative longitudinal research project, and its primary data is letter-writing within multi-generational family networks, located in South African archive collections. Such collections are the focus because they are a supremely serial form of data, enabling detailed investigation of change as it unfolded over the long period the research interrogates. They represent different ethnic origins, language groups, economic circumstances, and areas of the country, and their contents are treated not referentially, as sources of true or distorted facts, but as inscribing a complex representational order.

  19. Finsheet - Stock Price in Excel and Google Sheet

    • search.dataone.org
    Updated Nov 8, 2023
    Do, Tuan (2023). Finsheet - Stock Price in Excel and Google Sheet [Dataset]. http://doi.org/10.7910/DVN/ZD9XVF
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Do, Tuan
    Description

    This dataset contains a valuation template that researchers can use to retrieve real-time stock prices in Excel and Google Sheets. The dataset is provided by Finsheet, the leading financial data provider for spreadsheet users. To get more financial data, visit the website and explore its functions. For instance, a researcher who wants the last 30 years of income statements for Meta Platforms, Inc. would use the syntax =FS_EquityFullFinancials("FB", "ic", "FY", 30). Similarly, =FS_Latest("CAT") returns the latest stock price for Caterpillar Inc. directly in the spreadsheet. If you need assistance with any of the functions, feel free to reach out to their customer support team. To get started, install their Excel and Google Sheets add-on.

  20. Pharmaceutical Tablets Dataset

    • kaggle.com
    Updated Jul 6, 2017
    TruMedicines (2017). Pharmaceutical Tablets Dataset [Dataset]. https://www.kaggle.com/trumedicines/pharmaceutical-tablets-dataset/activity
    Available in Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
    Dataset updated
    Jul 6, 2017
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    TruMedicines
    License

    Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    TruMedicines has trained a deep convolutional neural network to autoencode and retrieve a saved image from a large image dataset, based on the random pattern of dots on the surface of a pharmaceutical tablet (pill). Using a mobile phone app, a user can query the image database to verify that the pill is authentic and not counterfeit; additional metadata can then be displayed to the user, such as manufacture date, manufacturing location, expiration date, drug strength, and adverse reactions.

    Content

    The dataset contains 252 TruMedicines images of speckled pharmaceutical tablets. We augmented these images into a 20,000-image training database using rotations, greyscale and black-and-white conversion, added noise, and non-pill images. Images are 292 × 292 px, in JPEG format.
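    The augmentations listed above can be approximated with a few NumPy operations. This is a sketch under assumptions, not TruMedicines' actual pipeline: images are taken to be RGB uint8 arrays of shape (H, W, 3), greyscale uses the ITU-R BT.601 luma weights, black-and-white uses a fixed threshold of 128, and the noise is Gaussian with an arbitrary standard deviation. The non-pill negatives would come from separate images and are not generated here.

```python
import numpy as np


def augment(img: np.ndarray, rng: np.random.Generator,
            noise_std: float = 10.0) -> list[np.ndarray]:
    """Return simple variants of an RGB uint8 image of shape (H, W, 3)."""
    # Rotations by 90, 180 and 270 degrees in the image plane.
    variants = [np.rot90(img, k) for k in (1, 2, 3)]
    # Greyscale via BT.601 luma weights; result has shape (H, W).
    grey = img.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    variants.append(grey.round().astype(np.uint8))
    # Black and white: threshold the greyscale image (assumed threshold 128).
    variants.append(np.where(grey >= 128, 255, 0).astype(np.uint8))
    # Additive Gaussian noise, clipped back to the valid pixel range.
    noisy = img.astype(np.float64) + rng.normal(0.0, noise_std, size=img.shape)
    variants.append(np.clip(noisy, 0, 255).round().astype(np.uint8))
    return variants
```

    Applying augment to each of the 252 source images yields six variants per image; reaching a 20,000-image database would require more varied parameters (for example, arbitrary rotation angles or several noise levels).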

    In this playground competition, Kagglers are challenged to develop deep convolutional neural networks and hash codes that accurately identify images of pills and quickly retrieve them from our database. JPEG images of pills can be autoencoded using a CNN and retrieved using a CNN hash-code index. Our Android app takes a photo of a pill and sends a query to the image database for a match, then returns metadata about the pill: manufacture date, expiration date, ingredients, adverse reactions, etc. Techniques from computer vision, alongside other current technologies, can make recognition of authentic (non-counterfeit) medications cheaper, faster, and more reliable.
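    Retrieval through a hash-code index typically compares compact binary codes by Hamming distance. The sketch below assumes that a CNN (not shown) has already produced a real-valued embedding per image and binarizes it by sign, which is one common hashing choice rather than the method TruMedicines necessarily uses; the index contents and pill IDs are hypothetical.

```python
def to_hash_code(embedding: list[float]) -> int:
    """Binarize an embedding by sign: bit i is set if embedding[i] > 0."""
    code = 0
    for i, value in enumerate(embedding):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


def retrieve(query: int, index: dict[str, int], k: int = 1) -> list[tuple[str, int]]:
    """Return the k (pill_id, distance) pairs closest to the query code.

    Ties on distance are broken by pill_id so results are deterministic.
    """
    ranked = sorted(index.items(), key=lambda item: (hamming(query, item[1]), item[0]))
    return [(pill_id, hamming(query, code)) for pill_id, code in ranked[:k]]
```

    A linear scan like this is fine for a few thousand codes; larger indexes usually bucket codes (for example, by multi-index hashing) so that only near neighbours are compared.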

    Acknowledgements

    Special thanks to Microsoft, Paul Debaun, Steve Borg, and NWCadence (Bellevue, WA) for their assistance.

    Inspiration

    TruMedicines is using machine learning on a mobile app to stop the spread of counterfeit medicines around the world. The World Health Organization (WHO) estimates that every year 1 million people die or become disabled due to counterfeit medicine.
