23 datasets found
  1. Password Strength and Vulnerability Dataset

    • kaggle.com
    zip
    Updated Jul 31, 2023
    Cite
    Utkarsh Singh (2023). Password Strength and Vulnerability Dataset [Dataset]. https://www.kaggle.com/datasets/utkarshx27/passwords
    Available download formats: zip (6094 bytes)
    Dataset updated
    Jul 31, 2023
    Authors
    Utkarsh Singh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data is sourced from Information is Beautiful; the accompanying graphic comes from the same group.

    There's lots of additional information about password quality & strength in the source Doc. Please note that the "strength" column in this dataset is relative to these common aka "bad" passwords and YOU SHOULDN'T USE ANY OF THEM!

    Wikipedia has a nice article on password strength as well.

    Data Dictionary

    passwords.csv

    variable           class      description
    rank               double     Popularity in their database of released passwords
    password           character  Actual text of the password
    category           character  What category does the password fall into?
    value              double     Time to crack by online guessing
    time_unit          character  Time unit to match with value
    offline_crack_sec  double     Time to crack offline in seconds
    rank_alt           double     Rank 2
    strength           double     Strength = quality of password, where 10 is highest and 1 is lowest; please note that these are relative to these generally bad passwords
    font_size          double     Used to create the graphic for KIB
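
    The value and time_unit columns only make sense together, so a quick way to compare rows is to convert both into seconds. The sketch below is a minimal illustration: the unit names, the 30-day month, the 365-day year, and the two sample rows are assumptions made for the example and are not taken from passwords.csv.

```python
import csv
import io

# Assumed unit names and lengths (month = 30 days, year = 365 days).
# Check these against the real time_unit column before relying on them.
SECONDS_PER_UNIT = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 86400,
    "weeks": 7 * 86400,
    "months": 30 * 86400,
    "years": 365 * 86400,
}


def online_crack_seconds(value: float, time_unit: str) -> float:
    """Convert a (value, time_unit) pair into seconds."""
    return value * SECONDS_PER_UNIT[time_unit.strip().lower()]


# Made-up rows that only mimic the column layout of the data dictionary.
SAMPLE_CSV = """rank,password,category,value,time_unit
1,example-one,hypothetical,2,hours
2,example-two,hypothetical,3,days
"""


def load_rows(text: str) -> list:
    """Parse CSV text and attach an online_crack_sec field to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["online_crack_sec"] = online_crack_seconds(
            float(row["value"]), row["time_unit"]
        )
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)
```

    With the real file, the same function could be applied after opening passwords.csv with csv.DictReader, provided the unit spellings match.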
  2. CrackStation's Password Cracking Dictionary

    • academictorrents.com
    bittorrent
    Updated Mar 22, 2018
    + more versions
    Cite
    Defuse Security (2018). CrackStation's Password Cracking Dictionary [Dataset]. https://academictorrents.com/details/fd62cc1d79f595cbe1de6356fb13c2165994e469
    Available download formats: bittorrent (4500756826)
    Dataset updated
    Mar 22, 2018
    Dataset authored and provided by
    Defuse Security
    License

    https://academictorrents.com/nolicensespecified

    Description

    The list contains every wordlist, dictionary, and password database leak that I could find on the internet (and I spent a LOT of time looking). It also contains every word in the Wikipedia databases (pages-articles, retrieved 2010, all languages) as well as lots of books from Project Gutenberg. It also includes the passwords from some low-profile database breaches that were being sold in the underground years ago. The format of the list is a standard text file sorted in non-case-sensitive alphabetical order. Lines are separated with a newline "\n" character. You can test the list without downloading it by giving SHA256 hashes to the free hash cracker or to @PlzCrack on twitter. Here's a tool for computing hashes easily. Here are the results of cracking LinkedIn's and eHarmony's password hash leaks with the list. The list is responsible for cracking about 30% of all hashes given to CrackStation's free hash cracker, but that figure should be taken with a grain of salt because s
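
    To make the SHA256 step concrete, here is a minimal sketch of hashing a candidate word and looking a hash up in a small in-memory wordlist. The function names and the three-word list are hypothetical, UTF-8 encoding is an assumption, and this is not CrackStation's own tooling.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(word: str, encoding: str = "utf-8") -> str:
    """Return the hex-encoded SHA-256 digest of a candidate word."""
    return hashlib.sha256(word.encode(encoding)).hexdigest()


def find_in_wordlist(target_hash: str, words: Iterable[str]) -> Optional[str]:
    """Return the first word whose SHA-256 digest equals target_hash, if any."""
    target = target_hash.lower()
    for word in words:
        if sha256_hex(word) == target:
            return word
    return None


# Tiny stand-in wordlist, sorted case-insensitively like the list described above.
words = sorted(["Zebra", "apple", "Mango"], key=str.lower)
match = find_in_wordlist(sha256_hex("Mango"), words)
```

    A linear scan like this is only practical for small lists; for a multi-gigabyte dictionary one would stream the file line by line instead of holding it in memory.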

  3. Bruteforce Database - Password dictionaries

    • kaggle.com
    zip
    Updated Feb 19, 2020
    + more versions
    Cite
    Taranveer Singh Anttal (2020). Bruteforce Database - Password dictionaries [Dataset]. https://www.kaggle.com/taranvee/bruteforce-database-password-dictionaries
    Available download formats: zip (42314895 bytes)
    Dataset updated
    Feb 19, 2020
    Authors
    Taranveer Singh Anttal
    Description

    Password dictionaries:

    • 8-more-passwords.txt: only passwords with more than 8 characters; removed all-numeric passwords, passwords with consecutive characters (3 characters or more), all-lowercase passwords, and passwords without both a capital letter and a number (61,682 passwords).
    • 7-more-passwords.txt: passwords of 7 characters or more, with numeric passwords removed (528,136 passwords).
    • 1000000_password_seclists.txt: 1,000,000 passwords from SecLists.
    • bitcoin-brainwallet.lst: bitcoin-brainwallet with 394,748 lines.
    • usernames.txt: collection of usernames from the US (403,355 lines).
    • us-cities.txt: list of 20,580 cities in the US.
    • facebook-firstnames.txt: 4,347,667 Facebook first names.
    • 2151220-passwords.txt: 2,151,220 passwords from dazzlepod.com.
    • subdomains-10000.txt: 10,000 subdomains for a domain scanner.
    • 38650-password-sktorrent.txt: 38,650 passwords from sktorrent.eu.
    • uniqpass_v16_password.txt: UNIQPASS is a large password list for use with John the Ripper (JtR) wordlist mode to translate a large number of hashes, e.g. MD5 hashes, into cleartext passwords.
    • indo-cities.txt: list of 102 cities in Indonesia.
    • 38650-username-sktorrent.txt: 38,650 usernames from sktorrent.eu.
    • forced-browsing: every wordlist you need for forced browsing.

    MIT License

    Copyright (c) 2015 Van-Duyet Le

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  4. Ultimate Cybersecurity Password & Username Dataset

    • kaggle.com
    zip
    Updated Jun 9, 2024
    Cite
    hrterhrter (2024). Ultimate Cybersecurity Password & Username Dataset [Dataset]. https://www.kaggle.com/datasets/programmerrdai/brute-force-database
    Available download formats: zip (42311134 bytes)
    Dataset updated
    Jun 9, 2024
    Authors
    hrterhrter
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    • 8-more-passwords.txt: Contains passwords with more than 8 characters. Excludes numeric-only passwords, consecutive characters (3 or more), all-lowercase passwords, and passwords without at least one capital letter and one number. Total: 61,682 passwords.
    • 7-more-passwords.txt: Includes passwords with 7 characters or more. Numeric passwords are removed. Total: 528,136 passwords.
    • 1000000_password_seclists.txt: A collection of 1,000,000 passwords from SecLists.
    • bitcoin-brainwallet.lst: Bitcoin brainwallet with 394,748 entries.
    • usernames.txt: Collection of 403,355 usernames from the US.
    • us-cities.txt: List of 20,580 cities in the US.
    • facebook-firstnames.txt: Contains 4,347,667 first names from Facebook.
    • 2151220-passwords.txt: Collection of 2,151,220 passwords from dazzlepod.com.
    • subdomains-10000.txt: List of 10,000 subdomains for domain scanning.
    • 38650-password-sktorrent.txt: Contains 38,650 passwords from sktorrent.eu.
    • uniqpass_v16_password.txt: UNIQPASS is a large password list for use with John the Ripper (JtR) in wordlist mode to convert large numbers of hashes, such as MD5, into cleartext passwords.
    • indo-cities.txt: List of 102 cities in Indonesia.
    • 38650-username-sktorrent.txt: Contains 38,650 usernames from sktorrent.eu.
    • forced-browsing: Contains every wordlist needed for forced browsing.

  5. ❗RockYou2024.txt| 10B Common Passwords List

    • kaggle.com
    zip
    Updated Jul 10, 2024
    Cite
    BwandoWando (2024). ❗RockYou2024.txt| 10B Common Passwords List [Dataset]. https://www.kaggle.com/datasets/bwandowando/common-password-list-rockyou2024-txt
    Available download formats: zip (56987494791 bytes)
    Dataset updated
    Jul 10, 2024
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Kaggle Previous Version of RockYou.txt

    The original RockYou.txt dataset was uploaded by @wjburns 5 years ago, with 95K downloads and 640 upvotes, which means Kaggle allows this type of data for research and educational purposes.

    Files

    I separated the single 160GB txt file into smaller files with filenames based on first character to make it easier to utilize for those with less powerful machines.

    • letters (A-Z)
    • digits (0-9)
    • dollarsymbol ($)
    • symbols (other symbols)
    • others (those that can't be categorized by any of those above)
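
    The grouping by first character can be sketched roughly as follows. This is an illustration, not the uploader's script: treating lowercase letters as "letters", limiting "symbols" to ASCII punctuation, and sending everything else (including empty lines and non-ASCII first characters) to "others" are all assumptions.

```python
import string


def bucket_for(entry: str) -> str:
    """Pick the output bucket for an entry based on its first character."""
    if not entry:
        return "others"
    first = entry[0]
    if first in string.ascii_letters:
        return "letters"
    if first in string.digits:
        return "digits"
    if first == "$":  # checked before the generic punctuation test
        return "dollarsymbol"
    if first in string.punctuation:
        return "symbols"
    return "others"


def split_by_first_char(lines):
    """Group entries into buckets; kept in memory here only for illustration."""
    buckets = {}
    for line in lines:
        entry = line.rstrip("\n")
        buckets.setdefault(bucket_for(entry), []).append(entry)
    return buckets


demo = split_by_first_char(["apple\n", "9lives\n", "$money\n", "#tag\n", "ñandu\n"])
```

    For a file of this size the entries would be written straight to one output file per bucket while streaming the input, rather than collected in a dictionary.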

    Note

    • The original 160GB file was written with utf8 encoding; I used the same encoding for the files above.
    • The contents of the files above are UNSORTED
    • The contents are NOT DEDUPLICATED

    History

    Everyone involved with Capture The Flag (CTF) has used the infamous rockyou.txt wordlist at least once, mainly to perform password cracking activities. The file is a list of 14 million unique passwords originating from the 2009 RockYou hack making a piece of computer security history. The “rockyou lineage” has evolved over the years.

    https://www.youtube.com/watch?v=0_mQACSn6XM

    RockYou2024.txt

    With the 2021 version we touched high numbers, but the newest release is the (apparently) ultimate amalgamation. RockYou2024 has been released by the user “ObamaCare”. This new version added 1.5 billion records to the 2021 version, reaching 10 billion records. A wordlist can potentially be used for a multitude of tasks, and having this number of records in a single file, especially in 2024 with increasingly aggressive data breaches, is a dream come true for attackers. The user has not specified the nature of the additional records but points out that the new data comes from recently leaked databases.

    From "The New RockYou2024 Collection has been published!"

    Source

    I got it from https://github.com/hkphh/rockyou2024.txt, but it was originally shared by a user known as ObamaCare, with whom I have no affiliation or association.

    Original TxtFile

    In case you'd like to process the RockYou2024.txt yourself, you can find it here ❗Original RockYou2024.txt zip file

    Strong Passwords Only

    In case you'd like to see only the "Strong Passwords", you can find it here ❗180 Million "Strong Passwords" in RockYou2024.txt

    Cover Image

    Generated with Bing Image Generator

  6. Charset distribution of leaked user passwords worldwide 2017

    • statista.com
    Updated Jan 9, 2024
    Cite
    Statista (2024). Charset distribution of leaked user passwords worldwide 2017 [Dataset]. https://www.statista.com/statistics/744183/worldwide-character-set-distribution-of-passwords/
    Dataset updated
    Jan 9, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Aug 2017
    Area covered
    Worldwide
    Description

    The statistic shows the distribution of password character sets found among various databases leaked online as of 2017. From 320 million hashed passwords that were analyzed, 49 percent were found to be a mix of numbers and lowercase alphabetic characters. Just two percent of passwords were a mix of numbers, upper- and lowercase alphabetic characters, and symbols.

  7. Password Management Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Apr 11, 2025
    Cite
    Pro Market Reports (2025). Password Management Market Report [Dataset]. https://www.promarketreports.com/reports/password-management-market-7993
    Available download formats: doc, ppt, pdf
    Dataset updated
    Apr 11, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The size of the Password Management Market was valued at USD 2 Billion in 2023 and is projected to reach USD 6.37 Billion by 2032, with an expected CAGR of 18% during the forecast period. Recent developments include:

    • July 2022: Google updated its password managers by integrating various highly requested features to help consumers, like auto-login, credential saving, and password generation. This led to enhanced market growth owing to the higher utilization of the Google Chrome browser for web surfing and remote working.
    • June 2022: Lookout Inc. acquired SaferPass, offering simple and secure password managers for enterprises and individuals. The acquisition helps in delivering proactive security platforms to safeguard user data and privacy while broadening the business footprint.
    • January 2022: Keeper Security launched Secrets Manager, which secured infrastructure credentials like API keys, certificates, access keys, and database passwords. The solution included cloud-based integration with a zero-knowledge security model similar to their enterprise password management platform.

  8. ❗RockYou2024.txt| 180 Million "Strong Passwords"

    • kaggle.com
    zip
    Updated Jul 11, 2024
    Cite
    BwandoWando (2024). ❗RockYou2024.txt| 180 Million "Strong Passwords" [Dataset]. https://www.kaggle.com/datasets/bwandowando/strong-passwords-in-rockyou2024-txt
    Available download formats: zip (876464069 bytes)
    Dataset updated
    Jul 11, 2024
    Authors
    BwandoWando
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description


    I extracted all entries from RockYou2024.txt with the following characteristics:

    • Between 8 and 32 characters
    • Has at least one upper-case character
    • Has at least one lower-case character
    • Has at least one digit
    • Has at least one punctuation mark

    Note

    • The contents per file are DEDUPLICATED
    • The contents per file are SORTED
    • White spaces between characters are INCLUDED
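
    The selection criteria and the notes above can be sketched as a small filter. This is an illustration under stated assumptions, not the uploader's code: the length bounds are taken as inclusive, "punctuation mark" is read as ASCII punctuation (string.punctuation), and the helper names are hypothetical.

```python
import string


def is_strong(entry: str) -> bool:
    """Check the listed criteria: length 8..32 plus upper, lower, digit, punctuation."""
    if not 8 <= len(entry) <= 32:
        return False
    return (
        any(c.isupper() for c in entry)
        and any(c.islower() for c in entry)
        and any(c.isdigit() for c in entry)
        and any(c in string.punctuation for c in entry)
    )


def strong_entries(lines):
    """Return the qualifying entries, deduplicated and sorted."""
    # Only the trailing newline is stripped, so inner and outer spaces are kept.
    stripped = (line.rstrip("\n") for line in lines)
    return sorted({entry for entry in stripped if is_strong(entry)})


demo = strong_entries(["Zz9!zzzz\n", "Aa1!aaaa\n", "Aa1!aaaa\n", "weak\n"])
```

    Holding a set of every match in memory is fine for a sketch; at the scale of the full file, an external sort and deduplication step would be the more realistic choice.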

    Kaggle Previous Version of RockYou.txt

    The original RockYou.txt dataset was uploaded by @wjburns 5 years ago, with 95K downloads and 640 upvotes, which means Kaggle allows this type of data for research and educational purposes.

    Note

    • The original 160GB file was written with utf8 encoding; I used the same encoding for the files above.

    History

    Everyone involved with Capture The Flag (CTF) has used the infamous rockyou.txt wordlist at least once, mainly to perform password cracking activities. The file is a list of 14 million unique passwords originating from the 2009 RockYou hack making a piece of computer security history. The “rockyou lineage” has evolved over the years.

    https://www.youtube.com/watch?v=0_mQACSn6XM

    RockYou2024.txt

    With the 2021 version we touched high numbers, but the newest release is the (apparently) ultimate amalgamation. RockYou2024 has been released by the user “ObamaCare”. This new version added 1.5 billion records to the 2021 version, reaching 10 billion records. A wordlist can potentially be used for a multitude of tasks, and having this number of records in a single file, especially in 2024 with increasingly aggressive data breaches, is a dream come true for attackers. The user has not specified the nature of the additional records but points out that the new data comes from recently leaked databases.

    From "The New RockYou2024 Collection has been published!"

    Source

    I got it from https://github.com/hkphh/rockyou2024.txt, but it was originally shared by a user known as ObamaCare, with whom I have no affiliation or association.

    Variations of Dataset

    Reminder

    Use responsibly

    Cover Image

    Generated with Bing Image Generator

  9. Dark Web Gmail Credentials Database

    • passwordrevelator.net
    Updated Oct 29, 2023
    Cite
    Various Hacking Groups (2023). Dark Web Gmail Credentials Database [Dataset]. https://www.passwordrevelator.net/en/passbreaker
    Dataset updated
    Oct 29, 2023
    Dataset authored and provided by
    Various Hacking Groups
    Description

    Over 3.2 billion email addresses with passwords posted on the Dark Web from massive Google platform data leaks

  10. Leaked passwords of the VimeWorld minecraft server

    • kaggle.com
    zip
    Updated Dec 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    PocoMoco (2023). Leaked passwords of the VimeWorld minecraft server [Dataset]. https://www.kaggle.com/datasets/tempuserpavelbiz/leaked-passwords-of-the-vimeworld-minecraft-server/discussion
    Available download formats: zip (88294251 bytes)
    Dataset updated
    Dec 9, 2023
    Authors
    PocoMoco
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About data 🎈

    The Minecraft database of the VimeWorld server, containing passwords as md5 and bcrypt hashes. I brute-forced passwords only for the md5 hashes, as they are the easiest to crack. In total, I managed to recover just over 90% of all md5 hashes.

    Structure of the parquet file 🟡

    1. username - player nickname;
    2. username_cc - content of certain characters in the nickname, namely:
      • d - digits
      • l - lowercase letters
      • u - uppercase letters
      • s - special characters
    3. username_len - nickname length
    4. password_value - password
    5. password_mask - password mask (see item 2)
    6. password_cc - character content of the password (see item 2)
    7. password_len - password length
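
    The d/l/u/s character classes in the structure above can be illustrated with a short sketch. How the parquet file actually encodes username_cc, password_cc, and password_mask is not stated here, so the per-character mask string and the count dictionary below are assumptions, as is treating every character that is not an ASCII digit or letter as "s".

```python
import string


def char_class(ch: str) -> str:
    """Map one character to d (digit), l (lowercase), u (uppercase) or s (special)."""
    if ch in string.digits:
        return "d"
    if ch in string.ascii_lowercase:
        return "l"
    if ch in string.ascii_uppercase:
        return "u"
    return "s"


def mask(text: str) -> str:
    """Build a per-character mask, e.g. one class letter per input character."""
    return "".join(char_class(ch) for ch in text)


def class_counts(text: str) -> dict:
    """Count how many characters of each class the text contains."""
    counts = {"d": 0, "l": 0, "u": 0, "s": 0}
    for ch in text:
        counts[char_class(ch)] += 1
    return counts


example_mask = mask("Abc1!")
example_counts = class_counts("Abc1!")
```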
  11. Secretless Database Connectivity Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Growth Market Reports (2025). Secretless Database Connectivity Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/secretless-database-connectivity-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Oct 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Secretless Database Connectivity Market Outlook

    According to our latest research, the global Secretless Database Connectivity market size reached USD 1.47 billion in 2024, reflecting robust demand for secure, seamless, and scalable database access solutions across industries. The market is poised to expand at a CAGR of 17.8% from 2025 to 2033, with the forecasted market size projected to reach USD 6.13 billion by 2033. This remarkable growth is primarily driven by the urgent need to address rising cybersecurity threats, regulatory pressures, and the increasing adoption of cloud-native architectures that demand advanced, secretless approaches to database connectivity.

    A significant growth factor for the Secretless Database Connectivity market is the escalating sophistication of cyberattacks targeting database credentials and access points. Traditional methods of database authentication, which rely on static secrets such as passwords and API keys, are increasingly vulnerable to breaches and leaks. The shift towards secretless architecture, where credentials are abstracted and managed dynamically without exposing them to applications or users, offers a powerful mitigation against these risks. Enterprises are recognizing the value of secretless solutions in reducing the attack surface, enhancing compliance with regulations like GDPR and HIPAA, and simplifying credential management processes. This awareness is fueling market adoption across sectors, particularly in industries handling sensitive or regulated data.

    Another pivotal driver is the rapid proliferation of cloud-native technologies, microservices, and containerized environments. Modern application architectures demand agile, scalable, and automated approaches to database connectivity, which traditional secret management tools often struggle to deliver. Secretless Database Connectivity enables seamless integration with DevOps pipelines, supports dynamic scaling, and eliminates the operational burden of credential rotation and distribution. Organizations pursuing digital transformation and cloud migration initiatives are increasingly turning to secretless solutions to achieve continuous delivery, operational efficiency, and improved developer productivity. This alignment with broader IT modernization trends is expected to sustain high growth rates in the market.

    Furthermore, the evolving regulatory landscape is compelling organizations to adopt more robust security postures, including secretless access mechanisms. Regulatory frameworks across North America, Europe, and Asia Pacific are imposing stricter requirements for data privacy, access control, and auditability. Secretless Database Connectivity solutions provide comprehensive logging, access policies, and real-time monitoring capabilities that help organizations demonstrate compliance and avoid costly penalties. As enterprises face mounting pressure to safeguard customer data and intellectual property, the adoption of secretless approaches is becoming a strategic imperative, further accelerating market expansion.

    From a regional perspective, North America currently dominates the Secretless Database Connectivity market, accounting for the largest revenue share in 2024 due to the presence of leading technology vendors, early adoption of advanced cybersecurity solutions, and a highly regulated business environment. However, the Asia Pacific region is expected to exhibit the fastest growth rate over the forecast period, driven by rapid digitalization, increasing cloud adoption, and rising awareness of data security best practices among enterprises and government agencies. Europe also represents a significant market, underpinned by stringent data protection regulations and a mature IT ecosystem. The interplay of these regional dynamics is shaping the global competitive landscape and creating new opportunities for vendors and service providers.

    Component Analysis

    The Secretless Database Connectivity market by component is segmented into software, hardware, and services. The softw

  12. Individuals who have used the Internet in the last 12 months and have a...

    • data.europa.eu
    html, unknown
    Updated Jun 11, 2024
    + more versions
    Cite
    VLADA REPUBLIKE SLOVENIJE STATISTIČNI URAD REPUBLIKE SLOVENIJE (2024). Individuals who have used the Internet in the last 12 months and have a digital certificate or a one-time smsPASS password generator and reasons why they do not have one, by age groups and sex, Slovenia, 2019 [Dataset]. https://data.europa.eu/data/datasets/surs2982325s?locale=en
    Available download formats: html, unknown
    Dataset updated
    Jun 11, 2024
    Dataset authored and provided by
    VLADA REPUBLIKE SLOVENIJE STATISTIČNI URAD REPUBLIKE SLOVENIJE
    Area covered
    Slovenia
    Description

    This database automatically captures metadata whose source is the GOVERNMENT OF THE REPUBLIC OF SLOVENIA, STATISTICAL OFFICE OF THE REPUBLIC OF SLOVENIA, corresponding to the source database entitled “Individuals who have used the Internet in the last 12 months and have a digital certificate or one-time password generator smsPASS and the reasons why they do not have them, by age class and sex, Slovenia, 2019”.

    Actual data are available in PC-Axis format (.px). With additional links, you can access the source portal page for viewing and selecting data, as well as the PX-Win program, which can be downloaded free of charge. Both allow you to select data for display, change the format of the printout, store it in different formats, view and print tables of unlimited size, and run some basic statistical analyses and graphics.

  13. Data from: SQL Injection Attack Netflow

    • zenodo.org
    • portalcientifico.unileon.es
    • +3more
    Updated Sep 28, 2022
    + more versions
    Cite
    Ignacio Crespo; Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. http://doi.org/10.5281/zenodo.6907252
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ignacio Crespo; Adrián Campazas
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction

    These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union Query SQL injection and Blind SQL injection. To perform the attacks, the SQLMAP tool was used.

    NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for the collection and monitoring of network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

    Datasets

    The first dataset (D1) was collected to train the detection models, and the other (D2) was collected using different attacks than those used in training to test the models and ensure their generalization.

    The datasets contain both benign and malicious traffic. All collected datasets are balanced.

    The version of NetFlow used to build the datasets is 5.

    Dataset  Aim       Samples  Benign-malicious traffic ratio
    D1       Training  400,003  50%
    D2       Test      57,239   50%

    Infrastructure and implementation

    Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

    DOROTHEA is configured to use NetFlow V5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
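
    The two export conditions can be expressed as a simple check. The sketch below only illustrates the timeout logic with the values quoted above; the Flow class, the function name, and the use of "greater than or equal" comparisons are assumptions, and this is not how ipt_netflow or DOROTHEA is actually implemented.

```python
from dataclasses import dataclass

INACTIVE_TIMEOUT_S = 15    # export once no packet has been seen for this long
ACTIVE_TIMEOUT_S = 1800    # export a long-lived flow after 30 minutes regardless


@dataclass
class Flow:
    first_seen: float  # timestamp of the first packet, in seconds
    last_seen: float   # timestamp of the most recent packet, in seconds


def should_export(flow: Flow, now: float) -> bool:
    """Return True when either the inactive or the active timeout has elapsed."""
    idle_for = now - flow.last_seen
    active_for = now - flow.first_seen
    return idle_for >= INACTIVE_TIMEOUT_S or active_for >= ACTIVE_TIMEOUT_S


short_flow = Flow(first_seen=0.0, last_seen=10.0)
long_flow = Flow(first_seen=0.0, last_seen=1795.0)
```

    Here short_flow is exported once 15 seconds pass without a packet, while long_flow is exported at the 1800-second mark even though packets are still arriving.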

    Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it sends it to a NetFlow data generation node (this process is carried out similarly to packets received from the Internet).

    The malicious traffic collected (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration testing tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.

    The attacks were executed on 16 nodes, which launched SQLMAP with the parameters in the following table.

    Parameters               Description
    '--banner', '--current-user', '--current-db', '--hostname', '--is-dba', '--users', '--passwords', '--privileges', '--roles', '--dbs', '--tables', '--columns', '--schema', '--count', '--dump', '--comments', '--schema'
                             Enumerate users, password hashes, privileges, roles, databases, tables and columns
    --level=5                Increase the probability of a false positive identification
    --risk=3                 Increase the probability of extracting data
    --random-agent           Select the User-Agent randomly
    --batch                  Never ask for user input, use the default behavior
    --answers="follow=Y"     Predefined answers to yes

    Every node executed SQLIA on 200 victim nodes. The victim nodes had deployed a web form vulnerable to Union-type injection attacks, which was connected to the MYSQL or SQLServer database engines (50% of the victim nodes deployed MySQL and the other 50% deployed SQLServer).

    The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24.
    The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.

    However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.
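
    When working with the flows, it can help to label an address by the network it belongs to. The sketch below uses the address spaces quoted above with Python's ipaddress module; the role names and the helper function are hypothetical, and strict=False is needed because several of the networks are written with host bits set (for example 182.168.1.1/24).

```python
import ipaddress

# Address spaces as quoted in the description; "traffic" covers the benign
# and malicious traffic-generating nodes, "victim" the attacked nodes.
D1_NETWORKS = {
    "traffic": ipaddress.ip_network("182.168.1.1/24", strict=False),
    "victim": ipaddress.ip_network("126.52.30.0/24", strict=False),
}
D2_NETWORKS = {
    "traffic": ipaddress.ip_network("152.148.48.1/24", strict=False),
    "victim": ipaddress.ip_network("140.30.20.1/24", strict=False),
}


def role_of(address: str, networks: dict) -> str:
    """Return the role whose network contains the address, or "unknown"."""
    ip = ipaddress.ip_address(address)
    for role, network in networks.items():
        if ip in network:
            return role
    return "unknown"


d1_role = role_of("126.52.30.5", D1_NETWORKS)
d2_role = role_of("152.148.48.77", D2_NETWORKS)
```

    Note that the address alone cannot separate benign from malicious senders, since both share the same traffic-generating network in each dataset.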

    To run the MySQL server we ran MariaDB version 10.4.12.
    Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used.

  14. ❗RockYou2024.txt| Original zip file

    • kaggle.com
    zip
    Updated Jul 10, 2024
    Cite
    BwandoWando (2024). ❗RockYou2024.txt| Original zip file [Dataset]. https://www.kaggle.com/datasets/bwandowando/original-rockyou2024-text-file-11-parts
    Explore at:
    Available download formats: zip (45855445391 bytes)
    Dataset updated
    Jul 10, 2024
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Kaggle Previous Version of RockYou.txt

    The original RockYou.txt dataset was uploaded by @wjburns 5 years ago, with 95K downloads and 640 upvotes, which means Kaggle allows this type of data for research and educational purposes.

    Files

    This is the original RockYou2024.txt file, zipped and split into 11 parts.

    History

    Everyone involved with Capture The Flag (CTF) has used the infamous rockyou.txt wordlist at least once, mainly for password cracking. The file is a list of 14 million unique passwords originating from the 2009 RockYou hack, which makes it a piece of computer security history. The “rockyou lineage” has evolved over the years.

    https://www.youtube.com/watch?v=0_mQACSn6XM

    RockYou2024.txt

    The 2021 version already reached high numbers, but the newest release is (apparently) the ultimate amalgamation. RockYou2024 was released by the user “ObamaCare”. This new version added 1.5 billion records to the 2021 version, reaching 10 billion records. A wordlist can be used for a multitude of tasks, and having this number of records in a single file, especially in 2024 with increasingly aggressive data breaches, is a dream come true for attackers. The user has not specified the nature of the additional records but states that the new data comes from recently leaked databases.

    From The New RockYou2024 Collection has been published!

    Source

    I got it from https://github.com/hkphh/rockyou2024.txt, but it was originally shared by a user known as ObamaCare, with whom I have no affiliation or association.

    Cover Image

    Generated with Bing Image Generator

  15. Data from: Malware Finances and Operations: a Data-Driven Study of the Value...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Jun 20, 2023
    Nurmi, Juha; Niemelä, Mikko; Brumley, Billy (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8047204
    Explore at:
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Cyber Intelligence House (https://cyberintelligencehouse.com/)
    Tampere University
    Authors
    Nurmi, Juha; Niemelä, Mikko; Brumley, Billy
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.

    Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

    We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

    1. MalwareInfectionSet: We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs, which are available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

    2. VictimAccessSet: We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

    3. AccountAccessSet: The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We collect data from Database Market, where vendors sell online credentials, and investigate it in a similar way. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.

    Credits Authors

    Billy Bob Brumley (Tampere University, Tampere, Finland)

    Juha Nurmi (Tampere University, Tampere, Finland)

    Mikko Niemelä (Cyber Intelligence House, Singapore)

    Funding

    This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

    Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.

  16. List of Top Data Breaches (2004 - 2021)

    • kaggle.com
    zip
    Updated Sep 3, 2021
    Hishaam Armghan (2021). List of Top Data Breaches (2004 - 2021) [Dataset]. https://www.kaggle.com/datasets/hishaamarmghan/list-of-top-data-breaches-2004-2021
    Explore at:
    Available download formats: zip (5975 bytes)
    Dataset updated
    Sep 3, 2021
    Authors
    Hishaam Armghan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains the major data breaches in the world from 2004 to 2021.

    As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.

    This data contains 5 columns:

    1. Entity: The name of the company, organization or institute

    2. Year: The year in which the data breach took place

    3. Records: How many records were compromised (can include information like emails, passwords etc.)

    4. Organization type: The sector the organization belongs to

    5. Method: Was it hacked? Were the files lost? Was it an inside job?
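    The five columns lend themselves to a quick aggregation, for example total compromised records per year. The sketch below is illustrative only: the sample rows are made up, the exact header spellings in the real file are an assumption, and the helper name `records_per_year` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the header names mirror the column list above and are
# an assumption about how the real CSV spells them.
SAMPLE_CSV = """Entity,Year,Records,Organization type,Method
Example Corp,2019,1000,web,hacked
Example Bank,2019,250,financial,lost / stolen media
Example Clinic,2020,unknown,healthcare,inside job
"""


def records_per_year(csv_file):
    """Sum the Records column per Year, skipping non-numeric record counts."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        try:
            count = int(row["Records"].replace(",", ""))
        except ValueError:
            # Scraped tables often hold values such as "unknown"; skip them.
            continue
        totals[row["Year"]] += count
    return dict(totals)


totals = records_per_year(io.StringIO(SAMPLE_CSV))
print(totals)
```

    Rows with a non-numeric record count are skipped rather than guessed, so a year can be missing from the totals entirely.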

    Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches

    Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning

  17. Dataset of 7632 images of 53 Devanagari Alphabet Images Across 144 Spatial...

    • data.mendeley.com
    Updated May 13, 2025
    SANJAY PATE (2025). Dataset of 7632 images of 53 Devanagari Alphabet Images Across 144 Spatial Positions in 5×5 Grids with 2×2 Sub grid Localization for Grid based Graphical Password Authentication. [Dataset]. http://doi.org/10.17632/4x2jtpmtvg.1
    Explore at:
    Dataset updated
    May 13, 2025
    Authors
    SANJAY PATE
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a structured image dataset designed to facilitate research in spatial localization, pattern recognition, and character classification. It contains high-resolution images of 53 distinct alphabet characters, each systematically placed within a standardized 5×5 grid layout. Each 5×5 grid consists of 25 individual cells. Within each grid, we define 16 overlapping 2×2 sub-grids. These sub-grids serve as local regions of interest for fine-grained spatial analysis. In each 2×2 sub-grid, there are 9 specific positional locations where an alphabet image can be placed, either centered within the sub-grid or slightly offset relative to it, to provide a range of spatial variation. This results in a total of 144 unique placement positions for each character across the entire 5×5 grid. For every alphabet character, the dataset includes an image placed in each of these 144 locations, leading to a total of 7,632 labeled samples (53 characters × 144 positions). All samples are consistent in size and format, and the position of each character is precisely annotated to facilitate supervised learning tasks. The Devanagari 53 Alphabet dataset is suited to training and evaluating models on tasks such as character localization, grid-based graphical password authentication, and few-shot learning under positional variation. The structured spatial layout and extensive position coverage also make it suitable for research in visual attention models, object detection benchmarks, and spatially-aware neural architectures.
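    The sample count follows directly from the grid geometry. As a quick check, the sketch below enumerates the placements; the index scheme (sub-grid origin row, column, and a position number 0 to 8) is a hypothetical labelling, not necessarily the annotation format used in the dataset.

```python
from itertools import product

GRID_SIZE = 5              # cells per side of the full grid
SUBGRID_SIZE = 2           # cells per side of each sub-grid
POSITIONS_PER_SUBGRID = 9  # placements inside one sub-grid
NUM_CHARACTERS = 53

# A 2x2 window can start at rows/columns 0..3 of a 5x5 grid, so overlapping
# windows give 4 * 4 = 16 sub-grids.
subgrid_origins = list(product(range(GRID_SIZE - SUBGRID_SIZE + 1), repeat=2))

# Every sub-grid contributes 9 placements (hypothetical indices 0..8).
placements = [
    (row, col, position)
    for (row, col) in subgrid_origins
    for position in range(POSITIONS_PER_SUBGRID)
]

total_samples = NUM_CHARACTERS * len(placements)
print(len(subgrid_origins), len(placements), total_samples)
```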

  18. Identity as Service Market By Component Type (Provisioning, Directory...

    • zionmarketresearch.com
    pdf
    Updated Nov 22, 2025
    Zion Market Research (2025). Identity as Service Market By Component Type (Provisioning, Directory services, Password management, Single sign-on, Advanced authentication and Others), By End Use (BFSI, IT & Telecom, Public, Healthcare, Retail, Education, and Manufacturing): Global Industry Perspective, Comprehensive Analysis, and Forecast, 2024 - 2032 [Dataset]. https://www.zionmarketresearch.com/report/identity-as-a-service-market
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 22, 2025
    Dataset authored and provided by
    Zion Market Research
    License

    https://www.zionmarketresearch.com/privacy-policy

    Time period covered
    2022 - 2030
    Area covered
    Global
    Description

    The Identity as Service Market size is set to expand from $6.53 billion in 2023 to $57.73 billion by 2032, with a CAGR of around 27.4% from 2024 to 2032.
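    The quoted growth rate can be checked with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The sketch below assumes nine compounding years, counted from the 2023 base value to 2032; that period count is an assumption, since the report does not state how it was computed.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the description above, in billions of US dollars.
rate = cagr(6.53, 57.73, 2032 - 2023)
print(f"{rate:.1%}")
```

    Under that assumption the result agrees with the stated figure of around 27.4%.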

  19. Training.gov.au - Web service access to sandbox environment

    • data.wu.ac.at
    • researchdata.edu.au
    Updated Apr 4, 2018
    Department of Education and Training (2018). Training.gov.au - Web service access to sandbox environment [Dataset]. https://data.wu.ac.at/schema/data_gov_au/NjcxNmY0NzgtZjYxNi00ZjVkLTkyOGQtZTc2YjE1Mzg3ZGM0
    Explore at:
    Dataset updated
    Apr 4, 2018
    Dataset provided by
    Department of Education and Training
    License

    Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Introduction

    Training.gov.au (TGA) is the National Register of Vocational Education and Training in Australia and contains authoritative information about Registered Training Organisations (RTOs), Nationally Recognised Training (NRT) and the approved scope of each RTO to deliver NRT as required in national and jurisdictional legislation.

    TGA web-services overview

    TGA provides a web service that allows external systems to access and use information stored in TGA. The TGA web service is exposed through a single interface, and web service users are assigned a data reader role that applies to all data stored in TGA.

    The web service can be broadly split into three categories:

    1. RTOs and other organisation types;

    2. Training components including Accredited courses, Accredited course Modules, Training Packages, Qualifications, Skill Sets and Units of Competency;

    3. System metadata including static data and statistical classifications.

    Users will gain access to the TGA web service by first passing a user name and password through to the web server. The web server will then authenticate the user against the TGA security provider before passing the request to the application that supplies the web services.

    There are two web services environments:

    1. Production - ws.training.gov.au – National Register production web services

    2. Sandbox - ws.sandbox.training.gov.au – National Register sandbox web services.

    The National Register sandbox web service is used to test against the current version of the web services where the functionality will be identical to the current production release. The web service definition and schema of the National Register sandbox database will also be identical to that of production release at any given point in time. The National Register sandbox database will be cleared down at regular intervals and realigned with the National Register production environment.

    Each environment has three configured services:

    1. Organisation Service;

    2. Training Component Service; and

    3. Classification Service.

    Sandbox environment access

    To access the download area for web services, navigate to http://tga.hsd.com.au and use the following user name and password:

    Username: WebService.Read (case sensitive)

    Password: Asdf098 (case sensitive)

    This download area contains various versions of the following artefacts that you may find useful:

    • Training.gov.au web service specification document;

    • Training.gov.au logical data model and definitions document;

    • .NET web service SDK sample app (with source code);

    • Java sample client (with source code);

    • How to setup web service client in VS 2010 video; and

    • Web services WSDLs and XSDs.

    For the business areas, the specification/definition documents and the sample application are a good place to start, while the IT areas will find the sample source code and the video useful for starting development against the TGA web services.

    The web services Sandbox end point is: https://ws.sandbox.training.gov.au/Deewr.Tga.Webservices

    Production web service access

    Once you are ready to access the production web service, please email the TGA team at tgaproject@education.gov.au to obtain a unique user name and password.

  20. Cloud IAM Market By components (password administration and audit, user...

    • zionmarketresearch.com
    pdf
    Updated Nov 22, 2025
    Zion Market Research (2025). Cloud IAM Market By components (password administration and audit, user provisioning, access administration, directory services, single sign-on, authority and compliance administration), by end-user (instance, small and medium businesses (SMBs) and ventures), By verticals (telecom and IT, BFSI, energy, public sector and utilities, oil and gas, healthcare, edification, manufacturing, and retail) And By Region: - Global And Regional Industry Overview, Market Intelligence, Comprehensive Analysis, Historical Data, And Forecasts, 2023-2030 [Dataset]. https://www.zionmarketresearch.com/report/cloud-iam-market
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 22, 2025
    Dataset authored and provided by
    Zion Market Research
    License

    https://www.zionmarketresearch.com/privacy-policy

    Time period covered
    2022 - 2030
    Area covered
    Global
    Description

    The Cloud IAM Market was valued at $5.59 B in 2023 and is projected to reach $25.31 B by 2032, at a CAGR of 18.26% from 2023 to 2032.

Utkarsh Singh (2023). Password Strength and Vulnerability Dataset [Dataset]. https://www.kaggle.com/datasets/utkarshx27/passwords

Password Strength and Vulnerability Dataset

Password Popularity, Crack Times, and Strength Ratings

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (6094 bytes)
Dataset updated
Jul 31, 2023
Authors
Utkarsh Singh
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Data is sourced from Information is Beautiful, with the graphic coming from the same group here.

There's lots of additional information about password quality & strength in the source Doc. Please note that the "strength" column in this dataset is relative to these common aka "bad" passwords and YOU SHOULDN'T USE ANY OF THEM!

Wikipedia has a nice article on password strength as well.

Data Dictionary

passwords.csv

variable            class       description
rank                double      popularity in their database of released passwords
password            character   Actual text of the password
category            character   What category does the password fall in to?
value               double      Time to crack by online guessing
time_unit           character   Time unit to match with value
offline_crack_sec   double      Time to crack offline in seconds
rank_alt            double      Rank 2
strength            double      Strength = quality of password where 10 is highest, 1 is lowest, please note that these are relative to these generally bad passwords
font_size           double      Used to create the graphic for KIB
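
The data dictionary maps naturally onto a typed record. The sketch below is one possible loader, not part of the dataset: the `PasswordRow` class and `load_rows` helper are hypothetical names, the sample row is made up, and it assumes every numeric field is present (a missing value in the real file would need extra handling).

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class PasswordRow:
    # Field names and types follow the data dictionary above:
    # "double" becomes float and "character" becomes str.
    rank: float
    password: str
    category: str
    value: float
    time_unit: str
    offline_crack_sec: float
    rank_alt: float
    strength: float
    font_size: float


def load_rows(csv_file):
    """Parse passwords.csv-style input into PasswordRow objects."""
    rows = []
    for raw in csv.DictReader(csv_file):
        converted = {
            f.name: float(raw[f.name]) if f.type is float else raw[f.name]
            for f in fields(PasswordRow)
        }
        rows.append(PasswordRow(**converted))
    return rows


# Made-up sample row for illustration only.
SAMPLE_CSV = (
    "rank,password,category,value,time_unit,offline_crack_sec,rank_alt,strength,font_size\n"
    "1,example-pw,hypothetical-category,2.5,hours,0.001,1,3,10\n"
)

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0])
```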