https://creativecommons.org/publicdomain/zero/1.0/
Data is sourced from Information is Beautiful, with the graphic coming from the same group here.
There's lots of additional information about password quality & strength in the source Doc. Please note that the "strength" column in this dataset is relative to these common aka "bad" passwords and YOU SHOULDN'T USE ANY OF THEM!
Wikipedia has a nice article on password strength as well.
passwords.csv

| variable | class | description |
|---|---|---|
| rank | double | popularity in their database of released passwords |
| password | character | Actual text of the password |
| category | character | What category does the password fall into? |
| value | double | Time to crack by online guessing |
| time_unit | character | Time unit to match with value |
| offline_crack_sec | double | Time to crack offline in seconds |
| rank_alt | double | Rank 2 |
| strength | double | Strength = quality of password where 10 is highest, 1 is lowest, please note that these are relative to these generally bad passwords |
| font_size | double | Used to create the graphic for KIB |
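The `value` and `time_unit` columns only make sense together, so comparing rows usually means converting both into one unit. The following is a minimal sketch of that conversion using only the standard library. The unit names in `SECONDS_PER_UNIT`, the month and year lengths, and the sample rows are assumptions made for illustration, not values taken from the dataset.

```python
import csv
import io

# Approximate seconds per unit. The unit names are assumed, not confirmed by the source.
SECONDS_PER_UNIT = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 86400,
    "weeks": 7 * 86400,
    "months": 30 * 86400,  # assumes a 30-day month
    "years": 365 * 86400,  # assumes a 365-day year
}


def online_crack_seconds(row):
    """Combine `value` and `time_unit` into a single duration in seconds."""
    return float(row["value"]) * SECONDS_PER_UNIT[row["time_unit"]]


# Hypothetical sample rows, made up for illustration only.
SAMPLE = """rank,password,category,value,time_unit,offline_crack_sec,rank_alt,strength,font_size
1,example-one,demo,2,hours,0.5,1,4,11
2,example-two,demo,3,days,1.5,2,6,11
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["password"], online_crack_seconds(row))
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an open handle on `passwords.csv`; rows with a missing or unexpected `time_unit` would need explicit handling.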
https://academictorrents.com/nolicensespecified
The list contains every wordlist, dictionary, and password database leak that I could find on the internet (and I spent a LOT of time looking). It also contains every word in the Wikipedia databases (pages-articles, retrieved 2010, all languages) as well as lots of books from Project Gutenberg. It also includes the passwords from some low-profile database breaches that were being sold in the underground years ago. The format of the list is a standard text file sorted in non-case-sensitive alphabetical order. Lines are separated with a newline "\n" character. You can test the list without downloading it by giving SHA256 hashes to the free hash cracker or to @PlzCrack on Twitter. Here's a tool for computing hashes easily. Here are the results of cracking LinkedIn's and eHarmony's password hash leaks with the list. The list is responsible for cracking about 30% of all hashes given to CrackStation's free hash cracker, but that figure should be taken with a grain of salt because s
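The description mentions two mechanical details: the list is sorted in non-case-sensitive order, and it can be tested by submitting SHA256 hashes. Below is a small sketch of both using Python's standard library. The function names are hypothetical, and UTF-8 encoding of the candidate string is an assumption; the source does not say which encoding the hash cracker expects.

```python
import hashlib


def sha256_hex(candidate: str) -> str:
    """Return the lowercase hex SHA-256 digest of a string (UTF-8 assumed)."""
    return hashlib.sha256(candidate.encode("utf-8")).hexdigest()


def sort_like_wordlist(words):
    """Sort in non-case-sensitive alphabetical order, as the description states."""
    return sorted(words, key=str.lower)


print(sha256_hex("correct horse battery staple"))
print(sort_like_wordlist(["banana", "Apple", "cherry"]))
```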
Password dictionaries:
- 8-more-passwords.txt: only passwords with more than 8 characters; all-numeric passwords, passwords with runs of 3 or more consecutive characters, all-lowercase passwords, and passwords without a capital letter and a number have been removed (61,682 passwords).
- 7-more-passwords.txt: passwords of 7 characters or more, with numeric passwords removed (528,136 passwords).
- 1000000_password_seclists.txt: 1,000,000 passwords from SecLists.
- bitcoin-brainwallet.lst: bitcoin-brainwallet list with 394,748 lines.
- usernames.txt: collection of US usernames (403,355 lines).
- us-cities.txt: list of 20,580 cities in the US.
- facebook-firstnames.txt: 4,347,667 Facebook first names.
- 2151220-passwords.txt: 2,151,220 passwords from dazzlepod.com.
- subdomains-10000.txt: 10,000 subdomains for a domain scanner.
- 38650-password-sktorrent.txt: 38,650 passwords from sktorrent.eu.
- uniqpass_v16_password.txt: UNIQPASS is a large password list for use with John the Ripper (JtR) wordlist mode to translate large numbers of hashes, e.g. MD5 hashes, into cleartext passwords.
- indo-cities.txt: list of 102 cities in Indonesia.
- 38650-username-sktorrent.txt: 38,650 usernames from sktorrent.eu.
- forced-browsing: every wordlist you need for forced browsing.
MIT License
Copyright (c) 2015 Van-Duyet Le
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
- 8-more-passwords.txt: Contains passwords with more than 8 characters. Excludes numeric-only passwords, consecutive characters (3 or more), all-lowercase passwords, and passwords without at least one capital letter and one number. Total: 61,682 passwords.
- 7-more-passwords.txt: Includes passwords with 7 characters or more. Numeric passwords are removed. Total: 528,136 passwords.
- 1000000_password_seclists.txt: A collection of 1,000,000 passwords from SecLists.
- bitcoin-brainwallet.lst: Bitcoin brainwallet with 394,748 entries.
- usernames.txt: Collection of 403,355 usernames from the US.
- us-cities.txt: List of 20,580 cities in the US.
- facebook-firstnames.txt: Contains 4,347,667 first names from Facebook.
- 2151220-passwords.txt: Collection of 2,151,220 passwords from dazzlepod.com.
- subdomains-10000.txt: List of 10,000 subdomains for domain scanning.
- 38650-password-sktorrent.txt: Contains 38,650 passwords from sktorrent.eu.
- uniqpass_v16_password.txt: UNIQPASS is a large password list for use with John the Ripper (JtR) in wordlist mode to convert large numbers of hashes, such as MD5, into cleartext passwords.
- indo-cities.txt: List of 102 cities in Indonesia.
- 38650-username-sktorrent.txt: Contains 38,650 usernames from sktorrent.eu.
- forced-browsing: Contains every wordlist needed for forced browsing.
https://creativecommons.org/publicdomain/zero/1.0/
The original RockYou.txt dataset was uploaded by @wjburns 5 years ago and has 95K downloads and 640 upvotes, which suggests that Kaggle allows this type of data for research and educational purposes.
I split the single 160 GB text file into smaller files, with filenames based on the first character, to make it easier to use on less powerful machines.
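A split of this kind can be sketched as a streaming pass that never holds the whole file in memory. The code below is an illustration under stated assumptions, not the uploader's actual script: the `char_<codepoint>.txt` naming scheme and both function names are hypothetical, chosen so that `a` and `A` do not collide on case-insensitive filesystems.

```python
import os


def bucket_name(line: str) -> str:
    """Map a line to an output file name based on its first character."""
    if not line:
        return "empty.txt"
    # Name the bucket by Unicode code point to avoid unsafe or colliding file names.
    return f"char_{ord(line[0]):04x}.txt"


def split_by_first_char(src_path: str, out_dir: str) -> None:
    """Stream src_path line by line, appending each line to its bucket file."""
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path, "r", encoding="utf-8", errors="replace") as src:
        for raw in src:
            line = raw.rstrip("\n")
            target = os.path.join(out_dir, bucket_name(line))
            # Reopening per line is slow but keeps the sketch simple and
            # avoids running out of file handles on unusual first characters.
            with open(target, "a", encoding="utf-8") as out:
                out.write(line + "\n")
```

For a file of this size, a real run would likely want a bounded cache of open handles or a pre-sorted input rather than one `open` call per line.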
Everyone involved with Capture The Flag (CTF) has used the infamous rockyou.txt wordlist at least once, mainly for password cracking. The file is a list of 14 million unique passwords originating from the 2009 RockYou hack, which makes it a piece of computer security history. The “rockyou lineage” has evolved over the years.
https://www.youtube.com/watch?v=0_mQACSn6XM
The 2021 version already reached high numbers, but the newest release is (apparently) the ultimate amalgamation. RockYou2024 was released by the user “ObamaCare”. This new version adds 1.5 billion records to the 2021 version, reaching 10 billion records. A wordlist can be used for a multitude of tasks, and having this number of records in a single file, especially in 2024 with increasingly aggressive data breaches, is a dream come true for attackers. The user has not specified the nature of the additional records but states that the new data comes from recently leaked databases.
From "The New RockYou2024 Collection has been published!"
I got it from https://github.com/hkphh/rockyou2024.txt, but it was originally shared by a user known as ObamaCare, with whom I have no affiliation or association.
In case you'd like to process the RockYou2024.txt yourself, you can find it here ❗Original RockYou2024.txt zip file
In case you'd like to see only the "Strong Passwords", you can find it here ❗180 Million "Strong Passwords" in RockYou2024.txt
The statistic shows the distribution of password character sets found among various databases leaked online as of 2017. From 320 million hashed passwords that were analyzed, 49 percent were found to be a mix of numbers and lowercase alphabetic characters. Just two percent of passwords were a mix of numbers, upper- and lowercase alphabetic characters, and symbols.
https://www.promarketreports.com/privacy-policy
The size of the Password Management Market was valued at USD 2 Billion in 2023 and is projected to reach USD 6.37 Billion by 2032, with an expected CAGR of 18% during the forecast period. Recent developments include:
- July 2022: Google updated its password managers by integrating various highly requested features to help consumers, like auto-login, credential saving, and password generation. This led to enhanced market growth owing to the higher utilization of the Google Chrome browser for web surfing and remote working.
- June 2022: Lookout Inc. acquired SaferPass, which offers simple and secure password managers for enterprises and individuals. The acquisition helps in delivering proactive security platforms to safeguard user data and privacy while broadening the business footprint.
- January 2022: Keeper Security launched Secrets Manager, which secures infrastructure credentials like API keys, certificates, access keys, and database passwords. The solution included cloud-based integration with a zero-knowledge security model similar to their enterprise password management platform.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
I extracted all entries from the RockYou2024.txt with the following characteristics:
- Between 8 and 32 characters
- Has at least one upper-case character
- Has at least one lower-case character
- Has at least one digit
- Has at least one punctuation mark
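The criteria above can be sketched as a single predicate. This is an illustration, not the uploader's actual extraction code: the function name `is_strong` is hypothetical, and treating "punctuation mark" as the ASCII set in `string.punctuation` is an assumption.

```python
import string


def is_strong(entry: str) -> bool:
    """Check the five listed characteristics for one wordlist entry."""
    if not 8 <= len(entry) <= 32:
        return False
    return (
        any(c.isupper() for c in entry)
        and any(c.islower() for c in entry)
        and any(c.isdigit() for c in entry)
        # Assumption: "punctuation" means ASCII punctuation only.
        and any(c in string.punctuation for c in entry)
    )


for sample in ["Abcdef1!", "abcdef1!", "Abcdefg!", "Abcdef12", "Ab1!"]:
    print(sample, is_strong(sample))
```

Note that passing this filter says nothing about whether an entry is safe to use; every entry in a leaked wordlist is known to attackers by definition.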
Use responsibly
Over 3.2 billion email addresses with passwords posted on Dark Web from massive Google platform data leaks
https://creativecommons.org/publicdomain/zero/1.0/
The Minecraft database of the VimeWorld server, containing passwords as MD5 and bcrypt hashes. I brute-forced passwords only for the MD5 hashes, as they are the easiest to crack. In total, I managed to recover just over 90% of all MD5-hashed passwords.
According to our latest research, the global Secretless Database Connectivity market size reached USD 1.47 billion in 2024, reflecting robust demand for secure, seamless, and scalable database access solutions across industries. The market is poised to expand at a CAGR of 17.8% from 2025 to 2033, with the forecasted market size projected to reach USD 6.13 billion by 2033. This remarkable growth is primarily driven by the urgent need to address rising cybersecurity threats, regulatory pressures, and the increasing adoption of cloud-native architectures that demand advanced, secretless approaches to database connectivity.
A significant growth factor for the Secretless Database Connectivity market is the escalating sophistication of cyberattacks targeting database credentials and access points. Traditional methods of database authentication, which rely on static secrets such as passwords and API keys, are increasingly vulnerable to breaches and leaks. The shift towards secretless architecture, where credentials are abstracted and managed dynamically without exposing them to applications or users, offers a powerful mitigation against these risks. Enterprises are recognizing the value of secretless solutions in reducing the attack surface, enhancing compliance with regulations like GDPR and HIPAA, and simplifying credential management processes. This awareness is fueling market adoption across sectors, particularly in industries handling sensitive or regulated data.
Another pivotal driver is the rapid proliferation of cloud-native technologies, microservices, and containerized environments. Modern application architectures demand agile, scalable, and automated approaches to database connectivity, which traditional secret management tools often struggle to deliver. Secretless Database Connectivity enables seamless integration with DevOps pipelines, supports dynamic scaling, and eliminates the operational burden of credential rotation and distribution. Organizations pursuing digital transformation and cloud migration initiatives are increasingly turning to secretless solutions to achieve continuous delivery, operational efficiency, and improved developer productivity. This alignment with broader IT modernization trends is expected to sustain high growth rates in the market.
Furthermore, the evolving regulatory landscape is compelling organizations to adopt more robust security postures, including secretless access mechanisms. Regulatory frameworks across North America, Europe, and Asia Pacific are imposing stricter requirements for data privacy, access control, and auditability. Secretless Database Connectivity solutions provide comprehensive logging, access policies, and real-time monitoring capabilities that help organizations demonstrate compliance and avoid costly penalties. As enterprises face mounting pressure to safeguard customer data and intellectual property, the adoption of secretless approaches is becoming a strategic imperative, further accelerating market expansion.
From a regional perspective, North America currently dominates the Secretless Database Connectivity market, accounting for the largest revenue share in 2024 due to the presence of leading technology vendors, early adoption of advanced cybersecurity solutions, and a highly regulated business environment. However, the Asia Pacific region is expected to exhibit the fastest growth rate over the forecast period, driven by rapid digitalization, increasing cloud adoption, and rising awareness of data security best practices among enterprises and government agencies. Europe also represents a significant market, underpinned by stringent data protection regulations and a mature IT ecosystem. The interplay of these regional dynamics is shaping the global competitive landscape and creating new opportunities for vendors and service providers.
The Secretless Database Connectivity market by component is segmented into software, hardware, and services. The softw
This database automatically captures metadata, the source of which is the Government of the Republic of Slovenia, Statistical Office of the Republic of Slovenia, and which corresponds to the source database entitled “Individuals who have used the Internet in the last 12 months and have a digital certificate or certificate or one-time password generator smsPASS and the reasons why they do not have them, by age class and sex, Slovenia, 2019”.
Actual data are available in Px-Axis format (.px). With additional links, you can access the source portal page for viewing and selecting data, as well as the PX-Win program, which can be downloaded free of charge. Both allow you to select data for display, change the format of the printout, and store it in different formats, as well as view and print tables of unlimited size, as well as some basic statistical analyses and graphics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union Query SQL injection and Blind SQL injection. The SQLMAP tool was used to perform the attacks.
The NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
Datasets
The first dataset (D1) was collected to train the detection models. The second (D2) was collected using attacks different from those used in training, to test the models and check that they generalize.
The datasets contain both benign and malicious traffic. All collected datasets are balanced.
The version of NetFlow used to build the datasets is 5.
| Dataset | Aim | Samples | Benign-malicious traffic ratio |
|---|---|---|---|
| D1 | Training | 400,003 | 50% |
| D2 | Test | 57,239 | 50% |
Infrastructure and implementation
Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.
DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it sends them to a NetFlow data generation node (this process is carried out similarly for packets received from the Internet).
The malicious traffic collected (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration testing tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.
The attacks were executed from 16 nodes, each launching SQLMAP with the parameters in the following table.
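The two export conditions can be sketched as a simple check on a flow's first and last packet times. This is only an illustration of the timeout rule as described; it is not taken from DOROTHEA or ipt_netflow, and the `Flow` class, the `should_export` function, and the boundary handling (`>=`) are assumptions.

```python
from dataclasses import dataclass

INACTIVE_TIMEOUT = 15   # seconds without packets before export (from the text)
ACTIVE_TIMEOUT = 1800   # maximum seconds a flow stays open before export (from the text)


@dataclass
class Flow:
    first_seen: float  # timestamp of the first packet, in seconds
    last_seen: float   # timestamp of the most recent packet, in seconds


def should_export(flow: Flow, now: float) -> bool:
    """Return True when either timeout has been reached."""
    idle = now - flow.last_seen
    age = now - flow.first_seen
    return idle >= INACTIVE_TIMEOUT or age >= ACTIVE_TIMEOUT


print(should_export(Flow(first_seen=0, last_seen=10), now=20))
print(should_export(Flow(first_seen=0, last_seen=10), now=25))
print(should_export(Flow(first_seen=0, last_seen=1795), now=1800))
```

The third case shows why a long-lived connection appears as several flow records: it is still receiving packets, but the active timeout forces an export.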
| Parameters | Description |
|---|---|
| '--banner','--current-user','--current-db','--hostname','--is-dba','--users','--passwords','--privileges','--roles','--dbs','--tables','--columns','--schema','--count','--dump','--comments','--schema' | Enumerate users, password hashes, privileges, roles, databases, tables and columns |
| --level=5 | Increase the probability of a false positive identification |
| --risk=3 | Increase the probability of extracting data |
| --random-agent | Select the User-Agent randomly |
| --batch | Never ask for user input, use the default behavior |
| --answers="follow=Y" | Predefined answers to yes |
Every node executed SQLIA on 200 victim nodes. The victim nodes had deployed a web form vulnerable to Union-type injection attacks, which was connected to the MYSQL or SQLServer database engines (50% of the victim nodes deployed MySQL and the other 50% deployed SQLServer).
The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24.
The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.
However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.
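When labelling flows, the address spaces quoted for D1 and D2 can be used to tell traffic-generating nodes from victim nodes. The sketch below uses Python's `ipaddress` module; the `NETWORKS` mapping and `classify` function are hypothetical helpers. Several ranges are written in the text with host bits set (for example 182.168.1.1/24), so `strict=False` is passed to normalize them to the enclosing /24.

```python
import ipaddress

# Address spaces as quoted in the text, keyed by (dataset, role).
NETWORKS = {
    ("D1", "generator"): ipaddress.ip_network("182.168.1.1/24", strict=False),
    ("D1", "victim"): ipaddress.ip_network("126.52.30.0/24"),
    ("D2", "generator"): ipaddress.ip_network("152.148.48.1/24", strict=False),
    ("D2", "victim"): ipaddress.ip_network("140.30.20.1/24", strict=False),
}


def classify(address: str):
    """Return the (dataset, role) pair whose network contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for label, network in NETWORKS.items():
        if ip in network:
            return label
    return None


print(NETWORKS[("D1", "generator")])
print(classify("126.52.30.17"))
print(classify("152.148.48.200"))
print(classify("10.0.0.1"))
```

Because benign and malicious generators share one address space in each dataset, the source address alone cannot separate the two classes; it only identifies the role of a node.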
To run the MySQL server we ran MariaDB version 10.4.12.
Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used.
https://creativecommons.org/publicdomain/zero/1.0/
This is the original RockYou2024.txt file, zipped and split into 11 parts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
MalwareInfectionSet
We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
VictimAccessSet
We demonstrate how Infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
AccountAccessSet
The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits
Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
https://creativecommons.org/publicdomain/zero/1.0/
This is a dataset containing all the major data breaches in the world from 2004 to 2021.
As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.
This data contains 5 columns:
1. Entity: The name of the company, organization or institute
2. Year: The year in which the data breach took place
3. Records: How many records were compromised (can include information like email, passwords etc.)
4. Organization type: Which sector the organization belongs to
5. Method: Was it hacked? Were the files lost? Was it an inside job?
Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches
Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a structured image dataset designed to facilitate research in spatial localization, pattern recognition, and character classification. It contains high-resolution images of 53 distinct alphabet characters, each systematically placed within a standardized 5×5 grid layout. Each 5×5 grid consists of 25 individual cells. Within each grid, we define 16 overlapping 2×2 sub-grids. These sub-grids serve as local regions of interest for fine-grained spatial analysis. In each 2×2 sub-grid, there are 9 specific positional locations where an alphabet image can be placed, centered within or slightly offset relative to the sub-grid to provide a range of spatial variation. This results in a total of 144 unique placement positions for each character across the entire 5×5 grid. For every alphabet character, the dataset includes an image placed in each of these 144 locations, leading to a comprehensive total of 7,632 labeled samples (53 characters × 144 positions). All samples are consistent in size and format, and the position of each character is precisely annotated to facilitate supervised learning tasks. The Devanagari 53 Alphabet dataset is ideal for training and evaluating models on tasks such as character localization, grid-based graphical passwords, and few-shot learning under positional variation. The structured spatial layout and extensive position coverage also make it suitable for research in visual attention models, object detection benchmarks, and spatially-aware neural architectures.
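The sample count follows from the grid geometry, and the arithmetic can be checked with a few lines of Python. This is a sketch of the counting argument only; the constant and function names are hypothetical, and it assumes the overlapping 2×2 sub-grids are all those that fit at unit offsets inside the 5×5 grid.

```python
GRID_SIZE = 5              # cells per side of the full grid
SUBGRID_SIZE = 2           # cells per side of each sub-grid
POSITIONS_PER_SUBGRID = 9  # placement positions inside one sub-grid
CHARACTERS = 53            # distinct alphabet characters


def subgrid_origins(grid=GRID_SIZE, sub=SUBGRID_SIZE):
    """List the top-left cell of every sub-grid that fits inside the grid."""
    span = grid - sub + 1
    return [(row, col) for row in range(span) for col in range(span)]


origins = subgrid_origins()
placements = len(origins) * POSITIONS_PER_SUBGRID
total_samples = CHARACTERS * placements
print(len(origins), placements, total_samples)  # prints: 16 144 7632
```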
https://www.zionmarketresearch.com/privacy-policy
Identity as Service Market size is set to expand from $ 6.53 Billion in 2023 to $ 57.73 Billion by 2032, with CAGR of around 27.4% from 2024 to 2032.
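A compound annual growth rate can be recomputed from the start and end values as a consistency check. The sketch below assumes 2023 is the base year and that growth compounds over the nine years to 2032; under that assumption the result comes out close to the quoted 27.4%. The function name `cagr` is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the text, in billions of USD. The period count is an assumption.
rate = cagr(6.53, 57.73, years=2032 - 2023)
print(f"{rate:.1%}")
```

The same function can be applied to the other market figures in this listing to see which quoted rates are consistent with their start and end values.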
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Training.gov.au (TGA) is the National Register of Vocational Education and Training in Australia and contains authoritative information about Registered Training Organisations (RTOs), Nationally Recognised Training (NRT) and the approved scope of each RTO to deliver NRT as required in national and jurisdictional legislation.
TGA has a web service available to allow external systems to access and utilise information stored in TGA through an external system. The TGA web service is exposed through a single interface and web service users are assigned a data reader role which will apply to all data stored in the TGA.
The web service can be broadly split into three categories:
RTOs and other organisation types;
Training components including Accredited courses, Accredited course Modules, Training Packages, Qualifications, Skill Sets and Units of Competency;
System metadata including static data and statistical classifications.
Users will gain access to the TGA web service by first passing a user name and password through to the web server. The web server will then authenticate the user against the TGA security provider before passing the request to the application that supplies the web services.
There are two web services environments:
1. Production - ws.training.gov.au – National Register production web services
2. Sandbox - ws.sandbox.training.gov.au – National Register sandbox web services.
The National Register sandbox web service is used to test against the current version of the web services where the functionality will be identical to the current production release. The web service definition and schema of the National Register sandbox database will also be identical to that of production release at any given point in time. The National Register sandbox database will be cleared down at regular intervals and realigned with the National Register production environment.
Each environment has three configured services:
Organisation Service;
Training Component Service; and
Classification Service.
To access the download area for web services, navigate to http://tga.hsd.com.au and use the below name and password:
Username: WebService.Read (case sensitive)
Password: Asdf098 (case sensitive)
This download area contains various versions of the following artefacts that you may find useful
• Training.gov.au web service specification document;
• Training.gov.au logical data model and definitions document;
• .NET web service SDK sample app (with source code);
• Java sample client (with source code);
• How to setup web service client in VS 2010 video; and
• Web services WSDL's and XSD's.
For the business areas, the specification/definition documents and the sample application are a good place to start, while the IT areas will find the sample source code and the video useful for starting development against the TGA web services.
The web services Sandbox end point is: https://ws.sandbox.training.gov.au/Deewr.Tga.Webservices
Once you are ready to access the production web service, please email the TGA team at tgaproject@education.gov.au to obtain a unique user name and password.
https://www.zionmarketresearch.com/privacy-policy
Cloud IAM Market was valued at $5.59 B in 2023 and is projected to reach $25.31 B by 2032, at a CAGR of 18.26% from 2023 to 2032.