12 datasets found
  1. "Pwned Passwords" Dataset

    • academictorrents.com
    bittorrent
    Updated Aug 3, 2018
    Cite
    haveibeenpwned.com (2018). "Pwned Passwords" Dataset [Dataset]. https://academictorrents.com/details/53555c69e3799d876159d7290ea60e56b35e36a9
    Available download formats: bittorrent (11101449979)
    Dataset updated
    Aug 3, 2018
    Dataset provided by
    Have I Been Pwned? (http://haveibeenpwned.com/)
    License

    https://academictorrents.com/nolicensespecified

    Description

    Version 3 with 517M hashes and counts of password usage, ordered from most to least prevalent. Pwned Passwords are 517,238,891 real-world passwords previously exposed in data breaches. This exposure makes them unsuitable for ongoing use, as they're at much greater risk of being used to take over other accounts. They're searchable online below as well as being downloadable for use in other online systems. The entire set of passwords is downloadable for free below, with each password being represented as a SHA-1 hash to protect the original value (some passwords contain personally identifiable information) followed by a count of how many times that password had been seen in the source data breaches. The list may be integrated into other systems and used to verify whether a password has previously appeared in a data breach, after which a system may warn the user or even block the password outright.
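
    The integration described above can be sketched in Python. This is a minimal illustration, not the provider's tooling: it assumes each line of the downloaded file has the form HASH:COUNT (the colon separator and the uppercase hex encoding are assumptions; the description only says a SHA-1 hash is followed by a count), and the function names sha1_hex, parse_line, and breach_count are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def sha1_hex(password: str) -> str:
    """Return the uppercase hex SHA-1 digest of a UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def parse_line(line: str, sep: str = ":") -> Tuple[str, int]:
    """Split one 'HASH<sep>COUNT' line; the separator is an assumption."""
    digest, _, count = line.strip().partition(sep)
    return digest.upper(), int(count)


def breach_count(lines: Iterable[str], password: str) -> int:
    """Linear scan: return how often the password was seen, or 0 if absent."""
    target = sha1_hex(password)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        digest, count = parse_line(line)
        if digest == target:
            return count
    return 0
```

    A system could call breach_count on an open file handle and warn or block when the result is above zero. The linear scan only keeps the sketch short; with hundreds of millions of entries, a real integration would more likely load the hashes into an indexed store than scan the file on every lookup.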

  2. All-time biggest online data breaches 2025

    • statista.com
    Updated May 26, 2025
    Cite
    Statista (2025). All-time biggest online data breaches 2025 [Dataset]. https://www.statista.com/statistics/290525/cyber-crime-biggest-online-data-breaches-worldwide/
    Dataset updated
    May 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2025
    Area covered
    Worldwide
    Description

    The largest reported data leakage as of January 2025 was the Cam4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history so far, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation, the company updated the number, revealing that three billion accounts were affected. The National Public Data breach was announced in August 2024. The incident became public when personally identifiable information of individuals became available for sale on the dark web. Overall, security professionals estimate that nearly three billion personal records were leaked. The next significant data leakage was the March 2018 security breach of India's national ID database, Aadhaar, with over 1.1 billion records exposed. This included biometric information such as identification numbers and fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.

    Cybercrime - the dark side of digitalization

    As the world continues its journey into the digital age, corporations and governments across the globe have been increasing their reliance on technology to collect, analyze and store personal data. This, in turn, has led to a rise in the number of cyber crimes, ranging from minor breaches to global-scale attacks impacting billions of users, such as in the case of Yahoo. Within the U.S. alone, 1,802 cases of data compromise were reported in 2022. This was a marked increase from the 447 cases reported a decade prior.

    The high price of data protection

    As of 2022, the average cost of a single data breach across all industries worldwide stood at around 4.35 million U.S. dollars. This was found to be most costly in the healthcare sector, with each leak reported to have cost the affected party a hefty 10.1 million U.S. dollars. The financial segment followed closely behind. Here, each breach resulted in a loss of approximately 6 million U.S. dollars - 1.5 million more than the global average.

  3. CrackStation's Password Cracking Dictionary (Human Passwords Only)

    • academictorrents.com
    bittorrent
    Updated Aug 10, 2014
    Cite
    Defuse Security (2014). CrackStation's Password Cracking Dictionary (Human Passwords Only) [Dataset]. https://academictorrents.com/details/7ae809ccd7f0778328ab4b357e777040248b8c7f
    Available download formats: bittorrent (257973006)
    Dataset updated
    Aug 10, 2014
    Dataset authored and provided by
    Defuse Security
    License

    https://academictorrents.com/nolicensespecified

    Description

    The list contains every wordlist, dictionary, and password database leak that I could find on the internet (and I spent a LOT of time looking). It also contains every word in the Wikipedia databases (pages-articles, retrieved 2010, all languages) as well as lots of books from Project Gutenberg. It also includes the passwords from some low-profile database breaches that were being sold in the underground years ago. The format of the list is a standard text file sorted in non-case-sensitive alphabetical order. Lines are separated with a newline "\n" character. You can test the list without downloading it by giving SHA256 hashes to the free hash cracker or to @PlzCrack on Twitter. Here's a tool for computing hashes easily. Here are the results of cracking LinkedIn's and eHarmony's password hash leaks with the list. The list is responsible for cracking about 30% of all hashes given to CrackStation's free hash cracker, but that figure should be taken with a grain of salt because s

  4. Number of data compromises and impacted individuals in U.S. 2005-2024

    • statista.com
    Updated Jul 14, 2025
    Cite
    Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
    Dataset updated
    Jul 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

    Industries most vulnerable to data breaches

    Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.

    Largest data exposures worldwide

    In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is, by far, the most extensive reported data leakage. This case, though, is unique because cyber security researchers found the vulnerability before the cyber criminals. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, came up with an updated number of leaked records, which was three billion. In March 2018, the third biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.

  5. Data from: Rockyou

    • ieee-dataport.org
    Updated Apr 27, 2021
    Cite
    Zeeshan Shaikh (2021). Rockyou [Dataset]. https://ieee-dataport.org/documents/rockyou
    Dataset updated
    Apr 27, 2021
    Authors
    Zeeshan Shaikh
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Passwords that were leaked or stolen from sites. The Rockyou dataset contains about 14 million passwords.

  6. Italy number of data sets affected in data breaches Q1 2020-Q2 2024

    • statista.com
    Updated Dec 19, 2024
    Cite
    Statista (2024). Italy number of data sets affected in data breaches Q1 2020-Q2 2024 [Dataset]. https://www.statista.com/statistics/1453453/number-of-records-exposed-italy/
    Dataset updated
    Dec 19, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Italy
    Description

    Between the first quarter of 2020 and the second quarter of 2024, the number of records exposed in data breaches in Italy experienced a significant decrease. In the most recent measured period, approximately one million records were reported as leaked, down from around 10.2 million data sets affected in the first quarter of 2021.

  7. Eximpedia Export Import Trade

    • eximpedia.app
    Updated Feb 18, 2025
    Cite
    Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Eximpedia PTE LTD
    Eximpedia Export Import Trade Data
    Authors
    Seair Exim
    Area covered
    China, Indonesia, Barbados, Tanzania, Mauritania, Cambodia, Mozambique, American Samoa, Christmas Island, Ghana
    Description

    Access Leak Detector import export data of global countries with importers' & exporters' details, shipment date, price, hs code, ports, quantity etc.

  8. Number of accounts affected in data breaches Thailand Q2 2022-Q3 2024

    • statista.com
    Updated Jul 7, 2025
    Cite
    Statista (2025). Number of accounts affected in data breaches Thailand Q2 2022-Q3 2024 [Dataset]. https://www.statista.com/statistics/1404553/thailand-number-of-account-breaches-exposed/
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Thailand
    Description

    Between the second quarter of 2022 and the third quarter of 2024, the number of records exposed to account breaches in Thailand fluctuated significantly. Over ******* datasets were reported as having been leaked in the third quarter of 2024, compared to around ******* during the same quarter of the previous year.

  9. Leaked U.S. Department of Defense DMED (Defense Medical Epidemiology...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 17, 2024
    Cite
    Dr. Samuel Sigoloff (2024). Leaked U.S. Department of Defense DMED (Defense Medical Epidemiology Database) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5985601
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    United States Department of Defense (http://www.defense.gov/)
    Thomas Renz
    Lt. Col. Peter Chambers
    Leigh Dundas
    Dr. Samuel Sigoloff
    Lt. Col. Theresa Long
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    United States
    Description

    This is a database and accompanying report first appearing on a google drive link shared by Attorney Thomas Renz of Renz Law with the Epoch Times as well as embedded into Thomas Renz' website.

    According to DHA's Armed Forces Surveillance Division, the data in this database is incorrect for the years 2016-2020. (Watson, 2022)

    DMED is a web tool to query medical event data contained within the Defense Medical Surveillance System. The AFHSD claims that due to a serious error in their system, data between the years of 2016-2020 has been incredibly under-reported, which has led to the appearance of a significant increase of occurrences of medical diagnoses in 2021. (Watson, 2022)

    References:

    Renz, T. (2021, October 1). Attorney Tom Renz discovers leaked DOD covid files. Renz Law. Retrieved February 6, 2022, from https://renz-law.com/attorney-tom-renz-discovers-leaked-dod-covid-files/

    Watson, S. (2022, February 6). Pentagon responds to DOD whistleblowers' claim of spiking disease rates in the military after Covid Vaccine Mandate. InfoWars. Retrieved February 6, 2022, from https://www.infowars.com/posts/pentagon-responds-to-dod-whistleblowers-claim-of-spiking-disease-rates-in-the-military-after-covid-vaccine-mandate/

  10. Acoustic detection for undersea oil leaks project: programs and algorithms...

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    Lu, Zhiqu (2025). Acoustic detection for undersea oil leaks project: programs and algorithms dataset [Dataset]. http://doi.org/10.7266/ZP35J344
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    GRIIDC
    Authors
    Lu, Zhiqu
    Description

    The U.S. outer continental shelf is a major source of energy for the United States. The rapid growth of oil and gas production in the Gulf of Mexico increases the risk of underwater oil spills at greater water depths and drilling wells. These hydrocarbon leakages can be caused either by natural events, such as seeping from fissures in the ocean seabed, or by anthropogenic accidents, such as leaking from broken wellheads and pipelines. In order to improve safety and reduce the environmental risks of offshore oil and gas operations, the Bureau of Safety and Environmental Enforcement recommended the use of real-time monitoring. An early warning system for detecting, locating, and characterizing hydrocarbon leakages is essential for preventing the next oil spill as well as for seafloor hydrocarbon seepage detection. Existing monitoring techniques have significant limitations and cannot achieve real-time monitoring. This project launches an effort to develop a functional real-time monitoring system that uses passive acoustic technologies to detect, locate, and characterize undersea hydrocarbon leakages over large areas in a cost-effective manner.

    In an oil spill event, the leaked hydrocarbon is injected into seawater with huge amounts of discharge at high speeds. With mixed natural gases and oils, this hydrocarbon leakage creates underwater sound through two major mechanisms: shearing and turbulence by a streaming jet of oil droplets and gas bubbles, and bubble oscillation and collapse. These acoustic emissions can be recorded by hydrophones in the water column at far distances. They will be characterized and differentiated from other underwater noises through their unique frequency spectrum, evolution and transportation processes and leaking positions, and further be utilized to detect and position the leakage locations.

    With the objective of leakage detection and localization, our approach consists of recording and modeling the acoustic signals induced by the oil-spill and implementing advanced signal processing and triangulation localization techniques with a hydrophone network.

    Tasks of this project are:
    1. Conduct a laboratory study to simulate hydrocarbon leakages and their induced sound under controlled conditions, and to establish the correlation between frequency spectra and leakage properties, such as oil-jet intensities and speeds, bubble radii and distributions, and crack sizes.
    2. Implement and develop acoustic bubble modeling for estimating features and strength of the oil leakage.
    3. Develop a set of advanced signal processing and triangulation algorithms for leakage detection and localization.

    The experimental data have been collected in a water tank in the building of the National Center for Physical Acoustics, the University of Mississippi from 2018-2020, including hydrophone recorded underwater sounds generated by oil leakage bubbles under different testing conditions, such as pressures, flow rates, jet velocities, and crack sizes, and movies of oil leakages. Two types of oil leakages (a few bubbles and constant flow bubbles) were tested to simulate oil seepages either from seafloors or from oil well and pipe-line breaches. Two types of gases were investigated (nitrogen and methane). These data were analyzed for acoustic bubble modeling, oil leakage characterization, and localization.

    This dataset contains programs and algorithms. The folders of the dataset are described as follows:
    • The folder “signal processing programs” contains programs (LabVIEW VIs) for instrument control, data acquisition, and signal processing.
    • The folder “modeling algorithms” contains algorithms (MATLAB m-files) for acoustic bubble sound modeling.
    • The folder “localization algorithms” contains algorithms (MATLAB m-files) for oil leakage source localization.

    More details of this dataset can be found in the corresponding ReadMe files in each folder. Associated data may be found in S3.x911.000:0001 (bubble sound characterization and modeling data, doi:10.7266/3REPB7QM); S3.x911.000:0002 (test data, doi: 10.7266/NPYZ3XFV); S3.x911.000:0003 (raw sound data and validation of modeled source positions, doi: 10.7266/4S9EBZKX); S3.x911.000:0005 (imagery of the laboratory experiment, doi: 10.7266/BZY62EK0).

  11. Replication Data and Code for "Incentives and Information in Methane Leak...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 24, 2024
    Cite
    Lewis, Eric (2024). Replication Data and Code for "Incentives and Information in Methane Leak Detection and Repair" [Dataset]. http://doi.org/10.7910/DVN/BAVBGX
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Lewis, Eric
    Description

    Replication Data and Code for "Incentives and Information in Methane Leak Detection and Repair"

    Abstract: Capturing leaked methane can be a win for both firms and the environment. However, leakage volume uncertainty can be a barrier inhibiting leak repair. We study an experiment at oil and gas production sites which randomized whether site operators were informed of methane leakage volumes. At sites with high baseline leakage, we estimate a negative but imprecise effect of information on endline emissions. But at sites with zero measured leakage, giving firms information about methane leakage increased emissions at endline. Our results suggest that giving firms news of low leakage disincentivizes maintenance effort, thereby increasing the likelihood of future leaks.

    Package includes data from Wang et al. (2024) RCT as well as IEA data on estimated methane emissions and methane abatement costs. Package also includes code for replication.

  12. The Nauru Files

    • borealisdata.ca
    • search.dataone.org
    Updated Apr 18, 2024
    Cite
    The Guardian (2024). The Nauru Files [Dataset]. http://doi.org/10.5683/SP3/JWHSU9
    Explore at:
    Croissant
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    Borealis
    Authors
    The Guardian
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    May 12, 2013 - Oct 29, 2015
    Area covered
    Australia, Nauru
    Description

    About

    The Nauru Files contain the largest set of documents published from inside Australia's immigration detention system. Leaked to The Guardian in 2016, they include nearly 2,000 incident reports from the Nauru detention centre, which were written by guards, caseworkers and teachers on the remote Pacific island.

    Summary

    Examples of events include assaults, injuries, abuse and other forms of violence reported at the detention centre between 2013 and 2015. As noted by The Guardian, as well as academic research, Australia has privatised its immigration detention centres and exported detention of asylum seekers offshore to places such as Nauru and Manus Island in Papua New Guinea. This strategy is part of a wider "Pacific Solution" implemented by the Government of Australia since the early 2000s as a hardline deterrent to "stop the boats." Effectively, asylum seekers intercepted and detained on Nauru are removed from access to Australia's asylum system.

    Data Structure

    These data are composed of incident reports. An incident report is a short summary of an event in the Nauru detention centre written by staff there. Some of the details found in the files may be triggering; we therefore advise caution with reading and analysing these data. According to The Guardian, these reports form part of the Government of Australia's requirements to document what is happening within its detention system. Each report holds detailed information of the incident at the detention centre along with a "summary log". Working with The Guardian, we have organised these data into two forms: a PDF of each incident report, sorted by name at the time of leak, and a CSV/JSON of all incident reports (see "nauru_files.csv/json"), which structures key details into variables within its columns. Examples of variables include time, incident type, severity and description. Combined, these form a structured database linking each incident report to these variables.

    Data Source

    The Guardian has modified the original, leaked data to remove any personally-identifying information within them. To achieve this, a stringent approach of redaction has been implemented to remove names of asylum seekers and staff, personal identification numbers of asylum seekers, signatures of detention staff, nationalities within small population groups and residential tent numbers, among other things. There are also a large number of acronyms used in these data. For your convenience, we have provided an RTF document with a listing of these acronyms and their meanings.

    If you use these data, please cite the original source at The Guardian: The Guardian. (10 August 2016). The Nauru Files: The lives of asylum seekers in detention detailed in a unique database. Retrieved from https://www.theguardian.com/australia-news/ng-interactive/2016/aug/10/the-nauru-files-the-lives-of-asylum-seekers-in-detention-detailed-in-a-unique-database-interactive.

    Should you have any comments, questions or requested edits or extensions to the Nauru files, please contact Haven at kira.williams@utoronto.ca.

    For more articles from The Guardian on these data, see:
    • The Nauru files: cache of 2,000 leaked reports reveal scale of abuse of children in Australian offshore detention.
    • A short history of Nauru, Australia’s dumping ground for refugees.
    • ‘I want death’: Nauru files chronicle despair of asylum seeker children.
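
    As a rough illustration of working with the CSV form described under Data Structure, the sketch below tallies the values of one column. It is only a sketch under stated assumptions: the actual header names in the CSV are not given in the description, so incident_type and the other headers are hypothetical, count_by_column is a made-up helper, and the sample rows are placeholder values rather than real records.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str) -> Counter:
    """Tally the values of one column across all rows of a CSV file."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    # Missing cells come back as None, so normalise them to an empty string.
    return Counter((row[column] or "").strip() for row in reader)


# Made-up stand-in for the real file; header names are hypothetical.
sample = io.StringIO(
    "time,incident_type,severity,description\n"
    "2014-01-01,Type A,Minor,redacted\n"
    "2014-01-02,Type B,Major,redacted\n"
    "2014-01-03,Type A,Minor,redacted\n"
)
counts = count_by_column(sample, "incident_type")
print(counts.most_common())
```

    With the real file, the in-memory sample would be replaced by an opened file handle, and the column name by whichever header the CSV actually uses.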
