37 datasets found

All-time biggest online data breaches 2025
statista.com
tokrwards.com
+1more
Updated May 26, 2025
Statista (2025). All-time biggest online data breaches 2025 [Dataset]. https://www.statista.com/statistics/290525/cyber-crime-biggest-online-data-breaches-worldwide/
Dataset updated
May 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2025
Area covered
Worldwide
Description
The largest reported data leakage as of January 2025 was the Cam4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history so far, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation, the company updated the number, revealing that three billion accounts were affected. The National Public Data Breach was announced in August 2024. The incident became public when personally identifiable information of individuals became available for sale on the dark web. Overall, the security professionals estimate the leakage of nearly three billion personal records. The next significant data leakage was the March 2018 security breach of India's national ID database, Aadhaar, with over 1.1 billion records exposed. This included biometric information such as identification numbers and fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.

Cybercrime - the dark side of digitalization
As the world continues its journey into the digital age, corporations and governments across the globe have been increasing their reliance on technology to collect, analyze and store personal data. This, in turn, has led to a rise in the number of cyber crimes, ranging from minor breaches to global-scale attacks impacting billions of users – such as in the case of Yahoo. Within the U.S. alone, 1,802 cases of data compromise were reported in 2022. This was a marked increase from the 447 cases reported a decade prior.
The high price of data protection
As of 2022, the average cost of a single data breach across all industries worldwide stood at around 4.35 million U.S. dollars. This was found to be most costly in the healthcare sector, with each leak reported to have cost the affected party a hefty 10.1 million U.S. dollars. The financial segment followed closely behind. Here, each breach resulted in a loss of approximately 6 million U.S. dollars - 1.5 million more than the global average.
"Pwned Passwords" Dataset
academictorrents.com
bittorrent
Updated Aug 3, 2018
haveibeenpwned.com (2018). "Pwned Passwords" Dataset [Dataset]. https://academictorrents.com/details/53555c69e3799d876159d7290ea60e56b35e36a9
Available download formats: bittorrent (11101449979)
Dataset updated
Aug 3, 2018
Dataset provided by
Have I Been Pwned? (http://haveibeenpwned.com/)
License
https://academictorrents.com/nolicensespecified
Description
Version 3 with 517M hashes and counts of password usage, ordered from most to least prevalent.
Pwned Passwords are 517,238,891 real-world passwords previously exposed in data breaches. This exposure makes them unsuitable for ongoing use, as they're at much greater risk of being used to take over other accounts. They're searchable online below as well as being downloadable for use in other online systems. The entire set of passwords is downloadable for free below, with each password represented as a SHA-1 hash to protect the original value (some passwords contain personally identifiable information), followed by a count of how many times that password had been seen in the source data breaches. The list may be integrated into other systems and used to verify whether a password has previously appeared in a data breach, after which a system may warn the user or even block the password outright.
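As a rough illustration of the integration described above, the sketch below hashes a candidate password with SHA-1 and looks it up in a list of hash and count pairs. The `HASH:COUNT` line layout, the function names and the sample counts are assumptions made for this example; the description only states that each hash is followed by a count.

```python
import hashlib
from typing import Iterable, Optional


def sha1_hex(password: str) -> str:
    """Return the uppercase hexadecimal SHA-1 digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def breach_count(password: str, lines: Iterable[str]) -> Optional[int]:
    """Return how often the password was seen, or None if it is not listed.

    Each line is assumed to look like 'HASH:COUNT' (an assumption, see above).
    """
    target = sha1_hex(password)
    for line in lines:
        digest, _, count = line.strip().partition(":")
        if digest.upper() == target:
            return int(count)
    return None


# Tiny in-memory stand-in for the downloaded list; the counts are made up.
sample_lines = [
    f"{sha1_hex('password')}:3",
    f"{sha1_hex('letmein')}:2",
]

seen = breach_count("password", sample_lines)
if seen is not None:
    print(f"Password appeared {seen} times; warn the user or block it.")
```

A linear scan like this is only practical for small samples; for the full list, a sorted file with binary search or a database index would be a more realistic choice.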
Number of data compromises and impacted individuals in U.S. 2005-2024
statista.com
thefarmdosupply.com
+1more
Updated Jul 14, 2025
Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
Dataset updated
Jul 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.
Industries most vulnerable to data breaches
Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.
Largest data exposures worldwide
In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. This case, though, is unique because cyber security researchers found the vulnerability before the cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, came up with an updated number of leaked records, which was three billion. In March 2018, the third biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
AOL Search Data 20M web queries (2006)
academictorrents.com
bittorrent
Updated Dec 17, 2016
AOL (2016). AOL Search Data 20M web queries (2006) [Dataset]. https://academictorrents.com/details/cd339bddeae7126bb3b15f3a72c903cb0c401bd1
Available download formats: bittorrent (460409936)
Dataset updated
Dec 17, 2016
Dataset authored and provided by
AOL (http://aol.com/)
License
https://academictorrents.com/nolicensespecified
Description
500k User Session Collection
This collection is distributed for NON-COMMERCIAL RESEARCH USE ONLY. Any application of this collection for commercial purposes is STRICTLY PROHIBITED.
Brief description: This collection consists of ~20M web queries collected from ~650k users over three months. The data is sorted by anonymous user ID and sequentially arranged. The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research.
The data set includes AnonID, Query, QueryTime, ItemRank, ClickURL.
AnonID - an anonymous user ID number.
Query - the query issued by the user, case shifted with most punctuation removed.
QueryTime - the time at which the query was submitted for search.
ItemRank - if the user clicked on a search result, the rank of the item on which they clicked is listed.
ClickURL - if the user clicked on a search result, the domain portion of the URL i
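The five fields listed above could be read along the following lines. This is a minimal sketch that assumes a tab-separated layout with a header row; the description does not state the delimiter, and the sample rows, timestamp format and the `QueryEvent` name are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class QueryEvent:
    anon_id: int
    query: str
    query_time: str            # kept as text; the timestamp format is not assumed
    item_rank: Optional[int]   # only present when a result was clicked
    click_url: Optional[str]   # only present when a result was clicked


def parse_log(text: str) -> Iterator[QueryEvent]:
    """Yield one QueryEvent per row of a tab-separated log with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        rank = row.get("ItemRank") or ""
        url = row.get("ClickURL") or ""
        yield QueryEvent(
            anon_id=int(row["AnonID"]),
            query=row["Query"],
            query_time=row["QueryTime"],
            item_rank=int(rank) if rank else None,
            click_url=url or None,
        )


# Hypothetical rows: one query without a click, one with a click.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1\texample query\t2006-03-01 12:00:00\t\t\n"
    "1\texample query\t2006-03-01 12:00:30\t2\thttp://www.example.com\n"
)

events = list(parse_log(SAMPLE))
clicks = [event for event in events if event.item_rank is not None]
print(len(events), "events,", len(clicks), "with a click")
```

Keeping `item_rank` and `click_url` optional mirrors the description: both are listed only when the user clicked on a search result.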
Eximpedia Export Import Trade
eximpedia.app
Updated Oct 2, 2025
+ more versions
Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
Available download formats: .bin, .xml, .csv, .xls
Dataset updated
Oct 2, 2025
Dataset provided by
Eximpedia Export Import Trade Data
Eximpedia PTE LTD
Authors
Seair Exim
Area covered
Barbados, Indonesia, China, Tanzania, Christmas Island, Cambodia, Mozambique, American Samoa, Mauritania, Ghana
Description
Access Leak Detector export import data including profitable buyers and suppliers with details like HSN code, Price, Quantity.
Minimizing Search Areas for Leak Detection in Water Distribution Networks -...
data.europa.eu
unknown
Updated Jun 23, 2022
Zenodo (2022). Minimizing Search Areas for Leak Detection in Water Distribution Networks - Code [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6718875?locale=en
Available download formats: unknown (13186)
Dataset updated
Jun 23, 2022
Dataset authored and provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database includes the code used to analyze and produce the results for the following research article: Minimizing Search Areas for Leak Detection in Water Distribution Networks by B. Snider, G. Lewis, A.S. Chen, L. Vamvakeridou-Lyroudia, S. Djordjevic, D.A. Savic. Journal of Hydroinformatics. (Accepted - awaiting publication).
Global exporters importers-export import data of Leakage detector
volza.com
csv
Updated Oct 14, 2025
+ more versions
Volza FZ LLC (2025). Global exporters importers-export import data of Leakage detector [Dataset]. https://www.volza.com/p/leakage-detector/
Available download formats: csv
Dataset updated
Oct 14, 2025
Dataset authored and provided by
Volza FZ LLC
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Count of exporters, Count of importers, Count of shipments, Sum of export import value
Description
712 Global exporters importers export import shipment records of Leakage detector with prices, volume & current Buyer's suppliers relationships based on actual Global export trade database.
Using Decision Trees to Detect and Isolate Leaks in the J-2X
s.cnmilf.com
catalog.data.gov
+1more
Updated Aug 30, 2025
+ more versions
Dashlink (2025). Using Decision Trees to Detect and Isolate Leaks in the J-2X [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/using-decision-trees-to-detect-and-isolate-leaks-in-the-j-2x
Dataset updated
Aug 30, 2025
Dataset provided by
Dashlink
Description
Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine
Mark Schwabacher, NASA Ames Research Center
Robert Aguilar, Pratt & Whitney Rocketdyne
Fernando Figueroa, NASA Stennis Space Center
Abstract
The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically “learns” a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to “train” and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it “learned” a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Introduction
The J-2X rocket engine will be tested on Test Stand A-1 at NASA Stennis Space Center (SSC) in Mississippi. A team including people from SSC, NASA Ames Research Center (ARC), and Pratt & Whitney Rocketdyne (PWR) is developing a prototype end-to-end integrated systems health management (ISHM) system that will be used to monitor the test stand and the engine while the engine is on the test stand[1]. The prototype will use several different methods for detecting and diagnosing faults in the test stand and the engine, including rule-based, model-based, and data-driven approaches. SSC is currently using the G2 tool (http://www.gensym.com) to develop rule-based and model-based fault detection and diagnosis capabilities for the A-1 test stand. This paper describes preliminary results in applying the data-driven approach to detecting and diagnosing faults in the J-2X engine. The conventional approach to detecting and diagnosing faults in complex engineered systems such as rocket engines and test stands is to use large numbers of human experts. Test controllers watch the data in near-real time during each engine test. Engineers study the data after each test. These experts are aided by limit checks that signal when a particular variable goes outside of a predetermined range. The conventional approach is very labor intensive. Also, humans may not be able to recognize faults that involve the relationships among large numbers of variables. Further, some potential faults could happen too quickly for humans to detect them and react before they become catastrophic. Automated fault detection and diagnosis is therefore needed. One approach to automation is to encode human knowledge into rules or models. Another approach is to use data-driven methods to automatically learn models from historical data or simulated data. Our prototype will combine the data-driven approach with the model-based and rule-based appro
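To make the idea of searching for a tree that fits the training data more concrete, the sketch below scores one candidate threshold split with the gain ratio, the criterion C4.5 is generally described as using. It is a simplified illustration, not the authors' code: the sensor readings, labels and function names are hypothetical, and a full C4.5 implementation adds pruning, missing-value handling and further heuristics.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Gain ratio of splitting the samples at `value <= threshold`."""
    left = [label for value, label in zip(values, labels) if value <= threshold]
    right = [label for value, label in zip(values, labels) if value > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    w_left, w_right = len(left) / total, len(right) / total
    gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


# Hypothetical pressure readings: three nominal samples, three with a leak.
pressure = [10.1, 10.0, 9.9, 7.2, 7.0, 6.8]
state = ["nominal", "nominal", "nominal", "leak", "leak", "leak"]

for candidate in (8.0, 9.95):
    print(candidate, round(gain_ratio(pressure, state, candidate), 3))
```

In this toy data the threshold 8.0 separates the two classes completely and scores higher than 9.95; a tree learner would repeat this comparison over many variables and thresholds at every node.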
Number of accounts affected in data breaches Thailand Q2 2022-Q3 2024
statista.com
Updated Aug 8, 2025
Statista (2025). Number of accounts affected in data breaches Thailand Q2 2022-Q3 2024 [Dataset]. https://www.statista.com/statistics/1404553/thailand-number-of-account-breaches-exposed/
Dataset updated
Aug 8, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Thailand
Description
Between the second quarter of 2022 and the third quarter of 2024, the number of records exposed in account breaches in Thailand fluctuated significantly. Over ******* datasets were reported as having been leaked in the third quarter of 2024, compared to around ******* during the same quarter of the previous year.
QICS Paper: Impact and recovery of pH in marine sediments subject to a...
data-search.nerc.ac.uk
metadata.bgs.ac.uk
+2more
Updated Oct 5, 2020
(2020). QICS Paper: Impact and recovery of pH in marine sediments subject to a temporary carbon dioxide leak [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?orgName=Scottish%20Association%20for%20Marine%20Science
Dataset updated
Oct 5, 2020
Description
A possible effect of a carbon dioxide leak from an industrial sub-sea floor storage facility, utilised for Carbon Capture and Storage, is that escaping carbon dioxide gas will dissolve in sediment pore waters and reduce their pH. To quantify the scale and duration of such an impact, a novel, field-scale experiment was conducted, whereby carbon dioxide gas was injected into unconsolidated sub-sea floor sediments for a sustained period of 37 days. During this time, pore water pH in shallow sediment (5 mm depth) above the leak dropped by more than 0.8 units relative to a reference zone that was unaffected by the carbon dioxide. After the gas release was stopped, the pore water pH returned to normal background values within a three-week recovery period. Further, the total mass of carbon dioxide dissolved within the sediment pore fluids above the release zone was modelled by the difference in DIC between the reference and release zones. Results showed that between 14 and 63% of the carbon dioxide released during the experiment could remain in the dissolved phase within the sediment pore water. This is a publication in QICS Special Issue - International Journal of Greenhouse Gas Control, Peter Taylor et al. DOI: 10.1016/j.ijggc.2014.09.006.
CO2 leakage detection
metadata.bgs.ac.uk
data-search.nerc.ac.uk
html
Updated Nov 10, 2021
+ more versions
British Geological Survey (2021). CO2 leakage detection [Dataset]. https://metadata.bgs.ac.uk/geonetwork/srv/api/records/d0f9cf5b-c17b-5aaf-e054-002128a47908
Available download formats: html
Dataset updated
Nov 10, 2021
Dataset authored and provided by
British Geological Survey (https://www.bgs.ac.uk/)
License
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
Time period covered
Oct 2, 2020 - Dec 2, 2020
Description
UKCCSRC Flexible Funding 2020. The experimental data are the acoustic emission (AE) signals collected with three AE sensors when CO2 leaks from a CO2 storage cylinder under different pressures. '5MPa_20kgh-1' means the data was collected when the pressure was 5 MPa and the leakage rate was 20 kg/h. The sampling frequency of the AE signals is 3 MHz. UKCCSRC Flexible Funding 2020: Monitoring of CO2 flow under CCS conditions through multi-modal sensing and machine learning.
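The labelling convention described above can be decoded with a small helper like the one below. This is a sketch based only on the single example label given in the description; the regular expression, the `Condition` type and the function names are assumptions, and other labels in the dataset may follow a different pattern.

```python
import re
from typing import NamedTuple

SAMPLING_HZ = 3_000_000  # 3 MHz, as stated in the description


class Condition(NamedTuple):
    pressure_mpa: float
    leak_rate_kg_per_h: float


# Matches labels shaped like '5MPa_20kgh-1' (pattern inferred from one example).
_LABEL = re.compile(r"^(?P<p>\d+(?:\.\d+)?)MPa_(?P<q>\d+(?:\.\d+)?)kgh-1$")


def parse_label(label: str) -> Condition:
    """Split a label into pressure (MPa) and leakage rate (kg/h)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    return Condition(float(match["p"]), float(match["q"]))


def sample_count(duration_s: float) -> int:
    """Number of AE samples recorded by one sensor over the given duration."""
    return round(duration_s * SAMPLING_HZ)


print(parse_label("5MPa_20kgh-1"))
print(sample_count(0.5), "samples in half a second")
```

At 3 MHz each sensor produces three million samples per second, so even short recordings are large; reading files in chunks is likely to be necessary.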
Global exporters importers-export import data of Helium leak detector
volza.com
csv
Updated Sep 17, 2025
+ more versions
Volza FZ LLC (2025). Global exporters importers-export import data of Helium leak detector [Dataset]. https://www.volza.com/p/helium-leak-detector/
Available download formats: csv
Dataset updated
Sep 17, 2025
Dataset authored and provided by
Volza FZ LLC
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Count of exporters, Count of importers, Count of shipments, Sum of export import value
Description
772 Global exporters importers export import shipment records of Helium leak detector with prices, volume & current Buyer's suppliers relationships based on actual Global export trade database.
Data from: Leak-resilient enzyme-free nucleic acid dynamical systems through...
search.dataone.org
data.niaid.nih.gov
+1more
Updated Jul 30, 2025
Rajiv Teja Nagipogu (2025). Leak-resilient enzyme-free nucleic acid dynamical systems through shadow cancellation [Dataset]. http://doi.org/10.5061/dryad.g4f4qrfz7
Unique identifier
https://doi.org/10.5061/dryad.g4f4qrfz7
Dataset updated
Jul 30, 2025
Dataset provided by
Dryad Digital Repository
Authors
Rajiv Teja Nagipogu
Description
DNA strand displacement (DSD) emerged as a prominent reaction motif for engineering nucleic acid-based computational devices with programmable behaviors. However, strand displacement circuits are susceptible to background noise that disrupts the circuit behavior, commonly known as leaks. The side effects of leaks are particularly severe in circuits with complex dynamical elements (e.g., feedback loops), as their leaks amplify nonlinearly, disrupting the circuit function. Shadow cancellation is a dynamic leak-elimination strategy originally proposed to control the leak growth in such circuits. However, the kinetic restrictions of the proposed method introduce a significant design overhead, making it less accessible. In this work, we use domain-level DSD simulations to examine the method's capabilities, the inner workings of its components, and, most importantly, robustness to practical deviations in its design requirements. First, we show that the method could stabilize the dynamics of s...
Leak-resilient enzyme-free nucleic acid dynamical systems through shadow cancellation

Abbreviations

RPS: Rock-Paper-Scissors oscillator

UNIAMP: Unimolecular autocatalytic amplifier

BIAMP: Bimolecular autocatalytic amplifier

Basic commands

To run the peppercorn command to generate the *_enum.pil file and the corresponding plotting data

$FOLDER - The folder containing the .pil file

$NAME - The name of the .pil file without the file extension

$INTERMEDIATE_PREFIX - Prefix of the intermediates generated

$LABELS - Space separated list of the chemical species that need to be tracked

$TIME - Time to run the simulation for

./sim.sh $FOLDER $TIME $NAME $LABELS $INTERMEDIATE_PREFIX

To run the *_enum.pil file in the folder $FOLDER

$NAME - The name of the *_enum.pil file without the _enum.pil suffix

./pil.sh $FOLDER $TIME $NAME $LABELS $NAME

Produce-Helper Leak mechanism

`sLeakWaste = hcjr( fcr mcr scr + fcr( hckr...
QICS Paper: Detection and monitoring of leaked CO2 through sediment, water...
ckan.publishing.service.gov.uk
hosted-metadata.bgs.ac.uk
+3more
Updated Feb 26, 2020
+ more versions
(2020). QICS Paper: Detection and monitoring of leaked CO2 through sediment, water column and atmosphere in a sub-seabed CCS experiment [Dataset]. https://ckan.publishing.service.gov.uk/dataset/qics-paper-detection-and-monitoring-of-leaked-co2-through-sediment-water-column-and-atmosphere-
Dataset updated
Feb 26, 2020
Description
Carbon capture and storage in sub-seabed geological formations (sub-seabed CCS) is currently being studied as a realistic option to mitigate the accumulation of anthropogenic CO2 in the atmosphere. In implementing sub-seabed CCS, detecting and monitoring the impact of the sequestered CO2 on the ocean environment is highly important. The first controlled CO2 release experiment, Quantifying and Monitoring Potential Ecosystem Impacts of Geological Carbon Storage (QICS), took place in Ardmucknish Bay, Oban, in May–September 2012. We applied the in situ pH/pCO2 sensor to the QICS experiment for detection and monitoring of leaked CO2, and carried out several observations. The cabled real-time sensor was deployed close to the CO2 leakage (bubbling) area, and the fluctuations of in situ pH and pCO2 above the seafloor were monitored in a land-based container. The long-term sensor was placed on the seafloor in three different observation zones. The sediment pH sensor was inserted into the sediment at a depth of 50 cm beneath the seafloor near the CO2 leakage area. Wide-area mapping surveys of pH and pCO2 in the water column around the CO2 leakage area were carried out using an autonomous underwater vehicle (AUV) fitted with sensors. Atmospheric CO2 above the leakage area was observed using a CO2 analyzer attached to the bow of the ship, 50 cm above the sea surface. The behavior of the leaked CO2 is highly dependent on the tidal periodicity (low tide or high tide) during the CO2 gas release period. At low tide, the pH in the sediment and overlying seawater decreased due to strong eruption of CO2 gas bubbles, and the CO2 ascended to the sea surface quickly, with little dissolution into seawater, and dispersed into the atmosphere. On the other hand, the release of CO2 bubbles was lower at high tide due to higher water pressure, and slightly low-pH seawater and high atmospheric CO2 were detected. After CO2 gas injection was stopped, no remarkable variations of pH in the sediment and overlying water column were observed for three months. This is a publication in QICS Special Issue - International Journal of Greenhouse Gas Control, Kiminori Shitashima et al. DOI: 10.1016/j.ijggc.2014.12.011.
QICS Paper: Small-scale modelling of physiochemical impacts of CO2 leaked...
ckan.publishing.service.gov.uk
hosted-metadata.bgs.ac.uk
+2more
Updated Feb 26, 2020
ckan.publishing.service.gov.uk (2020). QICS Paper: Small-scale modelling of physiochemical impacts of CO2 leaked from sub-seabed reservoirs or pipelines within the North Sea and surrounding waters [Dataset]. https://ckan.publishing.service.gov.uk/dataset/qics-paper-small-scale-modelling-of-physiochemical-impacts-of-co2-leaked-from-sub-seabed-reserv
Dataset updated
Feb 26, 2020
Dataset provided by
CKAN (https://ckan.org/)
Area covered
North Sea
Description
A two-fluid, small-scale numerical ocean model was developed to simulate plume dynamics and increases in water acidity due to leakages of CO2 from potential sub-seabed reservoirs erupting, or pipelines breaching, into the North Sea. The location of a leak of such magnitude is unpredictable; therefore, multiple scenarios are modelled, with the physiochemical impact measured in terms of the movement and dissolution of the leaked CO2. A correlation for the drag coefficient of bubbles/droplets rising freely in seawater is presented, and a sub-model to predict the initial bubble/droplet size forming on the seafloor is proposed. In the case studies investigated, the leaked bubbles/droplets fully dissolve before reaching the water surface, and the solution is then dispersed into the larger-scale ocean waters. The tools developed can be extended to various locations to model the sudden eruption, which is vital in determining the fate of the CO2 within the local waters. This is a publication in Marine Pollution Bulletin, Marius Dewar et al. DOI: 10.1016/j.marpolbul.2013.03.005.
CrackStation's Password Cracking Dictionary (Human Passwords Only)
academictorrents.com
bittorrent
Updated Aug 10, 2014
+ more versions
Defuse Security (2014). CrackStation's Password Cracking Dictionary (Human Passwords Only) [Dataset]. https://academictorrents.com/details/7ae809ccd7f0778328ab4b357e777040248b8c7f
Available download formats: bittorrent (257973006)
Dataset updated
Aug 10, 2014
Dataset authored and provided by
Defuse Security
License
https://academictorrents.com/nolicensespecified
Description
The list contains every wordlist, dictionary, and password database leak that I could find on the internet (and I spent a LOT of time looking). It also contains every word in the Wikipedia databases (pages-articles, retrieved 2010, all languages) as well as lots of books from Project Gutenberg. It also includes the passwords from some low-profile database breaches that were being sold in the underground years ago. The format of the list is a standard text file sorted in non-case-sensitive alphabetical order. Lines are separated with a newline "\n" character. You can test the list without downloading it by giving SHA256 hashes to the free hash cracker or to @PlzCrack on Twitter. Here's a tool for computing hashes easily. Here are the results of cracking LinkedIn's and eHarmony's password hash leaks with the list. The list is responsible for cracking about 30% of all hashes given to CrackStation's free hash cracker, but that figure should be taken with a grain of salt because s
Replication data for estimating the below-ground leak rate of a natural gas...
search.dataone.org
dataone.org
Updated Jul 14, 2025
+ more versions
Fancy Cheptonui; Riddick N. Stuart; Anna L. Hodshire; Mercy Mbua; Kathleen M. Smits; Daniel J. Zimmerle (2025). Replication data for estimating the below-ground leak rate of a natural gas pipeline using above-ground downwind measurements: THE ESCAPE−1 MODEL [Dataset]. http://doi.org/10.5061/dryad.8931zcrwq
Unique identifier
https://doi.org/10.5061/dryad.8931zcrwq
Dataset updated
Jul 14, 2025
Dataset provided by
Dryad Digital Repository
Authors
Fancy Cheptonui; Riddick N. Stuart; Anna L. Hodshire; Mercy Mbua; Kathleen M. Smits; Daniel J. Zimmerle
Time period covered
Jan 1, 2023
Description
Gas leak detectors are currently used to survey for below-ground leaks. However, measurements from gas detectors are not precise in quantifying the below-ground leak rate. The data in this submission aimed to quantify the below-ground leak rate of a natural gas pipeline. This dataset consists of raw, unprocessed data. The data was collected at 2 m above ground level using a path-integrated methane detector (measurements were sampled using equipment that reports the mixing ratios in parts per million meter). Measurements were taken above a below-ground gas leak.
Above-Ground Methane Measurements:

Summary of dataset contents, contextualized in experimental procedures.

A path-integrated methane detector was set 2 m above ground level.

A laser screen was positioned 15 to 20 m opposite of the detector.

Methane measurements were transmitted every 2 seconds during measurements (varied for some data sets)

The downwind distance from the leak center was either 5 m or 10 m depending on the controlled leak rate

A total of 11 experiments were conducted.

Data and file structure

The files are in .csv format, meaning they can be opened with Microsoft Excel or converted to .txt for re-use

The data sets contain 2 columns; 1st column is the time column, and the 2nd column is the methane mixing ratios column

The 1st column is the time column in Mountain Time (one can always convert to any time zone)

The 2nd column is in parts per million meters (ppm-m)

The data can be used to re-produce a gas plume above-ground at various distances f...
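The two-column layout described above could be read along the following lines. This is a minimal sketch: the header row, the timestamp format and the sample values are assumptions, so the time column is kept as plain text and any row whose second column is not numeric is skipped.

```python
import csv
import io
import statistics
from typing import List, Tuple


def read_ppm_m(text: str) -> List[Tuple[str, float]]:
    """Return (time, methane ppm-m) pairs from two-column CSV text."""
    rows: List[Tuple[str, float]] = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) < 2:
            continue  # skip blank or malformed lines
        try:
            value = float(record[1])
        except ValueError:
            continue  # most likely a header row
        rows.append((record[0].strip(), value))
    return rows


# Hypothetical excerpt; real files report one reading roughly every 2 seconds.
SAMPLE = "time,ppm_m\n12:00:00,4.0\n12:00:02,6.0\n12:00:04,5.0\n"

readings = read_ppm_m(SAMPLE)
mean_ppm_m = statistics.mean(value for _, value in readings)
print(len(readings), "readings, mean", mean_ppm_m, "ppm-m")
```

For a real file, the same function can be applied to the text returned by `open(path, newline="").read()`.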
Affects of leakage on ground stability
data.wu.ac.at
metadata.bgs.ac.uk
Updated Aug 18, 2018
+ more versions
British Geological Survey (2018). Affects of leakage on ground stability [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/ZGY2YmVjMmQtODdlNi00NWYyLWI1MWYtNThhNWJlNzAyMzg4
Dataset updated
Aug 18, 2018
Dataset provided by
British Geological Survey (https://www.bgs.ac.uk/)
Description
This national digital GIS product produced by the British Geological Survey indicates the potential for leakage to have a negative effect on ground stability. It is largely derived from the digital geological map and expert knowledge. The GIS dataset contains seven fields. The first field is a summary map that gives an overview of where leakage may affect ground stability. The other six fields indicate the properties of the ground with respect to the extent to which hazards associated with soluble rocks, landslides, compressible ground, collapsible ground, swelling clays and running sands will be increased due to leakage. The data is useful to asset managers in water companies, local authorities and utility companies who would like to understand where, and to what extent, leaking underground pipes or other structures may initiate or worsen ground instability.
Cybersecurity Ethics and Data Leakage: A Case Study of Google Dorking in...
zenodo.org
zip
Updated Jun 14, 2025
Cite
Siti Qalimatus Zahra; Devika Rahman; Aulia Salma Anjani; Nur Aini Rakhmawati (2025). Cybersecurity Ethics and Data Leakage: A Case Study of Google Dorking in East Java's Class B Public Hospitals [Dataset]. http://doi.org/10.5281/zenodo.15663907
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15663907
Dataset updated
Jun 14, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Siti Qalimatus Zahra; Devika Rahman; Aulia Salma Anjani; Nur Aini Rakhmawati
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
East Java
Description
Overview

This dataset is part of a cybersecurity case study focused on public hospitals (Class B) in East Java, Indonesia, specifically analyzing their exposure to data leakage via Google Dorking techniques. The research investigates ethical considerations, risk severity, and common vulnerabilities in the healthcare sector.

Files Included

List of Hospital.xlsx : Contains a list of Class B public hospitals in East Java.

List of Query.xlsx : A collection of Google Dork queries used to simulate reconnaissance activities.

Risk Assessment.xlsx : Assessment results of how many dorks triggered a hit on each hospital and the severity level.

Risk Parameter.xlsx : Defines the criteria used to assess and categorize the risk severity.

Heat Map.png : A bubble heatmap visualizing the distribution of risk ratings. X-axis: Risk code (R1–R15). Y-axis: Frequency of risk occurrences. Bubble size: Proportional to number of hospitals affected. Color gradient: Indicates increasing severity.

Key Features

Ethical analysis: This dataset is built to support studies in cybersecurity ethics, particularly around public data exposure without breaching systems.

Passive reconnaissance only: All data was obtained via open-source methods (OSINT) and public search engine indexing—no active intrusion or exploitation.

Mapped risks: Each dork query's effect was mapped onto a risk code (R1–R15) for cross-hospital comparison.

Quantitative risk levels: Uses a structured risk matrix with impact and likelihood to assign severity (e.g., R4 = 16, R9 = 24).
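
A risk matrix of this kind is often scored by multiplying an impact rating by a likelihood rating. The sketch below assumes that convention; the description does not confirm it, and the actual scales and criteria are defined in Risk Parameter.xlsx and are not reproduced here. The function name `severity` and the ratings are hypothetical, picked only so that the products match the two example scores quoted above.

```python
def severity(impact: int, likelihood: int) -> int:
    """Score one risk as impact times likelihood (an assumed, common convention)."""
    if impact < 1 or likelihood < 1:
        raise ValueError("impact and likelihood must be positive ratings")
    return impact * likelihood


# Hypothetical ratings, chosen only so the products match the quoted scores.
ratings = {"R4": (4, 4), "R9": (4, 6)}
scores = {code: severity(i, l) for code, (i, l) in ratings.items()}

# Rank risk codes from most to least severe for cross-hospital comparison.
ranked = sorted(scores, key=scores.get, reverse=True)
```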

Ethical Considerations

No real exploitation: Only publicly indexed data was used.

Anonymization: No sensitive personal or health data is disclosed.

Purpose: Educational and awareness-based research to encourage better data hygiene in public institutions.

Suggested Use Cases

Academic research on cybersecurity in healthcare

Teaching material for IT ethics and digital reconnaissance

Risk modeling and visualization projects

Policy analysis for public digital infrastructure
Kinetics of enhanced cementation reactions for CO2 leakage remediation and...
ckan.publishing.service.gov.uk
metadata.bgs.ac.uk
+1more
Updated Mar 31, 2021
+ more versions
Cite
ckan.publishing.service.gov.uk (2021). Kinetics of enhanced cementation reactions for CO2 leakage remediation and fault healing processes [Dataset]. https://ckan.publishing.service.gov.uk/dataset/kinetics-of-enhanced-cementation-reactions-for-co2-leakage-remediation-and-fault-healing-proces
Dataset updated
Mar 31, 2021
Dataset provided by
CKAN (https://ckan.org/)
Description
This dataset presents the amounts of different magnesium carbonates under different conditions. Using batch reactor experiments and mineralogical characterization, we explored magnesite precipitation kinetics in chemically complex fluids, investigating the impact of fluid acidity and alkalinity, NaCl, and MgO nanoparticles. The dataset was created within the SECURe project (Subsurface Evaluation of CCS and Unconventional Risks): https://www.securegeoenergy.eu/. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 764531.

Cite

Statista (2025). All-time biggest online data breaches 2025 [Dataset]. https://www.statista.com/statistics/290525/cyber-crime-biggest-online-data-breaches-worldwide/

All-time biggest online data breaches 2025

35 scholarly articles cite this dataset (View in Google Scholar)

All-time biggest online data breaches 2025

"Pwned Passwords" Dataset

Number of data compromises and impacted individuals in U.S. 2005-2024

AOL Search Data 20M web queries (2006)

Eximpedia Export Import Trade

Minimizing Search Areas for Leak Detection in Water Distribution Networks -...

Global exporters importers-export import data of Leakage detector

Using Decision Trees to Detect and Isolate Leaks in the J-2X

Number of accounts affected in data breaches Thailand Q2 2022-Q3 2024

QICS Paper: Impact and recovery of pH in marine sediments subject to a...

CO2 leakage detection

Global exporters importers-export import data of Helium leak detector

Data from: Leak-resilient enzyme-free nucleic acid dynamical systems through...

Abbreviations

Basic commands

Produce-Helper Leak mechanism

QICS Paper: Detection and monitoring of leaked CO2 through sediment, water...

QICS Paper: Small-scale modelling of physiochemical impacts of CO2 leaked...

CrackStation's Password Cracking Dictionary (Human Passwords Only)

Replication data for estimating the below-ground leak rate of a natural gas...

Data and file structure

Affects of leakage on ground stability

Cybersecurity Ethics and Data Leakage: A Case Study of Google Dorking in...

Kinetics of enhanced cementation reactions for CO2 leakage remediation and...

All-time biggest online data breaches 2025